From angel at miami.edu Mon Dec 1 10:25:34 2003
From: angel at miami.edu (Angel Li)
Date: Mon, 01 Dec 2003 13:25:34 -0500
Subject: [Rocks-Discuss]cluster-fork
Message-ID: <3FCB879E.8050905@miami.edu>

Hi,

I recently installed Rocks 3.0 on a Linux cluster and when I run the
command "cluster-fork" I get this error:

apple* cluster-fork ls
Traceback (innermost last):
  File "/opt/rocks/sbin/cluster-fork", line 88, in ?
    import rocks.pssh
  File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
    import gmon.encoder
ImportError: Bad magic number in
/usr/lib/python1.5/site-packages/gmon/encoder.pyc

Any thoughts? I'm also wondering where to find the python sources for
files in /usr/lib/python1.5/site-packages/gmon.

Thanks,

Angel



From jghobrial at uh.edu Mon Dec 1 11:35:06 2003
From: jghobrial at uh.edu (Joseph)
Date: Mon, 1 Dec 2003 13:35:06 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <3FCB879E.8050905@miami.edu>
References: <3FCB879E.8050905@miami.edu>
Message-ID: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>

On Mon, 1 Dec 2003, Angel Li wrote:
Hello Angel, I have the same problem, and so far there has been no
response to my post about this from a month ago.

Is your frontend an AMD setup??

I am thinking this is an AMD problem.

Thanks,
Joseph


>   Hi,
>
>   I recently installed Rocks 3.0 on a Linux cluster and when I run the
>   command "cluster-fork" I get this error:
>
>   apple* cluster-fork ls
>   Traceback (innermost last):
>     File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>       import rocks.pssh
>     File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>       import gmon.encoder
>   ImportError: Bad magic number in
>   /usr/lib/python1.5/site-packages/gmon/encoder.pyc
>
>   Any thoughts? I'm also wondering where to find the python sources for
>   files in /usr/lib/python1.5/site-packages/gmon.
>
>   Thanks,
>
>   Angel
>


From tim.carlson at pnl.gov Mon Dec 1 14:58:54 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 14:58:54 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <76AC0F5E-2025-11D8-804D-000393A4725A@sdsc.edu>
Message-ID: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>

Trying to bring up an old dead node on a Rocks 2.3.2 cluster, I get the
following error in /var/log/httpd/error_log:


Traceback (innermost last):
  File "/opt/rocks/sbin/kgen", line 530, in ?
    app.run()
  File "/opt/rocks/sbin/kgen", line 497, in run
    doc = FromXmlStream(file)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 386, in FromXmlStream
    return reader.fromStream(stream, ownerDocument)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 372, in fromStream
    self.parser.parse(s)
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 58, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line 125, in parse
    self.close()
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 154, in close
    self.feed("", isFinal = 1)
  File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 148, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 340, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found


Doing a wget of
http://frontend-0/install/kickstart.cgi?arch=i386&np=2&project=rocks
on one of the working internal nodes yields the same error.

Any thoughts on this?
I've also done a fresh
rocks-dist dist

Tim



From sjenks at uci.edu Mon Dec 1 15:35:54 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Mon, 1 Dec 2003 15:35:54 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
Message-ID: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>

FYI, I have a dual Athlon frontend and didn't have that problem. I know
that doesn't exactly help you, but at least it doesn't fail on all AMD
machines.

It looks like the .pyc file might be corrupt in your installation. The
source .py file (encoder.py) is in the
/usr/lib/python1.5/site-packages/gmon directory, so perhaps removing
the .pyc file would let Python regenerate it (if you run cluster-fork as root?)

The md5sum for encoder.pyc on my system is:
459c78750fe6e065e9ed464ab23ab73d encoder.pyc
So you can check if yours is different.
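(A side note, sketched with today's Python 3 importlib names rather than the
Python 1.5 that Rocks ships: deleting a bytecode cache file is safe, because
the interpreter rebuilds it from the .py on the next import. demo_mod below
is a made-up scratch module, not anything from the gmon package.)

```python
import importlib
import importlib.util
import os
import pathlib
import sys
import tempfile

# Create a throwaway module on the import path.
d = tempfile.mkdtemp()
pathlib.Path(d, "demo_mod.py").write_text("value = 42\n")
sys.path.insert(0, d)

import demo_mod                                       # first import writes the bytecode cache

cache = importlib.util.cache_from_source(demo_mod.__file__)
os.remove(cache)                                      # simulate removing a corrupt .pyc
importlib.reload(demo_mod)                            # re-import recompiles and rewrites it
print(os.path.exists(cache))
```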

Steve Jenks


On Dec 1, 2003, at 11:35 AM, Joseph wrote:

> On Mon, 1 Dec 2003, Angel Li wrote:
> Hello Angel, I have the same problem and so far there is no response
> when
> I posted about this a month ago.
>
> Is your frontend an AMD setup??
>
> I am thinking this is an AMD problem.
>
> Thanks,
> Joseph
>
>
>> Hi,
>>
>> I recently installed Rocks 3.0 on a Linux cluster and when I run the
>> command "cluster-fork" I get this error:
>>
>> apple* cluster-fork ls
>> Traceback (innermost last):
>>   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>>     import rocks.pssh
>>   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>>     import gmon.encoder
>> ImportError: Bad magic number in
>>   /usr/lib/python1.5/site-packages/gmon/encoder.pyc
>>
>>   Any thoughts? I'm also wondering where to find the python sources for
>>   files in /usr/lib/python1.5/site-packages/gmon.
>>
>>   Thanks,
>>
>>   Angel
>>



From mjk at sdsc.edu Mon Dec 1 19:03:16 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 1 Dec 2003 19:03:16 -0800
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov>
Message-ID: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu>

You'll need to run the kpp and kgen steps (what kickstart.cgi does for
you) manually to find out if this is an XML error.

       # cd /home/install/profiles/current
       # kpp compute

This will generate a kickstart file for a compute node, although some
information will be missing since it isn't specific to a node (unlike
what ./kickstart.cgi --client=node-name generates). What this does do
is traverse the XML graph and build a monolithic XML kickstart
profile. If this step works, you can then pipe ("|") the output into kgen
to convert the XML to kickstart syntax. Something in this procedure
should fail and point to the error.
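(As a side note on the error itself: "no element found" is expat's message
for an XML stream that ends before any root element appears, i.e. an empty or
truncated profile. A minimal reproduction of that class of failure, shown in
modern Python; on Rocks 2.3.2 the Python 1.5 module layout differs slightly.)

```python
import xml.sax

# Feeding expat an empty document triggers the same fatal parse error
# kgen reported: "no element found".
msg = ""
try:
    xml.sax.parseString(b"", xml.sax.ContentHandler())
except xml.sax.SAXParseException as exc:
    msg = exc.getMessage()

print(msg)
```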

       -mjk

On Dec 1, 2003, at 2:58 PM, Tim Carlson wrote:

>   Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get
>   the
>   following error in /var/log/httpd/error_log
>
>
>   Traceback (innermost last):
>     File "/opt/rocks/sbin/kgen", line 530, in ?
>        app.run()
>     File "/opt/rocks/sbin/kgen", line 497, in run
>        doc = FromXmlStream(file)
>     File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
>   line
>   386, in FromXmlStream
>        return reader.fromStream(stream, ownerDocument)
>     File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
>   line
>   372, in fromStream
>        self.parser.parse(s)
>     File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
>   58,
>   in parse
>        xmlreader.IncrementalParser.parse(self, source)
>     File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line
>   125,
>   in parse
>        self.close()
>     File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
>   154, in close
>        self.feed("", isFinal = 1)
>     File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line
>   148, in feed
>        self._err_handler.fatalError(exc)
>     File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py",
>   line
>   340, in fatalError
>        raise exception
>   xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found
>
>
>   Doing a wget of
>   http://frontend-0/install/kickstart.cgi?
>   arch=i386&np=2&project=rocks
>   on one of the working internal nodes yields the same error.
>
>   Any thoughts on this?
>
>   I've also done a fresh
>   rocks-dist dist
>
>   Tim



From tim.carlson at pnl.gov Mon Dec 1 20:42:51 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 01 Dec 2003 20:42:51 -0800 (PST)
Subject: [Rocks-Discuss]odd kickstart problem
In-Reply-To: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu>
Message-ID: <Pine.GSO.4.44.0312012040250.3148-100000@paradox.emsl.pnl.gov>

On Mon, 1 Dec 2003, Mason J. Katz wrote:

> You'll need to run the kpp and kgen steps (what kickstart.cgi does for
> your) manually to find if this is an XML error.
>
>     # cd /home/install/profiles/current
>     # kpp compute

That was the trick. This sent me down the correct path. I had uninstalled
SGE on the frontend (I was having problems with SGE and wanted to start
from scratch).

Adding the two SGE XML files back to /home/install/profiles/2.3.2/nodes/
fixed everything.

Thanks!

Tim


From landman at scalableinformatics.com Tue Dec 2 04:15:07 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 02 Dec 2003 07:15:07 -0500
Subject: [Rocks-Discuss]supermicro based MB's
Message-ID: <3FCC824B.5060406@scalableinformatics.com>

Folks:

  Working on integrating a Supermicro MB-based cluster. Discovered early
on that all of the compute nodes have an Intel-based NIC that RedHat
doesn't know anything about (any version of RH). Some of the
administrative nodes have other, similar issues. I am seeing a surprising
amount of misdetected or undetected hardware across the collection of MBs.

  Anyone have advice on where to get modules/module source for RedHat
for these things? It looks like I will need to rebuild the boot CD,
though the several times I have tried this previously have failed to
produce a working/bootable system. It looks like new modules need to be
created/inserted into the boot-process kernels (head node and cluster
nodes), as well as into the installable kernels.

   Has anyone done this for a Supermicro MB-based system? Thanks.

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615




From jghobrial at uh.edu Tue Dec 2 08:28:08 2003
From: jghobrial at uh.edu (Joseph)
Date: Tue, 2 Dec 2003 10:28:08 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
 <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
Message-ID: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>

Indeed my md5sum is different for encoder.pyc. However, when I pulled the
file and ran "cluster-fork", Python complained about an import problem. So it
seems that regeneration did not occur. Is there a flag I need to pass?

I have also tried to figure out what package provides encoder and
reinstall that package, but an rpm query reveals nothing.

If this is a generated file, what generates it?

An rpm file query on ganglia shows that the other files in the
directory belong to the package, but encoder.pyc does not.

Thanks,
Joseph



On Mon, 1 Dec 2003, Stephen Jenks wrote:
> FYI, I have a dual Athlon frontend and didn't have that problem. I know
> that doesn't exactly help you, but at least it doesn't fail on all AMD
> machines.
>
> It looks like the .pyc file might be corrupt in your installation. The
> source .py file (encoder.py) is in the
> /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing
> the .pyc file would regenerate it (if you run cluster-fork as root?)
>
> The md5sum for encoder.pyc on my system is:
> 459c78750fe6e065e9ed464ab23ab73d encoder.pyc
> So you can check if yours is different.
>
> Steve Jenks
>
>
> On Dec 1, 2003, at 11:35 AM, Joseph wrote:
>
> > On Mon, 1 Dec 2003, Angel Li wrote:
> > Hello Angel, I have the same problem and so far there is no response
> > when
> > I posted about this a month ago.
> >
> > Is your frontend an AMD setup??
> >
> > I am thinking this is an AMD problem.
> >
> > Thanks,
> > Joseph
> >
> >
> >> Hi,
> >>
> >> I recently installed Rocks 3.0 on a Linux cluster and when I run the
> >> command "cluster-fork" I get this error:
> >>
> >> apple* cluster-fork ls
> >> Traceback (innermost last):
> >>   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
> >>     import rocks.pssh
> >>   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
> >>     import gmon.encoder
> >> ImportError: Bad magic number in
> >> /usr/lib/python1.5/site-packages/gmon/encoder.pyc
> >>
> >> Any thoughts? I'm also wondering where to find the python sources for
> >> files in /usr/lib/python1.5/site-packages/gmon.
> >>
> >> Thanks,
> >>
> >> Angel
> >>
>
From angel at miami.edu Tue Dec 2 09:02:55 2003
From: angel at miami.edu (Angel Li)
Date: Tue, 02 Dec 2003 12:02:55 -0500
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
Message-ID: <3FCCC5BF.3030903@miami.edu>

Joseph wrote:

>Indeed my md5sum is different for encoder.pyc. However, when I pulled the
>file and run "cluster-fork" python responds about an import problem. So it
>seems that regeneration did not occur. Is there a flag I need to pass?
>
>I have also tried to figure out what package provides encoder and
>reinstall the package, but an rpm query reveals nothing.
>
>If this is a generated file, what generates it?
>
>It seems that an rpm file query on ganglia show that files in the
>directory belong to the package, but encoder.pyc does not.
>
>Thanks,
>Joseph
>
>
>
>
I have finally found the Python sources on the HPC roll CD, in
ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with Python, but it
seems Python "compiles" the .py files to ".pyc" and then deletes the
source files the first time they are referenced? I also noticed that
there are two versions of Python installed. Maybe the .pyc files from one
version won't load into the other one?

Angel




From mjk at sdsc.edu Tue Dec 2 15:52:52 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 2 Dec 2003 15:52:52 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <3FCCC5BF.3030903@miami.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu>
Message-ID: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>

Python creates the .pyc files for you, and does not remove the original
.py file. I would be extremely surprised if two "identical" .pyc files
had the same md5 checksum. I'd expect this to be more like a C .o file,
which often contains padding data out to the end of a page and to 32/64-bit
word sizes. Still, this is just a guess; the real point is that you
can always remove a .pyc file and the .py will regenerate it
when imported (although standard UNIX file/dir permissions still apply).
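(To illustrate the "bad magic number" side of this, using today's Python 3
API rather than the 1.5 one on Rocks: the first bytes of a .pyc record which
interpreter compiled it, which is why bytecode from one installed Python
won't load into the other. encoder_demo below is a made-up scratch module.)

```python
import importlib.util
import pathlib
import py_compile
import tempfile

# Byte-compile a throwaway module and inspect the .pyc header: it begins
# with the compiling interpreter's magic number. On import, a mismatch
# with the running interpreter raises "ImportError: bad magic number".
with tempfile.TemporaryDirectory() as d:
    src = pathlib.Path(d) / "encoder_demo.py"
    src.write_text("x = 1\n")
    pyc_path = py_compile.compile(str(src))      # byte-compile, as import does
    header = pathlib.Path(pyc_path).read_bytes()[:4]

print(header == importlib.util.MAGIC_NUMBER)
```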

What is the import error you get from cluster-fork?

     -mjk

On Dec 2, 2003, at 9:02 AM, Angel Li wrote:

> Joseph wrote:
>
>> Indeed my md5sum is different for encoder.pyc. However, when I pulled
>> the file and run "cluster-fork" python responds about an import
>> problem. So it seems that regeneration did not occur. Is there a flag
>> I need to pass?
>>
>> I have also tried to figure out what package provides encoder and
>> reinstall the package, but an rpm query reveals nothing.
>>
>> If this is a generated file, what generates it?
>>
>> It seems that an rpm file query on ganglia show that files in the
>> directory belong to the package, but encoder.pyc does not.
>>
>> Thanks,
>> Joseph
>>
>>
>>
> I have finally found the python sources in the HPC rolls CD, filename
> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
> seems python "compiles" the .py files to ".pyc" and then deletes the
> source file the first time they are referenced? I also noticed that
> there are two versions of python installed. Maybe the pyc files from
> one version won't load into the other one?
>
> Angel
>
>



From vrowley at ucsd.edu Mon Dec 1 14:27:03 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Mon, 01 Dec 2003 14:27:03 -0800
Subject: [Rocks-Discuss]PXE boot problems
Message-ID: <3FCBC037.5000302@ucsd.edu>

We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to
install a compute node via PXE. We are getting an error similar to the
one mentioned in the archives, e.g.

> Loading initrd.img....
> Ready
>
> Failed to free base memory
>
We have upgraded to syslinux-2.07-1, per the suggestion in the archives,
but continue to get the same error. Any ideas?

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb



From naihh at imcb.a-star.edu.sg Tue Dec 2 18:50:55 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Wed, 3 Dec 2003 10:50:55 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3
      for Itanium?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>


Hi Laurence,

I just downloaded Rocks 3.0 for IA32 and installed it, but SGE is
still not working.

Any idea?

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: Laurence Liew [mailto:laurence at scalablesys.com]
Sent: Thursday, November 20, 2003 2:53 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?

Hi Francis

The GridEngine roll is ready for ia32. We will get an ia64 native version
ready as soon as we get back from SC2003. It will be released in a few
weeks' time.

Globus GT2.4 is included in the Grid Roll

Cheers!
Laurence


On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
>
> Hi,
>
> Does anyone have any idea when will Sun Grid Engine be included as
part
> of Rocks 3 distribution.
>
> I am a newbie to Grid Computing.
> Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>
> Regards
>
> Nai Hong Hwa Francis
>
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609
> DID: 65-6874-6196
>
> -----Original Message-----
> From: npaci-rocks-discussion-request at sdsc.edu
> [mailto:npaci-rocks-discussion-request at sdsc.edu]
> Sent: Thursday, November 20, 2003 4:01 AM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>
> Send npaci-rocks-discussion mailing list submissions to
>     npaci-rocks-discussion at sdsc.edu
>
> To subscribe or unsubscribe via the World Wide Web, visit
>
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> or, via email, send a message with subject or body 'help' to
>     npaci-rocks-discussion-request at sdsc.edu
>
> You can reach the person managing the list at
>     npaci-rocks-discussion-admin at sdsc.edu
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of npaci-rocks-discussion digest..."
>
>
> Today's Topics:
>
>    1. top500 cluster installation movie (Greg Bruno)
>    2. Re: Running Normal Application on Rocks Cluster -
>         Newbie Question (Laurence Liew)
>
> --__--__--
>
> Message: 1
> To: npaci-rocks-discussion at sdsc.edu
> From: Greg Bruno <bruno at rocksclusters.org>
> Date: Tue, 18 Nov 2003 13:41:15 -0800
> Subject: [Rocks-Discuss]top500 cluster installation movie
>
> here's a crew of 7, installing the 201st fastest supercomputer in the
> world in under two hours on the showroom floor at SC 03:
>
> http://www.rocksclusters.org/rocks.mov
>
> warning: the above file is ~65MB.
>
>    - gb
>
>
> --__--__--
>
> Message: 2
> Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
Cluster
> -
>      Newbie Question
> From: Laurence Liew <laurenceliew at yahoo.com.sg>
> To: Leong Chee Shian <chee-shian.leong at schenker.com>
> Cc: npaci-rocks-discussion at sdsc.edu
> Date: Wed, 19 Nov 2003 12:31:18 +0800
>
> Chee Shian,
>
> Thanks for your call. We will take this off list and visit you next
week
> in your office as you requested.
>
> Cheers!
> laurence
>
>
>
> On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > I have just installed Rocks 3.0 with one frontend and two compute
> > node.
> >
> > A normal file based application is installed on the frontend and is
> > NFS shared to the compute nodes .
> >
> > Question is : When run 5 sessions of my applications , the CPU
> > utilization is all concentrated on the frontend node , nothing is
> > being passed on to the compute nodes . How do I make these 3
computers
> > to function as one and share the load ?
> >
> > Thanks everyone as I am really new to this clustering stuff..
> >
> > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > intel machines to replace our existing multi CPU sun server,
> > suggestions and recommendations are greatly appreciated.
> >
> >
> > Leong
> >
> >
> >
>
>
>
> --__--__--
>
> _______________________________________________
> npaci-rocks-discussion mailing list
> npaci-rocks-discussion at sdsc.edu
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>
> End of npaci-rocks-discussion Digest
>
>
> DISCLAIMER:
> This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its contents to any
other person as it may be an offence under the Official Secrets Act.
Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel   : 65 6827 3953
Fax    : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com



DISCLAIMER:
This email is confidential and may be privileged. If you are not the intended
recipient, please delete it and notify us immediately. Please do not copy or use it
for any purpose, or disclose its contents to any other person as it may be an
offence under the Official Secrets Act. Thank you.


From laurence at scalablesys.com Tue Dec 2 19:10:08 2003
From: laurence at scalablesys.com (Laurence Liew)
Date: Wed, 03 Dec 2003 11:10:08 +0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included
      in Rocks 3 for Itanium?
In-Reply-To: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
References:
       <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg>
Message-ID: <1070421007.2452.51.camel@scalable>

Hi,

SGE is in the SGE roll.

You need to download the base, hpc and sge roll.

The install is now different from V2.3.x

Cheers!
laurence



On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote:
> Hi Laurence,
>
>   I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
>   still not working.
>
>   Any idea?
>
>   Nai Hong Hwa Francis
>   Institute of Molecular and Cell Biology (A*STAR)
>   30 Medical Drive
>   Singapore 117609.
>   DID: (65) 6874-6196
>
>   -----Original Message-----
>   From: Laurence Liew [mailto:laurence at scalablesys.com]
>   Sent: Thursday, November 20, 2003 2:53 PM
>   To: Nai Hong Hwa Francis
>   Cc: npaci-rocks-discussion at sdsc.edu
>   Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
>   inRocks 3 for Itanium?
>
>   Hi Francis
>
>   GridEngine roll is ready for ia32. We will get a ia64 native version
>   ready as soon as we get back from SC2003. It will be released in a few
>   weeks time.
>
>   Globus GT2.4 is included in the Grid Roll
>
>   Cheers!
>   Laurence
>
>
>   On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
>   >
>   > Hi,
>   >
>   > Does anyone have any idea when will Sun Grid Engine be included as
>   part
>   > of Rocks 3 distribution.
>   >
>   > I am a newbie to Grid Computing.
>   > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>   >
>   > Regards
>   >
>   > Nai Hong Hwa Francis
>   >
>   > Institute of Molecular and Cell Biology (A*STAR)
>   > 30 Medical Drive
>   > Singapore 117609
>   > DID: 65-6874-6196
>   >
>   > -----Original Message-----
>   > From: npaci-rocks-discussion-request at sdsc.edu
>   > [mailto:npaci-rocks-discussion-request at sdsc.edu]
>   > Sent: Thursday, November 20, 2003 4:01 AM
>   > To: npaci-rocks-discussion at sdsc.edu
>   > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>   >
>   > Send npaci-rocks-discussion mailing list submissions to
>   >   npaci-rocks-discussion at sdsc.edu
>   >
>   > To subscribe or unsubscribe via the World Wide Web, visit
>   >
>   > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>   > or, via email, send a message with subject or body 'help' to
>   >   npaci-rocks-discussion-request at sdsc.edu
>   >
>   > You can reach the person managing the list at
>   >   npaci-rocks-discussion-admin at sdsc.edu
>   >
>   > When replying, please edit your Subject line so it is more specific
>   > than "Re: Contents of npaci-rocks-discussion digest..."
>   >
>   >
>   > Today's Topics:
>   >
>   >     1. top500 cluster installation movie (Greg Bruno)
>   >     2. Re: Running Normal Application on Rocks Cluster -
>   >         Newbie Question (Laurence Liew)
>   >
>   > --__--__--
>   >
>   > Message: 1
>   > To: npaci-rocks-discussion at sdsc.edu
>   > From: Greg Bruno <bruno at rocksclusters.org>
>   > Date: Tue, 18 Nov 2003 13:41:15 -0800
>   > Subject: [Rocks-Discuss]top500 cluster installation movie
>   >
>   > here's a crew of 7, installing the 201st fastest supercomputer in the
>   > world in under two hours on the showroom floor at SC 03:
>   >
>   > http://www.rocksclusters.org/rocks.mov
>   >
>   > warning: the above file is ~65MB.
>   >
>   >    - gb
>   >
>   >
>   > --__--__--
>   >
>   > Message: 2
>   > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
>   Cluster
>   > -
>   >   Newbie Question
>   > From: Laurence Liew <laurenceliew at yahoo.com.sg>
>   > To: Leong Chee Shian <chee-shian.leong at schenker.com>
>   > Cc: npaci-rocks-discussion at sdsc.edu
>   > Date: Wed, 19 Nov 2003 12:31:18 +0800
>   >
>   > Chee Shian,
>   >
>   > Thanks for your call. We will take this off list and visit you next
>   week
>   > in your office as you requested.
>   >
>   > Cheers!
>   > laurence
> >
> >
> >
> > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
> > > I have just installed Rocks 3.0 with one frontend and two compute
> > > node.
> > >
> > > A normal file based application is installed on the frontend and is
> > > NFS shared to the compute nodes .
> > >
> > > Question is : When run 5 sessions of my applications , the CPU
> > > utilization is all concentrated on the frontend node , nothing is
> > > being passed on to the compute nodes . How do I make these 3
> computers
> > > to function as one and share the load ?
> > >
> > > Thanks everyone as I am really new to this clustering stuff..
> > >
> > > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > > intel machines to replace our existing multi CPU sun server,
> > > suggestions and recommendations are greatly appreciated.
> > >
> > >
> > > Leong
> > >
> > >
> > >
> >
> >
> >
> > --__--__--
> >
> > _______________________________________________
> > npaci-rocks-discussion mailing list
> > npaci-rocks-discussion at sdsc.edu
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> >
> >
> > End of npaci-rocks-discussion Digest
> >
> >
> > DISCLAIMER:
> > This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its contents to any
> other person as it may be an offence under the Official Secrets Act.
> Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel   : 65 6827 3953
Fax    : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com


From DGURGUL at PARTNERS.ORG Wed Dec 3 07:24:29 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Wed, 3 Dec 2003 10:24:29 -0500
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included
      in Rocks 3 for Itanium?
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>

Where do we find the SGE roll? Under Lhoste at http://rocks.npaci.edu/Rocks/
there is a "Grid" roll listed. Is SGE in that? The user guide doesn't mention
SGE.

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Laurence Liew
Sent: Tuesday, December 02, 2003 10:10 PM
To: Nai Hong Hwa Francis
Cc: npaci-rocks-discussion at sdsc.edu
Subject: RE: [Rocks-Discuss]RE: When will Sun Grid Engine be included
inRocks 3 for Itanium?


Hi,

SGE is in the SGE roll.

You need to download the base, hpc and sge roll.

The install is now different from V2.3.x

Cheers!
laurence



On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote:
> Hi Laurence,
>
> I just downloaded the Rocks3.0 for IA32 and installed it but SGE is
> still not working.
>
> Any idea?
>
> Nai Hong Hwa Francis
> Institute of Molecular and Cell Biology (A*STAR)
> 30 Medical Drive
> Singapore 117609.
> DID: (65) 6874-6196
>
> -----Original Message-----
> From: Laurence Liew [mailto:laurence at scalablesys.com]
> Sent: Thursday, November 20, 2003 2:53 PM
>   To: Nai Hong Hwa Francis
>   Cc: npaci-rocks-discussion at sdsc.edu
>   Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included
>   inRocks 3 for Itanium?
>
>   Hi Francis
>
>   GridEngine roll is ready for ia32. We will get a ia64 native version
>   ready as soon as we get back from SC2003. It will be released in a few
>   weeks time.
>
>   Globus GT2.4 is included in the Grid Roll
>
>   Cheers!
>   Laurence
>
>
>   On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote:
>   >
>   > Hi,
>   >
>   > Does anyone have any idea when will Sun Grid Engine be included as
>   part
>   > of Rocks 3 distribution.
>   >
>   > I am a newbie to Grid Computing.
>   > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid?
>   >
>   > Regards
>   >
>   > Nai Hong Hwa Francis
>   >
>   > Institute of Molecular and Cell Biology (A*STAR)
>   > 30 Medical Drive
>   > Singapore 117609
>   > DID: 65-6874-6196
>   >
>   > -----Original Message-----
>   > From: npaci-rocks-discussion-request at sdsc.edu
>   > [mailto:npaci-rocks-discussion-request at sdsc.edu]
>   > Sent: Thursday, November 20, 2003 4:01 AM
>   > To: npaci-rocks-discussion at sdsc.edu
>   > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs
>   >
>   > Send npaci-rocks-discussion mailing list submissions to
>   >   npaci-rocks-discussion at sdsc.edu
>   >
>   > To subscribe or unsubscribe via the World Wide Web, visit
>   >
>   > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>   > or, via email, send a message with subject or body 'help' to
>   >   npaci-rocks-discussion-request at sdsc.edu
>   >
>   > You can reach the person managing the list at
>   >   npaci-rocks-discussion-admin at sdsc.edu
>   >
>   > When replying, please edit your Subject line so it is more specific
>   > than "Re: Contents of npaci-rocks-discussion digest..."
>   >
>   >
>   > Today's Topics:
>   >
>   >     1. top500 cluster installation movie (Greg Bruno)
>   >     2. Re: Running Normal Application on Rocks Cluster -
>   >         Newbie Question (Laurence Liew)
>   >
>   > --__--__--
>   >
>   > Message: 1
>   > To: npaci-rocks-discussion at sdsc.edu
>   > From: Greg Bruno <bruno at rocksclusters.org>
>   > Date: Tue, 18 Nov 2003 13:41:15 -0800
>   > Subject: [Rocks-Discuss]top500 cluster installation movie
>   >
>   > here's a crew of 7, installing the 201st fastest supercomputer in the
>   > world in under two hours on the showroom floor at SC 03:
>   >
>   > http://www.rocksclusters.org/rocks.mov
>   >
>   > warning: the above file is ~65MB.
>   >
>   >    - gb
>   >
>   >
>   > --__--__--
>   >
>   > Message: 2
>   > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks
>   Cluster
>   > -
>   >   Newbie Question
>   > From: Laurence Liew <laurenceliew at yahoo.com.sg>
>   > To: Leong Chee Shian <chee-shian.leong at schenker.com>
>   > Cc: npaci-rocks-discussion at sdsc.edu
>   > Date: Wed, 19 Nov 2003 12:31:18 +0800
>   >
>   > Chee Shian,
>   >
>   > Thanks for your call. We will take this off list and visit you next
>   week
>   > in your office as you requested.
>   >
>   > Cheers!
>   > laurence
>   >
>   >
>   >
>   > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote:
>   > > I have just installed Rocks 3.0 with one frontend and two compute
>   > > node.
>   > >
>   > > A normal file based application is installed on the frontend and is
>   > > NFS shared to the compute nodes .
>   > >
>   > > Question is : When run 5 sessions of my applications , the CPU
>   > > utilization is all concentrated on the frontend node , nothing is
>   > > being passed on to the compute nodes . How do I make these 3
>   computers
> > > to function as one and share the load ?
> > >
> > > Thanks everyone as I am really new to this clustering stuff..
> > >
> > > PS : The idea of exploring rocks cluster is to use a few inexpensive
> > > intel machines to replace our existing multi CPU sun server,
> > > suggestions and recommendations are greatly appreciated.
> > >
> > >
> > > Leong
> > >
> > >
> > >
> >
> >
> >
> > --__--__--
> >
> > _______________________________________________
> > npaci-rocks-discussion mailing list
> > npaci-rocks-discussion at sdsc.edu
> > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> >
> >
> > End of npaci-rocks-discussion Digest
> >
> >
> > DISCLAIMER:
> > This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its contents to any
> other person as it may be an offence under the Official Secrets Act.
> Thank you.
--
Laurence Liew
CTO, Scalable Systems Pte Ltd
7 Bedok South Road
Singapore 469272
Tel   : 65 6827 3953
Fax    : 65 6827 3922
Mobile: 65 9029 4312
Email : laurence at scalablesys.com
http://www.scalablesys.com


From bruno at rocksclusters.org Wed Dec 3 07:32:14 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 07:32:14 -0800
Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included in Rocks 3 for
	Itanium?
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu>
Message-ID: <DF132702-25A5-11D8-86E6-000A95C4E3B4@rocksclusters.org>

>   Where do we find the SGE roll?   Under Lhoste at
>   http://rocks.npaci.edu/Rocks/
>   there is a "Grid" roll listed.   Is SGE in that?   The userguide doesn't
>   mention
>   SGE.
the SGE roll will be available in the upcoming v3.1.0 release.
scheduled release date is december 15th.

  - gb



From jlkaiser at fnal.gov Wed Dec 3 08:35:18 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Wed, 03 Dec 2003 10:35:18 -0600
Subject: [Rocks-Discuss]supermicro based MB's
In-Reply-To: <3FCC824B.5060406@scalableinformatics.com>
References: <3FCC824B.5060406@scalableinformatics.com>
Message-ID: <1070469318.12324.13.camel@nietzsche.fnal.gov>

Hi,

You don't say what version of Rocks you are using. The following is for
the X5DPA-GG board and Rocks 3.0. It requires modifying only the
pcitable in the boot image on the tftp server. I believe the procedure
for 2.3.2 requires a heck of a lot more work (but it may not); I would
have to dig deep for my notes about changing 2.3.2.

This should be done on the frontend:

cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable

Search for the e1000 drivers and add the following line:

0x8086	0x1013	"e1000"	"Intel Corp.|82546EB Gigabit Ethernet Controller"

Write the file, then repack the image and put it back:

cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/

Then boot the node.
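The manual pcitable edit above can also be scripted. A sketch (the device
IDs come from the post; `add_e1000_id` is a made-up helper name):

```shell
# Append the 82546EB entry to a pcitable unless it is already present,
# so the edit is safe to run more than once.
add_e1000_id() {
    f="$1"
    if ! grep -q '0x1013' "$f"; then
        printf '0x8086\t0x1013\t"e1000"\t"Intel Corp.|82546EB Gigabit Ethernet Controller"\n' >> "$f"
    fi
}
```

Run it against /mnt/loop/modules/pcitable after mounting the image, in
place of the vi step.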

Hope this helps.

Thanks,

Joe

On Tue, 2003-12-02 at 06:15, Joe Landman wrote:
> Folks:
>
>   Working on integrating a Supermicro MB based cluster. Discovered early
> on that all of the compute nodes have an Intel based NIC that RedHat
> doesn't know anything about (any version of RH). Some of the
> administrative nodes have other similar issues. I am seeing simply a
> surprising number of mis/un detected hardware across the collection of MBs.
>
>   Anyone have advice on where to get modules/module source for Redhat
> for these things? It looks like I will need to rebuild the boot CD,
> though the several times I have tried this previously have failed to
> produce a working/bootable system. It looks like new modules need to be
> created/inserted into the boot process (head node and cluster nodes)
> kernels, as well as into the installable kernels.
>
>     Has anyone done this for a Supermicro MB based system?  Thanks .
>
> Joe
--
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab
CD/OSS-SCS                Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================



From jghobrial at uh.edu Wed Dec 3 08:59:15 2003
From: jghobrial at uh.edu (Joseph)
Date: Wed, 3 Dec 2003 10:59:15 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
 <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
<Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
 <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
Message-ID: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>

Here is the error I receive when I remove the file encoder.pyc and run the
command cluster-fork

Traceback (innermost last):
  File "/opt/rocks/sbin/cluster-fork", line 88, in ?
    import rocks.pssh
  File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
    import gmon.encoder
ImportError: No module named encoder

Thanks,
Joseph


On Tue, 2 Dec 2003, Mason J. Katz wrote:

> Python creates the .pyc files for you, and does not remove the original
>   .py file. I would be extremely surprised if two "identical" .pyc files
>   had the same md5 checksum. I'd expect this to be more like C .o files,
>   which always contain random data to pad out to the end of a page and
>   32/64 bit word sizes. Still this is just a guess, the real point is
>   you can always remove the .pyc files and the .py will regenerate it
>   when imported (although standard UNIX file/dir permission still apply).
>
>   What is the import error you get from cluster-fork?
>
>      -mjk
>
>   On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>
>   > Joseph wrote:
>   >
>   >> Indeed my md5sum is different for encoder.pyc. However, when I pulled
>   >> the file and run "cluster-fork" python responds about an import
>   >> problem. So it seems that regeneration did not occur. Is there a flag
>   >> I need to pass?
>   >>
>   >> I have also tried to figure out what package provides encoder and
>   >> reinstall the package, but an rpm query reveals nothing.
>   >>
>   >> If this is a generated file, what generates it?
>   >>
>   >> It seems that an rpm file query on ganglia show that files in the
>   >> directory belong to the package, but encoder.pyc does not.
>   >>
>   >> Thanks,
>   >> Joseph
>   >>
>   >>
>   >>
>   > I have finally found the python sources in the HPC rolls CD, filename
>   > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>   > seems python "compiles" the .py files to ".pyc" and then deletes the
>   > source file the first time they are referenced? I also noticed that
>   > there are two versions of python installed. Maybe the pyc files from
>   > one version won't load into the other one?
>   >
>   > Angel
>   >
>   >
>
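For reference, "Bad magic number" means the .pyc was produced by a
different interpreter version than the one importing it: every .pyc
begins with a version-specific magic number. A sketch with a modern
interpreter (the 1.5-era header layout differed in detail, but the
magic-number check is the same idea):

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "encoder.py")
    with open(src, "w") as f:
        f.write("VALUE = 42\n")
    # Compile the source to bytecode, then read back the 4-byte magic header.
    pyc = py_compile.compile(src, cfile=os.path.join(d, "encoder.pyc"))
    with open(pyc, "rb") as f:
        magic = f.read(4)
    # A .pyc whose magic differs from the running interpreter's
    # MAGIC_NUMBER fails to import with "Bad magic number".
    print(magic == importlib.util.MAGIC_NUMBER)
```

This is why removing the stale .pyc (and letting the matching .py
regenerate it) or reinstalling the right package fixes the error.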


From mjk at sdsc.edu Wed Dec 3 15:19:38 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 3 Dec 2003 15:19:38 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
<Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Message-ID: <2A332131-25E7-11D8-A641-000A95DA5638@sdsc.edu>

This file comes from a ganglia package. What does

# rpm -q ganglia-receptor

return?

     -mjk


On Dec 3, 2003, at 8:59 AM, Joseph wrote:

> Here is the error I receive when I remove the file encoder.pyc and run
> the
> command cluster-fork
>
> Traceback (innermost last):
>    File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>      import rocks.pssh
>    File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>      import gmon.encoder
> ImportError: No module named encoder
>
> Thanks,
> Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original
>> .py file. I would be extremely surprised it two "identical" .pyc
>> files
>> had the same md5 checksum. I'd expect this to be more like C .o file
>> which always contain random data to pad out to the end of a page and
>> 32/64 bit word sizes. Still this is just a guess, the real point is
>> you can always remove the .pyc files and the .py will regenerate it
>> when imported (although standard UNIX file/dir permission still
>> apply).
>>
>> What is the import error you get from cluster-fork?
>>
>>     -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled
>>>> the file and run "cluster-fork" python responds about an import
>>>> problem. So it seems that regeneration did not occur. Is there a
>>>> flag
>>>> I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia show that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel
>>>
>>>
>>



From csamuel at vpac.org Wed Dec 3 18:09:26 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Thu, 4 Dec 2003 13:09:26 +1100
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL
trademark removal ?
Message-ID: <200312041309.27986.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

Can someone confirm that the next Rocks release will support Opteron, please?

Also, I noticed that the current Itanium Rocks release, which is based on
RHEL, still has a lot of mentions of Red Hat in it, which from my reading
of their trademark guidelines is not permitted. Is that fixed in the new
version?

cheers!
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/zpdWO2KABBYQAh8RAqB8AJ9FG+IjIeem21qlFS6XYIHamIMPmwCghVTV
AgjAlVHWgdv/KzYQinHGPxs=
=IAWU
-----END PGP SIGNATURE-----



From bruno at rocksclusters.org Wed Dec 3 18:46:30 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 3 Dec 2003 18:46:30 -0800
Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL
trademark removal ?
In-Reply-To: <200312041309.27986.csamuel@vpac.org>
References: <200312041309.27986.csamuel@vpac.org>
Message-ID: <10AD9827-2604-11D8-86E6-000A95C4E3B4@rocksclusters.org>

> Can someone confirm that the next Rocks release will support Opteron
> please ?

yes, it will support opteron.

>   Also, I noticed that the current Rocks release on Itanium based on
>   RHEL still
>   has a lot of mentions of RedHat in it, which from my reading of their
>   trademark guidelines is not permitted, is that fixed in the new
>   version ?

and yes, (even though it doesn't feel like the right thing to do, as
redhat has offered to the community some outstanding technologies that
we'd like to credit), all redhat trademarks will be removed from 3.1.0.

    - gb



From fds at sdsc.edu Thu Dec 4 06:46:32 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Thu, 4 Dec 2003 06:46:32 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-
A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
<3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
<Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Message-ID: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>

Please install the
http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1-2.i386.rpm
package, which includes the correct encoder.py file. (This package is
listed on the 3.0.0 errata page.)

-Federico

On Dec 3, 2003, at 8:59 AM, Joseph wrote:

>   Here is the error I receive when I remove the file encoder.pyc and run
>   the
>   command cluster-fork
>
>   Traceback (innermost last):
>     File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>       import rocks.pssh
>     File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>       import gmon.encoder
>   ImportError: No module named encoder
>
>   Thanks,
>   Joseph
>
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original
>> .py file. I would be extremely surprised it two "identical" .pyc
>> files
>> had the same md5 checksum. I'd expect this to be more like C .o file
>> which always contain random data to pad out to the end of a page and
>> 32/64 bit word sizes. Still this is just a guess, the real point is
>> you can always remove the .pyc files and the .py will regenerate it
>> when imported (although standard UNIX file/dir permission still
>> apply).
>>
>> What is the import error you get from cluster-fork?
>>
>>    -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled
>>>> the file and run "cluster-fork" python responds about an import
>>>> problem. So it seems that regeneration did not occur. Is there a
>>>> flag
>>>> I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia show that files in the
>>>> directory belong to the package, but encoder.pyc does not.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>>
>>>>
>>> I have finally found the python sources in the HPC rolls CD, filename
>>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>>> seems python "compiles" the .py files to ".pyc" and then deletes the
>>> source file the first time they are referenced? I also noticed that
>>> there are two versions of python installed. Maybe the pyc files from
>>> one version won't load into the other one?
>>>
>>> Angel
>>>
>>>
>>
>>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA
From jghobrial at uh.edu Thu Dec 4 07:14:21 2003
From: jghobrial at uh.edu (Joseph)
Date: Thu, 4 Dec 2003 09:14:21 -0600 (CST)
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>
References: <3FCB879E.8050905@miami.edu>
<Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu>
 <1B15A45F-2457-11D8-A374-00039389B580@uci.edu>
<Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu>
 <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu>
 <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
 <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu>
Message-ID: <Pine.LNX.4.56.0312040913110.13972@mail.tlc2.uh.edu>

Thank you very much this solved the problem.

Joseph


On Thu, 4 Dec 2003, Federico Sacerdoti wrote:

>   Please install the
>   http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1
>   -2.i386.rpm package, which includes the correct encoder.py file. (This
>   package is listed on the 3.0.0 errata page)
>
>   -Federico
>
>   On Dec 3, 2003, at 8:59 AM, Joseph wrote:
>
>   > Here is the error I receive when I remove the file encoder.pyc and run
>   > the
>   > command cluster-fork
>   >
>   > Traceback (innermost last):
>   >   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>   >     import rocks.pssh
>   >   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>   >     import gmon.encoder
>   > ImportError: No module named encoder
>   >
>   > Thanks,
>   > Joseph
>   >
>   >
>   > On Tue, 2 Dec 2003, Mason J. Katz wrote:
>   >
>   >> Python creates the .pyc files for you, and does not remove the
>   >> original
>   >> .py file. I would be extremely surprised it two "identical" .pyc
>   >> files
>   >> had the same md5 checksum. I'd expect this to be more like C .o file
>   >> which always contain random data to pad out to the end of a page and
>   >> 32/64 bit word sizes. Still this is just a guess, the real point is
>   >> you can always remove the .pyc files and the .py will regenerate it
>   >> when imported (although standard UNIX file/dir permission still
>   >> apply).
>   >>
>   >> What is the import error you get from cluster-fork?
>   >>
>   >> -mjk
>   >>
>   >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>   >>
>   >>> Joseph wrote:
>   >>>
>   >>>> Indeed my md5sum is different for encoder.pyc. However, when I
>   >>>> pulled
>   >>>> the file and run "cluster-fork" python responds about an import
>   >>>> problem. So it seems that regeneration did not occur. Is there a
>   >>>> flag
>   >>>> I need to pass?
>   >>>>
>   >>>> I have also tried to figure out what package provides encoder and
>   >>>> reinstall the package, but an rpm query reveals nothing.
>   >>>>
>   >>>> If this is a generated file, what generates it?
>   >>>>
>   >>>> It seems that an rpm file query on ganglia show that files in the
>   >>>> directory belong to the package, but encoder.pyc does not.
>   >>>>
>   >>>> Thanks,
>   >>>> Joseph
>   >>>>
>   >>>>
>   >>>>
>   >>> I have finally found the python sources in the HPC rolls CD, filename
>   >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it
>   >>> seems python "compiles" the .py files to ".pyc" and then deletes the
>   >>> source file the first time they are referenced? I also noticed that
>   >>> there are two versions of python installed. Maybe the pyc files from
>   >>> one version won't load into the other one?
>   >>>
>   >>> Angel
>   >>>
>   >>>
>   >>
>   >>
>   Federico
>
>   Rocks Cluster Group, San Diego Supercomputing Center, CA
>


From vrowley at ucsd.edu Thu Dec 4 12:29:55 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Thu, 04 Dec 2003 12:29:55 -0800
Subject: [Rocks-Discuss]Re: PXE boot problems
In-Reply-To: <3FCBC037.5000302@ucsd.edu>
References: <3FCBC037.5000302@ucsd.edu>
Message-ID: <3FCF9943.1020806@ucsd.edu>

Uh, nevermind. We had upgraded syslinux on our frontend, not the node
we were trying to PXE boot. Sigh.

V. Rowley wrote:
> We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to
> install a compute node via PXE. We are getting an error similar to the
> one mentioned in the archives, e.g.
>
>> Loading initrd.img....
>> Ready
>>
>> Failed to free base memory
>>
>
> We have upgraded to syslinux-2.07-1, per the suggestion in the archives,
> but continue to get the same error. Any ideas?
>

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb



From cdwan at mail.ahc.umn.edu Fri Dec 5 08:16:07 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Fri, 5 Dec 2003 10:16:07 -0600 (CST)
Subject: [Rocks-Discuss]Private NIS master
Message-ID: <Pine.GSO.4.58.0312042305070.18193@lenti.med.umn.edu>

Hello all. Long time listener, first time caller.    Thanks for all the
great work.

I'm integrating a Rocks cluster into an existing NIS domain. I noticed
that while the cluster database now supports a PrivateNISMaster, that
variable doesn't make it into the /etc/yp.conf on the compute nodes. They
remain broadcast.

Assume that, for whatever reason, I don't want to set up a repeater
(slave) ypserv process on my frontend.    I added the option "--nisserver
<var name="Kickstart_PrivateNISMaster"/>" to the
"profiles/3.0.0/nodes/nis-client.xml" file, removed the ypserver on my
frontend, and it works like I want it to.

Am I missing anything fundamental here?

-Chris Dwan
 University of Minnesota
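A sketch of the change described above (only the --nisserver flag and the
var name come from the post; the surrounding authconfig arguments and
markup are assumptions about the profile's layout):

```xml
<!-- profiles/3.0.0/nodes/nis-client.xml (sketch; surrounding markup
     assumed). Pinning the server stops the nodes from broadcasting. -->
authconfig --enablenis \
    --nisserver <var name="Kickstart_PrivateNISMaster"/>
```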


From wyzhong78 at msn.com Mon Dec 8 06:18:34 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 22:18:34 +0800
Subject: [Rocks-Discuss]3.0.0 problem: not able to boot up
Message-ID: <BAY3-F14uFqD45TpNO40002c14c@hotmail.com>

Hi, everyone!
I installed Rocks 3.0.0 with the defaults, and there wasn't any trouble
during the install. But I haven't been able to boot: it stops at the
beginning, the message "GRUB" shows on the screen, and it waits....
   My hardware is dual Xeon 2.4 GHz, an MSI 9138 board, and a Seagate SCSI disk.
   Any help is appreciated!




From angelini at vki.ac.be Mon Dec 8 06:20:45 2003
From: angelini at vki.ac.be (Angelini Giuseppe)
Date: Mon, 08 Dec 2003 15:20:45 +0100
Subject: [Rocks-Discuss]How to use MPICH with ssh
Message-ID: <3FD488BD.3EBBDB8D@vki.ac.be>

Dear rocks folk,


I have recently installed MPICH with Lahey Fortran, and now that I can
compile and link, I would like to run. But it seems I have another
problem; I get the following error message when I try to run:

[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
$DPT/hybflow
p0_13226: p4_error: Path to program is invalid while starting
/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
-1
    p4_error: latest msg from perror: No such file or directory
p0_13226: p4_error: Child process exited while making connection to
remote process on compute-0-6: 0
p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32

I am wondering why it is looking for /usr/bin/rsh for the communication,

I expected to use ssh and not rsh.

Any help will be welcome.


Regards.


Giuseppe Angelini



From casuj at cray.com Mon Dec 8 07:31:21 2003
From: casuj at cray.com (John Casu)
Date: Mon, 8 Dec 2003 07:31:21 -0800
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>; from Angelini Giuseppe on Mon, Dec 08,
2003 at 03:20:45PM +0100
References: <3FD488BD.3EBBDB8D@vki.ac.be>
Message-ID: <20031208073121.A10151@stemp3.wc.cray.com>
On Mon, Dec 08, 2003 at 03:20:45PM +0100, Angelini Giuseppe wrote:
>
> Dear rocks folk,
>
>
> I have recently installed mpich with Lahay Fortran and now that I can
> compile and link,
> I would like to run but it seems that I have another problem. In fact I
> have the following
> error message when I try to run:
>
> [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
> $DPT/hybflow
> p0_13226: p4_error: Path to program is invalid while starting
> /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
> -1
>     p4_error: latest msg from perror: No such file or directory
> p0_13226: p4_error: Child process exited while making connection to
> remote process on compute-0-6: 0
> p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
> p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
> I am wondering why it is looking for /usr/bin/rsh for the communication,
>
> I expected to use ssh and not rsh.
>
> Any help will be welcome.
>


build mpich thus:

RSHCOMMAND=ssh ./configure .....


>
> Regards.
>
>
> Giuseppe Angelini

--
"Roses are red, Violets are blue,
 You lookin' at me ?
 YOU LOOKIN' AT ME ?!"    -- Get Fuzzy.
=======================================================================
John Casu
Cray Inc.                                           casuj at cray.com
411 First Avenue South, Suite 600                   Tel: (206) 701-2173
Seattle, WA 98104-2860                              Fax: (206) 701-2500
=======================================================================
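If rebuilding is not convenient, the ch_p4 device in MPICH 1.2.x also
honors a run-time override; a sketch (the build-time line is what John
describes above, the prefix path is an assumption):

```shell
# Run-time: point an already-built ch_p4 MPICH at ssh instead of rsh.
export P4_RSHCOMMAND=ssh
# Build-time alternative (needs the MPICH source tree, so commented out):
#   RSHCOMMAND=ssh ./configure --prefix=/opt/mpich/gnu && make
```

With the variable exported, mpirun's started processes use ssh to reach
the other nodes.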


From davidow at molbio.mgh.harvard.edu Mon Dec 8 08:12:53 2003
From: davidow at molbio.mgh.harvard.edu (Lance Davidow)
Date: Mon, 8 Dec 2003 11:12:53 -0500
Subject: [Rocks-Discuss]How to use MPICH with ssh
In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>
References: <3FD488BD.3EBBDB8D@vki.ac.be>
Message-ID: <p06002001bbfa51fea005@[132.183.190.222]>

Giuseppe,

Here's an answer from a newbie who just faced the same problem.

You are using the wrong flavor of mpich (and mpirun). There are
several different distributions, which work differently in Rocks. The
one you are using in the default path expects serv_p4 daemons and
.rhosts files in your home directory. The different flavors may also be
more compatible with different compilers.

[lance at rescluster2 lance]$ which   mpirun
/opt/mpich-mpd/gnu/bin/mpirun

the one you probably want is
/opt/mpich/gnu/bin/mpirun

[lance at rescluster2 lance]$ locate mpirun
...
/opt/mpich-mpd/gnu/bin/mpirun
...
/opt/mpich/myrinet/gnu/bin/mpirun
...
/opt/mpich/gnu/bin/mpirun

Cheers,
Lance


At 3:20 PM +0100 12/8/03, Angelini Giuseppe wrote:
>Dear rocks folk,
>
>
>I have recently installed mpich with Lahay Fortran and now that I can
>compile and link,
>I would like to run but it seems that I have another problem. In fact I
>have the following
>error message when I try to run:
>
>[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE
>$DPT/hybflow
>p0_13226: p4_error: Path to program is invalid while starting
>/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7:
>-1
>     p4_error: latest msg from perror: No such file or directory
>p0_13226: p4_error: Child process exited while making connection to
>remote process on compute-0-6: 0
>p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32
>p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32
>
>I am wondering why it is looking for /usr/bin/rsh for the communication,
>
>I expected to use ssh and not rsh.
>
>Any help will be welcome.
>
>
>Regards.
>
>Giuseppe Angelini


--
Lance Davidow, PhD
Director of Bioinformatics
Dept of Molecular Biology
Mass General Hospital
Boston MA 02114
davidow at molbio.mgh.harvard.edu
617.726-5955
Fax: 617.726-6893


From rscarce at caci.com Fri Dec 5 16:43:00 2003
From: rscarce at caci.com (Reed Scarce)
Date: Fri, 5 Dec 2003 19:43:00 -0500
Subject: [Rocks-Discuss]PXE and system images
Message-ID: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>

We want to initialize new hardware with a known good image from identical
hardware currently in use. The process imagined would be to PXE boot to a
disk image server, PXE would create a RAM system that would request the
system disk image from the server, which would push the desired system
disk image to the requesting system. Upon completion the system would be
available as a cluster member.

The lab configuration is a PC grade frontend with two 3Com 905s and a
single server grade cluster node with integrated Intel 82551 (10/100)(the
only PXE interface) and two integrated Intel 82546 (10/100/1000). The
cluster node is one of the stock of nodes for the expansion. The stock of
nodes have a Linux OS pre-installed, which would be eliminated in the
process.

Currently the node will PXE boot from the 10/100 and pickup an
installation boot from one of the g-bit interfaces. From there kickstart
wants to take over.

Any recommendations how to get kickstart to push an image to the disk?

Thanks,

Reed Scarce

From wyzhong78 at msn.com Mon Dec 8 05:36:37 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 08 Dec 2003 21:36:37 +0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
Message-ID: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>

Hi, everyone!
I have installed Rocks 3.0.0 with the default options successfully; there
wasn't any trouble. But when I boot it up, it stops at the beginning, just
showing "GRUB" on the screen and waiting...
Thanks for your help!




From daniel.kidger at quadrics.com Mon Dec 8 09:54:53 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 8 Dec 2003 17:54:53 -0000
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>

Dear all,
    Previously I have been installing a custom kernel on the compute nodes
with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).

However I am now trying to do it the 'proper' way. So I do (on the frontend):
# cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm \
    /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
# cd /home/install
# rocks-dist dist
# SSH_NO_PASSWD=1 shoot-node compute-0-0

Hence:
# find /home/install/ |xargs -l grep -nH qsnet
shows me that hdlist and hdlist2 now contain this RPM (and indeed if I
duplicate my rpm in that directory, rocks-dist notices this and warns me).

However the node always ends up with "2.4.20-20.7smp" again.
anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing
kernel-smp-2.4.20-20.7."

So my question is:
   It looks like my RPM has a name that Rocks doesn't understand properly.
   What is wrong with my name, and what are the rules for getting the
   correct name? (.i686.rpm is of course correct, but I don't have -smp.
   in the name. Is this the problem?)

cf. Greg Bruno's wisdom:
  https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html


Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------



From DGURGUL at PARTNERS.ORG Mon Dec 8 11:09:27 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:09:27 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15840@phsexch7.mgh.harvard.edu>

I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm" and
then "cluster-fork service gschedule restart" (not sure I had to do the last).
I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one who
ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10 (for
the user on 0-17):

17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00, 0.03
17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s -bash

10:   1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04, 0.07
10: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
10: dennis   pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s -bash

When I do "cluster-fork w" (without the --mpd) the users show up on the correct
nodes.

Do the numbers on the left of the -mpd output correspond to the node names?

Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169



From DGURGUL at PARTNERS.ORG Mon Dec 8 11:28:30 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:28:30 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>

Maybe this is a better description of the "strangeness".

I did "cluster-fork --mpd hostname":

1:   compute-0-0.local
2:   compute-0-1.local
3:   compute-0-3.local
4:   compute-0-13.local
5:   compute-0-11.local
6:   compute-0-15.local
7:   compute-0-16.local
8:   compute-0-19.local
9:   compute-0-21.local
10: compute-0-17.local
11: compute-0-5.local
12: compute-0-20.local
13: compute-0-18.local
14: compute-0-12.local
15: compute-0-9.local
16: compute-0-4.local
17: compute-0-8.local
18: compute-0-14.local
19: compute-0-2.local
20: compute-0-6.local
0: compute-0-7.local
21: compute-0-10.local

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
Dennis J.
Sent: Monday, December 08, 2003 2:09 PM
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness


I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
and
then "cluster-fork service gschedule restart" (not sure I had to do the
last).
I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one
who
ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10
(for
the user on 0-17):

17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00, 0.03
17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s -bash

10:   1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04, 0.07
10: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU WHAT
10: dennis   pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s -bash

When I do "cluster-fork w" (without the --mpd) the users show up on the
correct
nodes.

Do the numbers on the left of the -mpd output correspond to the node names?
Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


From tim.carlson at pnl.gov Mon Dec 8 12:35:16 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:35:16 -0800 (PST)
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To:
 <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>
Message-ID: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>

On Fri, 5 Dec 2003, Reed Scarce wrote:

>   We want to initialize new hardware with a known good image from identical
>   hardware currently in use. The process imagined would be to PXE boot to a
>   disk image server, PXE would create a RAM system that would request the
>   system disk image from the server, which would push the desired system
>   disk image to the requesting system. Upon completion the system would be
>   available as a cluster member.
>
>   The lab configuration is a PC grade frontend with two 3Com 905s and a
>   single server grade cluster node with integrated Intel 82551 (10/100)(the
>   only PXE interface) and two integrated Intel 82546 (10/100/1000). The
>   cluster node is one of the stock of nodes for the expansion. The stock of
>   nodes have a Linux OS pre-installed, which would be eliminated in the
>   process.
>
>   Currently the node will PXE boot from the 10/100 and pickup an
>   installation boot from one of the g-bit interfaces. From there kickstart
>   wants to take over.
>
>   Any recommendations how to get kickstart to push an image to the disk?

This sounds like you want to use Oscar instead of ROCKS.

http://oscar.openclustergroup.org/tiki-index.php

I'm not exactly sure why you think that the kickstart process won't give
you exactly the same image on every machine. If the hardware is the same,
you'll get the same image on each machine.

We have boxes with the same setup, 10/100 PXE, and then dual gigabit. Our
method for installing ROCKS on this type of hardware is the following

1) Run insert-ethers and choose the "manager" type of node.
2) Connect all the PXE interfaces to the switch and boot them all. Do not
   connect the gigabit interfaces.
3) Once all of the nodes have PXE booted, exit insert-ethers. Start
   insert-ethers again and this time choose the compute node type.
4) Hook up the gigabit interface and the PXE interface to your nodes. All
   of your machines will now install.
5) In our case, we now quickly disconnect the PXE interface because we
   don't want the machines to continually reinstall. The real ROCKS
   method would have you choose (HD/net) for booting in the BIOS, but if
   you already have an OS on your machine, you would have to go into the
   BIOS twice before the compute nodes were installed. We disable
   rocks-grub and just connect up the PXE cable if we need to reinstall.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From tim.carlson at pnl.gov Mon Dec 8 12:42:23 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:42:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <Pine.LNX.4.44.0312081238270.19031-100000@scorpion.emsl.pnl.gov>

On Mon, 8 Dec 2003 daniel.kidger at quadrics.com wrote:

I've gotten confused from time to time as to where to place custom RPMS
(it's changed between releases), so my not-so-clean method is to just rip
out the kernels in /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS
and drop my own in. Then do a

cd /home/install
rocks-dist dist
shoot-node

You are probably running into an issue where the "force" directory is more
of an "in addition to" directory and your 2.4.18 kernel is being noted,
but ignored since the 2.4.20 kernel is newer. I assume your nodes get both
an SMP and a UP version of 2.4.20 and that your custom 2.4.18 is nowhere to
be found on the compute node.
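The "newer kernel wins" behavior above can be sketched with a plain version
sort. This is only an illustration: "sort -V" stands in for RPM's actual
version-comparison logic, and the helper name is made up.

```shell
# Pick the newest of several kernel version strings. "sort -V" is a
# stand-in for the installer's real version comparison (an assumption,
# not anaconda's actual code).
newest_kernel() {
  printf '%s\n' "$@" | sort -V | tail -n 1
}

# The custom 2.4.18 kernel loses to the stock 2.4.20 kernel:
newest_kernel 2.4.18-27.3.10qsnet 2.4.20-20.7
```

This is why merely adding the 2.4.18 RPM to the distribution is not enough;
the stock 2.4.20 kernel still sorts newer and gets installed.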

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

>       Previously I have been installing a custom kernel on the compute nodes
>   with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf).
>
>   However I am now trying to do it the 'proper' way. So I do (on :
>   # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm 
>     /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
>   # cd /home/install
>   # rocks-dist dist
>   # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
>   Hence:
>   # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed if I
> duplicate my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing
> kernel-smp-2.4.20-20.7."
>
> So my question is:
>    It looks like my RPM has a name that Rocks doesn't understand properly.
>    What is wrong with my name?
>    and what are the rules for getting the correct name?
>      (.i686.rpm is of course correct, but I don't have -smp. in the name.
> Is this the problem?)
>
> cf. Greg Bruno's wisdom:
>   https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
>
>
> Yours,
> Daniel.



From fds at sdsc.edu Mon Dec 8 12:51:12 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 12:51:12 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
Message-ID: <423D0494-29C0-11D8-804D-000393A4725A@sdsc.edu>

You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found
out.

So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.
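A minimal sketch of that labeling idea, for discussion: wrap the command and
prefix every output line with the node's name. The wrapper name and approach
here are illustrative, not part of cluster-fork or MPD.

```shell
# Run a command and prefix each line of its output with this node's
# name (via uname -n), so interleaved multi-node output stays
# attributable regardless of MPD's numbering.
label_output() {
  host=$(uname -n)
  "$@" 2>&1 | sed "s/^/${host}: /"
}

label_output echo hello
```

Each node would run its command through such a wrapper, making the MPD rank
numbers on the left irrelevant for identifying the source host.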

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

>   Maybe this is a better description of the "strangeness".
>
>   I did "cluster-fork --mpd hostname":
>
>   1:   compute-0-0.local
>   2:   compute-0-1.local
>   3:   compute-0-3.local
>   4:   compute-0-13.local
>   5:   compute-0-11.local
>   6:   compute-0-15.local
>   7:   compute-0-16.local
>   8: compute-0-19.local
>   9: compute-0-21.local
>   10: compute-0-17.local
>   11: compute-0-5.local
>   12: compute-0-20.local
>   13: compute-0-18.local
>   14: compute-0-12.local
>   15: compute-0-9.local
>   16: compute-0-4.local
>   17: compute-0-8.local
>   18: compute-0-14.local
>   19: compute-0-2.local
>   20: compute-0-6.local
>   0: compute-0-7.local
>   21: compute-0-10.local
>
>   Dennis J. Gurgul
>   Partners Health Care System
>   Research Management
>   Research Computing Core
>   617.724.3169
>
>
>   -----Original Message-----
>   From: npaci-rocks-discussion-admin at sdsc.edu
>   [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>   Dennis J.
>   Sent: Monday, December 08, 2003 2:09 PM
>   To: npaci-rocks-discussion at sdsc.edu
>   Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
>   I just did "cluster-fork -Uvh
>   /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>   and
>   then "cluster-fork service gschedule restart" (not sure I had to do the
>   last).
>   I also put 3.0.1-2 and restarted gschedule on the frontend.
>
>   Now I run "cluster-fork --mpd w".
>
>   I currently have a user who ssh'd to compute-0-8 from the frontend and
>   one
>   who
>   ssh'd into compute-0-17 from the front end.
>
>   But the return shows the users on lines for 17 (for the user on 0-8)
>   and 10
>   (for
>   the user on 0-17):
>
>   17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
>   0.03
>   17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU
>   WHAT
>   17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
>   -bash
>
>   10:   1:58pm   up 24 days,   3:21,   1 user,   load average: 0.02, 0.04,
> 0.07
> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU
> WHAT
> 10: dennis   pts/0    rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct
> nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From DGURGUL at PARTNERS.ORG Mon Dec 8 12:55:13 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 15:55:13 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>

Thanks.

On a related note, when I did "cluster-fork service gschedule restart" gschedule
started with the "OK" output, but then the fork process hung on each node and I
had to ^c out for it to go on to the next node.

I tried to ssh to a node and then did the gschedule restart. Even then, after I
tried to "exit" out of the node, the session hung and I had to log back in and
kill it from the frontend.


Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169


-----Original Message-----
From: Federico Sacerdoti [mailto:fds at sdsc.edu]
Sent: Monday, December 08, 2003 3:51 PM
To: Gurgul, Dennis J.
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
You are right, and I think this is a shortcoming of MPD. There is no
obvious way to force the MPD numbering to correspond to the order the
nodes were called out on the command line (cluster-fork --mpd actually
makes a shell call to mpirun and it calls out all the node names
explicitly). MPD seems to number the output differently, as you found
out.

So mpd for now may be more useful for jobs that are not sensitive to
this. If enough of you find this shortcoming to be a real annoyance, we
could work on putting the node name label on the output by explicitly
calling "hostname" or similar.

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

>   Maybe this is a better description of the "strangeness".
>
>   I did "cluster-fork --mpd hostname":
>
>   1: compute-0-0.local
>   2: compute-0-1.local
>   3: compute-0-3.local
>   4: compute-0-13.local
>   5: compute-0-11.local
>   6: compute-0-15.local
>   7: compute-0-16.local
>   8: compute-0-19.local
>   9: compute-0-21.local
>   10: compute-0-17.local
>   11: compute-0-5.local
>   12: compute-0-20.local
>   13: compute-0-18.local
>   14: compute-0-12.local
>   15: compute-0-9.local
>   16: compute-0-4.local
>   17: compute-0-8.local
>   18: compute-0-14.local
>   19: compute-0-2.local
>   20: compute-0-6.local
>   0: compute-0-7.local
>   21: compute-0-10.local
>
>   Dennis J. Gurgul
>   Partners Health Care System
>   Research Management
>   Research Computing Core
>   617.724.3169
>
>
>   -----Original Message-----
>   From: npaci-rocks-discussion-admin at sdsc.edu
>   [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>   Dennis J.
>   Sent: Monday, December 08, 2003 2:09 PM
>   To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> I just did "cluster-fork -Uvh
> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and
> then "cluster-fork service gschedule restart" (not sure I had to do the
> last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one
> who
> ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10
> (for
> the user on 0-17):
>
> 17:    1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
> 0.03
> 17: USER      TTY     FROM              LOGIN@   IDLE   JCPU   PCPU
> WHAT
> 17: lance     pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
> -bash
>
> 10:    1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04,
> 0.07
> 10: USER      TTY     FROM              LOGIN@   IDLE   JCPU   PCPU
> WHAT
> 10: dennis    pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct
> nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA


From mjk at sdsc.edu Mon Dec 8 12:58:22 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 8 Dec 2003 12:58:22 -0800
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
Message-ID: <4261C250-29C1-11D8-AECB-000A95DA5638@sdsc.edu>

On Dec 8, 2003, at 12:35 PM, Tim Carlson wrote:

> 5) In our case, we now quickly disconnect the PXE interface because we
>    don't want to have the machine continually install. The real ROCKS
>    method would have you choose (HD/net) for booting in the BIOS, but
> if you already
>    have an OS on your machine, you would have to go into the BIOS twice
>    before the compute nodes were installed. We disable rocks-grub and
> just
>    connect up the PXE cable if we need to reinstall.
>

For most boxes we've seen that support PXE there is an option to hit
<F12> to force a network PXE boot, this allows you to force a PXE even
when a valid OS/Boot block exists on your hard disk. If you don't have
this you do indeed need to go into BIOS twice -- a pain.


       -mjk



From fds at sdsc.edu Mon Dec 8 13:26:46 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 13:26:46 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
Message-ID: <39CC5B05-29C5-11D8-804D-000393A4725A@sdsc.edu>

I've seen this before as well. I believe it has something to do with
the way the colored "[ OK ]" characters interact with the ssh
session from the normal cluster-fork. We have yet to characterize this
bug adequately.

-Federico

On Dec 8, 2003, at 12:55 PM, Gurgul, Dennis J. wrote:

>   Thanks.
>
>   On a related note, when I did "cluster-fork service gschedule restart"
>   gschedule
>   started with the "OK" output, but then the fork process hung on each
>   node and I
>   had to ^c out for it to go on to the next node.
>
>   I tried to ssh to a node and then did the gschedule restart. Even
>   then, after I
>   tried to "exit" out of the node, the session hung and I had to log
>   back in and
>   kill it from the frontend.
>
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
>
> -----Original Message-----
> From: Federico Sacerdoti [mailto:fds at sdsc.edu]
> Sent: Monday, December 08, 2003 3:51 PM
> To: Gurgul, Dennis J.
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> You are right, and I think this is a shortcoming of MPD. There is no
> obvious way to force the MPD numbering to correspond to the order the
> nodes were called out on the command line (cluster-fork --mpd actually
> makes a shell call to mpirun and it calls out all the node names
> explicitly). MPD seems to number the output differently, as you found
> out.
>
> So mpd for now may be more useful for jobs that are not sensitive to
> this. If enough of you find this shortcoming to be a real annoyance, we
> could work on putting the node name label on the output by explicitly
> calling "hostname" or similar.
>
> Good ideas are welcome :)
> -Federico
>
> On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
>
>> Maybe this is a better description of the "strangeness".
>>
>> I did "cluster-fork --mpd hostname":
>>
>> 1: compute-0-0.local
>> 2: compute-0-1.local
>> 3: compute-0-3.local
>> 4: compute-0-13.local
>> 5: compute-0-11.local
>> 6: compute-0-15.local
>> 7: compute-0-16.local
>> 8: compute-0-19.local
>> 9: compute-0-21.local
>> 10: compute-0-17.local
>> 11: compute-0-5.local
>> 12: compute-0-20.local
>> 13: compute-0-18.local
>> 14: compute-0-12.local
>> 15: compute-0-9.local
>> 16: compute-0-4.local
>> 17: compute-0-8.local
>> 18: compute-0-14.local
>> 19: compute-0-2.local
>> 20: compute-0-6.local
>> 0: compute-0-7.local
>>   21: compute-0-10.local
>>
>>   Dennis J. Gurgul
>>   Partners Health Care System
>>   Research Management
>>   Research Computing Core
>>   617.724.3169
>>
>>
>>   -----Original Message-----
>>   From: npaci-rocks-discussion-admin at sdsc.edu
>>   [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>>   Dennis J.
>>   Sent: Monday, December 08, 2003 2:09 PM
>>   To: npaci-rocks-discussion at sdsc.edu
>>   Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>>
>>
>>   I just did "cluster-fork -Uvh
>>   /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>>   and
>>   then "cluster-fork service gschedule restart" (not sure I had to do
>>   the
>>   last).
>>   I also put 3.0.1-2 and restarted gschedule on the frontend.
>>
>>   Now I run "cluster-fork --mpd w".
>>
>>   I currently have a user who ssh'd to compute-0-8 from the frontend and
>>   one
>>   who
>>   ssh'd into compute-0-17 from the front end.
>>
>>   But the return shows the users on lines for 17 (for the user on 0-8)
>>   and 10
>>   (for
>>   the user on 0-17):
>>
>>   17:   1:58pm up 24 days, 3:20, 1 user, load average: 0.00, 0.00,
>>   0.03
>>   17: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU
>>   WHAT
>>   17: lance    pts/0   rescluster2.mgh. 1:31pm 40.00s 0.02s 0.02s
>>   -bash
>>
>>   10:   1:58pm up 24 days, 3:21, 1 user, load average: 0.02, 0.04,
>>   0.07
>>   10: USER     TTY     FROM             LOGIN@   IDLE  JCPU  PCPU
>>   WHAT
>>   10: dennis   pts/0   rescluster2.mgh. 1:57pm 17.00s 0.02s 0.02s
>>   -bash
>>
>>   When I do "cluster-fork w" (without the --mpd) the users show up on
>>   the
>>   correct
>>   nodes.
>>
>>   Do the numbers on the left of the -mpd output correspond to the node
>>   names?
>>
>> Thanks.
>>
>> Dennis
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From bruno at rocksclusters.org Mon Dec 8 15:31:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:31:08 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
References: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
Message-ID: <9979F090-29D6-11D8-9715-000A95C4E3B4@rocksclusters.org>

> I have installed Rocks 3.0.0 with the default options successfully; there
> was no trouble at all. But when I boot it up, it stops at the very
> beginning, just showing "GRUB" on the screen and waiting...

when you built the frontend, did you start with the rocks base CD then
add the HPC roll?

    - gb



From bruno at rocksclusters.org Mon Dec 8 15:37:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:37:46 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <8700A2BE-29D7-11D8-9715-000A95C4E3B4@rocksclusters.org>

>       Previously I have been installing a custom kernel on the compute
>   nodes
>   with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
>   grub.conf).
>
>   However I am now trying to do it the 'proper' way. So I do (on :
>   # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm 
>     /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
>   # cd /home/install
>   # rocks-dist dist
>   # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
>   Hence:
>   # find /home/install/ |xargs -l grep -nH qsnet
>   shows me that hdlist and hdlist2 now contain this RPM. (and indeed If
>   I duplicate my rpm in that directory rocks-dist notices this and warns
>   me.)
>
>   However the node always ends up with "2.4.20-20.7smp" again.
>   anaconda-ks.cfg contains just "kernel-smp" and install.log has
>   "Installing kernel-smp-2.4.20-20.7."
>
>   So my question is:
>      It looks like my RPM has a name that Rocks doesn't understand
>   properly.
>      What is wrong with my name ?
>      and what are the rules for getting the correct name ?
>        (.i686.rpm is of course correct, but I don't have -smp. in the
>   name Is this the problem ?)

the anaconda installer looks for kernel packages with a specific format:

       kernel-<kernel ver>-<redhat ver>.i686.rpm

and for smp nodes:

       kernel-smp-<kernel ver>-<redhat ver>.i686.rpm
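As a quick sanity check, a filename can be tested against that scheme with a
shell glob. This is a rough approximation of the installer's matching for
illustration only, not anaconda's actual code.

```shell
# Return success if an RPM filename looks like a kernel package the
# installer will recognize (per the patterns above); failure otherwise.
is_installer_kernel() {
  case "$1" in
    kernel-smp-*.i686.rpm | kernel-[0-9]*.i686.rpm) return 0 ;;
    *) return 1 ;;
  esac
}

is_installer_kernel kernel-smp-2.4.20-20.7.i686.rpm && echo recognized
is_installer_kernel qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm || echo skipped
```

The qsnet RPM fails the check because its name starts with "qsnet-", not
"kernel-", which is consistent with the installer ignoring it.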

we have made the necessary patches to files under /usr/src/linux-2.4 in
order to produce redhat-compliant kernels. see:

http://www.rocksclusters.org/rocks-documentation/3.0.0/customization-kernel.html

also, would you be interested in making your changes for the quadrics
interconnect available to the general rocks community?

    - gb



From purikk at hotmail.com Mon Dec 8 20:23:35 2003
From: purikk at hotmail.com (purushotham komaravolu)
Date: Mon, 8 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]AMD Opteron
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
Message-ID: <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>

Hello,
            I am a newbie to ROCKS clusters. I want to set up clusters on
32-bit architectures (Intel and AMD) and 64-bit architectures (Intel and
AMD).
I found the 64-bit download for Intel on the website but not for AMD. Does
it work for the AMD Opteron? If not, what is the ETA for AMD-64?
We are planning to buy AMD 64-bit machines shortly, and I would like to
volunteer for beta testing if needed.
Thanks
Regards,
Puru
From mjk at sdsc.edu Tue Dec 9 07:28:51 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 9 Dec 2003 07:28:51 -0800
Subject: [Rocks-Discuss]AMD Opteron
In-Reply-To: <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu> <BAY1-
DAV65Bp80SiEmA00005c14@hotmail.com>
Message-ID: <6413D41A-2A5C-11D8-AECB-000A95DA5638@sdsc.edu>

We have a beta right now that we have sent to a few people. We plan on
a release this month, and AMD_64 will be part of this release along
with the usual x86, IA64 support.

If you want to help accelerate this process please talk to your vendor
about loaning/giving us some hardware for testing. Having access to a
variety of Opteron hardware (we own two boxes) is the only way we can
have good support for this chip.

       -mjk


On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:

>   Hello,
>               I am a newbie to ROCKS cluster. I wanted to setup clusters
>   on
>   32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel
>   and
>   AMD).
>   I found the 64-bit download for Intel on the website but not for AMD.
>   Does
>   it work for AMD opteron? if not what is the ETA for AMD-64.
>   We are planning to but AMD-64 bit machines shortly, and I would like to
>   volunteer for the beta testing if needed.
>   Thanks
>   Regards,
>   Puru



From cdmaest at sandia.gov Tue Dec 9 07:48:31 2003
From: cdmaest at sandia.gov (Christopher D. Maestas)
Date: Tue, 09 Dec 2003 08:48:31 -0700
Subject: [Rocks-Discuss]AMD Opteron
In-Reply-To: <6413D41A-2A5C-11D8-AECB-000A95DA5638@sdsc.edu>
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
 <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>
 <6413D41A-2A5C-11D8-AECB-000A95DA5638@sdsc.edu>
Message-ID: <1070984911.19042.12.camel@capdesk.sandia.gov>

What do I have to do to sign up to test? We have Opteron systems we can
test on here.

On Tue, 2003-12-09 at 08:28, Mason J. Katz wrote:
> We have a beta right now that we have sent to a few people. We plan on
> a release this month, and AMD_64 will be part of this release along
> with the usual x86, IA64 support.
>
>   If you want to help accelerate this process please talk to your vendor
>   about loaning/giving us some hardware for testing. Having access to a
>   variety of Opteron hardware (we own two boxes) is the only way we can
>   have good support for this chip.
>
>        -mjk
>
>
>   On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
>   >   Hello,
>   >               I am a newbie to ROCKS cluster. I wanted to setup clusters
>   >   on
>   >   32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel
>   >   and
>   >   AMD).
>   >   I found the 64-bit download for Intel on the website but not for AMD.
>   >   Does
>   >   it work for AMD opteron? if not what is the ETA for AMD-64.
>   >   We are planning to but AMD-64 bit machines shortly, and I would like to
>   >   volunteer for the beta testing if needed.
>   >   Thanks
>   >   Regards,
>   >   Puru
>




From vincent_b_fox at yahoo.com Tue Dec 9 11:10:40 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Tue, 9 Dec 2003 11:10:40 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
Message-ID: <20031209191040.71171.qmail@web14811.mail.yahoo.com>

I tried doing a rebuild of the ATLAS libraries on a
PII test cluster, and it was no go. I did an export
PATH=/opt/gcc32/bin:$PATH first to make it easy on
myself.

The "make rpm" appears to get stuck in a loop on the
xconfig part. I paused it, and it seems the prompt
is defining f77=-O and f77 FLAGS=y, which of course
doesn't work. My guess is the spec file doesn't have an
answer for a previous question, so the /usr/bin/g77
answer is getting applied to the previous prompt, and
since no f77 is defined, it gets stuck.

Anyhow thought I would note this problem on the list
for those more qualified to address it.


__________________________________
Do you Yahoo!?
New Yahoo! Photos - easier uploading and sharing.
http://photos.yahoo.com/


From bryan at UCLAlumni.net Tue Dec 9 12:14:16 2003
From: bryan at UCLAlumni.net (Bryan Littlefield)
Date: Tue, 09 Dec 2003 12:14:16 -0800
Subject: [Rocks-Discuss] AMD Opteron - Contact Appro
In-Reply-To: <200312091531.hB9FV9J12694@postal.sdsc.edu>
References: <200312091531.hB9FV9J12694@postal.sdsc.edu>
Message-ID: <3FD62D18.7010208@UCLAlumni.net>

Hi Mason,

I suggest contacting Appro. We are using Rocks on our Opteron cluster,
and Appro would likely love to help. I will contact them as well to see
if they can help get an Opteron machine for testing. Contact info
below:

Thanks --Bryan

Jian Chang - Regional Sales Manager
(408) 941-8100 x 202
(800) 927-5464 x 202
(408) 941-8111 Fax
jian at appro.com
http://www.appro.com

npaci-rocks-discussion-request at sdsc.edu wrote:

>From: "Mason J. Katz" <mjk at sdsc.edu>
>Subject: Re: [Rocks-Discuss]AMD Opteron
>Date: Tue, 9 Dec 2003 07:28:51 -0800
>To: "purushotham komaravolu" <purikk at hotmail.com>
>
>We have a beta right now that we have sent to a few people. We plan on
>a release this month, and AMD_64 will be part of this release along
>with the usual x86, IA64 support.
>
>If you want to help accelerate this process please talk to your vendor
>about loaning/giving us some hardware for testing. Having access to a
>variety of Opteron hardware (we own two boxes) is the only way we can
>have good support for this chip.
>
>     -mjk
>
>
>On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
>
>
> Cc: <npaci-rocks-discussion at sdsc.edu>
>
>>Hello,
>>            I am a newbie to ROCKS cluster. I wanted to setup clusters
>>on
>>32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel
>>and
>>AMD).
>>I found the 64-bit download for Intel on the website but not for AMD.
>>Does
>>it work for AMD Opteron? If not, what is the ETA for AMD-64?
>>We are planning to buy AMD-64 bit machines shortly, and I would like to
>>volunteer for the beta testing if needed.
>>Thanks
>>Regards,
>>Puru
>>
>>
>
>_______________________________________________
>npaci-rocks-discussion mailing list
>npaci-rocks-discussion at sdsc.edu
>http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>
>End of npaci-rocks-discussion Digest
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-
discussion/attachments/20031209/611e65b4/attachment-0001.html

From vincent_b_fox at yahoo.com Tue Dec 9 13:22:59 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Tue, 9 Dec 2003 13:22:59 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
Message-ID: <20031209212259.39587.qmail@web14810.mail.yahoo.com>

Okay, I came up with my own quick hack:

Edit atlas.spec.in, go to the "other x86" section, and
remove the 2 lines right above "linux"; that seems to
make the rpm now.

A more formal patch would add a section for
cpuid eq 4 with this correction, I suppose.




From landman at scalableinformatics.com Tue Dec 9 13:49:06 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 09 Dec 2003 16:49:06 -0500
Subject: [Rocks-Discuss]Has anyone tried Gaussian binary only on the ROCKS 3.1.0
beta?
Message-ID: <1071006546.18100.46.camel@squash.scalableinformatics.com>

Hi Folks

  Working on building the same cluster from last week.   The admin nodes
are up and functional (plain old RH9+XFS).

  I want to get the head nodes up, with one of the requirements being
running the Gaussian binary-only code. Gaussian's page lists RH9.0
support, so I wanted to see if someone has tried the beta with this
code.

  Thanks.
Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615



From landman at scalableinformatics.com Tue Dec 9 13:59:37 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 09 Dec 2003 16:59:37 -0500
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
Message-ID: <1071007177.18100.58.camel@squash.scalableinformatics.com>

Folks:

  As indicated previously, I am wrestling with a Supermicro based
cluster. None of the RH distributions come with the correct E1000
driver, so a new kernel is needed (in the boot CD, and for
installation).

  The problem I am running into is that it isn't at all obvious/easy how
to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
this thing to work. Following the examples in the documentation has
not met with success. Running "rocks-dist cdrom" with the new kernels
(2.4.23 works nicely on the nodes) in the force/RPMS directory generates
a bootable CD with the original 2.4.18BOOT kernel.

  What I (and I think others) need, is a simple/easy to follow method
that will generate a bootable CD with the correct linux kernel, and the
correct modules.

  Is this in process somewhere? What would be tremendously helpful is
if we can generate a binary module, and put that into the boot process
by placing it into the force/modules/binary directory (assuming one
exists) with the appropriate entry of a similar name in the
force/modules/meta directory as a simple XML document giving pci-ids,
description, name, etc.
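
Purely to illustrate the proposal (this file layout and schema are hypothetical, not an existing Rocks mechanism), such a meta entry might look like:

```xml
<!-- hypothetical force/modules/meta/e1000.xml; schema invented for illustration -->
<module name="e1000" binary="force/modules/binary/e1000.o">
  <description>Intel PRO/1000 gigabit ethernet driver</description>
  <pci-id vendor="0x8086" device="0x1013"/>
</module>
```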

  Is anything close to this coming? Modules are killing future ROCKS
installs: the inability to easily inject a new module has created a
problem whereby ROCKS does not function (because the underlying RH
does not function).



--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615

From tim.carlson at pnl.gov Tue Dec 9 14:11:43 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 09 Dec 2003 14:11:43 -0800 (PST)
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <1071007177.18100.58.camel@squash.scalableinformatics.com>
Message-ID: <Pine.GSO.4.44.0312091406080.17458-100000@paradox.emsl.pnl.gov>

On Tue, 9 Dec 2003, Joe Landman wrote:

>     The problem I am running into is that it isn't at all obvious/easy how
>   to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
>   this thing to work. Following the examples in the documentation have
>   not met with success. Running "rocks-dist cdrom" with the new kernels
>   (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
>   a bootable CD with the original 2.4.18BOOT kernel.

So you built a 2.4.23BOOT rpm? The problem people have is with the naming
convention of kernels. A kernel.org spec file isn't going to generate
proper kernel rpms IMHO. What you really want to do (and maybe you are
already doing this) is steal the bit of the Redhat spec building scripts
that generate the -smp .i686 and BOOT rpms.

New hardware is tough for any distro.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From tmartin at physics.ucsd.edu Tue Dec 9 15:57:17 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 15:57:17 -0800
Subject: [Rocks-Discuss]Intel MT based Gigabit controllers
Message-ID: <3FD6615D.8090200@physics.ucsd.edu>

Does Rocks 3.0 support the Intel MT based Gigabit controllers (PCI
8086:1013) without any modifications? My new cluster has these new
controllers.

Rocks 2.3.1 does not seem to detect/drive these cards correctly (the
installer fails to detect them, and the e1000 driver does not seem to
work). So I was going to go ahead and move my new head node to 3.0.0,
and was wondering if I am going to have to do additional work to get a
working Intel driver into the boot image (for cluster nodes) for these
cards.

Terrence



From tmartin at physics.ucsd.edu Tue Dec 9 15:59:29 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 15:59:29 -0800
Subject: [Rocks-Discuss]how to include custom driver
In-Reply-To: <Pine.GSO.4.44.0306092142150.18083-100000@poincare.emsl.pnl.gov>
References: <Pine.GSO.4.44.0306092142150.18083-100000@poincare.emsl.pnl.gov>
Message-ID: <3FD661E1.90307@physics.ucsd.edu>

Tim Carlson wrote:
> On Mon, 9 Jun 2003, Greg Bruno wrote:
>
>
>>what driver did you have to add?
>>
>>we may be able to provide a patch for your compute nodes.
>
>
> Ah!!! I didn't see this response before I sent off my reply to Matthew.
> Can I please have the aic79xx driver, and while you're at it, can I get a
> module-info file that has this entry for gigabit? Not sure if it is
> already in there? ;)
>
> 0x8086 0x100f "e1000" "Intel Corp. 82545EM Gigabit Ethernet Controller rev
(01)"
>
> It is also quite possible that I burned the 2.3.0 media instead of
> 2.3.2. It was late in the day when I tried to do my install.
>
> Tim
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support

I would also like to request that this driver/change be made. I have a
cluster with these newer Intel gigabit chipsets.

Terrence



From tmartin at physics.ucsd.edu Tue Dec 9 16:33:18 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 16:33:18 -0800
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets
 ...
In-Reply-To: <Pine.GSO.4.44.0312091406080.17458-100000@paradox.emsl.pnl.gov>
References: <Pine.GSO.4.44.0312091406080.17458-100000@paradox.emsl.pnl.gov>
Message-ID: <3FD669CE.1070700@physics.ucsd.edu>

Tim Carlson wrote:
> On Tue, 9 Dec 2003, Joe Landman wrote:
>
>
>> The problem I am running into is that it isn't at all obvious/easy how
>>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
>>this thing to work. Following the examples in the documentation have
>>not met with success. Running "rocks-dist cdrom" with the new kernels
>>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates
>>a bootable CD with the original 2.4.18BOOT kernel.
>
>
> So you built a 2.4.23BOOT rpm? The problem people have is with the naming
> convention of kernels. A kernel.org spec file isn't going to generate
> proper kernel rpms IMHO. What you really want to do (and maybe you are
> already doing this) is steal the bit of the Redhat spec building scripts
> that generate the -smp .i686 and BOOT rpms.
>
> New hardware is tough for any distro.
>
> Tim
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support
>

Where do you start if you want to update the PXE boot image to support a
new kernel?

Terrence



From tmartin at physics.ucsd.edu Tue Dec 9 16:58:08 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 16:58:08 -0800
Subject: [Rocks-Discuss]Could not allocate requested partitions
Message-ID: <3FD66FA0.5070401@physics.ucsd.edu>

I am getting the following error when trying to install a Rocks 3.0.0
headnode. The headnode works fine in Rocks 2.3.2.

Could not allocate requested partitions: Partitioning failed: Could not
allocate partitions as primary partitions

What is also odd is that when I alt-F2 and run fdisk /dev/hda, it tells me it
cannot find that device (unable to open /dev/hda). However, when I watch
the boot messages, hda definitely comes up.

Any ideas?

Terrence




From tmartin at physics.ucsd.edu Tue Dec 9 17:33:24 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 17:33:24 -0800
Subject: [Rocks-Discuss]Could not allocate requested partitions
In-Reply-To: <3FD66FA0.5070401@physics.ucsd.edu>
References: <3FD66FA0.5070401@physics.ucsd.edu>
Message-ID: <3FD677E4.8050806@physics.ucsd.edu>

Terrence Martin wrote:
> I am getting the following error when trying to install a Rocks 3.0.0
> headnode. The headnode works fine in Rocks 2.3.2.
>
>   Could not allocate requested partitions: Partitioning failed: Could not
>   allocate partitions as primary partitions
>
>   What is also odd is when I alt-f2 and run fdisk /dev/hda it tells me it
>   cannot find that device (unable to open /dev/hda). However when I watch
>   the boot messages hda definitely comes up. Also the headnode works fine
>   with 2.3.2.
>
>   Any ideas?
>
>   Terrence
>
>
>

Figured it out; apparently Rocks 3.0.0 did not like my partitions from
Rocks 2.3.2. I booted Knoppix, blew away the partition table, and so far
so good on the head node.

Terrence




From mjk at sdsc.edu Tue Dec 9 17:54:01 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 9 Dec 2003 17:54:01 -0800
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <1071007177.18100.58.camel@squash.scalableinformatics.com>
References: <1071007177.18100.58.camel@squash.scalableinformatics.com>
Message-ID: <BA0ADEC6-2AB3-11D8-981C-000A95DA5638@sdsc.edu>

If the underlying RedHat doesn't support your hardware, you are pretty
much dead in the water. We do at times include drivers that RH does
not, but this is an exception, and only for hardware we physically have
access to. The rocks-boot (rocks/src/rock/boot in CVS) package
controls the boot kernel and module selection. You can look into this
to see what it would take to add your own module. We do plan on
refining and documenting this, but not for several months. We also have
some very good ideas on how we can track this faster than RH, but again
nothing coming in the next few months.

To continue my earlier rant for today: until more hardware vendors
start taking the Linux marketplace seriously, buying bleeding-edge
hardware and CPUs is asking for problems. It takes several months for
any new hardware to become supported by RedHat and several years for
any new CPU to be supported well. This isn't killing future Rocks
installs, it's just correctly delaying them until the underlying OS
supports the hardware.

       -mjk

On Dec 9, 2003, at 1:59 PM, Joe Landman wrote:

> Folks:
>
>   As indicated previously, I am wrestling with a Supermicro based
> cluster. None of the RH distributions come with the correct E1000
> driver, so a new kernel is needed (in the boot CD, and for
>   installation).
>
> The problem I am running into is that it isn't at all obvious/easy how
> to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> this thing to work. Following the examples in the documentation have
> not met with success. Running "rocks-dist cdrom" with the new kernels
> (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> a bootable CD with the original 2.4.18BOOT kernel.
>
>     What I (and I think others) need, is a simple/easy to follow method
>   that will generate a bootable CD with the correct linux kernel, and the
>   correct modules.
>
>     Is this in process somewhere? What would be tremendously helpful is
>   if we can generate a binary module, and put that into the boot process
>   by placing it into the force/modules/binary directory (assuming one
>   exists) with the appropriate entry of a similar name in the
>   force/modules/meta directory as a simple XML document giving pci-ids,
>   description, name, etc.
>
>     Anything close to this coming? Modules are killing future ROCKS
>   installs, the inability to easily inject a new module in there has
>   created a problem whereby ROCKS does not function (as the underlying RH
>   does not function).
>
>
>
>   --
>   Joseph Landman, Ph.D
>   Scalable Informatics LLC,
>   email: landman at scalableinformatics.com
>   web : http://scalableinformatics.com
>   phone: +1 734 612 4615



From gotero at linuxprophet.com Tue Dec 9 18:02:23 2003
From: gotero at linuxprophet.com (gotero at linuxprophet.com)
Date: Tue, 09 Dec 2003 18:02:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
Message-ID:
<20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>

Daniel-

I recently had the same problem when building a Quadrics cluster on Rocks 2.3.2
with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686 rpms. The problem is
definitely in the naming of the rpms: anaconda, running on the compute
nodes, is not going to recognize kernel rpms that begin with 'qsnet' as potential
boot options. Unfortunately, being under a severe time constraint, I resorted to
manually installing the qsnet kernel on all nodes of the cluster, which isn't
the Rocks way. The long-term solution is to mangle the kernel makefiles so that
the qsnet kernel rpms have conventional kernel rpm names, which is what Greg's
post referred to.
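
As a sketch of the convention (the filenames below are illustrative, and the patterns are an assumption about what the installer's kernel selection effectively matches, not its real code), only packages whose names start with the stock kernel names are treated as boot candidates:

```shell
# Illustrative only: which rpm filenames would be recognized as kernels.
for f in kernel-2.4.18-27.i686.rpm \
         kernel-smp-2.4.18-27.i686.rpm \
         kernel-BOOT-2.4.18-27.i386.rpm \
         qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpm
do
    case "$f" in
        kernel-[0-9]*.rpm|kernel-smp-*.rpm|kernel-BOOT-*.rpm)
            echo "kernel candidate: $f" ;;
        *)
            echo "ignored:          $f" ;;
    esac
done
```

The qsnet-prefixed rpm falls through to "ignored", which is why the node keeps installing the stock kernel-smp.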

Glen
On Mon, 8 Dec 2003 17:54:53 -0000, daniel.kidger at quadrics.com wrote:

>
> Dear all,
>      Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do:
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm 
>    /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ |xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (And indeed, if I
> duplicate my rpm in that directory, rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing
> kernel-smp-2.4.20-20.7."
>
> So my question is:
>     It looks like my RPM has a name that Rocks doesn't understand properly.
>     What is wrong with my name ?
>     and what are the rules for getting the correct name ?
>       (.i686.rpm is of course correct, but I don't have -smp. in the name. Is
> this the problem?)
>
> cf. Greg Bruno's wisdom:
>
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html
>
>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.       daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK          0117 915 5505
> ----------------------- www.quadrics.com --------------------
>
> >

Glen Otero, Ph.D.
Linux Prophet


From gotero at linuxprophet.com Tue Dec 9 18:05:04 2003
From: gotero at linuxprophet.com (gotero at linuxprophet.com)
Date: Tue, 09 Dec 2003 18:05:04 -0800 (PST)
Subject: [Rocks-Discuss]Could not allocate requested partitions
Message-ID:
<20031209180504.716.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>
On Tue, 09 Dec 2003 17:33:24 -0800, Terrence Martin wrote:

>
>   Terrence Martin wrote:
>   > I am getting the following error when trying to install a Rocks 3.0.0
>   > headnode. The headnode works fine in Rocks 2.3.2.
>   >
>   > Could not allocate requested partitions: Partitioning failed: Could not
>   > allocate partitions as primary partitions
>   >
>   > What is also odd is when I alt-f2 and run fdisk /dev/hda it tells me it
>   > cannot find that device (unable to open /dev/hda). However when I watch
>   > the boot messages hda definitely comes up. Also the headnode works fine
>   > with 2.3.2.
>   >
>   > Any ideas?
>   >
>   > Terrence
>   >
>   >
>   >
>
>   Figured it out, aparently rocks 3.0.0 did not like my partitions from
>   rocks 2.3.2. I booted knoppix, blew away the partition table and so far
>   so good on the head node.

I had the same problem with moving from 2.3.2 to 3.1. I'll try your solution.

Glen

>
> Terrence

Glen Otero, Ph.D.
Linux Prophet


From jorge at phys.ufl.edu Tue Dec 9 18:55:02 2003
From: jorge at phys.ufl.edu (Jorge L. Rodriguez)
Date: Tue, 09 Dec 2003 21:55:02 -0500
Subject: [Rocks-Discuss]Adding partitions that are not reformatted under hard boots
or shoot-node
Message-ID: <3FD68B06.9010709@phys.ufl.edu>

Hi,

How do I add an extra partition to my compute nodes and retain the data
on all non-/ partitions when the system hard boots or is shot?
I tried the suggestion in the documentation under "Customizing your
ROCKS Installation" where you replace auto-partition.xml, but hard
boots or shoot-node on those nodes reformat all partitions instead of
just /. I have also tried modifying installclass.xml so that an
extra partition is added in the python code (see below). This does
mostly what I want, but now I can't shoot-node, even though a hard boot
reinstalls without reformatting anything but /. Is this the right approach?
I'd rather avoid having to replace installclass, since I don't really
want to partition all nodes this way, but if I must I will.

Jorge
        #
        # set up the root partition
        #
        args = [ "/", "--size", "4096",
                 "--fstype", "&fstype;",
                 "--ondisk", devnames[0] ]
        KickstartBase.definePartition(self, id, args)

        # ---- Jorge: I added these args
        args = [ "/state/partition1", "--size", "55000",
                 "--fstype", "&fstype;",
                 "--ondisk", devnames[0] ]
        KickstartBase.definePartition(self, id, args)
        # -----

        args = [ "swap", "--size", "1000",
                 "--ondisk", devnames[0] ]
        KickstartBase.definePartition(self, id, args)

        #
        # greedy partitioning
        #
        # ----- Jorge: I changed this from i = 1
        i = 2
        # -----
        for devname in devnames:
                partname = "/state/partition%d" % (i)
                args = [ partname, "--size", "1",
                         "--fstype", "&fstype;",
                         "--grow", "--ondisk", devname ]
                KickstartBase.definePartition(self, id, args)

                i = i + 1




From bruno at rocksclusters.org Tue Dec 9 22:43:04 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 9 Dec 2003 22:43:04 -0800
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
In-Reply-To: <20031209212259.39587.qmail@web14810.mail.yahoo.com>
References: <20031209212259.39587.qmail@web14810.mail.yahoo.com>
Message-ID: <1B097BEE-2ADC-11D8-9715-000A95C4E3B4@rocksclusters.org>

>   Okay, came up my own quick hack:
>
>   Edit atlas.spec.in, go to "other x86" section, remove
>   2 lines right above "linux", seems to make rpm now.
>
>   A more formal patch would be put in a section for
>   cpuid eq 4 with this correction I suppose.

if you provide the patch, we'll include it in our next release.

    - gb

From tlw at cs.unm.edu Tue Dec 9 23:23:43 2003
From: tlw at cs.unm.edu (Tiffani Williams)
Date: Wed, 10 Dec 2003 00:23:43 -0700
Subject: [Rocks-Discuss]PBS errors
Message-ID: <3FD6C9FF.60603@cs.unm.edu>

Hello,

I am trying to submit a job through PBS, but I receive 2 errors. The
first error is

       Job cannot be executed
       See job standard error file

The second error is that the standard error file cannot be written into
my home directory.

I downloaded the sample script at

http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-jobs.html

and have also tried a simpler script with just PBS directives and echo
commands.

I do not know what I am doing wrong; I have used PBS successfully on
other clusters.

Does anyone have any suggestions?
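
For comparison, a minimal script of the kind described (the directive values here are generic assumptions, not taken from the linked sample) would be:

```shell
#!/bin/bash
#PBS -N hello        # job name
#PBS -o hello.out    # stdout, written back into the home directory
#PBS -e hello.err    # stderr -- the file PBS reports it cannot write

# the job body itself is trivial
echo "running on $(hostname)"
```

If even a script this small fails the same way, the script is unlikely to be the problem; it points at the compute nodes not being able to write into the home directory.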

Tiffani




From bruno at rocksclusters.org Tue Dec 9 23:35:59 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 9 Dec 2003 23:35:59 -0800
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <3FD6C9FF.60603@cs.unm.edu>
References: <3FD6C9FF.60603@cs.unm.edu>
Message-ID: <7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org>

>   I am trying to submit a job through PBS, but I receive 2 errors.   The
>   first error is
>         Job cannot be executed
>         See job standard error file
>
>   The second error is that the standard error file cannot be written
>   into my home directory.
>   I downloaded the sample script at
>
>   http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-
>   jobs.html
>   and have tried a more simple script with PBS directives and echo
>   commands.
>
> I do not know what I am doing wrong?   I have used PBS successfully on
> other clusters.
>
> Does anyone have any suggestions?

can you login to the compute nodes successfully?

if not, try restarting autofs on all the compute nodes. on the
frontend, execute:

     # ssh-agent $SHELL
     # ssh-add

     # cluster-fork "/etc/rc.d/init.d/autofs restart"

we've found the startup of autofs to be flaky at times.

 - gb



From tlw at cs.unm.edu Wed Dec 10 00:03:13 2003
From: tlw at cs.unm.edu (Tiffani Williams)
Date: Wed, 10 Dec 2003 01:03:13 -0700
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org>
References: <3FD6C9FF.60603@cs.unm.edu>
<7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org>
Message-ID: <3FD6D341.5070501@cs.unm.edu>

>> I am trying to submit a job through PBS, but I receive 2 errors.
>> The first error is
>>       Job cannot be executed
>>       See job standard error file
>>
>> The second error is that the standard error file cannot be written
>> into my home directory.
>> I downloaded the sample script at
>>
>> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-
>> jobs.html
>> and have tried a more simple script with PBS directives and echo
>> commands.
>>
>> I do not know what I am doing wrong? I have used PBS successfully
>> on other clusters.
>>
>> Does anyone have any suggestions?
>
>
> can you login to the compute nodes successfully?
>
> if not, try restarting autofs on all the compute nodes. on the
> frontend, execute:
>
>     # ssh-agent $SHELL
>     # ssh-add
>
>     # cluster-fork "/etc/rc.d/init.d/autofs restart"
>
> we've found the startup of autofs to be flaky at times.
>
> - gb


Do these commands have to be run by an administrator? If so, I do not
have such privileges. I can ssh to the compute nodes, but I am denied
entry. Am I supposed to be able to log in to a compute node as a user?

Tiffani



From bruno at rocksclusters.org Wed Dec 10 06:37:05 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 10 Dec 2003 06:37:05 -0800
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <3FD6D341.5070501@cs.unm.edu>
References: <3FD6C9FF.60603@cs.unm.edu>
<7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org>
<3FD6D341.5070501@cs.unm.edu>
Message-ID: <53451392-2B1E-11D8-9715-000A95C4E3B4@rocksclusters.org>

On Dec 10, 2003, at 12:03 AM, Tiffani Williams wrote:

>
>>> I am trying to submit a job through PBS, but I receive 2 errors.
>>> The first error is
>>>       Job cannot be executed
>>>       See job standard error file
>>>
>>> The second error is that the standard error file cannot be written
>>> into my home directory.
>>> I downloaded the sample script at
>>>
>>> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-
>>> jobs.html
>>> and have tried a more simple script with PBS directives and echo
>>> commands.
>>>
>>> I do not know what I am doing wrong? I have used PBS successfully
>>> on other clusters.
>>>
>>> Does anyone have any suggestions?
>>
>>
>> can you login to the compute nodes successfully?
>>
>> if not, try restarting autofs on all the compute nodes. on the
>> frontend, execute:
>>
>>     # ssh-agent $SHELL
>>     # ssh-add
>>
>>     # cluster-fork "/etc/rc.d/init.d/autofs restart"
>>
>> we've found the startup of autofs to be flaky at times.
>>
>> - gb
>
>
> Do these commands have to be run by an administrator? If so, I do not
> have such privileges. I can ssh to the compute nodes, but I am denied
> entry. Am I supposed to be able to login to a compute node as a user.

yes, you need to be 'root'.

it appears your home directory is not being mounted when you login --
have your administrator run the commands above.

  - gb



From mjk at sdsc.edu Wed Dec 10 07:20:47 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 07:20:47 -0800
Subject: [Rocks-Discuss]PBS errors
In-Reply-To: <53451392-2B1E-11D8-9715-000A95C4E3B4@rocksclusters.org>
References: <3FD6C9FF.60603@cs.unm.edu>
<7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org>
<3FD6D341.5070501@cs.unm.edu>
<53451392-2B1E-11D8-9715-000A95C4E3B4@rocksclusters.org>
Message-ID: <6E659550-2B24-11D8-981C-000A95DA5638@sdsc.edu>

This is most likely the dreaded NIS crash. You'll need to restart the
ypserver on the frontend and the ypbind daemon on all the nodes. We've
seen this on our clusters maybe 4 times (on production systems) in the
last several years; others have seen it on a weekly basis. This is
why NIS is dead in Rocks 3.1 - it served us reasonably well but never
matured into a stable system.

       -mjk

On Dec 10, 2003, at 6:37 AM, Greg Bruno wrote:

>
> On   Dec 10, 2003, at 12:03 AM, Tiffani Williams wrote:
>
>>
>>>>   I am trying to submit a job through PBS, but I receive 2 errors.
>>>>   The first error is
>>>>         Job cannot be executed
>>>>         See job standard error file
>>>>
>>>>   The second error is that the standard error file cannot be written
>>>>   into my home directory.
>>>>   I downloaded the sample script at
>>>>
>>>>   http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-
>>>>   jobs.html
>>>>   and have tried a more simple script with PBS directives and echo
>>>>   commands.
>>>>
>>>>   I do not know what I am doing wrong?   I have used PBS successfully
>>>>   on other clusters.
>>>>
>>>> Does anyone have any suggestions?
>>>
>>>
>>> can you login to the compute nodes successfully?
>>>
>>> if not, try restarting autofs on all the compute nodes. on the
>>> frontend, execute:
>>>
>>>     # ssh-agent $SHELL
>>>     # ssh-add
>>>
>>>     # cluster-fork "/etc/rc.d/init.d/autofs restart"
>>>
>>> we've found the startup of autofs to be flaky at times.
>>>
>>> - gb
>>
>>
>> Do these commands have to be run by an administrator? If so, I do not
>> have such privileges. I can ssh to the compute nodes, but I am
>> denied entry. Am I supposed to be able to login to a compute node as
>> a user.
>
> yes, you need to be 'root'.
>
> it appears your home directory is not being mounted when you login --
> have your administrator run the commands above.
>
> - gb



From vincent_b_fox at yahoo.com Wed Dec 10 07:59:14 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Wed, 10 Dec 2003 07:59:14 -0800 (PST)
Subject: [Rocks-Discuss]one node short in "labels"
Message-ID: <20031210155914.55789.qmail@web14812.mail.yahoo.com>

So I go to the "labels" selection on the web page to print out the pretty labels.
What a nice idea, by the way!

EXCEPT... it's one node short! My nodes go up to 0-13, but the labels stop at
0-12. Any ideas where I should check to fix this?



-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-
discussion/attachments/20031210/c5bf5e79/attachment-0001.html

From cdwan at mail.ahc.umn.edu Wed Dec 10 12:04:53 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
Subject: [Rocks-Discuss]Non-homogenous legacy hardware
Message-ID: <Pine.GSO.4.58.0312101359380.22@lenti.med.umn.edu>
I am integrating legacy systems into a ROCKS cluster, and have hit a
snag with the auto-partition configuration: the newly added (but older)
systems have SCSI disks, while the existing (newer) ones contain IDE.
This is a non-issue as long as the initial install does its default
partitioning. However, I have a "replace-auto-partition.xml" file which
is unworkable for the SCSI based systems, since it makes specific
reference to "hda" rather than "sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a
conditional such that "hda" or "sda" is used, based on the name of the
node (or some other criterion).

Is this possible?
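
I don't know of a built-in conditional, but one sketch of a workaround (the helper below is hypothetical, not part of the Rocks API) is to pick the device at kickstart-generation time from whatever disk node actually exists on the machine:

```python
# Hypothetical helper: choose the first disk device that exists on this
# machine, so one partition description covers both IDE (hda) and SCSI
# (sda) nodes.
import os

def first_disk(candidates=("hda", "sda")):
    for dev in candidates:
        if os.path.exists("/dev/" + dev):
            return dev
    raise RuntimeError("no known disk device found")
```

Partition lines in a replace-auto-partition.xml or installclass could then use first_disk() in place of a hard-coded "hda".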

Thanks, in advance. If this is out there on the mailing list archives, a
pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota


From tmartin at physics.ucsd.edu Wed Dec 10 12:09:11 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Wed, 10 Dec 2003 12:09:11 -0800
Subject: [Rocks-Discuss]Error during Make when building a new install floppy
Message-ID: <3FD77D67.7000708@physics.ucsd.edu>

I get the following error when I try to rebuild a boot floppy for Rocks.

This is with a default CVS checkout, updated today, following the Rocks
user guide. I have not actually attempted to make any changes.

make[3]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader         anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this altogether and just put my binary
module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes
and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img

Basically I am injecting an updated e1000 driver, with changes to
pcitable to cover the PCI IDs of my gigabit cards.
Terrence
From tim.carlson at pnl.gov Wed Dec 10 12:40:41 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
Subject: [Rocks-Discuss]Error during Make when building a new install floppy
In-Reply-To: <3FD77D67.7000708@physics.ucsd.edu>
Message-ID: <Pine.LNX.4.44.0312101235310.20272-100000@scorpion.emsl.pnl.gov>

On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for rocks.
>

You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at
least it wasn't the last time I checked

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-
dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about 1/3 of what you need to do if you go
down that path. You also need to work on netstg1.img, and you'll need to
update the driver in the kernel rpm that gets installed on the box. None of
this is trivial.

If it were me, I would go down the same path I took for updating the
AIC79XX driver

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From tim.carlson at pnl.gov Wed Dec 10 12:52:38 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
Subject: [Rocks-Discuss]Non-homogenous legacy hardware
In-Reply-To: <Pine.GSO.4.58.0312101359380.22@lenti.med.umn.edu>
Message-ID: <Pine.LNX.4.44.0312101249400.20272-100000@scorpion.emsl.pnl.gov>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

>
> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
>   long as the initial install does its default partitioning. However, I
>   have a "replace-auto-partition.xml" file which is unworkable for the SCSI
>   based systems since it makes specific reference to "hda" rather than
>   "sda."

If you have just a single drive, then you should be able to skip the
"--ondisk" bits of your "part" command
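Concretely, that could be a site-nodes/replace-auto-partition.xml patterned on the userguide's example but with the "--ondisk" references dropped, so the installer uses whatever single drive it finds. A sketch with placeholder sizes, not tested on 3.0:

```xml
<?xml version="1.0" standalone="no"?>
<kickstart>
<main>
<part> / --size 4096 </part>
<part> swap --size 1000 </part>
<part> /export --size 1 --grow </part>
</main>
</kickstart>
```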

Otherwise, you would first have to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an
<eval sh="bash">
</eval>

block in the <main> block.

Just guessing.. haven't tried this.
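If you try it, the body of that <eval> might look something like the sketch below: pick "sda" when the install environment reports SCSI devices, otherwise fall back to "hda". The /proc/scsi/scsi probe and the sizes are assumptions, and none of this is tested under anaconda:

```shell
# Hedged sketch of logic for an <eval sh="bash"> block: choose the disk name
# from whether the kernel reports any attached SCSI devices. The probe and
# the partition sizes are assumptions, untested in the Rocks installer.
if [ -r /proc/scsi/scsi ] && grep -q "^Host:" /proc/scsi/scsi 2>/dev/null; then
    disk=sda
else
    disk=hda
fi
echo "part / --size 4096 --ondisk $disk"
echo "part swap --size 1000 --ondisk $disk"
```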

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From agrajag at dragaera.net Wed Dec 10 10:21:07 2003
From: agrajag at dragaera.net (Jag)
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
Message-ID: <1071080467.4693.6.camel@pel>

I noticed a previous post on this list
(https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
indicating that Rocks distributes ssh keys for all the nodes over
ganglia. Can anyone enlighten me as to how this is done?

I looked through the ganglia docs and didn't see anything indicating how
to do this, so I'm assuming Rocks made some changes. Unfortunately the
rocks iso images don't seem to contain srpms, so I'm now coming here.
What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found? I've done quite
a bit of searching, but haven't found them anywhere.



From mjk at sdsc.edu Wed Dec 10 14:39:15 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 14:39:15 -0800
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <1071080467.4693.6.camel@pel>
References: <1071080467.4693.6.camel@pel>
Message-ID: <AF006859-2B61-11D8-981C-000A95DA5638@sdsc.edu>

Most of the SRPMS are on our FTP site, but we've screwed this up
before. The SRPMS are entirely Rocks specific so they are of little
value outside of Rocks. You can also check out our CVS tree
(cvs.rocksclusters.org), where rocks/src/ganglia shows what we add. We
have a ganglia-python package we created to allow us to write our own
metrics at a higher level than the provided gmetric application. We've
also moved from this method to a single cluster-wide ssh key for Rocks
3.1.

       -mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

>   I noticed a previous post on this list
>   (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/
>   001934.html) indicating that Rocks distributes ssh keys for all the
>   nodes over
>   ganglia. Can anyone enlighten me as to how this is done?
>
>   I looked through the ganglia docs and didn't see anything indicating
>   how
>   to do this, so I'm assuming Rocks made some changes. Unfortunately the
>   rocks iso images don't seem to contain srpms, so I'm now coming here.
>   What did Rocks do to ganglia to make the distribution of ssh keys work?
>
>   Also, does anyone know where Rocks SRPMs can be found?   I've done quite
>   a bit of searching, but haven't found them anywhere.



From vrowley at ucsd.edu Wed Dec 10 14:43:49 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Wed, 10 Dec 2003 14:43:49 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build
CD distro
Message-ID: <3FD7A1A5.2030805@ucsd.edu>

When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist
--dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

>   Cleaning distribution
>   Resolving versions (RPMs)
>   Resolving versions (SRPMs)
>   Adding support for rebuild distribution from source
>   Creating files (symbolic links - fast)
>   Creating symlinks to kickstart files
>   Fixing Comps Database
>   Generating hdlist (rpm database)
>   Patching second stage loader (eKV, partioning, ...)
>       patching "rocks-ekv" into distribution ...
>       patching "rocks-piece-pipe" into distribution ...
>       patching "PyXML" into distribution ...
>       patching "expat" into distribution ...
>       patching "rocks-pylib" into distribution ...
>       patching "MySQL-python" into distribution ...
>       patching "rocks-kickstart" into distribution ...
>       patching "rocks-kickstart-profiles" into distribution ...
>       patching "rocks-kickstart-dtds" into distribution ...
>       building CRAM filesystem ...
>   Cleaning distribution
>   Resolving versions (RPMs)
>   Resolving versions (SRPMs)
>   Creating symlinks to kickstart files
>   Generating hdlist (rpm database)
>   Segregating RPMs (rocks, non-rocks)
>   sh: ./kickstart.cgi: No such file or directory
>   sh: ./kickstart.cgi: No such file or directory
>   Traceback (innermost last):
>     File "/opt/rocks/bin/rocks-dist", line 807, in ?
>       app.run()
>     File "/opt/rocks/bin/rocks-dist", line 623, in run
>       eval('self.command_%s()' % (command))
>     File "<string>", line 0, in ?
>     File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>       builder.build()
>     File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>       (rocks, nonrocks) = self.segregateRPMS()
>     File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>       for pkg in ks.getSection('packages'):
>   TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley                               email: vrowley at ucsd.edu
Biomedical Informatics Research Network       work: (858) 536-5980
University of California, San Diego            fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb



From bruno at rocksclusters.org Wed Dec 10 15:12:49 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 10 Dec 2003 15:12:49 -0800
Subject: [Rocks-Discuss]one node short in "labels"
In-Reply-To: <20031210155914.55789.qmail@web14812.mail.yahoo.com>
References: <20031210155914.55789.qmail@web14812.mail.yahoo.com>
Message-ID: <5F8539FC-2B66-11D8-9715-000A95C4E3B4@rocksclusters.org>

>   So I go to the "labels" selection on the web page to print out the
>   pretty labels. What a nice idea by the way!
>
>   EXCEPT....it's one node short! I go up to 0-13 and this stops at
>   0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for the bug report.

    - gb
From mjk at sdsc.edu Wed Dec 10 15:16:27 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 15:16:27 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build
CD distro
In-Reply-To: <3FD7A1A5.2030805@ucsd.edu>
References: <3FD7A1A5.2030805@ucsd.edu>
Message-ID: <E17B3F9E-2B66-11D8-981C-000A95DA5638@sdsc.edu>

It looks like someone moved the profiles directory to profiles.orig.

     -mjk


[root at rocks14 install]# ls -l
total 56
drwxr-sr-x    3 root     wheel         4096 Dec 10 21:16 cdrom
drwxrwsr-x    5 root     wheel         4096 Dec 10 20:38 contrib.orig
drwxr-sr-x    3 root     wheel         4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x    3 root     wheel         4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x    1 root     wheel        19254 Sep  3 12:40 kickstart.cgi
drwxr-xr-x    3 root     root          4096 Dec 10 20:38 profiles.orig
drwxr-sr-x    3 root     wheel         4096 Dec 10 21:15 rocks-dist
drwxrwsr-x    3 root     wheel         4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x    3 root     wheel         4096 Dec 10 21:02 src
drwxr-sr-x    4 root     wheel         4096 Dec 10 20:49 src.foo
On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
>
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Adding support for rebuild distribution from source
>> Creating files (symbolic links - fast)
>> Creating symlinks to kickstart files
>> Fixing Comps Database
>> Generating hdlist (rpm database)
>> Patching second stage loader (eKV, partioning, ...)
>>     patching "rocks-ekv" into distribution ...
>>     patching "rocks-piece-pipe" into distribution ...
>>     patching "PyXML" into distribution ...
>>     patching "expat" into distribution ...
>>     patching "rocks-pylib" into distribution ...
>>     patching "MySQL-python" into distribution ...
>>     patching "rocks-kickstart" into distribution ...
>>     patching "rocks-kickstart-profiles" into distribution ...
>>     patching "rocks-kickstart-dtds" into distribution ...
>>     building CRAM filesystem ...
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Creating symlinks to kickstart files
>> Generating hdlist (rpm database)
>> Segregating RPMs (rocks, non-rocks)
>> sh: ./kickstart.cgi: No such file or directory
>> sh: ./kickstart.cgi: No such file or directory
>> Traceback (innermost last):
>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>     app.run()
>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>     eval('self.command_%s()' % (command))
>>   File "<string>", line 0, in ?
>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>     builder.build()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>     (rocks, nonrocks) = self.segregateRPMS()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>> segregateRPMS
>>     for pkg in ks.getSection('packages'):
>> TypeError: loop over non-sequence
>
> Any ideas?
>
> --
> Vicky Rowley                             email: vrowley at ucsd.edu
> Biomedical Informatics Research Network     work: (858) 536-5980
> University of California, San Diego           fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb



From vrowley at ucsd.edu Wed Dec 10 16:50:16 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Wed, 10 Dec 2003 16:50:16 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying
 to build CD distro
In-Reply-To: <E17B3F9E-2B66-11D8-981C-000A95DA5638@sdsc.edu>
References: <3FD7A1A5.2030805@ucsd.edu>
<E17B3F9E-2B66-11D8-981C-000A95DA5638@sdsc.edu>
Message-ID: <3FD7BF48.9020409@ucsd.edu>

Yep, I did that, but only *AFTER* getting the error. [Thought it was
generated by the rocks-dist sequence, but apparently not.] Go ahead.
Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
>
>     -mjk
>
>
> [root at rocks14 install]# ls -l
> total 56
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:16 cdrom
> drwxrwsr-x    5 root     wheel         4096 Dec 10 20:38 contrib.orig
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:07
> ftp.rocksclusters.org
> drwxr-sr-x    3 root     wheel         4096 Dec 10 20:38
> ftp.rocksclusters.org.orig
> -r-xrwsr-x    1 root     wheel        19254 Sep 3 12:40 kickstart.cgi
> drwxr-xr-x    3 root     root          4096 Dec 10 20:38 profiles.orig
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:15 rocks-dist
> drwxrwsr-x    3 root     wheel         4096 Dec 10 20:38 rocks-dist.orig
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:02 src
> drwxr-sr-x    4 root     wheel         4096 Dec 10 20:49 src.foo
> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>
>> When I run this:
>>
>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>> rocks-dist --dist=cdrom cdrom
>>
>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Adding support for rebuild distribution from source
>>> Creating files (symbolic links - fast)
>>> Creating symlinks to kickstart files
>>> Fixing Comps Database
>>> Generating hdlist (rpm database)
>>> Patching second stage loader (eKV, partioning, ...)
>>>     patching "rocks-ekv" into distribution ...
>>>     patching "rocks-piece-pipe" into distribution ...
>>>     patching "PyXML" into distribution ...
>>>     patching "expat" into distribution ...
>>>     patching "rocks-pylib" into distribution ...
>>>     patching "MySQL-python" into distribution ...
>>>     patching "rocks-kickstart" into distribution ...
>>>     patching "rocks-kickstart-profiles" into distribution ...
>>>     patching "rocks-kickstart-dtds" into distribution ...
>>>     building CRAM filesystem ...
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Creating symlinks to kickstart files
>>> Generating hdlist (rpm database)
>>> Segregating RPMs (rocks, non-rocks)
>>> sh: ./kickstart.cgi: No such file or directory
>>> sh: ./kickstart.cgi: No such file or directory
>>> Traceback (innermost last):
>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>     app.run()
>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>     eval('self.command_%s()' % (command))
>>>   File "<string>", line 0, in ?
>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>     builder.build()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>     (rocks, nonrocks) = self.segregateRPMS()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>> segregateRPMS
>>>     for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>>
>> Any ideas?
>>
>> --
>> Vicky Rowley                             email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network     work: (858) 536-5980
>> University of California, San Diego           fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
>
>
>

--
Vicky Rowley                               email: vrowley at ucsd.edu
Biomedical Informatics Research Network       work: (858) 536-5980
University of California, San Diego            fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at http://www.sagacitech.com/Chinaweb



From tim.carlson at pnl.gov Wed Dec 10 17:23:25 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to
 build CD distro
In-Reply-To: <3FD7BF48.9020409@ucsd.edu>
Message-ID: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls python directly via
/usr/bin/python, while rocks-dist does an "env python".

Tim
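As a side note on the traceback itself: "loop over non-sequence" is Python 1.5's wording for iterating something that isn't a sequence, which fits a section lookup coming back empty after kickstart.cgi failed to run. A toy reconstruction of that failure mode (the get_section helper here is hypothetical, not the real rocks code):

```python
# Toy reconstruction, not the actual rocks code: if kickstart.cgi never ran,
# the 'packages' section is never populated, the lookup returns None, and the
# for-loop raises TypeError. Python 1.5 phrased this as "loop over
# non-sequence"; modern Pythons say the object is "not iterable".
def get_section(name, sections):
    return sections.get(name)  # None when the section was never filled in

sections = {}  # kickstart.cgi failed, so nothing was parsed into here
try:
    for pkg in get_section("packages", sections):
        pass
    error = None
except TypeError as exc:
    error = str(exc)
```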

>   Yep, I did that, but only *AFTER* getting the error. [Thought it was
>   generated by the rocks-dist sequence, but apparently not.] Go ahead.
>   Move it back. Same difference.
>
>   Vicky
>
>   Mason J. Katz wrote:
>   > It looks like someone moved the profiles directory to profiles.orig.
>   >
>   >     -mjk
>   >
>   >
>   > [root at rocks14 install]# ls -l
>   > total 56
>   > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:16 cdrom
>   > drwxrwsr-x    5 root     wheel         4096 Dec 10 20:38 contrib.orig
>   > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:07
>   > ftp.rocksclusters.org
>   > drwxr-sr-x    3 root     wheel         4096 Dec 10 20:38
>   > ftp.rocksclusters.org.orig
>   > -r-xrwsr-x    1 root     wheel        19254 Sep 3 12:40 kickstart.cgi
>   > drwxr-xr-x    3 root     root          4096 Dec 10 20:38 profiles.orig
>   > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:15 rocks-dist
>   > drwxrwsr-x    3 root     wheel         4096 Dec 10 20:38 rocks-dist.orig
>   > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:02 src
>   > drwxr-sr-x    4 root     wheel         4096 Dec 10 20:49 src.foo
>   > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>   >
>   >> When I run this:
>   >>
>   >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>   >> rocks-dist --dist=cdrom cdrom
>   >>
>   >> on a server installed with ROCKS 3.0.0, I eventually get this:
>   >>
>   >>> Cleaning distribution
>   >>> Resolving versions (RPMs)
>   >>> Resolving versions (SRPMs)
>   >>> Adding support for rebuild distribution from source
>   >>> Creating files (symbolic links - fast)
>   >>> Creating symlinks to kickstart files
>   >>> Fixing Comps Database
>   >>> Generating hdlist (rpm database)
>   >>> Patching second stage loader (eKV, partioning, ...)
>   >>>     patching "rocks-ekv" into distribution ...
>   >>>     patching "rocks-piece-pipe" into distribution ...
>   >>>     patching "PyXML" into distribution ...
>   >>>     patching "expat" into distribution ...
>   >>>     patching "rocks-pylib" into distribution ...
>   >>>     patching "MySQL-python" into distribution ...
>   >>>     patching "rocks-kickstart" into distribution ...
>   >>>     patching "rocks-kickstart-profiles" into distribution ...
>   >>>     patching "rocks-kickstart-dtds" into distribution ...
>   >>>     building CRAM filesystem ...
>   >>> Cleaning distribution
>   >>> Resolving versions (RPMs)
>   >>> Resolving versions (SRPMs)
>   >>> Creating symlinks to kickstart files
>   >>> Generating hdlist (rpm database)
>   >>> Segregating RPMs (rocks, non-rocks)
>   >>> sh: ./kickstart.cgi: No such file or directory
>   >>> sh: ./kickstart.cgi: No such file or directory
>   >>> Traceback (innermost last):
>   >>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>   >>>     app.run()
>   >>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>   >>>     eval('self.command_%s()' % (command))
>   >>>   File "<string>", line 0, in ?
>   >>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>   >>>     builder.build()
>   >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>   >>>     (rocks, nonrocks) = self.segregateRPMS()
>   >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>   >>> segregateRPMS
>   >>>     for pkg in ks.getSection('packages'):
>   >>> TypeError: loop over non-sequence
>   >>
>   >>
>   >> Any ideas?
>   >>
>   >> --
>   >> Vicky Rowley                             email: vrowley at ucsd.edu
>   >> Biomedical Informatics Research Network     work: (858) 536-5980
>   >> University of California, San Diego           fax: (858) 822-0828
>   >> 9500 Gilman Drive
>   >> La Jolla, CA 92093-0715
>   >>
>   >>
>   >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
>   >
>   >
>   >
>
>   --
>   Vicky Rowley                              email: vrowley at ucsd.edu
>   Biomedical Informatics Research Network      work: (858) 536-5980
>   University of California, San Diego           fax: (858) 822-0828
>   9500 Gilman Drive
>   La Jolla, CA 92093-0715
>
>
>   See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
>
>




From naihh at imcb.a-star.edu.sg Wed Dec 10 17:45:18 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Thu, 11 Dec 2003 09:45:18 +0800
Subject: [Rocks-Discuss]RE: Do you have a list of the various models of Gigabit
Ethernet Interfaces compatible to Rocks 3?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCD66@EXIMCB2.imcb.a-star.edu.sg>


Hi All,

Do you have a list of the various gigabit Ethernet interfaces that are
compatible with Rocks 3?

I am changing my nodes' connectivity from 10/100 to 1000.

Has anyone done that, and what differences in performance or turnaround
time did you see?

Has anyone successfully built a set of grid compute nodes using Rocks
3?

Thanks and Regards

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 9:25 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs

Send npaci-rocks-discussion mailing list submissions to
      npaci-rocks-discussion at sdsc.edu

To subscribe or unsubscribe via the World Wide Web, visit

http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
      npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
      npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of npaci-rocks-discussion digest..."


Today's Topics:

   1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
   2. Error during Make when building a new install floppy (Terrence
Martin)
   3. Re: Error during Make when building a new install floppy (Tim
Carlson)
   4. Re: Non-homogenous legacy hardware (Tim Carlson)
   5. ssh_known_hosts and ganglia (Jag)
   6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
   7. "TypeError: loop over non-sequence" when trying to build CD
distro (V. Rowley)
   8. Re: one node short in "labels" (Greg Bruno)
   9. Re: "TypeError: loop over non-sequence" when trying to build CD
distro (Mason J. Katz)
  10. Re: "TypeError: loop over non-sequence" when trying
        to build CD distro (V. Rowley)
  11. Re: "TypeError: loop over non-sequence" when trying to
        build CD distro (Tim Carlson)

--__--__--

Message: 1
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware


I am integrating legacy systems into a ROCKS cluster, and have hit a
snag with the auto-partition configuration: The new (old) systems have
SCSI disks, while old (new) ones contain IDE. This is a non-issue so
long as the initial install does its default partitioning. However, I
have a "replace-auto-partition.xml" file which is unworkable for the
SCSI
based systems since it makes specific reference to "hda" rather than
"sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a
conditional such that "hda" or "sda" is used, based on the name of the
node (or some other criterion).

Is this possible?

Thanks, in advance. If this is out there on the mailing list archives,
a
pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

--__--__--

Message: 2
Date: Wed, 10 Dec 2003 12:09:11 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]Error during Make when building a new install
floppy

I get the following error when I try to rebuild a boot floppy for rocks.

This is with the default CVS checkout with an update today according to
the rocks userguide. I have not actually attempted to make any changes.

make[3]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader         anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this together and just put my binary
module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes
and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3
/en/os/i386/images/bootnet.img
Basically I am injecting an updated e1000 driver with changes to
pcitable to support the address of my gigabit cards.

Terrence


--__--__--

Message: 3
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Error during Make when building a new
install floppy
To: Terrence Martin <tmartin at physics.ucsd.edu>
Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for
rocks.
>

You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at
least it wasn't the last time I checked

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
>
/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3
/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifiying the bootnet.img is about 1/3 of what you need to do if you go
down that path. You also need to work on netstg1.img and you'll need to
update the drive in the kernel rpm that gets installed on the box. None
of
this is trivial.

If it were me, I would go down the same path I took for updating the
AIC79XX driver

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003
533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support
--__--__--

Message: 4
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware
To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
Cc: npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

>
> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems
have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I
> have a "replace-auto-partition.xml" file which is unworkable for the
SCSI
> based systems since it makes specific reference to "hda" rather than
> "sda."

If you have just a single drive, then you should be able to skip the
"--ondisk" bits of your "part" command

Otherwise, you would have first to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an
<eval sh="bash">
</eval>

block in the <main> block.

Just guessing.. haven't tried this.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support


--__--__--

Message: 5
From: Jag <agrajag at dragaera.net>
To: npaci-rocks-discussion at sdsc.edu
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia

I noticed a previous post on this list
(https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934
.html) indicating that Rocks distributes ssh keys for all the nodes over
ganglia. Can anyone enlighten me as to how this is done?
I looked through the ganglia docs and didn't see anything indicating how
to do this, so I'm assuming Rocks made some changes. Unfortunately the
rocks iso images don't seem to contain srpms, so I'm now coming here.
What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found?    I've done quite
a bit of searching, but haven't found them anywhere.


--__--__--

Message: 6
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
Date: Wed, 10 Dec 2003 14:39:15 -0800
To: Jag <agrajag at dragaera.net>

Most of the SRPMS are on our FTP site, but we've screwed this up
before. The SRPMS are entirely Rocks specific so they are of little
value outside of Rocks. You can also checkout our CVS tree
(cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We
have a ganglia-python package we created to allow us to write our own
metrics at a high level than the provide gmetric application. We've
also moved from this method to a single cluster-wide ssh key for Rocks
3.1.

     -mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

> I noticed a previous post on this list
> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/
> 001934.html) indicating that Rocks distributes ssh keys for all the
> nodes over
> ganglia. Can anyone enlighten me as to how this is done?
>
> I looked through the ganglia docs and didn't see anything indicating
> how
> to do this, so I'm assuming Rocks made some changes. Unfortunately
the
> rocks iso images don't seem to contain srpms, so I'm now coming here.
> What did Rocks do to ganglia to make the distribution of ssh keys
work?
>
> Also, does anyone know where Rocks SRPMs can be found? I've done
quite
> a bit of searching, but haven't found them anywhere.


--__--__--

Message: 7
Date: Wed, 10 Dec 2003 14:43:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying
to build CD distro
When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist

--dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Adding support for rebuild distribution from source
> Creating files (symbolic links - fast)
> Creating symlinks to kickstart files
> Fixing Comps Database
> Generating hdlist (rpm database)
> Patching second stage loader (eKV, partioning, ...)
>     patching "rocks-ekv" into distribution ...
>     patching "rocks-piece-pipe" into distribution ...
>     patching "PyXML" into distribution ...
>     patching "expat" into distribution ...
>     patching "rocks-pylib" into distribution ...
>     patching "MySQL-python" into distribution ...
>     patching "rocks-kickstart" into distribution ...
>     patching "rocks-kickstart-profiles" into distribution ...
>     patching "rocks-kickstart-dtds" into distribution ...
>     building CRAM filesystem ...
> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Creating symlinks to kickstart files
> Generating hdlist (rpm database)
> Segregating RPMs (rocks, non-rocks)
> sh: ./kickstart.cgi: No such file or directory
> sh: ./kickstart.cgi: No such file or directory
> Traceback (innermost last):
>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>     app.run()
>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>     eval('self.command_%s()' % (command))
>   File "<string>", line 0, in ?
>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>     builder.build()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>     (rocks, nonrocks) = self.segregateRPMS()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>     for pkg in ks.getSection('packages'):
> TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715
See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb


--__--__--

Message: 8
Cc: rocks <npaci-rocks-discussion at sdsc.edu>
From: Greg Bruno <bruno at rocksclusters.org>
Subject: Re: [Rocks-Discuss]one node short in "labels"
Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com>

> So I go to the "labels" selection on the web page to print out the
> pretty labels. What a nice idea by the way!
>
> EXCEPT....it's one node short! I go up to 0-13 and this stops at
> 0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for the bug report.

    - gb


--__--__--

Message: 9
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]"TypeError:    loop over non-sequence" when
trying to build CD distro
Date: Wed, 10 Dec 2003 15:16:27 -0800
To: "V. Rowley" <vrowley at ucsd.edu>

It looks like someone moved the profiles directory to profiles.orig.

       -mjk


[root at rocks14 install]# ls -l
total 56
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo

On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
>
> [traceback snipped; quoted in full in the previous message]
>
> Any ideas?


--__--__--

Message: 10
Date: Wed, 10 Dec 2003 16:50:16 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: "Mason J. Katz" <mjk at sdsc.edu>
CC: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError:   loop over non-sequence" when
trying
 to build CD distro

Yep, I did that, but only *AFTER* getting the error. [Thought it was
generated by the rocks-dist sequence, but apparently not.] Go ahead.
Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
>
> [ls -l output and earlier quoted messages snipped]

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb


--__--__--

Message: 11
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
trying to
 build CD distro
To: "V. Rowley" <vrowley at ucsd.edu>
Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls /usr/bin/python
directly, while rocks-dist uses "env python".
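
The distinction matters because the two shebang styles resolve the
interpreter differently: a hard-coded path ignores $PATH entirely,
while "env python" searches it. A small sketch of that difference using
shutil.which as a modern stand-in for the PATH search "env" performs:

```python
import shutil

HARDCODED = "/usr/bin/python"  # what a "#!/usr/bin/python" shebang always uses

def resolve_env_style(cmd, path=None):
    # Mimic "#!/usr/bin/env cmd": search PATH (or an explicit path string).
    return shutil.which(cmd, path=path)

# With an empty search path, "env python" finds nothing at all, while a
# hard-coded shebang still points at /usr/bin/python whether or not that
# file actually exists:
print(resolve_env_style("python", path=""))
print(HARDCODED)
```

So the two tools can disagree: rocks-dist may find a working python on
$PATH while kickstart.cgi fails because /usr/bin/python specifically is
missing, or vice versa.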

Tim

> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> Move it back. Same difference.
>
> Vicky
>
> Mason J. Katz wrote:
> > It looks like someone moved the profiles directory to profiles.orig.
> >
> > [ls -l output and earlier quoted messages snipped]



--__--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion


End of npaci-rocks-discussion Digest




From tmartin at physics.ucsd.edu Wed Dec 10 18:03:41 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Wed, 10 Dec 2003 18:03:41 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0
Message-ID: <3FD7D07D.8090108@physics.ucsd.edu>

I am having a problem installing Rocks 3.0.0 on my new cluster.

The python error occurs right after anaconda starts and just before the
install asks for the roll CDROM.

The error refers to an inability to find or load rocks.file. I think the
error is associated with the window that pops up and asks you to put the
roll CDROM in.

The process I followed to get to this point is:

Put the Rocks 3.0.0 CDROM into the CDROM drive
Boot the system
At the prompt, type "frontend"
Wait until anaconda starts
Error referring to being unable to load rocks.file

I have successfully installed Rocks on a smaller cluster, but that has
different hardware. I used the same CDROM for both installs.

Any thoughts?

Terrence




From vrowley at ucsd.edu Wed Dec 10 19:52:49 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Wed, 10 Dec 2003 19:52:49 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying
 to build CD distro
In-Reply-To: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov>
References: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov>
Message-ID: <3FD7EA11.10204@ucsd.edu>

Looks like python is okay:

>   [root at rocks14 birn-oracle1]# which python
>   /usr/bin/python
>   [root at rocks14 birn-oracle1]# python --help
>   Unknown option: --
>   usage: python [option] ... [-c cmd | file | -] [arg] ...
>   Options and arguments (and corresponding environment variables):
>   -d     : debug output from parser (also PYTHONDEBUG=x)
>   -i     : inspect interactively after running script, (also PYTHONINSPECT=x)
>            and force prompts, even if stdin does not appear to be a terminal
>   -O     : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
>   -OO    : remove doc-strings in addition to the -O optimizations
>   -S     : don't imply 'import site' on initialization
>   -t     : issue warnings about inconsistent tab usage (-tt: issue errors)
>   -u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
>   -v     : verbose (trace import statements) (also PYTHONVERBOSE=x)
>   -x     : skip first line of source, allowing use of non-Unix forms of #!cmd
>   -X     : disable class based built-in exceptions
>   -c cmd : program passed in as string (terminates option list)
>   file   : program read from script file
>   -      : program read from stdin (default; interactive mode if a tty)
>   arg ...: arguments passed to program in sys.argv[1:]
>   Other environment variables:
>   PYTHONSTARTUP: file executed on interactive startup (no default)
>   PYTHONPATH   : ':'-separated list of directories prefixed to the
>                   default module search path. The result is sys.path.
>   PYTHONHOME   : alternate <prefix> directory (or <prefix>:<exec_prefix>).
>                   The default module search path uses <prefix>/python1.5.
>   [root at rocks14 birn-oracle1]#



Tim Carlson wrote:
> On Wed, 10 Dec 2003, V. Rowley wrote:
>
> Did you remove python by chance? kickstart.cgi calls python directly in
> /usr/bin/python while rocks-dist does an "env python"
>
> Tim
>
> [earlier quoted messages snipped]

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715
See pictures from our trip to China at http://www.sagacitech.com/Chinaweb



From wyzhong78 at msn.com Wed Dec 10 20:38:53 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 12:38:53 +0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
Message-ID: <BAY3-F3296PnPlpNvHX000097eb@hotmail.com>



>From: Greg Bruno <bruno at rocksclusters.org>
>To: "zhong wenyu" <wyzhong78 at msn.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
>Date: Mon, 8 Dec 2003 15:31:08 -0800
>
>>I have installed Rocks 3.0.0 with default options successful,there
>>was not any trouble.But I boot it up,it stopped at beginning,just
>>show "GRUB" on the screen and waiting...
>
>when you built the frontend, did you start with the rocks base CD
>then add the HPC roll?
>
> - gb
>
I have resolved this problem, but I don't know why.
I have one SCSI hard disk and one IDE disk on the frontend. I chose the
SCSI disk to be the first HDD and installed "/" on it; then it could not
boot up. Even after disabling the IDE HDD and installing again, it still
could not boot. Finally I chose the SCSI disk as the first HDD for the
install, then set the IDE HDD to be first for booting, and it worked!
Must GRUB be installed on the IDE HDD?
thanks!




From wyzhong78 at msn.com Wed Dec 10 20:44:09 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 12:44:09 +0800
Subject: [Rocks-Discuss]I can't use xpbs in rocks
Message-ID: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com>

Hi, everyone!
I have installed rocks 2.3.2 and 3.0.0; xpbs cannot be used in either of
them.
typed: xpbs[enter]
showed: xpbs: initialization failed! output: invalid command name
"Pref_Init"
thanks!

From phil at sdsc.edu Wed Dec 10 21:26:50 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Wed, 10 Dec 2003 21:26:50 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <BAY3-F3296PnPlpNvHX000097eb@hotmail.com>
References: <BAY3-F3296PnPlpNvHX000097eb@hotmail.com>
Message-ID: <3FD8001A.9030702@sdsc.edu>

There is a conflict between the way the BIOS numbers drives and the way
the install kernel numbers them (and this is not standardized). Check
whether your BIOS lets you select which device is the boot device. If it
just says "Hard Disk" (no choice between IDE and SCSI), then you are
stuck with needing to have GRUB on the device that the BIOS thinks is
the boot device. If you can choose, then SCSI can probably be made to
work.

These sorts of issues (this is a general redhat/linux problem) can be
quite troublesome (and annoying). We had some older hardware with two
different types of SCSI controllers and drives on each controller. The
boot kernel labeled /sda differently than the BIOS did. The install went
fine, but then came the dreaded "OS Not Found" BIOS message on reboot.
The cause was that the GRUB loader was being put on Linux's notion of
/sda, but when the BIOS loaded, it found nothing (because GRUB was
installed on the BIOS's idea of /sdb). For this particular machine, we
were not able to change the BIOS's notion -- we had to force Linux to
install the bootloader on Linux's idea of /sdb.

-P
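
If the BIOS cannot be told which disk to boot, GRUB legacy can be
pointed at the BIOS's boot disk explicitly at install time. A sketch,
assuming the BIOS boots the IDE drive, which Linux sees as /dev/hda
(adjust the device names for your hardware):

```shell
# Map BIOS boot disk (hd0) to the IDE drive, then install GRUB on its MBR.
grub --batch <<'EOF'
device (hd0) /dev/hda
root (hd0,0)
setup (hd0)
EOF
```

The "device" command overrides GRUB's device.map guess, which is exactly
the BIOS-vs-kernel numbering mismatch described above.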


zhong wenyu wrote:

>
> [quoted message snipped]




From mjk at sdsc.edu Wed Dec 10 22:04:57 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 22:04:57 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build
CD distro
In-Reply-To: <3FD7EA11.10204@ucsd.edu>
References: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov>
<3FD7EA11.10204@ucsd.edu>
Message-ID: <F23F7B5E-2B9F-11D8-981C-000A95DA5638@sdsc.edu>

Hi Vicky,

The following directory cannot resolve its symlinks anymore. If you
move the profiles and mirror directories around, Rocks cannot find them
to build the kickstart files.

     -mjk


[root at rocks14 default]# ls -l
total 16
lrwxrwxrwx    1 root     root          113 Nov 13 20:19 core.xml ->
/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/
7.3/en/os/i386/build/graphs/default/core.xml
-rwxrwsr-x    1 root     wheel        3123 Sep 3 17:10 hpc.xml
-rwxr-xr-x    1 root     root          495 Sep 9 22:55 patch.xml
-rwxrwsr-x    1 root     wheel         452 Sep 3 17:10 root.xml
lrwxrwxrwx    1 root     root          112 Nov 13 20:19 rsh.xml ->
/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/
7.3/en/os/i386/build/graphs/default/rsh.xml
-rwxrwsr-x    1 root     wheel         923 Sep 3 17:10 sge.xml
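
This is where the "loop over non-sequence" error earlier in the thread
comes from: with the profile symlinks broken, kickstart.cgi produces no
output, getSection() comes back empty-handed, and Python 1.5 reports
iterating over None with that message. A minimal illustration
(get_section is a hypothetical stand-in, not the real rocks.pylib code,
and modern Pythons word the error differently):

```python
def get_section(profile, name):
    # Hypothetical stand-in for ks.getSection(): returns None when
    # kickstart.cgi produced no output for the requested section.
    return profile.get(name)

profile = {}  # empty profile: kickstart.cgi missing or failing
try:
    for pkg in get_section(profile, "packages"):
        print(pkg)
except TypeError as err:
    # Python 1.5 printed this as "TypeError: loop over non-sequence"
    print("TypeError:", err)
```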

On Dec 10, 2003, at 7:52 PM, V. Rowley wrote:

> Looks like python is okay:
>
> [python --help output and earlier quoted messages snipped]
>>>>> 9500 Gilman Drive
>>>>> La Jolla, CA 92093-0715
>>>>>
>>>>>
>>>>> See pictures from our trip to China at
>>>>> http://www.sagacitech.com/Chinaweb
>>>>
>>>>
>>>>
>>> --
>>> Vicky Rowley                              email: vrowley at ucsd.edu
>>> Biomedical Informatics Research Network      work: (858) 536-5980
>>> University of California, San Diego           fax: (858) 822-0828
>>> 9500 Gilman Drive
>>> La Jolla, CA 92093-0715
>>>
>>>
>>> See pictures from our trip to China at
>>> http://www.sagacitech.com/Chinaweb
>>>
>>>
>
> --
> Vicky Rowley                              email: vrowley at ucsd.edu
> Biomedical Informatics Research Network      work: (858) 536-5980
> University of California, San Diego           fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb



From bruno at rocksclusters.org Wed Dec 10 22:31:11 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 10 Dec 2003 22:31:11 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0
In-Reply-To: <3FD7D07D.8090108@physics.ucsd.edu>
References: <3FD7D07D.8090108@physics.ucsd.edu>
Message-ID: <9C7EE8E9-2BA3-11D8-9715-000A95C4E3B4@rocksclusters.org>

>   I am having a problem on install of rocks 3.0.0 on my new cluster.
>
>   The python error occurs right after anaconda starts and just before
>   the install asks for the roll CDROM.
>
>   The error refers to an inability to find or load rocks.file. The error
>   is, I think, associated with the window that pops up and asks you to put
>   the roll CDROM in.
>
>   The process I followed to get to this point is
>
>   Put the Rocks 3.0.0 CDROM into the CDROM drive
>   Boot the system
>   At the prompt type frontend
>   Wait till anaconda starts
>   Error referring to unable to load rocks.file.
>
>   I have successfully installed rocks on a smaller cluster but that has
>   different hardware. I used the same CDROM for both installs.
>
>   Any thoughts?

hard to say -- but some folks had similar problems due to bad memory:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-February/001246.html

    - gb



From vincent_b_fox at yahoo.com Wed Dec 10 22:43:21 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Wed, 10 Dec 2003 22:43:21 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
In-Reply-To: <1B097BEE-2ADC-11D8-9715-000A95C4E3B4@rocksclusters.org>
Message-ID: <20031211064321.41781.qmail@web14801.mail.yahoo.com>

Okay, here's the context diff as plain text. I test-applied it using "patch -p0 <
atlas.patch" and did a successful compile on my PII box. I can send it as an
attachment or submit it to CVS or some other way if you need:

*** atlas.spec.in.orig Thu Dec 11 06:27:13 2003
--- atlas.spec.in       Thu Dec 11 06:30:46 2003
***************
*** 111,117 ****
--- 111,133 ----
  y
  " | make
+ elif [ $CPUID -eq 4 ]
+ then
+ #
+ # Pentium II
+ #
+ echo "0
+ y
+ y
+ n
+ y
+ linux
+ 0
+ /usr/bin/g77
+ -O
+ y
+ " | make
  else
#
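For readers unfamiliar with the spec file's idiom: the `echo "0 / y / y / ..." | make` construct pipes a canned list of answers, one per line, into ATLAS's interactive configure prompts. A minimal stand-alone demonstration of the same technique (the `/tmp/prompted.sh` script is a stand-in for the ATLAS configure step, not part of the patch):

```shell
# Stand-in for an interactive configure: reads two answers from stdin,
# the same way ATLAS's build reads its CPU/setup prompts.
cat > /tmp/prompted.sh <<'EOF'
#!/bin/sh
read cpu      # answer to the "which CPU?" prompt
read express  # answer to the "express setup?" prompt
echo "cpu=$cpu express=$express"
EOF
chmod +x /tmp/prompted.sh

# Feed the canned answers in, exactly as the spec file does with make:
echo "0
y" | /tmp/prompted.sh
# prints: cpu=0 express=y
```

This is why the patch adds a new `elif [ $CPUID -eq 4 ]` branch: a different CPU needs a different canned answer sequence.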


Greg Bruno <bruno at rocksclusters.org> wrote:
> Okay, I came up with my own quick hack:
>
> Edit atlas.spec.in, go to "other x86" section, remove
> 2 lines right above "linux", seems to make rpm now.
>
> A more formal patch would be put in a section for
> cpuid eq 4 with this correction I suppose.

if you provide the patch, we'll include it in our next release.

- gb


-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031210/be5c8b04/attachment-0001.html

From naihh at imcb.a-star.edu.sg Thu Dec 11 00:08:14 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Thu, 11 Dec 2003 16:08:14 +0800
Subject: [Rocks-Discuss]RE: Have anyone successfully build a set of grid compute
nodes using Rocks?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCDB9@EXIMCB2.imcb.a-star.edu.sg>



Hi,

Has anyone successfully built a set of grid compute nodes using Rocks 3?
Would anyone care to share?


Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 11:54 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs

Send npaci-rocks-discussion mailing list submissions to
      npaci-rocks-discussion at sdsc.edu

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
      npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
      npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of npaci-rocks-discussion digest..."


Today's Topics:

   1. RE: Do you have a list of the various models of Gigabit Ethernet
Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis)
   2. Rocks 3.0.0 (Terrence Martin)
   3. Re: "TypeError: loop over non-sequence" when trying
       to build CD distro (V. Rowley)

--__--__--

Message: 1
Date: Thu, 11 Dec 2003 09:45:18 +0800
From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg>
To: <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]RE: Do you have a list of the various models of
Gigabit Ethernet Interfaces compatible to Rocks 3?



Hi All,

Do you have a list of the various gigabit Ethernet interfaces that are
compatible with Rocks 3?

I am changing my nodes' connectivity from 10/100 to 1000.

Has anyone done that, and what differences did you see in performance or
turnaround time?



Thanks and Regards

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 9:25 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs

Send npaci-rocks-discussion mailing list submissions to
      npaci-rocks-discussion at sdsc.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
      npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
      npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of npaci-rocks-discussion digest..."


Today's Topics:

   1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
   2. Error during Make when building a new install floppy (Terrence
Martin)
   3. Re: Error during Make when building a new install floppy (Tim
Carlson)
   4. Re: Non-homogenous legacy hardware (Tim Carlson)
   5. ssh_known_hosts and ganglia (Jag)
   6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
   7. "TypeError: loop over non-sequence" when trying to build CD
distro (V. Rowley)
   8. Re: one node short in "labels" (Greg Bruno)
   9. Re: "TypeError: loop over non-sequence" when trying to build CD
distro (Mason J. Katz)
  10. Re: "TypeError: loop over non-sequence" when trying
        to build CD distro (V. Rowley)
  11. Re: "TypeError: loop over non-sequence" when trying to
        build CD distro (Tim Carlson)

-- __--__--

Message: 1
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware


I am integrating legacy systems into a ROCKS cluster and have hit a
snag with the auto-partition configuration: the newly added (older) systems
have SCSI disks, while the existing (newer) ones contain IDE. This is a
non-issue so long as the initial install does its default partitioning.
However, I have a "replace-auto-partition.xml" file which is unworkable for
the SCSI-based systems, since it makes specific reference to "hda" rather
than "sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a
conditional such that "hda" or "sda" is used, based on the name of the
node (or some other criterion).

Is this possible?

Thanks in advance. If this is out there in the mailing list archives, a
pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

-- __--__--

Message: 2
Date: Wed, 10 Dec 2003 12:09:11 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]Error during Make when building a new install
floppy

I get the following error when I try to rebuild a boot floppy for rocks.

This is with the default CVS checkout, updated today according to the
Rocks user guide. I have not actually attempted to make any changes.

make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader         anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course, I could avoid all of this altogether and just put my binary
module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes
and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img

Basically I am injecting an updated e1000 driver with changes to
pcitable to support the address of my gigabit cards.
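As a rough, untested sketch of the dd step being asked about (scratch files stand in for the boot image and /dev/fd0 here, since writing a real floppy needs root and a drive; the loop-mount/driver-swap step is omitted):

```shell
# Hypothetical sketch: after modifying bootnet.img (e.g. loop-mounting it
# and swapping in an updated e1000 driver and pcitable), the image is
# written out raw with dd. A 1.44 MB scratch file plays the floppy device.
IMG=$(mktemp)
FLOPPY=$(mktemp)   # stand-in for /dev/fd0
dd if=/dev/zero of="$IMG" bs=1024 count=1440 2>/dev/null   # blank 1.44 MB image
dd if="$IMG" of="$FLOPPY" bs=1440k 2>/dev/null             # raw copy, as for a real floppy
cmp -s "$IMG" "$FLOPPY" && echo "write verified"
```

With a real floppy the last two commands would be `dd if=bootnet.img of=/dev/fd0 bs=1440k`; as Tim notes below, though, the floppy path is not supported in Rocks 3.0.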

Terrence


-- __--__--

Message: 3
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Error during Make when building a new
install floppy
To: Terrence Martin <tmartin at physics.ucsd.edu>
Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Reply-to: Tim Carlson <tim.carlson at pnl.gov>
On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for
rocks.
>

You can't make a boot floppy with Rocks 3.0. That isn't supported, or at
least it wasn't the last time I checked.

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
>
> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about a third of what you need to do if you
go down that path. You also need to work on netstg1.img, and you'll need to
update the driver in the kernel RPM that gets installed on the box. None of
this is trivial.

If it were me, I would go down the same path I took for updating the
AIC79XX driver

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support


-- __--__--

Message: 4
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware
To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
Cc: npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

>
> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems
have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I
> have a "replace-auto-partition.xml" file which is unworkable for the
SCSI
> based systems since it makes specific reference to "hda" rather than
> "sda."

If you have just a single drive, then you should be able to skip the
"--ondisk" bits of your "part" command.

Otherwise, you would first have to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an
<eval sh="bash">
</eval>

block in the <main> block.

Just guessing... I haven't tried this.
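An untested sketch of what such a replace-auto-partition.xml might look like (the `<eval sh="bash">` and `<main>` elements come from the message above; everything else, including whether eval output is interpreted as kickstart "part" lines, is an assumption, not verified Rocks syntax):

```xml
<main>
  <eval sh="bash">
  # Assumption: pick the disk by probing which device node exists --
  # sda for the SCSI machines, hda for the IDE ones.
  if [ -b /dev/sda ]; then disk=sda; else disk=hda; fi
  echo "part /    --size 4096 --ondisk $disk"
  echo "part swap --size 1024 --ondisk $disk"
  </eval>
</main>
```

As Tim says, with a single drive per node, simply dropping `--ondisk` entirely may be the cleaner fix.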

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support


-- __--__--

Message: 5
From: Jag <agrajag at dragaera.net>
To: npaci-rocks-discussion at sdsc.edu
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia

I noticed a previous post on this list
(https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
indicating that Rocks distributes ssh keys for all the nodes over
ganglia. Can anyone enlighten me as to how this is done?

I looked through the ganglia docs and didn't see anything indicating how
to do this, so I'm assuming Rocks made some changes. Unfortunately the
Rocks ISO images don't seem to contain SRPMs, so I'm now coming here.
What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found? I've done quite
a bit of searching, but haven't found them anywhere.


-- __--__--

Message: 6
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
Date: Wed, 10 Dec 2003 14:39:15 -0800
To: Jag <agrajag at dragaera.net>
Most of the SRPMS are on our FTP site, but we've screwed this up
before. The SRPMS are entirely Rocks-specific, so they are of little
value outside of Rocks. You can also check out our CVS tree
(cvs.rocksclusters.org), where rocks/src/ganglia shows what we add. We
have a ganglia-python package we created to allow us to write our own
metrics at a higher level than the provided gmetric application. We've
also moved from this method to a single cluster-wide ssh key for Rocks
3.1.

       -mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

> I noticed a previous post on this list
> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
> indicating that Rocks distributes ssh keys for all the nodes over
> ganglia. Can anyone enlighten me as to how this is done?
>
> I looked through the ganglia docs and didn't see anything indicating how
> to do this, so I'm assuming Rocks made some changes. Unfortunately the
> Rocks ISO images don't seem to contain SRPMs, so I'm now coming here.
> What did Rocks do to ganglia to make the distribution of ssh keys work?
>
> Also, does anyone know where Rocks SRPMs can be found? I've done quite
> a bit of searching, but haven't found them anywhere.


-- __--__--

Message: 7
Date: Wed, 10 Dec 2003 14:43:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying
to build CD distro

When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist
--dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

>   Cleaning distribution
>   Resolving versions (RPMs)
>   Resolving versions (SRPMs)
>   Adding support for rebuild distribution from source
> Creating files (symbolic links - fast)
> Creating symlinks to kickstart files
> Fixing Comps Database
> Generating hdlist (rpm database)
> Patching second stage loader (eKV, partioning, ...)
>     patching "rocks-ekv" into distribution ...
>     patching "rocks-piece-pipe" into distribution ...
>     patching "PyXML" into distribution ...
>     patching "expat" into distribution ...
>     patching "rocks-pylib" into distribution ...
>     patching "MySQL-python" into distribution ...
>     patching "rocks-kickstart" into distribution ...
>     patching "rocks-kickstart-profiles" into distribution ...
>     patching "rocks-kickstart-dtds" into distribution ...
>     building CRAM filesystem ...
> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Creating symlinks to kickstart files
> Generating hdlist (rpm database)
> Segregating RPMs (rocks, non-rocks)
> sh: ./kickstart.cgi: No such file or directory
> sh: ./kickstart.cgi: No such file or directory
> Traceback (innermost last):
>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>     app.run()
>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>     eval('self.command_%s()' % (command))
>   File "<string>", line 0, in ?
>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>     builder.build()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>     (rocks, nonrocks) = self.segregateRPMS()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>     for pkg in ks.getSection('packages'):
> TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb


-- __--__--

Message: 8
Cc: rocks <npaci-rocks-discussion at sdsc.edu>
From: Greg Bruno <bruno at rocksclusters.org>
Subject: Re: [Rocks-Discuss]one node short in "labels"
Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com>

> So I go to the "labels" selection on the web page to print out the
> pretty labels. What a nice idea by the way!
>
> EXCEPT....it's one node short! I go up to 0-13 and this stops at
> 0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for the bug report.

  - gb


-- __--__--

Message: 9
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]"TypeError:    loop over non-sequence" when
trying to build CD distro
Date: Wed, 10 Dec 2003 15:16:27 -0800
To: "V. Rowley" <vrowley at ucsd.edu>

It looks like someone moved the profiles directory to profiles.orig.

     -mjk


[root at rocks14 install]# ls -l
total 56
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
>
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Adding support for rebuild distribution from source
>> Creating files (symbolic links - fast)
>> Creating symlinks to kickstart files
>> Fixing Comps Database
>> Generating hdlist (rpm database)
>> Patching second stage loader (eKV, partioning, ...)
>>      patching "rocks-ekv" into distribution ...
>>      patching "rocks-piece-pipe" into distribution ...
>>      patching "PyXML" into distribution ...
>>      patching "expat" into distribution ...
>>      patching "rocks-pylib" into distribution ...
>>      patching "MySQL-python" into distribution ...
>>      patching "rocks-kickstart" into distribution ...
>>      patching "rocks-kickstart-profiles" into distribution ...
>>      patching "rocks-kickstart-dtds" into distribution ...
>>      building CRAM filesystem ...
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Creating symlinks to kickstart files
>> Generating hdlist (rpm database)
>> Segregating RPMs (rocks, non-rocks)
>> sh: ./kickstart.cgi: No such file or directory
>> sh: ./kickstart.cgi: No such file or directory
>> Traceback (innermost last):
>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>      app.run()
>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>      eval('self.command_%s()' % (command))
>>   File "<string>", line 0, in ?
>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>      builder.build()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>      (rocks, nonrocks) = self.segregateRPMS()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>> segregateRPMS
>>      for pkg in ks.getSection('packages'):
>> TypeError: loop over non-sequence
>
> Any ideas?
>
> --
> Vicky Rowley                              email: vrowley at ucsd.edu
> Biomedical Informatics Research Network      work: (858) 536-5980
> University of California, San Diego           fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb


-- __--__--

Message: 10
Date: Wed, 10 Dec 2003 16:50:16 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: "Mason J. Katz" <mjk at sdsc.edu>
CC: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError:   loop over non-sequence" when
trying
 to build CD distro

Yep, I did that, but only *AFTER* getting the error. [Thought it was
generated by the rocks-dist sequence, but apparently not.] Go ahead.
Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
>
>     -mjk
>
>
> [root at rocks14 install]# ls -l
> total 56
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:16 cdrom
> drwxrwsr-x    5 root     wheel         4096 Dec 10 20:38 contrib.orig
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:07 ftp.rocksclusters.org
> drwxr-sr-x    3 root     wheel         4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> drwxr-xr-x    3 root     root          4096 Dec 10 20:38 profiles.orig
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:15 rocks-dist
> drwxrwsr-x    3 root     wheel         4096 Dec 10 20:38 rocks-dist.orig
> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:02 src
> drwxr-sr-x    4 root     wheel         4096 Dec 10 20:49 src.foo
> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>
>> When I run this:
>>
>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>> rocks-dist --dist=cdrom cdrom
>>
>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Adding support for rebuild distribution from source
>>> Creating files (symbolic links - fast)
>>> Creating symlinks to kickstart files
>>> Fixing Comps Database
>>> Generating hdlist (rpm database)
>>> Patching second stage loader (eKV, partioning, ...)
>>>     patching "rocks-ekv" into distribution ...
>>>     patching "rocks-piece-pipe" into distribution ...
>>>     patching "PyXML" into distribution ...
>>>     patching "expat" into distribution ...
>>>     patching "rocks-pylib" into distribution ...
>>>     patching "MySQL-python" into distribution ...
>>>     patching "rocks-kickstart" into distribution ...
>>>     patching "rocks-kickstart-profiles" into distribution ...
>>>     patching "rocks-kickstart-dtds" into distribution ...
>>>     building CRAM filesystem ...
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Creating symlinks to kickstart files
>>> Generating hdlist (rpm database)
>>> Segregating RPMs (rocks, non-rocks)
>>> sh: ./kickstart.cgi: No such file or directory
>>> sh: ./kickstart.cgi: No such file or directory
>>> Traceback (innermost last):
>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>      app.run()
>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>      eval('self.command_%s()' % (command))
>>>   File "<string>", line 0, in ?
>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>      builder.build()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>      (rocks, nonrocks) = self.segregateRPMS()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>> segregateRPMS
>>>      for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>>
>> Any ideas?
>>
>> --
>> Vicky Rowley                              email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network      work: (858) 536-5980
>> University of California, San Diego           fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>
>
>

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb


-- __--__--

Message: 11
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
trying to
 build CD distro
To: "V. Rowley" <vrowley at ucsd.edu>
Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls python directly as
/usr/bin/python, while rocks-dist does an "env python".

Tim
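
As a side note, a minimal, self-contained illustration (not Rocks code; the `get_section` helper is hypothetical) of how this traceback can arise: if kickstart.cgi never runs, the 'packages' section presumably comes back as None, and iterating over None is what Python 1.5 reports as "loop over non-sequence" (modern Pythons say "'NoneType' object is not iterable"):

```python
# Stand-in for something like ks.getSection(): returns None when the
# section is missing, e.g. because kickstart.cgi produced no output.
def get_section(sections, name):
    return sections.get(name)  # None if absent

sections = {}  # nothing parsed: kickstart.cgi was never found
try:
    for pkg in get_section(sections, "packages"):
        print(pkg)
except TypeError as err:
    # Python 1.5 wording was "loop over non-sequence"
    print("TypeError:", err)

# One defensive pattern: treat a missing section as an empty list.
for pkg in get_section(sections, "packages") or []:
    print(pkg)  # loop body never runs here, but no crash
```

This is consistent with the "sh: ./kickstart.cgi: No such file or directory" lines appearing immediately before the traceback in the output above.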

> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> Move it back. Same difference.
>
> Vicky
>
> Mason J. Katz wrote:
> > It looks like someone moved the profiles directory to profiles.orig.
> >
> >     -mjk
> >
> >
> > [root at rocks14 install]# ls -l
> > total 56
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:16 cdrom
> > drwxrwsr-x    5 root     wheel         4096 Dec 10 20:38 contrib.orig
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:07 ftp.rocksclusters.org
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> > -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> > drwxr-xr-x    3 root     root          4096 Dec 10 20:38 profiles.orig
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:15 rocks-dist
> > drwxrwsr-x    3 root     wheel         4096 Dec 10 20:38 rocks-dist.orig
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:02 src
> > drwxr-sr-x    4 root     wheel         4096 Dec 10 20:49 src.foo
> > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
> >
> >> When I run this:
> >>
> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> >> rocks-dist --dist=cdrom cdrom
> >>
> >> on a server installed with ROCKS 3.0.0, I eventually get this:
> >>
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Adding support for rebuild distribution from source
> >>> Creating files (symbolic links - fast)
> >>> Creating symlinks to kickstart files
> >>> Fixing Comps Database
> >>> Generating hdlist (rpm database)
> >>> Patching second stage loader (eKV, partioning, ...)
> >>>     patching "rocks-ekv" into distribution ...
> >>>     patching "rocks-piece-pipe" into distribution ...
> >>>     patching "PyXML" into distribution ...
> >>>     patching "expat" into distribution ...
> >>>     patching "rocks-pylib" into distribution ...
> >>>     patching "MySQL-python" into distribution ...
> >>>     patching "rocks-kickstart" into distribution ...
> >>>     patching "rocks-kickstart-profiles" into distribution ...
> >>>     patching "rocks-kickstart-dtds" into distribution ...
> >>>     building CRAM filesystem ...
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Creating symlinks to kickstart files
> >>> Generating hdlist (rpm database)
> >>> Segregating RPMs (rocks, non-rocks)
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> Traceback (innermost last):
> >>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
> >>>     app.run()
> >>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
> >>>     eval('self.command_%s()' % (command))
> >>>   File "<string>", line 0, in ?
> >>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
> >>>     builder.build()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
> >>>     (rocks, nonrocks) = self.segregateRPMS()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
> >>> segregateRPMS
> >>>     for pkg in ks.getSection('packages'):
> >>> TypeError: loop over non-sequence
> >>
> >>
> >> Any ideas?
> >>
> >> --
> >> Vicky Rowley                              email: vrowley at ucsd.edu
> >> Biomedical Informatics Research Network      work: (858) 536-5980
> >> University of California, San Diego           fax: (858) 822-0828
> >> 9500 Gilman Drive
> >> La Jolla, CA 92093-0715
> >>
> >>
> >> See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
> >
> >
> >
>
> --
> Vicky Rowley                              email: vrowley at ucsd.edu
> Biomedical Informatics Research Network      work: (858) 536-5980
> University of California, San Diego           fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
>
>




-- __--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion


End of npaci-rocks-discussion Digest


DISCLAIMER:
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its contents to any
other person, as it may be an offence under the Official Secrets Act.
Thank you.

--__--__--

Message: 2
Date: Wed, 10 Dec 2003 18:03:41 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Rocks 3.0.0

I am having a problem installing Rocks 3.0.0 on my new cluster.

A Python error occurs right after anaconda starts and just before the
install asks for the roll CDROM.

The error refers to an inability to find or load rocks.file, and I
think it is associated with the window that pops up and asks you to
put the roll CDROM in.

The process I followed to get to this point is

Put the Rocks 3.0.0 CDROM into the CDROM drive
Boot the system
At the prompt type frontend
Wait till anaconda starts
Error referring to unable to load rocks.file.

I have successfully installed rocks on a smaller cluster but that has
different hardware. I used the same CDROM for both installs.

Any thoughts?

Terrence



--__--__--
Message: 3
Date: Wed, 10 Dec 2003 19:52:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError:   loop over non-sequence" when
trying
 to build CD distro

Looks like python is okay:

> [root at rocks14 birn-oracle1]# which python
> /usr/bin/python
> [root at rocks14 birn-oracle1]# python --help
> Unknown option: --
> usage: python [option] ... [-c cmd | file | -] [arg] ...
> Options and arguments (and corresponding environment variables):
> -d     : debug output from parser (also PYTHONDEBUG=x)
> -i     : inspect interactively after running script, (also
PYTHONINSPECT=x)
>          and force prompts, even if stdin does not appear to be a
terminal
> -O     : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
> -OO    : remove doc-strings in addition to the -O optimizations
> -S     : don't imply 'import site' on initialization
> -t     : issue warnings about inconsistent tab usage (-tt: issue
errors)
> -u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
> -v     : verbose (trace import statements) (also PYTHONVERBOSE=x)
> -x     : skip first line of source, allowing use of non-Unix forms of
#!cmd
> -X     : disable class based built-in exceptions
> -c cmd : program passed in as string (terminates option list)
> file   : program read from script file
> -      : program read from stdin (default; interactive mode if a tty)
> arg ...: arguments passed to program in sys.argv[1:]
> Other environment variables:
> PYTHONSTARTUP: file executed on interactive startup (no default)
> PYTHONPATH   : ':'-separated list of directories prefixed to the
>                 default module search path. The result is sys.path.
> PYTHONHOME   : alternate <prefix> directory (or
<prefix>:<exec_prefix>).
>                 The default module search path uses <prefix>/python1.5.
> [root at rocks14 birn-oracle1]#
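Tim's point (quoted below) is that kickstart.cgi hardcodes /usr/bin/python while rocks-dist resolves the interpreter via `env python`, so the two can disagree when PATH puts a different python first. A small sketch of that `env`-style lookup (the helper name is illustrative, not from Rocks):

```python
# Sketch: mimic how "#!/usr/bin/env python" finds an interpreter, to see
# why it can differ from a hardcoded /usr/bin/python. The helper is
# illustrative, not part of the Rocks sources.
import os

def first_python_on_path(path):
    """Return the first executable named 'python' on a PATH-style string,
    the way /usr/bin/env would, or None if there is none."""
    for d in path.split(os.pathsep):
        candidate = os.path.join(d, "python")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

if __name__ == "__main__":
    # kickstart.cgi always gets /usr/bin/python; rocks-dist gets this:
    print(first_python_on_path(os.environ.get("PATH", "")))
```

If the two resolve to different installs, one of them can be a broken or older python even when `which python` looks fine.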



Tim Carlson wrote:
> On Wed, 10 Dec 2003, V. Rowley wrote:
>
> Did you remove python by chance? kickstart.cgi calls python directly
in
> /usr/bin/python while rocks-dist does an "env python"
>
> Tim
>
>
>>Yep, I did that, but only *AFTER* getting the error. [Thought it was
>>generated by the rocks-dist sequence, but apparently not.] Go ahead.
>>Move it back. Same difference.
>>
>>Vicky
>>
>>Mason J. Katz wrote:
>>
>>>It looks like someone moved the profiles directory to profiles.orig.
>>>
>>>     -mjk
>>>
>>>
>>>[root at rocks14 install]# ls -l
>>>total 56
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:16 cdrom
>>>drwxrwsr-x     5 root     wheel        4096 Dec 10 20:38 contrib.orig
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:07
>>>ftp.rocksclusters.org
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 20:38
>>>ftp.rocksclusters.org.orig
>>>-r-xrwsr-x     1 root     wheel       19254 Sep 3 12:40 kickstart.cgi
>>>drwxr-xr-x     3 root     root         4096 Dec 10 20:38 profiles.orig
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:15 rocks-dist
>>>drwxrwsr-x     3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:02 src
>>>drwxr-sr-x     4 root     wheel        4096 Dec 10 20:49 src.foo
>>>On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>>>
>>>
>>>>When I run this:
>>>>
>>>>[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>>>>rocks-dist --dist=cdrom cdrom
>>>>
>>>>on a server installed with ROCKS 3.0.0, I eventually get this:
>>>>
>>>>
>>>>>Cleaning distribution
>>>>>Resolving versions (RPMs)
>>>>>Resolving versions (SRPMs)
>>>>>Adding support for rebuild distribution from source
>>>>>Creating files (symbolic links - fast)
>>>>>Creating symlinks to kickstart files
>>>>>Fixing Comps Database
>>>>>Generating hdlist (rpm database)
>>>>>Patching second stage loader (eKV, partioning, ...)
>>>>>     patching "rocks-ekv" into distribution ...
>>>>>     patching "rocks-piece-pipe" into distribution ...
>>>>>     patching "PyXML" into distribution ...
>>>>>     patching "expat" into distribution ...
>>>>>     patching "rocks-pylib" into distribution ...
>>>>>     patching "MySQL-python" into distribution ...
>>>>>     patching "rocks-kickstart" into distribution ...
>>>>>     patching "rocks-kickstart-profiles" into distribution ...
>>>>>     patching "rocks-kickstart-dtds" into distribution ...
>>>>>     building CRAM filesystem ...
>>>>>Cleaning distribution
>>>>>Resolving versions (RPMs)
>>>>>Resolving versions (SRPMs)
>>>>>Creating symlinks to kickstart files
>>>>>Generating hdlist (rpm database)
>>>>>Segregating RPMs (rocks, non-rocks)
>>>>>sh: ./kickstart.cgi: No such file or directory
>>>>>sh: ./kickstart.cgi: No such file or directory
>>>>>Traceback (innermost last):
>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>>>    app.run()
>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>>>    eval('self.command_%s()' % (command))
>>>>> File "<string>", line 0, in ?
>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>>>    builder.build()
>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>>>    (rocks, nonrocks) = self.segregateRPMS()
>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>>>>segregateRPMS
>>>>>    for pkg in ks.getSection('packages'):
>>>>>TypeError: loop over non-sequence
>>>>
>>>>
>>>>Any ideas?
>>>>
>>>>--
>>>>Vicky Rowley                              email: vrowley at ucsd.edu
>>>>Biomedical Informatics Research Network      work: (858) 536-5980
>>>>University of California, San Diego           fax: (858) 822-0828
>>>>9500 Gilman Drive
>>>>La Jolla, CA 92093-0715
>>>>
>>>>
>>>>See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
>>>
>>>
>>>
>>--
>>Vicky Rowley                              email: vrowley at ucsd.edu
>>Biomedical Informatics Research Network      work: (858) 536-5980
>>University of California, San Diego           fax: (858) 822-0828
>>9500 Gilman Drive
>>La Jolla, CA 92093-0715
>>
>>
>>See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
>>
>>
>
>
>
>

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715
See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb



--__--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion


End of npaci-rocks-discussion Digest




From naihh at imcb.a-star.edu.sg Thu Dec 11 00:09:34 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Thu, 11 Dec 2003 16:09:34 +0800
Subject: [Rocks-Discuss]RE: Install rocks on Titan64 Superblade Classic with Dual
Opteron 244
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCDBA@EXIMCB2.imcb.a-star.edu.sg>


Hi,

Has anyone successfully installed Rocks on a Titan64 Superblade
Classic with Dual Opteron 244?


Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 11:54 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs

Send npaci-rocks-discussion mailing list submissions to
      npaci-rocks-discussion at sdsc.edu

To subscribe or unsubscribe via the World Wide Web, visit

http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
      npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of npaci-rocks-discussion digest..."


Today's Topics:

   1. RE: Do you have a list of the various models of Gigabit Ethernet
Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis)
   2. Rocks 3.0.0 (Terrence Martin)
   3. Re: "TypeError: loop over non-sequence" when trying
       to build CD distro (V. Rowley)

--__--__--

Message: 1
Date: Thu, 11 Dec 2003 09:45:18 +0800
From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg>
To: <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]RE: Do you have a list of the various models of
Gigabit Ethernet Interfaces compatible to Rocks 3?



Hi All,

Do you have a list of the various gigabit Ethernet interfaces that are
compatible with Rocks 3?

I am changing my nodes' connectivity from 10/100 to 1000.

Has anyone done that, and what differences in performance or
turnaround time did you see?

Has anyone successfully built a set of grid compute nodes using Rocks
3?


Thanks and Regards

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 9:25 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs

Send npaci-rocks-discussion mailing list submissions to
      npaci-rocks-discussion at sdsc.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
      npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
      npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of npaci-rocks-discussion digest..."


Today's Topics:

   1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
   2. Error during Make when building a new install floppy (Terrence
Martin)
   3. Re: Error during Make when building a new install floppy (Tim
Carlson)
   4. Re: Non-homogenous legacy hardware (Tim Carlson)
   5. ssh_known_hosts and ganglia (Jag)
   6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
   7. "TypeError: loop over non-sequence" when trying to build CD
distro (V. Rowley)
   8. Re: one node short in "labels" (Greg Bruno)
   9. Re: "TypeError: loop over non-sequence" when trying to build CD
distro (Mason J. Katz)
  10. Re: "TypeError: loop over non-sequence" when trying
        to build CD distro (V. Rowley)
  11. Re: "TypeError: loop over non-sequence" when trying to
        build CD distro (Tim Carlson)

-- __--__--

Message: 1
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware


I am integrating legacy systems into a ROCKS cluster, and have hit a
snag with the auto-partition configuration: the newly added (older)
systems have SCSI disks, while the existing (newer) ones contain IDE.
This is a non-issue so long as the initial install does its default
partitioning. However, I have a "replace-auto-partition.xml" file
which is unworkable for the SCSI-based systems, since it makes
specific reference to "hda" rather than "sda."

I would like to have a site-nodes/replace-auto-partition.xml file with
a conditional such that "hda" or "sda" is used, based on the name of
the node (or some other criterion).

Is this possible?

Thanks, in advance. If this is out there on the mailing list archives,
a pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

-- __--__--

Message: 2
Date: Wed, 10 Dec 2003 12:09:11 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]Error during Make when building a new install
floppy

I get the following error when I try to rebuild a boot floppy for rocks.

This is with the default CVS checkout with an update today according to
the rocks userguide. I have not actually attempted to make any changes.

make[3]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader         anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory
`/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this altogether and just put my binary
module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes
and then write it to a floppy via dd?

/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img

Basically I am injecting an updated e1000 driver with changes to
pcitable to support the address of my gigabit cards.

Terrence


-- __--__--

Message: 3
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Error during Make when building a new
install floppy
To: Terrence Martin <tmartin at physics.ucsd.edu>
Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
Reply-to: Tim Carlson <tim.carlson at pnl.gov>
On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for
rocks.
>

You can't make a boot floppy with Rocks 3.0. That isn't supported, or
at least it wasn't the last time I checked.

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
>
/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about 1/3 of what you need to do if you go
down that path. You also need to work on netstg1.img, and you'll need to
update the driver in the kernel rpm that gets installed on the box. None
of this is trivial.

If it were me, I would go down the same path I took for updating the
AIC79XX driver

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support


-- __--__--

Message: 4
Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware
To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
Cc: npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:

>
> I am integrating legacy systems into a ROCKS cluster, and have hit a
> snag with the auto-partition configuration: The new (old) systems
have
> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I
> have a "replace-auto-partition.xml" file which is unworkable for the
SCSI
> based systems since it makes specific reference to "hda" rather than
> "sda."

If you have just a single drive, then you should be able to skip the
"--ondisk" bits of your "part" command

Otherwise, you would first have to do something ugly like the following:

http://penguin.epfl.ch/slides/kickstart/ks.cfg

You could probably (maybe) wrap most of that in an
<eval sh="bash">
</eval>

block in the <main> block.

Just guessing.. haven't tried this.
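To make that suggestion concrete, here is a rough, untested sketch of what site-nodes/replace-auto-partition.xml might look like with the partition lines emitted from a bash eval block. The device probe and sizes are purely illustrative, not from the Rocks sources:

```xml
<!-- Hypothetical, untested sketch: choose sda vs hda inside an eval
     block and echo kickstart "part" lines. Sizes are placeholders. -->
<main>
  <eval sh="bash">
  if [ -b /dev/sda ]; then disk=sda; else disk=hda; fi
  echo "part / --size 4096 --ondisk $disk"
  echo "part swap --size 1024 --ondisk $disk"
  </eval>
</main>
```

Whether the eval output is actually spliced into the generated kickstart at that point is exactly the part Tim says he hasn't verified.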

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support


-- __--__--

Message: 5
From: Jag <agrajag at dragaera.net>
To: npaci-rocks-discussion at sdsc.edu
Date: Wed, 10 Dec 2003 13:21:07 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia

I noticed a previous post on this list
(https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
indicating that Rocks distributes ssh keys for all the nodes over
ganglia. Can anyone enlighten me as to how this is done?

I looked through the ganglia docs and didn't see anything indicating how
to do this, so I'm assuming Rocks made some changes. Unfortunately the
rocks iso images don't seem to contain srpms, so I'm now coming here.
What did Rocks do to ganglia to make the distribution of ssh keys work?

Also, does anyone know where Rocks SRPMs can be found? I've done quite
a bit of searching, but haven't found them anywhere.


-- __--__--

Message: 6
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
Date: Wed, 10 Dec 2003 14:39:15 -0800
To: Jag <agrajag at dragaera.net>
Most of the SRPMS are on our FTP site, but we've screwed this up
before. The SRPMS are entirely Rocks specific, so they are of little
value outside of Rocks. You can also check out our CVS tree
(cvs.rocksclusters.org), where rocks/src/ganglia shows what we add. We
have a ganglia-python package we created to allow us to write our own
metrics at a higher level than the provided gmetric application. We've
also moved from this method to a single cluster-wide ssh key for Rocks
3.1.

       -mjk

On Dec 10, 2003, at 10:21 AM, Jag wrote:

> I noticed a previous post on this list
> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/
> 001934.html) indicating that Rocks distributes ssh keys for all the
> nodes over ganglia. Can anyone enlighten me as to how this is done?
>
> I looked through the ganglia docs and didn't see anything indicating
> how to do this, so I'm assuming Rocks made some changes. Unfortunately
> the rocks iso images don't seem to contain srpms, so I'm now coming
> here. What did Rocks do to ganglia to make the distribution of ssh
> keys work?
>
> Also, does anyone know where Rocks SRPMs can be found? I've done
> quite a bit of searching, but haven't found them anywhere.


-- __--__--

Message: 7
Date: Wed, 10 Dec 2003 14:43:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying
to build CD distro

When I run this:

[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist
--dist=cdrom cdrom

on a server installed with ROCKS 3.0.0, I eventually get this:

>   Cleaning distribution
>   Resolving versions (RPMs)
>   Resolving versions (SRPMs)
>   Adding support for rebuild distribution from source
> Creating files (symbolic links - fast)
> Creating symlinks to kickstart files
> Fixing Comps Database
> Generating hdlist (rpm database)
> Patching second stage loader (eKV, partioning, ...)
>     patching "rocks-ekv" into distribution ...
>     patching "rocks-piece-pipe" into distribution ...
>     patching "PyXML" into distribution ...
>     patching "expat" into distribution ...
>     patching "rocks-pylib" into distribution ...
>     patching "MySQL-python" into distribution ...
>     patching "rocks-kickstart" into distribution ...
>     patching "rocks-kickstart-profiles" into distribution ...
>     patching "rocks-kickstart-dtds" into distribution ...
>     building CRAM filesystem ...
> Cleaning distribution
> Resolving versions (RPMs)
> Resolving versions (SRPMs)
> Creating symlinks to kickstart files
> Generating hdlist (rpm database)
> Segregating RPMs (rocks, non-rocks)
> sh: ./kickstart.cgi: No such file or directory
> sh: ./kickstart.cgi: No such file or directory
> Traceback (innermost last):
>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>     app.run()
>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>     eval('self.command_%s()' % (command))
>   File "<string>", line 0, in ?
>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>     builder.build()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>     (rocks, nonrocks) = self.segregateRPMS()
>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
segregateRPMS
>     for pkg in ks.getSection('packages'):
> TypeError: loop over non-sequence

Any ideas?

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb


-- __--__--

Message: 8
Cc: rocks <npaci-rocks-discussion at sdsc.edu>
From: Greg Bruno <bruno at rocksclusters.org>
Subject: Re: [Rocks-Discuss]one node short in "labels"
Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com>

> So I go to the "labels" selection on the web page to print out the
> pretty labels. What a nice idea by the way!
>
> EXCEPT....it's one node short! I go up to 0-13 and this stops at
> 0-12. Any ideas where I should check to fix this?

yeah, we found this corner case -- it'll be fixed in the next release.

thanks for bug report.

  - gb


-- __--__--

Message: 9
Cc: npaci-rocks-discussion at sdsc.edu
From: "Mason J. Katz" <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]"TypeError:    loop over non-sequence" when
trying to build CD distro
Date: Wed, 10 Dec 2003 15:16:27 -0800
To: "V. Rowley" <vrowley at ucsd.edu>

It looks like someone moved the profiles directory to profiles.orig.

     -mjk


[root at rocks14 install]# ls -l
total 56
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
-r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:

> When I run this:
>
> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> rocks-dist --dist=cdrom cdrom
>
> on a server installed with ROCKS 3.0.0, I eventually get this:
>
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Adding support for rebuild distribution from source
>> Creating files (symbolic links - fast)
>> Creating symlinks to kickstart files
>> Fixing Comps Database
>> Generating hdlist (rpm database)
>> Patching second stage loader (eKV, partioning, ...)
>>      patching "rocks-ekv" into distribution ...
>>      patching "rocks-piece-pipe" into distribution ...
>>      patching "PyXML" into distribution ...
>>      patching "expat" into distribution ...
>>      patching "rocks-pylib" into distribution ...
>>      patching "MySQL-python" into distribution ...
>>      patching "rocks-kickstart" into distribution ...
>>      patching "rocks-kickstart-profiles" into distribution ...
>>      patching "rocks-kickstart-dtds" into distribution ...
>>      building CRAM filesystem ...
>> Cleaning distribution
>> Resolving versions (RPMs)
>> Resolving versions (SRPMs)
>> Creating symlinks to kickstart files
>> Generating hdlist (rpm database)
>> Segregating RPMs (rocks, non-rocks)
>> sh: ./kickstart.cgi: No such file or directory
>> sh: ./kickstart.cgi: No such file or directory
>> Traceback (innermost last):
>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>      app.run()
>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>      eval('self.command_%s()' % (command))
>>   File "<string>", line 0, in ?
>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>      builder.build()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>      (rocks, nonrocks) = self.segregateRPMS()
>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>> segregateRPMS
>>      for pkg in ks.getSection('packages'):
>> TypeError: loop over non-sequence
>
> Any ideas?
>
> --
> Vicky Rowley                              email: vrowley at ucsd.edu
> Biomedical Informatics Research Network      work: (858) 536-5980
> University of California, San Diego           fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb


-- __--__--

Message: 10
Date: Wed, 10 Dec 2003 16:50:16 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: "Mason J. Katz" <mjk at sdsc.edu>
CC: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError:   loop over non-sequence" when
trying
 to build CD distro

Yep, I did that, but only *AFTER* getting the error. [Thought it was
generated by the rocks-dist sequence, but apparently not.] Go ahead.
Move it back. Same difference.

Vicky

Mason J. Katz wrote:
> It looks like someone moved the profiles directory to profiles.orig.
>
>     -mjk
>
>
> [root at rocks14 install]# ls -l
> total 56
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:16 cdrom
> drwxrwsr-x    5 root     wheel        4096 Dec 10 20:38 contrib.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
> drwxr-sr-x    3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> -r-xrwsr-x    1 root     wheel       19254 Sep  3 12:40 kickstart.cgi
> drwxr-xr-x    3 root     root         4096 Dec 10 20:38 profiles.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:15 rocks-dist
> drwxrwsr-x    3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
> drwxr-sr-x    3 root     wheel        4096 Dec 10 21:02 src
> drwxr-sr-x    4 root     wheel        4096 Dec 10 20:49 src.foo
> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>
>> When I run this:
>>
>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>> rocks-dist --dist=cdrom cdrom
>>
>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Adding support for rebuild distribution from source
>>> Creating files (symbolic links - fast)
>>> Creating symlinks to kickstart files
>>> Fixing Comps Database
>>> Generating hdlist (rpm database)
>>> Patching second stage loader (eKV, partioning, ...)
>>>     patching "rocks-ekv" into distribution ...
>>>     patching "rocks-piece-pipe" into distribution ...
>>>     patching "PyXML" into distribution ...
>>>     patching "expat" into distribution ...
>>>     patching "rocks-pylib" into distribution ...
>>>     patching "MySQL-python" into distribution ...
>>>     patching "rocks-kickstart" into distribution ...
>>>     patching "rocks-kickstart-profiles" into distribution ...
>>>     patching "rocks-kickstart-dtds" into distribution ...
>>>     building CRAM filesystem ...
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Creating symlinks to kickstart files
>>> Generating hdlist (rpm database)
>>> Segregating RPMs (rocks, non-rocks)
>>> sh: ./kickstart.cgi: No such file or directory
>>> sh: ./kickstart.cgi: No such file or directory
>>> Traceback (innermost last):
>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>      app.run()
>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>      eval('self.command_%s()' % (command))
>>>   File "<string>", line 0, in ?
>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>      builder.build()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>      (rocks, nonrocks) = self.segregateRPMS()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>> segregateRPMS
>>>      for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>>
>> Any ideas?
>>
>> --
>> Vicky Rowley                              email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network      work: (858) 536-5980
>> University of California, San Diego           fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
>
>
>

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715


See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb


-- __--__--

Message: 11
Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
From: Tim Carlson <tim.carlson at pnl.gov>
Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
trying to
 build CD distro
To: "V. Rowley" <vrowley at ucsd.edu>
Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu
Reply-to: Tim Carlson <tim.carlson at pnl.gov>

On Wed, 10 Dec 2003, V. Rowley wrote:

Did you remove python by chance? kickstart.cgi calls python directly in
/usr/bin/python while rocks-dist does an "env python"
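That diagnosis fits the traceback: if kickstart.cgi cannot run ("No such file or directory"), no %packages output is parsed, getSection('packages') comes back empty-handed, and the for loop dies with "loop over non-sequence" (that is Python 1.5's wording; newer Pythons say "'NoneType' object is not iterable"). A minimal sketch of that failure mode, where FakeKickstart and its method are hypothetical stand-ins, not the real rocks.build API:

```python
# Hedged sketch of the failure mode; FakeKickstart/getSection are
# hypothetical stand-ins for the rocks.build kickstart object.
class FakeKickstart:
    def __init__(self, sections):
        self.sections = sections

    def getSection(self, name):
        # Returns None when the section is missing -- e.g. when
        # kickstart.cgi never ran and produced no %packages output.
        return self.sections.get(name)

ks = FakeKickstart({})          # kickstart.cgi failed: nothing parsed
try:
    for pkg in ks.getSection('packages'):
        pass
except TypeError as e:
    # Python 1.5 phrased this as "loop over non-sequence"
    print("TypeError:", e)
```

So the TypeError is a downstream symptom; the two "sh: ./kickstart.cgi" lines just above it are the real error to chase.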

Tim
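The difference matters because a hard-coded `#!/usr/bin/python` shebang ignores `$PATH`, while `env python` resolves through it; a quick, illustrative way to compare what each would run:

```shell
# What "env python" would execute (the first python found on $PATH):
command -v python || echo "no python on PATH"

# What a hard-coded #!/usr/bin/python shebang would execute, if it exists:
ls -l /usr/bin/python 2>/dev/null || echo "/usr/bin/python missing"
```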

> Yep, I did that, but only *AFTER* getting the error. [Thought it was
> generated by the rocks-dist sequence, but apparently not.] Go ahead.
> Move it back. Same difference.
>
> Vicky
>
> Mason J. Katz wrote:
> > It looks like someone moved the profiles directory to profiles.orig.
> >
> >     -mjk
> >
> >
> > [root at rocks14 install]# ls -l
> > total 56
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:16 cdrom
> > drwxrwsr-x    5 root     wheel         4096 Dec 10 20:38 contrib.orig
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:07 ftp.rocksclusters.org
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 20:38 ftp.rocksclusters.org.orig
> > -r-xrwsr-x    1 root     wheel        19254 Sep 3 12:40 kickstart.cgi
> > drwxr-xr-x    3 root     root          4096 Dec 10 20:38 profiles.orig
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:15 rocks-dist
> > drwxrwsr-x    3 root     wheel         4096 Dec 10 20:38 rocks-dist.orig
> > drwxr-sr-x    3 root     wheel         4096 Dec 10 21:02 src
> > drwxr-sr-x    4 root     wheel         4096 Dec 10 20:49 src.foo
> > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
> >
> >> When I run this:
> >>
> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
> >> rocks-dist --dist=cdrom cdrom
> >>
> >> on a server installed with ROCKS 3.0.0, I eventually get this:
> >>
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Adding support for rebuild distribution from source
> >>> Creating files (symbolic links - fast)
> >>> Creating symlinks to kickstart files
> >>> Fixing Comps Database
> >>> Generating hdlist (rpm database)
> >>> Patching second stage loader (eKV, partioning, ...)
> >>>     patching "rocks-ekv" into distribution ...
> >>>     patching "rocks-piece-pipe" into distribution ...
> >>>     patching "PyXML" into distribution ...
> >>>     patching "expat" into distribution ...
> >>>     patching "rocks-pylib" into distribution ...
> >>>     patching "MySQL-python" into distribution ...
> >>>     patching "rocks-kickstart" into distribution ...
> >>>     patching "rocks-kickstart-profiles" into distribution ...
> >>>     patching "rocks-kickstart-dtds" into distribution ...
> >>>     building CRAM filesystem ...
> >>> Cleaning distribution
> >>> Resolving versions (RPMs)
> >>> Resolving versions (SRPMs)
> >>> Creating symlinks to kickstart files
> >>> Generating hdlist (rpm database)
> >>> Segregating RPMs (rocks, non-rocks)
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> sh: ./kickstart.cgi: No such file or directory
> >>> Traceback (innermost last):
> >>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
> >>>     app.run()
> >>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
> >>>     eval('self.command_%s()' % (command))
> >>>   File "<string>", line 0, in ?
> >>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
> >>>     builder.build()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
> >>>     (rocks, nonrocks) = self.segregateRPMS()
> >>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
> >>> segregateRPMS
> >>>     for pkg in ks.getSection('packages'):
> >>> TypeError: loop over non-sequence
> >>
> >>
> >> Any ideas?
> >>
> >> --
> >> Vicky Rowley                              email: vrowley at ucsd.edu
> >> Biomedical Informatics Research Network      work: (858) 536-5980
> >> University of California, San Diego           fax: (858) 822-0828
> >> 9500 Gilman Drive
> >> La Jolla, CA 92093-0715
> >>
> >>
> >> See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
> >
> >
> >
>
> --
> Vicky Rowley                              email: vrowley at ucsd.edu
> Biomedical Informatics Research Network      work: (858) 536-5980
> University of California, San Diego           fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
>
>
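The `TypeError: loop over non-sequence` above is what old Pythons raise when a for-loop is handed `None` instead of a list; here `ks.getSection('packages')` presumably returned `None` once `kickstart.cgi` stopped being runnable. A minimal sketch of the failure mode and a guard (function and message names are illustrative, not the actual rocks-dist code):

```python
def get_section(name, sections):
    # Like ks.getSection() in the traceback, this returns None when the
    # section is missing -- the value that triggers "loop over non-sequence".
    return sections.get(name)

def segregate(sections):
    pkgs = get_section('packages', sections)
    if pkgs is None:
        # Guard instead of letting the for-loop blow up on None.
        raise RuntimeError("no 'packages' section -- was kickstart.cgi runnable?")
    return [p for p in pkgs]

print(segregate({'packages': ['rocks-ekv', 'expat']}))
```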




-- __--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion


End of npaci-rocks-discussion Digest


DISCLAIMER:
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its contents to any
other person as it may be an offence under the Official Secrets Act.
Thank you.

--__--__--

Message: 2
Date: Wed, 10 Dec 2003 18:03:41 -0800
From: Terrence Martin <tmartin at physics.ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Rocks 3.0.0

I am having a problem on install of rocks 3.0.0 on my new cluster.

A python error occurs right after anaconda starts and just before the
install asks for the roll CDROM.

The error refers to an inability to find or load rocks.file. The error
is associated, I think, with the window that pops up and asks you to put
the roll CDROM in.

The process I followed to get to this point is:

Put the Rocks 3.0.0 CDROM into the CDROM drive
Boot the system
At the prompt, type frontend
Wait till anaconda starts
An error appears referring to being unable to load rocks.file

I have successfully installed rocks on a smaller cluster but that has
different hardware. I used the same CDROM for both installs.

Any thoughts?

Terrence



--__--__--
Message: 3
Date: Wed, 10 Dec 2003 19:52:49 -0800
From: "V. Rowley" <vrowley at ucsd.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]"TypeError:   loop over non-sequence" when
trying
 to build CD distro

Looks like python is okay:

> [root at rocks14 birn-oracle1]# which python
> /usr/bin/python
> [root at rocks14 birn-oracle1]# python --help
> Unknown option: --
> usage: python [option] ... [-c cmd | file | -] [arg] ...
> Options and arguments (and corresponding environment variables):
> -d     : debug output from parser (also PYTHONDEBUG=x)
> -i     : inspect interactively after running script, (also PYTHONINSPECT=x)
>          and force prompts, even if stdin does not appear to be a terminal
> -O     : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
> -OO    : remove doc-strings in addition to the -O optimizations
> -S     : don't imply 'import site' on initialization
> -t     : issue warnings about inconsistent tab usage (-tt: issue errors)
> -u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
> -v     : verbose (trace import statements) (also PYTHONVERBOSE=x)
> -x     : skip first line of source, allowing use of non-Unix forms of #!cmd
> -X     : disable class based built-in exceptions
> -c cmd : program passed in as string (terminates option list)
> file   : program read from script file
> -      : program read from stdin (default; interactive mode if a tty)
> arg ...: arguments passed to program in sys.argv[1:]
> Other environment variables:
> PYTHONSTARTUP: file executed on interactive startup (no default)
> PYTHONPATH   : ':'-separated list of directories prefixed to the
>                 default module search path. The result is sys.path.
> PYTHONHOME   : alternate <prefix> directory (or <prefix>:<exec_prefix>).
>                 The default module search path uses <prefix>/python1.5.
> [root at rocks14 birn-oracle1]#



Tim Carlson wrote:
> On Wed, 10 Dec 2003, V. Rowley wrote:
>
> Did you remove python by chance? kickstart.cgi calls python directly
in
> /usr/bin/python while rocks-dist does an "env python"
>
> Tim
>
>
>>Yep, I did that, but only *AFTER* getting the error. [Thought it was
>>generated by the rocks-dist sequence, but apparently not.] Go ahead.
>>Move it back. Same difference.
>>
>>Vicky
>>
>>Mason J. Katz wrote:
>>
>>>It looks like someone moved the profiles directory to profiles.orig.
>>>
>>>     -mjk
>>>
>>>
>>>[root at rocks14 install]# ls -l
>>>total 56
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:16 cdrom
>>>drwxrwsr-x     5 root     wheel        4096 Dec 10 20:38 contrib.orig
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
>>>-r-xrwsr-x     1 root     wheel       19254 Sep 3 12:40 kickstart.cgi
>>>drwxr-xr-x     3 root     root         4096 Dec 10 20:38 profiles.orig
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:15 rocks-dist
>>>drwxrwsr-x     3 root     wheel        4096 Dec 10 20:38 rocks-dist.orig
>>>drwxr-sr-x     3 root     wheel        4096 Dec 10 21:02 src
>>>drwxr-sr-x     4 root     wheel        4096 Dec 10 20:49 src.foo
>>>On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>>>
>>>
>>>>When I run this:
>>>>
>>>>[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>>>>rocks-dist --dist=cdrom cdrom
>>>>
>>>>on a server installed with ROCKS 3.0.0, I eventually get this:
>>>>
>>>>
>>>>>Cleaning distribution
>>>>>Resolving versions (RPMs)
>>>>>Resolving versions (SRPMs)
>>>>>Adding support for rebuild distribution from source
>>>>>Creating files (symbolic links - fast)
>>>>>Creating symlinks to kickstart files
>>>>>Fixing Comps Database
>>>>>Generating hdlist (rpm database)
>>>>>Patching second stage loader (eKV, partioning, ...)
>>>>>     patching "rocks-ekv" into distribution ...
>>>>>     patching "rocks-piece-pipe" into distribution ...
>>>>>     patching "PyXML" into distribution ...
>>>>>     patching "expat" into distribution ...
>>>>>     patching "rocks-pylib" into distribution ...
>>>>>     patching "MySQL-python" into distribution ...
>>>>>     patching "rocks-kickstart" into distribution ...
>>>>>     patching "rocks-kickstart-profiles" into distribution ...
>>>>>     patching "rocks-kickstart-dtds" into distribution ...
>>>>>     building CRAM filesystem ...
>>>>>Cleaning distribution
>>>>>Resolving versions (RPMs)
>>>>>Resolving versions (SRPMs)
>>>>>Creating symlinks to kickstart files
>>>>>Generating hdlist (rpm database)
>>>>>Segregating RPMs (rocks, non-rocks)
>>>>>sh: ./kickstart.cgi: No such file or directory
>>>>>sh: ./kickstart.cgi: No such file or directory
>>>>>Traceback (innermost last):
>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>>>    app.run()
>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>>>    eval('self.command_%s()' % (command))
>>>>> File "<string>", line 0, in ?
>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>>>    builder.build()
>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>>>    (rocks, nonrocks) = self.segregateRPMS()
>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>>>>segregateRPMS
>>>>>    for pkg in ks.getSection('packages'):
>>>>>TypeError: loop over non-sequence
>>>>
>>>>
>>>>Any ideas?
>>>>
>>>>--
>>>>Vicky Rowley                              email: vrowley at ucsd.edu
>>>>Biomedical Informatics Research Network      work: (858) 536-5980
>>>>University of California, San Diego           fax: (858) 822-0828
>>>>9500 Gilman Drive
>>>>La Jolla, CA 92093-0715
>>>>
>>>>
>>>>See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
>>>
>>>
>>>
>>--
>>Vicky Rowley                              email: vrowley at ucsd.edu
>>Biomedical Informatics Research Network      work: (858) 536-5980
>>University of California, San Diego           fax: (858) 822-0828
>>9500 Gilman Drive
>>La Jolla, CA 92093-0715
>>
>>
>>See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb
>>
>>
>
>
>
>

--
Vicky Rowley                              email: vrowley at ucsd.edu
Biomedical Informatics Research Network      work: (858) 536-5980
University of California, San Diego           fax: (858) 822-0828
9500 Gilman Drive
La Jolla, CA 92093-0715
See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb



--__--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion


End of npaci-rocks-discussion Digest




From wyzhong78 at msn.com Thu Dec 11 07:27:39 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 23:27:39 +0800
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?
Message-ID: <BAY3-F25UBUhr3ukkwu000156fe@hotmail.com>

I have built a Rocks cluster from four dual-Xeon computers to run NAMD: one
frontend and the other three as compute nodes. With Intel's hyper-threading
technology I have 16 CPUs in all.
Now I have some trouble; maybe someone can help me.
I created the PBS script below, named mytask:
#!/bin/csh
#PBS -N NAMD
#PBS -m be
#PBS -l ncpus=8
#PBS -l nodes=2
#
cd $PBS_O_WORKDIR
/charmrun namd2 +p8 mytask.namd

I typed:
qsub mytask
qrun N

Then I used
qstat -f N

The message that came back showed (I'm sorry I can't copy the original
message, just the meaning):

host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1
cpu used: 8

It's strange: why 4 host slots and 8 CPUs used?
But when I looked at Ganglia for the cluster status, it showed me only one
node in use (for example, compute-0-0); both of the other two were idle.
I want to know whether the job was being done by one node or two.
So I created a new task bound to compute-0-1; the feedback message showed
no resource available.
When the task ended I checked the information and found that the CPU time
per step was half that of 4 CPUs (1 node), but the whole time (including
wall time) was equal.
Does my namd job get allocated to each node?
Please help me!
Thanks

_________________________________________________________________
MSN Messenger: http://messenger.msn.com/cn



From bruno at rocksclusters.org Thu Dec 11 07:55:17 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Thu, 11 Dec 2003 07:55:17 -0800
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
In-Reply-To: <20031211064321.41781.qmail@web14801.mail.yahoo.com>
References: <20031211064321.41781.qmail@web14801.mail.yahoo.com>
Message-ID: <6A67C95F-2BF2-11D8-B821-000A95C4E3B4@rocksclusters.org>

outstanding -- thanks for the patch!

i just committed the change to cvs. the fix will be reflected in the
upcoming release (or immediately for anyone who has the rocks source
tree checked out on their local frontend).

    - gb


On Dec 10, 2003, at 10:43 PM, Vincent Fox wrote:

>   Okay, here's the context diff as plain text. I test-applied it using
>   "patch -p0 < atlas.patch" and did a compile on my PII box
>   successfully. I can send it as attachment or submit to CVS or some
>   other way if you need:
>
>   *** atlas.spec.in.orig  Thu Dec 11 06:27:13 2003
>   --- atlas.spec.in       Thu Dec 11 06:30:46 2003
>   ***************
>   *** 111,117 ****
>   --- 111,133 ----
>     y
>     " | make
>   + elif [ $CPUID -eq 4 ]
>   + then
>   + #
>   + # Pentium II
>   + #
>   + echo "0
>   + y
>   + y
>   + n
>   + y
>   + linux
>   +   0
>   +   /usr/bin/g77
>   +   -O
>   +   y
>   +   " | make
>     else
>     #
>
>
>   Greg Bruno <bruno at rocksclusters.org>wrote:
>   > Okay, came up my own quick hack:
>   >
>   > Edit atlas.spec.in, go to "other x86" section, remove
>   > 2 lines right above "linux", seems to make rpm now.
>   >
>   > A more formal patch would be put in a section for
>   > cpuid eq 4 with this correction I suppose.
>
>   if you provide the patch, we'll include it in our next release.
>
>   - gb
>
>   Do you Yahoo!?
>   New Yahoo! Photos - easier uploading and sharing


From phil at sdsc.edu Thu Dec 11 08:00:06 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Thu, 11 Dec 2003 12:00:06 -0400
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?
Message-ID: <1920451470-1071158479-cardhu_blackberry.rim.net-21416-@engine05>

The important thing to understand is that PBS only gives an allocation of nodes
(listed in the file named by the PBS_NODEFILE environment variable) when the job
is run. It is the user's responsibility to actually start the code on multiple
nodes. This is the way PBS works on all platforms, not just Rocks.

PBS will start the submitted code (usually a script) on the first node listed in
PBS_NODEFILE. This environment variable is only available once the queued job is
running. Your mytask script must explicitly start work on the allocated nodes.

PBS (actually Maui) will pack jobs onto nodes by default, so packing an 8-CPU
job onto a small number of nodes is normal, but changeable.

-p
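The point above — that the job script itself must launch work on every allocated node — can be sketched as a job script. The resource line, launcher, and binary names are illustrative, not taken from this thread; an MPICH-style `mpirun` is assumed in place of charmrun:

```shell
#!/bin/sh
#PBS -N NAMD
#PBS -l nodes=2:ppn=4
cd "$PBS_O_WORKDIR"

# PBS runs this script on the first allocated node only; it is up to the
# script to start processes on the rest. The allocation is listed one CPU
# slot per line in the file named by $PBS_NODEFILE.
NP=$(wc -l < "$PBS_NODEFILE")

# With an MPICH-style launcher the node file can be passed directly
# (launcher and binary names are illustrative):
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./namd2 mytask.namd
```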

-----Original Message-----
From: "zhong wenyu" <wyzhong78 at msn.com>
Date: Thu, 11 Dec 2003 23:27:39
To:npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?

I have built a Rocks cluster from four dual-Xeon computers to run NAMD: one
frontend and the other three as compute nodes. With Intel's hyper-threading
technology I have 16 CPUs in all.
Now I have some trouble; maybe someone can help me.
I created the PBS script below, named mytask:
#!/bin/csh
#PBS -N NAMD
#PBS -m be
#PBS -l ncpus=8
#PBS -l nodes=2
#
cd $PBS_O_WORKDIR
/charmrun namd2 +p8 mytask.namd

I typed:
qsub mytask
qrun N

Then I used
qstat -f N

The message that came back showed (I'm sorry I can't copy the original
message, just the meaning):

host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1
cpu used: 8

It's strange: why 4 host slots and 8 CPUs used?
But when I looked at Ganglia for the cluster status, it showed me only one
node in use (for example, compute-0-0); both of the other two were idle.
I want to know whether the job was being done by one node or two.
So I created a new task bound to compute-0-1; the feedback message showed
no resource available.
When the task ended I checked the information and found that the CPU time
per step was half that of 4 CPUs (1 node), but the whole time (including
wall time) was equal.
Does my namd job get allocated to each node?
Please help me!
Thanks

_________________________________________________________________
MSN Messenger: http://messenger.msn.com/cn


Sent via BlackBerry - a service from AT&T Wireless.

From jlkaiser at fnal.gov Thu Dec 11 08:28:08 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Thu, 11 Dec 2003 10:28:08 -0600
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <1071007177.18100.58.camel@squash.scalableinformatics.com>
References: <1071007177.18100.58.camel@squash.scalableinformatics.com>
Message-ID: <1071160088.18486.25.camel@nietzsche.fnal.gov>

Hi,

I'm sorry, I thought I sent email to the list reporting how I did this.

You have not said what motherboard you are using or what the error
exactly is. The instructions below are for the X5DPA-GG and the error
isn't reported as an error, I just get prompted to insert my driver.

If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to
make a change to the pcitable on the initrd.img. The current pcitable
on the initrd.img does NOT have the proper deviceId for the e1000 for
this board. If you look in /etc/sysconfig/hwconf and search for the
e1000, you will find this:

class: NETWORK
bus: PCI
detached: 0
device: eth
driver: e1000
desc: "Unknown vendor|Generic e1000 device"
vendorId: 8086
deviceId: 1013
subVendorId: 8086
subDeviceId: 1213
pciType: 1

The device ID is 1013. If you look in the pcitable that comes off of
the initrd.img you will see that the highest the e1000 device id's go is
1012. Just add in the proper line to the initrd.img in your /tftpboot
directory and it should work. Instructions are below.

Here are the instructions:

This should be done on the frontend:

cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable

Search for the e1000 drivers and add the following line:

0x8086 0x1013    "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller"

write the file

cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/

Then boot the node.
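Before editing the initrd by hand, it is easy to check whether a pcitable already carries a given vendor/device pair. A small sketch — the parsing is a simplification, since this post only shows fragments of the real pcitable format:

```python
def has_pci_entry(pcitable_text, vendor, device):
    """Scan pcitable-style lines for a matching '0xVVVV 0xDDDD ...' entry."""
    for line in pcitable_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == vendor and fields[1] == device:
            return True
    return False

# Illustrative single-line table: the stock file tops out at 0x1012 for
# the e1000, and 0x1013 is the entry the instructions above add by hand.
sample = '0x8086\t0x1012\t"e1000"\t"Intel Corp.|e1000 device"'
print(has_pci_entry(sample, "0x8086", "0x1013"))
```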

Hope this helps.

Thanks,

Joe

On Tue, 2003-12-09 at 15:59, Joe Landman wrote:
> Folks:
>
>   As indicated previously, I am wrestling with a Supermicro based
> cluster. None of the RH distributions come with the correct E1000
> driver, so a new kernel is needed (in the boot CD, and for
> installation).
>
>   The problem I am running into is that it isn't at all obvious/easy how
> to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> this thing to work. Following the examples in the documentation have
> not met with success. Running "rocks-dist cdrom" with the new kernels
> (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> a bootable CD with the original 2.4.18BOOT kernel.
>
>   What I (and I think others) need, is a simple/easy to follow method
> that will generate a bootable CD with the correct linux kernel, and the
> correct modules.
>
>   Is this in process somewhere? What would be tremendously helpful is
> if we can generate a binary module, and put that into the boot process
> by placing it into the force/modules/binary directory (assuming one
> exists) with the appropriate entry of a similar name in the
> force/modules/meta directory as a simple XML document giving pci-ids,
> description, name, etc.
>
>   Anything close to this coming? Modules are killing future ROCKS
> installs, the inability to easily inject a new module in there has
> created a problem whereby ROCKS does not function (as the underlying RH
> does not function).
>
>
>
--
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab
CD/OSS-SCS                Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================



From jghobrial at uh.edu Thu Dec 11 08:41:42 2003
From: jghobrial at uh.edu (Joseph)
Date: Thu, 11 Dec 2003 10:41:42 -0600 (CST)
Subject: [Rocks-Discuss]Re: Rocks Pythone Error with rocks.file
In-Reply-To: <3FD82F68.9070600@physics.ucsd.edu>
References: <3FD82F68.9070600@physics.ucsd.edu>
Message-ID: <Pine.LNX.4.56.0312111001150.9106@mail.tlc2.uh.edu>

On Thu, 11 Dec 2003, Terrence Martin wrote:

>   I am having the exact same error that you reported to the list on my
>   cluster when I try to install rocks 3.0.0.
>
>   X tries to start, fails, then just before the HPC roll is supposed to
>   start I get the python error about not being able to load the rocks.file.
>
>   The thing is that my system is a dual Xeon supermicro not AMD, so it
>   must not be an AMD specific issue.
>
> Did you ever find a resolution to the problem?
>
> Thanks,
>
> Terrence
>

Yes, I guess you should check your memory as Greg suggests, but my
solution was to install the frontend on a different machine and then take
the HD back to the original frontend. The only problem I had was that
the build box was a single-processor setup, so when I went back to the
dual-AMD box pvfs failed because it was built against a non-SMP kernel.
I installed the SMP kernel and noticed this problem.

It seems the problem may be related to an SMP issue, due to the fact that we
both have SMP setups. I did not check the frontend's memory, so this may still
be a factor, but I have had no trouble with the box since the installation.

My initial problem was a booting problem on the frontend due to a cdrom
issue. All my other attempts at installing failed with the error you
mentioned, but as I posted earlier I tried 3 different AMD single-processor
boxes and they failed. The boxes are up all the time and stressed pretty
hard, so I don't believe it is a memory issue.

This is some very strange behaviour.

Thanks,
Joseph



From shewa at inel.gov Thu Dec 11 10:02:59 2003
From: shewa at inel.gov (Andrew Shewmaker)
Date: Thu, 11 Dec 2003 11:02:59 -0700
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
Message-ID: <3FD8B153.6000205@inel.gov>

"Mason J. Katz" <mjk at sdsc.edu> wrote:

 > We've also moved from this method to a single cluster-wide ssh key for
 > Rocks 3.1.

How does a single key work? I have successfully set up ssh host
based authentication for some non-Rocks systems using

http://www.omega.telia.net/vici/openssh/

(Note that OpenSSH_3.7.1p2 requires one more setting in addition
to those mentioned in the above url.

In <dir-of-ssh-conf-files>/ssh_config:
EnableSSHKeysign yes)

But I thought it still requires that each host has a key...
am I wrong? Do you do it differently?

Thanks,
Andrew

--
Andrew Shewmaker, Associate Engineer
Phone: 1-208-526-1415
Idaho National Eng. and Environmental Lab.
P.0. Box 1625, M.S. 3605
Idaho Falls, Idaho 83415-3605
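For comparison, the per-host-key style of host-based authentication referenced above typically needs settings like the following (a sketch for OpenSSH 3.7-era configs; file locations vary by distribution):

```text
# /etc/ssh/ssh_config on the client side
HostbasedAuthentication yes
EnableSSHKeysign yes        # required from OpenSSH 3.7.1 onward

# /etc/ssh/sshd_config on the server side
HostbasedAuthentication yes
# ...plus each client's host key in /etc/ssh/ssh_known_hosts and its
# name in shosts.equiv -- this is the per-host-key requirement the
# question asks about; a single cluster-wide key avoids it.
```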



From tmartin at physics.ucsd.edu Thu Dec 11 11:13:16 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Thu, 11 Dec 2003 11:13:16 -0800
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets
 ...
In-Reply-To: <1071160088.18486.25.camel@nietzsche.fnal.gov>
References: <1071007177.18100.58.camel@squash.scalableinformatics.com>
<1071160088.18486.25.camel@nietzsche.fnal.gov>
Message-ID: <3FD8C1CC.20700@physics.ucsd.edu>

Hi Joe,

Do you know if 2.3.2 can also benefit from the same small change?

Terrence

Joe Kaiser wrote:
> Hi,
>
> I'm sorry, I thought I sent email to the list reporting how I did this.
>
> You have not said what motherboard you are using or what the error
> exactly is. The instructions below are for the X5DPA-GG and the error
> isn't reported as an error, I just get prompted to insert my driver.
>
> If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to
> make a change to the pcitable on the initrd.img. The current pcitable
> on the initrd.img does NOT have the proper deviceId for the e1000 for
> this board. If you look in /etc/sysconfig/hwconf and search for the
> e1000, you will find this:
>
> class: NETWORK
> bus: PCI
> detached: 0
> device: eth
> driver: e1000
> desc: "Unknown vendor|Generic e1000 device"
> vendorId: 8086
> deviceId: 1013
> subVendorId: 8086
> subDeviceId: 1213
> pciType: 1
>
> The device ID is 1013. If you look in the pcitable that comes off of
> the initrd.img you will see that the highest the e1000 device id's go is
> 1012. Just add in the proper line to the initrd.img in your /tftpboot
> directory and it should work. Instructions are below.
>
> Here are the instructions:
>
> This should be done on the frontend:
>
> cd /tftpboot/X86PC/UNDI/pxelinux/
> cp initrd.img initrd.img.orig
> cp initrd.img /tmp
> cd /tmp
> mv initrd.img initrd.gz
> gunzip initrd.gz
> mkdir /mnt/loop
> mount -o loop initrd /mnt/loop
> cd /mnt/loop/modules/
> vi pcitable
>
> Search for the e1000 drivers and add the following line:
>
> 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller"
>
> write the file
>
> cd /tmp
> umount /mnt/loop
> gzip initrd
> mv initrd.gz initrd.img
> mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/
>
> Then boot the node.
>
> Hope this helps.
>
> Thanks,
>
> Joe
>
> On Tue, 2003-12-09 at 15:59, Joe Landman wrote:
>
>>Folks:
>>
>> As indicated previously, I am wrestling with a Supermicro based
>>cluster. None of the RH distributions come with the correct E1000
>>driver, so a new kernel is needed (in the boot CD, and for
>>installation).
>>
>> The problem I am running into is that it isn't at all obvious/easy how
>>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
>>this thing to work. Following the examples in the documentation have
>>not met with success. Running "rocks-dist cdrom" with the new kernels
>>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates
>>a bootable CD with the original 2.4.18BOOT kernel.
>>
>> What I (and I think others) need, is a simple/easy to follow method
>>that will generate a bootable CD with the correct linux kernel, and the
>>correct modules.
>>
>> Is this in process somewhere? What would be tremendously helpful is
>>if we can generate a binary module, and put that into the boot process
>>by placing it into the force/modules/binary directory (assuming one
>>exists) with the appropriate entry of a similar name in the
>>force/modules/meta directory as a simple XML document giving pci-ids,
>>description, name, etc.
>>
>> Anything close to this coming? Modules are killing future ROCKS
>>installs, the inability to easily inject a new module in there has
>>created a problem whereby ROCKS does not function (as the underlying RH
>>does not function).
>>
>>
>>




From tmartin at physics.ucsd.edu Thu Dec 11 11:19:55 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Thu, 11 Dec 2003 11:19:55 -0800
Subject: [Rocks-Discuss]Re: Rocks Pythone Error with rocks.file
In-Reply-To: <Pine.LNX.4.56.0312111001150.9106@mail.tlc2.uh.edu>
References: <3FD82F68.9070600@physics.ucsd.edu>
<Pine.LNX.4.56.0312111001150.9106@mail.tlc2.uh.edu>
Message-ID: <3FD8C35B.2090309@physics.ucsd.edu>

I am fairly certain it is not the memory, even without memtest86. I have
in my office the same Supermicro 613A-Xi (SB-613A-Xi-B) with a SUPER
X5DPA-GG motherboard as the ones at the SDSC, but it is from a different
vendor, with completely different RAM from another manufacturer.

When I put Rocks 3.0.0 on it I get the crash of the installer in the
same spot: right after the system attempts to start X windows and fails
(either because X simply fails to start, or because a mouse is not
present), a python error comes up complaining that the rocks.file could
not be found.

On the exact same system rocks 2.3.2 installs fine.

Terrence

Joseph wrote:
> On Thu, 11 Dec 2003, Terrence Martin wrote:
>
>
>>I am having the exact same error that you reported to the   list on my
>>cluster when I try to install rocks 3.0.0.
>>
>>X tries to start, fails, then just before the HPC roll is   supposed to
>>start I get the python error about not being able to load   the rocks.file.
>>
>>The thing is that my system is a dual Xeon supermicro not   AMD, so it
>>must not be an AMD specific issue.
>>
>>Did you ever find a resolution to the problem?
>>
>>Thanks,
>>
>>Terrence
>>
>
>
> Yes, I guess you should check your memory as Greg suggests, but my
> solution was to install the frontend on a different machine and then take
> the HD back to the original frontend. The only problem that I had was that
> the build box was a single processor setup so when I went back to the
> dual-AMD pvfs fails because it was built against a non-SMP kernel.
> I installed the SMP kernel and noticed this problem.
>
> It seems the problem may be related to an SMP issue do to the fact we both
> have an SMP setup. I did not check the frontend's memory so this may still
> be a factor, but I have had no trouble with the box after the installation.
>
> My initial problem was a booting problem on the frontend due to a cdrom
> issue. All my other attempts at installing failed with the error you mentioned,
but as I
> posted early I tried 3 different AMD single processor boxes and they
> failed. The boxes are up all the time and stressed pretty hard so I don't
> believe it is a memory issue.
>
> This is some very strange behaviour.
>
> Thanks,
> Joseph
>




From landman at scalableinformatics.com Thu Dec 11 11:42:14 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Dec 2003 14:42:14 -0500
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets
      ...
In-Reply-To: <3FD8C1CC.20700@physics.ucsd.edu>
References: <1071007177.18100.58.camel@squash.scalableinformatics.com>
       <1071160088.18486.25.camel@nietzsche.fnal.gov>
       <3FD8C1CC.20700@physics.ucsd.edu>
Message-ID: <1071171734.6164.12.camel@squash.scalableinformatics.com>

Hi Terrence and Joe:

  These are indeed X5DPA-GG. I am working on a device driver disk for
3.0 ROCKS. If this works, it is a weak hack, but it might be fine.
More later (testing it now as we speak)...

Joe


On Thu, 2003-12-11 at 14:13, Terrence Martin wrote:
> Hi Joe,
>
> Do you know if 2.3.2 can also benefit from the same small change?
>
> Terrence
>
> Joe Kaiser wrote:
> > Hi,
> >
> > I'm sorry, I thought I sent email to the list reporting how I did this.
> >
> > You have not said what motherboard you are using or what the error
> > exactly is. The instructions below are for the X5DPA-GG and the error
> > isn't reported as an error, I just get prompted to insert my driver.
> >
> > If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to
> > make a change to the pcitable on the initrd.img. The current pcitable
> > on the initrd.img does NOT have the proper deviceId for the e1000 for
> > this board. If you look in /etc/sysconfig/hwconf and search for the
> > e1000, you will find this:
> >
> > class: NETWORK
> > bus: PCI
> > detached: 0
> > device: eth
> > driver: e1000
> > desc: "Unknown vendor|Generic e1000 device"
> > vendorId: 8086
> > deviceId: 1013
> > subVendorId: 8086
> > subDeviceId: 1213
> > pciType: 1
> >
> > The device ID is 1013. If you look in the pcitable that comes off of
> > the initrd.img you will see that the highest the e1000 device id's go is
> > 1012. Just add in the proper line to the initrd.img in your /tftpboot
> > directory and it should work. Instructions are below.
> >
> > Here are the instructions:
> >
> > This should be done on the frontend:
> >
> > cd /tftpboot/X86PC/UNDI/pxelinux/
> > cp initrd.img initrd.img.orig
> > cp initrd.img /tmp
> > cd /tmp
> > mv initrd.img initrd.gz
> > gunzip initrd.gz
> > mkdir /mnt/loop
> > mount -o loop initrd /mnt/loop
> > cd /mnt/loop/modules/
> > vi pcitable
> >
> > Search for the e1000 drivers and add the following line:
> >
> > 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet
> > Controller"
> >
> > write the file
> >
> > cd /tmp
> > umount /mnt/loop
> > gzip initrd
> > mv initrd.gz initrd.img
> > mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/
> >
> > Then boot the node.
> >
> > Hope this helps.
> >
> > Thanks,
> >
> > Joe
> >
> > On Tue, 2003-12-09 at 15:59, Joe Landman wrote:
> >
> >>Folks:
> >>
> >> As indicated previously, I am wrestling with a Supermicro based
> >>cluster. None of the RH distributions come with the correct E1000
> >>driver, so a new kernel is needed (in the boot CD, and for
> >>installation).
> >>
> >> The problem I am running into is that it isn't at all obvious/easy how
> >>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> >>this thing to work. Following the examples in the documentation have
> >>not met with success. Running "rocks-dist cdrom" with the new kernels
> >>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> >>a bootable CD with the original 2.4.18BOOT kernel.
> >>
> >> What I (and I think others) need, is a simple/easy to follow method
> >>that will generate a bootable CD with the correct linux kernel, and the
> >>correct modules.
> >>
> >> Is this in process somewhere? What would be tremendously helpful is
> >>if we can generate a binary module, and put that into the boot process
> >>by placing it into the force/modules/binary directory (assuming one
> >>exists) with the appropriate entry of a similar name in the
> >>force/modules/meta directory as a simple XML document giving pci-ids,
> >>description, name, etc.
> >>
> >> Anything close to this coming? Modules are killing future ROCKS
> >>installs, the inability to easily inject a new module in there has
> >>created a problem whereby ROCKS does not function (as the underlying RH
> >>does not function).
> >>
> >>
> >>
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615
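[Editor's note: the pcitable line in the quoted instructions can be derived mechanically from the /etc/sysconfig/hwconf stanza also quoted there. A hedged sketch: the function name `mk_pcitable_line` is illustrative, and the quoted description string is taken from the message, not from any authoritative PCI id database.]

```shell
# mk_pcitable_line turns a hwconf-style stanza (stdin) into a pcitable
# line (stdout). The field names match the /etc/sysconfig/hwconf excerpt
# quoted in the message; the description string is illustrative.
mk_pcitable_line() {
  awk '
    /^vendorId:/ { ven = $2 }
    /^deviceId:/ { dev = $2 }
    /^driver:/   { drv = $2 }
    END {
      printf "0x%s\t0x%s\t\"%s\"\t\"Intel Corp.|82546EB Gigabit Ethernet Controller\"\n",
             ven, dev, drv
    }
  '
}

# The stanza quoted above produces the line Joe says to add (tab-separated here):
printf 'driver: e1000\nvendorId: 8086\ndeviceId: 1013\n' | mk_pcitable_line
```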



From jlkaiser at fnal.gov Thu Dec 11 11:33:03 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Thu, 11 Dec 2003 13:33:03 -0600
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <3FD8C1CC.20700@physics.ucsd.edu>
References: <1071007177.18100.58.camel@squash.scalableinformatics.com>
 <1071160088.18486.25.camel@nietzsche.fnal.gov>
 <3FD8C1CC.20700@physics.ucsd.edu>
Message-ID: <1071171183.18486.28.camel@nietzsche.fnal.gov>
I am not sure.   Presumably, yes....

On Thu, 2003-12-11 at 13:13, Terrence Martin wrote:
> Hi Joe,
>
> Do you know if 2.3.2 can also benefit from the same small change?
>
> Terrence
>
> Joe Kaiser wrote:
> > Hi,
> >
> > I'm sorry, I thought I sent email to the list reporting how I did this.
> >
> > You have not said what motherboard you are using or what the error
> > exactly is. The instructions below are for the X5DPA-GG and the error
> > isn't reported as an error, I just get prompted to insert my driver.
> >
> > If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to
> > make a change to the pcitable on the initrd.img. The current pcitable
> > on the initrd.img does NOT have the proper deviceId for the e1000 for
> > this board. If you look in /etc/sysconfig/hwconf and search for the
> > e1000, you will find this:
> >
> > class: NETWORK
> > bus: PCI
> > detached: 0
> > device: eth
> > driver: e1000
> > desc: "Unknown vendor|Generic e1000 device"
> > vendorId: 8086
> > deviceId: 1013
> > subVendorId: 8086
> > subDeviceId: 1213
> > pciType: 1
> >
> > The device ID is 1013. If you look in the pcitable that comes off of
> > the initrd.img you will see that the highest the e1000 device id's go is
> > 1012. Just add in the proper line to the initrd.img in your /tftpboot
> > directory and it should work. Instructions are below.
> >
> > Here are the instructions:
> >
> > This should be done on the frontend:
> >
> > cd /tftpboot/X86PC/UNDI/pxelinux/
> > cp initrd.img initrd.img.orig
> > cp initrd.img /tmp
> > cd /tmp
> > mv initrd.img initrd.gz
> > gunzip initrd.gz
> > mkdir /mnt/loop
> > mount -o loop initrd /mnt/loop
> > cd /mnt/loop/modules/
> > vi pcitable
> >
> > Search for the e1000 drivers and add the following line:
> >
> > 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet
> > Controller"
> >
> > write the file
> >
> > cd /tmp
> > umount /mnt/loop
> > gzip initrd
> > mv initrd.gz initrd.img
> > mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/
> >
> > Then boot the node.
> >
> > Hope this helps.
> >
> > Thanks,
> >
> > Joe
> >
> > On Tue, 2003-12-09 at 15:59, Joe Landman wrote:
> >
> >>Folks:
> >>
> >> As indicated previously, I am wrestling with a Supermicro based
> >>cluster. None of the RH distributions come with the correct E1000
> >>driver, so a new kernel is needed (in the boot CD, and for
> >>installation).
> >>
> >> The problem I am running into is that it isn't at all obvious/easy how
> >>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> >>this thing to work. Following the examples in the documentation have
> >>not met with success. Running "rocks-dist cdrom" with the new kernels
> >>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> >>a bootable CD with the original 2.4.18BOOT kernel.
> >>
> >> What I (and I think others) need, is a simple/easy to follow method
> >>that will generate a bootable CD with the correct linux kernel, and the
> >>correct modules.
> >>
> >> Is this in process somewhere? What would be tremendously helpful is
> >>if we can generate a binary module, and put that into the boot process
> >>by placing it into the force/modules/binary directory (assuming one
> >>exists) with the appropriate entry of a similar name in the
> >>force/modules/meta directory as a simple XML document giving pci-ids,
> >>description, name, etc.
> >>
> >> Anything close to this coming? Modules are killing future ROCKS
> >>installs, the inability to easily inject a new module in there has
> >>created a problem whereby ROCKS does not function (as the underlying RH
> >>does not function).
> >>
> >>
> >>
--
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab
CD/OSS-SCS                Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================



From landman at scalableinformatics.com Thu Dec 11 11:51:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Dec 2003 14:51:51 -0500
Subject: [Rocks-Discuss]driver disk for e1000 for rocks 3.0.0
Message-ID: <1071172311.6164.18.camel@squash.scalableinformatics.com>

Folks:

  I have built a slightly modified RedHat 7.3 driver disk with the
updated 5.2.22 e1000 driver. I verified that this does indeed work on
my systems (during initial portion of ROCKS install, I can now insmod
e1000 in the shell window and see the ethernet... this is a big change
from before). If you want the driver disk grab it from
http://scalableinformatics.com/downloads/newdrv.img . To use it while
installing a front end, type

     frontend dd

at the boot prompt (not just frontend). I believe it should work for
the compute nodes as well (I will test it soon). Now it is time to work
around the rest of the Supermicro "features".
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615



From dtwright at uiuc.edu Thu Dec 11 12:32:54 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Thu, 11 Dec 2003 14:32:54 -0600
Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node?
In-Reply-To: <BAY3-F25UBUhr3ukkwu000156fe@hotmail.com>
References: <BAY3-F25UBUhr3ukkwu000156fe@hotmail.com>
Message-ID: <20031211203254.GP6476@uiuc.edu>

NAMD2 needs some more information to be started on multiple nodes like that.
You need to give it a nodelist, in particular, so it knows where to run
itself. We run namd2 on several clusters here (UIUC chemistry department).

Below is a script used to exec namd2 with the right options, etc, on a
cluster. Below that is a script that automates the PBS job submission. Hope
this helps!

- Dan Wright
(dtwright at uiuc.edu)
(http://www.scs.uiuc.edu/)
(UNIX Systems Administrator, School of Chemical Sciences)
(333-1728)
-- namd2.csh --

#!/bin/csh
# Script to run NAMD2 on the cluster automatically.
# Courtesy of Jim Phillips.

setenv CONV_RSH ssh
setenv TMPDIR /tmp
setenv BINDIR /home/NAMD

if ( $?PBS_JOBID ) then
  if ( $?PBS_NODEFILE ) then
     set nodes = `cat $PBS_NODEFILE`
  else
     set nodes = localhost
  endif
  set nodefile = $TMPDIR/namd2.nodelist.$PBS_JOBID
  echo group main >! $nodefile
  foreach node ( $nodes )
     echo host $node >> $nodefile
  end
  $BINDIR/charmrun $BINDIR/namd2 +p$#nodes ++nodelist $nodefile $*
else
  $BINDIR/charmrun $BINDIR/namd2 ++local $*
endif

-------------
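[Editor's note: the file that namd2.csh builds for charmrun's ++nodelist option has a simple shape. A plain-sh rerun of the same loop, with example hostnames standing in for the contents of $PBS_NODEFILE:]

```shell
# Same nodelist-building logic as the csh script above, in plain sh,
# with example hostnames standing in for the contents of $PBS_NODEFILE.
nodefile=$(mktemp)
echo "group main" > "$nodefile"          # charmrun nodelist header line
for node in compute-0-0 compute-0-1 compute-0-2; do
  echo "host $node" >> "$nodefile"       # one "host" line per allocated node
done
cat "$nodefile"                          # what gets passed via ++nodelist
```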

Here's an example script using this to start namd2 on 8 uniprocessor nodes;
you'd just run it as "namd2-8p <jobfile>" to automatically do the PBS job
submission and everything.

-- namd2-8p --

#!/bin/bash
# This script runs namd2 on 8 nodes.
#

echo
echo "Please remember to specify the FULL PATH to your namd2 job file."
echo "If you haven't done that, please press ctrl-c now and re-run"
echo "this command with the full path."
echo
sleep 10

export SCRIPTFILE=/tmp/namd2-script.$USER.`date "+%s"`
export NAMD_SCRIPT=/usr/local/bin/namd2.csh

NAMD_CMD="$NAMD_SCRIPT $* > $HOME/namd2.out.`date '+%d%b%Y-%H:%M:%S'` 2>&1"

cat >$SCRIPTFILE <<EOF
#!/bin/bash
#PBS -l nodes=8

EOF
echo $NAMD_CMD >> $SCRIPTFILE
echo "exit" >> $SCRIPTFILE
/usr/apps/pbs/bin/qsub -V $SCRIPTFILE
sleep 5

rm -f $SCRIPTFILE

--------------


zhong wenyu said:
> I have built a Rocks cluster with four dual-Xeon computers to run NAMD: one
> frontend and the other three as compute nodes. With Intel's hyper-threading
> technology I have 16 CPUs in all.
> Now I have some troubles. Maybe someone can help me.
> I created the PBS script below, named mytask:
> #!/bin/csh
> #PBS -N NAMD
> #PBS -m be
> #PBS -l ncpus=8
> #PBS -l nodes=2
> #
> cd $PBS_O_WORKDIR
> /charmrun namd2 +p8 mytask.namd
>
> I typed:
> qsub mytask
> qrun N
>
> Then I used
> qstat -f N
>
> The feedback message showed (I'm sorry I can't copy the original message,
> just the meaning):
>
> host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1
> cpu used: 8
>
> It's strange: why 4 host slots and 8 CPUs used?
> But when I looked at Ganglia's cluster status, it showed me only one node
> in use (for example, compute-0-0); the other two were idle.
> I want to know whether the job was being done by one node or two,
> so I created a new task bound to compute-0-1, and the feedback showed no
> resource available.
> When the task ended I checked the information and found that the CPU time
> per step was half that of 4 CPUs (1 node), but the whole time (including
> wall time) was equal.
> Does my namd job allocate to each node?
> Please help me!
> Thanks
>
>
- Dan Wright
(dtwright at uiuc.edu)
(http://www.uiuc.edu/~dtwright)

-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
  For he on honeydew hath fed, / and drunk the milk of Paradise.''
       Samuel Taylor Coleridge, Kubla Khan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031211/417e39b4/attachment-0001.bin

From mjk at sdsc.edu Thu Dec 11 13:16:45 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Thu, 11 Dec 2003 13:16:45 -0800
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <3FD8B153.6000205@inel.gov>
References: <3FD8B153.6000205@inel.gov>
Message-ID: <52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu>

Download 3.1 (out very soon now) and poke around. Basically there is a
single SSH host key, and all the nodes have a copy. This kills the
"man in the middle" warning every time you reinstall.

       -mjk

On Dec 11, 2003, at 10:02 AM, Andrew Shewmaker wrote:

>   "Mason J. Katz" <mjk at sdsc.edu> wrote:
>
>   > We've also moved from this method to a single cluster-wide ssh key
>   for
>   > Rocks 3.1.
>
>   How does a single key work? I have successfully set up ssh host
>   based authentication for some non-Rocks systems using
>
>   http://www.omega.telia.net/vici/openssh/
>
>   (Note that OpenSSH_3.7.1p2 requires one more setting in addition
>   to those mentioned in the above url.
>
>   In <dir-of-ssh-conf-files>/ssh_config:
>   EnableSSHKeysign yes)
>
>   But I thought it still requires that each host in the cluster has a key...
>   am I wrong? Do you do it differently?
>
>   Thanks,
>
>   Andrew
>
>   --
>   Andrew Shewmaker, Associate Engineer
>   Phone: 1-208-526-1415
>   Idaho National Eng. and Environmental Lab.
>   P.O. Box 1625, M.S. 3605
>   Idaho Falls, Idaho 83415-3605
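[Editor's note: a hedged sketch of what the shared-host-key scheme Mason describes implies for ssh_known_hosts — one pattern entry whose key every node presents. The hostname patterns and the KEY placeholder are illustrative; Rocks 3.1's actual file layout may differ.]

```shell
# Every node presents the same host key, so a single wildcard entry in
# ssh_known_hosts matches the whole cluster, and a reinstalled node never
# triggers the man-in-the-middle warning. KEY is placeholder material.
KEY="AAAAB3...sharedkey..."           # hypothetical base64 key blob
known_hosts=$(mktemp)
cat > "$known_hosts" <<EOF
compute-*,frontend ssh-rsa ${KEY}
EOF
cat "$known_hosts"
```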



From landman at scalableinformatics.com      Thu Dec 11 13:36:44 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 11 Dec 2003 16:36:44 -0500
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu>
References: <3FD8B153.6000205@inel.gov>
       <52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu>
Message-ID: <1071178604.6164.46.camel@squash.scalableinformatics.com>

Hi Mason:

  ETA? I have a non-functional cluster that I think I can make functional with
3.1. I would be happy to be a real-world beta/gamma tester for it
(immediately, e.g. today). Please send me a URL. ...

Joe

On Thu, 2003-12-11 at 16:16, Mason J. Katz wrote:
> Download 3.1 (out very soon now) and poke around. Basically there is a
> single SSH host key, and all the nodes have a copy. This kills the
> "man in the middle" warning every time you reinstall.
>
>     -mjk
>
> On Dec 11, 2003, at 10:02 AM, Andrew Shewmaker wrote:
>
> > "Mason J. Katz" <mjk at sdsc.edu> wrote:
> >
> > > We've also moved from this method to a single cluster-wide ssh key
> > for
> > > Rocks 3.1.
> >
> > How does a single key work? I have successfully set up ssh host
> > based authentication for some non-Rocks systems using
> >
> > http://www.omega.telia.net/vici/openssh/
> >
> > (Note that OpenSSH_3.7.1p2 requires one more setting in addition
> > to those mentioned in the above url.
> >
> > In <dir-of-ssh-conf-files>/ssh_config:
> > EnableSSHKeysign yes)
> >
> > But I thought it still requires that each host in the cluster has a key...
> > am I wrong? Do you do it differently?
> >
> > Thanks,
> >
> > Andrew
> >
> > --
> > Andrew Shewmaker, Associate Engineer
> > Phone: 1-208-526-1415
> > Idaho National Eng. and Environmental Lab.
> > P.O. Box 1625, M.S. 3605
> > Idaho Falls, Idaho 83415-3605
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615



From mjk at sdsc.edu Thu Dec 11 13:34:30 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Thu, 11 Dec 2003 13:34:30 -0800
Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
In-Reply-To: <1071178604.6164.46.camel@squash.scalableinformatics.com>
References: <3FD8B153.6000205@inel.gov>
<52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu>
<1071178604.6164.46.camel@squash.scalableinformatics.com>
Message-ID: <CD814510-2C21-11D8-832A-000A95DA5638@sdsc.edu>

We're too close to send out more betas right now, but if something bad
happens before Friday we'll reconsider. We are shooting for next week
- but absolutely before the holidays. ho ho ho. We recognize that our
delay on getting a current release out there is hurting new clusters,
and just having the latest Red Hat kernel is going to fix most of these
issues.

     -mjk


On Dec 11, 2003, at 1:36 PM, Joe Landman wrote:

> Hi Mason:
>
>    Eta? I have a non-functional cluster I think I can make function
> with
> 3.1. I would be happy to be a real world beta/gamma tester for it
> (immediately, eg. today). Please send me a URL. ...
>
> Joe
>
> On Thu, 2003-12-11 at 16:16, Mason J. Katz wrote:
>> Download 3.1 (out very soon now) and poke around. Basically there is
>> a
>> single SSH host key, and all the nodes have a copy. This kills the
>> "man in the middle" warning every time you reinstall.
>>
>>     -mjk
>>
>> On Dec 11, 2003, at 10:02 AM, Andrew Shewmaker wrote:
>>
>>> "Mason J. Katz" <mjk at sdsc.edu> wrote:
>>>
>>>> We've also moved from this method to a single cluster-wide ssh key
>>> for
>>>> Rocks 3.1.
>>>
>>> How does a single key work? I have successfully set up ssh host
>>> based authentication for some non-Rocks systems using
>>>
>>> http://www.omega.telia.net/vici/openssh/
>>>
>>> (Note that OpenSSH_3.7.1p2 requires one more setting in addition
>>> to those mentioned in the above url.
>>>
>>> In <dir-of-ssh-conf-files>/ssh_config:
>>> EnableSSHKeysign yes)
>>>
>>> But I thought it still requires that each host in the cluster has a key...
>>> am I wrong? Do you do it differently?
>>>
>>> Thanks,
>>>
>>> Andrew
>>>
>>> --
>>> Andrew Shewmaker, Associate Engineer
>>> Phone: 1-208-526-1415
>>> Idaho National Eng. and Environmental Lab.
>>> P.O. Box 1625, M.S. 3605
>>> Idaho Falls, Idaho 83415-3605
> --
> Joseph Landman, Ph.D
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web : http://scalableinformatics.com
> phone: +1 734 612 4615



From purikk at hotmail.com Thu Dec 11 15:06:17 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Thu, 11 Dec 2003 18:06:17 -0500
Subject: [Rocks-Discuss]Kernal of Rocks 3.0
References: <200312112001.hBBK1IJ18815@postal.sdsc.edu>
Message-ID: <BAY1-DAV391Zg8eBpx700008b71@hotmail.com>

Hi,
     I am a newbie to Rocks and have a few questions. I would appreciate help
with those.
1) What kernel does the latest Rocks use? If it's not the latest, can I use
the latest kernel, and how?
2) Is there any way to have more than one frontend node for failover
redundancy?
3) Did anybody install the Penguin compilers over the cluster?
Thanks
Regards,
Puru


From bruno at rocksclusters.org Thu Dec 11 15:42:27 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Thu, 11 Dec 2003 15:42:27 -0800
Subject: [Rocks-Discuss]Kernal of Rocks 3.0
In-Reply-To: <BAY1-DAV391Zg8eBpx700008b71@hotmail.com>
References: <200312112001.hBBK1IJ18815@postal.sdsc.edu> <BAY1-DAV391Zg8eBpx700008b71@hotmail.com>
Message-ID: <AD988A9F-2C33-11D8-B821-000A95C4E3B4@rocksclusters.org>

> 1) what kernel does latest rocks use, if its not latest can I use
> latest
> kernel and how?
our upcoming release (scheduled to release next week) has kernel
version 2.4.21. additionally, the new release includes documentation on
how to build your own kernel RPM from a kernel.org tarball.

> 2) is there any way to have more than 1 fronend nodes for failover
> redundancy?

no, that has not yet been implemented.

> 3) did anybody install penguin compilers over the cluster

i apologize, but i'm not familiar with the penguin compiler. we do have
experience with gnu compilers, intel compilers and the portland group
compilers. additionally, some folks in the rocks community have also
successfully deployed the lahey compiler.

  - gb



From oconnor at ucsd.edu Thu Dec 11 14:29:46 2003
From: oconnor at ucsd.edu (Edward O'Connor)
Date: Thu, 11 Dec 2003 14:29:46 -0800
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <ddptix48s6.fsf@oecpc11.ucsd.edu> (Edward O'Connor's message of
 "Fri, 22 Aug 2003 15:39:05 -0700")
References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu>
      <ddptix48s6.fsf@oecpc11.ucsd.edu>
Message-ID: <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu>

Hi everybody,

I'm trying to bring up some ia64 compute nodes in a cluster with an ia32
frontend. Normally, `cd /home/install; rocks-dist mirror dist` only sets
up the frontend to handle ia32 compute nodes. I tried to manhandle
`rocks-dist mirror` into mirroring the ia64 stuff from
ftp.rocksclusters.org by giving it the --arch=ia64 option, but that
didn't work, so I went ahead and did the mirroring step by hand.

After having done so, `rocks-dist dist` still doesn't do the right
thing. So, adding --arch=ia64 to that command yields this error output:

,----
| # rocks-dist --arch=ia64 dist
| Cleaning distribution
| Resolving versions (RPMs)
| Resolving versions (SRPMs)
| Adding support for rebuild distribution from source
| Creating files (symbolic links - fast)
| Creating symlinks to kickstart files
| Fixing Comps Database
| error - comps file is missing, skipping this step
| Generating hdlist (rpm database)
| error - could not find rpm anaconda-runtime
| error - could not find genhdlist
| Patching second stage loader (eKV, partioning, ...)
| error - could not find second stage, skipping this step
`----
So my question is, what do I need to do to the ia32 frontend to enable
it to kickstart an ia64 compute node? Thanks.


Ted

--
Edward O'Connor
oconnor at ucsd.edu



From gotero at linuxprophet.com Thu Dec 11 21:14:33 2003
From: gotero at linuxprophet.com (Glen Otero)
Date: Thu, 11 Dec 2003 21:14:33 -0800
Subject: Fwd: [Rocks-Discuss]RE: Have anyone successfully build a set of grid
compute nodes using Rocks?
Message-ID: <1279F870-2C62-11D8-AAC6-000A95CD8EC8@linuxprophet.com>

>
>
> We put two Itanium clusters and an x86 cluster together on a grid at
> SC2003 using Rocks 3.1 beta and the Grid Roll. Simple CA is installed
> on the cluster frontends for you, so all one has to do is create and
> exchange certificates and update the grid-mapfiles. This grid was a
> joint collaboration between SDSC, Promicro Systems and Callident.
>
> On Dec 11, 2003, at 12:08 AM, Nai Hong Hwa Francis wrote:
>
>>
>>
>>
>> Hi,
>>
>> Has anyone successfully built a set of grid compute nodes using Rocks
>> 3?
>> Anyone care to share?
>>
>>
>> Nai Hong Hwa Francis
>> Institute of Molecular and Cell Biology (A*STAR)
>> 30 Medical Drive
>> Singapore 117609.
>> DID: (65) 6874-6196
>>
>> -----Original Message-----
>> From: npaci-rocks-discussion-request at sdsc.edu
>> [mailto:npaci-rocks-discussion-request at sdsc.edu]
>> Sent: Thursday, December 11, 2003 11:54 AM
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs
>>
>> Send npaci-rocks-discussion mailing list submissions to
>>    npaci-rocks-discussion at sdsc.edu
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>
>> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>> or, via email, send a message with subject or body 'help' to
>>     npaci-rocks-discussion-request at sdsc.edu
>>
>>   You can reach the person managing the list at
>>      npaci-rocks-discussion-admin at sdsc.edu
>>
>>   When replying, please edit your Subject line so it is more specific
>>   than "Re: Contents of npaci-rocks-discussion digest..."
>>
>>
>>   Today's Topics:
>>
>>      1. RE: Do you have a list of the various models of Gigabit Ethernet
>>   Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis)
>>      2. Rocks 3.0.0 (Terrence Martin)
>>      3. Re: "TypeError: loop over non-sequence" when trying
>>          to build CD distro (V. Rowley)
>>
>>   --__--__--
>>
>>   Message: 1
>>   Date: Thu, 11 Dec 2003 09:45:18 +0800
>>   From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg>
>>   To: <npaci-rocks-discussion at sdsc.edu>
>>   Subject: [Rocks-Discuss]RE: Do you have a list of the various models
>>   of
>>   Gigabit Ethernet Interfaces compatible to Rocks 3?
>>
>>
>>
>>   Hi All,
>>
>>   Do you have a list of the various gigabit Ethernet interfaces that are
>>   compatible with Rocks 3?
>>
>>   I am changing my nodes' connectivity from 10/100 to 1000.
>>
>>   Has anyone done that, and what are the differences in performance or
>>   turnaround time?
>>
>>
>>
>>   Thanks and Regards
>>
>>   Nai Hong Hwa Francis
>>   Institute of Molecular and Cell Biology (A*STAR)
>>   30 Medical Drive
>>   Singapore 117609.
>>   DID: (65) 6874-6196
>>
>>   -----Original Message-----
>>   From: npaci-rocks-discussion-request at sdsc.edu
>>   [mailto:npaci-rocks-discussion-request at sdsc.edu]
>>   Sent: Thursday, December 11, 2003 9:25 AM
>>   To: npaci-rocks-discussion at sdsc.edu
>>   Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs
>>
>>   Send npaci-rocks-discussion mailing list submissions to
>>      npaci-rocks-discussion at sdsc.edu
>>
>>   To subscribe or unsubscribe via the World Wide Web, visit
>>   http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>>   or, via email, send a message with subject or body 'help' to
>>      npaci-rocks-discussion-request at sdsc.edu
>>
>>   You can reach the person managing the list at
>>      npaci-rocks-discussion-admin at sdsc.edu
>>
>>   When replying, please edit your Subject line so it is more specific
>>   than "Re: Contents of npaci-rocks-discussion digest..."
>>
>>
>>   Today's Topics:
>>
>>      1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
>>      2. Error during Make when building a new install floppy (Terrence
>>   Martin)
>>      3. Re: Error during Make when building a new install floppy (Tim
>>   Carlson)
>>      4. Re: Non-homogenous legacy hardware (Tim Carlson)
>>      5. ssh_known_hosts and ganglia (Jag)
>>      6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
>>      7. "TypeError: loop over non-sequence" when trying to build CD
>>   distro (V. Rowley)
>>      8. Re: one node short in "labels" (Greg Bruno)
>>      9. Re: "TypeError: loop over non-sequence" when trying to build CD
>>   distro (Mason J. Katz)
>>     10. Re: "TypeError: loop over non-sequence" when trying
>>           to build CD distro (V. Rowley)
>>     11. Re: "TypeError: loop over non-sequence" when trying to
>>           build CD distro (Tim Carlson)
>>
>>   -- __--__--
>>   Message: 1
>>   Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
>>   From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
>>   To: npaci-rocks-discussion at sdsc.edu
>>   Subject: [Rocks-Discuss]Non-homogenous legacy hardware
>>
>>
>>   I am integrating legacy systems into a ROCKS cluster, and have hit a
>>   snag with the auto-partition configuration: The new (old) systems have
>>   SCSI disks, while old (new) ones contain IDE. This is a non-issue so
>>   long as the initial install does its default partitioning. However, I
>>   have a "replace-auto-partition.xml" file which is unworkable for the SCSI
>>   based systems since it makes specific reference to "hda" rather than
>>   "sda."
>>
>>   I would like to have a site-nodes/replace-auto-partition.xml file with a
>>   conditional such that "hda" or "sda" is used, based on the name of the
>>   node (or some other criterion).
>>
>>   Is this possible?
>>
>>   Thanks in advance. If this is out there on the mailing list archives,
>>   a pointer would be greatly appreciated.
>>
>>   -Chris Dwan
>>    The University of Minnesota
>>
>>   -- __--__--
>>   Message: 2
>>   Date: Wed, 10 Dec 2003 12:09:11 -0800
>>   From: Terrence Martin <tmartin at physics.ucsd.edu>
>>   To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
>>   Subject: [Rocks-Discuss]Error during Make when building a new install
>>   floppy
>>
>>   I get the following error when I try to rebuild a boot floppy for
>>   rocks.
>>
>>   This is with the default CVS checkout with an update today according
>>   to=20
>>   the rocks userguide. I have not actually attempted to make any
>>   changes.
>>
>>   make[3]: Leaving directory
>>   `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
>>   make[2]: Leaving directory
>>   `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
>>   strip -o loader         anaconda-7.3/loader/loader
>>   strip: anaconda-7.3/loader/loader: No such file or directory
>>   make[1]: *** [loader] Error 1
>>   make[1]: Leaving directory
>>   `/home/install/rocks/src/rocks/boot/7.3/loader'
>>   make: *** [loader] Error 2
>>
>>   Of course I could avoid all of this altogether and just put my binary
>>   module into the appropriate location in the boot image.
>>
>>   Would it be correct to modify the following image file with my
>>   changes
>>   and then write it to a floppy via dd?
>>
>>   /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>>
>>   Basically I am injecting an updated e1000 driver with changes to
>>   pcitable to support the address of my gigabit cards.
>>
>>   Terrence
>>
>>
>>   -- __--__--
>>   Message: 3
>>   Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
>>   From: Tim Carlson <tim.carlson at pnl.gov>
>>   Subject: Re: [Rocks-Discuss]Error during Make when building a new
>>   install floppy
>>   To: Terrence Martin <tmartin at physics.ucsd.edu>
>>   Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
>> Reply-to: Tim Carlson <tim.carlson at pnl.gov>
>>
>> On Wed, 10 Dec 2003, Terrence Martin wrote:
>>
>>> I get the following error when I try to rebuild a boot floppy for
>> rocks.
>>>
>>
>> You can't make a boot floppy with Rocks 3.0. That isn't supported. Or
>> at
>> least it wasn't the last time I checked
>>
>>> Of course I could avoid all of this together and just put my binary
>>> module into the appropriate location in the boot image.
>>>
>>> Would it be correct to modify the following image file with my
>>> changes
>>> and then write it to a floppy via dd?
>>>
>>>
>> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>>>
>>> Basically I am injecting an updated e1000 driver with changes to
>>> pcitable to support the address of my gigabit cards.
>>
>> Modifying the bootnet.img is about 1/3 of what you need to do if you go
>> down that path. You also need to work on netstg1.img, and you'll need to
>> update the driver in the kernel rpm that gets installed on the box.
>> None of this is trivial.
>>
>> If it were me, I would go down the same path I took for updating the
>> AIC79XX driver
>>
>> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html
>>
>> Tim
>>
>> Tim Carlson
>> Voice: (509) 376 3423
>> Email: Tim.Carlson at pnl.gov
>> EMSL UNIX System Support
>>
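[Editorial note: the bootnet.img route discussed above can be sketched roughly as follows. This is a hypothetical outline only; the file layout inside the image and the mount point are illustrative, and as Tim says it covers only the boot floppy, not netstg1.img or the kernel rpm on the installed node.]

```shell
#!/bin/sh
# Hypothetical sketch of the bootnet.img approach from this thread.
# Paths are illustrative. The commands are echoed so the sequence can
# be read without root; drop "echo" to actually run them.
IMG=bootnet.img
MNT=/mnt/bootimg

echo "mount -o loop $IMG $MNT"          # the image holds a mountable filesystem
echo "cp e1000.o $MNT/modules/"         # inject the updated driver (path is a guess)
echo "edit $MNT/modules/pcitable"       # add the gigabit card's PCI id
echo "umount $MNT"
echo "dd if=$IMG of=/dev/fd0 bs=1440k"  # write the modified image to floppy
```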
>>
>> -- __--__--
>> Message: 4
>> Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST)
>> From: Tim Carlson <tim.carlson at pnl.gov>
>> Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware
>> To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
>> Cc: npaci-rocks-discussion at sdsc.edu
>> Reply-to: Tim Carlson <tim.carlson at pnl.gov>
>>
>> On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote:
>>
>>>
>>> I am integrating legacy systems into a ROCKS cluster, and have hit a
>>> snag with the auto-partition configuration: The new (old) systems
>> have
>>> SCSI disks, while old (new) ones contain IDE. This is a non-issue so
>>> long as the initial install does its default partitioning. However, I
>>> have a "replace-auto-partition.xml" file which is unworkable for the
>>> SCSI-based systems since it makes specific reference to "hda" rather
>>> than "sda."
>>
>> If you have just a single drive, then you should be able to skip the
>> "--ondisk" bits of your "part" command
>>
>> Otherwise, you would first have to do something ugly like the
>> following:
>>
>> http://penguin.epfl.ch/slides/kickstart/ks.cfg
>>
>> You could probably (maybe) wrap most of that in an
>> <eval sh="bash">
>> </eval>
>>
>> block in the <main> block.
>>
>> Just guessing.. haven't tried this.
>>
>> Tim
>>
>> Tim Carlson
>> Voice: (509) 376 3423
>> Email: Tim.Carlson at pnl.gov
>> EMSL UNIX System Support
>>
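[Editorial note: a hedged sketch of what Tim suggests above. Dropping the "--ondisk" option from each part line in replace-auto-partition.xml lets anaconda place the partitions on whatever single drive it finds, hda or sda alike. The sizes and mount points below are made up, and the exact node-XML syntax may differ across Rocks versions; this is untested.]

```xml
<!-- replace-auto-partition.xml: hypothetical sketch, untested.
     No "ondisk" option, so the single drive is auto-selected,
     whether it shows up as hda (IDE) or sda (SCSI). -->
<main>
  <part> / --size 4096 </part>
  <part> swap --size 1024 </part>
  <part> /state/partition1 --size 1 --grow </part>
</main>
```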
>>
>> -- __--__--
>> Message: 5
>> From: Jag <agrajag at dragaera.net>
>> To: npaci-rocks-discussion at sdsc.edu
>> Date: Wed, 10 Dec 2003 13:21:07 -0500
>> Subject: [Rocks-Discuss]ssh_known_hosts and ganglia
>>
>> I noticed a previous post on this list
>> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
>> indicating that Rocks distributes ssh keys for all the nodes over
>> ganglia. Can anyone enlighten me as to how this is done?
>>
>> I looked through the ganglia docs and didn't see anything indicating
>> how to do this, so I'm assuming Rocks made some changes. Unfortunately
>> the rocks iso images don't seem to contain srpms, so I'm now coming
>> here. What did Rocks do to ganglia to make the distribution of ssh
>> keys work?
>>
>> Also, does anyone know where Rocks SRPMs can be found? I've done
>> quite a bit of searching, but haven't found them anywhere.
>>
>>
>> -- __--__--
>> Message: 6
>> Cc: npaci-rocks-discussion at sdsc.edu
>> From: "Mason J. Katz" <mjk at sdsc.edu>
>> Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia
>> Date: Wed, 10 Dec 2003 14:39:15 -0800
>> To: Jag <agrajag at dragaera.net>
>>
>> Most of the SRPMS are on our FTP site, but we've screwed this up
>> before. The SRPMS are entirely Rocks specific, so they are of little
>> value outside of Rocks. You can also check out our CVS tree
>> (cvs.rocksclusters.org), where rocks/src/ganglia shows what we add. We
>> have a ganglia-python package we created to allow us to write our own
>> metrics at a higher level than the provided gmetric application. We've
>> also moved from this method to a single cluster-wide ssh key for Rocks
>> 3.1.
>>
>>    -mjk
>>
>> On Dec 10, 2003, at 10:21 AM, Jag wrote:
>>
>>> I noticed a previous post on this list
>>> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html)
>>> indicating that Rocks distributes ssh keys for all the nodes over
>>> ganglia. Can anyone enlighten me as to how this is done?
>>>
>>> I looked through the ganglia docs and didn't see anything indicating
>>> how to do this, so I'm assuming Rocks made some changes. Unfortunately
>>> the rocks iso images don't seem to contain srpms, so I'm now coming here.
>>> What did Rocks do to ganglia to make the distribution of ssh keys
>> work?
>>>
>>> Also, does anyone know where Rocks SRPMs can be found? I've done
>>> quite a bit of searching, but haven't found them anywhere.
>>
>>
>> -- __--__--
>> Message: 7
>> Date: Wed, 10 Dec 2003 14:43:49 -0800
>> From: "V. Rowley" <vrowley at ucsd.edu>
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when
>> trying to build CD distro
>>
>> When I run this:
>>
>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>> rocks-dist --dist=cdrom cdrom
>>
>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Adding support for rebuild distribution from source
>>> Creating files (symbolic links - fast)
>>> Creating symlinks to kickstart files
>>> Fixing Comps Database
>>> Generating hdlist (rpm database)
>>> Patching second stage loader (eKV, partioning, ...)
>>>      patching "rocks-ekv" into distribution ...
>>>      patching "rocks-piece-pipe" into distribution ...
>>>      patching "PyXML" into distribution ...
>>>      patching "expat" into distribution ...
>>>      patching "rocks-pylib" into distribution ...
>>>      patching "MySQL-python" into distribution ...
>>>      patching "rocks-kickstart" into distribution ...
>>>      patching "rocks-kickstart-profiles" into distribution ...
>>>      patching "rocks-kickstart-dtds" into distribution ...
>>>      building CRAM filesystem ...
>>> Cleaning distribution
>>> Resolving versions (RPMs)
>>> Resolving versions (SRPMs)
>>> Creating symlinks to kickstart files
>>> Generating hdlist (rpm database)
>>> Segregating RPMs (rocks, non-rocks)
>>> sh: ./kickstart.cgi: No such file or directory
>>> sh: ./kickstart.cgi: No such file or directory
>>> Traceback (innermost last):
>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>      app.run()
>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>      eval('self.command_%s()' % (command))
>>>   File "<string>", line 0, in ?
>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>      builder.build()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>      (rocks, nonrocks) = self.segregateRPMS()
>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>> segregateRPMS
>>>      for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>> Any ideas?
>>
>> --
>> Vicky Rowley                              email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network      work: (858) 536-5980
>> University of California, San Diego           fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>
>>
>> -- __--__--
>> Message: 8
>> Cc: rocks <npaci-rocks-discussion at sdsc.edu>
>> From: Greg Bruno <bruno at rocksclusters.org>
>> Subject: Re: [Rocks-Discuss]one node short in "labels"
>> Date: Wed, 10 Dec 2003 15:12:49 -0800
>> To: Vincent Fox <vincent_b_fox at yahoo.com>
>>
>>> So I go to the "labels" selection on the web page to print out the
>>> pretty labels. What a nice idea by the way!
>>>
>>> EXCEPT....it's one node short! I go up to 0-13 and this stops at
>>> 0-12. Any ideas where I should check to fix this?
>>
>> yeah, we found this corner case -- it'll be fixed in the next release.
>>
>> thanks for the bug report.
>>
>>   - gb
>>
>>
>> -- __--__--
>> Message: 9
>> Cc: npaci-rocks-discussion at sdsc.edu
>> From: "Mason J. Katz" <mjk at sdsc.edu>
>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
>> trying to build CD distro
>> Date: Wed, 10 Dec 2003 15:16:27 -0800
>> To: "V. Rowley" <vrowley at ucsd.edu>
>>
>> It looks like someone moved the profiles directory to profiles.orig.
>>
>>    -mjk
>>
>>
>> [root at rocks14 install]# ls -l
>> total 56
>> drwxr-sr-x    3 root      wheel        4096 Dec 10 21:16 cdrom
>> drwxrwsr-x    5 root      wheel        4096 Dec 10 20:38 contrib.orig
>> drwxr-sr-x    3 root      wheel        4096 Dec 10 21:07 ftp.rocksclusters.org
>> drwxr-sr-x    3 root      wheel        4096 Dec 10 20:38 ftp.rocksclusters.org.orig
>> -r-xrwsr-x    1 root      wheel       19254 Sep 3 12:40 kickstart.cgi
>> drwxr-xr-x    3 root      root         4096 Dec 10 20:38 profiles.orig
>> drwxr-sr-x    3 root      wheel        4096 Dec 10 21:15 rocks-dist
>> drwxrwsr-x    3 root      wheel        4096 Dec 10 20:38
>> rocks-dist.orig
>> drwxr-sr-x     3 root     wheel        4096 Dec 10 21:02 src
>> drwxr-sr-x     4 root     wheel        4096 Dec 10 20:49 src.foo
>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>>
>>> When I run this:
>>>
>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>>> rocks-dist --dist=cdrom cdrom
>>>
>>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>>
>>>> Cleaning distribution
>>>> Resolving versions (RPMs)
>>>> Resolving versions (SRPMs)
>>>> Adding support for rebuild distribution from source
>>>> Creating files (symbolic links - fast)
>>>> Creating symlinks to kickstart files
>>>> Fixing Comps Database
>>>> Generating hdlist (rpm database)
>>>> Patching second stage loader (eKV, partioning, ...)
>>>>      patching "rocks-ekv" into distribution ...
>>>>      patching "rocks-piece-pipe" into distribution ...
>>>>      patching "PyXML" into distribution ...
>>>>      patching "expat" into distribution ...
>>>>      patching "rocks-pylib" into distribution ...
>>>>      patching "MySQL-python" into distribution ...
>>>>      patching "rocks-kickstart" into distribution ...
>>>>      patching "rocks-kickstart-profiles" into distribution ...
>>>>      patching "rocks-kickstart-dtds" into distribution ...
>>>>      building CRAM filesystem ...
>>>> Cleaning distribution
>>>> Resolving versions (RPMs)
>>>> Resolving versions (SRPMs)
>>>> Creating symlinks to kickstart files
>>>> Generating hdlist (rpm database)
>>>> Segregating RPMs (rocks, non-rocks)
>>>> sh: ./kickstart.cgi: No such file or directory
>>>> sh: ./kickstart.cgi: No such file or directory
>>>> Traceback (innermost last):
>>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>>      app.run()
>>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>>      eval('self.command_%s()' % (command))
>>>>   File "<string>", line 0, in ?
>>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>>      builder.build()
>>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>>      (rocks, nonrocks) = self.segregateRPMS()
>>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>>> segregateRPMS
>>>>      for pkg in ks.getSection('packages'):
>>>> TypeError: loop over non-sequence
>>>
>>> Any ideas?
>>>
>>> --
>>> Vicky Rowley                              email: vrowley at ucsd.edu
>>> Biomedical Informatics Research Network      work: (858) 536-5980
>>> University of California, San Diego           fax: (858) 822-0828
>>> 9500 Gilman Drive
>>> La Jolla, CA 92093-0715
>>>
>>>
>>> See pictures from our trip to China at
>>> http://www.sagacitech.com/Chinaweb
>>
>>
>> -- __--__--
>> Message: 10
>> Date: Wed, 10 Dec 2003 16:50:16 -0800
>> From: "V. Rowley" <vrowley at ucsd.edu>
>> To: "Mason J. Katz" <mjk at sdsc.edu>
>> CC: npaci-rocks-discussion at sdsc.edu
>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
>> trying
>> to build CD distro
>>
>> Yep, I did that, but only *AFTER* getting the error. [Thought it was
>> generated by the rocks-dist sequence, but apparently not.] Go ahead.
>> Move it back. Same difference.
>>
>> Vicky
>>
>> Mason J. Katz wrote:
>>> It looks like someone moved the profiles directory to profiles.orig.
>>>
>>>      -mjk
>>>
>>>
>>> [root at rocks14 install]# ls -l
>>> total 56
>>> drwxr-sr-x     3 root    wheel         4096 Dec 10 21:16 cdrom
>>> drwxrwsr-x     5 root    wheel         4096 Dec 10 20:38 contrib.orig
>>> drwxr-sr-x     3 root    wheel         4096 Dec 10 21:07 ftp.rocksclusters.org
>>> drwxr-sr-x     3 root    wheel         4096 Dec 10 20:38 ftp.rocksclusters.org.orig
>>> -r-xrwsr-x     1 root    wheel        19254 Sep 3 12:40 kickstart.cgi
>>> drwxr-xr-x     3 root    root          4096 Dec 10 20:38 profiles.orig
>>> drwxr-sr-x     3 root    wheel         4096 Dec 10 21:15 rocks-dist
>>> drwxrwsr-x     3 root    wheel         4096 Dec 10 20:38
>> rocks-dist.orig
>>> drwxr-sr-x     3 root    wheel         4096 Dec 10 21:02 src
>>> drwxr-sr-x     4 root    wheel         4096 Dec 10 20:49 src.foo
>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>>>
>>>> When I run this:
>>>>
>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>>>> rocks-dist --dist=cdrom cdrom
>>>>
>>>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>>>
>>>>> Cleaning distribution
>>>>> Resolving versions (RPMs)
>>>>> Resolving versions (SRPMs)
>>>>> Adding support for rebuild distribution from source
>>>>> Creating files (symbolic links - fast)
>>>>> Creating symlinks to kickstart files
>>>>> Fixing Comps Database
>>>>> Generating hdlist (rpm database)
>>>>> Patching second stage loader (eKV, partioning, ...)
>>>>>      patching "rocks-ekv" into distribution ...
>>>>>      patching "rocks-piece-pipe" into distribution ...
>>>>>      patching "PyXML" into distribution ...
>>>>>      patching "expat" into distribution ...
>>>>>      patching "rocks-pylib" into distribution ...
>>>>>      patching "MySQL-python" into distribution ...
>>>>>      patching "rocks-kickstart" into distribution ...
>>>>>      patching "rocks-kickstart-profiles" into distribution ...
>>>>>      patching "rocks-kickstart-dtds" into distribution ...
>>>>>      building CRAM filesystem ...
>>>>> Cleaning distribution
>>>>> Resolving versions (RPMs)
>>>>> Resolving versions (SRPMs)
>>>>> Creating symlinks to kickstart files
>>>>> Generating hdlist (rpm database)
>>>>> Segregating RPMs (rocks, non-rocks)
>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>> Traceback (innermost last):
>>>>>    File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>>>      app.run()
>>>>>    File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>>>      eval('self.command_%s()' % (command))
>>>>>    File "<string>", line 0, in ?
>>>>>    File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>>>      builder.build()
>>>>>    File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>>>      (rocks, nonrocks) = self.segregateRPMS()
>>>>>    File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>>>> segregateRPMS
>>>>>      for pkg in ks.getSection('packages'):
>>>>> TypeError: loop over non-sequence
>>>>
>>>>
>>>> Any ideas?
>>>>
>>>> --
>>>> Vicky Rowley                              email: vrowley at ucsd.edu
>>>> Biomedical Informatics Research Network      work: (858) 536-5980
>>>> University of California, San Diego           fax: (858) 822-0828
>>>> 9500 Gilman Drive
>>>> La Jolla, CA 92093-0715
>>>>
>>>>
>>>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>>
>>>
>>>
>>
>> --
>> Vicky Rowley                              email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network      work: (858) 536-5980
>> University of California, San Diego           fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>
>>
>> -- __--__--
>> Message: 11
>> Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST)
>> From: Tim Carlson <tim.carlson at pnl.gov>
>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
>> trying to
>> build CD distro
>> To: "V. Rowley" <vrowley at ucsd.edu>
>> Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu
>> Reply-to: Tim Carlson <tim.carlson at pnl.gov>
>>
>> On Wed, 10 Dec 2003, V. Rowley wrote:
>>
>> Did you remove python by chance? kickstart.cgi calls python directly
>> in /usr/bin/python while rocks-dist does an "env python"
>>
>> Tim
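
[Editorial note: the distinction Tim draws can be seen with a tiny shell check. The script below is only an illustration, not part of Rocks; the interpreter paths are the ones named in the thread.]

```shell
#!/bin/sh
# "env python"-style lookup resolves through $PATH, so any installed
# python will do.
command -v python || echo "no python anywhere on PATH"
# A hardcoded path, like the one kickstart.cgi uses, needs this exact file.
PYBIN=/usr/bin/python
[ -x "$PYBIN" ] && echo "$PYBIN present" || echo "$PYBIN missing: a hardcoded shebang would fail"
```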
>>
>>> Yep, I did that, but only *AFTER* getting the error. [Thought it was
>>> generated by the rocks-dist sequence, but apparently not.] Go ahead.
>>> Move it back. Same difference.
>>>
>>> Vicky
>>>
>>> Mason J. Katz wrote:
>>>> It looks like someone moved the profiles directory to profiles.orig.
>>>>
>>>>      -mjk
>>>>
>>>>
>>>> [root at rocks14 install]# ls -l
>>>> total 56
>>>> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:16 cdrom
>>>> drwxrwsr-x    5 root     wheel         4096 Dec 10 20:38 contrib.orig
>>>> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:07
>>>> ftp.rocksclusters.org
>>>> drwxr-sr-x    3 root     wheel         4096 Dec 10 20:38
>>>> ftp.rocksclusters.org.orig
>>>> -r-xrwsr-x    1 root     wheel        19254 Sep 3 12:40
>> kickstart.cgi
>>>> drwxr-xr-x    3 root     root          4096 Dec 10 20:38
>> profiles.orig
>>>> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:15 rocks-dist
>>>> drwxrwsr-x    3 root     wheel         4096 Dec 10 20:38
>> rocks-dist.orig
>>>> drwxr-sr-x    3 root     wheel         4096 Dec 10 21:02 src
>>>> drwxr-sr-x    4 root     wheel         4096 Dec 10 20:49 src.foo
>>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>>>>
>>>>> When I run this:
>>>>>
>>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>>>>> rocks-dist --dist=cdrom cdrom
>>>>>
>>>>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>>>>
>>>>>> Cleaning distribution
>>>>>> Resolving versions (RPMs)
>>>>>> Resolving versions (SRPMs)
>>>>>> Adding support for rebuild distribution from source
>>>>>> Creating files (symbolic links - fast)
>>>>>> Creating symlinks to kickstart files
>>>>>> Fixing Comps Database
>>>>>> Generating hdlist (rpm database)
>>>>>> Patching second stage loader (eKV, partioning, ...)
>>>>>>     patching "rocks-ekv" into distribution ...
>>>>>>     patching "rocks-piece-pipe" into distribution ...
>>>>>>     patching "PyXML" into distribution ...
>>>>>>     patching "expat" into distribution ...
>>>>>>     patching "rocks-pylib" into distribution ...
>>>>>>     patching "MySQL-python" into distribution ...
>>>>>>     patching "rocks-kickstart" into distribution ...
>>>>>>     patching "rocks-kickstart-profiles" into distribution ...
>>>>>>     patching "rocks-kickstart-dtds" into distribution ...
>>>>>>     building CRAM filesystem ...
>>>>>> Cleaning distribution
>>>>>> Resolving versions (RPMs)
>>>>>> Resolving versions (SRPMs)
>>>>>> Creating symlinks to kickstart files
>>>>>> Generating hdlist (rpm database)
>>>>>> Segregating RPMs (rocks, non-rocks)
>>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>>> Traceback (innermost last):
>>>>>>   File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>>>>     app.run()
>>>>>>   File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>>>>     eval('self.command_%s()' % (command))
>>>>>>   File "<string>", line 0, in ?
>>>>>>   File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>>>>     builder.build()
>>>>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>>>>     (rocks, nonrocks) = self.segregateRPMS()
>>>>>>   File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>>>>> segregateRPMS
>>>>>>     for pkg in ks.getSection('packages'):
>>>>>> TypeError: loop over non-sequence
>>>>>
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> --
>>>>> Vicky Rowley                             email: vrowley at ucsd.edu
>>>>> Biomedical Informatics Research Network     work: (858) 536-5980
>>>>> University of California, San Diego          fax: (858) 822-0828
>>>>> 9500 Gilman Drive
>>>>> La Jolla, CA 92093-0715
>>>>>
>>>>>
>>>>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>>>
>>>>
>>>>
>>>
>>> --
>>> Vicky Rowley                             email: vrowley at ucsd.edu
>>> Biomedical Informatics Research Network     work: (858) 536-5980
>>> University of California, San Diego          fax: (858) 822-0828
>>> 9500 Gilman Drive
>>> La Jolla, CA 92093-0715
>>>
>>>
>>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>>
>>>
>>
>>
>>
>>
>> -- __--__--
>> _______________________________________________
>> npaci-rocks-discussion mailing list
>> npaci-rocks-discussion at sdsc.edu
>> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>>
>>
>> End of npaci-rocks-discussion Digest
>>
>>
>> DISCLAIMER:
>> This email is confidential and may be privileged. If you are not the
>> intended recipient, please delete it and notify us immediately.
>> Please do not copy or use it for any purpose, or disclose its
>> contents to any other person as it may be an offence under the
>> Official Secrets Act. Thank you.
>>
>> --__--__--
>>
>> Message: 2
>> Date: Wed, 10 Dec 2003 18:03:41 -0800
>> From: Terrence Martin <tmartin at physics.ucsd.edu>
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: [Rocks-Discuss]Rocks 3.0.0
>>
>> I am having a problem on install of rocks 3.0.0 on my new cluster.
>>
>> The python error occurs right after anaconda starts and just before
>> the install asks for the roll CDROM.
>>
>> The error refers to an inability to find or load rocks.file. I think
>> the error is associated with the window that pops up and asks you to
>> put the roll CDROM in.
>>
>> The process I followed to get to this point is
>>
>> Put the Rocks 3.0.0 CDROM into the CDROM drive
>> Boot the system
>> At the prompt type frontend
>> Wait till anaconda starts
>> Error referring to unable to load rocks.file.
>>
>> I have successfully installed rocks on a smaller cluster but that has
>> different hardware. I used the same CDROM for both installs.
>>
>> Any thoughts?
>>
>> Terrence
>>
>>
>>
>> --__--__--
>>
>> Message: 3
>> Date: Wed, 10 Dec 2003 19:52:49 -0800
>> From: "V. Rowley" <vrowley at ucsd.edu>
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
>> trying
>> to build CD distro
>>
>> Looks like python is okay:
>>
>>> [root at rocks14 birn-oracle1]# which python
>>> /usr/bin/python
>>> [root at rocks14 birn-oracle1]# python --help
>>> Unknown option: --
>>> usage: python [option] ... [-c cmd | file | -] [arg] ...
>>> Options and arguments (and corresponding environment variables):
>>> -d      : debug output from parser (also PYTHONDEBUG=x)
>>> -i      : inspect interactively after running script, (also
>> PYTHONINSPECT=x)
>>>           and force prompts, even if stdin does not appear to be a
>> terminal
>>> -O      : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
>>> -OO     : remove doc-strings in addition to the -O optimizations
>>> -S      : don't imply 'import site' on initialization
>>> -t      : issue warnings about inconsistent tab usage (-tt: issue
>> errors)
>>> -u      : unbuffered binary stdout and stderr (also
>>> PYTHONUNBUFFERED=x)
>>> -v      : verbose (trace import statements) (also PYTHONVERBOSE=x)
>>> -x      : skip first line of source, allowing use of non-Unix forms of
>> #!cmd
>>> -X      : disable class based built-in exceptions
>>> -c cmd : program passed in as string (terminates option list)
>>> file    : program read from script file
>>> -       : program read from stdin (default; interactive mode if a tty)
>>> arg ...: arguments passed to program in sys.argv[1:]
>>> Other environment variables:
>>> PYTHONSTARTUP: file executed on interactive startup (no default)
>>> PYTHONPATH   : ':'-separated list of directories prefixed to the
>>>                 default module search path. The result is sys.path.
>>> PYTHONHOME   : alternate <prefix> directory (or
>> <prefix>:<exec_prefix>).
>>>                 The default module search path uses
>>> <prefix>/python1.5.
>>> [root at rocks14 birn-oracle1]#
>>
>>
>>
>> Tim Carlson wrote:
>>> On Wed, 10 Dec 2003, V. Rowley wrote:
>>>
>>> Did you remove python by chance? kickstart.cgi calls python directly
>>> in /usr/bin/python while rocks-dist does an "env python"
>>>
>>> Tim
>>>
>>>
>>>> Yep, I did that, but only *AFTER* getting the error. [Thought it
>>>> was
>>>> generated by the rocks-dist sequence, but apparently not.] Go
>>>> ahead.
>>>> Move it back. Same difference.
>>>>
>>>> Vicky
>>>>
>>>> Mason J. Katz wrote:
>>>>
>>>>> It looks like someone moved the profiles directory to
>>>>> profiles.orig.
>>>>>
>>>>>    -mjk
>>>>>
>>>>>
>>>>> [root at rocks14 install]# ls -l
>>>>> total 56
>>>>> drwxr-sr-x     3 root     wheel        4096 Dec 10 21:16 cdrom
>>>>> drwxrwsr-x     5 root     wheel        4096 Dec 10 20:38
>>>>> contrib.orig
>>>>> drwxr-sr-x     3 root     wheel        4096 Dec 10 21:07
>>>>> ftp.rocksclusters.org
>>>>> drwxr-sr-x     3 root     wheel        4096 Dec 10 20:38
>>>>> ftp.rocksclusters.org.orig
>>>>> -r-xrwsr-x     1 root     wheel       19254 Sep 3 12:40
>>>>> kickstart.cgi
>>>>> drwxr-xr-x     3 root     root         4096 Dec 10 20:38
>>>>> profiles.orig
>>>>> drwxr-sr-x     3 root     wheel        4096 Dec 10 21:15 rocks-dist
>>>>> drwxrwsr-x     3 root     wheel        4096 Dec 10 20:38
>> rocks-dist.orig
>>>>> drwxr-sr-x     3 root     wheel        4096 Dec 10 21:02 src
>>>>> drwxr-sr-x     4 root     wheel        4096 Dec 10 20:49 src.foo
>>>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
>>>>>
>>>>>
>>>>>> When I run this:
>>>>>>
>>>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;
>>>>>> rocks-dist --dist=cdrom cdrom
>>>>>>
>>>>>> on a server installed with ROCKS 3.0.0, I eventually get this:
>>>>>>
>>>>>>
>>>>>>> Cleaning distribution
>>>>>>> Resolving versions (RPMs)
>>>>>>> Resolving versions (SRPMs)
>>>>>>> Adding support for rebuild distribution from source
>>>>>>> Creating files (symbolic links - fast)
>>>>>>> Creating symlinks to kickstart files
>>>>>>> Fixing Comps Database
>>>>>>> Generating hdlist (rpm database)
>>>>>>> Patching second stage loader (eKV, partioning, ...)
>>>>>>>    patching "rocks-ekv" into distribution ...
>>>>>>>    patching "rocks-piece-pipe" into distribution ...
>>>>>>>    patching "PyXML" into distribution ...
>>>>>>>    patching "expat" into distribution ...
>>>>>>>    patching "rocks-pylib" into distribution ...
>>>>>>>    patching "MySQL-python" into distribution ...
>>>>>>>    patching "rocks-kickstart" into distribution ...
>>>>>>>    patching "rocks-kickstart-profiles" into distribution ...
>>>>>>>    patching "rocks-kickstart-dtds" into distribution ...
>>>>>>>    building CRAM filesystem ...
>>>>>>> Cleaning distribution
>>>>>>> Resolving versions (RPMs)
>>>>>>> Resolving versions (SRPMs)
>>>>>>> Creating symlinks to kickstart files
>>>>>>> Generating hdlist (rpm database)
>>>>>>> Segregating RPMs (rocks, non-rocks)
>>>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>>>> sh: ./kickstart.cgi: No such file or directory
>>>>>>> Traceback (innermost last):
>>>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ?
>>>>>>>    app.run()
>>>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run
>>>>>>>    eval('self.command_%s()' % (command))
>>>>>>> File "<string>", line 0, in ?
>>>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>>>>>>    builder.build()
>>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>>>>>>    (rocks, nonrocks) = self.segregateRPMS()
>>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in
>>>>>>> segregateRPMS
>>>>>>>    for pkg in ks.getSection('packages'):
>>>>>>> TypeError: loop over non-sequence
>>>>>>
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> --
>>>>>> Vicky Rowley                              email: vrowley at ucsd.edu
>>>>>> Biomedical Informatics Research Network      work: (858) 536-5980
>>>>>> University of California, San Diego           fax: (858) 822-0828
>>>>>> 9500 Gilman Drive
>>>>>> La Jolla, CA 92093-0715
>>>>>>
>>>>>>
>>>>>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>>>>
>>>>>
>>>>>
>>>> --
>>>> Vicky Rowley                              email: vrowley at ucsd.edu
>>>> Biomedical Informatics Research Network      work: (858) 536-5980
>>>> University of California, San Diego           fax: (858) 822-0828
>>>> 9500 Gilman Drive
>>>> La Jolla, CA 92093-0715
>>>>
>>>>
>>>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Vicky Rowley                              email: vrowley at ucsd.edu
>> Biomedical Informatics Research Network      work: (858) 536-5980
>> University of California, San Diego           fax: (858) 822-0828
>> 9500 Gilman Drive
>> La Jolla, CA 92093-0715
>>
>>
>> See pictures from our trip to China at
>> http://www.sagacitech.com/Chinaweb
>>
>>
>>
>> --__--__--
>>
>> _______________________________________________
>> npaci-rocks-discussion mailing list
>> npaci-rocks-discussion at sdsc.edu
>> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>>
>>
>> End of npaci-rocks-discussion Digest
>>
>>
>>
>>
> Glen Otero, Ph.D.
> Linux Prophet
> 619.917.1772
>
>
Glen Otero, Ph.D.
Linux Prophet
619.917.1772

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 35605 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031211/1a0b38fb/attachment-0001.bin

From tmartin at physics.ucsd.edu Fri Dec 12 10:26:58 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Fri, 12 Dec 2003 10:26:58 -0800
Subject: [Rocks-Discuss]ftp.rocksclusters.org mirror?
Message-ID: <3FDA0872.8010405@physics.ucsd.edu>

I was wondering, does the command rocks-dist do anything else besides
call wget on the correct tree at ftp.rocksclusters.org?

I ask because some firewall restrictions on a system I am hesitant to
fiddle are preventing me from running rocks-dist mirror from my head
node. I would like to download the mirror of the rocks distro on another
system, transfer the tree and then run rocks-dist dist to rebuild the
rocks for my compute nodes. Is this reasonable?

Also am I going to run into any problems with rocks 3.0.0 having
installed the head node on a UP system but my compute nodes are SMP? I
am assuming that once I get all of the packages into rocks (currently
there are no smp kernels on the head node) the compute nodes will
install the right kernel?

BTW thanks for the help so far. The trick, it seems, to getting Rocks
3.0.0 on these supermicro systems is to install rocks on the hard drive
in a separate computer and then install the hard disk.

Thanks,

Terrence




From mjk at sdsc.edu Fri Dec 12 10:48:17 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 10:48:17 -0800
Subject: [Rocks-Discuss]ftp.rocksclusters.org mirror?
In-Reply-To: <3FDA0872.8010405@physics.ucsd.edu>
References: <3FDA0872.8010405@physics.ucsd.edu>
Message-ID: <BF99287A-2CD3-11D8-A2DC-000A95DA5638@sdsc.edu>

- Yes, "rocks-dist mirror" does a python system() call to run the wget
application. It does this several times for the various directories it
needs.

- No, the compute nodes do not need to match the SMPness of the
frontend. All installations are done with Red Hat Kickstart (plus our
pixie dust), so hardware is auto-detected for you. This is not disk
imaging :)

       -mjk
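Since rocks-dist mirror is just shelling out to wget, the offline workaround Terrence asks about might be sketched like this. This is only a sketch: the exact tree path under ftp.rocksclusters.org and the wget flags are assumptions, not taken from rocks-dist itself.

```shell
# On a machine outside the firewall: fetch the distribution tree.
# (URL path is illustrative -- match it to the tree for your Rocks version.)
wget --mirror --no-parent http://ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/

# Transfer the mirrored tree to the frontend, e.g. tar over ssh:
tar cf - ftp.rocksclusters.org | ssh frontend 'cd /home/install && tar xf -'

# Then, on the frontend, rebuild the distribution for the compute nodes:
cd /home/install
rocks-dist dist
```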

On Dec 12, 2003, at 10:26 AM, Terrence Martin wrote:

>   I was wondering, does the command rocks-dist do anything else besides
>   call wget on the correct tree at ftp.rocksclusters.org?
>
>   I ask because some firewall restrictions on a system I am hesitant to
>   fiddle are preventing me from running rocks-dist mirror from my head
>   node. I would like to download the mirror of the rocks distro on
>   another system, transfer the tree and then run rocks-dist dist to
>   rebuild the rocks for my compute nodes. Is this reasonable?
>
>   Also am I going to run into any problems with rocks 3.0.0 having
>   installed the head node on a UP system but my compute nodes are SMP? I
>   am making an assumption that once I get all of the packages into rocks
>   (currently there is no smp kernels on the head node) the compute nodes
>   will install the right kernel?
>
>   BTW thanks for the help so far, the trick it seems to getting Rocks
>   3.0.0 on these supermicro systems is to install rocks on the hard
>   drive in a separate computer and then install the hard disk.
>
>   Thanks,
>
>   Terrence
>
>



From mjk at sdsc.edu Fri Dec 12 10:54:03 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 10:54:03 -0800
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu>
References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu>
<ddptix48s6.fsf@oecpc11.ucsd.edu> <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu>
Message-ID: <8E405599-2CD4-11D8-A2DC-000A95DA5638@sdsc.edu>

We haven't done this for a while, and since our 3.0 release uses
different versions of Red Hat for x86 and IA64, cross-building a
distribution may not work. 3.1.0 (since you are on campus you'll get a
CD set from us next week) uses the same base RH for all architectures, so
this should be possible again.

The mirror should have worked:

       # rocks-dist --arch=ia64 mirror

This should mirror the ia64 tree from ftp.rocksclusters.org. You can also
use your IA64 DVD: mount it on /mnt/cdrom and do a "rocks-dist copycd" to
create the IA64 mirror.

If this works you will then need to use the --genhdlist flag w/ rocks-dist.
For example:

          # cd /home/install
          # rocks-dist dist             --- build the x86 distribution
          # rocks-dist --arch=ia64 --genhdlist=rocks-dist/.../i386/.../genhdlist

You'll need to use find to determine the path of the genhdlist
executable in your x86 distribution. This may still fail (since the RH
versions differ), but it does work when the versions are the same for
both archs.
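The find step might look like this in practice (a sketch; the example path in the comment is one plausible location, not confirmed for every RH version):

```shell
# Search the x86 distribution tree for the genhdlist executable;
# its location varies with the RH version, e.g. something like
# rocks-dist/7.3/en/os/i386/usr/lib/anaconda-runtime/genhdlist
find /home/install -name genhdlist -type f 2>/dev/null
```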

          -mjk

On Dec 11, 2003, at 2:29 PM, Edward O'Connor wrote:

>   Hi everybody,
>
>   I'm trying to bring up some ia64 compute nodes in a cluster with an
>   ia32
>   frontend. Normally, `cd /home/install; rocks-dist mirror dist` only
>   sets
>   up the frontend to handle ia32 compute nodes. I tried to manhandle
>   `rocks-dist mirror` into mirroring the ia64 stuff from
>   ftp.rocksclusters.org by giving it the --arch=ia64 option, but that
>   didn't work, so I went ahead and did the mirroring step by hand.
>
>   After having done so, `rocks-dist dist` still doesn't do the right
>   thing. So, adding --arch=ia64 to that command yields this error output:
>
>   ,----
>   | # rocks-dist --arch=ia64 dist
>   | Cleaning distribution
>   | Resolving versions (RPMs)
>   | Resolving versions (SRPMs)
>   | Adding support for rebuild distribution from source
>   | Creating files (symbolic links - fast)
>   | Creating symlinks to kickstart files
>   | Fixing Comps Database
>   | error - comps file is missing, skipping this step
>   | Generating hdlist (rpm database)
>   | error - could not find rpm anaconda-runtime
>   | error - could not find genhdlist
>   | Patching second stage loader (eKV, partioning, ...)
>   | error - could not find second stage, skipping this step
>   `----
>
>   So my question is, what do I need to do to the ia32 frontend to enable
>   it to kickstart an ia64 compute node? Thanks.
>
>
>   Ted
>
>   --
>   Edward O'Connor
>   oconnor at ucsd.edu



From mjk at sdsc.edu      Fri Dec 12 11:12:59 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 11:12:59 -0800
Subject: [Rocks-Discuss]I can't use xpbs in rocks
In-Reply-To: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com>
References: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com>
Message-ID: <32F6A3BA-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu>

Unfortunately we don't have a fix here. We've moved to SGE (you can
now use QMon). We do have a PBS roll, but we plan to release 3.1 before
the PBS roll is complete.

       -mjk

On Dec 10, 2003, at 8:44 PM, zhong wenyu wrote:

>   Hi,everyone!
>   I have installed rocks 2.3.2 and 3.0.0,xpbs can not be use in both of
>   them.
>   typed:xpbs[enter]
>   showed:xpbs: initialization failed! output: invalid command name
>   "Pref_Init"
>   thanks!
>
>   _________________________________________________________________
>   MSN Messenger: http://messenger.msn.com/cn



From fparnold at chem.northwestern.edu Fri Dec 12 06:52:45 2003
From: fparnold at chem.northwestern.edu (Fred P. Arnold)
Date: Fri, 12 Dec 2003 08:52:45 -0600 (CST)
Subject: [Rocks-Discuss]Gig E on HP ZX6000
Message-ID: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu>

Hello,

I know this is a hardware question, not technically a Rocks one, but I
can't find the answer in my HP manuals:

On the ZX6000, there are two ethernet ports, a 10/100 basic/management
port, and a 1000 which is designated the primary interface.
Unfortunately, rocks always identifies the 10/100 as eth0.

Does anyone know how to disable the 10/100 on a ZX6000?    On an IA32, I'd
go into the bios, but these don't technically have one.    We'd like to run
ours on a pure Gig network.

Thanks.

                                     -Fred

                               Frederick P. Arnold, Jr.
                               NUIT, Northwestern U.
                               f-arnold at northwestern.edu



From mjk at sdsc.edu Fri Dec 12 11:16:42 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 12 Dec 2003 11:16:42 -0800
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <200311212352.27000.Roy.Dragseth@cc.uit.no>
References: <200311212352.27000.Roy.Dragseth@cc.uit.no>
Message-ID: <B83C8894-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu>

hi Roy,

This should become the basis of the PBS roll (currently openpbs). We
are seeking developers who would like to help write and maintain this
-- I'm not singling you out Roy, although you would be more than
welcome, rather I'm taking advantage of your message to solicit other
volunteers. Anyone?

         -mjk


On Nov 21, 2003, at 2:52 PM, Roy Dragseth wrote:

>   Hi   folks.
>
>   I've been testing ScalablePBS (SPBS) from supercluster.org for a few
>   weeks now
>   and it seems like a fairly good replacement for OpenPBS. Only a few
>   minor
>   changes to the OpenPBS infrastructure were needed to accomplish the
>   neccessary changes in the kickstart generation to make the nodes
>   switch to
>   SPBS.
>
>   SPBS is based on OpenPBS 2.3.12, but incorporates most provided patches
>   (sandia etc) and is actively developed by the same maintainers that
>   develop
>   maui. It scales better than OpenPBS, to around 2K nodes, has better
>   fault
>   tolerance and communicates better with maui. It has, as far as I can
>   see, no
>   user visible changes from OpenPBS.
>
>   I know, a lot of people are moving away from pbs and into sge, I was
>   thinking
>   about making the switch too. The emergence SPBS seems to make the
>   switch
>   unneccessary and I don't have to teach myself (and the users) a new
>   queueing
>   interface...
>
>   Configuration tested:
>   Rocks 3.0.0
>   SPBS 1.0.1p0 (should leave beta phase next month)
>   Maui 3.2.6p6 (available for "Early Access Production")
>
>   SPBS and Maui can be downloaded from http://www.supercluster.org/
>
>   Have a nice weekend,
>   r.
>
>   --
>
>     The Computer Center, University of Troms?, N-9037 TROMS?, Norway.
>             phone:+47 77 64 41 07, fax:+47 77 64 41 00
>        Roy Dragseth, High Performance Computing System Administrator
>        Direct call: +47 77 64 62 56. email: royd at cc.uit.no



From jlkaiser at fnal.gov Fri Dec 12 11:25:58 2003
From: jlkaiser at fnal.gov (Joseph L. Kaiser)
Date: Fri, 12 Dec 2003 13:25:58 -0600
Subject: [Rocks-Discuss](no subject)
Message-ID: <1071257158.3719.9.camel@ajax.kaisergroup.net>

My install of 3.0.0 is crapping out here:

  File "/usr/src/build/90289-i386/install//usr/lib/anaconda/comps.py", line 153, in __getitem__
  KeyError: PyXML

Even though PyXML is in the distribution I have built. Is there
anything that can cause this other than the missing RPM?

Thanks,

Joe



From oconnor at soe.ucsd.edu Fri Dec 12 11:36:04 2003
From: oconnor at soe.ucsd.edu (Edward O'Connor)
Date: Fri, 12 Dec 2003 11:36:04 -0800
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <8E405599-2CD4-11D8-A2DC-000A95DA5638@sdsc.edu> (Mason J.
 Katz's message of "Fri, 12 Dec 2003 10:54:03 -0800")
References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu>
      <ddptix48s6.fsf@oecpc11.ucsd.edu> <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu>
      <8E405599-2CD4-11D8-A2DC-000A95DA5638@sdsc.edu>
Message-ID: <ddiskl4ymz.fsf@oecpc11.ucsd.edu>

> We haven't done this for a while, and since our 3.0 release using
> different version of Red Hat for x86 and IA64 cross-building
> distribution may not work.

Ahh. After further travails (read below), I'm pretty willing to suspect
that this indeed does not work in Rocks 3.0.0. I'm looking forward to
those 3.1.0 CDs and DVDs next week! :)

> you can also use your IA64 DVD mount it on /mnt/cdrom and do a
> "rocks-dist copycd" to create the IA64 mirror.

Unfortunately, the ia32 frontend machine doesn't have a DVD drive in it.
So I mounted the ia64 ISO image on /mnt/cdrom via a loopback device and
that worked fine.
However, `rocks-dist copycd` seemed to have nuked the ia32 stuff under
/home/install/ftp.rocksclusters.org/, or, if it didn't entirely nuke it,
it made the bare `rocks-dist dist` of your next instructions fail:

> If this works you will the to use the --genhdlist flag w/ rocks-dist.
> For example:
>
>     # cd /home/install
>       # rocks-dist dist --- build the x86 distribution

As this failed, I went ahead and also ran a `rocks-dist mirror`, which
proceeded to download a whole lot of stuff from you guys. After it
finished, `rocks-dist dist` completed without error. I double-checked
and the ia64 mirror from the `rocks-dist copycd` command still appears
to be there.

>        # rocks-dist --arch=ia64 --genhdlist=rocks-dist/.../i386/.../genhdlist

Should there be a `dist` at the end of that? The above command (with the
substitution of the appropriate genhdlist path) appears to be a no-op.
So I appended a `dist` as the idea is for it to create the appropriate
symlinks for ia64 as well, and it bombs out too, in the same way as
before:

,----
| # rocks-dist --arch=ia64 --genhdlist=rocks-dist/7.3/en/os/i386/usr/lib/anaconda-runtime/genhdlist dist
| Cleaning distribution
| Resolving versions (RPMs)
| Resolving versions (SRPMs)
| Adding support for rebuild distribution from source
| Creating files (symbolic links - fast)
| Creating symlinks to kickstart files
| Fixing Comps Database
| error - comps file is missing, skipping this step
| Generating hdlist (rpm database)
| error creating file /home/install/rocks-dist/desktop/7.3/en/os/ia64/RedHat/base/hdlist: No such file or directory
| Patching second stage loader (eKV, partioning, ...)
| error - could not find second stage, skipping this step
`----

>   You'll need to use find to determine the path of the genhdlist
>   executable in you x86 distribution. This may still fail (since RH
>   version differ), but it does work when the version are the same for
>   both archs.

I suppose at this point that it's still failing due to the RH version
mismatch, and that getting this to work in 3.0.0 is a lost cause.


Ted

--
Edward O'Connor
oconnor at ucsd.edu

From jared_hodge at iat.utexas.edu Fri Dec 12 12:07:32 2003
From: jared_hodge at iat.utexas.edu (Jared Hodge)
Date: Fri, 12 Dec 2003 14:07:32 -0600
Subject: [Rocks-Discuss]I can't use xpbs in rocks
References: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com> <32F6A3BA-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu>
Message-ID: <3FDA2004.3020203@iat.utexas.edu>

OK, I've got a fix for this one.
The problem is that xpbs thinks that it's in the directory
/var/tmp/OpenPBS-buildroot/opt/OpenPBS/
Anyway, the path is mangled to get to some of the subroutines. The
rocks guys can figure out a way to prevent this in future releases, but
here's how you can get it working (and pbsmon while were at it):

First fix the scripts:
/opt/OpenPBS/bin/xpbs Need's the following changes:

#set libdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs
#set appdefdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs
set libdir            /opt/OpenPBS/lib/xpbs
set appdefdir         /opt/OpenPBS/lib/xpbs

/opt/OpenPBS/bin/xpbsmon needs the same thing, plus the first line needs
to be changed.

now do the following:
cd /opt/OpenPBS/lib/xpbs
rm tclIndex
./buildindex `pwd`
cd /opt/OpenPBS/lib/xpbsmon
rm tclIndex
./buildindex `pwd`


That should fix it all up. I tested this on a 2.3.2 cluster; I assume
it's the same on 3.0.
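The libdir/appdefdir edits above could presumably be scripted; here is a hypothetical one-liner, assuming GNU sed's -i option (the .bak suffix keeps backups in case the substitution hits something unexpected):

```shell
# Rewrite the buildroot paths in both scripts in place, keeping backups.
sed -i.bak 's|/var/tmp/OpenPBS-buildroot/opt/OpenPBS|/opt/OpenPBS|g' \
    /opt/OpenPBS/bin/xpbs /opt/OpenPBS/bin/xpbsmon
```

Note that the xpbsmon first-line change mentioned above still has to be checked by hand, and the tclIndex files still need rebuilding afterward.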

--
Jared Hodge
The Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759

Phone: 512-232-4460
Fax: 512-471-9096
Email: jared_hodge at iat.utexas.edu



Mason J. Katz wrote:

> Unfortunately we don't have a fix here. We've moved to SGE (your can
> now use QMon). We do have a PBS roll but we plan to release 3.1
> before the PBS roll is complete.
>
>     -mjk
>
> On Dec 10, 2003, at 8:44 PM, zhong wenyu wrote:
>
>> Hi,everyone!
>> I have installed rocks 2.3.2 and 3.0.0,xpbs can not be use in both of
>> them.
>> typed:xpbs[enter]
>> showed:xpbs: initialization failed! output: invalid command name
>> "Pref_Init"
>> thanks!
>>
>> _________________________________________________________________
>> MSN Messenger: http://messenger.msn.com/cn
>
>



From jlkaiser at fnal.gov Fri Dec 12 14:39:42 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Fri, 12 Dec 2003 16:39:42 -0600
Subject: [Rocks-Discuss](no subject)
In-Reply-To: <1071257158.3719.9.camel@ajax.kaisergroup.net>
References: <1071257158.3719.9.camel@ajax.kaisergroup.net>
Message-ID: <1071268782.22030.0.camel@nietzsche.fnal.gov>

Sorry, creating extra links where they don't belong.   Nevermind.

On Fri, 2003-12-12 at 13:25, Joseph L. Kaiser wrote:
> My install of 3.0.0 is crapping out here:
>
>   File "/usr/src/build/90289-i386/install//usr/lib/anaconda/comps.py", line 153, in __getitem__
>   KeyError: PyXML
>
>
> Even though PyXML is in the distribution I have built. Is there
> anything that can cause this other than the missing RPM?
>
> Thanks,
>
> Joe
--
===================================================================
Joe Kaiser - Systems Administrator

Fermi Lab
CD/OSS-SCS                Never laugh at live dragons.
630-840-6444
jlkaiser at fnal.gov
===================================================================

From jholland at cs.uh.edu Fri Dec 12 14:52:10 2003
From: jholland at cs.uh.edu (Jason Holland)
Date: Fri, 12 Dec 2003 16:52:10 -0600 (CST)
Subject: [Rocks-Discuss]Gig E on HP ZX6000
In-Reply-To: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu>
References: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu>
Message-ID: <Pine.GSO.4.58.0312121650350.4139@leibnitz.cs.uh.edu>

Fred,

Try flipping the modules in /etc/modules.conf: swap eth0 with eth1 so
that the gige interface comes up as eth0. Or just turn off eth0
altogether with 'alias eth0 off'. I think that's the right syntax.
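As an illustration, the swap might look like this in /etc/modules.conf. The driver names here are assumptions for illustration only; check which modules your kernel actually binds to each port:

```shell
# /etc/modules.conf -- hypothetical fragment
# gige driver bound to eth0, 10/100 driver bound to eth1
alias eth0 e1000
alias eth1 eepro100
# or, to disable the 10/100 port entirely:
# alias eth1 off
```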

We have 60 zx6000's and I personally have never found a way to
disable the port.

Jason P Holland
Texas Learning and Computation Center
http://www.tlc2.uh.edu
University of Houston
Philip G Hoffman Hall rm 207A
tel: 713-743-4850

On Fri, 12 Dec 2003, Fred P. Arnold wrote:

>   Hello,
>
>   I know this is a hardware question, not technically a Rocks one, but I
>   can't find the answer in my HP manuals:
>
>   On the ZX6000, there are two ethernet ports, a 10/100 basic/management
>   port, and a 1000 which is designated the primary interface.
>   Unfortunately, rocks always identifies the 10/100 as eth0.
>
>   Does anyone know how to disable the 10/100 on a ZX6000?   On an IA32, I'd
>   go into the bios, but these don't technically have one.   We'd like to run
>   ours on a pure Gig network.
>
>   Thanks.
>
>                                    -Fred
>
>                              Frederick P. Arnold, Jr.
>                              NUIT, Northwestern U.
>                              f-arnold at northwestern.edu
>


From jian at appro.com Fri Dec 12 17:27:51 2003
From: jian at appro.com (Jian Chang)
Date: Fri, 12 Dec 2003 17:27:51 -0800
Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
Message-ID: <4AE58AD63966B24B99F95CA24C02EB1903414F@hawk.appro.com>

Hello Mason / Puru,

I got your contact information from Bryan Littlefield.
I would like to talk with you about benchmark test systems you might need
down the road.
We can also share with you our findings as to what is compatible in the Opteron
systems.
Please reply with your phone number where I can reach you, and I will call
promptly.

Bryan,

Thank you for the referral.

Best regards,

Jian Chang
Regional Sales Manager
(408) 941-8100 x 202
(800) 927-5464 x 202
(408) 941-8111 Fax
jian at appro.com
www.appro.com

-----Original Message-----
From: Bryan Littlefield [mailto:bryan at UCLAlumni.net]
Sent: Tuesday, December 09, 2003 12:14 PM
To: npaci-rocks-discussion at sdsc.edu; mjk at sdsc.edu
Cc: Jian Chang
Subject: Rocks-Discuss] AMD Opteron - Contact Appro

Hi Mason,

I suggest contacting Appro. We are using Rocks on our Opteron cluster and Appro
would likely love to help. I will contact them as well to see if they can help
in getting an Opteron machine for testing. Contact info below:

Thanks --Bryan

Jian Chang - Regional Sales Manager
(408) 941-8100 x 202
(800) 927-5464 x 202
(408) 941-8111 Fax
jian at appro.com
http://www.appro.com

npaci-rocks-discussion-request at sdsc.edu wrote:


From: "Mason J. Katz"   <mailto:mjk at sdsc.edu> <mjk at sdsc.edu>
Subject: Re: [Rocks-Discuss]AMD Opteron
Date: Tue, 9 Dec 2003 07:28:51 -0800
To: "purushotham komaravolu"   <mailto:purikk at hotmail.com> <purikk at
hotmail.com>

We have a beta right now that we have sent to a few people. We plan on
a release this month, and AMD_64 will be part of this release along
with the usual x86, IA64 support.

If you want to help accelerate this process please talk to your vendor
about loaning/giving us some hardware for testing. Having access to a
variety of Opteron hardware (we own two boxes) is the only way we can
have good support for this chip.

   -mjk


On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:


Cc: <mailto:npaci-rocks-discussion at sdsc.edu> <npaci-rocks-discussion at
sdsc.edu>


Hello,
I am a newbie to ROCKS cluster. I wanted to set up clusters on 32-bit
architectures (Intel and AMD) and 64-bit architectures (Intel and AMD).
I found the 64-bit download for Intel on the website but not for AMD.
Does it work for AMD Opteron? If not, what is the ETA for AMD-64?
We are planning to buy AMD-64 bit machines shortly, and I would like to
volunteer for the beta testing if needed.
Thanks
Regards,
Puru


_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion


End of npaci-rocks-discussion Digest

-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031212/dec7e41b/attachment-0001.html

From landman at scalableinformatics.com Sat Dec 13 07:50:02 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 13 Dec 2003 10:50:02 -0500
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
Message-ID: <1071330602.4444.56.camel@protein.scalableinformatics.com>

Folks:

  Finally built the 2.4.23 kernel into an RPM via the RedHat tools.   Had
to hack up the spec file a bit, but you can see the results at

http://scalableinformatics.com/downloads/kernels/2.4.23/

These are 2.4.23 with the 2.4.24-pre1 patch (e.g. xfs is in there, woo
hoo!). I had to strip out most of the previous patches as they were
incompatible with .23 (and I don't want to spend time forward porting
them). The spec file, the sources, etc are released under the normal
licenses (GPL). No warranties, use at your own risk, and these are NOT
official Redhat kernels. Don't ask them for support for these, they
won't do it, and they will look at you funny.

That said, I had also checked out the cvs tree to start the "Carlson"
process :) indicated in the list a few months ago (see
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html)
to build a more customized distribution. I got to the

      Build the boot RPM

      cd rocks/src/rocks/boot
      make rpm

point, and lo and behold this is what I see ...

      rm version.mk
      rm arch
      rm -f /local/rocks/src/rocks/boot/.rpmmacros
      rm -f /usr/src/redhat/SOURCES/rocks-boot-3.1.0.tar
      rm -f /usr/src/redhat/SOURCES/rocks-boot-3.1.0.tar.gz
      ...

Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer
has a strong sense of urgency and little time to wait for an operational
cluster). I checked out the system from CVS earlier this week.

Is there any way to switch the build back to 3.0.0?   Or am I really out
of luck at this moment??? Clues/hints welcome.

These kernels might work, though I don't have a method to try them in
the distro yet. They work on the build machine.

      [root at head root]# uname -a
      Linux head.public 2.4.23-1 #1 SMP Sat Dec 13 14:41:06 GMT 2003    i686
unknown

      [root at head root]# rpm -qa | grep -i kernel
      kernel-2.4.23-1
      kernel-BOOT-2.4.23-1
      rocks-kernel-3.0.0-0
      pvfs-kernel-1.6.0-1
      kernel-doc-2.4.23-1
      kernel-source-2.4.23-1
      kernel-smp-2.4.23-1


The spec file is in the above download section, along with a .src.rpm
and other stuff. If anyone does have a clue as to how to build with
3.0.0 given the current cvs, or if there is a tagged set I needed to
get, please let me know.

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
   web: http://scalableinformatics.com
phone: +1 734 612 4615

From tim.carlson at pnl.gov Sat Dec 13 08:31:03 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Sat, 13 Dec 2003 08:31:03 -0800 (PST)
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <1071330602.4444.56.camel@protein.scalableinformatics.com>
Message-ID: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov>

On Sat, 13 Dec 2003, Joe Landman wrote:

> That said, I had also checked out the cvs tree to start the "Carlson"
> process :) indicated in the list a few months ago (see

yikes.. ! :)

>
> Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer
> has a strong sense of urgency and little time to wait for an operational
> cluster). I checked out the system from CVS earlier this week.

You needed to check out the 3.0.0 tagged version

ROCKS_3_0_0_i386
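For reference, checking out the tag might look something like this. Hedged: the module name "rocks" and the CVS root depend on how the tree was originally checked out; adjust both to your setup:

```shell
# Check out the tree at the 3.0.0 release tag instead of HEAD:
cvs checkout -r ROCKS_3_0_0_i386 rocks
# or move an existing working copy onto the tag:
cvs update -r ROCKS_3_0_0_i386
```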

Off thread, but it would seem to me that the numbering scheme for ROCKS
got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new
3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3
based and the new 3.1 will be RH 3.0 based. Not that it matters. Just
curious.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From phil at sdsc.edu Sat Dec 13 08:51:29 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Sat, 13 Dec 2003 08:51:29 -0800
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov>
References: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov>
Message-ID: <3FDB4391.4080405@sdsc.edu>



Tim Carlson wrote:

>On Sat, 13 Dec 2003, Joe Landman wrote:
>
>
>
>>That said, I had also checked out the cvs tree to start the "Carlson"
>>process :) indicated in the list a few months ago (see
>>
>>
>
>yikes.. ! :)
>
>
>
>>Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer
>>has a strong sense of urgency and little time to wait for an operational
>>cluster). I checked out the system from CVS earlier this week.
>>
>>
>
>You needed to check out the 3.0.0 tagged version
>
>ROCKS_3_0_0_i386
>
this is correct.

>
>Off thread, but it would seem to me that the numbering scheme for ROCKS
>got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new
>3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3
>based and the new 3.1 will be RH 3.0 based. Not that it matters. Just
>curious.
>
I blame Bruno ...
We moved to 3.0 because rolls is very different from the way 2.3.2 was
put together -- this wasn't a minor change, and so a subminor revision
number didn't make sense.

3.0 --> 3.1: change from 7.3 to recompiled RHEL, and change from PBS as
the default to SGE as the default. ... OK, you could argue that this is
also a major change and shouldn't have a minor version #. We didn't want
to go from 3.0 to 4.0 for some non-definable reasons :-), but mostly it's
that 3.0 and 3.1 feel pretty similar in terms of the way they are put
together (with rolls).
-P

>
>Tim
>
>Tim Carlson
>Voice: (509) 376 3423
>Email: Tim.Carlson at pnl.gov
>EMSL UNIX System Support
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031213/69aa41fa/attachment-0001.html

From landman at scalableinformatics.com Sat Dec 13 11:14:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 13 Dec 2003 14:14:51 -0500
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov>
References: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov>
Message-ID: <1071342891.4445.58.camel@protein.scalableinformatics.com>

Thanks. Magic incantations, and I have the "Carlson" process
implemented. Ok, next step is the roll-my-own ... more later


On Sat, 2003-12-13 at 11:31, Tim Carlson wrote:
> On Sat, 13 Dec 2003, Joe Landman wrote:
>
> > That said, I had also checked out the cvs tree to start the "Carlson"
> > process :) indicated in the list a few months ago (see
>
> yikes.. ! :)
>
> >
> > Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer
> > has a strong sense of urgency and little time to wait for an operational
> > cluster). I checked out the system from CVS earlier this week.
>
> You needed to check out the 3.0.0 tagged version
>
> ROCKS_3_0_0_i386
>
> Off thread, but it would seem to me that the numbering scheme for ROCKS
> got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new
> 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3
> based and the new 3.1 will be RH 3.0 based. Not that it matters. Just
> curious.
>
> Tim
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support



From wyzhong78 at msn.com Mon Dec 15 00:02:15 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Mon, 15 Dec 2003 16:02:15 +0800
Subject: [Rocks-Discuss]about add-extra-nic
Message-ID: <BAY3-F40JRkRy9Iwgel00056a6d@hotmail.com>

Hi, everyone!
My compute node's motherboard is an MSI 9141, which has one 1000M NIC and
one 100M NIC. I plan to use the 100M network for control and the 1000M
network for the application, so I use a 100M switch to connect the compute
nodes to the frontend, and a 1000M switch to connect the compute nodes to
each other (not including the frontend).
The first time I installed a compute node, it sat at "waiting for dhcp ip
information" too long and the install could not finish. I figured the 1000M
NIC must be responsible, so I disabled it in the BIOS. After that the
install worked and the compute nodes appeared.
Then I wanted to add the extra NIC. I used the add-extra-nic command and
shoot-node; the compute node rebooted (during the reboot I re-enabled the
NIC) and got stuck at "waiting for dhcp ip information" again.
So I disabled it again and restarted, and the node reinstalled fine,
finishing with no trouble. I can even see the boot message
"start eth1....[ok]"! But "ifconfig eth1" only reports an error, even
after I re-enable the 1000M NIC!
Thanks and regards!

_________________________________________________________________
MSN Messenger: http://messenger.msn.com/cn



From Roy.Dragseth at cc.uit.no Mon Dec 15 02:31:51 2003
From: Roy.Dragseth at cc.uit.no (Roy Dragseth)
Date: Mon, 15 Dec 2003 11:31:51 +0100
Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends?
In-Reply-To: <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu>
References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu>
<ddptix48s6.fsf@oecpc11.ucsd.edu> <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu>
Message-ID: <200312151131.51410.Roy.Dragseth@cc.uit.no>

Hi.

I've been running a setup like this for over a year now; it will not
(ever?) work right out of the box due to some kernel problems.

rocks-dist --arch ia64   dist

will most likely crash an ia32 frontend. The ia32 kernel doesn't like to mount
a cramfs image generated on an ia64 machine; it gives me a kernel panic.

Here is a rough guide to get this kind of setup going.

1. Set up the ia32 frontend as usual, but allow root write access to /export
by inserting "no_root_squash" as an option in /etc/exports.

2. Create a "fake" ia64 frontend using one of the ia64 nodes: let it configure
eth0 by dhcp and let the ia32 frontend think it is a compute node.

3. On the fake frontend, turn off the nis daemons except ypbind.

4. edit /etc/auto.home to mount /home from the ia32 frontend and restart
autofs.

5. On the fake frontend, do a rocks-dist copycd to dump the ia64 DVD into
/home/install.

6. Now you can do a rocks-dist dist on the ia64 box.

7. At last you need a symlink to make the ia32 frontend happy:
      ln -s enterprise/2.1AW/en/os/ia64 rocks-dist/7.3/en/os/ia64

Now you can boot up your ia64 nodes from the ia32 frontend.
After you are confident that your ia64 nodes are installed correctly, you can
reinstall the fake ia64 frontend as a regular compute node. Subsequent
rocks-dist dist runs can be done on any ia64 compute node as long as it has
the anaconda-runtime and rocks-dist rpms installed.
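Condensed, steps 5 through 7 on the fake frontend might look like the following. The working directory for the symlink is an assumption based on step 7's relative paths:

```shell
# Step 5: dump the ia64 DVD into /home/install
rocks-dist copycd
# Step 6: build the ia64 distribution on the ia64 box
rocks-dist dist
# Step 7: symlink so the ia32 frontend finds the ia64 tree
cd /home/install
ln -s enterprise/2.1AW/en/os/ia64 rocks-dist/7.3/en/os/ia64
```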

Hope this helps,
r.


--

  The Computer Center, University of Troms?, N-9037 TROMS? Norway.
            phone:+47 77 64 41 07, fax:+47 77 64 41 00
     Roy Dragseth, High Performance Computing System Administrator
       Direct call: +47 77 64 62 56. email: royd at cc.uit.no



From Roy.Dragseth at cc.uit.no Mon Dec 15 04:28:15 2003
From: Roy.Dragseth at cc.uit.no (Roy Dragseth)
Date: Mon, 15 Dec 2003 13:28:15 +0100
Subject: [Rocks-Discuss]Gig E on HP ZX6000
In-Reply-To: <Pine.GSO.4.58.0312121650350.4139@leibnitz.cs.uh.edu>
References: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu>
<Pine.GSO.4.58.0312121650350.4139@leibnitz.cs.uh.edu>
Message-ID: <200312151328.15826.Roy.Dragseth@cc.uit.no>

I had similar problems on our HP rx2600 boxes and found a way to make the
kernel ignore the 100Mb/s NIC by adding this append line in elilo.conf:

append="reserve=0xd00,64"

See my post
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003483.html

for details on how to figure out this parameter.

Remark: this has to be modified both in elilo.conf and elilo-ks.conf in
/boot/efi/efi/redhat/. The problem is that cluster-kickstart overwrites
these files at every reboot, and the setup is hardcoded into the
cluster-kickstart executable, so you need to figure out a way to work around
this. I grabbed cluster-kickstart.c from CVS, made the necessary modifications,
and installed the new binary on every compute node.
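
For reference, the resulting stanza in elilo.conf (and elilo-ks.conf) would
look something like the following; the image and root values here are
illustrative, only the append line carries the workaround:

```
image=vmlinuz-2.4.18-e.31smp
    label=linux
    root=/dev/sda2
    read-only
    append="reserve=0xd00,64"
```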

r.

--

  The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
            phone:+47 77 64 41 07, fax:+47 77 64 41 00
     Roy Dragseth, High Performance Computing System Administrator
       Direct call: +47 77 64 62 56. email: royd at cc.uit.no



From fds at sdsc.edu Mon Dec 15 11:31:01 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 15 Dec 2003 11:31:01 -0800
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0
In-Reply-To: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov>
References: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov>
Message-ID: <37508BEC-2F35-11D8-804D-000393A4725A@sdsc.edu>

We did indeed change our versioning scheme. We used to be "Redhat minus 5",
so a RH 7.3-based Rocks was called 2.3.x. This became moot when Redhat
quickly went from 8 to 9 to Enterprise 3. So we decided to be selfish
and move to 3.0.0 when we made a big internal change (Rolls and the end
of monolithic Rocks).

3.1.0 is a minor-number revision, which corresponds to how much has
changed in the Rocks code, not the underlying Redhat system. A bugfix
release would be 3.1.1, etc.

We hope this versioning scheme will be more resilient to Linux system
changes (which are out of our control), while keeping the focus on the
Rocks structure.


On Dec 13, 2003, at 8:31 AM, Tim Carlson wrote:

> Off thread, but it would seem to me that the numbering scheme for ROCKS
> got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new
> 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH
> 7.3
> based and the new 3.1 will be RH 3.0 based. Not that it matters. Just
> curious.
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From jlkaiser at fnal.gov Mon Dec 15 11:43:43 2003
From: jlkaiser at fnal.gov (Joseph L. Kaiser)
Date: Mon, 15 Dec 2003 13:43:43 -0600
Subject: [Rocks-Discuss]problem forcing a kernel
Message-ID: <1071517423.3719.4.camel@ajax.kaisergroup.net>

Hi,

I am trying to install this kernel:

kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following
whether I put it in the force directory of my distro or the regular RPMS
directory or contrib:

During package installation it gives me this:


/mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be
opened. This is due to a missing file, a bad package, or bad media.
Press <return> to try again.


The file is there. The media is the network. I have installed the
package on other systems by hand. Any ideas?

Thanks,

Joe
From tmartin at physics.ucsd.edu Mon Dec 15 15:58:51 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Mon, 15 Dec 2003 15:58:51 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
Message-ID: <3FDE4ABB.6030302@physics.ucsd.edu>

How does one go about removing a node from the cluster? Is there a
straightforward way to do this?

Terrence



From ebpeele2 at pams.ncsu.edu Mon Dec 15 16:42:47 2003
From: ebpeele2 at pams.ncsu.edu (Elliot Peele)
Date: Mon, 15 Dec 2003 19:42:47 -0500
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <3FDE4ABB.6030302@physics.ucsd.edu>
References: <3FDE4ABB.6030302@physics.ucsd.edu>
Message-ID: <1071535367.1871.1.camel@localhost.localdomain>

insert-ethers --replace hostname

Select compute from the menu then exit insert-ethers.

Elliot

On Mon, 2003-12-15 at 18:58, Terrence Martin wrote:
> How does one go about removing a node from the cluster? Is there a
> straight forward way to do this?
>
> Terrence
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031215/
ebf9581b/attachment-0001.bin

From phil at sdsc.edu Mon Dec 15 16:44:29 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Mon, 15 Dec 2003 16:44:29 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <3FDE4ABB.6030302@physics.ucsd.edu>
References: <3FDE4ABB.6030302@physics.ucsd.edu>
Message-ID: <3FDE556D.4040100@sdsc.edu>

insert-ethers --replace "compute-0-0"
select "compute" from the menu
and then hit f1 to exit.

This will re-create all of the files that have host names and remove
the node (you are essentially replacing the node named "compute-0-0" with
the empty set).

PBS will likely be unhappy with this change -- if I remember correctly,
it has an additional file that it creates when a node is added to the
queuing system -- and when the node doesn't appear in the host table, it
gets cranky. You should look in /opt/OpenPBS/server_priv/nodes to solve
this problem. Suppose you want to delete compute-0-0:

# qmgr -c "delete node compute-0-0"
# insert-ethers --replace "compute-0-0"


-P




Terrence Martin wrote:

> How does one go about removing a node from the cluster? Is there a
> straight forward way to do this?
>
> Terrence


--
==   Philip Papadopoulos, Ph.D.
==   Program Director for                 San Diego Supercomputer Center
==      Grid and Cluster Computing        9500 Gilman Drive
==   Ph: (858) 822-3628                   University of California, San Diego
==   FAX: (858) 822-5407                  La Jolla, CA 92093-0505




From gotero at linuxprophet.com Mon Dec 15 16:52:23 2003
From: gotero at linuxprophet.com (Glen Otero)
Date: Mon, 15 Dec 2003 16:52:23 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <1071535367.1871.1.camel@localhost.localdomain>
References: <3FDE4ABB.6030302@physics.ucsd.edu>
<1071535367.1871.1.camel@localhost.localdomain>
Message-ID: <1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com>

On Dec 15, 2003, at 4:42 PM, Elliot Peele wrote:

> insert-ethers --replace hostname
>
> Select compute from the menu then exit insert-ethers.

Then run:

# insert-ethers --update

to update the database

Check the database entries with:
# dbreport hosts

Glen

>
> Elliot
>
> On Mon, 2003-12-15 at 18:58, Terrence Martin wrote:
>> How does one go about removing a node from the cluster? Is there a
>> straight forward way to do this?
>>
>> Terrence
>>
Glen Otero, Ph.D.
Linux Prophet
619.917.1772



From landman at scalableinformatics.com Mon Dec 15 17:13:29 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 15 Dec 2003 20:13:29 -0500
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com>
References: <3FDE4ABB.6030302@physics.ucsd.edu>
<1071535367.1871.1.camel@localhost.localdomain>
<1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com>
Message-ID: <3FDE5C39.1030503@scalableinformatics.com>

Harumph:

       rmnode nasty_compute_node
       insert-ethers --update

(rmnode at http://scalableinformatics.com/downloads/rmnode.gz).

I thought insert-ethers had a simple version of this. All rmnode is, is
a hacked version of one of the other rocks tools.

Joe



Glen Otero wrote:

>
> On Dec 15, 2003, at 4:42 PM, Elliot Peele wrote:
>
>> insert-ethers --replace hostname
>>
>> Select compute from the menu then exit insert-ethers.
>
>
> Then run:
>
> # insert-ethers --update
>
> to update the database
>
> Check the database entries with:
>
> # dbreport hosts
>
> Glen
>
>>
>> Elliot
>>
>> On Mon, 2003-12-15 at 18:58, Terrence Martin wrote:
>>
>>> How does one go about removing a node from the cluster? Is there a
>>> straight forward way to do this?
>>>
>>> Terrence
>>>
> Glen Otero, Ph.D.
> Linux Prophet
> 619.917.1772


--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615




From csamuel at vpac.org Mon Dec 15 18:06:47 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Tue, 16 Dec 2003 13:06:47 +1100
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <B83C8894-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu>
References: <200311212352.27000.Roy.Dragseth@cc.uit.no> <B83C8894-2CD7-11D8-
A2DC-000A95DA5638@sdsc.edu>
Message-ID: <200312161306.55651.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 13 Dec 2003 06:16 am, Mason J. Katz wrote:

>   This should become the basis of the PBS roll (currently openpbs). We
>   are seeking developers who would like to help write and maintain this
>   -- I'm not singling you out Roy, although you would be more than
>   welcome, rather I'm taking advantage of your message to solicit other
>   volunteers. Anyone?

I think we might be interested in getting involved with this; we migrated from
OpenPBS to ScalablePBS some time ago and spent quite a bit of time tracking
down memory leaks and the like with DJ and friends at SuperCluster.
We've also started using Rocks on a cluster that we manage for one of our
member institutions and a colleague of mine is having fun trying to get it to
go onto an Itanium cluster at the moment plus we should have some Opteron
boxes arriving in a month or so for a mini-cluster which we'd like to run
Rocks on.

Currently we install Rocks on the cluster and then remove the PBS and MAUI RPMs
and install SPBS and the 3.2.6 version of MAUI we have access to, so a
version that came with SPBS ready to go would make life a lot simpler for us.
:-)

cheers!
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/3mi3O2KABBYQAh8RAuSLAJ9Bx/5aCF8kRjHFapUpiASQUJeCTwCcD9y7
Y/ZM38t0J8r5dAYj1MdiUWA=
=bCIS
-----END PGP SIGNATURE-----



From bruno at rocksclusters.org Mon Dec 15 18:30:03 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 15 Dec 2003 18:30:03 -0800
Subject: [Rocks-Discuss]removing a node from the cluster
In-Reply-To: <3FDE5C39.1030503@scalableinformatics.com>
References: <3FDE4ABB.6030302@physics.ucsd.edu>
<1071535367.1871.1.camel@localhost.localdomain>
<1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com>
<3FDE5C39.1030503@scalableinformatics.com>
Message-ID: <C13C5DE4-2F6F-11D8-B821-000A95C4E3B4@rocksclusters.org>

>   Harumph:
>
>          rmnode nasty_compute_node
>          insert-ethers --update
>
>   (rmnode at   http://scalableinformatics.com/downloads/rmnode.gz).
>
>   I thought insert-ethers had a simple version of this.   All rmnode is,
>   is a hacked version of one of the other rocks tools.

actually, since v3.0.0, i think it does:

http://www.rocksclusters.org/rocks-documentation/3.0.0/faq-configuration.html#REMOVE-NODE

    - gb



From bruno at rocksclusters.org Mon Dec 15 19:40:49 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 15 Dec 2003 19:40:49 -0800
Subject: [Rocks-Discuss]problem forcing a kernel
In-Reply-To: <1071517423.3719.4.camel@ajax.kaisergroup.net>
References: <1071517423.3719.4.camel@ajax.kaisergroup.net>
Message-ID: <A3F73894-2F79-11D8-B821-000A95C4E3B4@rocksclusters.org>

>   I am trying to install this kernel:
>
>   kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following
>   whether I put it in the force directory of my distro or the regular
>   RPMS
>   directory or contrib:
>
>   During package installation it gives me this:
>
>
>   /mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be
>   opened. This is due to a missing file, a bad package, or bad media.
>   Press <return> to try again.
>
>
>   The file is there. The media is the network.    I have installed the
>   package on other systems by hand. Any ideas?

just to be sure, do you run the following after you copy the RPM into
the force directory:

          # cd /home/install
          # rocks-dist dist

    - gb



From bruno at rocksclusters.org Mon Dec 15 19:56:51 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 15 Dec 2003 19:56:51 -0800
Subject: [Rocks-Discuss]Adding partitions that are not reformatted under hard boots
or shoot-node
In-Reply-To: <3FD68B06.9010709@phys.ufl.edu>
References: <3FD68B06.9010709@phys.ufl.edu>
Message-ID: <E12881B4-2F7B-11D8-B821-000A95C4E3B4@rocksclusters.org>

sorry for the late response.

i recently tested the manual partitioning procedure on our upcoming
release and there was a bug. a fix has been committed for the next
release -- so manual partitioning will work on 3.1.0 as explained in
the 3.0.0 documentation.

    - gb


On Dec 9, 2003, at 6:55 PM, Jorge L. Rodriguez wrote:

>   Hi,
>
>   How do I add an extra partition to my compute nodes and retain the
>   data on all non / partitions when system hard boots or is shot?
>   I tried the suggestion in the documentation under "Customizing your
>   ROCKS Installation" where you replace the auto-partition.xml but hard
>   boots or shoot-nodes on these reformat all partitions instead of just
>   the /. I have also tried to modify the installclass.xml so that an
>   extra partition is added into the python code see below. This does
>   mostly what I want but now I can't shoot-node even though a hard boot
>   reinstalls without reformatting all but /. Is this the right approach?
>   I'd rather avoid having to replace installclass since I don't really
>   want to partition all nodes this way but if I must I will.
>
>   Jorge
>
>                         #
>                         # set up the root partition
>                         #
>                         args = [ "/" , "--size" , "4096",
>                                 "--fstype", "&fstype;",
>                                 "--ondisk", devnames[0] ]
>                         KickstartBase.definePartition(self, id, args)
>
>   # ---- Jorge, I added this args
>                         args = [ "/state/partition1" , "--size" ,
>   "55000",
>                                 "--fstype", "&fstype;",
>                                 "--ondisk", devnames[0] ]
>                         KickstartBase.definePartition(self, id, args)
>   # -----
>                         args = [ "swap" , "--size" , "1000",
>                                 "--ondisk", devnames[0] ]
>                         KickstartBase.definePartition(self, id, args)
>
>                         #
>                         # greedy partitioning
>                         #
>   # ----- Jorge, I change this from i = 1
>                         i = 2
>   # -----
>                         for devname in devnames:
>                                 partname = "/state/partition%d" % (i)
>                                 args = [ partname, "--size", "1",
>                                         "--fstype", "&fstype;",
>                                         "--grow", "--ondisk", devname ]
>                                 KickstartBase.definePartition(self, id,
>   args)
>
>                                 i = i + 1
>
>
>
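
For readers following along, the quoted definePartition() calls can be
sketched outside of anaconda as plain Python. The define_partition helper
below is illustrative only (the real Rocks KickstartBase API differs); it just
shows the kickstart "part" lines the argument lists amount to, including
Jorge's extra fixed partition and the greedy loop starting at 2:

```python
# Illustrative only: mimics what the definePartition() calls above
# request, rendered as kickstart "part" directives. Not the real API.
def define_partition(args):
    """Render an argument list into a kickstart 'part' line."""
    return "part " + " ".join(args)

fstype = "ext3"        # stand-in for the &fstype; entity
devnames = ["hda"]     # stand-in for the detected disk list

lines = []
# root partition
lines.append(define_partition(
    ["/", "--size", "4096", "--fstype", fstype, "--ondisk", devnames[0]]))
# Jorge's added fixed data partition
lines.append(define_partition(
    ["/state/partition1", "--size", "55000", "--fstype", fstype,
     "--ondisk", devnames[0]]))
# swap
lines.append(define_partition(
    ["swap", "--size", "1000", "--ondisk", devnames[0]]))

# greedy partitioning starts at 2 because partition1 is now taken
i = 2
for devname in devnames:
    lines.append(define_partition(
        ["/state/partition%d" % i, "--size", "1", "--fstype", fstype,
         "--grow", "--ondisk", devname]))
    i += 1

print("\n".join(lines))
```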



From jlkaiser at fnal.gov Mon Dec 15 20:17:52 2003
From: jlkaiser at fnal.gov (Joseph L. Kaiser)
Date: Mon, 15 Dec 2003 22:17:52 -0600
Subject: [Rocks-Discuss]problem forcing a kernel
In-Reply-To: <A3F73894-2F79-11D8-B821-000A95C4E3B4@rocksclusters.org>
References: <1071517423.3719.4.camel@ajax.kaisergroup.net>
 <A3F73894-2F79-11D8-B821-000A95C4E3B4@rocksclusters.org>
Message-ID: <1071548271.3720.0.camel@ajax.kaisergroup.net>

yup
On Mon, 2003-12-15 at 21:40, Greg Bruno wrote:
> > I am trying to install this kernel:
> >
> > kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following
> > whether I put it in the force directory of my distro or the regular
> > RPMS
> > directory or contrib:
> >
> > During package installation it gives me this:
> >
> >
> > /mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be
> > opened. This is due to a missing file, a bad package, or bad media.
> > Press <return> to try again.
> >
> >
> > The file is there. The media is the network. I have installed the
> > package on other systems by hand. Any ideas?
>
> just to be sure, do you run the following after you copy the RPM into
> the force directory:
>
>     # cd /home/install
>     # rocks-dist dist
>
>   - gb
>



From Roy.Dragseth at cc.uit.no Tue Dec 16 02:13:50 2003
From: Roy.Dragseth at cc.uit.no (Roy Dragseth)
Date: Tue, 16 Dec 2003 11:13:50 +0100
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <B83C8894-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu>
References: <200311212352.27000.Roy.Dragseth@cc.uit.no> <B83C8894-2CD7-11D8-
A2DC-000A95DA5638@sdsc.edu>
Message-ID: <200312161113.50076.Roy.Dragseth@cc.uit.no>

On Friday 12 December 2003 20:16, Mason J. Katz wrote:
> This should become the basis of the PBS roll (currently openpbs). We
> are seeking developers who would like to help write and maintain this
> -- I'm not singling you out Roy, although you would be more than
> welcome, rather I'm taking advantage of your message to solicit other
> volunteers. Anyone?
>

I talked to my boss and he gave me thumbs up, so I'll be glad to take care of
the Maui/PBS roll of rocks.

I'd love to see some more hands in the air as maintainers/testers...

r.


--

  The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
            phone:+47 77 64 41 07, fax:+47 77 64 41 00
     Roy Dragseth, High Performance Computing System Administrator
       Direct call: +47 77 64 62 56. email: royd at cc.uit.no



From daniel.kidger at quadrics.com Tue Dec 16 07:08:44 2003
From: daniel.kidger at quadrics.com (Dan Kidger)
Date: Tue, 16 Dec 2003 15:08:44 +0000
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To:
<20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>
References:
<20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>
Message-ID: <3FDF1FFC.60501@quadrics.com>

Glen et al.

>I recently had the same problem when building a quadrics cluster on Rocks 2.3.2
>with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. The problem is
>definitely in the naming of the rpms, in that anaconda running on the compute
>nodes is not going to recognize kernel rpms that begin with 'qsnet' as potential
>boot options. Unfortunately, being under a severe time contraint, I resorted to
>manually installing the qsnet kernel on all nodes of the cluster, which isn't
>the Rocks way. The long term solution is to mangle the kernel makefiles so that
>the qsnet kernel rpms have conventional kernel rpm names, which is what Greg's
>post referred to.

    I have been thinking about this.

I reckon that the long term solution is *not* to rename the kernel that
we use (nor indeed to change the naming convention of any other kernels
that people want to work on). As well as the triplet version numbering
and the architecture, the kernel naming that we use includes the kernel
source tree (Redhat, Suse, LSY, Vanilla, ...) and our patch-level
version numbering triplet.
   Quadrics cannot be the only people who need the freedom to include extra
information in our naming convention for kernels.
The solution must lie either in anaconda itself or, more likely, in a
cleaner way to include extra kernel(s) as well as the stock one in the
compute node install process.
Using extend-nodes.xml this works, apart from niggles about the
/boot/grub/menu.lst that our kernel post-install configures getting
clobbered by Rocks.

Yours,
Daniel.


gotero at linuxprophet.com wrote:

>Daniel-
>
>
>

--
Yours,
Daniel.
--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------




From mjk at sdsc.edu Tue Dec 16 07:09:56 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 07:09:56 -0800
Subject: [Rocks-Discuss]ScalablePBS.
In-Reply-To: <200312161113.50076.Roy.Dragseth@cc.uit.no>
References: <200311212352.27000.Roy.Dragseth@cc.uit.no> <B83C8894-2CD7-11D8-
A2DC-000A95DA5638@sdsc.edu> <200312161113.50076.Roy.Dragseth@cc.uit.no>
Message-ID: <E89F1F82-2FD9-11D8-A2DC-000A95DA5638@sdsc.edu>

Fantastic! I think this puts us at three people who have volunteered
to help out on this. I will follow up on this and help organize,
support, and do some of the development also. But I'm going to push
this back until after we get 3.1 out, which looks like Monday.

     -mjk

On Dec 16, 2003, at 2:13 AM, Roy Dragseth wrote:

> On Friday 12 December 2003 20:16, Mason J. Katz wrote:
>> This should become the basis of the PBS roll (currently openpbs). We
>> are seeking developers who would like to help write and maintain this
>> -- I'm not singling you out Roy, although you would be more than
>> welcome, rather I'm taking advantage of your message to solicit other
>> volunteers. Anyone?
>>
>
> I talked to my boss and he gave me thumbs up, so I'll be glad to take
> care of
> the Maui/PBS roll of rocks.
>
> I'd love to see some more hands in the air as maintainers/testers...
>
> r.
>
>
> --
>
>    The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
>            phone:+47 77 64 41 07, fax:+47 77 64 41 00
>       Roy Dragseth, High Performance Computing System Administrator
>       Direct call: +47 77 64 62 56. email: royd at cc.uit.no



From mjk at sdsc.edu Tue Dec 16 07:37:04 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 07:37:04 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <3FDF1FFC.60501@quadrics.com>
References:
<20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>
<3FDF1FFC.60501@quadrics.com>
Message-ID: <B3192AFA-2FDD-11D8-A2DC-000A95DA5638@sdsc.edu>

  If you rename the Linux kernel to include other arbitrary strings, the
RedHat kickstart installer will not recognize it as a kernel. This
means you lose probing for the correct x86 CPU (386/486/586/686) and
probing for SMP vs. uni. This implies you would need to re-write the
anaconda code to do this for arbitrarily named packages; if you could
convince RedHat to do this, great, but it's not worth our development
time to do this ourselves when properly named kernel packages work
wonderfully. The unfortunate reality is the kernel RPM is not just
another package -- it has some special installation logic to optimize
for your hardware. Sure, they could have done this better, but they do a
darn good job as is.

This is not a Rocks issue; it means you have created a package that
does not work with RedHat. I understand why you need to include extra
strings in the kernel name, but suggest that there are several
alternatives to this that don't break RedHat kickstart. For example,
you could:

      - Write a kernel version module that reports the same information
on /proc/qsnet_kernel.

      - Have the kernel RPM install a /usr/doc/qsnet/VERSION file.

      - Have a subpackage of the kernel RPM that includes the extra strings
(and extra docs).

      - Stop patching the kernel and only use a module. True, some things
require kernel patches, but almost all driver changes can go into
modules only. This was not always true a few years ago; the module
system has improved a lot.

We've faced numerous issues like this with RedHat in creating Rocks,
and for every issue we have found a work around that keeps us w/in the
RedHat way of doing things. This is not always optimal for development
but always yields a simpler, and more supportable, system.

     -mjk


On Dec 16, 2003, at 7:08 AM, Dan Kidger wrote:

> Glen et al.
>
>> I recently had the same problem when building a quadrics cluster on
>> Rocks 2.3.2
>> with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. The
>> problem is
>> definitely in the naming of the rpms, in that anaconda running on the
>> compute
>> nodes is not going to recognize kernel rpms that begin with 'qsnet'
>> as potential
>> boot options. Unfortunately, being under a severe time contraint, I
>> resorted to
>> manually installing the qsnet kernel on all nodes of the cluster,
>> which isn't
>> the Rocks way. The long term solution is to mangle the kernel
>> makefiles so that
>> the qsnet kernel rpms have conventional kernel rpm names, which is
>> what Greg's
>> post referred to.
>
>     I have been thinking about this.
>
> I reckon that the long term solution is *not* to rename the kernel
> that we use. (nor indeed to change the naming convention of any other
> kernels that people want to work on).     As well as the triplet version
> numbering and the architecture, the kernel naming that we use includes
> the kernel source tree (Redhat, Suse, LSY, Vanilia, ..) and our partch
> level version numering triplet.
>    Quadrics cannot be the only people who need freedom to include extra
> information in our naming convention for kernels.
> The solution must lie in either annaconda itself or more likely a
> cleaner way to include extra kernel(s) as well as the stock one in the
> compute node install process. Using extend-nodes.xml this works apart
> from niggles about the /boot/grub/menu.lst that our kernel
> post-instal;l configures getting clobbered by Rocks.
>
> Yours,
> Daniel.
>
>
> gotero at linuxprophet.com wrote:
>
>> Daniel-
>>
>>
>
> --
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.        daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK           0117 915 5505
> ----------------------- www.quadrics.com --------------------
>



From dtwright at uiuc.edu Tue Dec 16 11:45:55 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Tue, 16 Dec 2003 13:45:55 -0600
Subject: [Rocks-Discuss]a minor ganglia question
Message-ID: <20031216194554.GH26246@uiuc.edu>

Hello all,

I'm in the process of setting up a 3.0.0 cluster and have a question about the
"Physical view" in ganglia. In this view (which is quite cool, BTW :) it shows
higher-numbered nodes on top and lower-numbered nodes on bottom:

compute-0-12
...
compute-0-2
compute-0-1
compute-0-0

and my cluster is physically reversed from that:

compute-0-0
compute-0-1
compute-0-2
...
compute-0-12

Is there an easy way to switch this display around so it matches the real
physical layout? I poked around in ganglia for a few minutes and didn't see
anything obvious, so I thought I'd ask before I actually start wasting time on
this :)

Thanks,

- Dan Wright
(dtwright at uiuc.edu)
(http://www.scs.uiuc.edu/)
(UNIX Systems Administrator, School of Chemical Sciences, UIUC)
(333-1728)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-
discussion/attachments/20031216/28f3eb5a/attachment-0001.bin

From purikk at hotmail.com Tue Dec 16 12:34:51 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 16 Dec 2003 15:34:51 -0500
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
References: <200312162016.hBGKGuJ05160@postal.sdsc.edu>
Message-ID: <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>

Hi All,
       We are trying to set up a Rocks cluster with 1 frontend and 20 compute
nodes.
Frontend:
  1) Dual Pentium Xeon 2.4 GHz, PC 533, 512k L2 cache
  2) Dual-port Gigabit Ethernet
  3) 1 GB DDR RAM
  4) 3x 200 GB EIDE Ultra ATA 100

Compute nodes:
  1) Pentium Xeon 2.4 GHz, PC 533, 512k L2 cache
  2) Dual-port Gigabit Ethernet
  3) 1 GB DDR RAM
  4) 41 GB UDMA EIDE
1 HP ProCurve 24-port switch


Does the setup look ok?
Does Rocks support the following features?

* Remote power monitoring for individual nodes

* Temperature monitoring of individual processors

* Power sequencing on startup to prevent possible power spiking

* Remote power-down and reset of system and nodes

* Serial access to nodes

* Disk cloning

* Plug-In Extensible Architecture

* Image Manager

and also:

How should the disks be set up? Do all the disks need to be attached to the
frontend, with compute nodes having small 3 or 4 GB disks?

Can someone point me to clustering software which supports all of the above
features if Rocks doesn't support them?

thanks a lot

Regards,

Puru




From purikk at hotmail.com Tue Dec 16 12:39:19 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 16 Dec 2003 15:39:19 -0500
Subject: [Rocks-Discuss]Java Rocks cluster
Message-ID: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>

I am a newbie to ROCKS.
I have a question about running Java on a Rockster.
Is it possible to start only one JVM on one machine and have the task be run
distributed on the cluster? It is a multi-threaded application.
Say I have an application with 100 threads. Can I have 50 threads run on one
machine and 50 on another by launching the application (JVM) on one machine
(similar to Sun Firebird)? I don't want to use MPI or any special code.
Thanks,
Sincerely,
Puru
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-
discussion/attachments/20031216/ee12ac80/attachment-0001.html

From mjk at sdsc.edu   Tue Dec 16 13:20:24 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 13:20:24 -0800
Subject: [Rocks-Discuss]Java Rocks cluster
In-Reply-To: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
References: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
Message-ID: <A9849F18-300D-11D8-A2DC-000A95DA5638@sdsc.edu>

There are a few research projects that do map Java threads onto cluster
compute node processes. At the IEEE Cluster '03 conference a couple of
weeks ago in Hong Kong there were a few interesting Java talks on this
subject. You can see the schedule at the following link and do some
Google research for more info. I think the papers will be online
soon...

http://www.csis.hku.hk/cluster2003/advance-program.html

Rocks 3.1 will include a Java Roll, but this is nothing more than Sun's
Java sdk/rte and doesn't do any cluster magic for you.


       -mjk

On Dec 16, 2003, at 12:39 PM, Purushotham Komaravolu wrote:

>   I am a newbie to ROCKS.
>   I have a question about running Java on a Rockster.
>   Is it possible that I can start only one JVM on one machine and the
>   task be run distributed on the cluster? It is a multi-threaded
>   application.
>   Like say, I have an application with 100 threads. Can I have 50
>   threads run on one machine and 50 on another by launching the
>   application(jvm) on one machine? (similar to SUN Firebird) I dont want
>   to use MPI or any special code.
>   Thanks
>   Sincerely
>   Puru



From phil at sdsc.edu Tue Dec 16 13:38:48 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Tue, 16 Dec 2003 13:38:48 -0800
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
In-Reply-To: <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>
References: <200312162016.hBGKGuJ05160@postal.sdsc.edu> <BAY1-
DAV575EPSM0omP0000cb94@hotmail.com>
Message-ID: <3FDF7B68.3030302@sdsc.edu>


Purushotham Komaravolu wrote:

>Hi All,
>        We are trying to setup rocks cluster with 1 front and 20 computing
>nodes.
>Frontend:
> 1) Dual Pentium Xeon 2.4 GHz PC 533 and 512lk L2 Cache
> 2) Dual port Gigabit Ethernet
> 3) 1 GB DDR RAM
>   4) 3* 200 GB EIDE ULTRA ATA 100
>
>Compute nodes:
>     1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
> 2) Dual port Gigabit Ethernet
> 3) 1 GB DDR RAM
>   4) 41 GB UDMA EIDE
>1 HP Procurve 24 port switch
>
>
>Does the setup look ok?
>
Setup looks fine.

>
>
>Does Rocks support the following features
>Remote power monitoring for individual nodes
>
>*Temperature monitoring of individual processors
>
Not directly -- there isn't a completely general solution to this --
though lm_sensors is good for non-server boards. However, nothing
prevents you from adding the proper software. It's fairly easy to add
metrics to ganglia if you have the baseline drivers for your particular
temperature monitoring software.
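
The approach Phil describes -- parse your temperature-monitoring software's output and feed it into ganglia -- can be sketched roughly as below. This is a hedged illustration, not Rocks code: the `temp1` label, the `sensors` invocation, and the metric name are assumptions that vary by board and install; the gmetric flags (`--name`, `--value`, `--type`, `--units`) are standard ganglia.

```python
import re
import subprocess

def read_cpu_temp(sensors_output: str) -> float:
    """Pull a CPU temperature out of lm_sensors `sensors` output.
    The 'temp1' label is an assumption; boards name their sensors differently."""
    match = re.search(r"temp1:\s*\+?([\d.]+)", sensors_output)
    if not match:
        raise ValueError("no temp1 line found in sensors output")
    return float(match.group(1))

def publish_temp() -> None:
    """Read the local sensor and publish it as a custom ganglia metric."""
    output = subprocess.run(["sensors"], capture_output=True,
                            text=True, check=True).stdout
    temp = read_cpu_temp(output)
    # gmetric multicasts the value so it appears alongside the stock metrics.
    subprocess.run(["gmetric", "--name", "cpu_temp", "--value", str(temp),
                    "--type", "float", "--units", "Celsius"], check=True)
```

Run periodically (e.g. from cron) on each node, the value then shows up on the ganglia pages like any built-in metric.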

>
>*Power sequencing on startup to prevent possible power spiking
>
>*Remote power-down and reset of system and nodes
>
>*Serial access to nodes
>
All of these generally require another network (serial, lights-out
management, etc). We don't assume any of these extra networks exist.
Again, layering that functionality atop Rocks is very, very
straightforward. See the FAQ for how to add packages to nodes.

>
>*Disk cloning
>
No. Emphatically no. Disk cloning is not anywhere in the Rocks
vocabulary. We have distributions (RedHat + Rocks + cluster tools + your
own software) and a way to generate a kickstart file programmatically.
Disk cloning assumes homogeneity of hardware (we don't), requires a
custom aftermarket installer to fix up a node after an image is put on
it (we use RedHat as the installer), and requires a completely different
image for every functional type of node (frontend, compute, nfs, web,
pvfs, etc).

>
>*Plug-In Extensible Architecture
>
Uh. Yeah. That's the whole point. Again, see the FAQ for how you add
packages. Rolls are an additional extension mechanism that lets you add
larger chunks of functionality at cluster build time. We extend base
Rocks with grid software, schedulers, Java, and community-specific
software stacks. You should wait (about 5 days) for the final release of
3.1.0 to see how rolls work.

>
>*Image Manager
>
Definitely no. There are no images in Rocks. We have distributions and
appliance types. A graph description of appliances is melded with
distributions to define a complete node. Shared configuration is truly
shared. None of that happens with images -- the base software and the
configuration are locked together.

>
>and also
>
>How should the disk setup be? Do all the disks need to be attached to the
>frontend, with compute nodes having small 3 or 4 GB disks?
>
Nodes must be diskfull (each node needs a local disk); type and size are
up to you (8GB is probably the minimum given the size of Linux these
days). You can put as many disks as you want on your frontend and have
it double as an NFS server for your cluster (the default). You can build
other NFS servers easily (and manage them as easily as you do a compute
node).

>
>Can someone point me to a clustering software which supports all the above
>features if Rocks doesn't support them.
>
Sorry. Doesn't exist. Pick the things that you can live without today
(but would
want to add tomorrow).

-P

>
>thanks a lot
>
>Regards,
>
>Puru
>
>
>
>
>
>
--
==   Philip Papadopoulos, Ph.D.
==   Program Director for                  San Diego Supercomputer Center
==      Grid and Cluster Computing         9500 Gilman Drive
==   Ph: (858) 822-3628                    University of California, San Diego
==   FAX: (858) 822-5407                   La Jolla, CA 92093-0505




From mjk at sdsc.edu Tue Dec 16 13:38:59 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 13:38:59 -0800
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
In-Reply-To: <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>
References: <200312162016.hBGKGuJ05160@postal.sdsc.edu> <BAY1-
DAV575EPSM0omP0000cb94@hotmail.com>
Message-ID: <421F6254-3010-11D8-A2DC-000A95DA5638@sdsc.edu>

On Dec 16, 2003, at 12:34 PM, Purushotham Komaravolu wrote:

>   Hi All,
>          We are trying to setup rocks cluster with 1 front and 20
>   computing
>   nodes.
>   Frontend:
>    1) Dual Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
>     2) Dual port Gigabit Ethernet
>     3) 1 GB DDR RAM
>      4) 3* 200 GB EIDE ULTRA ATA 100
>
>   Compute nodes:
>        1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
>     2) Dual port Gigabit Ethernet
>     3) 1 GB DDR RAM
>      4) 41 GB UDMA EIDE
>   1 HP Procurve 24 port switch
>
>
>   Does the setup look ok?

Sounds good. If you have device driver issues, just wait until next week
when 3.1 comes out; it will have a new kernel and more supported
hardware.

> Does Rocks support the following features
> Remote power monitoring for individual nodes

Ethernet addressable power strips can be used for this.

> *Temperature monitoring of individual processors

No, although a ganglia module can be created to do this. The problem
is there isn't a common standard out there for *all* hardware right
now.

> *Power sequencing on startup to prevent possible power spiking
Ethernet addressable power strips can be used for this.

> *Remote power-down and reset of system and nodes

Yes (using sw). For hw control you would need a remote management
board in every node, or ethernet addressable power strips.

> *Serial access to nodes

No, Rocks uses ssh and ethernet for this. But you can add your own
serial port concentrator if you need one.

> *Disk cloning

Nope, this doesn't scale in either system or people time. Rocks uses
RedHat's Kickstart to build the disk image on each node in a cluster
programmatically. This is extremely fast -- in fact a 128 node cluster
can be built from scratch (including hardware integration) in under 2
hours, and the entire cluster can be reinstalled in around 12 minutes.
We did this as a demonstration of Rocks' scalability at SC'03 (we even
have a movie of it).

> *Plug-In Extensible Architecture

Yes. You can add to the cluster database and extend our utilities.
Everything is open.

> *Image Manager

Rocks does not do system imaging. We have a utility called rocks-dist
that builds distributions for you. This combined with the XML profile
graph gives you what you want here.

> How should the disk setup be? Do all the disks need to be attached to the
> frontend, with compute nodes having small 3 or 4 GB disks?

Buy the smallest modern HD you can for the compute nodes (4 GB is
fine). By default the frontend serves user directories over NFS, so you
should have more storage on the frontend node.


     -mjk



From landman at scalableinformatics.com Tue Dec 16 13:43:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 16 Dec 2003 16:43:51 -0500
Subject: [Rocks-Discuss]Java Rocks cluster
In-Reply-To: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
References: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
Message-ID: <1071611031.9903.77.camel@squash.scalableinformatics.com>

Hi Puru:

  Java threads are shared memory objects at this moment. You would need
to look at thread-migration schemes to layer atop the process, and a
distributed shared memory model to handle memory issues. I don't think
Java natively supports this, so you will likely have to appeal to some
other method.

  Moreover, shared memory across slower cluster network fabrics is
painful at best. If you are going to work on a single system image
machine with shared memory, you want the fastest/best fabric you can
get.

  If it is easier to re-architect your code to be independent worker
processes, you could write it using JVMs and simple sockets or similar.
If it is threaded, you may have problems parallelizing it on a cluster.
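
Joe's suggestion -- independent worker processes exchanging tasks over simple sockets instead of shared-memory threads -- can be sketched as below. Python stands in for Java purely for brevity, a local thread stands in for a JVM on another node, and the JSON task format is invented for the example; the point is the pattern, not the implementation.

```python
import json
import socket
import threading

def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer closes (or half-closes) its sending side."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)

def worker(srv: socket.socket) -> None:
    """Independent worker: accept one task, compute, reply, exit.
    In the Java scenario this would be a plain JVM process on another node."""
    conn, _ = srv.accept()
    task = json.loads(recv_all(conn))
    result = sum(x * x for x in task["numbers"])   # the "work": sum of squares
    conn.sendall(json.dumps({"result": result}).encode())
    conn.close()
    srv.close()

def dispatch(numbers: list) -> int:
    """Master: hand a chunk of work to a worker over a socket, await the answer."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))            # ephemeral port; worker is local here
    srv.listen(1)
    t = threading.Thread(target=worker, args=(srv,))
    t.start()
    sock = socket.create_connection(srv.getsockname())
    sock.sendall(json.dumps({"numbers": numbers}).encode())
    sock.shutdown(socket.SHUT_WR)         # signal end-of-task to the worker
    reply = json.loads(recv_all(sock))
    sock.close()
    t.join()
    return reply["result"]
```

Because each worker owns its memory and nothing is shared, the cluster fabric only carries task and result messages -- which is exactly why this shape avoids the distributed-shared-memory pain described above.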

Joe

On Tue, 2003-12-16 at 15:39, Purushotham Komaravolu wrote:
> I am a newbie to ROCKS
> I have a question about running Java on a Rockster.
> Is it possible that I can start only one JVM on one machine and the
> task be run distributed on the cluster? It is a multi-threaded
> application.
> Like say, I have an application with 100 threads. Can I have 50
> threads run on one machine and 50 on another by launching the
> application(jvm) on one machine?(similar to SUN Firebird) I dont want
> to use MPI or any special code.
> Thanks
> Sincerely
> Puru
--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615



From rscarce at caci.com Tue Dec 16 10:56:18 2003
From: rscarce at caci.com (Reed Scarce)
Date: Tue, 16 Dec 2003 13:56:18 -0500
Subject: [Rocks-Discuss]grub / boot / fdisk problem
Message-ID: <OF2C6AD168.EB3D778E-ON85256DFE.0067CF1C-85256DFE.006812B4@caci.com>

I installed Rocks on a primary master hard drive.
It became necessary to re-install, so I took an
identical hd and made it primary master. The first drive, which boots
fine, was left off the system to act as an archive, to mount after the
new system was up and running.
The new system was installed and works great, now to correctly install
the old drive as primary slave, reboot, mount and copy the scripts and
configs to the new system!
There the problem began.
When I boot either drive as primary master and only primary drive,
they boot fine.
When I connect either drive correctly configured and recognized by the
BIOS, as primary or secondary slave - grub gives a GRUB prompt and
won't boot.
Something interesting, when booted from a floppy (mkbootdisk)from the
new disk, in /var/log/dmesg both drives are visible but fdisk reports
the partition table is empty - so I can't mount the drive from a
floppy boot.
dmesg is like this: (my comments)
hda: ST34321A, ... (pri master)
hdb: ST34321A, ... (pri slave)
hdc: FX4010M, ATAPI CD/DVD-ROM drive (secnd master)
hdd: ST320420A, ... (secnd slave)
ide0 at ... (ide pri chain)
ide1 at ... (ide secnd chain)
hda: 8404830 sectors ... (good)
hdb: 8404830 sectors ... (good)
hdd: 39851760 sectors ... (good)
ide-floppy driver ... (ok)
Partition check: (<---<<<this is where it gets interesting)
hda:
hdb:
hdd: hdd1 hdd2 hdd3 (<---<<<that's right, hdd is now the boot drive.
Even if I boot without the floppy, hdd is the boot drive.)

Any suggestions?




Reed Scarce
Systems Engineer
CACI, Inc.
1100 N. Glebe Rd
Arlington, VA 22201
(703) 841-3045

From ShiYi.Yue at astrazeneca.com Tue Dec 16 14:05:46 2003
From: ShiYi.Yue at astrazeneca.com (ShiYi.Yue at astrazeneca.com)
Date: Tue, 16 Dec 2003 23:05:46 +0100
Subject: [Rocks-Discuss]hardware compatibility check with Rocks 3.00
Message-ID: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>

hi,

I was wondering if there is a way to set up a hardware compatibility
check in the kickstart of Rocks, and give us an opportunity to add the
drivers once incompatible hardware is detected.

I have some PCs with Broadcom Gbit 10/100/1000 network cards, and it
looks like Rocks 3.0 was not happy with these cards. The only way I can
fix this now (without rebuilding the distribution) is to replace them. I
am afraid this type of situation will happen again and again since RH7.3
is getting older and older.
I hope I am wrong and someone can point me to a solution.
Shi-Yi
shiyi.yue at astrazeneca.com



From mjk at sdsc.edu Tue Dec 16 14:55:38 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 14:55:38 -0800
Subject: [Rocks-Discuss]hardware compatibility check with Rocks 3.00
In-Reply-To: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>
References: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>
Message-ID: <F7910D2D-301A-11D8-A2DC-000A95DA5638@sdsc.edu>

We've been thinking about this off and on for over a year -- it's a
pretty hard problem. The real trick to supporting all hardware is
keeping the boot kernel current. We've let our releases get old and
more and more people are seeing hardware support issues.

Rocks 3.1 (out next week) will include the latest RedHat kernel from
RHEL 3.0. This will fix most of the hardware support issues out there.
When we release, please download 3.1 and try it with your hardware; if
it still fails, please let us know. Thanks.

          -mjk


On Dec 16, 2003, at 2:05 PM, ShiYi.Yue at astrazeneca.com wrote:

>   hi,
>
>   I was wondering if there is a way to set a hardware compability check
>   in the
>   kickstart of Rocks, and give us an oppotunity to add the drvers once
>   the
>   uncompatible hardware was detected.
>
>   I have some PCs with Broadcom Gbit 10/100/1000 network cards, It looks
>   Rocks
>   3.0 was not happy with these network cards. The only way I can do now
>   (without rebuild the distribution) is to replace these cards. I am
>   afraid
>   this type of situation will happen again and again since RH7.3 is
>   getting
>   older and older.
>   I hope I were wrong and someone can point me a solution.
>   Shi-Yi
>   shiyi.yue at astrazeneca.com



From msherman at informaticscenter.info Tue Dec 16 16:25:45 2003
From: msherman at informaticscenter.info (Mark Sherman)
Date: Tue, 16 Dec 2003 17:25:45 -0700
Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
Message-ID: <20031217002545.17912.qmail@webmail-2-2.mesa1.secureserver.net>

Hello,
I'm an administrator on a pure i386 cluster under Rocks 3.0.0, and our clients are
pushing us to include some Opteron nodes. I'm trying to find out the feasibility of
such an addition. I know there's been a lot of talk about Opterons on the Rocks
list, so I'm wondering if someone can give a boiled-down can-do / can't-do /
maybe-but-we-haven't-tested-it-yet kind of status.
With that, I'd say I'm probably willing to be a pseudo-beta site and give feedback
on how the system works.
Thank you very much, and keep up the good work. I love the Rocks system.
~M
______________________________________________
Mark Sherman
Computing Systems Administrator
Informatics Center
Massachusetts Biomedical Initiatives
Worcester MA 01605
508-797-4200
msherman at informaticscenter.info
----------------------~-----------------------


>   -------- Original Message --------
>   Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
>   From: "Jian Chang" <jian at appro.com>
>   Date: Fri, December 12, 2003 6:27 pm
>   To: "Bryan Littlefield" <bryan at UCLAlumni.net>,
>   npaci-rocks-discussion at sdsc.edu, mjk at sdsc.edu
>
>   Hello Mason / Puru,
>
>   I got your contact information from Bryan Littlefield.
>   I would like to discuss with you regarding benchmark test systems you
>   might need down the road.
>   We can also share with you our findings as to what is compatible in the
>   Opteron systems.
>   Please reply with your phone number where I can reach you, and I will
>   call promptly.
>
>   Bryan,
>
>   Thank you for the referral.
>
>   Best regards,
>
>   Jian Chang
>   Regional Sales Manager
>   (408) 941-8100 x 202
>   (800) 927-5464 x 202
>   (408) 941-8111 Fax
>   jian at appro.com
>   www.appro.com
>
>   -----Original Message-----
>   From: Bryan Littlefield [mailto:bryan at UCLAlumni.net]
>   Sent: Tuesday, December 09, 2003 12:14 PM
>   To: npaci-rocks-discussion at sdsc.edu; mjk at sdsc.edu
>   Cc: Jian Chang
>   Subject: Rocks-Discuss] AMD Opteron - Contact Appro
>
>   Hi Mason,
>
>   I suggest contacting Appro. We are using Rocks on our Opteron cluster
>   and Appro would likely love to help. I will contact them as well to see
>   if they could help getting a opteron machine for testing. Contact info
>   below:
>
>   Thanks --Bryan
>
>   Jian Chang - Regional Sales Manager
>   (408) 941-8100 x 202
>   (800) 927-5464 x 202
>   (408) 941-8111 Fax
>   jian at appro.com
>   http://www.appro.com
>
>   npaci-rocks-discussion-request at sdsc.edu wrote:
>
>
>   From: "Mason J. Katz"   <mailto:mjk at sdsc.edu> <mjk at sdsc.edu>
>   Subject: Re: [Rocks-Discuss]AMD Opteron
>   Date: Tue, 9 Dec 2003 07:28:51 -0800
>   To: "purushotham komaravolu"   <mailto:purikk at hotmail.com>
>   <purikk at hotmail.com>
>
>   We have a beta right now that we have sent to a few people.   We plan on
>
>   a release this month, and AMD_64 will be part of this release along
>   with the usual x86, IA64 support.
>
>   If you want to help accelerate this process please talk to your vendor
>
>   about loaning/giving us some hardware for testing.   Having access to a
>
>   variety of Opteron hardware (we own two boxes) is the only way we can
>   have good support for this chip.
>
>      -mjk
>
>
>   On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
>
>   Cc: <mailto:npaci-rocks-discussion at sdsc.edu>
>   <npaci-rocks-discussion at sdsc.edu>
>
>
>   Hello,
>               I am a newbie to ROCKS cluster. I wanted to setup clusters
>
>   on
>   32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel
>   and
>   AMD).
>   I found the 64-bit download for Intel on the website but not for AMD.
>   Does
>   it work for AMD opteron? if not what is the ETA for AMD-64.
>   We are planning to buy AMD-64 bit machines shortly, and I would like
>   to
>   volunteer for the beta testing if needed.
>   Thanks
>   Regards,
>   Puru
>
>
>   _______________________________________________
>   npaci-rocks-discussion mailing list
>   npaci-rocks-discussion at sdsc.edu
>   http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>
> End of npaci-rocks-discussion Digest


From fds at sdsc.edu Tue Dec 16 18:04:47 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Tue, 16 Dec 2003 18:04:47 -0800
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <20031216194554.GH26246@uiuc.edu>
References: <20031216194554.GH26246@uiuc.edu>
Message-ID: <63C818CD-3035-11D8-8652-000393A4725A@sdsc.edu>

Dan,

Good question. Unfortunately this behavior is hardwired into stock
Ganglia, not the Rocks-specific pages that we have more control over.

The good news is that I wrote the code for this page :) It's easy to fix
if you would like to do it yourself.

Edit the file /var/www/html/ganglia/functions.php. On line 386, you
should see:

            krsort($racks[$rack]);

To get the ordering you desire, change this to:

            ksort($racks[$rack]);

That's it. You should see the high-numbered compute nodes at the bottom
of the rack. I will see if we can get a config file option on the page
to offer this choice in a later release of Ganglia.
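
For readers unfamiliar with PHP's array functions: krsort() sorts an array by key in descending order, ksort() in ascending order, which is the whole one-word fix. The effect on the rack display, sketched in Python:

```python
# Nodes keyed by rack position, as in ganglia's $racks[$rack] array.
rack = {12: "compute-0-12", 0: "compute-0-0", 2: "compute-0-2", 1: "compute-0-1"}

# krsort (keys descending): high-numbered nodes are drawn first, i.e. on top.
top_down_krsort = [rack[k] for k in sorted(rack, reverse=True)]

# ksort (keys ascending): compute-0-0 is drawn first, matching Dan's rack.
top_down_ksort = [rack[k] for k in sorted(rack)]

print(top_down_krsort[0], top_down_ksort[0])  # → compute-0-12 compute-0-0
```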

-Federico

On Dec 16, 2003, at 11:45 AM, Dan Wright wrote:

>   Hello all,
>
>   I'm in the process of setting up a 3.0.0 cluster and have a question
>   about the
>   "Physical view" in ganglia. In this view (which is quite cool BTW :)
>   is shows
>   higher-numbered nodes on top and lower-numbered nodes on bottom:
>
>   compute-0-12
>   ...
>   compute-0-2
>   compute-0-1
>   compute-0-0
>
>   and my cluster is physically reversed from that:
>
>   compute-0-0
>   compute-0-1
>   compute-0-2
>   ...
>   compute-0-12
>
> Is there an easy way to switch this display around so it matches the
> real
> physical layout? I poked around and ganglia for a few minutes and
> didn't see
> anything obvious, so I thought I'd ask before I actually start wasting
> time on
> this :)
>
> Thanks,
>
> - Dan Wright
> (dtwright at uiuc.edu)
> (http://www.scs.uiuc.edu/)
> (UNIX Systems Administrator, School of Chemical Sciences, UIUC)
> (333-1728)
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From csamuel at vpac.org Tue Dec 16 18:49:22 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 17 Dec 2003 13:49:22 +1100
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <20031216194554.GH26246@uiuc.edu>
References: <20031216194554.GH26246@uiuc.edu>
Message-ID: <200312171349.24485.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 17 Dec 2003 06:45 am, Dan Wright wrote:

> Is there an easy way to switch this display around so it matches the real
> physical layout?

I think this is why they tell you to install the compute nodes from the bottom
of the rack. :-)

cheers,
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/38QyO2KABBYQAh8RAo+vAJ0XcP6tBJpwjxYnicEQkysRslWmmQCcDpeb
K8bNCLgiF5umMiJ/59ICN70=
=57YJ
-----END PGP SIGNATURE-----

From hermanns at tupi.dmt.upm.es Wed Dec 17 00:08:19 2003
From: hermanns at tupi.dmt.upm.es (Miguel Hermanns)
Date: Wed, 17 Dec 2003 09:08:19 +0100
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
Message-ID: <3FE00EF3.4020809@tupi.dmt.upm.es>

Since one of the strong features of Rocks is the possibility of fast
deployment of clusters, wouldn't it be of interest to create a hardware
compatibility list on the web page of Rocks? This list could be filled
in by the users of Rocks with their experience and the hardware they
have. In this way somebody interested in building a cluster as fast as
possible could check the list and buy something absolutely 100%
compatible with Rocks.

I know that in principle one could check the compatibility list of RH,
but my own experience was negative in that aspect (I installed an
Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
unable to recognize it).

Miguel




From mjk at sdsc.edu Wed Dec 17 09:03:00 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 17 Dec 2003 09:03:00 -0800
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
In-Reply-To: <3FE00EF3.4020809@tupi.dmt.upm.es>
References: <3FE00EF3.4020809@tupi.dmt.upm.es>
Message-ID: <DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu>

We have thought about this, and have some ideas on how to setup a
useful page. Something like the old Linux laptop hardware list but
simpler to mine for data. It's been on our long list of things to do
for a while now :)

       -mjk

On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote:

>   Since one of the strong features of Rocks is the posibility of fast
>   deployment of clusters, wouldn't it be of interest to create a
>   hardware compatibility list on the web page of Rocks? This list could
>   be filled in by the users of Rocks with their experience and the
>   hardware they have. In this way somebody interested in building a
>   cluster as fast as possible could check the list and buy something
>   absolutely 100% compatible with Rocks.
>
>   I know that in principle one could check the compatibility list of RH,
>   but my own experience was negative in that aspect (I installed an
>   Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
>   unable to recognize it).
>
>   Miguel
>
From junkscarce at hotmail.com Wed Dec 17 09:31:21 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Wed, 17 Dec 2003 17:31:21 +0000
Subject: [Rocks-Discuss]fdisk reports all zeros, need actual
Message-ID: <BAY1-F978XKPl5GDrPi0003db4e@hotmail.com>

Good ol' fdisk "print" on my compute node gives me a line:
Device Boot Start End Blocks Id System

but no data.

Extra Functionality's "print" reports
Nr AF Hd Sec Cyl Hd Sec Cyl Start Size ID
1 00 0      0    0   0   0    0    0      0   0
2 00 0      0    0   0   0    0    0      0   0
3 00 0      0    0   0   0    0    0      0   0
4 00 0      0    0   0   0    0    0      0   0

How can I retrieve the partition information I need for scripting at
node installation time?

TIA
--RRS




From dtwright at uiuc.edu Wed Dec 17 11:49:53 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Wed, 17 Dec 2003 13:49:53 -0600
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <200312171349.24485.csamuel@vpac.org>
References: <20031216194554.GH26246@uiuc.edu> <200312171349.24485.csamuel@vpac.org>
Message-ID: <20031217194953.GS26246@uiuc.edu>

Eh...whatever ;-) I started using rocks with 2.2.1 (when there was no
physical layout display) and haven't read the manual again since :)

Chris Samuel said:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, 17 Dec 2003 06:45 am, Dan Wright wrote:
>
> > Is there an easy way to switch this display around so it matches the real
> > physical layout?
>
> I think this is why they tell you to install the compute nodes from the bottom
> of the rack. :-)
>
> cheers,
> Chris
> - --
> Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
> Victorian Partnership for Advanced Computing http://www.vpac.org/
> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2 (GNU/Linux)
>
> iD8DBQE/38QyO2KABBYQAh8RAo+vAJ0XcP6tBJpwjxYnicEQkysRslWmmQCcDpeb
> K8bNCLgiF5umMiJ/59ICN70=
> =57YJ
> -----END PGP SIGNATURE-----
>
- Dan Wright
(dtwright at uiuc.edu)
(http://www.uiuc.edu/~dtwright)

-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
  For he on honeydew hath fed, / and drunk the milk of Paradise.''
       Samuel Taylor Coleridge, Kubla Khan

From dtwright at uiuc.edu Wed Dec 17 11:51:00 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Wed, 17 Dec 2003 13:51:00 -0600
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <63C818CD-3035-11D8-8652-000393A4725A@sdsc.edu>
References: <20031216194554.GH26246@uiuc.edu>
<63C818CD-3035-11D8-8652-000393A4725A@sdsc.edu>
Message-ID: <20031217195100.GT26246@uiuc.edu>

Federico,

Thanks! That'll make this easy enough... maybe next time I'll read the
manual and install the machines in the rocks-recommended order as another
poster suggested :)

Federico Sacerdoti said:
> Dan,
>
> Good question. Unfortunately this behavior is hardwired into stock
> Ganglia, not the Rocks-specific pages that we have more control over.
>
> The good news is that I wrote the code for this page :) Its easy to fix
> if you would like to do it yourself.
>
> Edit the file /var/www/html/ganglia/functions.php. On line 386, you
> should see:
>
>          krsort($racks[$rack]);
>
> To get the ordering you desire, change this to:
>
>          ksort($racks[$rack]);
>
> Thats it. You should see the high-numbered compute nodes at the bottom
> of the rack. I will see if we can get a config file button on the page
> to give this option for a later release of Ganglia.
>
> -Federico
>
> On Dec 16, 2003, at 11:45 AM, Dan Wright wrote:
>
> >Hello all,
> >
> >I'm in the process of setting up a 3.0.0 cluster and have a question
> >about the
> >"Physical view" in ganglia. In this view (which is quite cool BTW :)
> >is shows
> >higher-numbered nodes on top and lower-numbered nodes on bottom:
> >
> >compute-0-12
> >...
> >compute-0-2
> >compute-0-1
> >compute-0-0
> >
> >and my cluster is physically reversed from that:
> >
> >compute-0-0
> >compute-0-1
> >compute-0-2
> >...
> >compute-0-12
> >
> >Is there an easy way to switch this display around so it matches the
> >real
> >physical layout? I poked around and ganglia for a few minutes and
> >didn't see
> >anything obvious, so I thought I'd ask before I actually start wasting
> >time on
> >this :)
> >
> >Thanks,
> >
> >- Dan Wright
> >(dtwright at uiuc.edu)
> >(http://www.scs.uiuc.edu/)
> >(UNIX Systems Administrator, School of Chemical Sciences, UIUC)
> >(333-1728)
> >
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA
>
- Dan Wright
(dtwright at uiuc.edu)
(http://www.uiuc.edu/~dtwright)

-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
  For he on honeydew hath fed, / and drunk the milk of Paradise.''
       Samuel Taylor Coleridge, Kubla Khan

From bruno at rocksclusters.org Wed Dec 17 12:52:30 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 17 Dec 2003 12:52:30 -0800
Subject: [Rocks-Discuss]fdisk reports all zeros, need actual
In-Reply-To: <BAY1-F978XKPl5GDrPi0003db4e@hotmail.com>
References: <BAY1-F978XKPl5GDrPi0003db4e@hotmail.com>
Message-ID: <EDF0DAE8-30D2-11D8-B821-000A95C4E3B4@rocksclusters.org>

>   Good ol' fdisk "print" on my compute node give me a line:
>   Device Boot Start End Blocks Id System
>
>   but no data.
>
>   Extra Functionality's "print" reports
>   Nr AF Hd Sec Cyl Hd Sec Cyl Start Size ID
>   1 00 0      0    0   0   0    0    0      0   0
>   2 00 0      0    0   0   0    0    0      0   0
>   3 00 0      0    0   0   0    0    0      0   0
>   4 00 0      0    0   0   0    0    0      0   0
>
>   How can I retrieve the information necessary for scripted information
>   at node installation time?

this should answer your question:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-February/001388.html

    - gb



From anand at novaglobal.com.sg Wed Dec 17 20:14:45 2003
From: anand at novaglobal.com.sg (Anand Vaidya)
Date: Wed, 17 Dec 2003 23:14:45 -0500
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
In-Reply-To: <DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu>
References: <3FE00EF3.4020809@tupi.dmt.upm.es>
<DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu>
Message-ID: <200312172314.48434.anand@novaglobal.com.sg>

Why not create a Wiki? A wiki is easy enough to install (60 seconds?) and just
the right tool for user-driven projects like Rocks.

Nice examples of wiki wiki webs are http://en.wikipedia.org/ or even my
favourite GentooServer project, which has a very nice wiki at
http://www.subverted.net/wakka/wakka.php?wakka=MainPage (though not
related to clustering).

Regards,
Anand
On Wednesday 17 December 2003 12:03, Mason J. Katz wrote:
> We have thought about this, and have some ideas on how to setup a
> useful page. Something like the old Linux laptop hardware list but
> simpler to mine for data. It's been on our long list of things to do
> for a while now :)
>
>     -mjk
>
> On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote:
> > Since one of the strong features of Rocks is the posibility of fast
> > deployment of clusters, wouldn't it be of interest to create a
> > hardware compatibility list on the web page of Rocks? This list could
> > be filled in by the users of Rocks with their experience and the
> > hardware they have. In this way somebody interested in building a
> > cluster as fast as possible could check the list and buy something
> > absolutely 100% compatible with Rocks.
> >
> > I know that in principle one could check the compatibility list of RH,
> > but my own experience was negative in that aspect (I installed an
> > Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
> > unable to recognize it).
> >
> > Miguel

-



From mjk at sdsc.edu Thu Dec 18 08:02:14 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Thu, 18 Dec 2003 08:02:14 -0800
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
In-Reply-To: <200312172314.48434.anand@novaglobal.com.sg>
References: <3FE00EF3.4020809@tupi.dmt.upm.es>
<DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu>
<200312172314.48434.anand@novaglobal.com.sg>
Message-ID: <8BA1598E-3173-11D8-9543-000A95DA5638@sdsc.edu>

I've been thinking about a rocks wiki for a few months now, but I'm a
bit paranoid about the lack of authentication for updates (basically
anyone can modify your site).

If there is interest out there, we could just set one up, leave it
alone, and let our users worry about the content. Done well this could
have information on:

     -   hardware issues
     -   bug reports
     -   feature requests
     -   contributed documentation (to be moved into our users manual)
     -   etc

Basically a simple version of sourceforge (we have no plans to move to
sourceforge -- the interface and bandwidth both stink). Ideas....?

     -mjk

On Dec 17, 2003, at 8:14 PM, Anand Vaidya wrote:
> Why not create a Wiki? Wiki is easy enough to install (60seconds?) and
> just
> the right tool for user-driven projects like Rocks.
>
> Nice example of wiki wiki webs are http://en.wikipedia.org/ or even my
> favourite GentooServer project has a very nice wiki at http://
> www.subverted.net/wakka/wakka.php?wakka=MainPage (Though not related to
> clustering)
>
> Regards,
> Anand
>
> On Wednesday 17 December 2003 12:03, Mason J. Katz wrote:
>> We have thought about this, and have some ideas on how to setup a
>> useful page. Something like the old Linux laptop hardware list but
>> simpler to mine for data. It's been on our long list of things to do
>> for a while now :)
>>
>>    -mjk
>>
>> On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote:
>>> Since one of the strong features of Rocks is the posibility of fast
>>> deployment of clusters, wouldn't it be of interest to create a
>>> hardware compatibility list on the web page of Rocks? This list could
>>> be filled in by the users of Rocks with their experience and the
>>> hardware they have. In this way somebody interested in building a
>>> cluster as fast as possible could check the list and buy something
>>> absolutely 100% compatible with Rocks.
>>>
>>> I know that in principle one could check the compatibility list of
>>> RH,
>>> but my own experience was negative in that aspect (I installed an
>>> Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
>>> unable to recognize it).
>>>
>>> Miguel
>
> -



From hermanns at tupi.dmt.upm.es Fri Dec 19 00:47:11 2003
From: hermanns at tupi.dmt.upm.es (Miguel Hermanns)
Date: Fri, 19 Dec 2003 09:47:11 +0100
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
Message-ID: <3FE2BB0F.4060908@tupi.dmt.upm.es>

> I've been thinking about a rocks wiki for a few months now, but I'm a
> bit paranoid about the lack of authentication for updates (basically
> anyone can modify your site).

One possible filter could be that only the users of the registered
clusters can modify the wiki (so that when you submit the data of the
cluster you also include a user and a password), although in that case I
would be excluded, since our cluster has been unable to run Rocks
yet :-(.

>>     -   hardware issues
>>     -   bugs reports
>>     -   feature requests
>>     -   contributed documentation (to be moved into our users manual)
>>     -   etc

So, for example, the cluster register could be editable by the registered
users (each one editing only its own entry) and could include a
description of the installed hardware (not just the processor, but also
the motherboard model, the hard disks, NICs, etc.). Anybody interested
in building a cluster could go to the register, have a look, and click
on the clusters that are similar to the one in mind. With just another
click the user could review the hardware configuration and the problems
encountered.

This would also be great when Rocks clusters get updated, because their
builders could go and update their entry without needing to submit an
email to the Rocks team, hence avoiding giving them extra work.

To include the not-yet-working Rocks clusters, the database of clusters
(with the corresponding users and passwords) could be extended with
them, but their entries would not be shown on the Rocks register until
they are fully working. In this way information on hardware
incompatibilities can be collected and shown on a different part of
www.rocksclusters.org.

The feature requests would still be handled through the mailing list,
and for the contributed documentation I would place the source files in
read-only mode on the FTP server; if somebody makes modifications to
them, the new version should be emailed to the people in charge of the
docs for their approval.

Miguel



From jkreuzig at uci.edu Fri Dec 19 16:58:58 2003
From: jkreuzig at uci.edu (James Kreuziger)
Date: Fri, 19 Dec 2003 16:58:58 -0800 (PST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <1062015636.6781.100.camel@babylon.physics.ncsu.edu>
References: <1062015636.6781.100.camel@babylon.physics.ncsu.edu>
Message-ID: <Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu>

Ok, I need some help here. I've managed to setup
my frontend node, and it is up and running. I have
my 8 nodes all connected up to a Dell Power Connect 5224.
I can access the switch through a serial terminal and
get a command line interface. The little lights on the
front of the switch are blinking, so that's good.

However, I can't get the switch recognized by insert-ethers.
I've even managed to change the IP of the switch through
the CLI, but I can't see the switch from the frontend node.
I can't telnet, get the web interface or anything. I haven't
saved the configuration, so a reboot of the switch will
reset the values.

I'm grasping at straws here. I'm not a network engineer,
so I could use some help getting this thing configured.
If anybody can help me out, contact me by email.

Thanks,

-Jim

*************************************************
Jim Kreuziger
jkreuzig at uci.edu
949-824-4474
*************************************************




From tim.carlson at pnl.gov Fri Dec 19 17:24:22 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Fri, 19 Dec 2003 17:24:22 -0800 (PST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu>
Message-ID: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>

On Fri, 19 Dec 2003, James Kreuziger wrote:

I think we need a Rocks FAQ

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-August/002762.html

You need to turn on fast-link.

>   Ok, I need some help here. I've managed to setup
>   my frontend node, and it is up and running. I have
>   my 8 nodes all connected up to a Dell Power Connect 5224.
>   I can access the switch through a serial terminal and
>   get a command line interface. The little lights on the
>   front of the switch are blinking, so that's good.
>
>   However, I can't get the switch recognized by insert-ethers.
>   I've even managed to change the IP of the switch through
>   the CLI, but I can't see the switch from the frontend node.
>   I can't telnet, get the web interface or anything. I haven't
>   saved the configuration, so a reboot of the switch will
>   reset the values.
>
>   I'm grasping at straws here. I'm not a network engineer,
>   so I could use some help getting this thing configured.
>
>   If anybody can help me out, contact me by email.
>
>   Thanks,
>
>   -Jim
>
>   *************************************************
>   Jim Kreuziger
>   jkreuzig at uci.edu
>   949-824-4474
>   *************************************************
>
>
>

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From Georgi.Kostov at umich.edu Fri Dec 19 17:34:15 2003
From: Georgi.Kostov at umich.edu (Georgi Kostov)
Date: Fri, 19 Dec 2003 20:34:15 -0500
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu>
References: <1062015636.6781.100.camel@babylon.physics.ncsu.edu>
<Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu>
Message-ID: <1071884055.3fe3a717b3efc@carrierpigeon.mail.umich.edu>

Jim,

I have a 5224 here. What are your config settings on the switch? I.e. IP,
sub-net mask, gateway settings - for both the switch and the interface of the
head-node on which the 5224 is connected (I assume it's on the private subnet,
so the subnet is something like 10.0.0.0/255.0.0.0 with the frontend internal
interface (eth0) as 10.0.1.1, right?)

One thing to try on the head node is to run (as root) "tcpdump -i eth0" and
watch for packets. To avoid clutter, I would either turn the rest (compute
nodes, etc.) off, or filter them out with tcpdump's filter expressions.
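That tcpdump suggestion might look like this in practice (10.1.1.1 is a
made-up switch address and eth0 is the usual Rocks private interface;
substitute your own values):

```shell
# Listen on the private interface; -n suppresses name lookups.
# Watch only traffic to or from the suspected switch address:
tcpdump -i eth0 -n host 10.1.1.1

# Or, with the compute nodes still running, filter their chatter out,
# e.g. ignore ssh and ganglia (port 8649) traffic:
tcpdump -i eth0 -n not port 22 and not port 8649
```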

With some more info we should be able to tease this out.

--Georgi

Michigan Center for Biological Information (MCBI)
University of Michigan
3600 Green Court, Suite 700
Ann Arbor, MI 48105-1570
Phone/Fax: (734) 998-9236/8571
kostov at umich.edu
www.ctaalliance.org



Quoting James Kreuziger <jkreuzig at uci.edu>:

>   Ok, I need some help here. I've managed to setup
>   my frontend node, and it is up and running. I have
>   my 8 nodes all connected up to a Dell Power Connect 5224.
>   I can access the switch through a serial terminal and
>   get a command line interface. The little lights on the
>   front of the switch are blinking, so that's good.
>
>   However, I can't get the switch recognized by insert-ethers.
>   I've even managed to change the IP of the switch through
>   the CLI, but I can't see the switch from the frontend node.
>   I can't telnet, get the web interface or anything. I haven't
>   saved the configuration, so a reboot of the switch will
>   reset the values.
>
>   I'm grasping at straws here. I'm not a network engineer,
>   so I could use some help getting this thing configured.
>
>   If anybody can help me out, contact me by email.
>
>   Thanks,
>
>   -Jim
>
>   *************************************************
>   Jim Kreuziger
>   jkreuzig at uci.edu
>   949-824-4474
>   *************************************************
>
>
>


From daniel.kidger at quadrics.com Mon Dec 22 01:45:47 2003
From: daniel.kidger at quadrics.com (Dan Kidger)
Date: Mon, 22 Dec 2003 09:45:47 +0000
Subject: Fwd: Re: [Rocks-Discuss]Dell Power Connect 5224
Message-ID: <200312220945.47665.daniel.kidger@quadrics.com>


----------    Forwarded Message   ----------

Subject: Re: [Rocks-Discuss]Dell Power Connect 5224
Date: Mon, 22 Dec 2003 09:38:41 +0000
From: Dan Kidger <daniel.kidger at quadrics.com>
To: Georgi Kostov <Georgi.Kostov at umich.edu>
Cc: paci-rocks-discussion at sdsc.edu

>   Quoting James Kreuziger <jkreuzig at uci.edu>:
>   > Ok, I need some help here. I've managed to setup
>   > my frontend node, and it is up and running. I have
>   > my 8 nodes all connected up to a Dell Power Connect 5224.
>   > I can access the switch through a serial terminal and
>   > get a command line interface. The little lights on the
>   > front of the switch are blinking, so that's good.
>   >
>   > However, I can't get the switch recognized by insert-ethers.
>   > I've even managed to change the IP of the switch through
>   > the CLI, but I can't see the switch from the frontend node.
>   > I can't telnet, get the web interface or anything. I haven't
>   > saved the configuration, so a reboot of the switch will
>   > reset the values.

I don't know much about the 5224 per se, but I do know that much of the time
embedded devices *have* to be rebooted to pick up a new IP setting.

Once done, I would try pinging the switch's IP and then doing 'arp -a' to
see its MAC address (which should match the one on the white sticky label on
the back).
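That check could be sketched as follows (10.1.1.1 is an example address;
use whatever IP you gave the switch):

```shell
# Ping the switch so the frontend learns its MAC, then read it back
# out of the ARP cache:
ping -c 3 10.1.1.1
arp -a | grep '10.1.1.1'
# The MAC address shown should match the sticker on the switch.
```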
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

-------------------------------------------------------

--
Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------




From daniel.kidger at quadrics.com Mon Dec 22 09:03:56 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 17:03:56 -0000
Subject: [Rocks-Discuss]RE:Writing a Roll ?
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com>

Folks,
   I have made good headway in adding software and its configuration using
extend-compute.xml, and now have a robust system. (The head node install is
still rather manual though :-( )

I would now like to move to doing this as a Roll. However, I am not sure of
the best way of proceeding - there appears to be little documentation,
either HOWTO-style or on the underlying concepts.

I have mounted the HPC_roll.iso and browsed around:
 - the image seems to consist of 2 subdirectories, in the same style as
RedHat CDs
 - as expected, ./SRPMS contains the source RPMs, and ./RedHat/RPMS contains
binary RPMs
     (the latter contains many more RPMs than there are SRPMs for)

There is no obvious configuration information until you explore:
  roll-hpc-kickstart-3.0.0-0.noarch.rpm
This seems to contain lots of XML which at first glance is hard to decipher.

So my question is:
   Should we be writing our own rolls, and if so how ? (examples?)


Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------


From daniel.kidger at quadrics.com Mon Dec 22 09:08:21 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 17:08:21 -0000
Subject: [Rocks-Discuss]shucks.
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622461C9@tardis0.quadrics.com>

# rpm -ql roll-hpc-kickstart |xargs -l grep -inH sucks

/export/home/install/profiles/current/nodes/force-smp.xml:21: IBM sucks
/export/home/install/profiles/current/nodes/ganglia-server.xml:134: perl sucks
/export/home/install/profiles/current/nodes/ganglia-server.xml:148: Switch from
ISC to RedHat's pump. Pump sucks but it is standard so
/export/home/install/profiles/current/nodes/sendmail-masq.xml:31: m4 sucks

:-)

Have a good Christmas,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------



From fds at sdsc.edu Mon Dec 22 10:22:54 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 22 Dec 2003 10:22:54 -0800
Subject: [Rocks-Discuss]RE:Writing a Roll ?
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com>
Message-ID: <DBF30128-34AB-11D8-8652-000393A4725A@sdsc.edu>

You are right, we have little documentation on creating new rolls. I
have lamented to Greg about this, and he has done the same to me.
Basically we have been so busy trying to get the 3.1.0 release out that
we haven't put our nose to the grindstone about the Developer docs.

Here is a little primer since it sounds like you are indeed ready.

1. The first thing to realize is that rolls are not built from
"scratch", but from the safe confines of our build environment. This
environment is the directory:

[your local rocks CVS sandbox]/src/roll/

You must check out the Rocks CVS tree to get this. Instructions on how
to do this (anonymously) are at http://cvs.rocksclusters.org/.

Once you have this build environment on your frontend system, you are
ready for the next step of building your roll. You should make a new
directory here called "quadrics" - the name matters, as it will be the
identifier for your roll from now on.
2. Now the best thing I can tell you is to look at the "hpc" and "sge"
rolls (two of our most mature) for the directory structure in
"quadrics". It's fairly straightforward, and mirrors what we do for the
base. The "nodes" directory will hold your "extend-compute.xml", etc.
(more on this later). The "roll-quadrics-kickstart.noarch.rpm" is made
automatically for you from information in these directories.

3. The "src" dir holds anything you need to compile. Anything in src
should deposit an RPM package in the "RPMS" directory when its build is
finished.

4. You type "make roll" to start the build process. It will take a bit
of study for you to get things correct, but suffice it to say that you
will have an ISO file suitable for burning when you are done. Thank
Bruno for this sweet fact - everything is automatic except your
intellectual property :)
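Put together, the skeleton those steps describe would look something like
this (directory names inferred from the description of the hpc and sge
rolls above; treat this as a sketch, not gospel):

```shell
# In your Rocks CVS sandbox, under src/roll/, lay out a new roll
# named "quadrics" (the name becomes the roll's identifier):
cd src/roll
mkdir quadrics
mkdir quadrics/nodes                  # extend-compute.xml, quadrics.xml, ...
mkdir -p quadrics/graphs/default      # edges into the kickstart graph
mkdir quadrics/src                    # sources to compile...
mkdir quadrics/RPMS                   # ...whose builds deposit RPMs here
# Then "make roll" from inside quadrics/ drives the build to an ISO.
```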

One more word on your XML files. Our philosophy of rolls is not to use
the "extend/replace" strategy that we advocate for customization. As a
roll builder, you are at the grass-roots level, and can rise above
simple customization techniques.

Your roll should define a "quadrics.xml" node in the kickstart graph.
You define the node in the file "roll/quadrics/nodes/quadrics.xml" and
the edges in the file "roll/quadrics/graphs/default/quadrics.xml". Look
at the SGE roll for a good example of this. By defining your
configuration this way, you have more power to do complex tasks
(different configuration for different appliance types), and to leave
room for future growth.
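For instance, the graph-side file might contain little more than a single
edge. This is a guessed sketch modeled on the pattern described above; the
file path and the "compute" attachment point are assumptions, not verified
against a real roll:

```xml
<?xml version="1.0" standalone="no"?>
<!-- roll/quadrics/graphs/default/quadrics.xml (hypothetical sketch) -->
<graph>
        <!-- hang the quadrics configuration node off the compute appliance -->
        <edge from="compute" to="quadrics"/>
</graph>
```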

Good luck, and we hope and pray for a good technical writer that will
do this process justice.

-Federico

On Dec 22, 2003, at 9:03 AM, daniel.kidger at quadrics.com wrote:

>   Folks,
>      I have made good headway in adding software and its configuration
>   using extend-compute.xml and now have a robust system. (the head node
>   install is still rather manual though :-( )
>
>   I would now like to move to doing this as a Roll. However I am not
>   sureof the best way of proceeding - there appears to be little
>   documentation - either on HOWTO or on the underlying concepts.
>
>   I have mounted the HPC_roll.iso and   browsed around:
>    - the image seems to consists of 2   subdirectories - in the same style
>   as RedHat CD's
>    - as expected ./SRPMS contains the   source RPMs, and ./RedHat/RPMS
>   contains binary RPMs
>       ( the latter contains many more   RPMs than there is an SRPM for. )
>
>   There is no obvious configuration information until you explore:
>     roll-hpc-kickstart-3.0.0-0.noarch.rpm
>   This seems to contain lots of XML which at first glance is hard to
>   decifer.
>
> So my question is:
>    Should we be writing our own rolls, and if so how ? (examples?)
>
>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
> ----------------------- www.quadrics.com --------------------
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From mjk at sdsc.edu Mon Dec 22 11:07:32 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 22 Dec 2003 11:07:32 -0800
Subject: [Rocks-Discuss]shucks.
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622461C9@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622461C9@tardis0.quadrics.com>
Message-ID: <18168448-34B2-11D8-8AD9-000A95DA5638@sdsc.edu>

If these are the worst CVS log comments you've found you aren't looking
very hard. The only one here I'm compelled to clarify is IBM. There
are around 3-5 ways of probing the chipset to determine if the box is
SMP, RedHat supports the most common ones which everyone in the world
except IBM use. This forced us to patch anaconda to detect SMP for IBM
hardware (or in this case just force it) -- didn't these guys invent
the PC?

          -mjk

On Dec 22, 2003, at 9:08 AM, daniel.kidger at quadrics.com wrote:

>
>   # rpm -ql roll-hpc-kickstart |xargs -l grep -inH sucks
>
>   /export/home/install/profiles/current/nodes/force-smp.xml:21: IBM
>   sucks
>   /export/home/install/profiles/current/nodes/ganglia-server.xml:134:
>   perl sucks
>   /export/home/install/profiles/current/nodes/ganglia-server.xml:148:
>   Switch from ISC to RedHat's pump. Pump sucks but it is standard so
>   /export/home/install/profiles/current/nodes/sendmail-masq.xml:31: m4
>   sucks
>
>   :-)
>
>   Have a good Christmas,
>   Daniel.
>
>   --------------------------------------------------------------
>   Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
> ----------------------- www.quadrics.com --------------------



From mjk at sdsc.edu Mon Dec 22 11:13:30 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 22 Dec 2003 11:13:30 -0800
Subject: [Rocks-Discuss]RE:Writing a Roll ?
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com>
Message-ID: <EDBC4D7D-34B2-11D8-9250-000A95DA5638@sdsc.edu>

http://cvs.rocksclusters.org

In the rocks/src/roll directory you can see several roll examples, all
of which are built by typing "make roll". The
roll-*-kickstart.*.noarch.rpm is the real magic that includes the XML
profiles that are grafted onto the base kickstart graph.

     -mjk

On Dec 22, 2003, at 9:03 AM, daniel.kidger at quadrics.com wrote:

> Folks,
>    I have made good headway in adding software and its configuration
> using extend-compute.xml and now have a robust system. (the head node
> install is still rather manual though :-( )
>
> I would now like to move to doing this as a Roll. However I am not
> sureof the best way of proceeding - there appears to be little
> documentation - either on HOWTO or on the underlying concepts.
>
> I have mounted the HPC_roll.iso and browsed around:
> - the image seems to consists of 2 subdirectories - in the same style
> as RedHat CD's
> - as expected ./SRPMS contains the source RPMs, and ./RedHat/RPMS
> contains binary RPMs
>     ( the latter contains many more RPMs than there is an SRPM for. )
>
> There is no obvious configuration information until you explore:
>   roll-hpc-kickstart-3.0.0-0.noarch.rpm
> This seems to contain lots of XML which at first glance is hard to
> decifer.
>
> So my question is:
>    Should we be writing our own rolls, and if so how ? (examples?)
>
>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
> ----------------------- www.quadrics.com --------------------
>
>>
From daniel.kidger at quadrics.com Mon Dec 22 11:12:17 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 19:12:17 -0000
Subject: [Rocks-Discuss]RE:Writing a Roll ?
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357D1@tardis0.quadrics.com>

Federico,

> Here is a little primer since it sounds like you are indeed ready.
> --- many very informative lines deleted ---

   Thanks for that long reply. :-)
I am currently pulling a copy of the source tree from cvs.rocksclusters.org
(194MB of rocks/doc alone!)


Just a couple of questions for now:
 1. Do rolls have to be CD based?
   (during development I would probably get through a lot of CDROMs - but more
importantly it would get a bit fiddly to keep walking round to the CD-writer
and then nipping off to the room with the cluster in every time)

 2. Do I have to reinstall the headnode from scratch each time I want to test a
roll ?
(even if the roll only affects RPMs that get installed on compute nodes)

 3. Can a CD contain multiple rolls?
    (Once mature, a cluster may have quite a few rolls: pbs, sge, gm, IB, etc.
    Quadrics would probably have two - the (open-source) hardware
drivers, MPI, etc., and also RMS - our (closed-source) cluster Resource Manager.)

 4. What subset of the cvs tree does a Roll developer need? The whole tree is
clearly rather excessive.

  5. I am a little concerned about the amount of bloat needed to install our five
RPMs as a Roll. (The RPMs are already prebuilt by our own internal build
procedures.)
So taking another case - let's say the Intel Compilers - these have 4 RPMs (plus a
little sed-ery of their config files and pasting in the license file). Would these
be best installed as a Roll, or as a simple extend-compute.xml as I have currently?

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------



From sjenks at uci.edu Mon Dec 22 11:17:07 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Mon, 22 Dec 2003 11:17:07 -0800
Subject: [Rocks-Discuss]rocks-dist suggestion
Message-ID: <6F2FB100-34B3-11D8-88FD-000A95B96C68@uci.edu>
Hi ROCKS folks,

Just a suggestion for when you guys are bored after the 3.1 release 8-)

I ran into some trouble installing some updates to a ROCKS 3.0 cluster
that could easily be solved with some checking in rocks-dist:

I put the openssh and other updates in the proper contrib directory
under /home/install and ran "rocks-dist dist", which properly updated the
distribution.

The problem occurred when I tried to reload the compute nodes - the
install failed when it hit any of the RPMs in the contrib directory. It
turns out the permissions on those RPMs were set to 600 because I had
copied them out of root's home directory, so they couldn't be read by
the server to send them down to the compute nodes. After fixing the
permissions, all was well.

So rocks-dist should check (and possibly fix) permissions on files that
will be included in the kickstart distribution. I realize that the
mistake was entirely mine, but I'm probably not the only one to ever
forget to set permissions correctly and the tool could easily catch
such mistakes.
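Until rocks-dist grows such a check, something like this catches the
problem up front (the contrib location under /home/install is as described
above; adjust the path to your layout):

```shell
# List contributed RPMs that are not world-readable (the server
# cannot send these down to the compute nodes):
find /home/install/contrib -name '*.rpm' ! -perm -o+r -print

# ...and open them up:
find /home/install/contrib -name '*.rpm' ! -perm -o+r -exec chmod o+r {} +
```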

Thanks for putting together such a useful cluster distribution!

Steve Jenks



From msherman at informaticscenter.info Mon Dec 22 11:50:03 2003
From: msherman at informaticscenter.info (Mark Sherman)
Date: Mon, 22 Dec 2003 12:50:03 -0700
Subject: [Rocks-Discuss]MPI and memory + node rescue
Message-ID: <20031222195003.7688.qmail@webmail4.mesa1.secureserver.net>

just for future consideration...
any time I need to look at a system without booting it or it's ability to boot I
just throw in the knoppix cd.
www.knoppix.org
______________________________________________
Mark Sherman
Computing Systems Administrator
Informatics Center
Massachusetts Biomedical Initiatives
Worcester MA 01605
508-797-4200
msherman at informaticscenter.info
----------------------~-----------------------


>   -------- Original Message --------
>   Subject: Re: [Rocks-Discuss]MPI and memory + node rescue
>   From: "Trond SAUE" <saue at quantix.u-strasbg.fr>
>   Date: Thu, November 27, 2003 1:38 am
>   To: "Stephen P. Lebeau" <lebeau at openbiosystems.com>
>   Cc: npaci-rocks-discussion at sdsc.edu
>
>   On 2003.11.26 16:52, Stephen P. Lebeau wrote:
>   > If you go here, they talk about creating a Linux floppy
>   > repair disk. Make sure to read the README file... they
>   > require that you make a 1.68MB floppy ( README explains how )
>   >
>   > http://www.tux.org/pub/people/kent-robotti/looplinux/rip/
>   >
>   > If that doesn't work...
>   >
>   > http://www.toms.net/rb/download.html
>   >
>   > I've actually used this one before.
>   >
>   > -S
>   >
>   In order to have a look at the disk of my crashed node, I downloaded
>   RIP-2.2-1680.bin from the first site, but I was not able to boot
>   properly. However, tomsrtbt-2.0.103 from the second site worked very
>   well and allowed me to reboot the node as well as mount its disk to
>   look at messages. Unfortunately, they did not really tell me anything
>   more...However, it might be an idea for a future release of ROCKS to
>   include a second "standalone" boot option for the computer nodes, so
>   that one can access them independent of the frontend....
>        All the best,
>            Trond Saue
>   --
>   Trond SAUE                                (DIRAC:
>   http://dirac.chem.sdu.dk/)
>   Laboratoire de Chimie Quantique et Mod?lisation Mol?culaire
>   Universite Louis Pasteur ; 4, rue Blaise Pascal ; F-67000 STRASBOURG
>   t?l: 03 90 24 13 01   fax: 03 90 24 15 89   email: saue at quantix.u-
>   strasbg.fr


From daniel.kidger at quadrics.com Mon Dec 22 11:51:16 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 22 Dec 2003 19:51:16 -0000
Subject: [Rocks-Discuss]rocks-dist suggestion
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622461CD@tardis0.quadrics.com>

> Just a suggestion for when you guys are bored after the 3.1
> release 8-)

>   The problem occurred when I tried to reload the computed nodes - the
>   install failed when it hit any of the RPMs in the contrib
>   directory. It
>   turns out the protections on those RPMs was set to 600 because I had
>   copied them out of root's home directory, thus they couldn't
>   be read by
>   the server to send them down to the compute nodes. After fixing the
>   permissions, all was well.

This is a 'me-too' reply.

Rocks reads the RPMs over http, hence they need to be readable by the apache
user. With symlinks it is all too easy, even when the RPMs themselves are 644,
for the directory tree to sit somewhere not walkable by a third-party userid
like apache.
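One way to spot that case: every directory component above the RPMs needs
world-execute so the apache user can walk down to them. A quick sketch
(the paths are examples from this thread):

```shell
# Find directories under the tree that a third-party uid cannot enter:
find /home/install -type d ! -perm -o+x -print

# util-linux's namei shows the permissions of every component of one path:
namei -m /home/install/contrib
```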


Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------



From fds at sdsc.edu Mon Dec 22 15:26:01 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 22 Dec 2003 15:26:01 -0800
Subject: [Rocks-Discuss]RE:Writing a Roll ?
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357D1@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357D1@tardis0.quadrics.com>
Message-ID: <34B2A95C-34D6-11D8-8652-000393A4725A@sdsc.edu>

On Dec 22, 2003, at 11:12 AM, daniel.kidger at quadrics.com wrote:

> Federico,
>
>> Here is a little primer since it sounds like you are indeed ready.
>> --- many very informative lines deleted ---
>
>
> Just a couple of questions for now:
> 1. Do rolls have to be CD based ?
>    (during development I would probably get through a lot of CDROMs -
> but more importantly it would get a bit fiddly
> - to be keep walking round to the CD-writer - then nipping of to the
> room with the cluster in every time)
>
For distribution, the rolls should probably be CD based. For
development, however, that is not necessary. There is a make target
which will compile your source and "install" the roll into your local
distribution. This is "make intodist", and it assumes you are building on
a frontend node. You would follow this with a call to "rocks-dist
dist" in the "/home/install" directory.

Of course, this makes most sense for rolls that affect compute nodes.
To test parts of your roll that affect frontend functionality, you
still need to use the CDs.
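So a compute-node-only development cycle, as described, would look roughly
like this (the roll name "quadrics" is an example carried over from earlier
in the thread; paths relative to your CVS sandbox):

```shell
# On the frontend, from your roll's directory in the CVS sandbox:
cd src/roll/quadrics
make intodist            # build and drop the roll into the local distribution

# Then rebuild the distribution so reinstalled compute nodes pick it up:
cd /home/install
rocks-dist dist
```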

> 2. Do I have to reinstall the headnode from scratch each time I want
> to test a roll ?
> (even if the roll only affects RPMs that get installed on compute
> nodes)

See comment above. We're working on a way to fully install frontends
over the network, but it will not make it into the new release.

>
> 3. Can a CD contain multiple rolls?
>     (Once mature - a cluster may have quite a few rolls: pbs, sge, gm,
> IB, etc.
>     and Quadrics would proably have two - the (open-source) hardware
> drivers,MPI,etc and also RMS - our (closed-source) cluster Resource
> Manager.)

There is some support for this; we call them "Metarolls", and we know they
are important. The build process for them is a bit different, and won't
arrive for this release, but soon after.

> 4. What subset of the cvs tree does a Roll developer need? The whole
> tree is clearly rather excessive.
>
There are definitely areas of the tree not necessary for roll building.
It's always safest to have everything, but you're welcome to crop and
test.

>     5. I am a little concerned about the amount of bloat needed to
>   install our five RPMs as a Roll.(The RPMs are already prebuilt by our
>   own internal build proceedures).
>   So taking another case - lets say the Intel Compilers - These have 4
>   RPMs (plus a little sed-ery of their config files and pasting in the
>   license file). Would these be best installed as a Roll or as a simple
>   extend-compute.xml as I have currently?

It is better to put them in a roll. We have ways to combine,
distribute, sort, etc. these rolls, and they form a nice capsule of
software to introduce into the system. I understand that pulling the
whole source tree seems a bit excessive, but it is rather standard
practice for working on an open project.

Plus only the developer needs the source, the consumer does not.

Good luck, and we're glad someone is asking the questions. Rolls are
intended for outside construction, and we need to document the process.
:)

-Federico

>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
> ----------------------- www.quadrics.com --------------------
>
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA



From tlinden at pcu.helsinki.fi Tue Dec 23 05:28:35 2003
From: tlinden at pcu.helsinki.fi (=?ISO-8859-15?Q?Tomas_Lind=E9n?=)
Date: Tue, 23 Dec 2003 15:28:35 +0200 (EET)
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
Message-ID: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>

To reinstall a cluster I use the command
cluster-fork /boot/kickstart/cluster-kickstart
Now since all 32 nodes have been PXE installed this means that the
reinstallation is performed by first doing a PXE-boot to load the
installation kernel. My problem is that sometimes a few nodes fail
during this reinstallation process. The failing nodes seem to be different
whenever this problem occurs. The really strange thing is that after
more than a day or so some nodes somehow manage to finish the
reinstallation process!

Sometimes the whole cluster comes up fine without any lost node.

The problematic nodes _seem_ to get the installation kernel with PXE, so
it might be not a PXE problem but something odd that happens later?

Has anyone seen anything like this before?

I'm aware of a bug in the RedHat installation kernel
on Athlon systems when trying to run with a serial console.
  https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001988.html
This is why I run the installation kernel without a serial console, but
this makes debugging difficult because the serial console only shows
output during the PXE boot process. No output is generated by the
installation kernel itself. The next output is generated when
the node has finished the installation and loads the final kernel which
runs fine with a serial console.

This is using Rocks 2.3.2 on a 32 node cluster with Tyan Tiger MPX
S2466N-4M motherboards and dual Athlon MP CPUs with no graphics
adapters, so the system has a 32 port serial console switch. The
motherboards have integrated 100 Mb/s 3Com 3C920 NICs (in practice a
3C905 NIC). The switch is made by Enterasys. The frontend private NIC is
also running at 100 Mb/s. When doing the cluster reinstallation the
network bandwidth over the frontend NIC saturates at 12.5 MB/s. Maybe
some packets are lost because of this?

The frontend's private ethernet connection will be upgraded to 1 Gb/s.
Hopefully this will solve the reinstallation problem.

Do you have any other ideas how to solve this problem?

Best regards,                               Tomas Lindén
--------------------------------------------------------------------------
I           ,                                                            I
I Tomas Linden                   Helsinki Institute of Physics (HIP)     I
I Tomas.Linden at Helsinki.FI       P.O. Box 64 (Gustaf Hällströmin katu 2) I
I phone: +358-9-191 505 63       FIN-00014 UNIVERSITY OF HELSINKI        I
I fax:   +358-9-191 505 53       Finland                                 I
I WWW: http://www.physics.helsinki.fi/~tlinden/eindex.html               I
--------------------------------------------------------------------------


From kjcruz at ece.uprm.edu Tue Dec 23 05:31:26 2003
From: kjcruz at ece.uprm.edu (Kennie Cruz)
Date: Tue, 23 Dec 2003 09:31:26 -0400 (AST)
Subject: [Rocks-Discuss]Error installing the compute node
Message-ID: <Pine.LNX.4.58.0312230921290.23333@alambique.ece.uprm.edu>

Hi,
I am trying to kickstart the compute nodes with Rocks 3.0.0; the
frontend is already working. I reviewed FAQ question 7.1.2 and the
services (dhcpd, httpd, mysqld and autofs) are running, but running
kickstart.cgi from the command line gives an error:

     error - cannot kickstart external nodes

I made a quick search on the list, but without any success.

The compute node gets the assigned IP and insert-ethers detects the
appliance without any trouble, but it fails to run kickstart.cgi from
the frontend. The web server error log says something like this:

 [Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed header
 from script. Bad header=# @Copyright@: /var/www/html/install/kickstart.cgi

While the access log says this:

 10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET
 /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0"
 500 587 "-" "-"

I ran insert-ethers with the Ethernet Switches option. My nodes are
connected via 3 managed ethernet switches.

Any help will be appreciated.

Thanks in advance.

--
Kennie J. Cruz Gutierrez, System Administrator
Department of Electrical and Computer Engineering
University of Puerto Rico, Mayaguez Campus
Work Phone: (787) 832-4040 x 3798
Email: Kennie.Cruz at ece.uprm.edu
Web: http://ece.uprm.edu/~kennie/

[2003-12-23/09:21]
Black holes are created when God divides by zero!


From bruno at rocksclusters.org Tue Dec 23 08:33:39 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 23 Dec 2003 08:33:39 -0800
Subject: [Rocks-Discuss]Error installing the compute node
In-Reply-To: <Pine.LNX.4.58.0312230921290.23333@alambique.ece.uprm.edu>
References: <Pine.LNX.4.58.0312230921290.23333@alambique.ece.uprm.edu>
Message-ID: <C33DF11A-3565-11D8-B821-000A95C4E3B4@rocksclusters.org>

just to be clear, did you execute:

# cd /home/install
# ./kickstart.cgi --client compute-0-0


 - gb
On Dec 23, 2003, at 5:31 AM, Kennie Cruz wrote:

>   Hi,
>
>   I am trying to kickstart the compute nodes with Rocks 3.0.0, the
>   frontend
>   is already working. I revised the FAQ question 7.1.2, the services
>   (dhcpd,
>   httpd, mysqld and autofs) are running, but running kickstar.cgi from
>   the
>   command line give an error:
>
>         error - cannot kickstart external nodes
>
>   I made a quick search on the list, but without any success.
>
>   The compute node gets the assigned IP and insert-ethers detect the
>   appliance without any trouble, but fails to run the kickstart.cgi from
>   the
>   frontend. The web server error log says something like this:
>
>     [Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed
>   header
>     from script. Bad header=# @Copyright@:
>   /var/www/html/install/kickstart.cgi
>
>   While the access log says this:
>
>     10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET
>     /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0"
>     500 587 "-" "-"
>
>   I ran insert-ethers with the Ethernet Switches option. My nodes are
>   connected via 3 managed ethernet switches.
>
>   Any help will be appreciated.
>
>   Thanks in advance.
>
>   --
>   Kennie J. Cruz Gutierrez, System Administrator
>   Department of Electrical and Computer Engineering
>   University of Puerto Rico, Mayaguez Campus
>   Work Phone: (787) 832-4040 x 3798
>   Email: Kennie.Cruz at ece.uprm.edu
>   Web: http://ece.uprm.edu/~kennie/
>
>   [2003-12-23/09:21]
>   Black holes are created when God divides by zero!



From daniel.kidger at quadrics.com Tue Dec 23 09:03:49 2003
From: daniel.kidger at quadrics.com (Daniel Kidger)
Date: Tue, 23 Dec 2003 17:03:49 +0000
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
In-Reply-To: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>
References: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>
Message-ID: <3FE87575.5060807@quadrics.com>

Tomas Lindén wrote:

>To reinstall a cluster I use the command
> cluster-fork /boot/kickstart/cluster-kickstart
>Now since all 32 nodes have been PXE installed this means that the
>reinstallation is performed by first doing a PXE-boot to load the
>installation kernel. My problem is that sometimes a few nodes fail
>during this reinstallation process.
>
Although I haven't PXE-installed a Rocks cluster of this size, I have
done PXE-based installs of (larger) Red Hat clusters using a customised
kickstart file. What can go wrong is that I have seen timeouts if too
many nodes dhcp/tftp for their installer kernel simultaneously. You
could try to increase the timeout or, better, not do too many at once --
say start 8 at a time every 30 seconds. There is plenty of precedent for
this in, say, the automated installer of the AlphaServer SC Tru64
clusters. Also, outside of Rocks I have seen folk use multiple
'sub-master' nodes to act as tftp/http fileservers during the install
process. It would be interesting to hear what the Rocks developers'
vision is for the scalable installation of large clusters.
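
The "start a few at a time" idea can be sketched as a small shell loop.
This is a dry run by default: KICK is echo here; on a real cluster you
would set it to the actual per-node reinstall command (e.g. an ssh
invocation of /boot/kickstart/cluster-kickstart), and the compute-0-N
node names are an assumption about your naming scheme.

```shell
#!/bin/sh
# Kick off node reinstalls in batches so the DHCP/TFTP server is never
# hit by all nodes at once.
KICK=${KICK:-echo}   # replace echo with the real per-node reinstall command
BATCH=8              # nodes per batch
DELAY=0              # seconds between batches (use e.g. 30 in practice)

count=0
for node in $(seq -f "compute-0-%g" 0 31); do
    $KICK "$node" &               # start this node's reinstall
    count=$((count + 1))
    if [ $((count % BATCH)) -eq 0 ]; then
        sleep "$DELAY"            # let the install server drain
    fi
done
wait
echo "started $count reinstalls"
```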

--
Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------




From mjk at sdsc.edu Tue Dec 23 09:44:14 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 23 Dec 2003 09:44:14 -0800
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
In-Reply-To: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>
References: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>
Message-ID: <9F7E8D1C-356F-11D8-8281-000A95DA5638@sdsc.edu>

The problem is that PXE has an extremely short timeout, and once it
fails it does not retry. Since this is a BIOS thing, there isn't a lot
we can do about it. If you boot your compute nodes off of CDs (and
avoid PXE), the problem goes away: even if DHCP times out, we've
modified our installation to be extremely aggressive in its DHCP
requests, and the entire installation process will watchdog-timeout and
restart itself if needed. Unfortunately, the PXE timeout cannot be
fixed in the same way.

Our experience shows PXE scales to 128 nodes for a mass re-install on
current hardware. Older machines may show issues. The only answer
right now is to stage your re-install so the PXE server can handle the
load. The load is actually very low, but the PXE server for Linux is
still maturing.

     -mjk
On Dec 23, 2003, at 5:28 AM, Tomas Lindén wrote:

>   To reinstall a cluster I use the command
>     cluster-fork /boot/kickstart/cluster-kickstart
>   Now since all 32 nodes have been PXE installed this means that the
>   reinstallation is performed by first doing a PXE-boot to load the
>   installation kernel. My problem is that sometimes a few nodes fail
>   during this reinstallation process. The failing nodes seem to be
>   different
>   whenever this problem occurs. The really strange thing is that after
>   more than a day or so some nodes somehow manage to finish the
>   reinstallation process!
>
>   Sometimes the whole cluster comes up fine without any lost node.
>
>   The problematic nodes _seem_ to get the installation kernel with PXE,
>   so
>   it might be not a PXE problem but something odd that happens later?
>
>   Has anyone seen anything like this before?
>
>   I'm aware of a bug in the RedHat installation kernel
>   on Athlon systems when trying to run with a serial console.
>
>   https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/
>   001988.html
>   This is why I run the installation kernel without a serial console, but
>   this makes debugging difficult because the serial console only shows
>   output during the PXE boot process. No output is generated by the
>   installation kernel itself. The next output is generated when
>   the node has finished the installation and loads the final kernel which
>   runs fine with a serial console.
>
>   This is using Rocks 2.3.2 on a 32 node cluster with Tyan Tiger MPX
>   S2466N-4M motherboards and dual Athlon MP CPUs with no graphics
>   adapters, so the system has a 32 port serial console switch. The
>   motherboards have integrated 100 Mb/s 3Com 3C920 NICs (in practice a
>   3C905 NIC). The switch is made by Enterasys. The frontend private NIC
>   is
>   also running at 100 Mb/s. When doing the cluster reinstallation the
>   network bandwidth over the frontend NIC saturates at 12,5 MB/s. Maybe
>   some packets are lost because of this?
>
>   The frontend private ethernet connection will be upgraded to Gb/s.
>   Hopefully this will solve this reinstallation problem.
>
>   Do you have any other ideas how to solve this problem?
>
>   Best regards,                               Tomas Lindén
>   --------------------------------------------------------------------------
>   I           ,                                                            I
>   I Tomas Linden                   Helsinki Institute of Physics (HIP)     I
>   I Tomas.Linden at Helsinki.FI       P.O. Box 64 (Gustaf Hällströmin katu 2) I
>   I phone: +358-9-191 505 63       FIN-00014 UNIVERSITY OF HELSINKI        I
>   I fax:   +358-9-191 505 53       Finland                                 I
>   I WWW: http://www.physics.helsinki.fi/~tlinden/eindex.html               I
>   --------------------------------------------------------------------------



From Timothy.Carlson at pnl.gov Tue Dec 23 08:57:07 2003
From: Timothy.Carlson at pnl.gov (Carlson, Timothy S)
Date: Tue, 23 Dec 2003 08:57:07 -0800
Subject: [Rocks-Discuss]Error installing the compute node
Message-ID: <A383F042472668459D642266F8B41692056B9F@pnlmse24.pnl.gov>

The problem he is having is that he chose "ethernet switches" when
running insert-ethers. He should have chosen "Compute nodes".

Only choose "ethernet switches" when you are assigning an IP address to
an ethernet switch via DHCP. If your managed switches already have IP
addresses, then just install them as "compute nodes".

Tim

-----Original Message-----
From: Greg Bruno [mailto:bruno at rocksclusters.org]
Sent: Tuesday, December 23, 2003 8:34 AM
To: Kennie Cruz
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]Error installing the compute node


just to be clear, did you execute:

# cd /home/install
# ./kickstart.cgi --client compute-0-0


    - gb




On Dec 23, 2003, at 5:31 AM, Kennie Cruz wrote:

>   Hi,
>
>   I am trying to kickstart the compute nodes with Rocks 3.0.0, the
>   frontend
>   is already working. I revised the FAQ question 7.1.2, the services
>   (dhcpd,
>   httpd, mysqld and autofs) are running, but running kickstar.cgi from
>   the
>   command line give an error:
>
>         error - cannot kickstart external nodes
>
>   I made a quick search on the list, but without any success.
>
> The compute node gets the assigned IP and insert-ethers detect the
> appliance without any trouble, but fails to run the kickstart.cgi from

> the frontend. The web server error log says something like this:
>
>    [Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed
> header
>    from script. Bad header=# @Copyright@:
> /var/www/html/install/kickstart.cgi
>
> While the access log says this:
>
>    10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET
>    /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks
HTTP/1.0"
>    500 587 "-" "-"
>
> I ran insert-ethers with the Ethernet Switches option. My nodes are
> connected via 3 managed ethernet switches.
>
> Any help will be appreciated.
>
> Thanks in advance.
>
> --
> Kennie J. Cruz Gutierrez, System Administrator
> Department of Electrical and Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> Work Phone: (787) 832-4040 x 3798
> Email: Kennie.Cruz at ece.uprm.edu
> Web: http://ece.uprm.edu/~kennie/
>
> [2003-12-23/09:21]
> Black holes are created when God divides by zero!



From purikk at hotmail.com Tue Dec 23 12:48:30 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 23 Dec 2003 15:48:30 -0500
Subject: [Rocks-Discuss]beowulf and rocks
Message-ID: <BAY1-DAV43JrOq93dSA00011dba@hotmail.com>

Hi,
     I keep hearing people mention Beowulf and Rocks; can somebody
explain the difference between them? Are they just two different
solutions for clusters?
Thanks
Regards,
Puru


From tim.carlson at pnl.gov Tue Dec 23 13:19:39 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 23 Dec 2003 13:19:39 -0800 (PST)
Subject: [Rocks-Discuss]beowulf and rocks
In-Reply-To: <BAY1-DAV43JrOq93dSA00011dba@hotmail.com>
Message-ID: <Pine.LNX.4.44.0312231314420.25800-100000@localhost.localdomain>
On Tue, 23 Dec 2003, Purushotham Komaravolu wrote:

>      I keep people mentioning about beowulf and Rocks, can somebody point me
> the differnece between them. They they just two different solutions for
> Clusters?

Beowulf is a loose definition for a cluster of machines (typically off the
shelf hardware). Beowulf is not software.

Rocks is a software solution to manage your beowulf.

You can compare rocks/oscar/scyld as software systems for your beowulf
cluster.

Read Robert Brown's book on beowulfs at this URL

http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book/beowulf_book/index.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From dlane at ap.stmarys.ca Tue Dec 23 14:53:51 2003
From: dlane at ap.stmarys.ca (Dave Lane)
Date: Tue, 23 Dec 2003 18:53:51 -0400
Subject: [Rocks-Discuss]beowulf and rocks
In-Reply-To: <BAY1-DAV43JrOq93dSA00011dba@hotmail.com>
Message-ID: <5.2.0.9.0.20031223185219.01b444e8@ap.stmarys.ca>

At 03:48 PM 12/23/2003 -0500, Purushotham Komaravolu wrote:
>Hi,
>      I keep people mentioning about beowulf and Rocks, can somebody point me
>the differnece between them. They they just two different solutions for
>Clusters?

Beowulf is a loosely-defined generic term (that I won't attempt to
define now!), while Rocks is one of several software distributions that
implement a beowulf cluster.

... Dave



From junkscarce at hotmail.com Tue Dec 23 15:43:05 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Tue, 23 Dec 2003 23:43:05 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <BAY1-F147XhOous6jec0001512f@hotmail.com>

Within /export/home/install/profiles/2.3.2/site-nodes,
extend-compute.xml contains commented code like this:
<post>
/bin/mkdir /mnt/plc/ <-- works -->
/bin/mkdir /mnt/plc/plc_data <-- works -->
/bin/ln -s /mnt/plc_data /data1 <-- works -->
/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, source
exists -->
</post>

I don't understand why the ln to a directory succeeds but the ln to a
script fails.

BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
fail with:
Can't stat `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm
.
(my note: the path ends at RPMS) I swear I thought I saw a solution to this
once but I can't find it again.
Upon reinstallation with the file your tool created
(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda threw
back the exception: Traceback (innermost last): file
"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
configFileData) File
"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427 in
run
ok save debug


TIA Reed Scarce

_________________________________________________________________
Tired of slow downloads? Compare online deals from your local high-speed
providers now. https://broadband.msn.com



From landman at scalableinformatics.com Tue Dec 23 16:17:58 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 23 Dec 2003 19:17:58 -0500
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <BAY1-F147XhOous6jec0001512f@hotmail.com>
References: <BAY1-F147XhOous6jec0001512f@hotmail.com>
Message-ID: <1072225078.4501.82.camel@protein.scalableinformatics.com>

Hi Reed:

  Which version of finishing server fails on which version of ROCKS? It
looks like 3.0. I am up to 3.1.0 now. With a little bit of modification
I could make it work with 2.3.2. Likely just a single line to point to
the right path.

  Let me know and I'll see what I can do. I would recommend using the
3.1.0 environment, as it is a significant (read as massive) improvement
over previous versions. If you (and others) need it to work with older
(pre-3.0) versions of ROCKS, I think I can handle that. Let me know.

Joe

On Tue, 2003-12-23 at 18:43, Reed Scarce wrote:
> Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml
> lies code like this commented code:
> <post>
>   /bin/mkdir /mnt/plc/ <-- works -->
>   /bin/mkdir /mnt/plc/plc_data <-- works -->
>   /bin/ln -s /mnt/plc_data /data1 <-- works -->
>   /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, source
>   exists -->
>   </post>
>
>   I don't understand why the ln to a directory succeeds but a ln to a script
>   fails.
>
>   BTW, Dr. Landman, I've attempted to use your build.pl but it seems to faill
>   with:
>   Can't stat `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm



From mjk at sdsc.edu Tue Dec 23 16:35:13 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 23 Dec 2003 16:35:13 -0800
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <BAY1-F147XhOous6jec0001512f@hotmail.com>
References: <BAY1-F147XhOous6jec0001512f@hotmail.com>
Message-ID: <09B1C3EA-35A9-11D8-8281-000A95DA5638@sdsc.edu>

"man chkconfig"

If you use chkconfig, you do not need to create the rc*.d/* files
yourself; they are put in place for you.
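
For the gpm case in the original post, the extend-compute.xml <post>
section could let chkconfig build the links (a sketch; it assumes the
gpm init script with its chkconfig header is already installed on the
node):

```xml
<post>
# let chkconfig create the rc*.d symlinks from the init script's
# chkconfig header, instead of hand-made ln calls:
/sbin/chkconfig --add gpm
/sbin/chkconfig gpm on
</post>
```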

       -mjk

On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:

>   Within /export/home/install/profiles/2.3.2/site-nodes
>   extend-compute.xml lies code like this commented code:
>   <post>
>   /bin/mkdir /mnt/plc/ <-- works -->
>   /bin/mkdir /mnt/plc/plc_data <-- works -->
>   /bin/ln -s /mnt/plc_data /data1 <-- works -->
>   /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
>   source exists -->
>   </post>
>
>   I don't understand why the ln to a directory succeeds but a ln to a
>   script fails.
>
>   BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
>   faill with:
>   Can't stat
>   `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
>   (my note: the path ends at RPMS) I swear I thought I saw a solution
>   to this once but I can't find it again.
>   Upon reinstallation with the file your tool created
>   (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda
>   threw back the exception: Traceback (innermost last): file
>   "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>   configFileData) File
>   "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line
>   427 in run
>   ok save debug
>
>
>   TIA Reed Scarce
>
>   _________________________________________________________________
>   Tired of slow downloads? Compare online deals from your local
>   high-speed providers now. https://broadband.msn.com



From jkreuzig at uci.edu Tue Dec 23 19:53:16 2003
From: jkreuzig at uci.edu (James Kreuziger)
Date: Tue, 23 Dec 2003 19:53:16 -0800 (PST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>
References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>
Message-ID: <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>

Thanks everybody for the info. I was aware of the fast-link issue;
however, after enabling it, we still were unable to see the switch
from the frontend. We had a laptop hooked up to the switch via serial
and ethernet and were able to turn on fast-link and assign an
IP address. After that, the web-based interface came up on the laptop.
Still, no response on the switch from the frontend.

So after great gnashing of teeth, and dozens of re-installs of the
frontend, success! The problem? The extra NIC in the frontend.
We had bought the frontend with a dual 1 Gb card and a single 100 Mb
card. Whenever the single NIC is installed, the system always takes it
as eth0. This was staring us right in the face, which is probably why
it took so long to figure out.

After 3 years of trying to find the money, we finally have our first
8 node cluster up!

-Jim

*************************************************
Jim Kreuziger
jkreuzig at uci.edu
949-824-4474
*************************************************




From landman at scalableinformatics.com Tue Dec 23 20:23:35 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 23 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>
<Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
Message-ID: <3FE914C7.3050001@scalableinformatics.com>

Hi James:

    One of the things I do the first time I boot up a new head node is
to map the ethernet ports. I take out all but one of the network wires
and make sure there is real network traffic; a ping on the subnet is
fine. Then I tcpdump the network port. What is surprising to me is how
many times the assumed eth0 is mapped differently. Then, after mapping
the rest of the ports, I manually modify the /etc/modules.conf file to
reflect what I need.
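
That port-mapping procedure boils down to a few commands (a sketch; the
interface names and the ping target are placeholders for your own
hardware and subnet, and the modules.conf aliases are only examples):

```shell
# With a single cable plugged in, generate known traffic on the subnet,
# then watch each interface in turn until the packets show up:
ping -c 20 10.1.1.1 &        # some live host on the private subnet
tcpdump -n -i eth0           # no packets? repeat with eth1, eth2, ...

# Once each physical port is identified, pin the driver-to-name mapping
# in /etc/modules.conf, for example:
#   alias eth0 e1000
#   alias eth1 3c59x
```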

  Just a suggestion. Having been bitten enough, I find simple sanity
checks help reduce the size, or dimensionality, of the space of possible
problems. This usually makes these debugging sessions faster and
allows for better characterization of the issue.

Joe

James Kreuziger wrote:

>Thanks everybody for the info. I was aware of the fast-link issue;
>However, after enabling it, we still were unable to see the switch
>from the frontend. We had a laptop hooked up to the switch via serial
>and ethernet and was able to turn on the fast-link, and assign an
>IP address. After that, the web-based interface came up on the laptop.
>Still, no response on the switch from the frontend.
>
>So after great gnashing of teeth, and dozens of re-installs of the
>frontend, success! The problem? The extra nic card on the frontend.
>We had bought the frontend with a dual 1GB card and a single 100MB card.
>Whenever the single nic card is installed, the system always takes this
>as eth0. This is something that was staring us right in the face, so
>that's why it probably took so long to figure out.
>
>After 3 years of trying to find the money, we finally have our first
>8 node cluster up!
>
>-Jim
>
>*************************************************
>Jim Kreuziger
>jkreuzig at uci.edu
>949-824-4474
>*************************************************
>
>
>

--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615




From bruno at rocksclusters.org Tue Dec 23 21:26:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 23 Dec 2003 21:26:08 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64
Message-ID: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org>

Version 3.1.0 (Matterhorn) of the Rocks cluster distribution is
released and now supports three processor families: Intel IA-32, Intel
Itanium Processor Family, and AMD Opteron. This is the released
version of the software that was used to build a fully-functioning
128-node grid-enabled cluster in under 2 hours on opening night last
month at SC2003 in Phoenix, AZ. Rocks is developed by the Grid and
Cluster Computing Group at SDSC and by partners at the University of
California, Berkeley, Scalable Systems in Singapore, and individual
open-source software developers.

This is a co-release for x86 (Pentium, Athlon, and others), Itanium2
(IA-64) and Opteron (x86-64) based clusters. Software is freely
available for download to burn onto a bootable CD set for x86 and
x86-64 or a single DVD for Itanium2. Versions for all processor
families are available at http://www.rocksclusters.org/.

Introduced in Version 3.0.0, the "roll" mechanism is enhanced in this
version to enable users, communities and others to easily add optional
software and configuration. These optional "Roll CDs" extend the
system by integrating seamlessly and automatically into the management
and packaging mechanisms used by the base software. For all intents and
purposes, rolls appear as if they are part of the original CD
distribution. A number of defined extension rolls are freely available,
including HPC, Sun Grid Engine, Grid (based on NMI), Java and Intel
Compiler. An important feature is that new rolls can be created or
updated independently of the core distribution. This fundamentally
enables science teams and communities to add domain-specific
software packages, define a particular grid configuration, or simply
modify any of the default configuration or package settings.

New features in NPACI Rocks 3.1.0 include:

- Opteron Support
- Sun Grid Engine as the default queuing system
- Upgraded Ganglia server and client, used for collecting and
visualizing cluster-wide monitoring metrics
- Upgraded MPICH-GM and Myrinet GM 2.0 for the latest Rev D cards
- Rocks-developed 411 information system to replace Network Information
Service (NIS)
- Updated SSH version 3.7.1 with no login delay
- Several Optional Software Rolls including:
      - NSF Middleware Initiative version R4 grid distribution
      - Java 2
      - Intel Compilers for x86 and ia64

Rocks 3.1.0 is derived from Red Hat's publicly available source
packages (SRPMS) used in portions of their Enterprise Linux 3.0 product
line. All SRPMS have been recompiled to enable redistribution. All
available updates for these packages have been pre-applied.
Rocks-specific software and standard cluster and grid community
software is then added to create a complete clustering toolkit. All
Rocks source code is available in a public CVS repository.



From angel at miami.edu Wed Dec 24 13:14:59 2003
From: angel at miami.edu (Angel Li)
Date: Wed, 24 Dec 2003 16:14:59 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64
In-Reply-To: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org>
References: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org>
Message-ID: <3FEA01D3.8080204@miami.edu>

Hi,

I currently have a cluster running Rocks 3.0 and I'm considering
upgrading to 3.1. Now that SGE is the default batch queue, is maui
working? Also, the Intel compiler roll is included. What licensing
issues will I encounter? We currently have a license for version 7.

Thanks,

Angel



From bruno at rocksclusters.org Wed Dec 24 14:14:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 24 Dec 2003 14:14:46 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64
In-Reply-To: <3FEA01D3.8080204@miami.edu>
References: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org>
<3FEA01D3.8080204@miami.edu>
Message-ID: <94F9D6F6-365E-11D8-B821-000A95C4E3B4@rocksclusters.org>

> I currently have a cluster running Rocks 3.0 and I'm considering
> upgrading to 3.1. Now that SGE is the default batch queue, is maui
> working?

maui and pbs are currently not available in rocks 3.1, but they will
be soon.

maui and pbs will be included in their own roll -- that effort will be
driven by Roy Dragseth from the University of Tromsø.

> Also, the Intel compiler roll is included. What licensing issues will
> I encounter? We currently have a license for version 7.

i'm not sure how the licenses transfer between versions.

after you bring up a frontend with the intel roll, the following link
is available on the frontend's home page:

        http://www.intel.com/software/products/distributors/rock_cluster.htm

after you purchase a license, you just need to copy the license into
the appropriate directory and then start compiling.

for fortran, the appropriate directory is:

        /opt/intel_fc_80/licenses

and for C, the appropriate directory is:

        /opt/intel_cc_80/licenses

also, the intel roll contains a pre-built MPICH environment -- it is
found under /opt/mpich/intel.
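to make the license step concrete, here is a minimal sketch of copying a
license file into both roll directories. the file name l_example.lic is a
placeholder (use whatever Intel actually sends you), and ROOT is introduced
here only so the commands can be exercised safely outside a real frontend:

```shell
#!/bin/sh
# Illustrative sketch only: on a real frontend ROOT would be empty (so the
# paths are /opt/...), and l_example.lic stands in for the license file
# Intel sends you after purchase -- both are placeholders.
ROOT="${ROOT:-/tmp/intel-lic-demo}"
mkdir -p "$ROOT/opt/intel_fc_80/licenses" "$ROOT/opt/intel_cc_80/licenses"
: > "$ROOT/l_example.lic"              # stand-in for the purchased license
for d in intel_fc_80 intel_cc_80; do   # fortran and C compiler rolls
    cp "$ROOT/l_example.lic" "$ROOT/opt/$d/licenses/"
done
```

on a real frontend you would drop ROOT (copying straight into
/opt/intel_fc_80/licenses and /opt/intel_cc_80/licenses) and then compile
with the wrappers under /opt/mpich/intel.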

  - gb



From cdwan at mail.ahc.umn.edu Wed Dec 24 14:17:28 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Wed, 24 Dec 2003 16:17:28 -0600 (CST)
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <3FE914C7.3050001@scalableinformatics.com>
References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>
 <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
 <3FE914C7.3050001@scalableinformatics.com>
Message-ID: <Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu>

Once upon a time, I decided to install a third interface in a rocks head
node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a
data network. At boot time *everything* was broken.

To make a long story less long, the system had remapped itself with the
new gig card as eth0, and the other two shifted up by one. That was
really close to "no fun at all."

Happy holidays!   I'm burning the new release right now!

-C


From michal at harddata.com Wed Dec 24 15:05:43 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 24 Dec 2003 16:05:43 -0700
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu>; from
cdwan@mail.ahc.umn.edu on Wed, Dec 24, 2003 at 04:17:28PM -0600
References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>
<Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
<3FE914C7.3050001@scalableinformatics.com>
<Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu>
Message-ID: <20031224160543.A25886@mail.harddata.com>

On Wed, Dec 24, 2003 at 04:17:28PM -0600, Chris Dwan (CCGB) wrote:
>
> Once upon a time, I decided to install a third interface in a rocks head
> node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a
> data network. At boot time *everything* was broken.

I still cannot understand why people insist on NOT using the 'nameif'
utility. All network interfaces can be named whichever way you want
and they will not move regardless of how many NICs you add or
remove, as long as MACs are not changed. If you replace a card with
a different one then /etc/mactab needs to be edited to reflect your
new configuration. On client nodes with an automatic reinstall
this indeed is not practical, but for your front end machine this is
another story.

It is indeed the case that default startup scripts from Red Hat 7.3
need some simple additions, as interface (re)naming needs to be done
before NICs are brought up for the first time. In RH9 and FC1
'nameif' will be used "automagically" if the HWADDR variable is defined
(and with a correct value).

Of course, if you have different drivers for different NICs, and they
are loaded as modules, then names can be assigned by editing
/etc/modules.conf.
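For reference, an /etc/mactab along these lines might look like the sketch
below (the MAC addresses and the data0 name are invented for illustration).
'nameif' reads the file and renames each interface to the name bound to its
MAC:

```text
# /etc/mactab: interface-name  MAC-address
eth0    00:0e:0c:11:22:33   # onboard NIC, public network
eth1    00:0e:0c:44:55:66   # onboard NIC, private network
data0   00:00:5a:77:88:99   # add-in gig card, data network
```

Because the binding is by MAC, adding or removing other cards leaves these
names untouched.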

  Michal


From bruno at rocksclusters.org Wed Dec 24 15:41:25 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 24 Dec 2003 15:41:25 -0800
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <20031224160543.A25886@mail.harddata.com>
References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>
<Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
<3FE914C7.3050001@scalableinformatics.com>
<Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu>
<20031224160543.A25886@mail.harddata.com>
Message-ID: <AFFB44D8-366A-11D8-B821-000A95C4E3B4@rocksclusters.org>

>> Once upon a time, I decided to install a third interface in a rocks
>> head
>> node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested)
>> for a
>> data network. At boot time *everything* was broken.
>
> I still cannot understand why people insist on NOT using the 'nameif'
> utility. All network interfaces can be named whichever way you want
> and they will not move regardless of how many NICs you add or
> remove, as long as MACs are not changed. If you replace a card with
> a different one then /etc/mactab needs to be edited to reflect your
> new configuration. On client nodes with an automatic reinstall
> this indeed is not practical, but for your front end machine this is
> another story.
>
> It is indeed the case that default startup scripts from Red Hat 7.3
> need some simple additions, as interface (re)naming needs to be done
> before NICs are brought up for the first time. In RH9 and FC1
> 'nameif' will be used "automagically" if the HWADDR variable is defined
> (and with a correct value).

michal,

for this release, we looked at your suggestion of using nameif -- we
did a quick prototype and it looks like it will be the right thing to
do. we sketched out a design and found that the full solution will
require many pieces (database changes, installer changes and the
obvious XML file changes). we left this out of 3.1.0 but it is towards
the top of our list for the next release.

thanks for the suggestion of nameif -- it is suggestions like that
which help us to define the direction of rocks.

  - gb
From landman at scalableinformatics.com Wed Dec 24 16:08:54 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 24 Dec 2003 19:08:54 -0500
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <20031224160543.A25886@mail.harddata.com>
References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov>
<Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
<3FE914C7.3050001@scalableinformatics.com>
<Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu>
<20031224160543.A25886@mail.harddata.com>
Message-ID: <3FEA2A96.3060405@scalableinformatics.com>


Michal Jaegermann wrote:

>On Wed, Dec 24, 2003 at 04:17:28PM -0600, Chris Dwan (CCGB) wrote:
>
>
>>Once upon a time, I decided to install a third interface in a rocks head
>>node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a
>>data network. At boot time *everything* was broken.
>>
>>
>
>I still cannot understand why people insist on NOT using the 'nameif'
>utility. All network interfaces can be named whichever way you want
>and they will not move regardless of how many NICs you add or
>remove, as long as MACs are not changed. If you replace a card with
>a different one then /etc/mactab needs to be edited to reflect your
>new configuration. On client nodes with an automatic reinstall
>this indeed is not practical, but for your front end machine this is
>another story.
>
>
Agreed, though as far as I can tell, nameif is not used in the
/etc/init.d scripts. It is used by ifup, so you would have to set HWADDR
on each interface in the /etc/sysconfig/.../ifcfg-eth* files (the ...
refers to the fact that RH9 and RHEL3 have moved where these things sit
from what we were used to in RH7.x). You still need to map the interfaces,
though, to see which physical port corresponds to which device/MAC address.
With that in hand, you can set up the HWADDR or just swap cables. With the
advent of folks making exactly the right length cables (e.g. not giving any
play, and placing them under tension while plugged in...) the cable swap
doesn't work well for mapping on some systems. Moreover, on a fair number
of systems I have played with, the BIOS is set up so that if they PXE boot,
they are doing so from the address that the installed version of ROCKS
would see as eth1. Annoying.
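As a sketch of the HWADDR approach (the MAC and IP addresses below are
placeholders), the per-interface file carries the binding, e.g. on RH9
under /etc/sysconfig/network-scripts/:

```text
# ifcfg-eth0 -- sketch; MAC and IP values are placeholders
DEVICE=eth0
HWADDR=00:0e:0c:11:22:33   # tie the eth0 name to this physical NIC
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.1.1.1
NETMASK=255.255.255.0
```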


--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615
From junkscarce at hotmail.com Fri Dec 26 15:35:57 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Fri, 26 Dec 2003 23:35:57 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <BAY1-F88Kxt8zPdqJL900052b1b@hotmail.com>

The line:

chkconfig --level 3 gpm on

works great from the command line, not in extend-compute.xml. Thanks for
the new tool though, always glad. The line above is in a block without
<eval shell="bash"> tags. I'll keep trying and rtm. Is it possible this is
a 2.6.2 issue? The live environment restricts me from using a more recent
version.


>From: "Mason J. Katz" <mjk at sdsc.edu>
>To: "Reed Scarce" <junkscarce at hotmail.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
>Date: Tue, 23 Dec 2003 16:35:13 -0800
>
>"man chkconfig"
>
>If you use chkconfig you do not need to create the rc*.d/* files and they
>are put in place for you.
>
>     -mjk
>
>On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
>
>>Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml
>>lies code like this commented code:
>><post>
>>/bin/mkdir /mnt/plc/ <-- works -->
>>/bin/mkdir /mnt/plc/plc_data <-- works -->
>>/bin/ln -s /mnt/plc_data /data1 <-- works -->
>>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
>>source exists -->
>></post>
>>
>>I don't understand why the ln to a directory succeeds but a ln to a script
>>fails.
>>
>>BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
>>faill with:
>>Can't stat
>>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
>>(my note: the path ends at RPMS) I swear I thought I saw a solution to
>>this once but I can't find it again.
>>Upon reinstallation with the file your tool created
>>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda
>>threw back the exception: Traceback (innermost last): file
>>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>>configFileData) File
>>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427 in
>>run
>>ok save debug
>>
>>
>>TIA Reed Scarce
>>
>>_________________________________________________________________
>>Tired of slow downloads? Compare online deals from your local high-speed
>>providers now. https://broadband.msn.com
>

_________________________________________________________________
Worried about inbox overload? Get MSN Extra Storage now!
http://join.msn.com/?PAGE=features/es



From mjk at sdsc.edu Fri Dec 26 16:46:22 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Fri, 26 Dec 2003 16:46:22 -0800
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <BAY1-F88Kxt8zPdqJL900052b1b@hotmail.com>
References: <BAY1-F88Kxt8zPdqJL900052b1b@hotmail.com>
Message-ID: <1759D2DF-3806-11D8-98D0-000A95DA5638@sdsc.edu>

Not sure if this answers your question. But...

The <eval></eval> blocks are for code to be run on the kickstart server
(the one that generates the kickstart file). Code outside of the eval
blocks is run on the kickstarting host.
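A minimal sketch of that split, using the extend-compute.xml conventions
already shown in this thread (the commands themselves are illustrative):

```xml
<post>
<!-- outside eval: runs on the kickstarting node during install -->
/sbin/chkconfig --level 3 gpm on

        <eval shell="bash">
        <!-- inside eval: runs on the frontend while the kickstart
             file is being generated -->
        echo "# kickstart generated on $(hostname)"
        </eval>
</post>
```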

     -mjk


On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote:

> The line:
>
> chkconfig --level 3 gpm on
>
> works great from the command line, not in extend-compute.xml. Thanks
> for the new tool though, always glad. The line above is in a block
> without <eval shell="bash"> tags. I'll keep trying and rtm. Is it
> possible this is a 2.6.2 issue? The live environment restricts me
> from using a more recent version.
>
>
>> From: "Mason J. Katz" <mjk at sdsc.edu>
>> To: "Reed Scarce" <junkscarce at hotmail.com>
>> CC: npaci-rocks-discussion at sdsc.edu
>> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation
>> fails
>> Date: Tue, 23 Dec 2003 16:35:13 -0800
>>
>> "man chkconfig"
>>
>> If you use chkconfig you do not need to create the rc*.d/* files and
>> they are put in place for you.
>>
>>    -mjk
>>
>> On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
>>
>>> Within /export/home/install/profiles/2.3.2/site-nodes
>>> extend-compute.xml lies code like this commented code:
>>> <post>
>>> /bin/mkdir /mnt/plc/ <-- works -->
>>> /bin/mkdir /mnt/plc/plc_data <-- works -->
>>> /bin/ln -s /mnt/plc_data /data1 <-- works -->
>>> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
>>> source exists -->
>>> </post>
>>>
>>> I don't understand why the ln to a directory succeeds but a ln to a
>>> script fails.
>>>
>>> BTW, Dr. Landman, I've attempted to use your build.pl but it seems
>>> to faill with:
>>> Can't stat
>>> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
>>> (my note: the path ends at RPMS) I swear I thought I saw a solution
>>> to this once but I can't find it again.
>>> Upon reinstallation with the file your tool created
>>> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm)
>>> anaconda threw back the exception: Traceback (innermost last): file
>>> "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>>> configFileData) File
>>> "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line
>>> 427 in run
>>> ok save debug
>>>
>>>
>>> TIA Reed Scarce
>>>
>>> _________________________________________________________________
>>> Tired of slow downloads? Compare online deals from your local
>>> high-speed providers now. https://broadband.msn.com
>>
>
> _________________________________________________________________
> Worried about inbox overload? Get MSN Extra Storage now!
> http://join.msn.com/?PAGE=features/es



From apseyed at bu.edu Sat Dec 27 12:32:40 2003
From: apseyed at bu.edu (apseyed at bu.edu)
Date: Sat, 27 Dec 2003 15:32:40 -0500
Subject: [Rocks-Discuss]Re: npaci-rocks-discussion digest, Vol 1 #663 - 2 msgs
In-Reply-To: <200312272013.hBRKDbJ15227@postal.sdsc.edu>
References: <200312272013.hBRKDbJ15227@postal.sdsc.edu>
Message-ID: <1072557160.3fedec68d07d6@www.bu.edu>

For what it's worth,

Why don't you try specifying the absolute path (/sbin/chkconfig) and
setting a debug flag and an output file? (If you can confirm /sbin is in
$PATH during the life of the script, never mind the first suggestion.)

echo "got to chkconfig beginning" > /tmp/ks.log
/sbin/chkconfig --level 3 gpm on
echo "got to chkconfig end" >> /tmp/ks.log
/sbin/chkconfig --list | grep gpm >> /tmp/ks.log

-Patrice


Quoting npaci-rocks-discussion-request at sdsc.edu:

>   Send npaci-rocks-discussion mailing list submissions to
>       npaci-rocks-discussion at sdsc.edu
>
>   To subscribe or unsubscribe via the World Wide Web, visit
>       http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>   or, via email, send a message with subject or body 'help' to
>       npaci-rocks-discussion-request at sdsc.edu
>
>   You can reach the person managing the list at
>       npaci-rocks-discussion-admin at sdsc.edu
>
>   When replying, please edit your Subject line so it is more specific
>   than "Re: Contents of npaci-rocks-discussion digest..."
>
>
>   Today's Topics:
>
>      1. Re: Extend-compute.xml issue, ln creation fails (Reed Scarce)
>      2. Re: Extend-compute.xml issue, ln creation fails (Mason J.
>   Katz)
>
>   --__--__--
>
>   Message: 1
>   From: "Reed Scarce" <junkscarce at hotmail.com>
>   To: mjk at sdsc.edu
>   Cc: npaci-rocks-discussion at sdsc.edu
>   Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation
>   fails
>   Date: Fri, 26 Dec 2003 23:35:57 +0000
>
>   The line:
>
>   chkconfig --level 3 gpm on
>
>   works great from the command line, not in extend-compute.xml. Thanks
>   for
>   the new tool though, always glad. The line above is in a block
>   without
>   <eval shell="bash"> tags. I'll keep trying and rtm. Is it possible
>   this is
>   a 2.6.2 issue? The live environment restricts me from using a more
>   recent
>   version.
>
>
>   >From: "Mason J. Katz" <mjk at sdsc.edu>
>   >To: "Reed Scarce" <junkscarce at hotmail.com>
>   >CC: npaci-rocks-discussion at sdsc.edu
>   >Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation
>   fails
>   >Date: Tue, 23 Dec 2003 16:35:13 -0800
>   >
>   >"man chkconfig"
>   >
>   >If you use chkconfig you do not need to create the rc*.d/* files and
>   they
>   >are put in place for you.
>   >
>   >   -mjk
>   >
>   >On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
>   >
>   >>Within /export/home/install/profiles/2.3.2/site-nodes
>   extend-compute.xml
>   >>lies code like this commented code:
>   >><post>
>   >>/bin/mkdir /mnt/plc/ <-- works -->
>   >>/bin/mkdir /mnt/plc/plc_data <-- works -->
>   >>/bin/ln -s /mnt/plc_data /data1 <-- works -->
>   >>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to
>   ln,
>   >>source exists -->
>   >></post>
>   >>
>   >>I don't understand why the ln to a directory succeeds but a ln to a
>   script
>   >>fails.
>   >>
>   >>BTW, Dr. Landman, I've attempted to use your build.pl but it seems
>   to
>   >>faill with:
>   >>Can't stat
>   >>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm
>   .
>   >>(my note: the path ends at RPMS) I swear I thought I saw a
>   solution to
>   >>this once but I can't find it again.
>   >>Upon reinstallation with the file your tool created
>   >>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm)
>   anaconda
>   >>threw back the exception: Traceback (innermost last): file
>   >>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>   >>configFileData) File
>   >>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line
>   427 in
>   >>run
>   >>ok save debug
>   >>
>   >>
>   >>TIA Reed Scarce
>   >>
>   >>_________________________________________________________________
>   >>Tired of slow downloads? Compare online deals from your local
>   high-speed
>   >>providers now. https://broadband.msn.com
>   >
>
>   _________________________________________________________________
>   Worried about inbox overload? Get MSN Extra Storage now!
>   http://join.msn.com/?PAGE=features/es
>
>
>   --__--__--
>
>   Message: 2
>   Cc: npaci-rocks-discussion at sdsc.edu
>   From: "Mason J. Katz" <mjk at sdsc.edu>
>   Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation
>   fails
>   Date: Fri, 26 Dec 2003 16:46:22 -0800
>   To: "Reed Scarce" <junkscarce at hotmail.com>
>
>   Not sure if this answers your question.   But..
>
>   The <eval></eval> blocks are for code to be run on the kickstart
>   server
>   (the one that generates the kickstart file). Code outside of the eval
>
>   blocks is run on the kickstarting host.
>
>      -mjk
>
>
>   On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote:
>
>   > The line:
>   >
>   > chkconfig --level 3 gpm on
>   >
>   > works great from the command line, not in extend-compute.xml.
>   Thanks
>   > for the new tool though, always glad. The line above is in a block
>
>   > without <eval shell="bash"> tags.   I'll keep trying and rtm.   Is it
>
>   > possible this is a 2.6.2 issue?   The live environment restricts me
>
>   > from using a more recent version.
>   >
>   >
>   >> From: "Mason J. Katz" <mjk at sdsc.edu>
>   >> To: "Reed Scarce" <junkscarce at hotmail.com>
>   >> CC: npaci-rocks-discussion at sdsc.edu
>   >> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation
>
>   >> fails
>   >> Date: Tue, 23 Dec 2003 16:35:13 -0800
>   >>
>   >> "man chkconfig"
>   >>
>   >> If you use chkconfig you do not need to create the rc*.d/* files
>   and
>   >> they are put in place for you.
>   >>
>   >> -mjk
>   >>
>   >> On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
>   >>
>   >>> Within /export/home/install/profiles/2.3.2/site-nodes
>   >>> extend-compute.xml lies code like this commented code:
>   >>> <post>
>   >>> /bin/mkdir /mnt/plc/ <-- works -->
>   >>> /bin/mkdir /mnt/plc/plc_data <-- works -->
>   >>> /bin/ln -s /mnt/plc_data /data1 <-- works -->
>   >>> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to
>   ln,
>   >>> source exists -->
>   >>> </post>
>   >>>
>   >>> I don't understand why the ln to a directory succeeds but a ln to
>   a
>   >>> script fails.
>   >>>
>   >>> BTW, Dr. Landman, I've attempted to use your build.pl but it
>   seems
>   >>> to faill with:
>   >>> Can't stat
>   >>> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm
>   .
>   >>> (my note: the path ends at RPMS) I swear I thought I saw a
>   solution
>   >>> to this once but I can't find it again.
>   >>> Upon reinstallation with the file your tool created
>   >>> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm)
>   >>> anaconda threw back the exception: Traceback (innermost last):
>   file
>   >>> "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>
>   >>> configFileData) File
>   >>> "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py",
>   line
>   >>> 427 in run
>   >>> ok save debug
>   >>>
>   >>>
>   >>> TIA Reed Scarce
>   >>>
>   >>>
>   _________________________________________________________________
>   >>> Tired of slow downloads? Compare online deals from your local
>   >>> high-speed providers now. https://broadband.msn.com
>   >>
>   >
>   > _________________________________________________________________
>   > Worried about inbox overload? Get MSN Extra Storage now!
>   > http://join.msn.com/?PAGE=features/es
>
>
>
>   --__--__--
>
>   _______________________________________________
>   npaci-rocks-discussion mailing list
>   npaci-rocks-discussion at sdsc.edu
>   http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>
> End of npaci-rocks-discussion Digest
>


From rocks_india at yahoo.co.in Sat Dec 27 20:20:40 2003
From: rocks_india at yahoo.co.in (Rocks India)
Date: Sun, 28 Dec 2003 04:20:40 +0000 (GMT)
Subject: [Rocks-Discuss]Rocks 3.0 Newbeeeeeeee
Message-ID: <20031228042040.88990.qmail@web8301.mail.in.yahoo.com>

Hello All,
          I am new to Rocks. I was able to download and install Rocks
3.0. I am not sure if Globus 3.0 gets installed during the installation
process; I tried to use simple CA commands and get a "command not
found" error.
          Do I need to download the Globus Toolkit and install it, or
is it installed along with Rocks?

Or can anyone direct me to a site or give me the steps that need to be
taken after installing Rocks to set up Globus?

                                Rocks-India

________________________________________________________________________
Yahoo! India Matrimony: Find your partner online.
Go to http://yahoo.shaadi.com


From bruno at rocksclusters.org Sat Dec 27 21:35:28 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Sat, 27 Dec 2003 21:35:28 -0800
Subject: [Rocks-Discuss]Rocks 3.0 Newbeeeeeeee
In-Reply-To: <20031228042040.88990.qmail@web8301.mail.in.yahoo.com>
References: <20031228042040.88990.qmail@web8301.mail.in.yahoo.com>
Message-ID: <A4E388DE-38F7-11D8-9E96-000A95C4E3B4@rocksclusters.org>

>             I am new to Rocks. I was able to download and install
>   Rocks 3.0. I am not sure if Globus 3.0 gets installed during the
>   installation process; I tried to use simple CA commands and get a
>   "command not found" error.
>             Do I need to download the Globus Toolkit and install it,
>   or is it installed along with Rocks?
>
>   Or can anyone direct me to a site or give me the steps that need to
>   be taken after installing Rocks to set up Globus?

here are the steps, but it would require reinstalling your frontend:

go to:
http://www.rocksclusters.org/rocks-documentation/3.1.0/iso-images.html

and download:

     Rocks Base, HPC Roll, SGE Roll and the Grid Roll

then burn them all to CD.

then follow the directions at:

http://www.rocksclusters.org/rocks-documentation/3.1.0/install-frontend.html


but, before you get started, you should consult this page too:

http://rocks.npaci.edu/roll-documentation/grid/3.0/adding-the-roll.html


at the end of the process, your frontend will be configured with globus.

  - gb



From ramonjt at ucia.gov Mon Dec 29 09:08:45 2003
From: ramonjt at ucia.gov (ramonjt)
Date: Mon, 29 Dec 2003 12:08:45 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0
Message-ID: <3FF05F9D.F6A122F@ucia.gov>

Folks,

   Which set of Rocks 3.1.0 downloads support Xeon Processors, "Pentium
and Athlon" or "Itanium"?

Thanks,
Ramon



From bruno at rocksclusters.org Mon Dec 29 09:31:56 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 09:31:56 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0
In-Reply-To: <3FF05F9D.F6A122F@ucia.gov>
References: <3FF05F9D.F6A122F@ucia.gov>
Message-ID: <E60E6664-3A24-11D8-9E96-000A95C4E3B4@rocksclusters.org>

>    Which set of Rocks 3.1.0 downloads support Xeon Processors, "Pentium
> and Athlon" or "Itanium"?

xeons are x86 processors -- so you want the ISO images found under the
section:

     Software for x86 (Pentium and Athlon)

  - gb
From landman at scalableinformatics.com Mon Dec 29 10:49:49 2003
From: landman at scalableinformatics.com (landman)
Date: Mon, 29 Dec 2003 13:49:49 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
Message-ID: <20031229183225.M11961@scalableinformatics.com>

Pulled the distro. Burned it after checking md5's. Ok. Booted/installed test
cluster, completely vanilla, just defaults.

SSH is too slow. Wow. 5-10 seconds to log in.

Ok, now out at a customer site with the disks.

Unhappily discovered that the following are missing:

a) md (e.g. Software RAID): Just try to build one. Anaconda will happily let
you do this ... though it will die in the formatting stages. Dropping into the
shell (Alt-F2) and looking for the md module (lsmod) shows nothing. Insmod the
md also doesn't do anything. Catting /proc/devices shows no md as a character
or block device.

If md is really not there anymore, it should be removed from anaconda, just
like ...

b) ext3. There is no ext3 available for the install.

Also discovered how incredibly fragile anaconda is. In order to install, you
have to wipe the disks. It will not install if there is an md (software raid)
device, choosing instead to crap out after you have entered all the
information. To say that this is annoying is a slight understatement. This is
an anaconda issue, not a ROCKS issue, though as a result of this issue, ROCKS
is less functional than it could be.

I also noted that there is no xfs option. This means that I will need to hack
new kernels later on after the install. Moreover, I will also need to turn on
the ext3 journaling features later on (post install).
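For what it's worth, turning ext3 journaling on after the install is one
tune2fs call per filesystem. A sketch, demonstrated here on a loopback file
image so it can be run without touching a real disk (on a node you would
point tune2fs at the partition device, e.g. /dev/hda5, and flip the fstab
entry from ext2 to ext3):

```shell
#!/bin/sh
# Create a small ext2 image, then add a journal to make it ext3.
IMG=/tmp/ext3-demo.img
dd if=/dev/zero of="$IMG" bs=1024 count=16384 2>/dev/null
mke2fs -F -q "$IMG"             # plain ext2, no journal yet
tune2fs -j "$IMG" >/dev/null    # add the journal: filesystem is now ext3
tune2fs -l "$IMG" | grep 'Filesystem features'   # lists has_journal
```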

Hopefully 3.1.1 or 3.2 will fix some of these things.

Joe



--
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
phone: +1 734 612 4615



From junkscarce at hotmail.com Mon Dec 29 15:15:52 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Mon, 29 Dec 2003 23:15:52 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com>

Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that work?
I need to know the limitations of the distribution. As far as I can tell
the commands are available (`which command` locates the commands fine) but
they don't necessarily perform the job as expected. I had seen the
`eval...` clarification in the archives.

As it stands I plan to mkdir, ln and echo in the extend-c... but then run
the heart of the customization (scripted) once the nodes are up. It just
doesn't seem to be what was intended.

As always, thanks for your help
--Reed

>From: "Mason J. Katz" <mjk at sdsc.edu>
>To: "Reed Scarce" <junkscarce at hotmail.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
>Date: Fri, 26 Dec 2003 16:46:22 -0800
>
>Not sure if this answers your question. But..
>
>The <eval></eval> blocks are for code to be run on the kickstart server
>(the one that generates the kickstart file). Code outside of the eval
>blocks is run on the kickstarting host.
>
>     -mjk
>
>
>On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote:
>
>>The line:
>>
>>chkconfig --level 3 gpm on
>>
>>works great from the command line, not in extend-compute.xml. Thanks for
>>the new tool though, always glad. The line above is in a block without
>><eval shell="bash"> tags. I'll keep trying and rtm. Is it possible this
>>is a 2.6.2 issue? The live environment restricts me from using a more
>>recent version.
>>
>>
>>>From: "Mason J. Katz" <mjk at sdsc.edu>
>>>To: "Reed Scarce" <junkscarce at hotmail.com>
>>>CC: npaci-rocks-discussion at sdsc.edu
>>>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
>>>Date: Tue, 23 Dec 2003 16:35:13 -0800
>>>
>>>"man chkconfig"
>>>
>>>If you use chkconfig you do not need to create the rc*.d/* files and they
>>>are put in place for you.
>>>
>>>   -mjk
>>>
>>>On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
>>>
>>>>Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml
>>>>lies code like this commented code:
>>>><post>
>>>>/bin/mkdir /mnt/plc/ <-- works -->
>>>>/bin/mkdir /mnt/plc/plc_data <-- works -->
>>>>/bin/ln -s /mnt/plc_data /data1 <-- works -->
>>>>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
>>>>source exists -->
>>>></post>
>>>>
>>>>I don't understand why the ln to a directory succeeds but a ln to a
>>>>script fails.
>>>>
>>>>BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
>>>>faill with:
>>>>Can't stat
>>>>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
>>>>(my note: the path ends at RPMS) I swear I thought I saw a solution to
>>>>this once but I can't find it again.
>>>>Upon reinstallation with the file your tool created
>>>>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda
>>>>threw back the exception: Traceback (innermost last): file
>>>>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch,
>>>>configFileData) File
>>>>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427
>>>>in run
>>>>ok save debug
>>>>
>>>>
>>>>TIA Reed Scarce
>>>>
>>>>_________________________________________________________________
>>>>Tired of slow downloads? Compare online deals from your local high-speed
>>>>providers now. https://broadband.msn.com
>>>
>>
>>_________________________________________________________________
>>Worried about inbox overload? Get MSN Extra Storage now!
>>http://join.msn.com/?PAGE=features/es
>

_________________________________________________________________
Make your home warm and cozy this winter with tips from MSN House & Home.
http://special.msn.com/home/warmhome.armx



From dlane at ap.stmarys.ca Mon Dec 29 15:44:23 2003
From: dlane at ap.stmarys.ca (Dave Lane)
Date: Mon, 29 Dec 2003 19:44:23 -0400
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com>
Message-ID: <5.2.0.9.0.20031229194312.01ed0f40@ap.stmarys.ca>

At 11:15 PM 12/29/2003 +0000, Reed Scarce wrote:
>Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that work?

Reed,

Below is a script that worked fine for me (with 2.3.2). What it does should
be fairly self-explanatory... Dave
--->>>

<post>
          <!-- Insert your post installation script here. This
          code will be executed on the destination node after the
          packages have been installed. Typically configuration files
          are built and services setup in this section. -->

mv /usr/local /usr/local-old
ln -s /home/local /usr/local
ln -s /home/opt/intel /opt/intel
ln -s /home/disc15 /disc15
mkdir /scratch/tmp
chmod 1777 /scratch/tmp
echo '#!/bin/bash' > /etc/init.d/wait
echo 'sleep 60' >> /etc/init.d/wait
chmod +x /etc/init.d/wait
ln -s /etc/init.d/wait /etc/rc3.d/S11wait
ln -s /etc/init.d/wait /etc/rc4.d/S11wait
ln -s /etc/init.d/wait /etc/rc5.d/S11wait

          <eval sh="python">
                  <!-- This is python code that will be executed on
                  the frontend node during kickstart generation. You
                  may contact the database, make network queries, etc.
                  These sections are generally used to help build
                  more complex configuration files.
                  The 'sh' attribute may point to any language interpreter
                  such as "bash", "perl", "ruby", etc.
                  -->
          </eval>
</post>



From bruno at rocksclusters.org Mon Dec 29 19:03:25 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 19:03:25 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <20031229183225.M11961@scalableinformatics.com>
References: <20031229183225.M11961@scalableinformatics.com>
Message-ID: <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>

> Pulled the distro. Burned it after checking md5's.     Ok.
> Booted/installed test
> cluster, completely vanilla, just defaults.

i'm assuming this is an x86 installation, yes?

> SSH is too slow.   Wow.   5-10 seconds to log in.

that is not the case on our clusters. in fact, we tested this on all
three architectures and all three are 'fast'.

> Ok, now out at a customer site with the disks.
>
> Unhappily discovered that the following are missing:
>
>   a) md (e.g. Software RAID): Just try    to build one.   Anaconda will
>   happily let
>   you do this ... though it will die in   the formatting stages.   Dropping
>   into the
>   shell (Alt-F2) and looking for the md   module (lsmod) shows nothing.
>   Insmod the
>   md also doesn't do anything. Catting    /proc/devices shows no md as a
>   character
>   or block device.
>
>   If md is really not there anymore, it should be removed from anaconda,
>   just like ...
>
>   b) ext3.   There is no ext3 available for the install.
>
>   Also discovered how incredibly fragile anaconda is. In order to
>   install, you
>   have to wipe the disks. It will not install if there is an md
>   (software raid)
>   device, choosing instead to crap out after you have entered in all the
>   information. To say that this is annoying is a slight understatement.
>    This is
>   an anaconda issue, not a ROCKS issue, though as a result of this
>   issue, ROCKS is
>    less functional than it could be.

we'll look into the above two issues.

> I also noted that there is no xfs option.      This means that I will need
> to hack
> new kernels later on after the install.

just curious, is xfs offered as an option on other redhat supported
products?

also (and i'm assuming this will be no consolation to you, but it may
be to others), building a new kernel RPM is straightforward in rocks:

http://www.rocksclusters.org/rocks-documentation/3.1.0/customization-
kernel.html

    - gb



From landman at scalableinformatics.com Mon Dec 29 19:44:16 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 29 Dec 2003 22:44:16 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
References: <20031229183225.M11961@scalableinformatics.com>
       <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
Message-ID: <1072755856.4432.15.camel@protein.scalableinformatics.com>

On Mon, 2003-12-29 at 22:03, Greg Bruno wrote:
> > Pulled the distro. Burned it after checking md5's.        Ok.
> > Booted/installed test
> > cluster, completely vanilla, just defaults.
>
> i'm assuming this is an x86 installation, yes?

Yes.

>
> > SSH is too slow. Wow. 5-10 seconds to log in.
>
> that is not the case on our clusters. in fact, we tested this on all
> three architectures and all three are 'fast'.

Two different clusters exhibited the same results. I fixed one by
applying dnsmasq to it.
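Slow logins like this are often reverse-DNS lookups timing out, which
would also explain why dnsmasq helped. A possible workaround (untested
here; note the option name depends on the OpenSSH version, with older
releases calling it VerifyReverseMapping instead of UseDNS) is to turn
the lookup off in sshd on the nodes:

```
# /etc/ssh/sshd_config -- then restart sshd
UseDNS no
```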

>
>   >   Ok, now out at a customer site with the disks.
>   >
>   >   Unhappily discovered that the following are missing:
>   >
>   >   a) md (e.g. Software RAID): Just try    to build one.   Anaconda will
>   >   happily let
>   >   you do this ... though it will die in   the formatting stages.   Dropping
>   >   into the
>   >   shell (Alt-F2) and looking for the md   module (lsmod) shows nothing.
>   >   Insmod the
>   >   md also doesn't do anything. Catting    /proc/devices shows no md as a
>   >   character
>   >   or block device.
>   >
>   >   If md is really not there anymore, it should be removed from anaconda,
>   >   just like ...
>   >
>   >   b) ext3.   There is no ext3 available for the install.
>   >
>   >   Also discovered how incredibly fragile anaconda is. In order to
>   >   install, you
>   >   have to wipe the disks. It will not install if there is an md
>   >   (software raid)
>   >   device, choosing instead to crap out after you have entered in all the
>   >   information. To say that this is annoying is a slight understatement.
>   >    This is
>   >   an anaconda issue, not a ROCKS issue, though as a result of this
>   >   issue, ROCKS is
>   >    less functional than it could be.
>
>   we'll look into the above two issues.

Thanks

>
>   > I also noted that there is no xfs option.      This means that I will need
>   > to hack
>   > new kernels later on after the install.
>
>   just curious, is xfs offered as an option on other redhat supported
>   products?

Nope, nor is Redhat likely to do this in the near/mid term. This is
fairly common knowledge. All the other major distros do offer XFS.
I hope that the defense of the current state isn't that "Redhat doesn't
support it". I might have misunderstood you, but Redhat is almost
completely uninterested in clusters, so whether Redhat supports it is
really not relevant.

Curiously, cAos, which is doing some of the same things ROCKS does in
terms of recompiling packages sans Redhat trademarks, has XFS and a
number of other useful things in there.

Regardless, having ext2 or vfat as your only fs options simply is not
reasonable, as neither of these is really appropriate for very large
disks or big file systems.

>
>   also (and i'm assuming this will be no consolation to you, but it may
>   be to others), building a new kernel RPM is straightforward in rocks:
>
>   http://www.rocksclusters.org/rocks-documentation/3.1.0/customization-
>   kernel.html

I had been planning to use a similar approach to this. I was/am simply
quite surprised that the two options for ROCKS file systems are really
not very good, and the good choices are unavailable. In all fairness
this is more likely a constraint of anaconda than of ROCKS.

I fixed the ext2/ext3 by a reboot after a quick tune2fs session and some
fixup of the /etc/fstab.
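The tune2fs fix can be sketched roughly as follows (this is an outline
of the idea, not the exact commands used; the device name and fstab
line are placeholders):

```shell
#!/bin/sh
# Step 1 (comment only -- it must run against a real, ideally unmounted,
# ext2 partition): add a journal, which turns ext2 into ext3.
#   tune2fs -j /dev/hda2
# Step 2: update the filesystem type field in /etc/fstab so the next
# boot mounts it as ext3. Shown here on a sample line:
fstab_line='/dev/hda2  /  ext2  defaults  1 1'
echo "$fstab_line" | sed 's/ext2/ext3/'
```

After the reboot, mount should then report the filesystem as ext3.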

I have to say that I get less and less impressed with anaconda as time
goes on.

I fixed the partitioning problem (anaconda dies when it runs in an md'ed
set of partitions) by wiping the disk and using knoppix to fdisk the
disks. Autopartitioning is not an option, as the default choices are
not all that good (another anaconda-ism).




>
>    - gb



From cdwan at mail.ahc.umn.edu Mon Dec 29 20:58:20 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Mon, 29 Dec 2003 22:58:20 -0600 (CST)
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <1072755856.4432.15.camel@protein.scalableinformatics.com>
References: <20031229183225.M11961@scalableinformatics.com>
 <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
 <1072755856.4432.15.camel@protein.scalableinformatics.com>
Message-ID: <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>

I also encountered the software RAID problem today. It made upgrading an
existing ROCKS cluster a little tricky.

Another behavior I noticed was that the CDs were not ejecting as the node
installs finished. It was manageable, but required watching to prevent the
endless reinstall cycle.

-Chris Dwan


From bruno at rocksclusters.org Mon Dec 29 21:48:22 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 21:48:22 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
References: <20031229183225.M11961@scalableinformatics.com>
<BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
<1072755856.4432.15.camel@protein.scalableinformatics.com>
<Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
Message-ID: <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>

>   Another behavior I noticed was that the CDs were not ejecting as the
>   node
>   installs finished. It was manageable, but required watching to prevent
>   the
>   endless reinstall cycle.

actually, it isn't a problem as the last CD in the frontend will be a
roll and rolls are not bootable.

    - gb



From cdwan at mail.ahc.umn.edu Mon Dec 29 21:51:13 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Mon, 29 Dec 2003 23:51:13 -0600 (CST)
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
References: <20031229183225.M11961@scalableinformatics.com>
 <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
 <1072755856.4432.15.camel@protein.scalableinformatics.com>
 <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
 <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
Message-ID: <Pine.GSO.4.58.0312292350001.4644@lenti.med.umn.edu>

>   >   Another behavior I noticed was that the CDs were not ejecting as the
>   >   node
>   >   installs finished. It was manageable, but required watching to prevent
>   >   the
>   >   endless reinstall cycle.
>
>   actually, it isn't a problem as the last CD in the frontend will be a
>   roll and rolls are not bootable.

You're right about the frontend. It was the compute nodes where it gave
me trouble. Roll disks never go in those.

-Chris Dwan


From landman at scalableinformatics.com Mon Dec 29 22:03:06 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 01:03:06 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
References: <20031229183225.M11961@scalableinformatics.com>
       <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
       <1072755856.4432.15.camel@protein.scalableinformatics.com>
       <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
       <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
Message-ID: <1072764186.4469.16.camel@protein.scalableinformatics.com>

What I had noticed is that some CD hardware does not eject when
prompted to swap in the roll. I swapped hardware and that fixed it.
Rather odd. I've seen this in 3 different systems; it worked OK with
previous ROCKS.

Is it possible to do something like a

     frontend askmethod

akin to the "linux askmethod", and specifically have the ISOs online in
a directory somewhere? Just curious... I find it interesting that 10
years after swapping floppies for OS installs, I am now swapping CDs...
There is irony here somewhere.

On Tue, 2003-12-30 at 00:48, Greg Bruno wrote:
> > Another behavior I noticed was that the CDs were not ejecting as the
> > node
> > installs finished. It was manageable, but required watching to prevent
> > the
> > endless reinstall cycle.
>
> actually, it isn't a problem as the last CD in the frontend will be a
> roll and rolls are not bootable.
>
>   - gb



From bruno at rocksclusters.org Mon Dec 29 22:28:45 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 22:28:45 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <1072764186.4469.16.camel@protein.scalableinformatics.com>
References: <20031229183225.M11961@scalableinformatics.com>
<BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
<1072755856.4432.15.camel@protein.scalableinformatics.com>
<Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
<C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
<1072764186.4469.16.camel@protein.scalableinformatics.com>
Message-ID: <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org>

> Is it possible to do something like a
>
>     frontend askmethod
>
> akin to the "linux askmethod" and specifically have the ISO's online in
> a directory somewhere? Just curious...

the ability to install frontends remotely is at the top of our priority
list for the next release.
> I find it interesting that 10
> years after swapping floppies for OS installs, I am now swapping CDs...
> There is irony here somewhere.

sorry, i'm going to have to evangelize rolls a bit.

joe, do you not have just a bit of appreciation for rolls and what is
going on under the sheets? we now have a formal way for you, that's
right you, to augment the installation of a cluster. you get to
programmatically interact with the installer at virtually any level.
you get to tell the installer what bits you want it to lay down and how
to configure them. and this is done completely independently of the
core. the core has no idea of your bits, yet, it installs it and
configures it to your specification.

for you, this could be having the 'scalable informatics' roll that
contains all your RPMS and XML configuration files. this ISO image
could be completely proprietary, yet, the installer installs it. you
could ship your roll worldwide and every one of your customers would,
within 2 hours, have a scalable informatics cluster online running the
applications you sold them. and, you know it would be running because
you embedded the correct configuration into the roll.

or, perhaps rolls work so smoothly, it just looks like CD swapping. :-)

    - gb



From landman at scalableinformatics.com Mon Dec 29 22:50:30 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 01:50:30 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org>
References: <20031229183225.M11961@scalableinformatics.com>
       <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
       <1072755856.4432.15.camel@protein.scalableinformatics.com>
       <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
       <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
       <1072764186.4469.16.camel@protein.scalableinformatics.com>
       <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org>
Message-ID: <1072767030.4463.57.camel@protein.scalableinformatics.com>

On Tue, 2003-12-30 at 01:28, Greg Bruno wrote:

>   > There is irony here somewhere.
>
>   sorry, i'm going to have to evangelize rolls a bit.
>
>   joe, do you not have just a bit of appreciation for rolls and what is
>   going on under the sheets? we now have a formal way for you, that's
>   right you, to augment the installation of a cluster. you get to
>   programmatically interact with the installer at virtually any level.
>   you get to tell the installer what bits you want it to lay down and how
>   to configure them. and this is done completely independently of the
>   core. the core has no idea of your bits, yet, it installs it and
>   configures it to your specification.
Actually I do have a pretty good appreciation for them. I see that they
are a different way of solving the problems I have been solving for a
while using "other methods"
(http://scalableinformatics.com/downloads/finishing/finishing-v3.1.0.tar.gz).   What
I don't see is how to build them (yes, I did see the "source" messages, and
"cvs", ...).

The major issue for me is going to be anaconda, all its joy and bugs,
and what directions its use forces ROCKS to follow (vis-a-vis file
systems, etc).

>
>   for you, this could be having the 'scalable informatics' roll that
>   contains all your RPMS and XML configuration files. this ISO image
>   could be completely proprietary, yet, the installer installs it. you
>   could ship your roll worldwide and every one of your customers would,
>   within 2 hours, have a scalable informatics cluster online running the
>   applications you sold them. and, you know it would be running because
>   you embedded the correct configuration into the roll.

This is a nice vision, though it is unfortunately only a vision. The
customer would have to re-install the cluster head node when a new
version of the bits comes out, right? This is simply not tenable for a
production-cycle facility that needs to upgrade a package. Please let
me know if my understanding is incorrect; I would be quite happy to
hear this.

The "other method" that I developed doesn't have this problem: just
re-install the compute nodes and load the RPM on the head nodes.
In fact I built some tools which simplify both the "other method" and
the ROCKS method. As I have to worry about multiple different cluster
distros (not just ROCKS; sorry, customers get what they need/want), I
have to worry about interfacing with each distro. So I have some tools
(the auto-build scripts) which simplify adding and removing packages in
extend-compute.xml.

What I am hoping for from rolls are two things: 1) insertable/removable
from a live cluster without forcing a re-install of the head node
(compute nodes, that's fine, not the head nodes); 2) simple
documentation on how to build them. If they are really quite simple, I
see no reason I could not make the same tool I use to automate the
building of installable RPMS for the other method emit a ROCKS roll as
well. But I need to know how to do this. I am not sure I have
sufficient time to "read the source, Luke" for this. I would be happy
to do it given time and customer demand/need. The other method had
that, hence its development.

>
> or, perhaps rolls work so smoothly, it just looks like CD swapping. :-)

My point was that after inserting the SGE roll, I had to get up from the
console, walk over to the unit, swap in the next roll, iterate....

Felt like CD swapping to me.

Rolls won't solve other problems which are anaconda-specific (file
systems, partitioning, formatting, RAID, network detection, etc.). As
there are multiple similar RHEL de-redhatifying efforts, some of which
are drastically improving the installation process (by not using
anaconda), are you folks looking to move away from anaconda any time
soon?

>
>    - gb
--



From bruno at rocksclusters.org Mon Dec 29 23:45:52 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 29 Dec 2003 23:45:52 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <1072767030.4463.57.camel@protein.scalableinformatics.com>
References: <20031229183225.M11961@scalableinformatics.com>
<BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
<1072755856.4432.15.camel@protein.scalableinformatics.com>
<Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
<C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
<1072764186.4469.16.camel@protein.scalableinformatics.com>
<6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org>
<1072767030.4463.57.camel@protein.scalableinformatics.com>
Message-ID: <31636F66-3A9C-11D8-9E96-000A95C4E3B4@rocksclusters.org>

>   This is a nice vision, though it is unfortunately a vision. The
>   customer would have to re-install the cluster head node when a new
>   version of the bits comes out. Right? This is simply not tenable for a
>   production cycle facility that needs to upgrade a package. Please let
>   me know if my understanding is incorrect, I would be quite happy to
>   hear
>   this.

we've talked about this on the list and we've talked with you about
this in person. you know the above statement is true. you also know it
is a future direction for rolls.

>   What I am hoping for rolls are two things: 1) insertable/removable from
>   a live cluster without forcing a re-install of the head node (compute
>   nodes, thats fine, not the head nodes) 2) simple documentation on how
>   to
>   build. If they are really quite simple, I see no reason I could not
>   take the same tool I use to automate the building of installable RPMS
>   for the other method actually emit a ROCKS roll. But I need to know
>   how
>   to do this. I am not sure I have sufficient time to "read the source,
>   Luke" for this. I would be happy to do this given time, and customer
>   demand/need. The other method had that, hence its development.

a roll developer's guide is in progress. and, as stated above, adding
rolls to a live frontend is on our roadmap.

> Rolls wont solve other problems which are anaconda specific (file
> systems, partitioning, formatting, RAID, network detection, etc).

not true. if you wish to get deeply involved with the red hat
installer, you can develop a 'patch' roll that will change the
installer to do as you wish.

>   As
> there are multiple similar RHEL de-redhatifying efforts, some of which
> are drastically improving the installation process (by not using
> anaconda), are you folks looking to move away from anaconda any time
> soon?

please educate us -- where can we download these installers and find
the developer guides that describe how to interact with the installer.

as for moving away from anaconda, i don't think that will happen
anytime soon. anaconda has served us well. we have all had issues with
the installer, but i would rather work with anaconda rather than
reinvent it. the boys and girls at redhat have a vested interest in
detecting and configuring the latest hardware and i plan on leveraging
that.

of the issues you mention above, the only one we don't know how to
control yet is file system selection (but, we will look into it per
your earlier request). we already manipulate anaconda to partition and
format the drives to our specifications, and we have ideas on how to
handle RAID and network naming (which is what i think you mean by
network detection).

 - gb



From landman at scalableinformatics.com Tue Dec 30 00:55:37 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 03:55:37 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <31636F66-3A9C-11D8-9E96-000A95C4E3B4@rocksclusters.org>
References: <20031229183225.M11961@scalableinformatics.com>
       <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org>
       <1072755856.4432.15.camel@protein.scalableinformatics.com>
       <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu>
       <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org>
       <1072764186.4469.16.camel@protein.scalableinformatics.com>
       <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org>
       <1072767030.4463.57.camel@protein.scalableinformatics.com>
       <31636F66-3A9C-11D8-9E96-000A95C4E3B4@rocksclusters.org>
Message-ID: <1072774537.4463.131.camel@protein.scalableinformatics.com>

On Tue, 2003-12-30 at 02:45, Greg Bruno wrote:
> > This is a nice vision, though it is unfortunately a vision. The
> > customer would have to re-install the cluster head node when a new
> > version of the bits comes out. Right? This is simply not tenable for a
> > production cycle facility that needs to upgrade a package. Please let
> > me know if my understanding is incorrect, I would be quite happy to
> > hear
> > this.
>
> we've talked about this on the list and we've talked with you about
> this in person. you know the above statement is true. you also know it
> is a future direction for rolls.

I was simply responding to the evangelism, which seemed to imply the
functionality existed today. It doesn't, and we both agree that it is
necessary. Although the vision will provide innumerable benefits,
ROCKS is not there yet, and won't be for a while.
That's OK though, as I have a reasonable workaround for some of these
issues. And when I can insert and delete rolls live on a cluster,
I'll modify my tools to emit rolls. Until then it is, as you said, a
vision for the future.

[...]

> a roll developer's guide is in progress. and, as stated above, adding
> rolls to a live frontend is on our roadmap.

Adding and removing are needed as we have discussed.

>
>   > Rolls wont solve other problems which are anaconda specific (file
>   > systems, partitioning, formatting, RAID, network detection, etc).
>
>   not true. if you wish to get deeply involved with the red hat
>   installer, you can develop a 'patch' roll that will change the
>   installer to do as you wish.

I guess I am at a loss to understand what it is you are doing, then. If
you are telling me I can hack around anaconda to my heart's content, why
do you tell me later on that ROCKS is deeply wedded to anaconda and will
not change soon? I will assume I am missing something here. Can I
replace anaconda? This is what I think you are saying. If you are
instead saying no, don't replace it, just hack it, I am not sure I want
to do that. It is a very large and complex beast, with one system doing
the job of many. Jack of all trades.

More than half of the pain I have experienced deploying ROCKS is
directly attributable to anaconda. I would like to work around it. If
I can completely replace it under ROCKS this could be of interest. If I
cannot, and ROCKS will always remain closely tied to RedHat specific
technology (e.g. anaconda), that is also important to know.

>
>   >     As
>   >   there are multiple similar RHEL de-redhatifying efforts, some of which
>   >   are drastically improving the installation process (by not using
>   >   anaconda), are you folks looking to move away from anaconda any time
>   >   soon?
>
>   please educate us -- where can we download these installers and find
>   the developer guides that describe how to interact with the installer.

If you are serious about this, I would be happy to help you find more
development info and help make introductions to some of the people doing
this stuff. If you are not serious about this, that's fine too.

>   as for moving away from anaconda, i don't think that will happen
>   anytime soon. anaconda has served us well. we have all had issues with
>   the installer, but i would rather work with anaconda rather than
>   reinvent it. the boys and girls at redhat have a vested interest in
>   detecting and configuring the latest hardware and i plan on leveraging
>   that.

Knoppix makes good use of the anaconda detection routines without using
anaconda. You do not need anaconda in its entirety for the detection
routines.
While Redhat has a vested interest in making sure it detects hardware
well, the software that does its installation has been getting more and
more fragile compared to other installation systems. Simple failures of
one item or another in the SUSE YaST tool, the Mandrake installer, or
for that matter most of the non-anaconda-based installers do not force
you to start over from the beginning. Stack traces are not given, and
you are not asked to debug an arcane and complex python program from a
highly limited command window. You are brought back to a well-known and
well-defined state, and you have a finite and nonzero chance of
recovering from the failure. This is different from the anaconda
experience, where the slightest hiccup, which would be trivially
correctable given the opportunity, results in a complete failure of the
process.

This is how we discovered the RH9/RHEL fragility and sensitivity around
(and inability to handle) software RAID, partitioning, and related
tasks. It has wasted many hours of our collective time and made the
upgrade option unusable for those of us with software RAID systems.

As ROCKS depends critically upon this bit of technology that you
indicate later on is so important, ROCKS happens to share in its
pitfalls, even though these are not ROCKS problems. I am not sure you
understand how much time I have to spend explaining to customers and
end users why what they are seeing are not ROCKS problems but Redhat
artifacts. Part of the reason I am raising this issue in this forum is
that I have spent altogether too much time trying to explain this to
various users.

>   of the issues you mention above, the only one we don't know how to
>   control yet is file system selection (but, we will look into it per
>   your earlier request). we already manipulate anaconda to partition and
>   format the drives to our specifications, and we have ideas on how to
>   handle RAID and network naming (which is what i think you mean by
>   network detection).

Network detection is

a) getting the right network driver config
      1) by detection
      2) from floppy/usb/whatever

b) getting the correct network interface ordering (what you call naming)

The point you (somewhat whimsically) made was that I could create
Scalable Informatics rolls and ship them around the world for people to
use in 2 hours. Great. Good vision, and that is something like what I
am looking at. I have that now with my tools, but I can always expand
their functionality.

Now the problem is, if after shipping out my roll, when my end users
install it, anaconda barfs in some new and exciting manner (has happened
already with the finishing scripts, and I have worked hard to try to
figure out what is broken in anaconda to work around its bugs), who are
the customers going to blame?

My experience thus far is that ROCKS is taking more than its fair share
of heat over bugs that it has nothing to do with.


From fds at sdsc.edu Tue Dec 30 05:53:48 2003
From: fds at sdsc.edu (fds at sdsc.edu)
Date: Tue, 30 Dec 2003 05:53:48 -0800 (PST)
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com>
References: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com>
Message-ID: <1291.194.125.171.53.1072792428.squirrel@uhura.sdsc.edu>

Code in the <post> section of an xml file (extend-compute or otherwise)
can be almost anything. When the script is run, the environment is not as
full as usual, which is why we always recommend specifying the full path
to commands. As you saw, /bin and /usr/bin are in the path, so certain
things like "which sed" will work, for example.

Remember that everything in the eval tags gets run at kickstart-generation
time (on the frontend). Everything else (the naked commands in the post
section) is run by the node being installed.
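To make the timing concrete, here is a minimal sketch in the style of
extend-compute.xml (the paths and commands are illustrative only):

```
<post>
/bin/mkdir -p /scratch/tmp
/bin/chmod 1777 /scratch/tmp

        <eval sh="bash">
                <!-- runs on the frontend at kickstart-generation
                time; typically a query whose result helps build a
                configuration file -->
                hostname
        </eval>
</post>
```

The naked mkdir/chmod lines run on the node being installed; only the
eval body runs on the frontend.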

We do intend for the heart of the customization to be performed at
kickstart time. I would be surprised if you had to postpone many tasks
until the node was up, although this does happen occasionally. The
globus and condor post-configuration contains tasks that cannot be done
at install time.

Send us the scripts in question and we will take a look.

-Federico

>   Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that
>   work?
>   I need to know the limitations of the distribution. As far as I can tell
>   the commands are available (`which command`locates the commands fine) but
>   they don't necessarily perform the job as expected. I had seen the
>   `eval...` clairification in the archives.
>
>   As it stands I plan to mkdir, ln and echo in the extend-c... but then run
>   the heart of the customization (scripted) once the nodes are up. It just
>   doesn't seem to be what was intended.
>
>   As always, thanks for your help
>   --Reed
>



From purikk at hotmail.com Tue Dec 30 06:03:02 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 30 Dec 2003 09:03:02 -0500
Subject: [Rocks-Discuss]Licensing
References: <200312300711.hBU7BeJ14002@postal.sdsc.edu>
Message-ID: <BAY1-DAV14HJL2WZcXm0000fc27@hotmail.com>

Hi All,
          I would like to know the list of the components that have to
be licensed when we install ROCKS as a commercial solution.
Thanks
Happy Holidays
Puru


From doug at seismo.berkeley.edu Tue Dec 30 10:53:36 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 10:53:36 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <200312301853.hBUIragp015469@perry.geo.berkeley.edu>

I am having a problem upgrading Rocks 2.3.2 to 3.1.0.
Both my head node and compute nodes are dual XEON 2.4 GHz boxes.

We burned the CDs from the following images:
      rocks-base-3.1.0.i386.iso
      roll-hpc-3.1.0-0.i386.iso
      roll-grid-3.1.0-0.any.iso
      roll-intel-3.1.0-0.any.iso
      roll-sge-3.1.0-0.any.iso
I verified the md5s both on the downloaded images from the rocks
web site and the md5s on the burned cds. They are fine.
I have run the upgrade several times -- at least once with all of the
rolls, and once with just the rocks base and hpc roll.
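The checksum step can be scripted so every image is checked the same
way; a small sketch (a throwaway file stands in for an ISO here so the
commands run anywhere; in real use the file would be the .iso image or
the CD device):

```shell
#!/bin/sh
# Print the md5 checksum of the image, to be compared by eye (or with
# md5sum -c) against the published value.
printf 'pretend iso contents' > /tmp/iso-check-demo
md5sum /tmp/iso-check-demo | awk '{print $1}'
```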

The head node installs with no problem using the command
      frontend upgrade
I can login and run insert-ethers, telling it to look for compute nodes.

When I power on a compute node, it boots grub, selects the only
kernel on its local disk
      Rocks Reinstall
and runs through the /sbin/loader.
The blue screen comes up, the compute node requests and receives a
dynamic IP address from the head node, but then within a few seconds
aborts with the messages:
      install exited abnormally - received signal 11
      sending termination signals ... done
      sending kill signals ... done
      disabling swap ...
      unmounting filesystems ...
      /proc/bus/usb done
      /proc done
      /dev/pts done
      You may safely reboot your system

It appears that the "Rocks Reinstall" kernel on the disk is not compatible
with Rocks 3.1.0. When I changed the compute node boot order to perform
a PXE boot before the hard disks, it properly downloads the 3.1.0 kernel
from the head node, reformats the disk, and installs 3.1.0 properly.
I have to catch it in the reboot, and change the boot order to use the
disk before PXE, or I get into an infinite loop.

Is there any better way to address this problem? The procedure of:
      set PXE boot first
      boot from net, install rocks 3.1.0 on disk
      reboot
      catch node during reboot, change boot order to floppy,disk,net
      reboot
for each node is tedious.

Did I do something wrong in how I shut my 2.3.2 cluster down before the
upgrade? If so, some notes about this in the install instructions would
be useful.

- Doug N

------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax:   510-643-5811           Berkeley, CA 94720-4760



From bruno at rocksclusters.org Tue Dec 30 11:29:14 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 11:29:14 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <200312301853.hBUIragp015469@perry.geo.berkeley.edu>
References: <200312301853.hBUIragp015469@perry.geo.berkeley.edu>
Message-ID: <73E3933E-3AFE-11D8-9E96-000A95C4E3B4@rocksclusters.org>

On Dec 30, 2003, at 10:53 AM, Doug Neuhauser wrote:

>   I am having a problem upgrading Rocks 2.3.2 to 3.1.0.
>   Both my head node and compute nodes are dual XEON 2.4 GHz boxes.
>
>   We burned the CDs from the following images:
>       rocks-base-3.1.0.i386.iso
>       roll-hpc-3.1.0-0.i386.iso
>       roll-grid-3.1.0-0.any.iso
>       roll-intel-3.1.0-0.any.iso
>       roll-sge-3.1.0-0.any.iso
>   I verified the md5s both on the downloaded images from the rocks
>   web site and the md5s on the burned cds. They are fine.
>   I have run the upgrade several times -- at least once with all of the
>   rolls, and once with nust the rocks base and hpc roll.
>
>   I head node installs with no problem using the command
>       frontend upgrade
>   I can login and run insert-ethers, telling it to look for compute
>   nodes.
>
>   When I power on a compute node, it boots grub, selects the only
>   kernel on its local disk
>       Rocks Reinstall
>   and runs through the /sbin/loader.
>   The blue screen comes up, the compute node requests and receives a
>   dynamic IP address from the head node, but then within a few seconds
>   aborts with the messages:
>       install exited abnormally - received signal 11
>       sending termination signals ... done
>       sending kill signals ... done
>       disabling swap ...
>       unmounting filesystems ...
>       /proc/bus/usb done
>      /proc done
>      /dev/pts done
>      You may safely reboot your system
>
>   It appears the the "Rocks Reinstall" kernel on the disk is not
>   compatible
>   with Rocks 3.1.0. When I changed the compute node boot order to
>   perform
>   a PXE boot before the hard disks, it properly downloads the 3.1.0
>   kernel
>   from the head node, reformats the disk, and installes 3.1.0 properly.
>   I have to catch it in the reboot, and change the boot order to use the
>   disk before PXE, or I get into an infinite loop.
>
>   Is there any better way to address this problem? The procedure of:
>       set PXE boot first
>       boot from net, install rocks 3.1.0 on disk
>       reboot
>       catch node during reboot, change boot order to floppy,disk,net
>       reboot
>   for each node is tedious.
>
>   Did I do something wrong in how I shut my 2.3.2 cluster down before the
>   upgrade? If so, some notes about this in the install instructions
>   would
>   be useful.

you're right, the 2.3.2 installer (anaconda from redhat's version 7.3) is
not compatible with the installer on rocks 3.1 (anaconda from redhat's
enterprise linux 3.0).

there are two ways to reinstall your cluster:

1) if your compute nodes support PXE boot triggered from the keyboard
-- that is, when you boot the node, the BIOS shows a message like
"Press F12 for Network Boot (PXE)". if your nodes have that, then
you'll have to boot the nodes one by one and, when you see the
message, press the F12 key, then move to the next node.

2) use the rocks base CD to boot each compute node. when insert-ethers
reports that it discovered the node, take the CD out and put it in the
next compute node.


but, if your compute nodes were initially installed with PXE, the
fastest way to upgrade the compute nodes is to simply turn all the
compute nodes off, upgrade the frontend, run insert-ethers, then turn
the compute nodes on one by one. the compute nodes should be set for
PXE boot, which will pull the updated installer from the frontend.

as you state above, we need to document this.

thanks for the bug report.

    - gb
From doug at seismo.berkeley.edu Tue Dec 30 11:45:59 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 11:45:59 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>

Greg,

1.   I don't have cdroms on my compute nodes, only floppy. :(
2.   My boot order on the compute nodes is normally:
       floppy, disk, PXE
3.   I don't have a hot-key override to force PXE boot.
     I have to change the BIOS boot order to enable PXE boot.

>   but, if your compute nodes were initially installed with PXE, the
>   fastest way to upgrade the compute nodes is to simply turn all the
>   compute nodes off, upgrade the frontend, run insert-ethers, then turn
>   the compute nodes on one by one. the compute nodes should be set for
>   PXE boot which will pull the installer from the frontend and therefore
>   be updated installer.

I don't understand this.

I can't leave the compute nodes with PXE boot first, or it will create an
endless loop. The compute node will boot via PXE, install rocks 3.1.0,
and then reboot via PXE and repeat the process ad-nauseum.

Can I use the old floppy boot image found at:
      ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img
to force a network boot?

The 3.1.0 online manual has a link in the section
      1.3 Install your Compute Nodes
to    ftp://www.rocksclusters.org/pub/rocks/bootnet.img
but this does not exist.

- Doug N
------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax:   510-643-5811           Berkeley, CA 94720-4760



From junkscarce at hotmail.com Tue Dec 30 11:57:16 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Tue, 30 Dec 2003 19:57:16 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <BAY1-F39DSuCcN0o41B0005872e@hotmail.com>

I tested your echo ... wait and ln wait ... S11wait lines. They worked
perfectly. Then I tried the same with gpm and left wait in the script.
Wait worked as before, and gpm didn't work - like before. I've given up on
doing anything very fancy and have started to make a script to run the first
time it boots, with hand removal.

Thanks for the perspective,
--Reed
>From: Dave Lane <dlane at ap.stmarys.ca>
>To: "Reed Scarce" <junkscarce at hotmail.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
>Date: Mon, 29 Dec 2003 19:44:23 -0400
>
>At 11:15 PM 12/29/2003 +0000, Reed Scarce wrote:
>>Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that
>>work?
>
>Reed,
>
>Below is a script that worked fine for me (with 2.3.2). What it does should
>be fairly explanatory...Dave
>
>--->>>
>
><post>
>         <!-- Insert your post installation script here. This
>         code will be executed on the destination node after the
>         packages have been installed. Typically configuration files
>         are built and services setup in this section. -->
>
>mv /usr/local /usr/local-old
>ln -s /home/local /usr/local
>ln -s /home/opt/intel /opt/intel
>ln -s /home/disc15 /disc15
>mkdir /scratch/tmp
>chmod 1777 /scratch/tmp
>echo '#!/bin/bash' > /etc/init.d/wait
>echo 'sleep 60' >> /etc/init.d/wait
>chmod +x /etc/init.d/wait
>ln -s /etc/init.d/wait /etc/rc3.d/S11wait
>ln -s /etc/init.d/wait /etc/rc4.d/S11wait
>ln -s /etc/init.d/wait /etc/rc5.d/S11wait
>
>         <eval sh="python">
>                 <!-- This is python code that will be executed on
>                 the frontend node during kickstart generation. You
>                 may contact the database, make network queries, etc.
>                 These sections are generally used to help build
>                 more complex configuration files.
>                 The 'sh' attribute may point to any language interpreter
>                 such as "bash", "perl", "ruby", etc.
>                 -->
>         </eval>
></post>
>




From landman at scalableinformatics.com Tue Dec 30 12:01:44 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 15:01:44 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>
References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>
Message-ID: <1072814503.4469.196.camel@protein.scalableinformatics.com>

Hi Doug:

  As long as pxe is in there, you should be able to do this
(semi)-automatically. All you need to do is to wipe the partition
tables and boot sectors of the compute nodes. I seem to remember a
really simple single-floppy tool that did this.

  See http://paud.sourceforge.net/
and http://dban.sourceforge.net/

  I think dban is the right one. After that (only on compute nodes) you
should be able to pxe boot.

Joe

On Tue, 2003-12-30 at 14:45, Doug Neuhauser wrote:
> Greg,
>
> 1. I don't have cdroms on my compute nodes, only floppy. :(
> 2. My boot order on the compute nodes is normally:
>     floppy, disk, PXE
> 3. I don't have a hot-key override to force PXE boot.
>      I have to change the BIOS boot order to enable PXE boot.
>
> > but, if your compute nodes were initially installed with PXE, the
> > fastest way to upgrade the compute nodes is to simply turn all the
> > compute nodes off, upgrade the frontend, run insert-ethers, then turn
> > the compute nodes on one by one. the compute nodes should be set for
> > PXE boot which will pull the installer from the frontend and therefore
> > be updated installer.
>
> I don't understand this.
>
> I can't leave the compute nodes with PXE boot first, or it will create an
> endless loop. The compute node will boot via PXE, install rocks 3.1.0,
> and then reboot via PXE and repeat the process ad-nauseum.
>
> Can I use the old floppy boot image found at:
>     ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img
> to force a network boot?
>
> The 3.1.0 online manual has a link in the section
>     1.3 Install your Compute Nodes
> to ftp://www.rocksclusters.org/pub/rocks/bootnet.img
> but this does not exist.
>
> - Doug N
> ------------------------------------------------------------------------
> Doug Neuhauser               University of California, Berkeley
> doug at seismo.berkeley.edu        Berkeley Seismological Laboratory
> Phone: 510-642-0931          215 McCone Hall # 4760
> Fax:    510-643-5811         Berkeley, CA 94720-4760
From bruno at rocksclusters.org Tue Dec 30 12:07:34 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 12:07:34 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>
References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>
Message-ID: <CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org>

On Dec 30, 2003, at 11:45 AM, Doug Neuhauser wrote:

> Greg,
>
> 1. I don't have cdroms on my compute nodes, only floppy. :(
> 2. My boot order on the compute nodes is normally:
>     floppy, disk, PXE
> 3. I don't have a hot-key override to force PXE boot.
>     I have to change the BIOS boot order to enable PXE boot.
>
>> but, if your compute nodes were initially installed with PXE, the
>> fastest way to upgrade the compute nodes is to simply turn all the
>> compute nodes off, upgrade the frontend, run insert-ethers, then turn
>> the compute nodes on one by one. the compute nodes should be set for
>> PXE boot which will pull the installer from the frontend and therefore
>> be updated installer.
>
> I don't understand this.

i'll try to give a better explanation.

when compute nodes are installed via PXE, rocks detects this and
manipulates the boot sector of the disk drive on the compute node in a
way that makes the disk non-bootable. that way, if the compute node is
reset, it will try to PXE boot. it will PXE boot even if your boot
order is: hard disk, cd/floppy, PXE. this occurs because the hard disk
is non-bootable, so the BIOS boot loader will skip the hard disk and
move on to the other boot devices.
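[Editor's sketch of the mechanism described above. A BIOS only treats a disk as bootable if its MBR ends with the 0x55 0xAA boot signature; clearing those two bytes makes the BIOS skip the disk and fall through to the next boot device, such as PXE. This is demonstrated on a scratch image file, not a real disk, and the exact manipulation Rocks performs may differ in detail.]

```shell
# Demonstrate the "non-bootable disk" trick on a scratch image file.
img=/tmp/mbr-demo.img

# build a fake 512-byte MBR carrying the 0x55 0xAA signature at offset 510
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null

# "un-boot" the disk: zero just the two signature bytes
dd if=/dev/zero of="$img" bs=1 seek=510 count=2 conv=notrunc 2>/dev/null

# inspect the signature bytes; both are now zero, so a BIOS would
# skip this disk and try the next boot device
od -An -tx1 -j 510 -N 2 "$img"
```

Tim Carlson's `dd if=/dev/zero of=/dev/hda bs=1k count=512` suggestion later in this thread takes the blunter route of wiping the first 512 KB of the real disk outright.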

>   I can't leave the compute nodes with PXE boot first, or it will create
>   an
>   endless loop. The compute node will boot via PXE, install rocks 3.1.0,
>   and then reboot via PXE and repeat the process ad-nauseum.
>
>   Can I use the old floppy boot image found at:
>       ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img
>   to force a network boot?
>
>   The 3.1.0 online manual has a link in the section
>       1.3 Install your Compute Nodes
>   to ftp://www.rocksclusters.org/pub/rocks/bootnet.img
>   but this does not exist.

we are no longer supporting the boot floppy as it was problematic to
make one that contained the appropriate device drivers that worked on
most compute nodes.

    - gb
From doug at seismo.berkeley.edu Tue Dec 30 12:28:46 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 12:28:46 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <200312302028.hBUKSkgp017318@perry.geo.berkeley.edu>

Greg,

Thanks for the detailed boot/reboot explanation. My problem dates
back to my initial rocks 2.3.2 installation. My compute node
motherboards have 3 ethernet interfaces (1 100Mb, 2 1Gb), but initially
only the 100Mb interface supported PXE. When I used that for PXE boot,
Linux would then remap the interfaces so that it tried to use one of the
Gbit interfaces on the next reboot. Needless to say, the head node did
not respond to DHCP because the MAC address was unknown to it.

My solution was to get a new BIOS from Tyan that supported PXE on
all interfaces. However, since my cluster was initially installed using
the boot floppy, my compute nodes have the vestiges of floppy boot config,
not PXE boot config.

I'll try Joe Landman's suggestion of a scrub floppy to scrub the boot
sector of the boot disk on the compute nodes. If I can't do that, I
CAN go through the manual process of setting and resetting the boot
order on each compute node, but it is a slow and sequential process.

- Doug N
------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax:   510-643-5811           Berkeley, CA 94720-4760



From sjenks at uci.edu Tue Dec 30 12:37:26 2003
From: sjenks at uci.edu (Stephen Jenks)
Date: Tue, 30 Dec 2003 12:37:26 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org>
References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>
<CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org>
Message-ID: <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu>

On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote:
> when compute nodes are installed via PXE, rocks detects this and
> manipulates the boot sector of the disk drive on the compute node that
> makes the disk non-bootable. that way, if the compute node is reset,
> it will try to PXE boot. it will PXE boot even if your boot order is:
> hard disk, cd/floppy, PXE. this occurs because the hard disk is
> non-bootable so the BIOS boot loader will skip the hard disk and move
> on to the other boot devices.

Hi Greg, et al.

Is there any way to force this behavior even if I initially used a CD
to install the compute nodes? My nodes are capable of PXE boot, but
since I didn't use that, I presume they didn't do the non-bootable disk
trick upon install. Now that I'm clear about how the PXE install works,
I'd prefer to move to that, but don't really want to have to corrupt
the disks to cause the PXE install.

The nodes are currently loaded with 3.0, so perhaps that will work with
3.1's kickstart, but I'm curious about the PXE issue.

Thanks,

Steve Jenks



From bruno at rocksclusters.org Tue Dec 30 12:48:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 12:48:08 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu>
References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>
<CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org> <FAA5CF63-3B07-11D8-
BF62-000A95B96C68@uci.edu>
Message-ID: <796DAC35-3B09-11D8-9E96-000A95C4E3B4@rocksclusters.org>

On Dec 30, 2003, at 12:37 PM, Stephen Jenks wrote:

>
> On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote:
>> when compute nodes are installed via PXE, rocks detects this and
>> manipulates the boot sector of the disk drive on the compute node
>> that makes the disk non-bootable. that way, if the compute node is
>> reset, it will try to PXE boot. it will PXE boot even if your boot
>> order is: hard disk, cd/floppy, PXE. this occurs because the hard
>> disk is non-bootable so the BIOS boot loader will skip the hard disk
>> and move on to the other boot devices.
>
> Hi Greg, et al.
>
> Is there any way to force this behavior even if I initially used a CD
> to install the compute nodes? My nodes are capable of PXE boot, but
> since I didn't use that, I presume they didn't do the non-bootable
> disk trick upon install. Now that I'm clear about how the PXE install
> works, I'd prefer to move to that, but don't really want to have to
> corrupt the disks to cause the PXE install.
>
> The nodes are currently loaded with 3.0, so perhaps that will work
> with 3.1's kickstart, but I'm curious about the PXE issue.

3.0 is based on redhat 7.3 and 3.1 is based on redhat enterprise linux
3.0 -- so you'll hit a similar problem as doug did when you perform an
upgrade.

give me a bit of time to cook up a procedure for forcing your compute
nodes to PXE boot.

  - gb
From cdwan at mail.ahc.umn.edu Tue Dec 30 14:22:18 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Tue, 30 Dec 2003 16:22:18 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
Message-ID: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>

Is there a preferred way to have the 411 server on the head node replicate
information (passwd and auto.whatever) from an external NIS server to the
compute nodes? It seems to me that a cron job like the one below does the
trick, but it feels crufty to me:

  ypcat passwd > yp.passwd
  cat /etc/passwd yp.passwd > 411.passwd
  # build the 411 distributed passwd from the file above instead of
  # /etc/passwd.

I'd love to hear suggestions for a more elegant solution.

-Chris Dwan
 The University of Minnesota




From bruno at rocksclusters.org Tue Dec 30 15:16:36 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 15:16:36 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu>
References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu>
<CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org> <FAA5CF63-3B07-11D8-
BF62-000A95B96C68@uci.edu>
Message-ID: <3737B584-3B1E-11D8-9E96-000A95C4E3B4@rocksclusters.org>

On Dec 30, 2003, at 12:37 PM, Stephen Jenks wrote:

>
> On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote:
>> when compute nodes are installed via PXE, rocks detects this and
>> manipulates the boot sector of the disk drive on the compute node
>> that makes the disk non-bootable. that way, if the compute node is
>> reset, it will try to PXE boot. it will PXE boot even if your boot
>> order is: hard disk, cd/floppy, PXE. this occurs because the hard
>> disk is non-bootable so the BIOS boot loader will skip the hard disk
>> and move on to the other boot devices.
>
> Hi Greg, et al.
>
> Is there any way to force this behavior even if I initially used a CD
> to install the compute nodes? My nodes are capable of PXE boot, but
> since I didn't use that, I presume they didn't do the non-bootable
> disk trick upon install. Now that I'm clear about how the PXE install
> works, I'd prefer to move to that, but don't really want to have to
> corrupt the disks to cause the PXE install.
>
> The nodes are currently loaded with 3.0, so perhaps that will work
> with 3.1's kickstart, but I'm curious about the PXE issue.

here's a procedure to ensure that your non-3.1.0 compute nodes PXE
install after a frontend upgrade.

this assumes your compute nodes support PXE installs.

before you upgrade the frontend, login to the frontend and execute:

     # ssh-agent $SHELL
     # ssh-add

     # cluster-fork 'touch /boot/grub/pxe-install'

     # cluster-fork '/boot/kickstart/cluster-kickstart --start'

     # cluster-fork '/sbin/chkconfig --del rocks-grub'

now you can shutdown your compute nodes.


then upgrade your frontend.

after you login to your new frontend, run insert-ethers, then reset
each compute node, one at a time.


doug, you'll have a bit harder time.

if you can find a bootable floppy, after the compute node boots, you
can chroot to the root partition on the disk and run the three
commands above (without the cluster-fork wrapper).

i apologize for making this procedure tough on you.

  - gb



From mjk at sdsc.edu Tue Dec 30 15:32:20 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 30 Dec 2003 15:32:20 -0800
Subject: [Rocks-Discuss]Licensing
In-Reply-To: <BAY1-DAV14HJL2WZcXm0000fc27@hotmail.com>
References: <200312300711.hBU7BeJ14002@postal.sdsc.edu> <BAY1-
DAV14HJL2WZcXm0000fc27@hotmail.com>
Message-ID: <69879D2D-3B20-11D8-98D0-000A95DA5638@sdsc.edu>

Nothing!

Rocks is entirely open source with various GNU, BSD, Artistic, etc. open
source licenses attached. The underlying RedHat OS (as of Rocks 3.1.0
-- available now) is recompiled from RedHat's publicly available SRPMS.
You are of course welcome to send us money and hardware to help further
the cause. Several vendors do in fact do this, and it helps us
support them.

     -mjk

On Dec 30, 2003, at 6:03 AM, Purushotham Komaravolu wrote:

> Hi All,
>            I would like to know the list of   the components that have
>   to be
>   licensed, when we install ROCKS as a commercial solution.
>   Thanks
>   Happy Holidays
>   Puru



From mjk at sdsc.edu Tue Dec 30 15:35:39 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 30 Dec 2003 15:35:39 -0800
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
Message-ID: <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu>

As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in
some ways this job is simpler now, although no one has done this yet.
A simple ypcat like you have will do most of the right thing, and 411
will pick up the changes and send them around the cluster. But you
need to figure out how to merge the cluster information with the
external NIS information. This will include things like the IP addresses
for the cluster compute nodes.

       -mjk

On Dec 30, 2003, at 2:22 PM, Chris Dwan (CCGB) wrote:

>
>   Is there a preferred way to have the 411 server on the head node
>   replicate
>   information (passwd and auto.whatever) from an external NIS server to
>   the
>   compute nodes? It seems to me that a cron job like the one below does
>   the
>   trick, but it feels crufty to me:
>
>    ypcat passwd > yp.passwd;
>    cat /etc/passwd yp.passwd > 411.passwd
>    ** build the 411 distributed passwd from the file above instead of
>    ** /etc/passwd.
>
>   I'd love to hear suggestions for a more elegant solution.
>
>   -Chris Dwan
>    The University of Minnesota
>



From mitchskin at comcast.net Tue Dec 30 17:13:44 2003
From: mitchskin at comcast.net (Mitchell Skinner)
Date: Tue, 30 Dec 2003 17:13:44 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <200312302028.hBUKSkgp017318@perry.geo.berkeley.edu>
References: <200312302028.hBUKSkgp017318@perry.geo.berkeley.edu>
Message-ID: <1072833146.8645.1114.camel@zeitgeist>
On Tue, 2003-12-30 at 12:28, Doug Neuhauser wrote:
> I'll try Joe Landman's suggestion of a scrub floppy to scrub the boot
> sector of the boot disk on the compute nodes. If I can't do that, I
> CAN go through the manual process of setting and resetting the boot
> order on each compute node, but it is a slow and sequential process.

Something I'm going to try and implement at our site is support for the
pxelinux 'localboot' option. If the hard drives have a valid boot
sector, I can leave the BIOS set to PXE boot before the hard drive, and
by changing the pxelinux configuration on the head node, I can set a
particular node to boot from the network or from the local disk. In
other words, when a node PXE boots, it might get either the kickstart
instructions or the 'boot from hard drive' instructions.
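[Editor's sketch of what such a per-node pxelinux configuration could look like. This follows standard pxelinux conventions; the file paths, labels, and kernel/initrd names are illustrative, not something Rocks ships.]

```
# /tftpboot/pxelinux.cfg/<node-specific name> -- per-node pxelinux config.
# To reinstall a node, change its default to "install"; to let it boot
# from its own disk, leave the default at "localdisk".
default localdisk

label localdisk
    # hand control back to the local disk / next BIOS boot device
    localboot 0

label install
    kernel vmlinuz
    append initrd=initrd.img ks ksdevice=eth0
```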

That will take some fiddling, I think, because the head node then has to
maintain some more state for all of the compute nodes. I really want to
avoid going through the BIOS setup on all my nodes more than once,
though.

Is this something that the ROCKS mainline would be interested in?

Mitch



From doug at seismo.berkeley.edu Tue Dec 30 17:51:49 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 17:51:49 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
Message-ID: <200312310151.hBV1pngp026060@perry.geo.berkeley.edu>

My solution to force PXE boot is outlined below.

1.   Boot dban floppy (floppy image at http://dban.sourceforge.net/ ).

2.   Run "quick" purge of disks on system (I only have 1 disk on compute nodes).
     I let the disk purge get far enough into the disk to overwrite the boot
     sectors and filesystem -- I didn't wait for it to completely erase the
     entire disk.

3.   Reset the system, and CYCLE POWER on the compute node.

     NOTE: If you don't cycle power, the BIOS sees the disk, but reports that
     it has a fatal error reading from it. This caused the following problems:
     a. PXE boot worked, but the Rocks install also did not see the disk.
       It asked whether I wanted to manually configure the disk, but
       the configuration failed immediately regardless of whether I
       answered yes or no. The Rocks developers may want to look into
       this bug.
     b. By the time that I figured out that I needed to cycle power,
       the BIOS had already removed the disk from the boot order.
       My boot order was now:
             floppy, PXE, disk
       Rocks installed properly once, twice, .... until I reset the boot
       order to:
             floppy, disk, PXE.

4.   Compute node will now perform PXE boot, install Rocks 3.1.0, and
     subsequent "controlled reboots" will boot from disk. If the node
     is powered down or reset with the reset button, no boot block is
     left on disk, and the system will perform PXE boot and reinstall Rocks.

------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax:   510-643-5811           Berkeley, CA 94720-4760



From tim.carlson at pnl.gov Tue Dec 30 19:17:11 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 30 Dec 2003 19:17:11 -0800 (PST)
Subject: [Rocks-Discuss]Rocks 3.1.0 install problems
In-Reply-To: <200312310151.hBV1pngp026060@perry.geo.berkeley.edu>
Message-ID: <Pine.GSO.4.44.0312301914310.23660-100000@poincare.emsl.pnl.gov>

On Tue, 30 Dec 2003, Doug Neuhauser wrote:

> 2.   Run "quick" purge of disks on system (I only have 1 disk on compute nodes).
>      I let the disk purge get far enough into the disk to overwrite the boot
>      sectors and filesystem -- I didn't wait for it to completely erase the
>      entire disk.

Here is something that is a bit quicker

cluster-fork dd if=/dev/zero of=/dev/hda bs=1k count=512

Then either power cycle or

cluster-fork reboot

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support



From cdwan at mail.ahc.umn.edu Tue Dec 30 19:44:11 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Tue, 30 Dec 2003 21:44:11 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
 <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu>
Message-ID: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>

>   As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in
>   some ways this job is simpler now, although no one has done this yet.
>   A simple ypcat like you have will do most of the right thing and 411
>   will pick up the changed and send them around the cluster. But, you
>   need to figure out how to merge the cluster information with the
>   external NIS information. This will include things like the IP address
>   for the cluster compute nodes.
The shuffling below would work, I think, but it still gives me the
willies to be mucking with the passwd file every hour:

 mv /etc/passwd /etc/passwd.local
 ypcat passwd > /etc/passwd.nis
 cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
 service 411 commit
 cp /etc/passwd.local /etc/passwd

Am I missing the simple way? I seem to have an affinity for finding the
maximally complex way to do things...

-Chris Dwan
 The University of Minnesota


From mjk at sdsc.edu Tue Dec 30 19:58:43 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 30 Dec 2003 19:58:43 -0800
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
<E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu>
<Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
Message-ID: <A04191F8-3B45-11D8-98D0-000A95DA5638@sdsc.edu>

This sounds reasonable, but you still have a chance of conflicting UIDs
in your password file. If you only issue accounts from your LAN NIS
server then you should be fine. I'd suggest adding the accounts
created by Rocks into your server (just look at the initial passwd
file). The SGE roll creates an SGE user; others may also exist.

You can also try setting up your frontend as an NIS client of your
external server, with the same UID issues above.

The bad news is we don't have a canned answer, and need someone to give
us one. The good news is with 411 in place only the frontend need be
changed and the compute nodes will still function as stock Rocks.

     -mjk


On Dec 30, 2003, at 7:44 PM, Chris Dwan (CCGB) wrote:

>
>> As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in
>> some ways this job is simpler now, although no one has done this yet.
>> A simple ypcat like you have will do most of the right thing and 411
>> will pick up the changes and send them around the cluster. But, you
>> need to figure out how to merge the cluster information with the
>> external NIS information. This will include things like the IP
>> address
>> for the cluster compute nodes.
>
> The shuffling below would work, I think, but it still gives me the
> willies to be mucking with the passwd file every hour:
>
>   mv /etc/passwd /etc/passwd.local
>   ypcat passwd > /etc/passwd.nis
>   cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
>   service 411 commit
>   cp /etc/passwd.local /etc/passwd
>
>   Am I missing the simple way? I seem to have an affinity for finding
>   the maximally complex way to do things...
>
>   -Chris Dwan
>    The University of Minnesota



From csamuel at vpac.org Tue Dec 30 19:59:51 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 31 Dec 2003 14:59:51 +1100
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
<E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu>
<Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
Message-ID: <200312311459.54054.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 31 Dec 2003 02:44 pm, Chris Dwan (CCGB) wrote:

>    mv /etc/passwd /etc/passwd.local
>    ypcat passwd > /etc/passwd.nis
>    cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
>    service 411 commit
>    cp /etc/passwd.local /etc/passwd

Hmm, how about:

ypcat passwd > /etc/passwd.nis
cat /etc/passwd /etc/passwd.nis > /etc/passwd.tmp
cp /etc/passwd /etc/passwd.local
mv /etc/passwd.tmp /etc/passwd
service 411 commit
mv /etc/passwd.local /etc/passwd

That should mean that you're never operating without a password file and the
overwrites should be approaching atomic (I hope).
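The atomic-swap idea can be made concrete with a small sketch that runs on scratch files (mktemp) instead of the real /etc/passwd, so it is safe to try; on a frontend the printf stand-ins would be the real local file and `ypcat passwd`:

```shell
# Build the merged file off to the side, then swap it into place with a
# single rename (atomic within one filesystem), so readers never see a
# half-written password file. Scratch files stand in for the real ones.
PASSWD=$(mktemp)                                    # stand-in for /etc/passwd
printf 'root:x:0:0:root:/root:/bin/bash\n' > "$PASSWD"
NISDUMP=$(mktemp)                                   # stand-in for: ypcat passwd
printf 'alice:x:1001:1001::/home/alice:/bin/bash\n' > "$NISDUMP"

MERGED=$(mktemp)
cat "$PASSWD" "$NISDUMP" > "$MERGED"
mv "$MERGED" "$PASSWD"       # atomic: old or new contents, never neither
wc -l < "$PASSWD"            # -> 2
rm -f "$PASSWD" "$NISDUMP"
```

The rename is what buys the "approaching atomic" property: `mv` within one filesystem is a single rename(2), so no reader ever observes an empty or partial file.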

Of course, it'd be nice if you could do whatever the 411 init file does on
something other than /etc/passwd :-)

Disclaimer: I have not tried this myself & don't (yet) have a 3.1 system to
test with, caveat emptor, batteries not included, IANAL, etc.

cheers!
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/8km3O2KABBYQAh8RAnpPAJ9a9oRdGXeBUBAokdX6wmwrVbgXkQCeKD0C
xh8eT6qTbZpxhu8+FHPSt90=
=lhiY
-----END PGP SIGNATURE-----



From csamuel at vpac.org Tue Dec 30 20:01:39 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 31 Dec 2003 15:01:39 +1100
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <200312311459.54054.csamuel@vpac.org>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
<Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
<200312311459.54054.csamuel@vpac.org>
Message-ID: <200312311501.43675.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 31 Dec 2003 02:59 pm, Chris Samuel wrote:

> cp /etc/passwd /etc/passwd.local

should be:

cp -p /etc/passwd /etc/passwd.local

Oh, and what happens if users overlap ? :-)

cheers,
Chris
- --
 Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/8kojO2KABBYQAh8RAmWTAJwNhpm77IclXcWLoAuhp2/B4/GsCgCfZWek
me3Lk2I7VDmRj4ygTSLSaaY=
=Pv8G
-----END PGP SIGNATURE-----



From cdwan at mail.ahc.umn.edu Tue Dec 30 20:12:34 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Tue, 30 Dec 2003 22:12:34 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <200312311459.54054.csamuel@vpac.org>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
 <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu>
 <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
<200312311459.54054.csamuel@vpac.org>
Message-ID: <Pine.GSO.4.58.0312302206370.25976@lenti.med.umn.edu>

> Of course, it'd be nice if you could do whatever the 411 init file does on
> something other than /etc/passwd :-)

That would be a really big step.   I'm deeply wary of cron jobs that
overwrite my passwd file.

The next step might be to put this functionality into 411 itself. It
would be truly cool to have an automatic, non-NIS way to make the passwd,
group, autofs, and host lookup stuff be consistent and static across the
cluster nodes.

On the other hand, I appreciate that this is probably a complex enough
system without trying to reinvent NIS but leave out the brittle server
bits. We can work around it for the time being.

-Chris Dwan


From doug at seismo.berkeley.edu Tue Dec 30 20:34:25 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 20:34:25 -0800 (PST)
Subject: [Rocks-Discuss]Mozilla / ssh DISPLAY problem with Rocks 3.1.0
Message-ID: <200312310434.hBV4YPgp028521@perry.geo.berkeley.edu>

I am having a problem using mozilla with the default Rocks monitor web page
over an ssh session to my headnode from a Sun workstation with a 24-bit
display. My workstation is Sun Blade 150 running Solaris 8, and I am
using SSH Secure Shell 3.2.5 (non-commercial version).

When I ssh to my frontend and run mozilla, I get an empty Mozilla frame.
Running mozilla with the debugging option "--g-fatal-warnings" I get:

Gdk-WARNING **: Attempt to draw a drawable with depth 24 to a drawable with
depth 8
aborting...

xwininfo shows the following window characteristics:

xwininfo: Window id: 0x9400034 "GCLCluster Cluster - Mozilla"

  Absolute upper-left X: 175
  Absolute upper-left Y: 150
  Relative upper-left X: 0
  Relative upper-left Y: 0
  Width: 1021
  Height: 738
  Depth: 8
  Visual Class: PseudoColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x22 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
Corners: +175+150 -84+150    -84-136   +175-136
  -geometry 1021x738-78+125

Is there a way to configure mozilla to use only an 8-bit drawable?

If I ssh from a workstation with an 8-bit display, mozilla starts up
OK, and creates an 8-bit window.

- Doug N
------------------------------------------------------------------------
Doug Neuhauser                University of California, Berkeley
doug at seismo.berkeley.edu   Berkeley Seismological Laboratory
Phone: 510-642-0931           215 McCone Hall # 4760
Fax:   510-643-5811           Berkeley, CA 94720-4760



From qian1129 at yahoo.com Tue Dec 30 22:47:57 2003
From: qian1129 at yahoo.com (li lee)
Date: Tue, 30 Dec 2003 22:47:57 -0800 (PST)
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
Message-ID: <20031231064757.52813.qmail@web41508.mail.yahoo.com>

Hi,

I want to install Rocks v3.1.0 on PCs, but I do not
want to use so many CDs:
      roll-grid-3.1.0-0.any.iso
      roll-intel-3.1.0-0.any.iso
      roll-sge-3.1.0-0.any.iso
        ......
So, how can I install all of these after the Rocks and HPC
installation on clusters?

Thanks

Li

__________________________________
Do you Yahoo!?
Find out what made the Top Yahoo! Searches of 2003
http://search.yahoo.com/top2003


From bruno at rocksclusters.org Tue Dec 30 23:35:28 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 23:35:28 -0800
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
In-Reply-To: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
References: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
Message-ID: <E7D709AA-3B63-11D8-9E96-000A95C4E3B4@rocksclusters.org>

> I want to install Rocks v3.1.0 on PCs, but I do not
> want to use so many CDs:
>     roll-grid-3.1.0-0.any.iso
>     roll-intel-3.1.0-0.any.iso
>     roll-sge-3.1.0-0.any.iso
>         ......
> So, how to install all these after Rocks and HPC
> installation on clusters?

for now, we do not have a systematic way in which to incorporate rolls
after the frontend is up. this is on our 'todo' list.

    - gb



From tim.carlson at pnl.gov Wed Dec 31 07:29:21 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 31 Dec 2003 07:29:21 -0800 (PST)
Subject: [Rocks-Discuss]Mozilla / ssh DISPLAY problem with Rocks 3.1.0
In-Reply-To: <200312310434.hBV4YPgp028521@perry.geo.berkeley.edu>
Message-ID: <Pine.GSO.4.44.0312310727220.9033-100000@poincare.emsl.pnl.gov>

On Tue, 30 Dec 2003, Doug Neuhauser wrote:

>
>   I am having a problem using mozilla with the default Rocks monitor web page
>   over an ssh session to my headnode from a Sun workstation with a 24-bit
>   display. My workstation is Sun Blade 150 running Solaris 8, and I am
>   using SSH Secure Shell 3.2.5 (non-commercial version).
>
>   When I ssh to my frontend and run mozilla, I get an empty Mozilla frame.
>   Running mozilla with debugging options "--g-fatal-warnings" I get:

This sounds like an X tunnel problem. I see X tunnel errors all the time
(OpenGL, colormap, etc). What happens if you just set the DISPLAY
variable back to your Sun box and do the proper xhost command on the Sun?
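Tim's workaround, spelled out with hypothetical host names (note this bypasses ssh X forwarding entirely, so the X traffic goes over the wire unencrypted):

```shell
# On the Sun workstation: allow the frontend to open windows on the
# local display (host-based access control; xauth would be safer).
xhost +frontend.example.org          # hypothetical frontend name

# Then, inside the ssh session on the frontend:
export DISPLAY=sunbox.example.org:0.0   # hypothetical workstation name
mozilla &
```

If mozilla then opens a normal 24-bit window, the problem is in the ssh X proxy's colormap/depth handling rather than in mozilla itself.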


Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support




From mjk at sdsc.edu Wed Dec 31 09:45:49 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 31 Dec 2003 09:45:49 -0800
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
In-Reply-To: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
References: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
Message-ID: <2BEBEC90-3BB9-11D8-9BE3-000A95DA5638@sdsc.edu>

For this release you need all these CDs (if you want this
functionality). Think of Rolls as add-on packs to Rocks, and remember
that software belongs on a CD (not a tarball or ftp site). CDs are
the accepted commercial way of releasing software, and they are very nice.
But we have some issues with this that we are addressing right now:

- Meta-Rolls. That is, how do you merge multiple Rolls into a single CD
image? This is actually very easy to do, and we have some early code
for this; it will be there in the next release. For IA64 we merge the
HPC Roll onto the base DVD, so we have a proof of concept here.

- Rolls cannot be added after a cluster is installed, and must be used
during installation.

- Rolls cannot be uninstalled.

Rolls are maturing pretty quickly, and we know where they need to go.

          -mjk

On Dec 30, 2003, at 10:47 PM, li lee wrote:

>   Hi,
>
>   I want to install Rocks v3.1.0 on PCs, but I do not
>   want to use so many CDs:
>       roll-grid-3.1.0-0.any.iso
>       roll-intel-3.1.0-0.any.iso
>       roll-sge-3.1.0-0.any.iso
>           ......
>   So, how to install all these after Rocks and HPC
>   installation on clusters?
>
>   Thanks
>
>   Li
>
>   __________________________________
>   Do you Yahoo!?
>   Find out what made the Top Yahoo! Searches of 2003
>   http://search.yahoo.com/top2003



From michal at harddata.com Wed Dec 31 10:05:26 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 31 Dec 2003 11:05:26 -0700
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>; from
cdwan@mail.ahc.umn.edu on Tue, Dec 30, 2003 at 09:44:11PM -0600
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
<E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu>
<Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
Message-ID: <20031231110526.B11252@mail.harddata.com>

On Tue, Dec 30, 2003 at 09:44:11PM -0600, Chris Dwan (CCGB) wrote:
>
>
> The shuffling below would work, I think, but it still gives me the
> willies to be mucking with the passwd file every hour:
>
>   mv /etc/passwd /etc/passwd.local
>   ypcat passwd > /etc/passwd.nis
>   cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
>   service 411 commit
>   cp /etc/passwd.local /etc/passwd
>
> Am I missing the simple way?
    cp -p /etc/passwd /etc/passwd.local
    ypcat passwd >> /etc/passwd
    service 411 commit
    mv /etc/passwd.local /etc/passwd

unless 'service 411' can be told to use another file. That way you
minimize the time gap when /etc/passwd is not in its normal state, you
make sure that the file attributes on /etc/passwd will remain intact,
and you are not left with extra files.

You can also play with (symbolic) links but I am not sure if every
possible /etc/passwd reader will indeed follow a link.

   Michal


From michal at harddata.com Wed Dec 31 10:16:18 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 31 Dec 2003 11:16:18 -0700
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <200312311501.43675.csamuel@vpac.org>; from csamuel@vpac.org on Wed,
Dec 31, 2003 at 03:01:39PM +1100
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
<Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
<200312311459.54054.csamuel@vpac.org> <200312311501.43675.csamuel@vpac.org>
Message-ID: <20031231111618.C11252@mail.harddata.com>

On Wed, Dec 31, 2003 at 03:01:39PM +1100, Chris Samuel wrote:
> should be:
>
> cp -p /etc/passwd /etc/passwd.local
>
> Oh, and what happens if users overlap ? :-)


'sort -u' over the relevant fields after replacing ':'s with blanks? But
this is getting a tad more involved, and an "automatic
conflict resolution" still may screw up. A bit of coordination
between whoever maintains NIS and the local user data, like reserving
some names and uid ranges for one or the other, is likely more
effective in practice.
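As a sketch of what such automatic resolution could look like (scratch files, made-up account names), an awk one-liner that keeps only the first entry seen per UID, so local accounts win any conflict:

```shell
# Merge local + NIS password entries, keeping the FIRST line seen for
# each UID (field 3). Listing the local file first means local accounts
# win a UID clash. All names below are invented for illustration.
LOCAL=$(mktemp)
printf 'sge:x:400:400:SGE:/opt/gridengine:/bin/bash\n' > "$LOCAL"
NIS=$(mktemp)
printf 'bob:x:400:400::/home/bob:/bin/bash\n' > "$NIS"     # UID clash with sge
printf 'alice:x:1001:1001::/home/alice:/bin/bash\n' >> "$NIS"

cat "$LOCAL" "$NIS" | awk -F: '!seen[$3]++'
# -> the sge and alice lines; the conflicting NIS "bob" entry is dropped
rm -f "$LOCAL" "$NIS"
```

This is exactly the kind of silent dropping that can "screw up", which is why reserving UID ranges up front is the more robust policy.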

   Michal


From bruno at rocksclusters.org Wed Dec 31 10:42:21 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 31 Dec 2003 10:42:21 -0800
Subject: [Rocks-Discuss]Roll Documentation posted on the web site
Message-ID: <117308FA-3BC1-11D8-9E96-000A95C4E3B4@rocksclusters.org>

just posted documentation for some of the rolls on the web site -- see
the left-hand side of the web page:

http://www.rocksclusters.org/Rocks/


and here are the links to the roll documentation:
HPC Roll: http://www.rocksclusters.org/rocks-documentation/3.1.0/

SGE Roll: http://www.rocksclusters.org/roll-documentation/sge/3.1.0/

Grid Roll: http://www.rocksclusters.org/roll-documentation/grid/3.1.0/

Intel Roll: http://www.rocksclusters.org/roll-documentation/intel/3.1.0/


as a side note, for every one of the rolls you install above, the
documentation will be available on your frontend at:

     http://localhost/roll-documentation/

  - gb



From cdwan at mail.ahc.umn.edu Wed Dec 31 11:07:37 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Wed, 31 Dec 2003 13:07:37 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <20031231111618.C11252@mail.harddata.com>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu>
 <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
<200312311459.54054.csamuel@vpac.org>
 <200312311501.43675.csamuel@vpac.org> <20031231111618.C11252@mail.harddata.com>
Message-ID: <Pine.GSO.4.58.0312311239310.3992@lenti.med.umn.edu>

> this is getting a tad more involved and an "automatic
> conflict resolution" still may screw up.

I agree with this assessment. The key is to keep the local passwd file as
small as possible, and remove redundant accounts on the frontend node.
Since it consists mostly of non-login accounts, this shouldn't be
too difficult...and it's a one-time task anyway.

I've settled on the hourly cron job below. I'll report any weirdness as
appropriate. Thanks for all the suggestions and discussion.

#!/bin/sh
ypcat auto.master   >   /etc/auto.master
ypcat auto.home     >   /etc/auto.home
ypcat auto.net      >   /etc/auto.net
ypcat auto.web      >   /etc/auto.web

ypcat passwd      > /etc/passwd.nis
cat   /etc/passwd.local /etc/passwd.nis > /etc/passwd.combined
cp    /etc/passwd.combined /etc/passwd

ypcat group       > /etc/group.nis
cat   /etc/group.local /etc/group.nis > /etc/group.combined
cp    /etc/group.combined /etc/group

-Chris Dwan
 The University of Minnesota

From maz at tempestcomputers.com Wed Dec 31 11:37:09 2003
From: maz at tempestcomputers.com (John Mazza)
Date: Wed, 31 Dec 2003 14:37:09 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0 with Adaptec I2O RAID
Message-ID: <200312311937.hBVJb9J25828@postal.sdsc.edu>

Does anyone know of a way to make the 3.1.0 (x86-64) version
work with an Adaptec 2100S SCSI RAID card? My master node
needs to use this card, but it doesn't appear to be in the
kernel on the CD. Also, does it support the SysKonnect
SK-9821 (Ver 2.0) Gig cards?

Thanks!




From tim.carlson at pnl.gov Wed Dec 31 12:49:25 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 31 Dec 2003 12:49:25 -0800 (PST)
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <20031229183225.M11961@scalableinformatics.com>
Message-ID: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>

On Mon, 29 Dec 2003, landman wrote:

> SSH is too slow.    Wow.   5-10 seconds to log in.

Just getting around to this. I did a clean install on our test cluster
(Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
user, a "cluster-fork date" command on 4 nodes took under 0.6 seconds.

Sounds like you have some type of DNS issue. Did you get a bad
/etc/resolv.conf file on the nodes for some reason?

>   a) md (e.g. Software RAID): Just try to build one. Anaconda will
>   happily let you do this ... though it will die in the formatting stages.
>   Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
>   shows nothing. Insmod the md also doesn't do anything. Catting
>   /proc/devices shows no md as a character or block device.

The odd bit here is that you can do a

modprobe raid0

on a running frontend and it gets installed but there is no associated
"md" module. Was "md" built directly into the kernel? very odd.

>b) ext3.   There is no ext3 available for the install.

This is a bit annoying. Nobody really uses ext2 anymore, do they? :) Not
having ext3 as an install option isn't a show stopper for me since I can
do a tune2fs after the fact. But ext3 should be there.
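The tune2fs route can be exercised safely on a scratch image file instead of a real partition (requires e2fsprogs; on a live system you would run `tune2fs -j` on the actual device and change the filesystem type in /etc/fstab from ext2 to ext3):

```shell
# Create a small ext2 filesystem inside a regular file, then add a
# journal with tune2fs -j, turning it into ext3. No root or mount needed.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=1M count=32 2>/dev/null
mke2fs -F -q "$IMG"                      # plain ext2
tune2fs -j "$IMG" >/dev/null             # add a journal -> ext3
dumpe2fs -h "$IMG" 2>/dev/null | grep 'features'
# the "Filesystem features" line should now include has_journal
rm -f "$IMG"
```

The same `tune2fs -j` works on a mounted filesystem too (it creates the journal file `.journal`), which is why the conversion "after the fact" is painless.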

Having version 2.0.8 of the myrinet drivers up and running is a big + in
my book. SGE 5.3p5 is also nice to see.

It will be some time before I upgrade any production clusters given the
differences between Rh 7.3 and WS 3.0. Too big of a jump for me right now.
We first need to convert a couple hundred desktop boxes :)

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support




From James_ODell at Brown.edu Wed Dec 31 13:09:25 2003
From: James_ODell at Brown.edu (James O'Dell)
Date: Wed, 31 Dec 2003 16:09:25 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
Message-ID: <9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu>

For whatever it's worth, MPICH works MUCH better when run over rsh than
ssh. It seems as if ssh doesn't pass along signals nearly as well as
rsh. Since enabling rsh and configuring MPICH to use it, we have had no
zombie jobs on our compute nodes. When using ssh they were a common
occurrence. In fact, if you look at the MPICH implementation for
Myrinet, you'll see the contortions that they use to try and clean up
compute nodes when using ssh.

Jim

On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote:

> On Mon, 29 Dec 2003, landman wrote:
>
>> SSH is too slow. Wow. 5-10 seconds to log in.
>
> Just getting around to this. I did a clean install on our test cluster
> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
> user, a "cluster-fork date" command on 4 nodes took under .6 seconds
>
> Sounds like you have some type of DNS issue. Did you get a bad
> /etc/resolv.conf file on the nodes for some reason?
>
>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>> happily let you do this ... though it will die in the formatting
>> stages.
>> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
>> shows nothing. Insmod the md also doesn't do anything. Catting
>> /proc/devices shows no md as a character or block device.
>
> The odd bit here is that you can do a
>
> modprobe raid0
>
> on a running frontend and it gets installed but there is no associated
> "md" module. Was "md" built directly into the kernel? very odd.
>
>> b) ext3. There is no ext3 available for the install.
>
> This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not
> having ext3 as an install option isn't a show stopper for me since I
> can
> do a tune2fs after the fact. But ext3 should be there.
>
> Having version 2.0.8 of the myrinet drivers up and running is a big +
> in
> my book. SGE 5.3p5 is also nice to see.
>
> It will be some time before I upgrade any production clusters given the
> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right
> now.
> We first need to convert a couple hundred desktop boxes :)
>
> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support
>



From landman at scalableinformatics.com Wed Dec 31 14:46:22 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 31 Dec 2003 17:46:22 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
Message-ID: <1072910782.4470.268.camel@protein.scalableinformatics.com>

On Wed, 2003-12-31 at 15:49, Tim Carlson wrote:
> On Mon, 29 Dec 2003, landman wrote:
>
> > SSH is too slow. Wow. 5-10 seconds to log in.
>
> Just getting around to this. I did a clean install on our test cluster
> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
> user, a "cluster-fork date" command on 4 nodes took under .6 seconds

Yeah, some weirdness in DNS. Re-load on one cluster head took care of
it, on the other applying dnsmasq helped.

>
>   Sounds like you have some type of DNS issue. Did you get a bad
>   /etc/resolv.conf file on the nodes for some reason?
>
>   >   a) md (e.g. Software RAID): Just try to build one. Anaconda will
>   >   happily let you do this ... though it will die in the formatting stages.
>   >   Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
>   >   shows nothing. Insmod the md also doesn't do anything. Catting
>   >   /proc/devices shows no md as a character or block device.
>
>   The odd bit here is that you can do a
>
>   modprobe raid0
>
> on a running frontend and it gets installed but there is no associated
> "md" module. Was "md" built directly into the kernel? very odd.

True, but I wanted to do a RAID 1. I tried "insmod raid1" but it
didn't work; from what I can see the module was not in the build. This
is ok, as some of it can be done later.

>
>   >b) ext3.   There is no ext3 available for the install.
>
>   This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not
>   having ext3 as an install option isn't a show stopper for me since I can
>   do a tune2fs after the fact. But ext3 should be there.

That's what I did. I'll post a quick set of instructions for this a
little later.

>
> Having version 2.0.8 of the myrinet drivers up and running is a big + in
> my book. SGE 5.3p5 is also nice to see.

I agree, though I would like to see people do a

      cluster-fork "/etc/init.d/rcsge stop"
      cluster-fork "chown -R root:root /opt/gridengine/bin /opt/gridengine/utilbin"
      cluster-fork "/etc/init.d/rcsge start"

to fix the compute node SGE permissions. Some of the utils don't work
otherwise.

>
> It will be some time before I upgrade any production clusters given the
> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right now.
> We first need to convert a couple hundred desktop boxes :)

:)

>
>   Tim Carlson
>   Voice: (509) 376 3423
>   Email: Tim.Carlson at pnl.gov
>   EMSL UNIX System Support
>



From landman at scalableinformatics.com Wed Dec 31 14:48:08 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 31 Dec 2003 17:48:08 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu>
References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
       <9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu>
Message-ID: <1072910887.4464.271.camel@protein.scalableinformatics.com>

Hi James:

    Did you rebuild MPICH for this?   I noticed the signal handling bit
using mpiBLAST.   Lots of zombies to deal with.

Joe

On Wed, 2003-12-31 at 16:09, James O'Dell wrote:
> For whatever it's worth, MPICH works MUCH better when run over rsh than
> ssh. It seems as if ssh doesn't pass along
> signals nearly as well as rsh. Since enabling rsh and configuring MPICH
> to use it, we have had no Zombie jobs
> on our compute nodes. When using SSH they were a common occurrence. In
> fact, if you look at the MPICH implementation for myrinet, you'll see
> the contortions that they use to try and clean up compute nodes when
> using ssh.
>
> Jim
>
> On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote:
>
> > On Mon, 29 Dec 2003, landman wrote:
> >
> >> SSH is too slow. Wow. 5-10 seconds to log in.
> >
> > Just getting around to this. I did a clean install on our test cluster
> > (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
> > user, a "cluster-fork date" command on 4 nodes took under .6 seconds
> >
> > Sounds like you have some type of DNS issue. Did you get a bad
> > /etc/resolv.conf file on the nodes for some reason?
> >
> >> a) md (e.g. Software RAID): Just try to build one. Anaconda will
> >> happily let you do this ... though it will die in the formatting
> >> stages.
> >> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
> >> shows nothing. Insmod the md also doesn't do anything. Catting
> >> /proc/devices shows no md as a character or block device.
> >
> > The odd bit here is that you can do a
> >
> > modprobe raid0
> >
> > on a running frontend and it gets installed but there is no associated
> > "md" module. Was "md" built directly into the kernel? very odd.
> >
> >> b) ext3. There is no ext3 available for the install.
> >
> > This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not
> > having ext3 as an install option isn't a show stopper for me since I
> > can
> > do a tune2fs after the fact. But ext3 should be there.
> >
> > Having version 2.0.8 of the myrinet drivers up and running is a big +
> > in
> > my book. SGE 5.3p5 is also nice to see.
> >
> > It will be some time before I upgrade any production clusters given the
> > differences between Rh 7.3 and WS 3.0. Too big of a jump for me right
> > now.
> > We first need to convert a couple hundred desktop boxes :)
> >
>   >   Tim Carlson
>   >   Voice: (509) 376 3423
>   >   Email: Tim.Carlson at pnl.gov
>   >   EMSL UNIX System Support
>   >



From James_ODell at Brown.edu Wed Dec 31 15:12:59 2003
From: James_ODell at Brown.edu (James O'Dell)
Date: Wed, 31 Dec 2003 18:12:59 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <1072910887.4464.271.camel@protein.scalableinformatics.com>
References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
<9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu>
<1072910887.4464.271.camel@protein.scalableinformatics.com>
Message-ID: <DFF94A81-3BE6-11D8-9574-0030656A27CC@Brown.edu>

The cheap way to do it is to grep the bin directory and look for SSH in
the execution scripts. You can change them to RSH and MPICH will use
RSH to execute.

An alternative is to set RSHCOMMAND=rsh during a rebuild. I'm pretty
sure that this method accomplishes precisely the same thing as simply
editing the execution scripts.
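A sketch of both approaches; this follows the usual MPICH 1.2.x ch_p4 conventions, so treat the variable names as assumptions to check against your install:

```shell
# 1) Build time: MPICH's configure honors RSHCOMMAND, baking rsh into
#    the generated execution scripts (mpirun etc.).
RSHCOMMAND=rsh ./configure    # then: make

# 2) Run time: the ch_p4 device also honors P4_RSHCOMMAND, overriding
#    whatever remote-shell command was compiled in.
export P4_RSHCOMMAND=rsh
mpirun -np 4 ./a.out
```

Either way, compute nodes need rshd enabled and the appropriate .rhosts/hosts.equiv trust, which stock Rocks does not ship turned on.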

Jim

On Dec 31, 2003, at 5:48 PM, Joe Landman wrote:

> Hi James:
>
>   Did you rebuild MPICH for this? I noticed the signal handling bit
> using mpiBLAST. Lots of zombies to deal with.
>
> Joe
>
> On Wed, 2003-12-31 at 16:09, James O'Dell wrote:
>> For whatever it's worth, MPICH works MUCH better when run over rsh than
>> ssh. It seems as if ssh doesn't pass along
>> signals nearly as well as rsh. Since enabling rsh and configuring
>> MPICH
>> to use it, we have had no Zombie jobs
>> on our compute nodes. When using SSH they were a common occurrence.
>> In
>> fact, if you look at the MPICH implementation for myrinet, you'll see
>> the contortions that they use to try and clean up compute nodes when
>> using ssh.
>>
>> Jim
>>
>> On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote:
>>
>>> On Mon, 29 Dec 2003, landman wrote:
>>>
>>>> SSH is too slow. Wow. 5-10 seconds to log in.
>>>
>>> Just getting around to this. I did a clean install on our test
>>> cluster
>>> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
>>> user, a "cluster-fork date" command on 4 nodes took under .6 seconds
>>>
>>> Sounds like you have some type of DNS issue. Did you get a bad
>>> /etc/resolv.conf file on the nodes for some reason?
>>>
>>>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>>>> happily let you do this ... though it will die in the formatting
>>>> stages.
>>>> Dropping into the shell (Alt-F2) and looking for the md module
>>>> (lsmod)
>>>> shows nothing. Insmod the md also doesn't do anything. Catting
>>>> /proc/devices shows no md as a character or block device.
>>>
>>> The odd bit here is that you can do a
>>>
>>> modprobe raid0
>>>
>>> on a running frontend and it gets installed but there is no
>>> associated
>>> "md" module. Was "md" built directly into the kernel? very odd.
>>>
>>>> b) ext3. There is no ext3 available for the install.
>>>
>>> This is a bit annoying. Nobody really uses ext2 anymore do they? :)
>>> Not
>>> having ext3 as an install option isn't a show stopper for me since I
>>> can
>>> do a tune2fs after the fact. But ext3 should be there.
>>>
>>> Having version 2.0.8 of the myrinet drivers up and running is a big +
>>> in
>>> my book. SGE 5.3p5 is also nice to see.
>>>
>>> It will be some time before I upgrade any production clusters given
>>> the
>>> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right
>>> now.
>>> We first need to convert a couple hundred desktop boxes :)
>>>
>>> Tim Carlson
>>> Voice: (509) 376 3423
>>> Email: Tim.Carlson at pnl.gov
>>> EMSL UNIX System Support
>>>



From bruno at rocksclusters.org Wed Dec 31 15:46:23 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 31 Dec 2003 15:46:23 -0800
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <1072910782.4470.268.camel@protein.scalableinformatics.com>
References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
<1072910782.4470.268.camel@protein.scalableinformatics.com>
Message-ID: <8ABA2E3A-3BEB-11D8-83CE-000A95C4E3B4@rocksclusters.org>

>> Having version 2.0.8 of the myrinet drivers up and running is a big +
>> in
>> my book. SGE 5.3p5 is also nice to see.
>
> I agree, though I would like to see people do a
>
>     cluster-fork "/etc/init.d/rcsge stop"
>     cluster-fork "chown -R root:root /opt/gridengine/bin
> /opt/gridengine/utilbin"
>     cluster-fork "/etc/init.d/rcsge start"
>
> to fix the compute node sge permissions. Some of the utils don't work
> otherwise.

so we can test the fixes, what utilities need the above changes?

 - gb



From landman at scalableinformatics.com Wed Dec 31 21:04:14 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 01 Jan 2004 00:04:14 -0500
Subject: [Rocks-Discuss]looking for a work-around
Message-ID: <1072933453.4463.293.camel@protein.scalableinformatics.com>

Ok, this one is weird. On two different clusters using the same
replace-auto-partition.xml I get two completely different behaviors. My
best guess is that this is an anaconda issue, but it could be something else.

Both systems have IDE hard disks. I made the second one (my office
system) match the other system, so the IDE hard disks are hda and hdb.
Yes, I know this is not ideal, and I know that this should be changed.
I am simply trying to match their system.

First the partitioning:

<main>
 <clearpart>--all</clearpart>
 <part> / --size 4096 --ondisk hda </part>
 <part> swap --size 1024 --ondisk hda </part>
 <part> raid.00 --size 1 --grow --ondisk hda </part>
 <part> /tmp --size 4096 --ondisk hdb </part>
 <part> swap --size 1024 --ondisk hdb </part>
 <part> raid.01 --size 1 --grow --ondisk hdb </part>
</main>

On one cluster (my office), this works perfectly.

On the other cluster, it fails with:

  An unhandled exception has occurred. This is most likely a bug. Please
  copy the full text of this exception or save the crash dump to a
  floppy, then file a detailed bug report against anaconda at
  http://bugzilla.redhat.com/bugzilla/

  Traceback (most recent call last):
    File "/usr/bin/anaconda.real", line 1081, in ?
      intf.run(id, dispatch, configFileData)
    File "/var/tmp/anaconda-9.1//usr/lib/anaconda/text.py", line 448, in run
    File "/tmp/ksclass.py", line 799, in __call__
  KeyError: swap

    [ OK ]    [ Save ]    [ Debug ]

(The dialog's box-drawing characters came through as question marks, so
I've transcribed the text.) It appears that this is a Python KeyError,
which is raised when the element being sought has not been found.
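For the curious, KeyError is Python's standard failed-lookup exception for dictionaries; a minimal, self-contained illustration (the dict below is hypothetical, not anaconda's actual data structure):

```python
# Minimal illustration of a Python KeyError (hypothetical data;
# not anaconda's internal structures).
partitions = {"/": "hda1", "/tmp": "hdb1"}  # note: no "swap" entry

try:
    device = partitions["swap"]
except KeyError as exc:
    # The missing key is carried in the exception itself.
    print("KeyError:", exc)  # prints: KeyError: 'swap'
```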

Any ideas?

Joe
--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
   web: http://scalableinformatics.com
phone: +1 734 612 4615

2003 December

  • 1.
    From angel atmiami.edu Mon Dec 1 10:25:34 2003 From: angel at miami.edu (Angel Li) Date: Mon, 01 Dec 2003 13:25:34 -0500 Subject: [Rocks-Discuss]cluster-fork Message-ID: <3FCB879E.8050905@miami.edu> Hi, I recently installed Rocks 3.0 on a Linux cluster and when I run the command "cluster-fork" I get this error: apple* cluster-fork ls Traceback (innermost last): File "/opt/rocks/sbin/cluster-fork", line 88, in ? import rocks.pssh File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? import gmon.encoder ImportError: Bad magic number in /usr/lib/python1.5/site-packages/gmon/encoder.pyc Any thoughts? I'm also wondering where to find the python sources for files in /usr/lib/python1.5/site-packages/gmon. Thanks, Angel From jghobrial at uh.edu Mon Dec 1 11:35:06 2003 From: jghobrial at uh.edu (Joseph) Date: Mon, 1 Dec 2003 13:35:06 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <3FCB879E.8050905@miami.edu> References: <3FCB879E.8050905@miami.edu> Message-ID: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> On Mon, 1 Dec 2003, Angel Li wrote: Hello Angel, I have the same problem and so far there is no response when I posted about this a month ago. Is your frontend an AMD setup?? I am thinking this is an AMD problem. Thanks, Joseph > Hi, > > I recently installed Rocks 3.0 on a Linux cluster and when I run the > command "cluster-fork" I get this error: > > apple* cluster-fork ls > Traceback (innermost last): > File "/opt/rocks/sbin/cluster-fork", line 88, in ? > import rocks.pssh > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
  • 2.
    > import gmon.encoder > ImportError: Bad magic number in > /usr/lib/python1.5/site-packages/gmon/encoder.pyc > > Any thoughts? I'm also wondering where to find the python sources for > files in /usr/lib/python1.5/site-packages/gmon. > > Thanks, > > Angel > From tim.carlson at pnl.gov Mon Dec 1 14:58:54 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Mon, 01 Dec 2003 14:58:54 -0800 (PST) Subject: [Rocks-Discuss]odd kickstart problem In-Reply-To: <76AC0F5E-2025-11D8-804D-000393A4725A@sdsc.edu> Message-ID: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov> Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get the following error in /var/log/httpd/error_log Traceback (innermost last): File "/opt/rocks/sbin/kgen", line 530, in ? app.run() File "/opt/rocks/sbin/kgen", line 497, in run doc = FromXmlStream(file) File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 386, in FromXmlStream return reader.fromStream(stream, ownerDocument) File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 372, in fromStream self.parser.parse(s) File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 58, in parse xmlreader.IncrementalParser.parse(self, source) File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line 125, in parse self.close() File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 154, in close self.feed("", isFinal = 1) File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line 148, in feed self._err_handler.fatalError(exc) File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", line 340, in fatalError raise exception xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found Doing a wget of http://frontend-0/install/kickstart.cgi? arch=i386&np=2&project=rocks on one of the working internal nodes yields the same error. Any thoughts on this?
  • 3.
    I've also donea fresh rocks-dist dist Tim From sjenks at uci.edu Mon Dec 1 15:35:54 2003 From: sjenks at uci.edu (Stephen Jenks) Date: Mon, 1 Dec 2003 15:35:54 -0800 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> Message-ID: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> FYI, I have a dual Athlon frontend and didn't have that problem. I know that doesn't exactly help you, but at least it doesn't fail on all AMD machines. It looks like the .pyc file might be corrupt in your installation. The source .py file (encoder.py) is in the /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing the .pyc file would regenerate it (if you run cluster-fork as root?) The md5sum for encoder.pyc on my system is: 459c78750fe6e065e9ed464ab23ab73d encoder.pyc So you can check if yours is different. Steve Jenks On Dec 1, 2003, at 11:35 AM, Joseph wrote: > On Mon, 1 Dec 2003, Angel Li wrote: > Hello Angel, I have the same problem and so far there is no response > when > I posted about this a month ago. > > Is your frontend an AMD setup?? > > I am thinking this is an AMD problem. > > Thanks, > Joseph > > >> Hi, >> >> I recently installed Rocks 3.0 on a Linux cluster and when I run the >> command "cluster-fork" I get this error: >> >> apple* cluster-fork ls >> Traceback (innermost last): >> File "/opt/rocks/sbin/cluster-fork", line 88, in ? >> import rocks.pssh >> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? >> import gmon.encoder >> ImportError: Bad magic number in
  • 4.
    >> /usr/lib/python1.5/site-packages/gmon/encoder.pyc >> >> Any thoughts? I'm also wondering where to find the python sources for >> files in /usr/lib/python1.5/site-packages/gmon. >> >> Thanks, >> >> Angel >> From mjk at sdsc.edu Mon Dec 1 19:03:16 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Mon, 1 Dec 2003 19:03:16 -0800 Subject: [Rocks-Discuss]odd kickstart problem In-Reply-To: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov> References: <Pine.LNX.4.44.0312011453020.22892-100000@scorpion.emsl.pnl.gov> Message-ID: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu> You'll need to run the kpp and kgen steps (what kickstart.cgi does for your) manually to find if this is an XML error. # cd /home/install/profiles/current # kpp compute This will generate a kickstart file for a compute nodes, although some information will be missing since it isn't specific to a node (not like what ./kickstart.cgi --client=node-name generates). But what this does do is traverse the XML graph and build a monolithic XML kickstart profile. If this step works you can then "|" pipe the output into kgen to convert the XML to kickstart syntax. Something in this procedure should fail and point to the error. -mjk On Dec 1, 2003, at 2:58 PM, Tim Carlson wrote: > Trying to bring up an old dead node on a Rocks 2.3.2 cluster and I get > the > following error in /var/log/httpd/error_log > > > Traceback (innermost last): > File "/opt/rocks/sbin/kgen", line 530, in ? > app.run() > File "/opt/rocks/sbin/kgen", line 497, in run > doc = FromXmlStream(file) > File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", > line > 386, in FromXmlStream > return reader.fromStream(stream, ownerDocument) > File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", > line > 372, in fromStream > self.parser.parse(s) > File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line > 58, > in parse
  • 5.
    > xmlreader.IncrementalParser.parse(self, source) > File "/usr/lib/python1.5/site-packages/xml/sax/xmlreader.py", line > 125, > in parse > self.close() > File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line > 154, in close > self.feed("", isFinal = 1) > File "/usr/lib/python1.5/site-packages/xml/sax/expatreader.py", line > 148, in feed > self._err_handler.fatalError(exc) > File "/usr/lib/python1.5/site-packages/xml/dom/ext/reader/Sax2.py", > line > 340, in fatalError > raise exception > xml.sax._exceptions.SAXParseException: <stdin>:3298:0: no element found > > > Doing a wget of > http://frontend-0/install/kickstart.cgi? > arch=i386&np=2&project=rocks > on one of the working internal nodes yields the same error. > > Any thoughts on this? > > I've also done a fresh > rocks-dist dist > > Tim From tim.carlson at pnl.gov Mon Dec 1 20:42:51 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Mon, 01 Dec 2003 20:42:51 -0800 (PST) Subject: [Rocks-Discuss]odd kickstart problem In-Reply-To: <132DD626-2474-11D8-A7A4-000A95DA5638@sdsc.edu> Message-ID: <Pine.GSO.4.44.0312012040250.3148-100000@paradox.emsl.pnl.gov> On Mon, 1 Dec 2003, Mason J. Katz wrote: > You'll need to run the kpp and kgen steps (what kickstart.cgi does for > your) manually to find if this is an XML error. > > # cd /home/install/profiles/current > # kpp compute That was the trick. This sent me down the correct path. I had uninstalled SGE on the frontend (I was having problems with SGE and wanted to start from scratch) Adding the 2 SGE XML files back to /home/install/profiles/2.3.2/nodes/ fixed everything Thanks! Tim
  • 6.
    From landman atscalableinformatics.com Tue Dec 2 04:15:07 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 02 Dec 2003 07:15:07 -0500 Subject: [Rocks-Discuss]supermicro based MB's Message-ID: <3FCC824B.5060406@scalableinformatics.com> Folks: Working on integrating a Supermicro MB based cluster. Discovered early on that all of the compute nodes have an Intel based NIC that RedHat doesn't know anything about (any version of RH). Some of the administrative nodes have other similar issues. I am seeing simply a suprising number of mis/un detected hardware across the collection of MBs. Anyone have advice on where to get modules/module source for Redhat for these things? It looks like I will need to rebuild the boot CD, though the several times I have tried this previously have failed to produce a working/bootable system. It looks like new modules need to be created/inserted into the boot process (head node and cluster nodes) kernels, as well as into the installable kernels. Has anyone done this for a Supermicro MB based system? Thanks . Joe -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From jghobrial at uh.edu Tue Dec 2 08:28:08 2003 From: jghobrial at uh.edu (Joseph) Date: Tue, 2 Dec 2003 10:28:08 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> Message-ID: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> Indeed my md5sum is different for encoder.pyc. However, when I pulled the file and run "cluster-fork" python responds about an import problem. So it seems that regeneration did not occur. Is there a flag I need to pass? I have also tried to figure out what package provides encoder and reinstall the package, but an rpm query reveals nothing. 
If this is a generated file, what generates it? It seems that an rpm file query on ganglia show that files in the directory belong to the package, but encoder.pyc does not. Thanks,
  • 7.
    Joseph On Mon, 1Dec 2003, Stephen Jenks wrote: > FYI, I have a dual Athlon frontend and didn't have that problem. I know > that doesn't exactly help you, but at least it doesn't fail on all AMD > machines. > > It looks like the .pyc file might be corrupt in your installation. The > source .py file (encoder.py) is in the > /usr/lib/python1.5/site-packages/gmon directory, so perhaps removing > the .pyc file would regenerate it (if you run cluster-fork as root?) > > The md5sum for encoder.pyc on my system is: > 459c78750fe6e065e9ed464ab23ab73d encoder.pyc > So you can check if yours is different. > > Steve Jenks > > > On Dec 1, 2003, at 11:35 AM, Joseph wrote: > > > On Mon, 1 Dec 2003, Angel Li wrote: > > Hello Angel, I have the same problem and so far there is no response > > when > > I posted about this a month ago. > > > > Is your frontend an AMD setup?? > > > > I am thinking this is an AMD problem. > > > > Thanks, > > Joseph > > > > > >> Hi, > >> > >> I recently installed Rocks 3.0 on a Linux cluster and when I run the > >> command "cluster-fork" I get this error: > >> > >> apple* cluster-fork ls > >> Traceback (innermost last): > >> File "/opt/rocks/sbin/cluster-fork", line 88, in ? > >> import rocks.pssh > >> File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? > >> import gmon.encoder > >> ImportError: Bad magic number in > >> /usr/lib/python1.5/site-packages/gmon/encoder.pyc > >> > >> Any thoughts? I'm also wondering where to find the python sources for > >> files in /usr/lib/python1.5/site-packages/gmon. > >> > >> Thanks, > >> > >> Angel > >> >
  • 8.
    From angel atmiami.edu Tue Dec 2 09:02:55 2003 From: angel at miami.edu (Angel Li) Date: Tue, 02 Dec 2003 12:02:55 -0500 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8- A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> Message-ID: <3FCCC5BF.3030903@miami.edu> Joseph wrote: >Indeed my md5sum is different for encoder.pyc. However, when I pulled the >file and run "cluster-fork" python responds about an import problem. So it >seems that regeneration did not occur. Is there a flag I need to pass? > >I have also tried to figure out what package provides encoder and >reinstall the package, but an rpm query reveals nothing. > >If this is a generated file, what generates it? > >It seems that an rpm file query on ganglia show that files in the >directory belong to the package, but encoder.pyc does not. > >Thanks, >Joseph > > > > I have finally found the python sources in the HPC rolls CD, filename ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it seems python "compiles" the .py files to ".pyc" and then deletes the source file the first time they are referenced? I also noticed that there are two versions of python installed. Maybe the pyc files from one version won't load into the other one? Angel From mjk at sdsc.edu Tue Dec 2 15:52:52 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 2 Dec 2003 15:52:52 -0800 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <3FCCC5BF.3030903@miami.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8- A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> Message-ID: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> Python creates the .pyc files for you, and does not remove the original .py file. 
I would be extremely surprised it two "identical" .pyc files had the same md5 checksum. I'd expect this to be more like C .o file which always contain random data to pad out to the end of a page and
  • 9.
    32/64 bit wordsizes. Still this is just a guess, the real point is you can always remove the .pyc files and the .py will regenerate it when imported (although standard UNIX file/dir permission still apply). What is the import error you get from cluster-fork? -mjk On Dec 2, 2003, at 9:02 AM, Angel Li wrote: > Joseph wrote: > >> Indeed my md5sum is different for encoder.pyc. However, when I pulled >> the file and run "cluster-fork" python responds about an import >> problem. So it seems that regeneration did not occur. Is there a flag >> I need to pass? >> >> I have also tried to figure out what package provides encoder and >> reinstall the package, but an rpm query reveals nothing. >> >> If this is a generated file, what generates it? >> >> It seems that an rpm file query on ganglia show that files in the >> directory belong to the package, but encoder.pyc does not. >> >> Thanks, >> Joseph >> >> >> > I have finally found the python sources in the HPC rolls CD, filename > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it > seems python "compiles" the .py files to ".pyc" and then deletes the > source file the first time they are referenced? I also noticed that > there are two versions of python installed. Maybe the pyc files from > one version won't load into the other one? > > Angel > > From vrowley at ucsd.edu Mon Dec 1 14:27:03 2003 From: vrowley at ucsd.edu (V. Rowley) Date: Mon, 01 Dec 2003 14:27:03 -0800 Subject: [Rocks-Discuss]PXE boot problems Message-ID: <3FCBC037.5000302@ucsd.edu> We have installed a ROCKS 3.0.0 frontend on a DL380 and are trying to install a compute node via PXE. We are getting an error similar to the one mentioned in the archives, e.g. > Loading initrd.img.... > Ready > > Failed to free base memory >
  • 10.
    We have upgradedto syslinux-2.07-1, per the suggestion in the archives, but continue to get the same error. Any ideas? -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb From naihh at imcb.a-star.edu.sg Tue Dec 2 18:50:55 2003 From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis) Date: Wed, 3 Dec 2003 10:50:55 +0800 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg> Hi Laurence, I just downloaded the Rocks3.0 for IA32 and installed it but SGE is still not working. Any idea? Nai Hong Hwa Francis Institute of Molecular and Cell Biology (A*STAR) 30 Medical Drive Singapore 117609. DID: (65) 6874-6196 -----Original Message----- From: Laurence Liew [mailto:laurence at scalablesys.com] Sent: Thursday, November 20, 2003 2:53 PM To: Nai Hong Hwa Francis Cc: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? Hi Francis GridEngine roll is ready for ia32. We will get a ia64 native version ready as soon as we get back from SC2003. It will be released in a few weeks time. Globus GT2.4 is included in the Grid Roll Cheers! Laurence On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote: > > Hi,
  • 11.
    > > Does anyonehave any idea when will Sun Grid Engine be included as part > of Rocks 3 distribution. > > I am a newbie to Grid Computing. > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid? > > Regards > > Nai Hong Hwa Francis > > Institute of Molecular and Cell Biology (A*STAR) > 30 Medical Drive > Singapore 117609 > DID: 65-6874-6196 > > -----Original Message----- > From: npaci-rocks-discussion-request at sdsc.edu > [mailto:npaci-rocks-discussion-request at sdsc.edu] > Sent: Thursday, November 20, 2003 4:01 AM > To: npaci-rocks-discussion at sdsc.edu > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs > > Send npaci-rocks-discussion mailing list submissions to > npaci-rocks-discussion at sdsc.edu > > To subscribe or unsubscribe via the World Wide Web, visit > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > or, via email, send a message with subject or body 'help' to > npaci-rocks-discussion-request at sdsc.edu > > You can reach the person managing the list at > npaci-rocks-discussion-admin at sdsc.edu > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of npaci-rocks-discussion digest..." > > > Today's Topics: > > 1. top500 cluster installation movie (Greg Bruno) > 2. Re: Running Normal Application on Rocks Cluster - > Newbie Question (Laurence Liew) > > --__--__-- > > Message: 1 > To: npaci-rocks-discussion at sdsc.edu > From: Greg Bruno <bruno at rocksclusters.org> > Date: Tue, 18 Nov 2003 13:41:15 -0800 > Subject: [Rocks-Discuss]top500 cluster installation movie > > here's a crew of 7, installing the 201st fastest supercomputer in the > world in under two hours on the showroom floor at SC 03: > > http://www.rocksclusters.org/rocks.mov >
  • 12.
    > warning: theabove file is ~65MB. > > - gb > > > --__--__-- > > Message: 2 > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks Cluster > - > Newbie Question > From: Laurence Liew <laurenceliew at yahoo.com.sg> > To: Leong Chee Shian <chee-shian.leong at schenker.com> > Cc: npaci-rocks-discussion at sdsc.edu > Date: Wed, 19 Nov 2003 12:31:18 +0800 > > Chee Shian, > > Thanks for your call. We will take this off list and visit you next week > in your office as you requested. > > Cheers! > laurence > > > > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote: > > I have just installed Rocks 3.0 with one frontend and two compute > > node. > > > > A normal file based application is installed on the frontend and is > > NFS shared to the compute nodes . > > > > Question is : When run 5 sessions of my applications , the CPU > > utilization is all concentrated on the frontend node , nothing is > > being passed on to the compute nodes . How do I make these 3 computers > > to function as one and share the load ? > > > > Thanks everyone as I am really new to this clustering stuff.. > > > > PS : The idea of exploring rocks cluster is to use a few inexpensive > > intel machines to replace our existing multi CPU sun server, > > suggestions and recommendations are greatly appreciated. > > > > > > Leong > > > > > > > > > > --__--__-- > > _______________________________________________ > npaci-rocks-discussion mailing list
  • 13.
    > npaci-rocks-discussion atsdsc.edu > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > > End of npaci-rocks-discussion Digest > > > DISCLAIMER: > This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you. -- Laurence Liew CTO, Scalable Systems Pte Ltd 7 Bedok South Road Singapore 469272 Tel : 65 6827 3953 Fax : 65 6827 3922 Mobile: 65 9029 4312 Email : laurence at scalablesys.com http://www.scalablesys.com DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you. From laurence at scalablesys.com Tue Dec 2 19:10:08 2003 From: laurence at scalablesys.com (Laurence Liew) Date: Wed, 03 Dec 2003 11:10:08 +0800 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? In-Reply-To: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg> References: <5E118EED7CC277468A275F11EEEC39B94CCC22@EXIMCB2.imcb.a-star.edu.sg> Message-ID: <1070421007.2452.51.camel@scalable> Hi, SGE is in the SGE roll. You need to download the base, hpc and sge roll. The install is now different from V2.3.x Cheers! laurence On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote: > Hi Laurence, >
  • 14.
    > I just downloaded the Rocks3.0 for IA32 and installed it but SGE is > still not working. > > Any idea? > > Nai Hong Hwa Francis > Institute of Molecular and Cell Biology (A*STAR) > 30 Medical Drive > Singapore 117609. > DID: (65) 6874-6196 > > -----Original Message----- > From: Laurence Liew [mailto:laurence at scalablesys.com] > Sent: Thursday, November 20, 2003 2:53 PM > To: Nai Hong Hwa Francis > Cc: npaci-rocks-discussion at sdsc.edu > Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included > inRocks 3 for Itanium? > > Hi Francis > > GridEngine roll is ready for ia32. We will get a ia64 native version > ready as soon as we get back from SC2003. It will be released in a few > weeks time. > > Globus GT2.4 is included in the Grid Roll > > Cheers! > Laurence > > > On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote: > > > > Hi, > > > > Does anyone have any idea when will Sun Grid Engine be included as > part > > of Rocks 3 distribution. > > > > I am a newbie to Grid Computing. > > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid? > > > > Regards > > > > Nai Hong Hwa Francis > > > > Institute of Molecular and Cell Biology (A*STAR) > > 30 Medical Drive > > Singapore 117609 > > DID: 65-6874-6196 > > > > -----Original Message----- > > From: npaci-rocks-discussion-request at sdsc.edu > > [mailto:npaci-rocks-discussion-request at sdsc.edu] > > Sent: Thursday, November 20, 2003 4:01 AM > > To: npaci-rocks-discussion at sdsc.edu > > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs > > > > Send npaci-rocks-discussion mailing list submissions to
  • 15.
    > > npaci-rocks-discussion at sdsc.edu > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > or, via email, send a message with subject or body 'help' to > > npaci-rocks-discussion-request at sdsc.edu > > > > You can reach the person managing the list at > > npaci-rocks-discussion-admin at sdsc.edu > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of npaci-rocks-discussion digest..." > > > > > > Today's Topics: > > > > 1. top500 cluster installation movie (Greg Bruno) > > 2. Re: Running Normal Application on Rocks Cluster - > > Newbie Question (Laurence Liew) > > > > --__--__-- > > > > Message: 1 > > To: npaci-rocks-discussion at sdsc.edu > > From: Greg Bruno <bruno at rocksclusters.org> > > Date: Tue, 18 Nov 2003 13:41:15 -0800 > > Subject: [Rocks-Discuss]top500 cluster installation movie > > > > here's a crew of 7, installing the 201st fastest supercomputer in the > > world in under two hours on the showroom floor at SC 03: > > > > http://www.rocksclusters.org/rocks.mov > > > > warning: the above file is ~65MB. > > > > - gb > > > > > > --__--__-- > > > > Message: 2 > > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks > Cluster > > - > > Newbie Question > > From: Laurence Liew <laurenceliew at yahoo.com.sg> > > To: Leong Chee Shian <chee-shian.leong at schenker.com> > > Cc: npaci-rocks-discussion at sdsc.edu > > Date: Wed, 19 Nov 2003 12:31:18 +0800 > > > > Chee Shian, > > > > Thanks for your call. We will take this off list and visit you next > week > > in your office as you requested. > > > > Cheers! > > laurence
  • 16.
    > > > > >> > > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote: > > > I have just installed Rocks 3.0 with one frontend and two compute > > > node. > > > > > > A normal file based application is installed on the frontend and is > > > NFS shared to the compute nodes . > > > > > > Question is : When run 5 sessions of my applications , the CPU > > > utilization is all concentrated on the frontend node , nothing is > > > being passed on to the compute nodes . How do I make these 3 > computers > > > to function as one and share the load ? > > > > > > Thanks everyone as I am really new to this clustering stuff.. > > > > > > PS : The idea of exploring rocks cluster is to use a few inexpensive > > > intel machines to replace our existing multi CPU sun server, > > > suggestions and recommendations are greatly appreciated. > > > > > > > > > Leong > > > > > > > > > > > > > > > > > --__--__-- > > > > _______________________________________________ > > npaci-rocks-discussion mailing list > > npaci-rocks-discussion at sdsc.edu > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > > > > > End of npaci-rocks-discussion Digest > > > > > > DISCLAIMER: > > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its contents to any > other person as it may be an offence under the Official Secrets Act. > Thank you. -- Laurence Liew CTO, Scalable Systems Pte Ltd 7 Bedok South Road Singapore 469272 Tel : 65 6827 3953 Fax : 65 6827 3922 Mobile: 65 9029 4312 Email : laurence at scalablesys.com http://www.scalablesys.com
  • 17.
    From DGURGUL atPARTNERS.ORG Wed Dec 3 07:24:29 2003 From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.) Date: Wed, 3 Dec 2003 10:24:29 -0500 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRo cks 3 for Itanium? Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu> Where do we find the SGE roll? Under Lhoste at http://rocks.npaci.edu/Rocks/ there is a "Grid" roll listed. Is SGE in that? The userguide doesn't mention SGE. Dennis J. Gurgul Partners Health Care System Research Management Research Computing Core 617.724.3169 -----Original Message----- From: npaci-rocks-discussion-admin at sdsc.edu [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Laurence Liew Sent: Tuesday, December 02, 2003 10:10 PM To: Nai Hong Hwa Francis Cc: npaci-rocks-discussion at sdsc.edu Subject: RE: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRocks 3 for Itanium? Hi, SGE is in the SGE roll. You need to download the base, hpc and sge roll. The install is now different from V2.3.x Cheers! laurence On Wed, 2003-12-03 at 10:50, Nai Hong Hwa Francis wrote: > Hi Laurence, > > I just downloaded the Rocks3.0 for IA32 and installed it but SGE is > still not working. > > Any idea? > > Nai Hong Hwa Francis > Institute of Molecular and Cell Biology (A*STAR) > 30 Medical Drive > Singapore 117609. > DID: (65) 6874-6196 > > -----Original Message----- > From: Laurence Liew [mailto:laurence at scalablesys.com] > Sent: Thursday, November 20, 2003 2:53 PM
    > To: Nai Hong Hwa Francis > Cc: npaci-rocks-discussion at sdsc.edu > Subject: Re: [Rocks-Discuss]RE: When will Sun Grid Engine be included > inRocks 3 for Itanium? > > Hi Francis > > GridEngine roll is ready for ia32. We will get a ia64 native version > ready as soon as we get back from SC2003. It will be released in a few > weeks time. > > Globus GT2.4 is included in the Grid Roll > > Cheers! > Laurence > > > On Thu, 2003-11-20 at 10:13, Nai Hong Hwa Francis wrote: > > > > Hi, > > > > Does anyone have any idea when will Sun Grid Engine be included as > part > > of Rocks 3 distribution. > > > > I am a newbie to Grid Computing. > > Anyone have any idea on how to invoke Globus in Rocks to setup a Grid? > > > > Regards > > > > Nai Hong Hwa Francis > > > > Institute of Molecular and Cell Biology (A*STAR) > > 30 Medical Drive > > Singapore 117609 > > DID: 65-6874-6196 > > > > -----Original Message----- > > From: npaci-rocks-discussion-request at sdsc.edu > > [mailto:npaci-rocks-discussion-request at sdsc.edu] > > Sent: Thursday, November 20, 2003 4:01 AM > > To: npaci-rocks-discussion at sdsc.edu > > Subject: npaci-rocks-discussion digest, Vol 1 #613 - 3 msgs > > > > Send npaci-rocks-discussion mailing list submissions to > > npaci-rocks-discussion at sdsc.edu > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > or, via email, send a message with subject or body 'help' to > > npaci-rocks-discussion-request at sdsc.edu > > > > You can reach the person managing the list at > > npaci-rocks-discussion-admin at sdsc.edu > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of npaci-rocks-discussion digest..." > >
    > > > > Today's Topics: > > > > 1. top500 cluster installation movie (Greg Bruno) > > 2. Re: Running Normal Application on Rocks Cluster - > > Newbie Question (Laurence Liew) > > > > --__--__-- > > > > Message: 1 > > To: npaci-rocks-discussion at sdsc.edu > > From: Greg Bruno <bruno at rocksclusters.org> > > Date: Tue, 18 Nov 2003 13:41:15 -0800 > > Subject: [Rocks-Discuss]top500 cluster installation movie > > > > here's a crew of 7, installing the 201st fastest supercomputer in the > > world in under two hours on the showroom floor at SC 03: > > > > http://www.rocksclusters.org/rocks.mov > > > > warning: the above file is ~65MB. > > > > - gb > > > > > > --__--__-- > > > > Message: 2 > > Subject: Re: [Rocks-Discuss]Running Normal Application on Rocks > Cluster > > - > > Newbie Question > > From: Laurence Liew <laurenceliew at yahoo.com.sg> > > To: Leong Chee Shian <chee-shian.leong at schenker.com> > > Cc: npaci-rocks-discussion at sdsc.edu > > Date: Wed, 19 Nov 2003 12:31:18 +0800 > > > > Chee Shian, > > > > Thanks for your call. We will take this off list and visit you next > week > > in your office as you requested. > > > > Cheers! > > laurence > > > > > > > > On Tue, 2003-11-18 at 17:29, Leong Chee Shian wrote: > > > I have just installed Rocks 3.0 with one frontend and two compute > > > node. > > > > > > A normal file based application is installed on the frontend and is > > > NFS shared to the compute nodes . > > > > > > Question is : When run 5 sessions of my applications , the CPU > > > utilization is all concentrated on the frontend node , nothing is > > > being passed on to the compute nodes . How do I make these 3 > computers
    > > >to function as one and share the load ? > > > > > > Thanks everyone as I am really new to this clustering stuff.. > > > > > > PS : The idea of exploring rocks cluster is to use a few inexpensive > > > intel machines to replace our existing multi CPU sun server, > > > suggestions and recommendations are greatly appreciated. > > > > > > > > > Leong > > > > > > > > > > > > > > > > > --__--__-- > > > > _______________________________________________ > > npaci-rocks-discussion mailing list > > npaci-rocks-discussion at sdsc.edu > > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > > > > > > End of npaci-rocks-discussion Digest > > > > > > DISCLAIMER: > > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its contents to any > other person as it may be an offence under the Official Secrets Act. > Thank you. -- Laurence Liew CTO, Scalable Systems Pte Ltd 7 Bedok South Road Singapore 469272 Tel : 65 6827 3953 Fax : 65 6827 3922 Mobile: 65 9029 4312 Email : laurence at scalablesys.com http://www.scalablesys.com From bruno at rocksclusters.org Wed Dec 3 07:32:14 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 3 Dec 2003 07:32:14 -0800 Subject: [Rocks-Discuss]RE: When will Sun Grid Engine be included inRo cks 3 for Itanium? In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu> References: <BC447F1AD529D311B4DE0008C71BF2EB0AE157F7@phsexch7.mgh.harvard.edu> Message-ID: <DF132702-25A5-11D8-86E6-000A95C4E3B4@rocksclusters.org> > Where do we find the SGE roll? Under Lhoste at > http://rocks.npaci.edu/Rocks/ > there is a "Grid" roll listed. Is SGE in that? The userguide doesn't > mention > SGE.
the SGE roll will be available in the upcoming v3.1.0 release. scheduled release date is december 15th.

- gb

From jlkaiser at fnal.gov Wed Dec 3 08:35:18 2003
From: jlkaiser at fnal.gov (Joe Kaiser)
Date: Wed, 03 Dec 2003 10:35:18 -0600
Subject: [Rocks-Discuss]supermicro based MB's
In-Reply-To: <3FCC824B.5060406@scalableinformatics.com>
References: <3FCC824B.5060406@scalableinformatics.com>
Message-ID: <1070469318.12324.13.camel@nietzsche.fnal.gov>

Hi,

You don't say what version of Rocks you are using. The following is for the X5DPA-GG board and Rocks 3.0. It requires modifying only the pcitable in the boot image on the tftp server. I believe the procedure for 2.3.2 requires a heck of a lot more work, (but it may not). I would have to dig deep for the notes about changing 2.3.2.

This should be done on the frontend:

cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable

Search for the e1000 drivers and add the following line:

0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller"

write the file, then:

cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/

Then boot the node. Hope this helps.

Thanks,
Joe

On Tue, 2003-12-02 at 06:15, Joe Landman wrote:
    > Folks: > > Working on integrating a Supermicro MB based cluster. Discovered early > on that all of the compute nodes have an Intel based NIC that RedHat > doesn't know anything about (any version of RH). Some of the > administrative nodes have other similar issues. I am seeing simply a > suprising number of mis/un detected hardware across the collection of MBs. > > Anyone have advice on where to get modules/module source for Redhat > for these things? It looks like I will need to rebuild the boot CD, > though the several times I have tried this previously have failed to > produce a working/bootable system. It looks like new modules need to be > created/inserted into the boot process (head node and cluster nodes) > kernels, as well as into the installable kernels. > > Has anyone done this for a Supermicro MB based system? Thanks . > > Joe -- =================================================================== Joe Kaiser - Systems Administrator Fermi Lab CD/OSS-SCS Never laugh at live dragons. 630-840-6444 jlkaiser at fnal.gov =================================================================== From jghobrial at uh.edu Wed Dec 3 08:59:15 2003 From: jghobrial at uh.edu (Joseph) Date: Wed, 3 Dec 2003 10:59:15 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> Message-ID: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> Here is the error I receive when I remove the file encoder.pyc and run the command cluster-fork Traceback (innermost last): File "/opt/rocks/sbin/cluster-fork", line 88, in ? import rocks.pssh File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? 
import gmon.encoder ImportError: No module named encoder Thanks, Joseph On Tue, 2 Dec 2003, Mason J. Katz wrote: > Python creates the .pyc files for you, and does not remove the original
    > .py file. I would be extremely surprised it two "identical" .pyc files > had the same md5 checksum. I'd expect this to be more like C .o file > which always contain random data to pad out to the end of a page and > 32/64 bit word sizes. Still this is just a guess, the real point is > you can always remove the .pyc files and the .py will regenerate it > when imported (although standard UNIX file/dir permission still apply). > > What is the import error you get from cluster-fork? > > -mjk > > On Dec 2, 2003, at 9:02 AM, Angel Li wrote: > > > Joseph wrote: > > > >> Indeed my md5sum is different for encoder.pyc. However, when I pulled > >> the file and run "cluster-fork" python responds about an import > >> problem. So it seems that regeneration did not occur. Is there a flag > >> I need to pass? > >> > >> I have also tried to figure out what package provides encoder and > >> reinstall the package, but an rpm query reveals nothing. > >> > >> If this is a generated file, what generates it? > >> > >> It seems that an rpm file query on ganglia show that files in the > >> directory belong to the package, but encoder.pyc does not. > >> > >> Thanks, > >> Joseph > >> > >> > >> > > I have finally found the python sources in the HPC rolls CD, filename > > ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it > > seems python "compiles" the .py files to ".pyc" and then deletes the > > source file the first time they are referenced? I also noticed that > > there are two versions of python installed. Maybe the pyc files from > > one version won't load into the other one? > > > > Angel > > > > > From mjk at sdsc.edu Wed Dec 3 15:19:38 2003 From: mjk at sdsc.edu (Mason J. 
Katz)
Date: Wed, 3 Dec 2003 15:19:38 -0800
Subject: [Rocks-Discuss]cluster-fork
In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu>
Message-ID: <2A332131-25E7-11D8-A641-000A95DA5638@sdsc.edu>

This file comes from a ganglia package; what does

# rpm -q ganglia-receptor

Return?

 -mjk

On Dec 3, 2003, at 8:59 AM, Joseph wrote:
> Here is the error I receive when I remove the file encoder.pyc and run
> the command cluster-fork
>
> Traceback (innermost last):
>   File "/opt/rocks/sbin/cluster-fork", line 88, in ?
>     import rocks.pssh
>   File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ?
>     import gmon.encoder
> ImportError: No module named encoder
>
> Thanks,
> Joseph
>
> On Tue, 2 Dec 2003, Mason J. Katz wrote:
>
>> Python creates the .pyc files for you, and does not remove the
>> original .py file. I would be extremely surprised it two "identical"
>> .pyc files had the same md5 checksum. I'd expect this to be more like
>> C .o file which always contain random data to pad out to the end of a
>> page and 32/64 bit word sizes. Still this is just a guess, the real
>> point is you can always remove the .pyc files and the .py will
>> regenerate it when imported (although standard UNIX file/dir
>> permission still apply).
>>
>> What is the import error you get from cluster-fork?
>>
>> -mjk
>>
>> On Dec 2, 2003, at 9:02 AM, Angel Li wrote:
>>
>>> Joseph wrote:
>>>
>>>> Indeed my md5sum is different for encoder.pyc. However, when I
>>>> pulled the file and run "cluster-fork" python responds about an
>>>> import problem. So it seems that regeneration did not occur. Is
>>>> there a flag I need to pass?
>>>>
>>>> I have also tried to figure out what package provides encoder and
>>>> reinstall the package, but an rpm query reveals nothing.
>>>>
>>>> If this is a generated file, what generates it?
>>>>
>>>> It seems that an rpm file query on ganglia show that files in the
    >>>> directory belongto the package, but encoder.pyc does not. >>>> >>>> Thanks, >>>> Joseph >>>> >>>> >>>> >>> I have finally found the python sources in the HPC rolls CD, filename >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it >>> seems python "compiles" the .py files to ".pyc" and then deletes the >>> source file the first time they are referenced? I also noticed that >>> there are two versions of python installed. Maybe the pyc files from >>> one version won't load into the other one? >>> >>> Angel >>> >>> >> From csamuel at vpac.org Wed Dec 3 18:09:26 2003 From: csamuel at vpac.org (Chris Samuel) Date: Thu, 4 Dec 2003 13:09:26 +1100 Subject: [Rocks-Discuss]Confirmation of Rocks 3.1.0 Opteron support & RHEL trademark removal ? Message-ID: <200312041309.27986.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi folks, Can someone confirm that the next Rocks release will support Opteron please ? Also, I noticed that the current Rocks release on Itanium based on RHEL still has a lot of mentions of RedHat in it, which from my reading of their trademark guidelines is not permitted, is that fixed in the new version ? cheers! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/zpdWO2KABBYQAh8RAqB8AJ9FG+IjIeem21qlFS6XYIHamIMPmwCghVTV AgjAlVHWgdv/KzYQinHGPxs= =IAWU -----END PGP SIGNATURE----- From bruno at rocksclusters.org Wed Dec 3 18:46:30 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 3 Dec 2003 18:46:30 -0800
    Subject: [Rocks-Discuss]Confirmation ofRocks 3.1.0 Opteron support & RHEL trademark removal ? In-Reply-To: <200312041309.27986.csamuel@vpac.org> References: <200312041309.27986.csamuel@vpac.org> Message-ID: <10AD9827-2604-11D8-86E6-000A95C4E3B4@rocksclusters.org> > Can someone confirm that the next Rocks release will support Opteron > please ? yes, it will support opteron. > Also, I noticed that the current Rocks release on Itanium based on > RHEL still > has a lot of mentions of RedHat in it, which from my reading of their > trademark guidelines is not permitted, is that fixed in the new > version ? and yes, (even though it doesn't feel like the right thing to do, as redhat has offered to the community some outstanding technologies that we'd like to credit), all redhat trademarks will be removed from 3.1.0. - gb From fds at sdsc.edu Thu Dec 4 06:46:32 2003 From: fds at sdsc.edu (Federico Sacerdoti) Date: Thu, 4 Dec 2003 06:46:32 -0800 Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8- A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> Message-ID: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu> Please install the http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1 -2.i386.rpm package, which includes the correct encoder.py file. (This package is listed on the 3.0.0 errata page) -Federico On Dec 3, 2003, at 8:59 AM, Joseph wrote: > Here is the error I receive when I remove the file encoder.pyc and run > the > command cluster-fork > > Traceback (innermost last): > File "/opt/rocks/sbin/cluster-fork", line 88, in ? > import rocks.pssh > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? 
> import gmon.encoder > ImportError: No module named encoder > > Thanks, > Joseph
    > > > On Tue,2 Dec 2003, Mason J. Katz wrote: > >> Python creates the .pyc files for you, and does not remove the >> original >> .py file. I would be extremely surprised it two "identical" .pyc >> files >> had the same md5 checksum. I'd expect this to be more like C .o file >> which always contain random data to pad out to the end of a page and >> 32/64 bit word sizes. Still this is just a guess, the real point is >> you can always remove the .pyc files and the .py will regenerate it >> when imported (although standard UNIX file/dir permission still >> apply). >> >> What is the import error you get from cluster-fork? >> >> -mjk >> >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote: >> >>> Joseph wrote: >>> >>>> Indeed my md5sum is different for encoder.pyc. However, when I >>>> pulled >>>> the file and run "cluster-fork" python responds about an import >>>> problem. So it seems that regeneration did not occur. Is there a >>>> flag >>>> I need to pass? >>>> >>>> I have also tried to figure out what package provides encoder and >>>> reinstall the package, but an rpm query reveals nothing. >>>> >>>> If this is a generated file, what generates it? >>>> >>>> It seems that an rpm file query on ganglia show that files in the >>>> directory belong to the package, but encoder.pyc does not. >>>> >>>> Thanks, >>>> Joseph >>>> >>>> >>>> >>> I have finally found the python sources in the HPC rolls CD, filename >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it >>> seems python "compiles" the .py files to ".pyc" and then deletes the >>> source file the first time they are referenced? I also noticed that >>> there are two versions of python installed. Maybe the pyc files from >>> one version won't load into the other one? >>> >>> Angel >>> >>> >> >> Federico Rocks Cluster Group, San Diego Supercomputing Center, CA
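[Editorial note: the "Bad magic number" at the root of this thread happens because every .pyc starts with a 4-byte magic number identifying the bytecode format of the Python that compiled it, and an interpreter refuses a .pyc whose magic doesn't match its own — here, an encoder.pyc under /usr/lib/python1.5 read by a different Python. A small sketch against a modern Python 3.x (using `importlib.util.MAGIC_NUMBER`, the running interpreter's own magic) illustrating the check:]

```python
import importlib.util
import os
import py_compile
import tempfile

def pyc_magic(path):
    # The first 4 bytes of any .pyc file are the magic number of
    # the interpreter that compiled it.
    with open(path, "rb") as f:
        return f.read(4)

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "mod.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    # Compile the module the same way an import would.
    pyc = py_compile.compile(src, cfile=os.path.join(d, "mod.pyc"))
    # A freshly compiled .pyc matches the running interpreter's magic;
    # a .pyc left behind by any other Python version would not, which
    # is exactly the "ImportError: Bad magic number" case.
    assert pyc_magic(pyc) == importlib.util.MAGIC_NUMBER
```

Removing a stale .pyc (as suggested in the thread) only helps if the matching .py is present to be recompiled, which is why reinstalling the ganglia-python package was the actual fix.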
    From jghobrial atuh.edu Thu Dec 4 07:14:21 2003 From: jghobrial at uh.edu (Joseph) Date: Thu, 4 Dec 2003 09:14:21 -0600 (CST) Subject: [Rocks-Discuss]cluster-fork In-Reply-To: <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu> References: <3FCB879E.8050905@miami.edu> <Pine.LNX.4.56.0312011331460.5615@mail.tlc2.uh.edu> <1B15A45F-2457-11D8-A374-00039389B580@uci.edu> <Pine.LNX.4.56.0312021000490.7581@mail.tlc2.uh.edu> <3FCCC5BF.3030903@miami.edu> <A43157DE-2522-11D8-A7A4-000A95DA5638@sdsc.edu> <Pine.LNX.4.56.0312031057280.11073@mail.tlc2.uh.edu> <A69923FA-2668-11D8-804D-000393A4725A@sdsc.edu> Message-ID: <Pine.LNX.4.56.0312040913110.13972@mail.tlc2.uh.edu> Thank you very much this solved the problem. Joseph On Thu, 4 Dec 2003, Federico Sacerdoti wrote: > Please install the > http://www.rocksclusters.org/errata/3.0.0/ganglia-python-3.0.1 > -2.i386.rpm package, which includes the correct encoder.py file. (This > package is listed on the 3.0.0 errata page) > > -Federico > > On Dec 3, 2003, at 8:59 AM, Joseph wrote: > > > Here is the error I receive when I remove the file encoder.pyc and run > > the > > command cluster-fork > > > > Traceback (innermost last): > > File "/opt/rocks/sbin/cluster-fork", line 88, in ? > > import rocks.pssh > > File "/opt/rocks/lib/python/rocks/pssh.py", line 96, in ? > > import gmon.encoder > > ImportError: No module named encoder > > > > Thanks, > > Joseph > > > > > > On Tue, 2 Dec 2003, Mason J. Katz wrote: > > > >> Python creates the .pyc files for you, and does not remove the > >> original > >> .py file. I would be extremely surprised it two "identical" .pyc > >> files > >> had the same md5 checksum. I'd expect this to be more like C .o file > >> which always contain random data to pad out to the end of a page and > >> 32/64 bit word sizes. Still this is just a guess, the real point is > >> you can always remove the .pyc files and the .py will regenerate it > >> when imported (although standard UNIX file/dir permission still > >> apply).
    > >> > >> What is the import error you get from cluster-fork? > >> > >> -mjk > >> > >> On Dec 2, 2003, at 9:02 AM, Angel Li wrote: > >> > >>> Joseph wrote: > >>> > >>>> Indeed my md5sum is different for encoder.pyc. However, when I > >>>> pulled > >>>> the file and run "cluster-fork" python responds about an import > >>>> problem. So it seems that regeneration did not occur. Is there a > >>>> flag > >>>> I need to pass? > >>>> > >>>> I have also tried to figure out what package provides encoder and > >>>> reinstall the package, but an rpm query reveals nothing. > >>>> > >>>> If this is a generated file, what generates it? > >>>> > >>>> It seems that an rpm file query on ganglia show that files in the > >>>> directory belong to the package, but encoder.pyc does not. > >>>> > >>>> Thanks, > >>>> Joseph > >>>> > >>>> > >>>> > >>> I have finally found the python sources in the HPC rolls CD, filename > >>> ganglia-python-3.0.0-2.i386.rpm. I'm not familiar with python but it > >>> seems python "compiles" the .py files to ".pyc" and then deletes the > >>> source file the first time they are referenced? I also noticed that > >>> there are two versions of python installed. Maybe the pyc files from > >>> one version won't load into the other one? > >>> > >>> Angel > >>> > >>> > >> > >> > Federico > > Rocks Cluster Group, San Diego Supercomputing Center, CA > From vrowley at ucsd.edu Thu Dec 4 12:29:55 2003 From: vrowley at ucsd.edu (V. Rowley) Date: Thu, 04 Dec 2003 12:29:55 -0800 Subject: [Rocks-Discuss]Re: PXE boot problems In-Reply-To: <3FCBC037.5000302@ucsd.edu> References: <3FCBC037.5000302@ucsd.edu> Message-ID: <3FCF9943.1020806@ucsd.edu> Uh, nevermind. We had upgraded syslinux on our frontend, not the node we were trying to PXE boot. Sigh. V. Rowley wrote:
    > We haveinstalled a ROCKS 3.0.0 frontend on a DL380 and are trying to > install a compute node via PXE. We are getting an error similar to the > one mentioned in the archives, e.g. > >> Loading initrd.img.... >> Ready >> >> Failed to free base memory >> > > We have upgraded to syslinux-2.07-1, per the suggestion in the archives, > but continue to get the same error. Any ideas? > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb From cdwan at mail.ahc.umn.edu Fri Dec 5 08:16:07 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Fri, 5 Dec 2003 10:16:07 -0600 (CST) Subject: [Rocks-Discuss]Private NIS master Message-ID: <Pine.GSO.4.58.0312042305070.18193@lenti.med.umn.edu> Hello all. Long time listener, first time caller. Thanks for all the great work. I'm integrating a Rocks cluster into an existing NIS domain. I noticed that while the cluster database now supports a PrivateNISMaster, that variable doesn't make it into the /etc/yp.conf on the compute nodes. They remain broadcast. Assume that, for whatever reason, I don't want to set up a repeater (slave) ypserv process on my frontend. I added the option "--nisserver <var name="Kickstart_PrivateNISMaster"/>" to the "profiles/3.0.0/nodes/nis-client.xml" file, removed the ypserver on my frontend, and it works like I want it to. Am I missing anything fundamental here? -Chris Dwan University of Minnesota From wyzhong78 at msn.com Mon Dec 8 06:18:34 2003 From: wyzhong78 at msn.com (zhong wenyu) Date: Mon, 08 Dec 2003 22:18:34 +0800 Subject: [Rocks-Discuss]3.0.0 problem: not able to boot up Message-ID: <BAY3-F14uFqD45TpNO40002c14c@hotmail.com> Hi,everyone!
    I installed rocks3.0.0 defautly, There wasn't any trouble in the installing. But I haven't be able to boot,it stopped at the beginning,the message "GRUB" showed on the screen,and waiting.... my hardware are double Xeon 2.4G,MSI 9138,Seagate SCSI disk. Any appreciate is welcome! _________________________________________________________________ ???? MSN Explorer: http://explorer.msn.com/lccn/ From angelini at vki.ac.be Mon Dec 8 06:20:45 2003 From: angelini at vki.ac.be (Angelini Giuseppe) Date: Mon, 08 Dec 2003 15:20:45 +0100 Subject: [Rocks-Discuss]How to use MPICH with ssh Message-ID: <3FD488BD.3EBBDB8D@vki.ac.be> Dear rocks folk, I have recently installed mpich with Lahay Fortran and now that I can compile and link, I would like to run but it seems that I have another problem. In fact I have the following error message when I try to run: [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE $DPT/hybflow p0_13226: p4_error: Path to program is invalid while starting /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7: -1 p4_error: latest msg from perror: No such file or directory p0_13226: p4_error: Child process exited while making connection to remote process on compute-0-6: 0 p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32 p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32 I am wondering why it is looking for /usr/bin/rsh for the communication, I expected to use ssh and not rsh. Any help will be welcome. Regards. Giuseppe Angelini From casuj at cray.com Mon Dec 8 07:31:21 2003 From: casuj at cray.com (John Casu) Date: Mon, 8 Dec 2003 07:31:21 -0800 Subject: [Rocks-Discuss]How to use MPICH with ssh In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>; from Angelini Giuseppe on Mon, Dec 08, 2003 at 03:20:45PM +0100 References: <3FD488BD.3EBBDB8D@vki.ac.be> Message-ID: <20031208073121.A10151@stemp3.wc.cray.com>
    On Mon, Dec08, 2003 at 03:20:45PM +0100, Angelini Giuseppe wrote: > > Dear rocks folk, > > > I have recently installed mpich with Lahay Fortran and now that I can > compile and link, > I would like to run but it seems that I have another problem. In fact I > have the following > error message when I try to run: > > [panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE > $DPT/hybflow > p0_13226: p4_error: Path to program is invalid while starting > /dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7: > -1 > p4_error: latest msg from perror: No such file or directory > p0_13226: p4_error: Child process exited while making connection to > remote process on compute-0-6: 0 > p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32 > p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32 > > I am wondering why it is looking for /usr/bin/rsh for the communication, > > I expected to use ssh and not rsh. > > Any help will be welcome. > build mpich thus: RSHCOMMAND=ssh ./configure ..... > > Regards. > > > Giuseppe Angelini -- "Roses are red, Violets are blue, You lookin' at me ? YOU LOOKIN' AT ME ?!" -- Get Fuzzy. ======================================================================= John Casu Cray Inc. casuj at cray.com 411 First Avenue South, Suite 600 Tel: (206) 701-2173 Seattle, WA 98104-2860 Fax: (206) 701-2500 ======================================================================= From davidow at molbio.mgh.harvard.edu Mon Dec 8 08:12:53 2003 From: davidow at molbio.mgh.harvard.edu (Lance Davidow) Date: Mon, 8 Dec 2003 11:12:53 -0500 Subject: [Rocks-Discuss]How to use MPICH with ssh In-Reply-To: <3FD488BD.3EBBDB8D@vki.ac.be>
    References: <3FD488BD.3EBBDB8D@vki.ac.be> Message-ID: <p06002001bbfa51fea005@[132.183.190.222]> Giuseppe, Here'san answer from a newbie who just faced the same problem. You are using the wrong flavor of mpich (and mpirun). There are several different distributions which work differently in ROCKS. the one you are using in the default path expects serv_p4 demons and .rhosts files in your home directory. The different flavors may be more compatible with different compilers as well. [lance at rescluster2 lance]$ which mpirun /opt/mpich-mpd/gnu/bin/mpirun the one you probably want is /opt/mpich/gnu/bin/mpirun [lance at rescluster2 lance]$ locate mpirun ... /opt/mpich-mpd/gnu/bin/mpirun ... /opt/mpich/myrinet/gnu/bin/mpirun ... /opt/mpich/gnu/bin/mpirun Cheers, Lance At 3:20 PM +0100 12/8/03, Angelini Giuseppe wrote: >Dear rocks folk, > > >I have recently installed mpich with Lahay Fortran and now that I can >compile and link, >I would like to run but it seems that I have another problem. In fact I >have the following >error message when I try to run: > >[panara at compute-0-7 ~]$ mpirun -np $NPROC -machinefile $PBS_NODEFILE >$DPT/hybflow >p0_13226: p4_error: Path to program is invalid while starting >/dc_03_04/panara/PREPRO_TESTS/hybflow with /usr/bin/rsh on compute-0-7: >-1 > p4_error: latest msg from perror: No such file or directory >p0_13226: p4_error: Child process exited while making connection to >remote process on compute-0-6: 0 >p0_13226: (6.025133) net_send: could not write to fd=4, errno = 32 >p0_13226: (6.025231) net_send: could not write to fd=4, errno = 32 > >I am wondering why it is looking for /usr/bin/rsh for the communication, > >I expected to use ssh and not rsh. > >Any help will be welcome. > >
    >Regards. > >Giuseppe Angelini -- Lance Davidow,PhD Director of Bioinformatics Dept of Molecular Biology Mass General Hospital Boston MA 02114 davidow at molbio.mgh.harvard.edu 617.726-5955 Fax: 617.726-6893 From rscarce at caci.com Fri Dec 5 16:43:00 2003 From: rscarce at caci.com (Reed Scarce) Date: Fri, 5 Dec 2003 19:43:00 -0500 Subject: [Rocks-Discuss]PXE and system images Message-ID: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com> We want to initialize new hardware with a known good image from identical hardware currently in use. The process imagined would be to PXE boot to a disk image server, PXE would create a RAM system that would request the system disk image from the server, which would push the desired system disk image to the requesting system. Upon completion the system would be available as a cluster member. The lab configuration is a PC grade frontend with two 3Com 905s and a single server grade cluster node with integrated Intel 82551 (10/100)(the only PXE interface) and two integrated Intel 82546 (10/100/1000). The cluster node is one of the stock of nodes for the expansion. The stock of nodes have a Linux OS pre-installed, which would be eliminated in the process. Currently the node will PXE boot from the 10/100 and pickup an installation boot from one of the g-bit interfaces. From there kickstart wants to take over. Any recommendations how to get kickstart to push an image to the disk? Thanks, Reed Scarce -------------- next part -------------- An HTML attachment was scrubbed... URL: https://lists.sdsc.edu/pipermail/npaci-rocks- discussion/attachments/20031205/dad04521/attachment-0001.html From wyzhong78 at msn.com Mon Dec 8 05:36:37 2003 From: wyzhong78 at msn.com (zhong wenyu) Date: Mon, 08 Dec 2003 21:36:37 +0800 Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up Message-ID: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com> Hi,everyone! 
I have installed Rocks 3.0.0 with the default options successfully; there was no trouble. But when I boot it up, it stops at the beginning, just showing "GRUB" on
the screen and waiting...

Thanks for your help!

_________________________________________________________________
MSN Explorer: http://explorer.msn.com/lccn/

From daniel.kidger at quadrics.com Mon Dec 8 09:54:53 2003
From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com)
Date: Mon, 8 Dec 2003 17:54:53 -0000
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>

Dear all,

Previously I have been installing a custom kernel on the compute nodes with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf). However I am now trying to do it the 'proper' way. So I do (on :

# cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
# cd /home/install
# rocks-dist dist
# SSH_NO_PASSWD=1 shoot-node compute-0-0

Hence:

# find /home/install/ |xargs -l grep -nH qsnet

shows me that hdlist and hdlist2 now contain this RPM. (and indeed If I duplicate my rpm in that directory rocks-dist notices this and warns me.) However the node always ends up with "2.4.20-20.7smp" again. anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing kernel-smp-2.4.20-20.7."

So my question is: It looks like my RPM has a name that Rocks doesn't understand properly. What is wrong with my name ? and what are the rules for getting the correct name ? (.i686.rpm is of course correct, but I don't have -smp. in the name Is this the problem ?)

cf. Greg Bruno's wisdom: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
----------------------- www.quadrics.com --------------------

> From DGURGUL at PARTNERS.ORG Mon Dec 8 11:09:27 2003 From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:09:27 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15840@phsexch7.mgh.harvard.edu>

I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
and then "cluster-fork service gschedule restart" (not sure I had to do the
last). I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one
who ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10
(for the user on 0-17):

17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s -bash

10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s -bash

When I do "cluster-fork w" (without the --mpd) the users show up on the
correct nodes.

Do the numbers on the left of the -mpd output correspond to the node names?

Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

From DGURGUL at PARTNERS.ORG Mon Dec 8 11:28:30 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 14:28:30 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>

Maybe this is a better description of the "strangeness".

I did "cluster-fork --mpd hostname":

1: compute-0-0.local
2: compute-0-1.local
3: compute-0-3.local
4: compute-0-13.local
5: compute-0-11.local
6: compute-0-15.local
7: compute-0-16.local
8: compute-0-19.local
9: compute-0-21.local
10: compute-0-17.local
11: compute-0-5.local
12: compute-0-20.local
13: compute-0-18.local
14: compute-0-12.local
15: compute-0-9.local
16: compute-0-4.local
17: compute-0-8.local
18: compute-0-14.local
19: compute-0-2.local
20: compute-0-6.local
0: compute-0-7.local
21: compute-0-10.local

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

-----Original Message-----
From: npaci-rocks-discussion-admin at sdsc.edu
[mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul, Dennis J.
Sent: Monday, December 08, 2003 2:09 PM
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness

I just did "cluster-fork -Uvh /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
and then "cluster-fork service gschedule restart" (not sure I had to do the
last). I also put 3.0.1-2 and restarted gschedule on the frontend.

Now I run "cluster-fork --mpd w".

I currently have a user who ssh'd to compute-0-8 from the frontend and one
who ssh'd into compute-0-17 from the front end.

But the return shows the users on lines for 17 (for the user on 0-8) and 10
(for the user on 0-17):

17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00, 0.03
17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s -bash

10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04, 0.07
10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s -bash

When I do "cluster-fork w" (without the --mpd) the users show up on the
correct nodes.

Do the numbers on the left of the -mpd output correspond to the node names?
Thanks.

Dennis

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

From tim.carlson at pnl.gov Mon Dec 8 12:35:16 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:35:16 -0800 (PST)
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <OFF783DCCA.8F016562-ON85256DF3.008001FC-85256DF7.00043E45@caci.com>
Message-ID: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>

On Fri, 5 Dec 2003, Reed Scarce wrote:

> We want to initialize new hardware with a known good image from identical
> hardware currently in use. The process imagined would be to PXE boot to a
> disk image server, PXE would create a RAM system that would request the
> system disk image from the server, which would push the desired system
> disk image to the requesting system. Upon completion the system would be
> available as a cluster member.
>
> The lab configuration is a PC grade frontend with two 3Com 905s and a
> single server grade cluster node with integrated Intel 82551 (10/100)
> (the only PXE interface) and two integrated Intel 82546 (10/100/1000).
> The cluster node is one of the stock of nodes for the expansion. The
> stock of nodes have a Linux OS pre-installed, which would be eliminated
> in the process.
>
> Currently the node will PXE boot from the 10/100 and pick up an
> installation boot from one of the g-bit interfaces. From there kickstart
> wants to take over.
>
> Any recommendations how to get kickstart to push an image to the disk?

This sounds like you want to use Oscar instead of ROCKS.
http://oscar.openclustergroup.org/tiki-index.php

I'm not exactly sure why you think that the kickstart process won't give you
exactly the same image on every machine. If the hardware is the same, you'll
get the same image on each machine. We have boxes with the same setup:
10/100 PXE, and then dual gigabit.
Our method for installing ROCKS on this type of hardware is the following:

1) Run insert-ethers and choose the "manager" type of node.
2) Connect all the PXE interfaces to the switch and boot them all. Do not
   connect the gigabit interface.
3) Once all of the nodes have PXE booted, exit insert-ethers. Start
   insert-ethers again and this time choose compute node.
4) Hook up the gigabit interface and the PXE interface to your nodes. All
of your machines will now install.
5) In our case, we now quickly disconnect the PXE interface because we
   don't want to have the machine continually install.

The real ROCKS method would have you choose (HD/net) for booting in the
BIOS, but if you already have an OS on your machine, you would have to go
into the BIOS twice before the compute nodes were installed. We disable
rocks-grub and just connect up the PXE cable if we need to reinstall.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From tim.carlson at pnl.gov Mon Dec 8 12:42:23 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Mon, 08 Dec 2003 12:42:23 -0800 (PST)
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <Pine.LNX.4.44.0312081238270.19031-100000@scorpion.emsl.pnl.gov>

On Mon, 8 Dec 2003 daniel.kidger at quadrics.com wrote:

I've gotten confused from time to time as to where to place custom RPMS
(it's changed between releases), so my not-so-clean method is to just rip
out the kernels in /home/install/rocks-dist/7.3/en/os/i386/Redhat/RPMS and
drop my own in. Then do a

cd /home/install
rocks-dist dist
shoot-node

You are probably running into an issue where the "force" directory is more
of an "in addition to" directory and your 2.4.18 kernel is being noted, but
ignored since the 2.4.20 kernel is newer. I assume your nodes get both an
SMP and a UP version of 2.4.20 and that your custom 2.4.18 is nowhere to be
found on the compute node.

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
> grub.conf).
>
> However I am now trying to do it the 'proper' way.
> So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm
> /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ | xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed if I
> duplicate my rpm in that directory rocks-dist notices this and warns me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has
> "Installing kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the name
> Is this the problem ?)

From fds at sdsc.edu Mon Dec 8 12:51:12 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 12:51:12 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15843@phsexch7.mgh.harvard.edu>
Message-ID: <423D0494-29C0-11D8-804D-000393A4725A@sdsc.edu>

You are right, and I think this is a shortcoming of MPD. There is no obvious
way to force the MPD numbering to correspond to the order the nodes were
called out on the command line (cluster-fork --mpd actually makes a shell
call to mpirun, and it calls out all the node names explicitly). MPD seems
to number the output differently, as you found out.

So mpd for now may be more useful for jobs that are not sensitive to this.
If enough of you find this shortcoming to be a real annoyance, we could work
on putting the node name label on the output by explicitly calling
"hostname" or similar.

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local
> 8: compute-0-19.local
> 9: compute-0-21.local
> 10: compute-0-17.local
> 11: compute-0-5.local
> 12: compute-0-20.local
> 13: compute-0-18.local
> 14: compute-0-12.local
> 15: compute-0-9.local
> 16: compute-0-4.local
> 17: compute-0-8.local
> 18: compute-0-14.local
> 19: compute-0-2.local
> 20: compute-0-6.local
> 0: compute-0-7.local
> 21: compute-0-10.local
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
>
> -----Original Message-----
> From: npaci-rocks-discussion-admin at sdsc.edu
> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
> Dennis J.
> Sent: Monday, December 08, 2003 2:09 PM
> To: npaci-rocks-discussion at sdsc.edu
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> I just did "cluster-fork -Uvh
> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and then "cluster-fork service gschedule restart" (not sure I had to do
> the last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one who ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10 (for the user on 0-17):
>
> 17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00,
> 0.03
> 17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s
> -bash
>
> 10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04,
> 0.07
> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From DGURGUL at PARTNERS.ORG Mon Dec 8 12:55:13 2003
From: DGURGUL at PARTNERS.ORG (Gurgul, Dennis J.)
Date: Mon, 8 Dec 2003 15:55:13 -0500
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
Message-ID: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>

Thanks.

On a related note, when I did "cluster-fork service gschedule restart"
gschedule started with the "OK" output, but then the fork process hung on
each node and I had to ^c out for it to go on to the next node.

I tried to ssh to a node and then did the gschedule restart. Even then,
after I tried to "exit" out of the node, the session hung and I had to log
back in and kill it from the frontend.

Dennis J. Gurgul
Partners Health Care System
Research Management
Research Computing Core
617.724.3169

-----Original Message-----
From: Federico Sacerdoti [mailto:fds at sdsc.edu]
Sent: Monday, December 08, 2003 3:51 PM
To: Gurgul, Dennis J.
Cc: npaci-rocks-discussion at sdsc.edu
Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
You are right, and I think this is a shortcoming of MPD. There is no obvious
way to force the MPD numbering to correspond to the order the nodes were
called out on the command line (cluster-fork --mpd actually makes a shell
call to mpirun, and it calls out all the node names explicitly). MPD seems
to number the output differently, as you found out.

So mpd for now may be more useful for jobs that are not sensitive to this.
If enough of you find this shortcoming to be a real annoyance, we could work
on putting the node name label on the output by explicitly calling
"hostname" or similar.

Good ideas are welcome :)
-Federico

On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:

> Maybe this is a better description of the "strangeness".
>
> I did "cluster-fork --mpd hostname":
>
> 1: compute-0-0.local
> 2: compute-0-1.local
> 3: compute-0-3.local
> 4: compute-0-13.local
> 5: compute-0-11.local
> 6: compute-0-15.local
> 7: compute-0-16.local
> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> I just did "cluster-fork -Uvh
> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
> and then "cluster-fork service gschedule restart" (not sure I had to do
> the last).
> I also put 3.0.1-2 and restarted gschedule on the frontend.
>
> Now I run "cluster-fork --mpd w".
>
> I currently have a user who ssh'd to compute-0-8 from the frontend and
> one who ssh'd into compute-0-17 from the front end.
>
> But the return shows the users on lines for 17 (for the user on 0-8)
> and 10 (for the user on 0-17):
>
> 17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00,
> 0.03
> 17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s
> -bash
>
> 10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04,
> 0.07
> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
> 10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s
> -bash
>
> When I do "cluster-fork w" (without the --mpd) the users show up on the
> correct nodes.
>
> Do the numbers on the left of the -mpd output correspond to the node
> names?
>
> Thanks.
>
> Dennis
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From mjk at sdsc.edu Mon Dec 8 12:58:22 2003
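Federico's suggestion above (labeling output with the node name by calling
"hostname") can be approximated on the frontend without changing
cluster-fork. The sketch below is only an illustration, not part of Rocks:
it assumes you have first captured the rank-to-hostname mapping with
"cluster-fork --mpd hostname > map.txt", and the helper name is invented.

```shell
# Hypothetical helper, not part of Rocks: relabel the "N: ..." lines of a
# "cluster-fork --mpd" run using a rank->hostname map captured earlier
# with: cluster-fork --mpd hostname > map.txt
relabel() {
  # $1 = map file with lines like "17: compute-0-8.local";
  # stdin = output of a later "cluster-fork --mpd <cmd>" run.
  awk -F': ' '
    NR == FNR { map[$1] = $2; next }    # first file: read rank -> hostname
    { sub(/^[0-9]+/, map[$1]) } 1       # stdin: swap the rank for the name
  ' "$1" -
}

# Example usage: cluster-fork --mpd w | relabel map.txt
```

The MPD numbering can change between runs, so the map would need to be
refreshed alongside each job for the labels to be trustworthy.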
From: mjk at sdsc.edu (Mason J. Katz)
Date: Mon, 8 Dec 2003 12:58:22 -0800
Subject: [Rocks-Discuss]PXE and system images
In-Reply-To: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312081226270.19031-100000@scorpion.emsl.pnl.gov>
Message-ID: <4261C250-29C1-11D8-AECB-000A95DA5638@sdsc.edu>

On Dec 8, 2003, at 12:35 PM, Tim Carlson wrote:

> 5) In our case, we now quickly disconnect the PXE interface because we
> don't want to have the machine continually install. The real ROCKS
> method would have you choose (HD/net) for booting in the BIOS, but
> if you already have an OS on your machine, you would have to go into
> the BIOS twice before the compute nodes were installed. We disable
> rocks-grub and just connect up the PXE cable if we need to reinstall.

For most boxes we've seen that support PXE there is an option to hit <F12>
to force a network PXE boot; this allows you to force a PXE boot even when a
valid OS/boot block exists on your hard disk. If you don't have this you do
indeed need to go into the BIOS twice -- a pain.

-mjk

From fds at sdsc.edu Mon Dec 8 13:26:46 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 8 Dec 2003 13:26:46 -0800
Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
In-Reply-To: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
References: <BC447F1AD529D311B4DE0008C71BF2EB0AE15847@phsexch7.mgh.harvard.edu>
Message-ID: <39CC5B05-29C5-11D8-804D-000393A4725A@sdsc.edu>

I've seen this before as well. I believe it has something to do with the way
the color "[ OK ]" characters are interacting with the ssh session from the
normal cluster-fork. We have yet to characterize this bug adequately.

-Federico

On Dec 8, 2003, at 12:55 PM, Gurgul, Dennis J. wrote:

> Thanks.
>
> On a related note, when I did "cluster-fork service gschedule restart"
> gschedule started with the "OK" output, but then the fork process hung
> on each node and I had to ^c out for it to go on to the next node.
>
> I tried to ssh to a node and then did the gschedule restart. Even
> then, after I tried to "exit" out of the node, the session hung and I
> had to log back in and kill it from the frontend.
>
>
> Dennis J. Gurgul
> Partners Health Care System
> Research Management
> Research Computing Core
> 617.724.3169
>
>
> -----Original Message-----
> From: Federico Sacerdoti [mailto:fds at sdsc.edu]
> Sent: Monday, December 08, 2003 3:51 PM
> To: Gurgul, Dennis J.
> Cc: npaci-rocks-discussion at sdsc.edu
> Subject: Re: [Rocks-Discuss]cluster-fork --mpd strangeness
>
>
> You are right, and I think this is a shortcoming of MPD. There is no
> obvious way to force the MPD numbering to correspond to the order the
> nodes were called out on the command line (cluster-fork --mpd actually
> makes a shell call to mpirun and it calls out all the node names
> explicitly). MPD seems to number the output differently, as you found
> out.
>
> So mpd for now may be more useful for jobs that are not sensitive to
> this. If enough of you find this shortcoming to be a real annoyance, we
> could work on putting the node name label on the output by explicitly
> calling "hostname" or similar.
>
> Good ideas are welcome :)
> -Federico
>
> On Dec 8, 2003, at 11:28 AM, Gurgul, Dennis J. wrote:
>
>> Maybe this is a better description of the "strangeness".
>>
>> I did "cluster-fork --mpd hostname":
>>
>> 1: compute-0-0.local
>> 2: compute-0-1.local
>> 3: compute-0-3.local
>> 4: compute-0-13.local
>> 5: compute-0-11.local
>> 6: compute-0-15.local
>> 7: compute-0-16.local
>> 8: compute-0-19.local
>> 9: compute-0-21.local
>> 10: compute-0-17.local
>> 11: compute-0-5.local
>> 12: compute-0-20.local
>> 13: compute-0-18.local
>> 14: compute-0-12.local
>> 15: compute-0-9.local
>> 16: compute-0-4.local
>> 17: compute-0-8.local
>> 18: compute-0-14.local
>> 19: compute-0-2.local
>> 20: compute-0-6.local
>> 0: compute-0-7.local
>> 21: compute-0-10.local
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
>>
>> -----Original Message-----
>> From: npaci-rocks-discussion-admin at sdsc.edu
>> [mailto:npaci-rocks-discussion-admin at sdsc.edu]On Behalf Of Gurgul,
>> Dennis J.
>> Sent: Monday, December 08, 2003 2:09 PM
>> To: npaci-rocks-discussion at sdsc.edu
>> Subject: [Rocks-Discuss]cluster-fork --mpd strangeness
>>
>>
>> I just did "cluster-fork -Uvh
>> /sourcedir/ganglia-python-3.0.1-2.i386.rpm"
>> and then "cluster-fork service gschedule restart" (not sure I had to
>> do the last).
>> I also put 3.0.1-2 and restarted gschedule on the frontend.
>>
>> Now I run "cluster-fork --mpd w".
>>
>> I currently have a user who ssh'd to compute-0-8 from the frontend and
>> one who ssh'd into compute-0-17 from the front end.
>>
>> But the return shows the users on lines for 17 (for the user on 0-8)
>> and 10 (for the user on 0-17):
>>
>> 17:   1:58pm  up 24 days,  3:20,  1 user,  load average: 0.00, 0.00,
>> 0.03
>> 17: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
>> 17: lance    pts/0    rescluster2.mgh.  1:31pm  40.00s  0.02s  0.02s
>> -bash
>>
>> 10:   1:58pm  up 24 days,  3:21,  1 user,  load average: 0.02, 0.04,
>> 0.07
>> 10: USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
>> 10: dennis   pts/0    rescluster2.mgh.  1:57pm  17.00s  0.02s  0.02s
>> -bash
>>
>> When I do "cluster-fork w" (without the --mpd) the users show up on
>> the correct nodes.
>>
>> Do the numbers on the left of the -mpd output correspond to the node
>> names?
>>
>> Thanks.
>>
>> Dennis
>>
>> Dennis J. Gurgul
>> Partners Health Care System
>> Research Management
>> Research Computing Core
>> 617.724.3169
>>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA
>
Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From bruno at rocksclusters.org Mon Dec 8 15:31:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:31:08 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem: not able to boot up
In-Reply-To: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
References: <BAY3-F9yOi5AgJQlDrR0002a5da@hotmail.com>
Message-ID: <9979F090-29D6-11D8-9715-000A95C4E3B4@rocksclusters.org>

> I have installed Rocks 3.0.0 with the default options successfully;
> there was no trouble. But when I boot it up, it stops at the beginning,
> just showing "GRUB" on the screen and waiting...

when you built the frontend, did you start with the rocks base CD then add
the HPC roll?

 - gb

From bruno at rocksclusters.org Mon Dec 8 15:37:46 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Mon, 8 Dec 2003 15:37:46 -0800
Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0)
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357C7@tardis0.quadrics.com>
Message-ID: <8700A2BE-29D7-11D8-9715-000A95C4E3B4@rocksclusters.org>

> Previously I have been installing a custom kernel on the compute nodes
> with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix
> grub.conf).
>
> However I am now trying to do it the 'proper' way. So I do (on :
> # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm
> /home/install/rocks-dist/7.3/en/os/i386/force/RPMS
> # cd /home/install
> # rocks-dist dist
> # SSH_NO_PASSWD=1 shoot-node compute-0-0
>
> Hence:
> # find /home/install/ | xargs -l grep -nH qsnet
> shows me that hdlist and hdlist2 now contain this RPM. (and indeed if
> I duplicate my rpm in that directory rocks-dist notices this and warns
> me.)
>
> However the node always ends up with "2.4.20-20.7smp" again.
> anaconda-ks.cfg contains just "kernel-smp" and install.log has
> "Installing kernel-smp-2.4.20-20.7."
>
> So my question is:
> It looks like my RPM has a name that Rocks doesn't understand properly.
> What is wrong with my name ?
> and what are the rules for getting the correct name ?
> (.i686.rpm is of course correct, but I don't have -smp. in the name
> Is this the problem ?)

the anaconda installer looks for kernel packages with a specific format:

kernel-<kernel ver>-<redhat ver>.i686.rpm

and for smp nodes:

kernel-smp-<kernel ver>-<redhat ver>.i686.rpm

we have made the necessary patches to files under /usr/src/linux-2.4 in
order to produce redhat-compliant kernels. see:

http://www.rocksclusters.org/rocks-documentation/3.0.0/customization-kernel.html

also, would you be interested in making your changes for the quadrics
interconnect available to the general rocks community?

 - gb

From purikk at hotmail.com Mon Dec 8 20:23:35 2003
From: purikk at hotmail.com (purushotham komaravolu)
Date: Mon, 8 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]AMD Opteron
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
Message-ID: <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>

Hello,

I am a newbie to ROCKS cluster. I wanted to set up clusters on 32-bit
architectures (Intel and AMD) and 64-bit architectures (Intel and AMD). I
found the 64-bit download for Intel on the website but not for AMD. Does it
work for AMD Opteron? If not, what is the ETA for AMD-64? We are planning
to buy AMD-64 bit machines shortly, and I would like to volunteer for the
beta testing if needed.

Thanks,
Regards,
Puru
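The kernel-package naming rule Greg Bruno describes above can be
sanity-checked before running rocks-dist. The helper below is only an
illustrative sketch of that rule (the function name and glob pattern are
mine, not from Rocks or anaconda): it accepts kernel-<ver>.i686.rpm and
kernel-smp-<ver>.i686.rpm filenames and rejects everything else, which is
why a qsnet-RedHat-kernel-* package is silently ignored.

```shell
# Sketch only: test whether a filename matches the kernel-package shape
# anaconda looks for (kernel-<ver>.i686.rpm or kernel-smp-<ver>.i686.rpm).
looks_like_kernel_rpm() {
  case "$1" in
    kernel-smp-[0-9]*.i686.rpm | kernel-[0-9]*.i686.rpm) return 0 ;;
    *) return 1 ;;
  esac
}

looks_like_kernel_rpm "kernel-smp-2.4.20-20.7.i686.rpm"   # accepted
looks_like_kernel_rpm "qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm" \
  || echo "not a name anaconda will treat as the kernel package"
```

Renaming (or rebuilding) the custom kernel so its package name is plain
"kernel" or "kernel-smp", as the linked Rocks documentation describes, is
what actually makes the installer pick it up.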
From mjk at sdsc.edu Tue Dec 9 07:28:51 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 9 Dec 2003 07:28:51 -0800
Subject: [Rocks-Discuss]AMD Opteron
In-Reply-To: <BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
	<BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>
Message-ID: <6413D41A-2A5C-11D8-AECB-000A95DA5638@sdsc.edu>

We have a beta right now that we have sent to a few people. We plan on a
release this month, and AMD_64 will be part of this release along with the
usual x86, IA64 support.

If you want to help accelerate this process please talk to your vendor
about loaning/giving us some hardware for testing. Having access to a
variety of Opteron hardware (we own two boxes) is the only way we can have
good support for this chip.

-mjk

On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:

> Hello,
> I am a newbie to ROCKS cluster. I wanted to setup clusters on
> 32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel
> and AMD).
> I found the 64-bit download for Intel on the website but not for AMD.
> Does it work for AMD opteron? if not what is the ETA for AMD-64.
> We are planning to buy AMD-64 bit machines shortly, and I would like to
> volunteer for the beta testing if needed.
> Thanks
> Regards,
> Puru

From cdmaest at sandia.gov Tue Dec 9 07:48:31 2003
From: cdmaest at sandia.gov (Christopher D. Maestas)
Date: Tue, 09 Dec 2003 08:48:31 -0700
Subject: [Rocks-Discuss]AMD Opteron
In-Reply-To: <6413D41A-2A5C-11D8-AECB-000A95DA5638@sdsc.edu>
References: <200312082001.hB8K1KJ24139@postal.sdsc.edu>
	<BAY1-DAV65Bp80SiEmA00005c14@hotmail.com>
	<6413D41A-2A5C-11D8-AECB-000A95DA5638@sdsc.edu>
Message-ID: <1070984911.19042.12.camel@capdesk.sandia.gov>

What do I have to do to sign up to test? We have Opteron systems we can
test on here.

On Tue, 2003-12-09 at 08:28, Mason J. Katz wrote:

> We have a beta right now that we have sent to a few people. We plan on
> a release this month, and AMD_64 will be part of this release along
> with the usual x86, IA64 support.
>
> If you want to help accelerate this process please talk to your vendor
> about loaning/giving us some hardware for testing. Having access to a
> variety of Opteron hardware (we own two boxes) is the only way we can
> have good support for this chip.
>
> -mjk
>
>
> On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
> > Hello,
> > I am a newbie to ROCKS cluster. I wanted to setup clusters on
> > 32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel
> > and AMD).
> > I found the 64-bit download for Intel on the website but not for AMD.
> > Does it work for AMD opteron? if not what is the ETA for AMD-64.
> > We are planning to buy AMD-64 bit machines shortly, and I would like
> > to volunteer for the beta testing if needed.
> > Thanks
> > Regards,
> > Puru

From vincent_b_fox at yahoo.com Tue Dec 9 11:10:40 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Tue, 9 Dec 2003 11:10:40 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
Message-ID: <20031209191040.71171.qmail@web14811.mail.yahoo.com>

I tried doing a rebuild of the ATLAS libraries on a PII test cluster and no
go. I did an "export PATH=/opt/gcc32/bin:$PATH" first to make it easy on
myself.

The "make rpm" appears to get stuck in a loop on the xconfig part. I pause
it and it seems like the prompt is defining f77=-O and f77 FLAGS=y, which
doesn't work of course. My guess is that the spec file doesn't have an
answer for a previous question, so the /usr/bin/g77 answer is getting set
for the previous prompt, and since no f77 is defined, it gets stuck.

Anyhow, thought I would note this problem on the list for those more
qualified to address it.

__________________________________
Do you Yahoo!?
New Yahoo! Photos - easier uploading and sharing.
http://photos.yahoo.com/

From bryan at UCLAlumni.net Tue Dec 9 12:14:16 2003
From: bryan at UCLAlumni.net (Bryan Littlefield)
Date: Tue, 09 Dec 2003 12:14:16 -0800
Subject: [Rocks-Discuss] AMD Opteron - Contact Appro
In-Reply-To: <200312091531.hB9FV9J12694@postal.sdsc.edu>
References: <200312091531.hB9FV9J12694@postal.sdsc.edu>
Message-ID: <3FD62D18.7010208@UCLAlumni.net>

Hi Mason,

I suggest contacting Appro. We are using Rocks on our Opteron cluster, and
Appro would likely love to help. I will contact them as well to see if they
could help with getting an Opteron machine for testing. Contact info below:

Jian Chang - Regional Sales Manager
(408) 941-8100 x 202
(800) 927-5464 x 202
(408) 941-8111 Fax
jian at appro.com
http://www.appro.com

Thanks,
--Bryan

npaci-rocks-discussion-request at sdsc.edu wrote:

>From: "Mason J. Katz" <mjk at sdsc.edu>
>Subject: Re: [Rocks-Discuss]AMD Opteron
>Date: Tue, 9 Dec 2003 07:28:51 -0800
>To: "purushotham komaravolu" <purikk at hotmail.com>
>Cc: <npaci-rocks-discussion at sdsc.edu>
>
>We have a beta right now that we have sent to a few people. We plan on
>a release this month, and AMD_64 will be part of this release along
>with the usual x86, IA64 support.
>
>If you want to help accelerate this process please talk to your vendor
>about loaning/giving us some hardware for testing. Having access to a
>variety of Opteron hardware (we own two boxes) is the only way we can
>have good support for this chip.
>
> -mjk
>
>
>On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>
>>Hello,
>> I am a newbie to ROCKS cluster. I wanted to setup clusters on
>>32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel and
>>AMD).
>>I found the 64-bit download for Intel on the website but not for AMD.
>>Does it work for AMD opteron? if not what is the ETA for AMD-64.
>>We are planning to buy AMD-64 bit machines shortly, and I would like to
>>volunteer for the beta testing if needed.
>>Thanks
>>Regards,
>>Puru
>>
>
>_______________________________________________
>npaci-rocks-discussion mailing list
>npaci-rocks-discussion at sdsc.edu
>http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
>
>
>End of npaci-rocks-discussion Digest
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031209/611e65b4/attachment-0001.html

From vincent_b_fox at yahoo.com Tue Dec 9 13:22:59 2003
From: vincent_b_fox at yahoo.com (Vincent Fox)
Date: Tue, 9 Dec 2003 13:22:59 -0800 (PST)
Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform
Message-ID: <20031209212259.39587.qmail@web14810.mail.yahoo.com>

Okay, I came up with my own quick hack: edit atlas.spec.in, go to the
"other x86" section, and remove the 2 lines right above "linux"; it seems
to make the rpm now. A more formal patch would be to put in a section for
cpuid eq 4 with this correction, I suppose.

From landman at scalableinformatics.com Tue Dec 9 13:49:06 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 09 Dec 2003 16:49:06 -0500
Subject: [Rocks-Discuss]Has anyone tried Gaussian binary only on the ROCKS 3.1.0 beta?
Message-ID: <1071006546.18100.46.camel@squash.scalableinformatics.com>

Hi Folks

Working on building the same cluster from last week. The admin nodes are up
and functional (plain old RH9+XFS). I want to get the head nodes up, with
one of the requirements being running the Gaussian binary-only code.
Gaussian's page lists RH9.0 support, so I wanted to see if someone has
tried the beta with this code.

Thanks.
    Joe -- Joseph Landman, Ph.D ScalableInformatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From landman at scalableinformatics.com Tue Dec 9 13:59:37 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 09 Dec 2003 16:59:37 -0500 Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ... Message-ID: <1071007177.18100.58.camel@squash.scalableinformatics.com> Folks: As indicated previously, I am wrestling with a Supermicro based cluster. None of the RH distributions come with the correct E1000 driver, so a new kernel is needed (in the boot CD, and for installation). The problem I am running into is that it isn't at all obvious/easy how to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable this thing to work. Following the examples in the documentation have not met with success. Running "rocks-dist cdrom" with the new kernels (2.4.23 works nicely on the nodes) in the force/RPMS directory generates a bootable CD with the original 2.4.18BOOT kernel. What I (and I think others) need, is a simple/easy to follow method that will generate a bootable CD with the correct linux kernel, and the correct modules. Is this in process somewhere? What would be tremendously helpful is if we can generate a binary module, and put that into the boot process by placing it into the force/modules/binary directory (assuming one exists) with the appropriate entry of a similar name in the force/modules/meta directory as a simple XML document giving pci-ids, description, name, etc. Anything close to this coming? Modules are killing future ROCKS installs, the inability to easily inject a new module in there has created a problem whereby ROCKS does not function (as the underlying RH does not function). -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615
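As background to the driver-mapping problem Joe describes, the installer decides which module to load for a card by looking up the device's PCI vendor/device IDs in a pcitable-style file. The sketch below parses that format; it is illustrative only, not anaconda's actual code, and the second entry's description is an assumption (based on the 8086:1013 card discussed later in this digest):

```python
# Illustrative parser for Red Hat pcitable-style lines, which map a PCI
# vendor/device ID pair to a kernel driver module name. The two entries
# below are examples (the 82545EM line appears verbatim elsewhere in this
# digest; the 82541EI description is an assumption), not a complete table.
PCITABLE = '''\
0x8086\t0x100f\t"e1000"\t"Intel Corp. 82545EM Gigabit Ethernet Controller rev (01)"
0x8086\t0x1013\t"e1000"\t"Intel Corp. 82541EI Gigabit Ethernet Controller"
'''

def parse_pcitable(text):
    """Return a {(vendor, device): module} dict from pcitable text."""
    table = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        vendor, device, module = parts[0], parts[1], parts[2].strip('"')
        table[(vendor.lower(), device.lower())] = module
    return table

def module_for(table, vendor, device):
    """Look up the driver module for a PCI id, or None if unlisted."""
    return table.get((vendor.lower(), device.lower()))

table = parse_pcitable(PCITABLE)
print(module_for(table, "0x8086", "0x1013"))  # -> e1000
```

A device missing from the table (like the new cards reported in this thread) simply gets no module, which is why injecting both the module binary and its table entry matters.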
From tim.carlson at pnl.gov Tue Dec 9 14:11:43 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 09 Dec 2003 14:11:43 -0800 (PST)
Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ...
In-Reply-To: <1071007177.18100.58.camel@squash.scalableinformatics.com>
Message-ID: <Pine.GSO.4.44.0312091406080.17458-100000@paradox.emsl.pnl.gov>

On Tue, 9 Dec 2003, Joe Landman wrote:

> The problem I am running into is that it isn't at all obvious/easy how
> to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable
> this thing to work. Following the examples in the documentation have
> not met with success. Running "rocks-dist cdrom" with the new kernels
> (2.4.23 works nicely on the nodes) in the force/RPMS directory generates
> a bootable CD with the original 2.4.18BOOT kernel.

So you built a 2.4.23BOOT rpm? The problem people have is with the naming
convention of kernels. A kernel.org spec file isn't going to generate
proper kernel rpms IMHO. What you really want to do (and maybe you are
already doing this) is steal the bit of the Redhat spec building scripts
that generate the -smp .i686 and BOOT rpms.

New hardware is tough for any distro.

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From tmartin at physics.ucsd.edu Tue Dec 9 15:57:17 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Tue, 09 Dec 2003 15:57:17 -0800
Subject: [Rocks-Discuss]Intel MT based Gigabit controllers
Message-ID: <3FD6615D.8090200@physics.ucsd.edu>

Does Rocks 3.0 support the Intel MT based Gigabit controllers (PCI
8086:1013) without any modifications? My new cluster has these new
controllers. Rocks 2.3.1 does not seem to detect/drive these cards
correctly (the install fails to detect them and the e1000 driver does not
seem to work).
So I was going to go ahead and move my new head node to 3.0.0 and was wondering if I am going to have to do additional work to get the intel drivers on the boot image (for cluster nodes) to have the working Intel driver with these cards. Terrence From tmartin at physics.ucsd.edu Tue Dec 9 15:59:29 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Tue, 09 Dec 2003 15:59:29 -0800 Subject: [Rocks-Discuss]how to include custom driver In-Reply-To: <Pine.GSO.4.44.0306092142150.18083-100000@poincare.emsl.pnl.gov>
    References: <Pine.GSO.4.44.0306092142150.18083-100000@poincare.emsl.pnl.gov> Message-ID: <3FD661E1.90307@physics.ucsd.edu> TimCarlson wrote: > On Mon, 9 Jun 2003, Greg Bruno wrote: > > >>what driver did you have to add? >> >>we may be able to provide a patch for your compute nodes. > > > Ah!!!.. I didn't see this repsonse before I sent off my reply to Matthew. > Can I please have the aic79xx driver and while your at it can I get a > module-info file that has this entry for gigabit? Not sure if it is > already in there? ;) > > 0x8086 0x100f "e1000" "Intel Corp. 82545EM Gigabit Ethernet Controller rev (01)" > > It is also quite possible that I burned the 2.3.0 media instead of > 2.3.2. It was late in the day when I tried to do my install. > > Tim > > Tim Carlson > Voice: (509) 376 3423 > Email: Tim.Carlson at pnl.gov > EMSL UNIX System Support I would also like to request that this driver/change be made. I have a cluster with these newer Intel gigabit chipsets. Terrence From tmartin at physics.ucsd.edu Tue Dec 9 16:33:18 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Tue, 09 Dec 2003 16:33:18 -0800 Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ... In-Reply-To: <Pine.GSO.4.44.0312091406080.17458-100000@paradox.emsl.pnl.gov> References: <Pine.GSO.4.44.0312091406080.17458-100000@paradox.emsl.pnl.gov> Message-ID: <3FD669CE.1070700@physics.ucsd.edu> Tim Carlson wrote: > On Tue, 9 Dec 2003, Joe Landman wrote: > > >> The problem I am running into is that it isn't at all obvious/easy how >>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable >>this thing to work. Following the examples in the documentation have >>not met with success. Running "rocks-dist cdrom" with the new kernels >>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates >>a bootable CD with the original 2.4.18BOOT kernel. > > > So you built a 2.4.23BOOT rpm? The problem people have is with the naming
    > convention of kernels. A kernel.org spec file isn't going to generate > proper kernel rpms IMHO. What you really want to do (and maybe you are > already doing this) is steal the bit of the Redhat spec building scripts > that generage the -smp .i686 and BOOT rpms. > > New hardware is tough for any distro. > > Tim > > Tim Carlson > Voice: (509) 376 3423 > Email: Tim.Carlson at pnl.gov > EMSL UNIX System Support > Where do you start if you want to update the PXE boot image to support a new kernel? Terrence From tmartin at physics.ucsd.edu Tue Dec 9 16:58:08 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Tue, 09 Dec 2003 16:58:08 -0800 Subject: [Rocks-Discuss]Could not allocate requested partitions Message-ID: <3FD66FA0.5070401@physics.ucsd.edu> I am getting the following error when trying to install a Rocks 3.0.0 headnode. The headnode works find in rocks 2.3.2. Could not allocate requested partitions: Partitioning failed: Could not allocate partitions as primary partitions What is also odd is when I alt-f2 and run fdisk /dev/hda it tells me it cannot find that device (unable to open /dev/hda). However when I watch the boot messages hda definitely comes up. Also the headnode works fine with 2.3.2. Any ideas? Terrence From tmartin at physics.ucsd.edu Tue Dec 9 17:33:24 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Tue, 09 Dec 2003 17:33:24 -0800 Subject: [Rocks-Discuss]Could not allocate requested partitions In-Reply-To: <3FD66FA0.5070401@physics.ucsd.edu> References: <3FD66FA0.5070401@physics.ucsd.edu> Message-ID: <3FD677E4.8050806@physics.ucsd.edu> Terrence Martin wrote: > I am getting the following error when trying to install a Rocks 3.0.0 > headnode. The headnode works find in rocks 2.3.2. >
    > Could not allocate requested partitions: Partitioning failed: Could not > allocate partitions as primary partitions > > What is also odd is when I alt-f2 and run fdisk /dev/hda it tells me it > cannot find that device (unable to open /dev/hda). However when I watch > the boot messages hda definitely comes up. Also the headnode works fine > with 2.3.2. > > Any ideas? > > Terrence > > > Figured it out, aparently rocks 3.0.0 did not like my partitions from rocks 2.3.2. I booted knoppix, blew away the partition table and so far so good on the head node. Terrence From mjk at sdsc.edu Tue Dec 9 17:54:01 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 9 Dec 2003 17:54:01 -0800 Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ... In-Reply-To: <1071007177.18100.58.camel@squash.scalableinformatics.com> References: <1071007177.18100.58.camel@squash.scalableinformatics.com> Message-ID: <BA0ADEC6-2AB3-11D8-981C-000A95DA5638@sdsc.edu> If the underlying RedHat doesn't support your hardware you are pretty much dead in the water. We do at times include drivers that RH does not but this is an exception and only for hardware we physically have access to. The rocks-boot (rocks/src/rock/boot in CVS) package controls the boot kernel and module selection. You can look into this to see what it would take to add your own module. We do plan on refining and documenting this not for several months. We also have some very good idea on how we can track this faster than RH, but again nothing coming in the next few months. To continue my earlier rant for today, until more hardware vendors start taking the linux market place seriously buying bleeding edge hardware and CPUs is asking for problems. It takes several months for any new hardware to become supported by RedHat and several years for any new CPU to be supported well. This isn't killing future Rocks installs, it's just correctly delaying them until the underlying OS supports the hardware. 
-mjk On Dec 9, 2003, at 1:59 PM, Joe Landman wrote: > Folks: > > As indicated previously, I am wrestling with a Supermicro based > cluster. None of the RH distributions come with the correct E1000 > driver, so a new kernel is needed (in the boot CD, and for
    > installation). > > The problem I am running into is that it isn't at all obvious/easy > how > to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable > this thing to work. Following the examples in the documentation have > not met with success. Running "rocks-dist cdrom" with the new kernels > (2.4.23 works nicely on the nodes) in the force/RPMS directory > generates > a bootable CD with the original 2.4.18BOOT kernel. > > What I (and I think others) need, is a simple/easy to follow method > that will generate a bootable CD with the correct linux kernel, and the > correct modules. > > Is this in process somewhere? What would be tremendously helpful is > if we can generate a binary module, and put that into the boot process > by placing it into the force/modules/binary directory (assuming one > exists) with the appropriate entry of a similar name in the > force/modules/meta directory as a simple XML document giving pci-ids, > description, name, etc. > > Anything close to this coming? Modules are killing future ROCKS > installs, the inability to easily inject a new module in there has > created a problem whereby ROCKS does not function (as the underlying RH > does not function). > > > > -- > Joseph Landman, Ph.D > Scalable Informatics LLC, > email: landman at scalableinformatics.com > web : http://scalableinformatics.com > phone: +1 734 612 4615 From gotero at linuxprophet.com Tue Dec 9 18:02:23 2003 From: gotero at linuxprophet.com (gotero at linuxprophet.com) Date: Tue, 09 Dec 2003 18:02:23 -0800 (PST) Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0) Message-ID: <20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net> Daniel- I recently had the same problem when building a quadrics cluster on Rocks 2.3.2 with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. 
The problem is definitely in the naming of the rpms, in that anaconda
running on the compute nodes is not going to recognize kernel rpms that
begin with 'qsnet' as potential boot options. Unfortunately, being under a
severe time constraint, I resorted to manually installing the qsnet kernel
on all nodes of the cluster, which isn't the Rocks way. The long-term
solution is to mangle the kernel makefiles so that the qsnet kernel rpms
have conventional kernel rpm names, which is what Greg's post referred to.

Glen
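The naming issue described here can be sketched as a quick filename check: anaconda only considers packages whose RPM *name* field is an actual kernel name, so `qsnet-RedHat-kernel-...` is never a candidate. This is a rough heuristic, not anaconda's real selection logic, and the list of accepted package names is an assumption:

```python
# Heuristic sketch of why a renamed kernel rpm is ignored at install time.
# RPM filenames follow NAME-VERSION-RELEASE.ARCH.rpm; anaconda picks
# kernels by NAME. The RECOGNIZED list below is an assumption, not the
# exact list anaconda uses.
RECOGNIZED = ("kernel", "kernel-smp", "kernel-BOOT")

def rpm_name(filename):
    """Extract NAME from 'NAME-VERSION-RELEASE.ARCH.rpm',
    e.g. 'kernel-smp-2.4.20-20.7.i686.rpm' -> 'kernel-smp'."""
    base = filename.rsplit(".rpm", 1)[0]
    base = base.rsplit(".", 1)[0]        # drop arch (i686, athlon, ...)
    return base.rsplit("-", 2)[0]        # drop VERSION and RELEASE

def is_bootable_kernel(filename):
    return rpm_name(filename) in RECOGNIZED

print(is_bootable_kernel("kernel-smp-2.4.20-20.7.i686.rpm"))
# -> True
print(is_bootable_kernel("qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm"))
# -> False
```

This is why the fix discussed in the thread is to rebuild the vendor kernel under a conventional `kernel*` name rather than to keep dropping the renamed rpm into force/RPMS.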
    On Mon, 8Dec 2003 17:54:53 -0000, daniel.kidger at quadrics.com wrote: > > Dear all, > Previously I have been installing a custom kernel on the compute nodes > with an "extend-compute.xml" and an "/etc/init.d/qsconfigure" (to fix grub.conf). > > However I am now trying to do it the 'proper' way. So I do (on : > # cp qsnet-RedHat-kernel-2.4.18-27.3.10qsnet.i686.rpm > /home/install/rocks-dist/7.3/en/os/i386/force/RPMS > # cd /home/install > # rocks-dist dist > # SSH_NO_PASSWD=1 shoot-node compute-0-0 > > Hence: > # find /home/install/ |xargs -l grep -nH qsnet > shows me that hdlist and hdlist2 now contain this RPM. (and indeed If I > duplicate my rpm in that directory rocks-dist notices this and warns me.) > > However the node always ends up with "2.4.20-20.7smp" again. > anaconda-ks.cfg contains just "kernel-smp" and install.log has "Installing > kernel-smp-2.4.20-20.7." > > So my question is: > It looks like my RPM has a name that Rocks doesn't understand properly. > What is wrong with my name ? > and what are the rules for getting the correct name ? > (.i686.rpm is of course correct, but I don't have -smp. in the name Is this > the problem ?) > > cf. Greg Bruno's wisdom: > https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-April/001770.html > > > Yours, > Daniel. > > -------------------------------------------------------------- > Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com > One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 > ----------------------- www.quadrics.com -------------------- > > > Glen Otero, Ph.D. Linux Prophet From gotero at linuxprophet.com Tue Dec 9 18:05:04 2003 From: gotero at linuxprophet.com (gotero at linuxprophet.com) Date: Tue, 09 Dec 2003 18:05:04 -0800 (PST) Subject: [Rocks-Discuss]Could not allocate requested partitions Message-ID: <20031209180504.716.h014.c001.wm@mail.linuxprophet.com.criticalpath.net>
    On Tue, 09Dec 2003 17:33:24 -0800, Terrence Martin wrote: > > Terrence Martin wrote: > > I am getting the following error when trying to install a Rocks 3.0.0 > > headnode. The headnode works find in rocks 2.3.2. > > > > Could not allocate requested partitions: Partitioning failed: Could not > > allocate partitions as primary partitions > > > > What is also odd is when I alt-f2 and run fdisk /dev/hda it tells me it > > cannot find that device (unable to open /dev/hda). However when I watch > > the boot messages hda definitely comes up. Also the headnode works fine > > with 2.3.2. > > > > Any ideas? > > > > Terrence > > > > > > > > Figured it out, aparently rocks 3.0.0 did not like my partitions from > rocks 2.3.2. I booted knoppix, blew away the partition table and so far > so good on the head node. I had the same problem with moving from 2.3.2 to 3.1. I'll try your solution. Glen > > Terrence Glen Otero, Ph.D. Linux Prophet From jorge at phys.ufl.edu Tue Dec 9 18:55:02 2003 From: jorge at phys.ufl.edu (Jorge L. Rodriguez) Date: Tue, 09 Dec 2003 21:55:02 -0500 Subject: [Rocks-Discuss]Adding partitions that are not reformatted under hard boots or shoot-node Message-ID: <3FD68B06.9010709@phys.ufl.edu> Hi, How do I add an extra partition to my compute nodes and retain the data on all non / partitions when system hard boots or is shot? I tried the suggestion in the documentation under "Customizing your ROCKS Installation" where you replace the auto-partition.xml but hard boots or shoot-nodes on these reformat all partitions instead of just the /. I have also tried to modify the installclass.xml so that an extra partition is added into the python code see below. This does mostly what I want but now I can't shoot-node even though a hard boot reinstalls without reformatting all but /. Is this the right approach? I'd rather avoid having to replace installclass since I don't really want to partition all nodes this way but if I must I will. Jorge
    # # set up the root partition # args = [ "/" , "--size" , "4096", "--fstype", "&fstype;", "--ondisk", devnames[0] ] KickstartBase.definePartition(self, id, args) # ---- Jorge, I added this args args = [ "/state/partition1" , "--size" , "55000", "--fstype", "&fstype;", "--ondisk", devnames[0] ] KickstartBase.definePartition(self, id, args) # ----- args = [ "swap" , "--size" , "1000", "--ondisk", devnames[0] ] KickstartBase.definePartition(self, id, args) # # greedy partitioning # # ----- Jorge, I change this from i = 1 i = 2 # ----- for devname in devnames: partname = "/state/partition%d" % (i) args = [ partname, "--size", "1", "--fstype", "&fstype;", "--grow", "--ondisk", devname ] KickstartBase.definePartition(self, id, args) i = i + 1 From bruno at rocksclusters.org Tue Dec 9 22:43:04 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Tue, 9 Dec 2003 22:43:04 -0800 Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform In-Reply-To: <20031209212259.39587.qmail@web14810.mail.yahoo.com> References: <20031209212259.39587.qmail@web14810.mail.yahoo.com> Message-ID: <1B097BEE-2ADC-11D8-9715-000A95C4E3B4@rocksclusters.org> > Okay, came up my own quick hack: > > Edit atlas.spec.in, go to "other x86" section, remove > 2 lines right above "linux", seems to make rpm now. > > A more formal patch would be put in a section for > cpuid eq 4 with this correction I suppose. if you provide the patch, we'll include it in our next release. - gb
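Jorge's installclass snippet above can be dry-run outside of anaconda to inspect the layout it produces. In the sketch below, the KickstartBase.definePartition() calls are replaced by collecting the argument lists, and the fstype value is a stand-in for the &fstype; entity Rocks substitutes at kickstart-generation time:

```python
# Stand-alone dry run of Jorge's modified partition scheme. The real code
# calls KickstartBase.definePartition(); here we just collect the argument
# lists so the resulting layout can be inspected. "ext3" stands in for the
# &fstype; entity, and devnames for the installclass's disk list.
def build_partitions(devnames, fstype="ext3"):
    parts = []
    # root partition
    parts.append(["/", "--size", "4096", "--fstype", fstype,
                  "--ondisk", devnames[0]])
    # Jorge's extra fixed-size data partition
    parts.append(["/state/partition1", "--size", "55000", "--fstype",
                  fstype, "--ondisk", devnames[0]])
    parts.append(["swap", "--size", "1000", "--ondisk", devnames[0]])
    # greedy partitioning: one grow partition per disk, numbered from 2
    # because partition1 is now taken
    i = 2
    for devname in devnames:
        parts.append(["/state/partition%d" % i, "--size", "1",
                      "--fstype", fstype, "--grow", "--ondisk", devname])
        i += 1
    return parts

for p in build_partitions(["hda", "hdb"]):
    print(p[0])
```

On a two-disk node this yields /, /state/partition1, swap, and grow partitions /state/partition2 and /state/partition3, matching the numbering change (i = 1 to i = 2) in the quoted code.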
    From tlw atcs.unm.edu Tue Dec 9 23:23:43 2003 From: tlw at cs.unm.edu (Tiffani Williams) Date: Wed, 10 Dec 2003 00:23:43 -0700 Subject: [Rocks-Discuss]PBS errors Message-ID: <3FD6C9FF.60603@cs.unm.edu> Hello, I am trying to submit a job through PBS, but I receive 2 errors. The first error is Job cannot be executed See job standard error file The second error is that the standard error file cannot be written into my home directory. I downloaded the sample script at http://rocks.npaci.edu/papers/rocks-documentation/launching-batch-jobs.html and have tried a more simple script with PBS directives and echo commands. I do not know what I am doing wrong? I have used PBS successfully on other clusters. Does anyone have any suggestions? Tiffani From bruno at rocksclusters.org Tue Dec 9 23:35:59 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Tue, 9 Dec 2003 23:35:59 -0800 Subject: [Rocks-Discuss]PBS errors In-Reply-To: <3FD6C9FF.60603@cs.unm.edu> References: <3FD6C9FF.60603@cs.unm.edu> Message-ID: <7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org> > I am trying to submit a job through PBS, but I receive 2 errors. The > first error is > Job cannot be executed > See job standard error file > > The second error is that the standard error file cannot be written > into my home directory. > I downloaded the sample script at > > http://rocks.npaci.edu/papers/rocks-documentation/launching-batch- > jobs.html > and have tried a more simple script with PBS directives and echo > commands. >
    > I donot know what I am doing wrong? I have used PBS successfully on > other clusters. > > Does anyone have any suggestions? can you login to the compute nodes successfully? if not, try restarting autofs on all the compute nodes. on the frontend, execute: # ssh-agent $SHELL # ssh-add # cluster-fork "/etc/rc.d/init.d/autofs restart" we've found the startup of autofs to be flaky at times. - gb From tlw at cs.unm.edu Wed Dec 10 00:03:13 2003 From: tlw at cs.unm.edu (Tiffani Williams) Date: Wed, 10 Dec 2003 01:03:13 -0700 Subject: [Rocks-Discuss]PBS errors In-Reply-To: <7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org> References: <3FD6C9FF.60603@cs.unm.edu> <7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org> Message-ID: <3FD6D341.5070501@cs.unm.edu> >> I am trying to submit a job through PBS, but I receive 2 errors. >> The first error is >> Job cannot be executed >> See job standard error file >> >> The second error is that the standard error file cannot be written >> into my home directory. >> I downloaded the sample script at >> >> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch- >> jobs.html >> and have tried a more simple script with PBS directives and echo >> commands. >> >> I do not know what I am doing wrong? I have used PBS successfully >> on other clusters. >> >> Does anyone have any suggestions? > > > can you login to the compute nodes successfully? > > if not, try restarting autofs on all the compute nodes. on the > frontend, execute: > > # ssh-agent $SHELL > # ssh-add > > # cluster-fork "/etc/rc.d/init.d/autofs restart"
    > > we've foundthe startup of autofs to be flaky at times. > > - gb Do these commands have to be run by an administrator? If so, I do not have such privileges. I can ssh to the compute nodes, but I am denied entry. Am I supposed to be able to login to a compute node as a user. Tiffani From bruno at rocksclusters.org Wed Dec 10 06:37:05 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 10 Dec 2003 06:37:05 -0800 Subject: [Rocks-Discuss]PBS errors In-Reply-To: <3FD6D341.5070501@cs.unm.edu> References: <3FD6C9FF.60603@cs.unm.edu> <7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org> <3FD6D341.5070501@cs.unm.edu> Message-ID: <53451392-2B1E-11D8-9715-000A95C4E3B4@rocksclusters.org> On Dec 10, 2003, at 12:03 AM, Tiffani Williams wrote: > >>> I am trying to submit a job through PBS, but I receive 2 errors. >>> The first error is >>> Job cannot be executed >>> See job standard error file >>> >>> The second error is that the standard error file cannot be written >>> into my home directory. >>> I downloaded the sample script at >>> >>> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch- >>> jobs.html >>> and have tried a more simple script with PBS directives and echo >>> commands. >>> >>> I do not know what I am doing wrong? I have used PBS successfully >>> on other clusters. >>> >>> Does anyone have any suggestions? >> >> >> can you login to the compute nodes successfully? >> >> if not, try restarting autofs on all the compute nodes. on the >> frontend, execute: >> >> # ssh-agent $SHELL >> # ssh-add >> >> # cluster-fork "/etc/rc.d/init.d/autofs restart" >> >> we've found the startup of autofs to be flaky at times. >>
    >> - gb > > >Do these commands have to be run by an administrator? If so, I do not > have such privileges. I can ssh to the compute nodes, but I am denied > entry. Am I supposed to be able to login to a compute node as a user. yes, you need to be 'root'. it appears your home directory is not being mounted when you login -- have your administrator run the commands above. - gb From mjk at sdsc.edu Wed Dec 10 07:20:47 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Wed, 10 Dec 2003 07:20:47 -0800 Subject: [Rocks-Discuss]PBS errors In-Reply-To: <53451392-2B1E-11D8-9715-000A95C4E3B4@rocksclusters.org> References: <3FD6C9FF.60603@cs.unm.edu> <7F75D3D2-2AE3-11D8-9715-000A95C4E3B4@rocksclusters.org> <3FD6D341.5070501@cs.unm.edu> <53451392-2B1E-11D8-9715-000A95C4E3B4@rocksclusters.org> Message-ID: <6E659550-2B24-11D8-981C-000A95DA5638@sdsc.edu> This is most likely the dreaded NIS-crash. You'll need to restart the ypserver on the frontend and the ypbind daemon on all the nodes. We've seen this on our clusters maybe 4 times (on production systems) in the last several years. Others have seen this on a weekly basis. This is why NIS is dead in Rocks 3.1 - it served us reasonably well but never matured to a stable system. -mjk On Dec 10, 2003, at 6:37 AM, Greg Bruno wrote: > > On Dec 10, 2003, at 12:03 AM, Tiffani Williams wrote: > >> >>>> I am trying to submit a job through PBS, but I receive 2 errors. >>>> The first error is >>>> Job cannot be executed >>>> See job standard error file >>>> >>>> The second error is that the standard error file cannot be written >>>> into my home directory. >>>> I downloaded the sample script at >>>> >>>> http://rocks.npaci.edu/papers/rocks-documentation/launching-batch- >>>> jobs.html >>>> and have tried a more simple script with PBS directives and echo >>>> commands. >>>> >>>> I do not know what I am doing wrong? I have used PBS successfully >>>> on other clusters. >>>>
    >>>> Does anyonehave any suggestions? >>> >>> >>> can you login to the compute nodes successfully? >>> >>> if not, try restarting autofs on all the compute nodes. on the >>> frontend, execute: >>> >>> # ssh-agent $SHELL >>> # ssh-add >>> >>> # cluster-fork "/etc/rc.d/init.d/autofs restart" >>> >>> we've found the startup of autofs to be flaky at times. >>> >>> - gb >> >> >> Do these commands have to be run by an administrator? If so, I do not >> have such privileges. I can ssh to the compute nodes, but I am >> denied entry. Am I supposed to be able to login to a compute node as >> a user. > > yes, you need to be 'root'. > > it appears your home directory is not being mounted when you login -- > have your administrator run the commands above. > > - gb From vincent_b_fox at yahoo.com Wed Dec 10 07:59:14 2003 From: vincent_b_fox at yahoo.com (Vincent Fox) Date: Wed, 10 Dec 2003 07:59:14 -0800 (PST) Subject: [Rocks-Discuss]one node short in "labels" Message-ID: <20031210155914.55789.qmail@web14812.mail.yahoo.com> So I go to the "labels" selection on the web page to print out the pretty labels. What a nice idea by the way! EXCEPT....it's one node short! I go up to 0-13 and this stops at 0-12. Any ideas where I should check to fix this? --------------------------------- Do you Yahoo!? New Yahoo! Photos - easier uploading and sharing -------------- next part -------------- An HTML attachment was scrubbed... URL: https://lists.sdsc.edu/pipermail/npaci-rocks- discussion/attachments/20031210/c5bf5e79/attachment-0001.html From cdwan at mail.ahc.umn.edu Wed Dec 10 12:04:53 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST) Subject: [Rocks-Discuss]Non-homogenous legacy hardware Message-ID: <Pine.GSO.4.58.0312101359380.22@lenti.med.umn.edu>
I am integrating legacy systems into a ROCKS cluster, and have hit a snag
with the auto-partition configuration: The new (old) systems have SCSI
disks, while the old (new) ones contain IDE. This is a non-issue so long as
the initial install does its default partitioning. However, I have a
"replace-auto-partition.xml" file which is unworkable for the SCSI-based
systems since it makes specific reference to "hda" rather than "sda."

I would like to have a site-nodes/replace-auto-partition.xml file with a
conditional such that "hda" or "sda" is used, based on the name of the node
(or some other criterion). Is this possible?

Thanks, in advance. If this is out there on the mailing list archives, a
pointer would be greatly appreciated.

-Chris Dwan
 The University of Minnesota

From tmartin at physics.ucsd.edu Wed Dec 10 12:09:11 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Wed, 10 Dec 2003 12:09:11 -0800
Subject: [Rocks-Discuss]Error during Make when building a new install floppy
Message-ID: <3FD77D67.7000708@physics.ucsd.edu>

I get the following error when I try to rebuild a boot floppy for rocks.
This is with the default CVS checkout with an update today according to the
rocks userguide. I have not actually attempted to make any changes.

make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader'
make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3'
strip -o loader anaconda-7.3/loader/loader
strip: anaconda-7.3/loader/loader: No such file or directory
make[1]: *** [loader] Error 1
make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader'
make: *** [loader] Error 2

Of course I could avoid all of this altogether and just put my binary
module into the appropriate location in the boot image.

Would it be correct to modify the following image file with my changes and
then write it to a floppy via dd?
/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img

Basically I am injecting an updated e1000 driver with changes to pcitable
to support the address of my gigabit cards.

Terrence
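As background for the dd step Terrence asks about, writing an image to a floppy is a plain block-for-block copy. The Python sketch below mirrors `dd if=bootnet.img of=/dev/fd0 bs=1440k`; the paths are examples, and copying the image correctly says nothing about whether modifying bootnet.img alone is sufficient:

```python
# Block-for-block copy between two open file objects, equivalent to:
#   dd if=bootnet.img of=/dev/fd0 bs=1440k
# The device path is an example; run against a real floppy device only
# as root. 1474560 bytes = 1440 KiB, the capacity of a 3.5" floppy.
def dd_copy(src, dst, block_size=1474560):
    """Copy src to dst in fixed-size blocks; return total bytes copied."""
    copied = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)
        copied += len(block)
    return copied

# Usage (illustrative):
# with open("bootnet.img", "rb") as s, open("/dev/fd0", "wb") as d:
#     dd_copy(s, d)
```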
From tim.carlson at pnl.gov Wed Dec 10 12:40:41 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST)
Subject: [Rocks-Discuss]Error during Make when building a new install floppy
In-Reply-To: <3FD77D67.7000708@physics.ucsd.edu>
Message-ID: <Pine.LNX.4.44.0312101235310.20272-100000@scorpion.emsl.pnl.gov>

On Wed, 10 Dec 2003, Terrence Martin wrote:

> I get the following error when I try to rebuild a boot floppy for rocks.

You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at
least it wasn't the last time I checked.

> Of course I could avoid all of this together and just put my binary
> module into the appropriate location in the boot image.
>
> Would it be correct to modify the following image file with my changes
> and then write it to a floppy via dd?
>
> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
>
> Basically I am injecting an updated e1000 driver with changes to
> pcitable to support the address of my gigabit cards.

Modifying the bootnet.img is about 1/3 of what you need to do if you go
down that path. You also need to work on netstg1.img and you'll need to
update the driver in the kernel rpm that gets installed on the box. None of
this is trivial.
If it were me, I would go down the same path I took for updating the AIC79XX driver https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support From tim.carlson at pnl.gov Wed Dec 10 12:52:38 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST) Subject: [Rocks-Discuss]Non-homogenous legacy hardware In-Reply-To: <Pine.GSO.4.58.0312101359380.22@lenti.med.umn.edu> Message-ID: <Pine.LNX.4.44.0312101249400.20272-100000@scorpion.emsl.pnl.gov> On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote: > > I am integrating legacy systems into a ROCKS cluster, and have hit a > snag with the auto-partition configuration: The new (old) systems have > SCSI disks, while old (new) ones contain IDE. This is a non-issue so
    > long as the initial install does its default partitioning. However, I > have a "replace-auto-partition.xml" file which is unworkable for the SCSI > based systems since it makes specific reference to "hda" rather than > "sda." If you have just a single drive, then you should be able to skip the "--ondisk" bits of your "part" command Otherwise, you would have first to do something ugly like the following: http://penguin.epfl.ch/slides/kickstart/ks.cfg You could probably (maybe) wrap most of that in an <eval sh="bash"> </eval> block in the <main> block. Just guessing.. haven't tried this. Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support From agrajag at dragaera.net Wed Dec 10 10:21:07 2003 From: agrajag at dragaera.net (Jag) Date: Wed, 10 Dec 2003 13:21:07 -0500 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia Message-ID: <1071080467.4693.6.camel@pel> I noticed a previous post on this list (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the nodes over ganglia. Can anyone enlighten me as to how this is done? I looked through the ganglia docs and didn't see anything indicating how to do this, so I'm assuming Rocks made some changes. Unfortunately the rocks iso images don't seem to contain srpms, so I'm now coming here. What did Rocks do to ganglia to make the distribution of ssh keys work? Also, does anyone know where Rocks SRPMs can be found? I've done quite a bit of searching, but haven't found them anywhere. From mjk at sdsc.edu Wed Dec 10 14:39:15 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Wed, 10 Dec 2003 14:39:15 -0800 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia In-Reply-To: <1071080467.4693.6.camel@pel> References: <1071080467.4693.6.camel@pel> Message-ID: <AF006859-2B61-11D8-981C-000A95DA5638@sdsc.edu> Most of the SRPMS are on our FTP site, but we've screwed this up
before. The SRPMS are entirely Rocks specific so they are of little value outside of Rocks. You can also check out our CVS tree (cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We have a ganglia-python package we created to allow us to write our own metrics at a higher level than the provided gmetric application. We've also moved from this method to a single cluster-wide ssh key for Rocks 3.1. -mjk On Dec 10, 2003, at 10:21 AM, Jag wrote: > I noticed a previous post on this list > (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/ > 001934.html) indicating that Rocks distributes ssh keys for all the > nodes over > ganglia. Can anyone enlighten me as to how this is done? > > I looked through the ganglia docs and didn't see anything indicating > how > to do this, so I'm assuming Rocks made some changes. Unfortunately the > rocks iso images don't seem to contain srpms, so I'm now coming here. > What did Rocks do to ganglia to make the distribution of ssh keys work? > > Also, does anyone know where Rocks SRPMs can be found? I've done quite > a bit of searching, but haven't found them anywhere. From vrowley at ucsd.edu Wed Dec 10 14:43:49 2003 From: vrowley at ucsd.edu (V. Rowley) Date: Wed, 10 Dec 2003 14:43:49 -0800 Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro Message-ID: <3FD7A1A5.2030805@ucsd.edu> When I run this: [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom on a server installed with ROCKS 3.0.0, I eventually get this: > Cleaning distribution > Resolving versions (RPMs) > Resolving versions (SRPMs) > Adding support for rebuild distribution from source > Creating files (symbolic links - fast) > Creating symlinks to kickstart files > Fixing Comps Database > Generating hdlist (rpm database) > Patching second stage loader (eKV, partioning, ...) > patching "rocks-ekv" into distribution ... > patching "rocks-piece-pipe" into distribution ... 
> patching "PyXML" into distribution ... > patching "expat" into distribution ... > patching "rocks-pylib" into distribution ... > patching "MySQL-python" into distribution ... > patching "rocks-kickstart" into distribution ...
> patching "rocks-kickstart-profiles" into distribution ... > patching "rocks-kickstart-dtds" into distribution ... > building CRAM filesystem ... > Cleaning distribution > Resolving versions (RPMs) > Resolving versions (SRPMs) > Creating symlinks to kickstart files > Generating hdlist (rpm database) > Segregating RPMs (rocks, non-rocks) > sh: ./kickstart.cgi: No such file or directory > sh: ./kickstart.cgi: No such file or directory > Traceback (innermost last): > File "/opt/rocks/bin/rocks-dist", line 807, in ? > app.run() > File "/opt/rocks/bin/rocks-dist", line 623, in run > eval('self.command_%s()' % (command)) > File "<string>", line 0, in ? > File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom > builder.build() > File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > (rocks, nonrocks) = self.segregateRPMS() > File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS > for pkg in ks.getSection('packages'): > TypeError: loop over non-sequence Any ideas? -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb From bruno at rocksclusters.org Wed Dec 10 15:12:49 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 10 Dec 2003 15:12:49 -0800 Subject: [Rocks-Discuss]one node short in "labels" In-Reply-To: <20031210155914.55789.qmail@web14812.mail.yahoo.com> References: <20031210155914.55789.qmail@web14812.mail.yahoo.com> Message-ID: <5F8539FC-2B66-11D8-9715-000A95C4E3B4@rocksclusters.org> > So I go to the "labels" selection on the web page to print out the > pretty labels. What a nice idea by the way! > EXCEPT....it's one node short! I go up to 0-13 and this stops at > 0-12. Any ideas where I should check to fix this? yeah, we found this corner case -- it'll be fixed in the next release. 
thanks for the bug report. - gb
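The "TypeError: loop over non-sequence" in Vicky's traceback above is the symptom of iterating over None on interpreters of that era (modern Pythons word it "'NoneType' object is not iterable"). A minimal sketch of the likely failure mode and a defensive fix — `get_section` is an invented stand-in for `ks.getSection`, whose source is not shown in the thread:

```python
# Hypothetical stand-in for ks.getSection(): returns None when the
# requested kickstart section is absent -- the suspected failure mode.
def get_section(sections, name):
    return sections.get(name)

sections = {"main": ["part / --size 1024"]}   # no "packages" section

# for pkg in get_section(sections, "packages"):   # raises TypeError:
#     ...                                         # loop over non-sequence

# A defensive caller treats a missing section as empty instead:
packages = get_section(sections, "packages") or []
for pkg in packages:
    print(pkg)
```

This suggests the distribution being built was missing its packages section (consistent with the later discovery that the profiles directory had been moved aside), rather than a bug in the loop itself.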
From mjk at sdsc.edu Wed Dec 10 15:16:27 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Wed, 10 Dec 2003 15:16:27 -0800 Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro In-Reply-To: <3FD7A1A5.2030805@ucsd.edu> References: <3FD7A1A5.2030805@ucsd.edu> Message-ID: <E17B3F9E-2B66-11D8-981C-000A95DA5638@sdsc.edu> It looks like someone moved the profiles directory to profiles.orig. -mjk [root at rocks14 install]# ls -l total 56 drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 ftp.rocksclusters.org drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 ftp.rocksclusters.org.orig -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > When I run this: > > [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > rocks-dist --dist=cdrom cdrom > > on a server installed with ROCKS 3.0.0, I eventually get this: > >> Cleaning distribution >> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Adding support for rebuild distribution from source >> Creating files (symbolic links - fast) >> Creating symlinks to kickstart files >> Fixing Comps Database >> Generating hdlist (rpm database) >> Patching second stage loader (eKV, partioning, ...) >> patching "rocks-ekv" into distribution ... >> patching "rocks-piece-pipe" into distribution ... >> patching "PyXML" into distribution ... >> patching "expat" into distribution ... >> patching "rocks-pylib" into distribution ... >> patching "MySQL-python" into distribution ... >> patching "rocks-kickstart" into distribution ... >> patching "rocks-kickstart-profiles" into distribution ... 
>> patching "rocks-kickstart-dtds" into distribution ... >> building CRAM filesystem ... >> Cleaning distribution
>> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Creating symlinks to kickstart files >> Generating hdlist (rpm database) >> Segregating RPMs (rocks, non-rocks) >> sh: ./kickstart.cgi: No such file or directory >> sh: ./kickstart.cgi: No such file or directory >> Traceback (innermost last): >> File "/opt/rocks/bin/rocks-dist", line 807, in ? >> app.run() >> File "/opt/rocks/bin/rocks-dist", line 623, in run >> eval('self.command_%s()' % (command)) >> File "<string>", line 0, in ? >> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >> builder.build() >> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >> (rocks, nonrocks) = self.segregateRPMS() >> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >> segregateRPMS >> for pkg in ks.getSection('packages'): >> TypeError: loop over non-sequence > > Any ideas? > > -- > Vicky Rowley email: vrowley at ucsd.edu > Biomedical Informatics Research Network work: (858) 536-5980 > University of California, San Diego fax: (858) 822-0828 > 9500 Gilman Drive > La Jolla, CA 92093-0715 > > > See pictures from our trip to China at > http://www.sagacitech.com/Chinaweb From tim.carlson at pnl.gov Wed Dec 10 17:23:25 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST) Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro In-Reply-To: <3FD7BF48.9020409@ucsd.edu> Message-ID: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov> On Wed, 10 Dec 2003, V. Rowley wrote: Did you remove python by chance? kickstart.cgi calls python directly in /usr/bin/python while rocks-dist does an "env python" Tim > Yep, I did that, but only *AFTER* getting the error. [Thought it was > generated by the rocks-dist sequence, but apparently not.] Go ahead. > Move it back. Same difference. > > Vicky > > Mason J. Katz wrote: > > It looks like someone moved the profiles directory to profiles.orig. > > > > -mjk
> [root at rocks14 install]# ls -l > total 56 > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 > ftp.rocksclusters.org > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 > ftp.rocksclusters.org.orig > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > >> When I run this: >> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >> rocks-dist --dist=cdrom cdrom >> >> on a server installed with ROCKS 3.0.0, I eventually get this: >> >>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Adding support for rebuild distribution from source >>> Creating files (symbolic links - fast) >>> Creating symlinks to kickstart files >>> Fixing Comps Database >>> Generating hdlist (rpm database) >>> Patching second stage loader (eKV, partioning, ...) >>> patching "rocks-ekv" into distribution ... >>> patching "rocks-piece-pipe" into distribution ... >>> patching "PyXML" into distribution ... >>> patching "expat" into distribution ... >>> patching "rocks-pylib" into distribution ... >>> patching "MySQL-python" into distribution ... >>> patching "rocks-kickstart" into distribution ... >>> patching "rocks-kickstart-profiles" into distribution ... >>> patching "rocks-kickstart-dtds" into distribution ... >>> building CRAM filesystem ... 
>>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Creating symlinks to kickstart files >>> Generating hdlist (rpm database) >>> Segregating RPMs (rocks, non-rocks) >>> sh: ./kickstart.cgi: No such file or directory >>> sh: ./kickstart.cgi: No such file or directory >>> Traceback (innermost last): >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>> app.run() >>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>> eval('self.command_%s()' % (command)) >>> File "<string>", line 0, in ? >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>> builder.build() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
    >>> (rocks, nonrocks) = self.segregateRPMS() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>> segregateRPMS >>> for pkg in ks.getSection('packages'): >>> TypeError: loop over non-sequence >> >> >> Any ideas? >> >> -- >> Vicky Rowley email: vrowley at ucsd.edu >> Biomedical Informatics Research Network work: (858) 536-5980 >> University of California, San Diego fax: (858) 822-0828 >> 9500 Gilman Drive >> La Jolla, CA 92093-0715 >> >> >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb From tim.carlson at pnl.gov Wed Dec 10 17:23:25 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST) Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro In-Reply-To: <3FD7BF48.9020409@ucsd.edu> Message-ID: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov> On Wed, 10 Dec 2003, V. Rowley wrote: Did you remove python by chance? kickstart.cgi calls python directly in /usr/bin/python while rocks-dist does an "env python" Tim > Yep, I did that, but only *AFTER* getting the error. [Thought it was > generated by the rocks-dist sequence, but apparently not.] Go ahead. > Move it back. Same difference. > > Vicky > > Mason J. Katz wrote: > > It looks like someone moved the profiles directory to profiles.orig. > > > > -mjk
    > > > > > > [root at rocks14 install]# ls -l > > total 56 > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 > > ftp.rocksclusters.org > > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 > > ftp.rocksclusters.org.orig > > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > > > >> When I run this: > >> > >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > >> rocks-dist --dist=cdrom cdrom > >> > >> on a server installed with ROCKS 3.0.0, I eventually get this: > >> > >>> Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Adding support for rebuild distribution from source > >>> Creating files (symbolic links - fast) > >>> Creating symlinks to kickstart files > >>> Fixing Comps Database > >>> Generating hdlist (rpm database) > >>> Patching second stage loader (eKV, partioning, ...) > >>> patching "rocks-ekv" into distribution ... > >>> patching "rocks-piece-pipe" into distribution ... > >>> patching "PyXML" into distribution ... > >>> patching "expat" into distribution ... > >>> patching "rocks-pylib" into distribution ... > >>> patching "MySQL-python" into distribution ... > >>> patching "rocks-kickstart" into distribution ... > >>> patching "rocks-kickstart-profiles" into distribution ... > >>> patching "rocks-kickstart-dtds" into distribution ... > >>> building CRAM filesystem ... 
> >>> Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Creating symlinks to kickstart files > >>> Generating hdlist (rpm database) > >>> Segregating RPMs (rocks, non-rocks) > >>> sh: ./kickstart.cgi: No such file or directory > >>> sh: ./kickstart.cgi: No such file or directory > >>> Traceback (innermost last): > >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? > >>> app.run() > >>> File "/opt/rocks/bin/rocks-dist", line 623, in run > >>> eval('self.command_%s()' % (command)) > >>> File "<string>", line 0, in ? > >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
> >>> builder.build() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > >>> (rocks, nonrocks) = self.segregateRPMS() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in > >>> segregateRPMS > >>> for pkg in ks.getSection('packages'): > >>> TypeError: loop over non-sequence > >> > >> > >> Any ideas? > >> > >> -- > >> Vicky Rowley email: vrowley at ucsd.edu > >> Biomedical Informatics Research Network work: (858) 536-5980 > >> University of California, San Diego fax: (858) 822-0828 > >> 9500 Gilman Drive > >> La Jolla, CA 92093-0715 > >> > >> > >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > > > > > > > -- > Vicky Rowley email: vrowley at ucsd.edu > Biomedical Informatics Research Network work: (858) 536-5980 > University of California, San Diego fax: (858) 822-0828 > 9500 Gilman Drive > La Jolla, CA 92093-0715 > > > See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > From naihh at imcb.a-star.edu.sg Wed Dec 10 17:45:18 2003 From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis) Date: Thu, 11 Dec 2003 09:45:18 +0800 Subject: [Rocks-Discuss]RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3? Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCD66@EXIMCB2.imcb.a-star.edu.sg> Hi All, Do you have a list of the various gigabit Ethernet interfaces that are compatible with Rocks 3? I am changing my nodes' connectivity from 10/100 to 1000. Has anyone done that, and what are the differences in performance or turnaround time? Has anyone successfully built a set of grid compute nodes using Rocks 3?
    Thanks and Regards NaiHong Hwa Francis Institute of Molecular and Cell Biology (A*STAR) 30 Medical Drive Singapore 117609. DID: (65) 6874-6196 -----Original Message----- From: npaci-rocks-discussion-request at sdsc.edu [mailto:npaci-rocks-discussion-request at sdsc.edu] Sent: Thursday, December 11, 2003 9:25 AM To: npaci-rocks-discussion at sdsc.edu Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs Send npaci-rocks-discussion mailing list submissions to npaci-rocks-discussion at sdsc.edu To subscribe or unsubscribe via the World Wide Web, visit http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion or, via email, send a message with subject or body 'help' to npaci-rocks-discussion-request at sdsc.edu You can reach the person managing the list at npaci-rocks-discussion-admin at sdsc.edu When replying, please edit your Subject line so it is more specific than "Re: Contents of npaci-rocks-discussion digest..." Today's Topics: 1. Non-homogenous legacy hardware (Chris Dwan (CCGB)) 2. Error during Make when building a new install floppy (Terrence Martin) 3. Re: Error during Make when building a new install floppy (Tim Carlson) 4. Re: Non-homogenous legacy hardware (Tim Carlson) 5. ssh_known_hosts and ganglia (Jag) 6. Re: ssh_known_hosts and ganglia (Mason J. Katz) 7. "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley) 8. Re: one node short in "labels" (Greg Bruno) 9. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Mason J. Katz) 10. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley) 11. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Tim Carlson) --__--__-- Message: 1 Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST) From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu> To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware I am integrating legacy systems into a ROCKS cluster, and have hit a snag with the auto-partition configuration: The new (old) systems have SCSI disks, while old (new) ones contain IDE. This is a non-issue so long as the initial install does its default partitioning. However, I have a "replace-auto-partition.xml" file which is unworkable for the SCSI based systems since it makes specific reference to "hda" rather than "sda." I would like to have a site-nodes/replace-auto-partition.xml file with a conditional such that "hda" or "sda" is used, based on the name of the node (or some other criterion). Is this possible? Thanks, in advance. If this is out there on the mailing list archives, a pointer would be greatly appreciated. -Chris Dwan The University of Minnesota --__--__-- Message: 2 Date: Wed, 10 Dec 2003 12:09:11 -0800 From: Terrence Martin <tmartin at physics.ucsd.edu> To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu> Subject: [Rocks-Discuss]Error during Make when building a new install floppy I get the following error when I try to rebuild a boot floppy for rocks. This is with the default CVS checkout with an update today according to the rocks userguide. I have not actually attempted to make any changes. make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader' make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3' strip -o loader anaconda-7.3/loader/loader strip: anaconda-7.3/loader/loader: No such file or directory make[1]: *** [loader] Error 1 make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader' make: *** [loader] Error 2 Of course I could avoid all of this altogether and just put my binary module into the appropriate location in the boot image. Would it be correct to modify the following image file with my changes and then write it to a floppy via dd? 
/home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img
Basically I am injecting an updated e1000 driver with changes to pcitable to support the address of my gigabit cards. Terrence --__--__-- Message: 3 Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]Error during Make when building a new install floppy To: Terrence Martin <tmartin at physics.ucsd.edu> Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu> Reply-to: Tim Carlson <tim.carlson at pnl.gov> On Wed, 10 Dec 2003, Terrence Martin wrote: > I get the following error when I try to rebuild a boot floppy for rocks. > You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at least it wasn't the last time I checked. > Of course I could avoid all of this altogether and just put my binary > module into the appropriate location in the boot image. > > Would it be correct to modify the following image file with my changes > and then write it to a floppy via dd? > > /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img > > Basically I am injecting an updated e1000 driver with changes to > pcitable to support the address of my gigabit cards. Modifying the bootnet.img is about 1/3 of what you need to do if you go down that path. You also need to work on netstg1.img and you'll need to update the driver in the kernel rpm that gets installed on the box. None of this is trivial. If it were me, I would go down the same path I took for updating the AIC79XX driver https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support
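The pcitable change Terrence describes amounts to adding a PCI vendor/device-to-driver mapping line so the installer recognizes his gigabit cards. A hedged sketch of the edit you would script before rebuilding the images — the helper name is invented, and the tab-separated format is an assumption modeled on anaconda-era pcitable files:

```python
def add_pcitable_entry(lines, vendor, device, driver):
    """Return pcitable lines with (vendor, device) mapped to driver.

    Appends a tab-separated entry only if that PCI id is not already
    present, so the edit is safe to re-run on an already-patched file.
    """
    tag = "0x%04x\t0x%04x" % (vendor, device)
    for line in lines:
        if line.startswith(tag):
            return list(lines)          # id already mapped; no change
    return list(lines) + ['%s\t"%s"' % (tag, driver)]

# Example: map an additional Intel device id to the e1000 driver
# (sample ids for illustration only).
table = ['0x8086\t0x100e\t"e1000"']
table = add_pcitable_entry(table, 0x8086, 0x1010, "e1000")
```

As Tim notes above, this edit alone is only part of the job: the same id must be known to the driver in netstg1.img and in the kernel rpm that lands on the node.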
--__--__-- Message: 4 Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu> Cc: npaci-rocks-discussion at sdsc.edu Reply-to: Tim Carlson <tim.carlson at pnl.gov> On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote: > > I am integrating legacy systems into a ROCKS cluster, and have hit a > snag with the auto-partition configuration: The new (old) systems have > SCSI disks, while old (new) ones contain IDE. This is a non-issue so > long as the initial install does its default partitioning. However, I > have a "replace-auto-partition.xml" file which is unworkable for the SCSI > based systems since it makes specific reference to "hda" rather than > "sda." If you have just a single drive, then you should be able to skip the "--ondisk" bits of your "part" command. Otherwise, you would first have to do something ugly like the following: http://penguin.epfl.ch/slides/kickstart/ks.cfg You could probably (maybe) wrap most of that in an <eval sh="bash"> </eval> block in the <main> block. Just guessing... haven't tried this. Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support --__--__-- Message: 5 From: Jag <agrajag at dragaera.net> To: npaci-rocks-discussion at sdsc.edu Date: Wed, 10 Dec 2003 13:21:07 -0500 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia I noticed a previous post on this list (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the nodes over ganglia. Can anyone enlighten me as to how this is done?
I looked through the ganglia docs and didn't see anything indicating how to do this, so I'm assuming Rocks made some changes. Unfortunately the rocks iso images don't seem to contain srpms, so I'm now coming here. What did Rocks do to ganglia to make the distribution of ssh keys work? Also, does anyone know where Rocks SRPMs can be found? I've done quite a bit of searching, but haven't found them anywhere. --__--__-- Message: 6 Cc: npaci-rocks-discussion at sdsc.edu From: "Mason J. Katz" <mjk at sdsc.edu> Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia Date: Wed, 10 Dec 2003 14:39:15 -0800 To: Jag <agrajag at dragaera.net> Most of the SRPMS are on our FTP site, but we've screwed this up before. The SRPMS are entirely Rocks specific so they are of little value outside of Rocks. You can also check out our CVS tree (cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We have a ganglia-python package we created to allow us to write our own metrics at a higher level than the provided gmetric application. We've also moved from this method to a single cluster-wide ssh key for Rocks 3.1. -mjk On Dec 10, 2003, at 10:21 AM, Jag wrote: > I noticed a previous post on this list > (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/ > 001934.html) indicating that Rocks distributes ssh keys for all the > nodes over > ganglia. Can anyone enlighten me as to how this is done? > > I looked through the ganglia docs and didn't see anything indicating > how > to do this, so I'm assuming Rocks made some changes. Unfortunately the > rocks iso images don't seem to contain srpms, so I'm now coming here. > What did Rocks do to ganglia to make the distribution of ssh keys work? > > Also, does anyone know where Rocks SRPMs can be found? I've done quite > a bit of searching, but haven't found them anywhere. --__--__-- Message: 7 Date: Wed, 10 Dec 2003 14:43:49 -0800 From: "V. 
Rowley" <vrowley at ucsd.edu> To: npaci-rocks-discussion at sdsc.edu Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
When I run this: [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom on a server installed with ROCKS 3.0.0, I eventually get this: > Cleaning distribution > Resolving versions (RPMs) > Resolving versions (SRPMs) > Adding support for rebuild distribution from source > Creating files (symbolic links - fast) > Creating symlinks to kickstart files > Fixing Comps Database > Generating hdlist (rpm database) > Patching second stage loader (eKV, partioning, ...) > patching "rocks-ekv" into distribution ... > patching "rocks-piece-pipe" into distribution ... > patching "PyXML" into distribution ... > patching "expat" into distribution ... > patching "rocks-pylib" into distribution ... > patching "MySQL-python" into distribution ... > patching "rocks-kickstart" into distribution ... > patching "rocks-kickstart-profiles" into distribution ... > patching "rocks-kickstart-dtds" into distribution ... > building CRAM filesystem ... > Cleaning distribution > Resolving versions (RPMs) > Resolving versions (SRPMs) > Creating symlinks to kickstart files > Generating hdlist (rpm database) > Segregating RPMs (rocks, non-rocks) > sh: ./kickstart.cgi: No such file or directory > sh: ./kickstart.cgi: No such file or directory > Traceback (innermost last): > File "/opt/rocks/bin/rocks-dist", line 807, in ? > app.run() > File "/opt/rocks/bin/rocks-dist", line 623, in run > eval('self.command_%s()' % (command)) > File "<string>", line 0, in ? > File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom > builder.build() > File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > (rocks, nonrocks) = self.segregateRPMS() > File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS > for pkg in ks.getSection('packages'): > TypeError: loop over non-sequence Any ideas? 
-- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715
See pictures from our trip to China at http://www.sagacitech.com/Chinaweb --__--__-- Message: 8 Cc: rocks <npaci-rocks-discussion at sdsc.edu> From: Greg Bruno <bruno at rocksclusters.org> Subject: Re: [Rocks-Discuss]one node short in "labels" Date: Wed, 10 Dec 2003 15:12:49 -0800 To: Vincent Fox <vincent_b_fox at yahoo.com> > So I go to the "labels" selection on the web page to print out the > pretty labels. What a nice idea by the way! > EXCEPT....it's one node short! I go up to 0-13 and this stops at > 0-12. Any ideas where I should check to fix this? yeah, we found this corner case -- it'll be fixed in the next release. thanks for the bug report. - gb --__--__-- Message: 9 Cc: npaci-rocks-discussion at sdsc.edu From: "Mason J. Katz" <mjk at sdsc.edu> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro Date: Wed, 10 Dec 2003 15:16:27 -0800 To: "V. Rowley" <vrowley at ucsd.edu> It looks like someone moved the profiles directory to profiles.orig. -mjk [root at rocks14 install]# ls -l total 56 drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 ftp.rocksclusters.org drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 ftp.rocksclusters.org.orig -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > When I run this:
> > [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > rocks-dist --dist=cdrom cdrom > > on a server installed with ROCKS 3.0.0, I eventually get this: > >> Cleaning distribution >> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Adding support for rebuild distribution from source >> Creating files (symbolic links - fast) >> Creating symlinks to kickstart files >> Fixing Comps Database >> Generating hdlist (rpm database) >> Patching second stage loader (eKV, partioning, ...) >> patching "rocks-ekv" into distribution ... >> patching "rocks-piece-pipe" into distribution ... >> patching "PyXML" into distribution ... >> patching "expat" into distribution ... >> patching "rocks-pylib" into distribution ... >> patching "MySQL-python" into distribution ... >> patching "rocks-kickstart" into distribution ... >> patching "rocks-kickstart-profiles" into distribution ... >> patching "rocks-kickstart-dtds" into distribution ... >> building CRAM filesystem ... >> Cleaning distribution >> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Creating symlinks to kickstart files >> Generating hdlist (rpm database) >> Segregating RPMs (rocks, non-rocks) >> sh: ./kickstart.cgi: No such file or directory >> sh: ./kickstart.cgi: No such file or directory >> Traceback (innermost last): >> File "/opt/rocks/bin/rocks-dist", line 807, in ? >> app.run() >> File "/opt/rocks/bin/rocks-dist", line 623, in run >> eval('self.command_%s()' % (command)) >> File "<string>", line 0, in ? >> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >> builder.build() >> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >> (rocks, nonrocks) = self.segregateRPMS() >> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >> segregateRPMS >> for pkg in ks.getSection('packages'): >> TypeError: loop over non-sequence > > Any ideas? 
> > -- > Vicky Rowley email: vrowley at ucsd.edu > Biomedical Informatics Research Network work: (858) 536-5980 > University of California, San Diego fax: (858) 822-0828 > 9500 Gilman Drive > La Jolla, CA 92093-0715 > > > See pictures from our trip to China at
http://www.sagacitech.com/Chinaweb --__--__-- Message: 10 Date: Wed, 10 Dec 2003 16:50:16 -0800 From: "V. Rowley" <vrowley at ucsd.edu> To: "Mason J. Katz" <mjk at sdsc.edu> CC: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro Yep, I did that, but only *AFTER* getting the error. [Thought it was generated by the rocks-dist sequence, but apparently not.] Go ahead. Move it back. Same difference. Vicky Mason J. Katz wrote: > It looks like someone moved the profiles directory to profiles.orig. > > -mjk > > > [root at rocks14 install]# ls -l > total 56 > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 > ftp.rocksclusters.org > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 > ftp.rocksclusters.org.orig > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > >> When I run this: >> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >> rocks-dist --dist=cdrom cdrom >> >> on a server installed with ROCKS 3.0.0, I eventually get this: >> >>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Adding support for rebuild distribution from source >>> Creating files (symbolic links - fast) >>> Creating symlinks to kickstart files >>> Fixing Comps Database >>> Generating hdlist (rpm database) >>> Patching second stage loader (eKV, partioning, ...)
    >>> patching "rocks-ekv" into distribution ... >>> patching "rocks-piece-pipe" into distribution ... >>> patching "PyXML" into distribution ... >>> patching "expat" into distribution ... >>> patching "rocks-pylib" into distribution ... >>> patching "MySQL-python" into distribution ... >>> patching "rocks-kickstart" into distribution ... >>> patching "rocks-kickstart-profiles" into distribution ... >>> patching "rocks-kickstart-dtds" into distribution ... >>> building CRAM filesystem ... >>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Creating symlinks to kickstart files >>> Generating hdlist (rpm database) >>> Segregating RPMs (rocks, non-rocks) >>> sh: ./kickstart.cgi: No such file or directory >>> sh: ./kickstart.cgi: No such file or directory >>> Traceback (innermost last): >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>> app.run() >>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>> eval('self.command_%s()' % (command)) >>> File "<string>", line 0, in ? >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>> builder.build() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>> (rocks, nonrocks) = self.segregateRPMS() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>> segregateRPMS >>> for pkg in ks.getSection('packages'): >>> TypeError: loop over non-sequence >> >> >> Any ideas? >> >> -- >> Vicky Rowley email: vrowley at ucsd.edu >> Biomedical Informatics Research Network work: (858) 536-5980 >> University of California, San Diego fax: (858) 822-0828 >> 9500 Gilman Drive >> La Jolla, CA 92093-0715 >> >> >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at
    http://www.sagacitech.com/Chinaweb --__--__-- Message: 11 Date: Wed,10 Dec 2003 17:23:25 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro To: "V. Rowley" <vrowley at ucsd.edu> Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu Reply-to: Tim Carlson <tim.carlson at pnl.gov> On Wed, 10 Dec 2003, V. Rowley wrote: Did you remove python by chance? kickstart.cgi calls python directly in /usr/bin/python while rocks-dist does an "env python" Tim > Yep, I did that, but only *AFTER* getting the error. [Thought it was > generated by the rocks-dist sequence, but apparently not.] Go ahead. > Move it back. Same difference. > > Vicky > > Mason J. Katz wrote: > > It looks like someone moved the profiles directory to profiles.orig. > > > > -mjk > > > > > > [root at rocks14 install]# ls -l > > total 56 > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 > > ftp.rocksclusters.org > > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 > > ftp.rocksclusters.org.orig > > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > > > >> When I run this: > >> > >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > >> rocks-dist --dist=cdrom cdrom > >> > >> on a server installed with ROCKS 3.0.0, I eventually get this:
    > >> > >>>Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Adding support for rebuild distribution from source > >>> Creating files (symbolic links - fast) > >>> Creating symlinks to kickstart files > >>> Fixing Comps Database > >>> Generating hdlist (rpm database) > >>> Patching second stage loader (eKV, partioning, ...) > >>> patching "rocks-ekv" into distribution ... > >>> patching "rocks-piece-pipe" into distribution ... > >>> patching "PyXML" into distribution ... > >>> patching "expat" into distribution ... > >>> patching "rocks-pylib" into distribution ... > >>> patching "MySQL-python" into distribution ... > >>> patching "rocks-kickstart" into distribution ... > >>> patching "rocks-kickstart-profiles" into distribution ... > >>> patching "rocks-kickstart-dtds" into distribution ... > >>> building CRAM filesystem ... > >>> Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Creating symlinks to kickstart files > >>> Generating hdlist (rpm database) > >>> Segregating RPMs (rocks, non-rocks) > >>> sh: ./kickstart.cgi: No such file or directory > >>> sh: ./kickstart.cgi: No such file or directory > >>> Traceback (innermost last): > >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? > >>> app.run() > >>> File "/opt/rocks/bin/rocks-dist", line 623, in run > >>> eval('self.command_%s()' % (command)) > >>> File "<string>", line 0, in ? > >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom > >>> builder.build() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > >>> (rocks, nonrocks) = self.segregateRPMS() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in > >>> segregateRPMS > >>> for pkg in ks.getSection('packages'): > >>> TypeError: loop over non-sequence > >> > >> > >> Any ideas? 
> >> > >> -- > >> Vicky Rowley email: vrowley at ucsd.edu > >> Biomedical Informatics Research Network work: (858) 536-5980 > >> University of California, San Diego fax: (858) 822-0828 > >> 9500 Gilman Drive > >> La Jolla, CA 92093-0715 > >> > >> > >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > > > > >
> >
> --
> Vicky Rowley email: vrowley at ucsd.edu
> Biomedical Informatics Research Network work: (858) 536-5980
> University of California, San Diego fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
>
> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

--__--__--

_______________________________________________
npaci-rocks-discussion mailing list
npaci-rocks-discussion at sdsc.edu
http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

End of npaci-rocks-discussion Digest

DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you.

From tmartin at physics.ucsd.edu Wed Dec 10 18:03:41 2003
From: tmartin at physics.ucsd.edu (Terrence Martin)
Date: Wed, 10 Dec 2003 18:03:41 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0
Message-ID: <3FD7D07D.8090108@physics.ucsd.edu>

I am having a problem on install of rocks 3.0.0 on my new cluster.

The python error occurs right after anaconda starts and just before the install asks for the roll CDROM. The error refers to an inability to find or load rocks.file. The error is associated, I think, with the window that pops up and asks you to put the roll CDROM in.

The process I followed to get to this point is:

  Put the Rocks 3.0.0 CDROM into the CDROM drive
  Boot the system
  At the prompt type frontend
  Wait till anaconda starts
  Error referring to unable to load rocks.file.

I have successfully installed rocks on a smaller cluster but that has
different hardware. I used the same CDROM for both installs.

Any thoughts?

Terrence

From vrowley at ucsd.edu Wed Dec 10 19:52:49 2003
From: vrowley at ucsd.edu (V. Rowley)
Date: Wed, 10 Dec 2003 19:52:49 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
In-Reply-To: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov>
References: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov>
Message-ID: <3FD7EA11.10204@ucsd.edu>

Looks like python is okay:

> [root at rocks14 birn-oracle1]# which python
> /usr/bin/python
> [root at rocks14 birn-oracle1]# python --help
> Unknown option: --
> usage: python [option] ... [-c cmd | file | -] [arg] ...
> Options and arguments (and corresponding environment variables):
> -d     : debug output from parser (also PYTHONDEBUG=x)
> -i     : inspect interactively after running script, (also PYTHONINSPECT=x)
>          and force prompts, even if stdin does not appear to be a terminal
> -O     : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x)
> -OO    : remove doc-strings in addition to the -O optimizations
> -S     : don't imply 'import site' on initialization
> -t     : issue warnings about inconsistent tab usage (-tt: issue errors)
> -u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
> -v     : verbose (trace import statements) (also PYTHONVERBOSE=x)
> -x     : skip first line of source, allowing use of non-Unix forms of #!cmd
> -X     : disable class based built-in exceptions
> -c cmd : program passed in as string (terminates option list)
> file   : program read from script file
> -      : program read from stdin (default; interactive mode if a tty)
> arg ...: arguments passed to program in sys.argv[1:]
> Other environment variables:
> PYTHONSTARTUP: file executed on interactive startup (no default)
> PYTHONPATH   : ':'-separated list of directories prefixed to the
>                default module search path. The result is sys.path.
> PYTHONHOME   : alternate <prefix> directory (or <prefix>:<exec_prefix>).
>                The default module search path uses <prefix>/python1.5.
> [root at rocks14 birn-oracle1]#

Tim Carlson wrote:
> On Wed, 10 Dec 2003, V. Rowley wrote:
>
> Did you remove python by chance? kickstart.cgi calls python directly in
> /usr/bin/python while rocks-dist does an "env python"
>
> Tim
>
    > >>Yep, I didthat, but only *AFTER* getting the error. [Thought it was >>generated by the rocks-dist sequence, but apparently not.] Go ahead. >>Move it back. Same difference. >> >>Vicky >> >>Mason J. Katz wrote: >> >>>It looks like someone moved the profiles directory to profiles.orig. >>> >>> -mjk >>> >>> >>>[root at rocks14 install]# ls -l >>>total 56 >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >>>drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 >>>ftp.rocksclusters.org >>>drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 >>>ftp.rocksclusters.org.orig >>>-r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi >>>drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >>>drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >>>drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >>>On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >>> >>> >>>>When I run this: >>>> >>>>[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >>>>rocks-dist --dist=cdrom cdrom >>>> >>>>on a server installed with ROCKS 3.0.0, I eventually get this: >>>> >>>> >>>>>Cleaning distribution >>>>>Resolving versions (RPMs) >>>>>Resolving versions (SRPMs) >>>>>Adding support for rebuild distribution from source >>>>>Creating files (symbolic links - fast) >>>>>Creating symlinks to kickstart files >>>>>Fixing Comps Database >>>>>Generating hdlist (rpm database) >>>>>Patching second stage loader (eKV, partioning, ...) >>>>> patching "rocks-ekv" into distribution ... >>>>> patching "rocks-piece-pipe" into distribution ... >>>>> patching "PyXML" into distribution ... >>>>> patching "expat" into distribution ... >>>>> patching "rocks-pylib" into distribution ... >>>>> patching "MySQL-python" into distribution ... >>>>> patching "rocks-kickstart" into distribution ... >>>>> patching "rocks-kickstart-profiles" into distribution ... 
>>>>> patching "rocks-kickstart-dtds" into distribution ... >>>>> building CRAM filesystem ... >>>>>Cleaning distribution
    >>>>>Resolving versions (RPMs) >>>>>Resolvingversions (SRPMs) >>>>>Creating symlinks to kickstart files >>>>>Generating hdlist (rpm database) >>>>>Segregating RPMs (rocks, non-rocks) >>>>>sh: ./kickstart.cgi: No such file or directory >>>>>sh: ./kickstart.cgi: No such file or directory >>>>>Traceback (innermost last): >>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>>>> app.run() >>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>>> eval('self.command_%s()' % (command)) >>>>> File "<string>", line 0, in ? >>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>>> builder.build() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>>> (rocks, nonrocks) = self.segregateRPMS() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>>>>segregateRPMS >>>>> for pkg in ks.getSection('packages'): >>>>>TypeError: loop over non-sequence >>>> >>>> >>>>Any ideas? >>>> >>>>-- >>>>Vicky Rowley email: vrowley at ucsd.edu >>>>Biomedical Informatics Research Network work: (858) 536-5980 >>>>University of California, San Diego fax: (858) 822-0828 >>>>9500 Gilman Drive >>>>La Jolla, CA 92093-0715 >>>> >>>> >>>>See pictures from our trip to China at http://www.sagacitech.com/Chinaweb >>> >>> >>> >>-- >>Vicky Rowley email: vrowley at ucsd.edu >>Biomedical Informatics Research Network work: (858) 536-5980 >>University of California, San Diego fax: (858) 822-0828 >>9500 Gilman Drive >>La Jolla, CA 92093-0715 >> >> >>See pictures from our trip to China at http://www.sagacitech.com/Chinaweb >> >> > > > > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715
See pictures from our trip to China at http://www.sagacitech.com/Chinaweb

From wyzhong78 at msn.com Wed Dec 10 20:38:53 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 12:38:53 +0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
Message-ID: <BAY3-F3296PnPlpNvHX000097eb@hotmail.com>

>From: Greg Bruno <bruno at rocksclusters.org>
>To: "zhong wenyu" <wyzhong78 at msn.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
>Date: Mon, 8 Dec 2003 15:31:08 -0800
>
>>I have installed Rocks 3.0.0 with default options successful,there
>>was not any trouble.But I boot it up,it stopped at beginning,just
>>show "GRUB" on the screen and waiting...
>
>when you built the frontend, did you start with the rocks base CD
>then add the HPC roll?
>
> - gb

I have worked out this trouble, but I don't know why. I have one SCSI hard disk and one IDE disk on the frontend. I chose SCSI to be the first HDD and installed "/" on it; then it could not boot up. Even after disabling the IDE HDD and installing again, it still could not boot. At last I chose SCSI to be the first HDD for the install, then chose the IDE HDD to be first for boot-up, and it's OK! Must GRUB be installed on the IDE HDD?

thanks!

_________________________________________________________________
MSN Hotmail: http://www.hotmail.com

From wyzhong78 at msn.com Wed Dec 10 20:44:09 2003
From: wyzhong78 at msn.com (zhong wenyu)
Date: Thu, 11 Dec 2003 12:44:09 +0800
Subject: [Rocks-Discuss]I can't use xpbs in rocks
Message-ID: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com>

Hi, everyone!

I have installed rocks 2.3.2 and 3.0.0; xpbs cannot be used in either of them.

typed: xpbs [enter]
showed: xpbs: initialization failed!
output: invalid command name "Pref_Init"

thanks!

_________________________________________________________________
MSN Messenger: http://messenger.msn.com/cn
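[Editor's note] The workaround zhong describes (letting the machine boot from the IDE disk) matches the usual fix for mixed IDE/SCSI boxes: GRUB's stage1 must sit on whichever disk the BIOS actually reads first. A hedged sketch of a GRUB legacy `device.map` for such a box — the device names here are examples, not taken from zhong's report, and must be checked against the actual hardware:

```
# /boot/grub/device.map -- illustrative mapping only
# The BIOS boots (hd0); on this box that is the IDE disk, so GRUB's
# stage1 goes to the MBR of /dev/hda even though "/" lives on SCSI.
(hd0)   /dev/hda
(hd1)   /dev/sda
```

With a mapping like this in place, `grub-install '(hd0)'` (equivalently `grub-install /dev/hda`) puts the boot loader where the BIOS will actually look for it.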
From phil at sdsc.edu Wed Dec 10 21:26:50 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Wed, 10 Dec 2003 21:26:50 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
In-Reply-To: <BAY3-F3296PnPlpNvHX000097eb@hotmail.com>
References: <BAY3-F3296PnPlpNvHX000097eb@hotmail.com>
Message-ID: <3FD8001A.9030702@sdsc.edu>

There is a conflict between the way the BIOS numbers drives and the way the install kernel numbers them (and this is not standard). You should check whether your BIOS lets you select the boot device. If it just says "Hard Disk" (no choice between IDE and SCSI), then you are stuck with needing to have GRUB on the device that the BIOS thinks is the boot device. If you can choose, then SCSI can probably be made to work.

These sorts of issues (this is a general Red Hat/Linux problem) can be quite troublesome (and annoying). We had some older hardware that had two different types of SCSI controllers with drives on each controller. The boot kernel labeled /sda differently than the BIOS did. The install went fine, but the dreaded "OS Not Found" BIOS message appeared when rebooting. The cause was that the GRUB loader was being put on Linux's notion of /sda, but when the BIOS loaded, it found nothing (because GRUB was installed on the BIOS's idea of /sdb). For this particular machine, we were not able to change the BIOS's notion -- we had to force Linux to put the boot loader on Linux's idea of /sdb.

-P

zhong wenyu wrote:
>
>> From: Greg Bruno <bruno at rocksclusters.org>
>> To: "zhong wenyu" <wyzhong78 at msn.com>
>> CC: npaci-rocks-discussion at sdsc.edu
>> Subject: Re: [Rocks-Discuss]Rocks 3.0.0 problem:not able to boot up
>> Date: Mon, 8 Dec 2003 15:31:08 -0800
>>
>>> I have installed Rocks 3.0.0 with default options successful,there
>>> was not any trouble.But I boot it up,it stopped at beginning,just
>>> show "GRUB" on the screen and waiting...
>>
>> when you built the frontend, did you start with the rocks base CD
>> then add the HPC roll?
>>
>>  - gb
>>
> I have raveled out this trouble. But I don't know why.
> I have one SCSI harddisk and one IDE disk on the frontend. I choose
> SCSI to be the first HDD and installed "/" on it. then it can not boot
> up. Even disabled the IDE HDD and install it again, It can not boot up
> also. at last I choose SCSI to be the first HDD and install, then choose
> IDE HDD to be the first and boot up, it's ok!
> GRUB must be installed on IDE HDD?
> thanks!
>
> _________________________________________________________________
> MSN Hotmail: http://www.hotmail.com

From mjk at sdsc.edu Wed Dec 10 22:04:57 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 10 Dec 2003 22:04:57 -0800
Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
In-Reply-To: <3FD7EA11.10204@ucsd.edu>
References: <Pine.GSO.4.44.0312101722470.711-100000@poincare.emsl.pnl.gov> <3FD7EA11.10204@ucsd.edu>
Message-ID: <F23F7B5E-2B9F-11D8-981C-000A95DA5638@sdsc.edu>

Hi Vicky,

The following directory cannot resolve its symlinks anymore. If you start moving the profiles and mirror directories around, Rocks cannot find them to build kickstart files.

 -mjk

[root at rocks14 default]# ls -l
total 16
lrwxrwxrwx 1 root root   113 Nov 13 20:19 core.xml -> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/build/graphs/default/core.xml
-rwxrwsr-x 1 root wheel 3123 Sep  3 17:10 hpc.xml
-rwxr-xr-x 1 root root   495 Sep  9 22:55 patch.xml
-rwxrwsr-x 1 root wheel  452 Sep  3 17:10 root.xml
lrwxrwxrwx 1 root root   112 Nov 13 20:19 rsh.xml -> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/build/graphs/default/rsh.xml
-rwxrwsr-x 1 root wheel  923 Sep  3 17:10 sge.xml

On Dec 10, 2003, at 7:52 PM, V. Rowley wrote:

> Looks like python is okay:
>
>> [root at rocks14 birn-oracle1]# which python
>> /usr/bin/python
>> [root at rocks14 birn-oracle1]# python --help
>> Unknown option: --
>> usage: python [option] ...
[-c cmd | file | -] [arg] ...
    >> Options andarguments (and corresponding environment variables): >> -d : debug output from parser (also PYTHONDEBUG=x) >> -i : inspect interactively after running script, (also >> PYTHONINSPECT=x) >> and force prompts, even if stdin does not appear to be a >> terminal >> -O : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x) >> -OO : remove doc-strings in addition to the -O optimizations >> -S : don't imply 'import site' on initialization >> -t : issue warnings about inconsistent tab usage (-tt: issue >> errors) >> -u : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x) >> -v : verbose (trace import statements) (also PYTHONVERBOSE=x) >> -x : skip first line of source, allowing use of non-Unix forms of >> #!cmd >> -X : disable class based built-in exceptions >> -c cmd : program passed in as string (terminates option list) >> file : program read from script file >> - : program read from stdin (default; interactive mode if a tty) >> arg ...: arguments passed to program in sys.argv[1:] >> Other environment variables: >> PYTHONSTARTUP: file executed on interactive startup (no default) >> PYTHONPATH : ':'-separated list of directories prefixed to the >> default module search path. The result is sys.path. >> PYTHONHOME : alternate <prefix> directory (or >> <prefix>:<exec_prefix>). >> The default module search path uses <prefix>/python1.5. >> [root at rocks14 birn-oracle1]# > > > > Tim Carlson wrote: >> On Wed, 10 Dec 2003, V. Rowley wrote: >> Did you remove python by chance? kickstart.cgi calls python directly >> in >> /usr/bin/python while rocks-dist does an "env python" >> Tim >>> Yep, I did that, but only *AFTER* getting the error. [Thought it was >>> generated by the rocks-dist sequence, but apparently not.] Go ahead. >>> Move it back. Same difference. >>> >>> Vicky >>> >>> Mason J. Katz wrote: >>> >>>> It looks like someone moved the profiles directory to profiles.orig. 
>>>> >>>> -mjk >>>> >>>> >>>> [root at rocks14 install]# ls -l >>>> total 56 >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >>>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 >>>> ftp.rocksclusters.org >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 >>>> ftp.rocksclusters.org.orig >>>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40
    >>>> kickstart.cgi >>>> drwxr-xr-x 3 root root 4096 Dec 10 20:38 >>>> profiles.orig >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >>>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 >>>> rocks-dist.orig >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >>>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >>>> >>>> >>>>> When I run this: >>>>> >>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >>>>> rocks-dist --dist=cdrom cdrom >>>>> >>>>> on a server installed with ROCKS 3.0.0, I eventually get this: >>>>> >>>>> >>>>>> Cleaning distribution >>>>>> Resolving versions (RPMs) >>>>>> Resolving versions (SRPMs) >>>>>> Adding support for rebuild distribution from source >>>>>> Creating files (symbolic links - fast) >>>>>> Creating symlinks to kickstart files >>>>>> Fixing Comps Database >>>>>> Generating hdlist (rpm database) >>>>>> Patching second stage loader (eKV, partioning, ...) >>>>>> patching "rocks-ekv" into distribution ... >>>>>> patching "rocks-piece-pipe" into distribution ... >>>>>> patching "PyXML" into distribution ... >>>>>> patching "expat" into distribution ... >>>>>> patching "rocks-pylib" into distribution ... >>>>>> patching "MySQL-python" into distribution ... >>>>>> patching "rocks-kickstart" into distribution ... >>>>>> patching "rocks-kickstart-profiles" into distribution ... >>>>>> patching "rocks-kickstart-dtds" into distribution ... >>>>>> building CRAM filesystem ... >>>>>> Cleaning distribution >>>>>> Resolving versions (RPMs) >>>>>> Resolving versions (SRPMs) >>>>>> Creating symlinks to kickstart files >>>>>> Generating hdlist (rpm database) >>>>>> Segregating RPMs (rocks, non-rocks) >>>>>> sh: ./kickstart.cgi: No such file or directory >>>>>> sh: ./kickstart.cgi: No such file or directory >>>>>> Traceback (innermost last): >>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? 
>>>>>> app.run() >>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>>>> eval('self.command_%s()' % (command)) >>>>>> File "<string>", line 0, in ? >>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>>>> builder.build() >>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>>>> (rocks, nonrocks) = self.segregateRPMS() >>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>>>>> segregateRPMS >>>>>> for pkg in ks.getSection('packages'):
>>>>>> TypeError: loop over non-sequence
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> --
>>>>> Vicky Rowley email: vrowley at ucsd.edu
>>>>> Biomedical Informatics Research Network work: (858) 536-5980
>>>>> University of California, San Diego fax: (858) 822-0828
>>>>> 9500 Gilman Drive
>>>>> La Jolla, CA 92093-0715
>>>>>
>>>>> See pictures from our trip to China at
>>>>> http://www.sagacitech.com/Chinaweb
>>>>
>>> --
>>> Vicky Rowley email: vrowley at ucsd.edu
>>> Biomedical Informatics Research Network work: (858) 536-5980
>>> University of California, San Diego fax: (858) 822-0828
>>> 9500 Gilman Drive
>>> La Jolla, CA 92093-0715
>>>
>>> See pictures from our trip to China at
>>> http://www.sagacitech.com/Chinaweb
>>>
> --
> Vicky Rowley email: vrowley at ucsd.edu
> Biomedical Informatics Research Network work: (858) 536-5980
> University of California, San Diego fax: (858) 822-0828
> 9500 Gilman Drive
> La Jolla, CA 92093-0715
>
> See pictures from our trip to China at
> http://www.sagacitech.com/Chinaweb

From bruno at rocksclusters.org Wed Dec 10 22:31:11 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Wed, 10 Dec 2003 22:31:11 -0800
Subject: [Rocks-Discuss]Rocks 3.0.0
In-Reply-To: <3FD7D07D.8090108@physics.ucsd.edu>
References: <3FD7D07D.8090108@physics.ucsd.edu>
Message-ID: <9C7EE8E9-2BA3-11D8-9715-000A95C4E3B4@rocksclusters.org>

> I am having a problem on install of rocks 3.0.0 on my new cluster.
>
> The python error occurs right after anaconda starts and just before
> the install asks for the roll CDROM.
>
> The error refers to an inability to find or load rocks.file. The error
> is associated I think with the window that pops up and asks you in put
    > the roll CDROM in. > > The process I followed to get to this point is > > Put the Rocks 3.0.0 CDROM into the CDROM drive > Boot the system > At the prompt type frontend > Wait till anaconda starts > Error referring to unable to load rocks.file. > > I have successfully installed rocks on a smaller cluster but that has > different hardware. I used the same CDROM for both installs. > > Any thoughts? hard to say -- but some folks had similar problems due to bad memory: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-February/ 001246.html - gb From vincent_b_fox at yahoo.com Wed Dec 10 22:43:21 2003 From: vincent_b_fox at yahoo.com (Vincent Fox) Date: Wed, 10 Dec 2003 22:43:21 -0800 (PST) Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform In-Reply-To: <1B097BEE-2ADC-11D8-9715-000A95C4E3B4@rocksclusters.org> Message-ID: <20031211064321.41781.qmail@web14801.mail.yahoo.com> Okay, here's the context diff as plain text. I test-applied it using "patch -p0 < atlas.patch" and did a compile on my PII box successfully. I can send it as attachment or submit to CVS or some other way if you need: *** atlas.spec.in.orig Thu Dec 11 06:27:13 2003 --- atlas.spec.in Thu Dec 11 06:30:46 2003 *************** *** 111,117 **** --- 111,133 ---- y " | make + elif [ $CPUID -eq 4 ] + then + # + # Pentium II + # + echo "0 + y + y + n + y + linux + 0 + /usr/bin/g77 + -O + y + " | make else
#

Greg Bruno <bruno at rocksclusters.org> wrote:
> Okay, came up my own quick hack:
>
> Edit atlas.spec.in, go to "other x86" section, remove
> 2 lines right above "linux", seems to make rpm now.
>
> A more formal patch would be put in a section for
> cpuid eq 4 with this correction I suppose.

if you provide the patch, we'll include it in our next release.

 - gb

---------------------------------
Do you Yahoo!?
New Yahoo! Photos - easier uploading and sharing
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031210/be5c8b04/attachment-0001.html

From naihh at imcb.a-star.edu.sg Thu Dec 11 00:08:14 2003
From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis)
Date: Thu, 11 Dec 2003 16:08:14 +0800
Subject: [Rocks-Discuss]RE: Have anyone successfully build a set of grid compute nodes using Rocks?
Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCDB9@EXIMCB2.imcb.a-star.edu.sg>

Hi,

Has anyone successfully built a set of grid compute nodes using Rocks 3? Anyone care to share?

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 11:54 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs

Send npaci-rocks-discussion mailing list submissions to
	npaci-rocks-discussion at sdsc.edu

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
	npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
	npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific than
"Re: Contents of npaci-rocks-discussion digest..."

Today's Topics:

  1. RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis)
  2. Rocks 3.0.0 (Terrence Martin)
  3. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)

--__--__--

Message: 1
Date: Thu, 11 Dec 2003 09:45:18 +0800
From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg>
To: <npaci-rocks-discussion at sdsc.edu>
Subject: [Rocks-Discuss]RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3?

Hi All,

Do you have a list of the various gigabit Ethernet interfaces that are compatible with Rocks 3? I am changing my nodes' connectivity from 10/100 to 1000. Has anyone done that, and what are the differences in performance or turnaround time?

Thanks and Regards

Nai Hong Hwa Francis
Institute of Molecular and Cell Biology (A*STAR)
30 Medical Drive
Singapore 117609.
DID: (65) 6874-6196

-----Original Message-----
From: npaci-rocks-discussion-request at sdsc.edu
[mailto:npaci-rocks-discussion-request at sdsc.edu]
Sent: Thursday, December 11, 2003 9:25 AM
To: npaci-rocks-discussion at sdsc.edu
Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs

Send npaci-rocks-discussion mailing list submissions to
	npaci-rocks-discussion at sdsc.edu
To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
or, via email, send a message with subject or body 'help' to
	npaci-rocks-discussion-request at sdsc.edu

You can reach the person managing the list at
	npaci-rocks-discussion-admin at sdsc.edu

When replying, please edit your Subject line so it is more specific than
"Re: Contents of npaci-rocks-discussion digest..."

Today's Topics:

  1. Non-homogenous legacy hardware (Chris Dwan (CCGB))
  2. Error during Make when building a new install floppy (Terrence Martin)
  3. Re: Error during Make when building a new install floppy (Tim Carlson)
  4. Re: Non-homogenous legacy hardware (Tim Carlson)
  5. ssh_known_hosts and ganglia (Jag)
  6. Re: ssh_known_hosts and ganglia (Mason J. Katz)
  7. "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
  8. Re: one node short in "labels" (Greg Bruno)
  9. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Mason J. Katz)
  10. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley)
  11. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Tim Carlson)

--__--__--

Message: 1
Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST)
From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu>
To: npaci-rocks-discussion at sdsc.edu
Subject: [Rocks-Discuss]Non-homogenous legacy hardware

I am integrating legacy systems into a ROCKS cluster, and have hit a snag with the auto-partition configuration: the new (old) systems have SCSI disks, while the old (new) ones contain IDE. This is a non-issue so long as the initial install does its default partitioning. However, I have a "replace-auto-partition.xml" file which is unworkable for the SCSI-based systems, since it makes specific reference to "hda" rather than "sda."
I would like to have a site-nodes/replace-auto-partition.xml file with a conditional such that "hda" or "sda" is used, based on the name of the node (or some other criterion). Is this possible? Thanks, in advance. If this is out there on the mailing list archives,
a pointer would be greatly appreciated. -Chris Dwan The University of Minnesota -- __--__-- Message: 2 Date: Wed, 10 Dec 2003 12:09:11 -0800 From: Terrence Martin <tmartin at physics.ucsd.edu> To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu> Subject: [Rocks-Discuss]Error during Make when building a new install floppy I get the following error when I try to rebuild a boot floppy for rocks. This is with the default CVS checkout with an update today according to the rocks userguide. I have not actually attempted to make any changes. make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader' make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3' strip -o loader anaconda-7.3/loader/loader strip: anaconda-7.3/loader/loader: No such file or directory make[1]: *** [loader] Error 1 make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader' make: *** [loader] Error 2 Of course I could avoid all of this together and just put my binary module into the appropriate location in the boot image. Would it be correct to modify the following image file with my changes and then write it to a floppy via dd? /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img Basically I am injecting an updated e1000 driver with changes to pcitable to support the address of my gigabit cards. Terrence -- __--__-- Message: 3 Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]Error during Make when building a new install floppy To: Terrence Martin <tmartin at physics.ucsd.edu> Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu> Reply-to: Tim Carlson <tim.carlson at pnl.gov>
On Wed, 10 Dec 2003, Terrence Martin wrote: > I get the following error when I try to rebuild a boot floppy for rocks. > You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at least it wasn't the last time I checked > Of course I could avoid all of this together and just put my binary > module into the appropriate location in the boot image. > > Would it be correct to modify the following image file with my changes > and then write it to a floppy via dd? > > /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img > > Basically I am injecting an updated e1000 driver with changes to > pcitable to support the address of my gigabit cards. Modifying the bootnet.img is about 1/3 of what you need to do if you go down that path. You also need to work on netstg1.img and you'll need to update the driver in the kernel rpm that gets installed on the box. None of this is trivial. If it were me, I would go down the same path I took for updating the AIC79XX driver https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support -- __--__-- Message: 4 Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu> Cc: npaci-rocks-discussion at sdsc.edu Reply-to: Tim Carlson <tim.carlson at pnl.gov> On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote: > > I am integrating legacy systems into a ROCKS cluster, and have hit a > snag with the auto-partition configuration: The new (old) systems have > SCSI disks, while old (new) ones contain IDE. This is a non-issue so
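As a concrete sketch of the dd route Terrence asks about (and Tim cautions is only part of the job): writing an image to floppy is a raw block copy. The file names below are stand-ins, not the real bootnet.img path shown above, and the real target would be a floppy device such as /dev/fd0; this demo copies to an ordinary file so it can be run safely.

```shell
# Stand-in for the real bootnet.img; create a 1.44 MB dummy image.
dd if=/dev/zero of=bootnet-demo.img bs=1440k count=1 2>/dev/null

# Writing to an actual floppy would be:
#   dd if=bootnet-demo.img of=/dev/fd0 bs=1440k
# Here we copy to a plain file instead so the command is safe to exercise.
dd if=bootnet-demo.img of=floppy-demo.out bs=1440k 2>/dev/null

# Both files should be exactly 1474560 bytes (1440 KiB).
wc -c bootnet-demo.img floppy-demo.out
```

Note that, per Tim's reply, a modified bootnet.img alone is not enough for an installed system: netstg1.img and the kernel rpm's driver also need updating.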
> long as the initial install does its default partitioning. However, I > have a "replace-auto-partition.xml" file which is unworkable for the SCSI > based systems since it makes specific reference to "hda" rather than > "sda." If you have just a single drive, then you should be able to skip the "--ondisk" bits of your "part" command Otherwise, you would have first to do something ugly like the following: http://penguin.epfl.ch/slides/kickstart/ks.cfg You could probably (maybe) wrap most of that in an <eval sh="bash"> </eval> block in the <main> block. Just guessing.. haven't tried this. Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support -- __--__-- Message: 5 From: Jag <agrajag at dragaera.net> To: npaci-rocks-discussion at sdsc.edu Date: Wed, 10 Dec 2003 13:21:07 -0500 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia I noticed a previous post on this list (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the nodes over ganglia. Can anyone enlighten me as to how this is done? I looked through the ganglia docs and didn't see anything indicating how to do this, so I'm assuming Rocks made some changes. Unfortunately the rocks iso images don't seem to contain srpms, so I'm now coming here. What did Rocks do to ganglia to make the distribution of ssh keys work? Also, does anyone know where Rocks SRPMs can be found? I've done quite a bit of searching, but haven't found them anywhere. -- __--__-- Message: 6 Cc: npaci-rocks-discussion at sdsc.edu From: "Mason J. Katz" <mjk at sdsc.edu> Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia Date: Wed, 10 Dec 2003 14:39:15 -0800 To: Jag <agrajag at dragaera.net>
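Tim's <eval> idea above might look roughly like the sketch below. This is an unverified guess, in the spirit of his own "just guessing" caveat: the <main> and <eval sh="bash"> tags come from his message, but the SCSI probe and the emitted part lines are assumptions that would need checking against the Rocks kickstart documentation and DTD.

```xml
<!-- site-nodes/replace-auto-partition.xml (hypothetical sketch) -->
<main>
  <eval sh="bash">
    # Choose the device name based on what the machine actually has.
    if [ -e /proc/scsi/scsi ]; then disk=sda; else disk=hda; fi
    echo "part / --size 8192 --ondisk $disk"
    echo "part swap --size 1024 --ondisk $disk"
  </eval>
</main>
```

As Tim notes, if each node has only one drive, simply dropping "--ondisk" from the part lines avoids the problem entirely.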
Most of the SRPMS are on our FTP site, but we've screwed this up before. The SRPMS are entirely Rocks specific so they are of little value outside of Rocks. You can also checkout our CVS tree (cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We have a ganglia-python package we created to allow us to write our own metrics at a higher level than the provided gmetric application. We've also moved from this method to a single cluster-wide ssh key for Rocks 3.1. -mjk On Dec 10, 2003, at 10:21 AM, Jag wrote: > I noticed a previous post on this list > (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the > nodes over > ganglia. Can anyone enlighten me as to how this is done? > > I looked through the ganglia docs and didn't see anything indicating > how > to do this, so I'm assuming Rocks made some changes. Unfortunately the > rocks iso images don't seem to contain srpms, so I'm now coming here. > What did Rocks do to ganglia to make the distribution of ssh keys work? > > Also, does anyone know where Rocks SRPMs can be found? I've done quite > a bit of searching, but haven't found them anywhere. -- __--__-- Message: 7 Date: Wed, 10 Dec 2003 14:43:49 -0800 From: "V. Rowley" <vrowley at ucsd.edu> To: npaci-rocks-discussion at sdsc.edu Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro When I run this: [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; rocks-dist --dist=cdrom cdrom on a server installed with ROCKS 3.0.0, I eventually get this: > Cleaning distribution > Resolving versions (RPMs) > Resolving versions (SRPMs) > Adding support for rebuild distribution from source
> Creating files (symbolic links - fast) > Creating symlinks to kickstart files > Fixing Comps Database > Generating hdlist (rpm database) > Patching second stage loader (eKV, partioning, ...) > patching "rocks-ekv" into distribution ... > patching "rocks-piece-pipe" into distribution ... > patching "PyXML" into distribution ... > patching "expat" into distribution ... > patching "rocks-pylib" into distribution ... > patching "MySQL-python" into distribution ... > patching "rocks-kickstart" into distribution ... > patching "rocks-kickstart-profiles" into distribution ... > patching "rocks-kickstart-dtds" into distribution ... > building CRAM filesystem ... > Cleaning distribution > Resolving versions (RPMs) > Resolving versions (SRPMs) > Creating symlinks to kickstart files > Generating hdlist (rpm database) > Segregating RPMs (rocks, non-rocks) > sh: ./kickstart.cgi: No such file or directory > sh: ./kickstart.cgi: No such file or directory > Traceback (innermost last): > File "/opt/rocks/bin/rocks-dist", line 807, in ? > app.run() > File "/opt/rocks/bin/rocks-dist", line 623, in run > eval('self.command_%s()' % (command)) > File "<string>", line 0, in ? > File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom > builder.build() > File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > (rocks, nonrocks) = self.segregateRPMS() > File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS > for pkg in ks.getSection('packages'): > TypeError: loop over non-sequence Any ideas?
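The "loop over non-sequence" failure above is Python 1.5's wording for iterating over something that is not a sequence. Given the two "sh: ./kickstart.cgi: No such file or directory" lines just before the traceback, a plausible reading is that ks.getSection('packages') returned None because kickstart.cgi never ran. A minimal reproduction of that failure mode (get_section is a hypothetical stand-in for ks.getSection, whose real code is not shown in the thread):

```python
# Illustrative only: get_section stands in for ks.getSection; the point
# is the failure mode, not the real Rocks API.
def get_section(sections, name):
    # Returns None when the section was never parsed -- e.g. because
    # kickstart.cgi could not be executed, as in the log above.
    return sections.get(name)

sections = {}  # kickstart.cgi never ran, so nothing was parsed
try:
    for pkg in get_section(sections, 'packages'):
        print(pkg)
except TypeError as exc:
    # Python 1.5 worded this "loop over non-sequence";
    # modern Pythons say "'NoneType' object is not iterable".
    print("TypeError:", exc)
```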
-- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb -- __--__-- Message: 8 Cc: rocks <npaci-rocks-discussion at sdsc.edu> From: Greg Bruno <bruno at rocksclusters.org> Subject: Re: [Rocks-Discuss]one node short in "labels" Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com> > So I go to the "labels" selection on the web page to print out the > pretty labels. What a nice idea by the way! > > EXCEPT....it's one node short! I go up to 0-13 and this stops at > 0-12. Any ideas where I should check to fix this? yeah, we found this corner case -- it'll be fixed in the next release. thanks for bug report. - gb -- __--__-- Message: 9 Cc: npaci-rocks-discussion at sdsc.edu From: "Mason J. Katz" <mjk at sdsc.edu> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro Date: Wed, 10 Dec 2003 15:16:27 -0800 To: "V. Rowley" <vrowley at ucsd.edu> It looks like someone moved the profiles directory to profiles.orig. -mjk [root at rocks14 install]# ls -l total 56 drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 ftp.rocksclusters.org drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 ftp.rocksclusters.org.orig -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > When I run this: > > [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > rocks-dist --dist=cdrom cdrom > > on a server installed with ROCKS 3.0.0, I eventually get this: > >> Cleaning distribution >> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Adding support for rebuild distribution from source >> Creating files (symbolic links - fast)
>> Creating symlinks to kickstart files >> Fixing Comps Database >> Generating hdlist (rpm database) >> Patching second stage loader (eKV, partioning, ...) >> patching "rocks-ekv" into distribution ... >> patching "rocks-piece-pipe" into distribution ... >> patching "PyXML" into distribution ... >> patching "expat" into distribution ... >> patching "rocks-pylib" into distribution ... >> patching "MySQL-python" into distribution ... >> patching "rocks-kickstart" into distribution ... >> patching "rocks-kickstart-profiles" into distribution ... >> patching "rocks-kickstart-dtds" into distribution ... >> building CRAM filesystem ... >> Cleaning distribution >> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Creating symlinks to kickstart files >> Generating hdlist (rpm database) >> Segregating RPMs (rocks, non-rocks) >> sh: ./kickstart.cgi: No such file or directory >> sh: ./kickstart.cgi: No such file or directory >> Traceback (innermost last): >> File "/opt/rocks/bin/rocks-dist", line 807, in ? >> app.run() >> File "/opt/rocks/bin/rocks-dist", line 623, in run >> eval('self.command_%s()' % (command)) >> File "<string>", line 0, in ? >> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >> builder.build() >> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >> (rocks, nonrocks) = self.segregateRPMS() >> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >> segregateRPMS >> for pkg in ks.getSection('packages'): >> TypeError: loop over non-sequence > > Any ideas? > > -- > Vicky Rowley email: vrowley at ucsd.edu > Biomedical Informatics Research Network work: (858) 536-5980 > University of California, San Diego fax: (858) 822-0828 > 9500 Gilman Drive > La Jolla, CA 92093-0715 > > > See pictures from our trip to China at > http://www.sagacitech.com/Chinaweb -- __--__-- Message: 10 Date: Wed, 10 Dec 2003 16:50:16 -0800 From: "V. Rowley" <vrowley at ucsd.edu> To: "Mason J.
Katz" <mjk at sdsc.edu> CC: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
trying to build CD distro Yep, I did that, but only *AFTER* getting the error. [Thought it was generated by the rocks-dist sequence, but apparently not.] Go ahead. Move it back. Same difference. Vicky Mason J. Katz wrote: > It looks like someone moved the profiles directory to profiles.orig. > > -mjk > > > [root at rocks14 install]# ls -l > total 56 > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 > ftp.rocksclusters.org > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 > ftp.rocksclusters.org.orig > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > >> When I run this: >> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >> rocks-dist --dist=cdrom cdrom >> >> on a server installed with ROCKS 3.0.0, I eventually get this: >> >>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Adding support for rebuild distribution from source >>> Creating files (symbolic links - fast) >>> Creating symlinks to kickstart files >>> Fixing Comps Database >>> Generating hdlist (rpm database) >>> Patching second stage loader (eKV, partioning, ...) >>> patching "rocks-ekv" into distribution ... >>> patching "rocks-piece-pipe" into distribution ... >>> patching "PyXML" into distribution ... >>> patching "expat" into distribution ... >>> patching "rocks-pylib" into distribution ... >>> patching "MySQL-python" into distribution ... >>> patching "rocks-kickstart" into distribution ... >>> patching "rocks-kickstart-profiles" into distribution ...
>>> patching "rocks-kickstart-dtds" into distribution ... >>> building CRAM filesystem ... >>> Cleaning distribution
>>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Creating symlinks to kickstart files >>> Generating hdlist (rpm database) >>> Segregating RPMs (rocks, non-rocks) >>> sh: ./kickstart.cgi: No such file or directory >>> sh: ./kickstart.cgi: No such file or directory >>> Traceback (innermost last): >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>> app.run() >>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>> eval('self.command_%s()' % (command)) >>> File "<string>", line 0, in ? >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>> builder.build() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>> (rocks, nonrocks) = self.segregateRPMS() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>> segregateRPMS >>> for pkg in ks.getSection('packages'): >>> TypeError: loop over non-sequence >> >> >> Any ideas? >> >> -- >> Vicky Rowley email: vrowley at ucsd.edu >> Biomedical Informatics Research Network work: (858) 536-5980 >> University of California, San Diego fax: (858) 822-0828 >> 9500 Gilman Drive >> La Jolla, CA 92093-0715 >> >> >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb -- __--__-- Message: 11 Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
To: "V. Rowley" <vrowley at ucsd.edu> Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu Reply-to: Tim Carlson <tim.carlson at pnl.gov> On Wed, 10 Dec 2003, V. Rowley wrote: Did you remove python by chance? kickstart.cgi calls python directly in /usr/bin/python while rocks-dist does an "env python" Tim > Yep, I did that, but only *AFTER* getting the error. [Thought it was > generated by the rocks-dist sequence, but apparently not.] Go ahead. > Move it back. Same difference. > > Vicky > > Mason J. Katz wrote: > > It looks like someone moved the profiles directory to profiles.orig. > > > > -mjk > > > > > > [root at rocks14 install]# ls -l > > total 56 > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 > > ftp.rocksclusters.org > > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 > > ftp.rocksclusters.org.orig > > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > > > >> When I run this: > >> > >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > >> rocks-dist --dist=cdrom cdrom > >> > >> on a server installed with ROCKS 3.0.0, I eventually get this: > >> > >>> Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Adding support for rebuild distribution from source > >>> Creating files (symbolic links - fast) > >>> Creating symlinks to kickstart files > >>> Fixing Comps Database > >>> Generating hdlist (rpm database) > >>> Patching second stage loader (eKV, partioning, ...) > >>> patching "rocks-ekv" into distribution ...
> >>> patching "rocks-piece-pipe" into distribution ... > >>> patching "PyXML" into distribution ... > >>> patching "expat" into distribution ... > >>> patching "rocks-pylib" into distribution ... > >>> patching "MySQL-python" into distribution ... > >>> patching "rocks-kickstart" into distribution ... > >>> patching "rocks-kickstart-profiles" into distribution ... > >>> patching "rocks-kickstart-dtds" into distribution ... > >>> building CRAM filesystem ... > >>> Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Creating symlinks to kickstart files > >>> Generating hdlist (rpm database) > >>> Segregating RPMs (rocks, non-rocks) > >>> sh: ./kickstart.cgi: No such file or directory > >>> sh: ./kickstart.cgi: No such file or directory > >>> Traceback (innermost last): > >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? > >>> app.run() > >>> File "/opt/rocks/bin/rocks-dist", line 623, in run > >>> eval('self.command_%s()' % (command)) > >>> File "<string>", line 0, in ? > >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom > >>> builder.build() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > >>> (rocks, nonrocks) = self.segregateRPMS() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in > >>> segregateRPMS > >>> for pkg in ks.getSection('packages'): > >>> TypeError: loop over non-sequence > >> > >> > >> Any ideas?
> >> > >> -- > >> Vicky Rowley email: vrowley at ucsd.edu > >> Biomedical Informatics Research Network work: (858) 536-5980 > >> University of California, San Diego fax: (858) 822-0828 > >> 9500 Gilman Drive > >> La Jolla, CA 92093-0715 > >> > >> > >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > > > > > > > -- > Vicky Rowley email: vrowley at ucsd.edu > Biomedical Informatics Research Network work: (858) 536-5980 > University of California, San Diego fax: (858) 822-0828 > 9500 Gilman Drive > La Jolla, CA 92093-0715 > > > See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
> > -- __--__-- _______________________________________________ npaci-rocks-discussion mailing list npaci-rocks-discussion at sdsc.edu http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion End of npaci-rocks-discussion Digest DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you. --__--__-- Message: 2 Date: Wed, 10 Dec 2003 18:03:41 -0800 From: Terrence Martin <tmartin at physics.ucsd.edu> To: npaci-rocks-discussion at sdsc.edu Subject: [Rocks-Discuss]Rocks 3.0.0 I am having a problem on install of rocks 3.0.0 on my new cluster. The python error occurs right after anaconda starts and just before the install asks for the roll CDROM. The error refers to an inability to find or load rocks.file. The error is associated I think with the window that pops up and asks you to put the roll CDROM in. The process I followed to get to this point is Put the Rocks 3.0.0 CDROM into the CDROM drive Boot the system At the prompt type frontend Wait till anaconda starts Error referring to unable to load rocks.file. I have successfully installed rocks on a smaller cluster but that has different hardware. I used the same CDROM for both installs. Any thoughts? Terrence --__--__--
Message: 3 Date: Wed, 10 Dec 2003 19:52:49 -0800 From: "V. Rowley" <vrowley at ucsd.edu> To: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro Looks like python is okay: > [root at rocks14 birn-oracle1]# which python > /usr/bin/python > [root at rocks14 birn-oracle1]# python --help > Unknown option: -- > usage: python [option] ... [-c cmd | file | -] [arg] ... > Options and arguments (and corresponding environment variables): > -d : debug output from parser (also PYTHONDEBUG=x) > -i : inspect interactively after running script, (also PYTHONINSPECT=x) > and force prompts, even if stdin does not appear to be a terminal > -O : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x) > -OO : remove doc-strings in addition to the -O optimizations > -S : don't imply 'import site' on initialization > -t : issue warnings about inconsistent tab usage (-tt: issue errors) > -u : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x) > -v : verbose (trace import statements) (also PYTHONVERBOSE=x) > -x : skip first line of source, allowing use of non-Unix forms of #!cmd > -X : disable class based built-in exceptions > -c cmd : program passed in as string (terminates option list) > file : program read from script file > - : program read from stdin (default; interactive mode if a tty) > arg ...: arguments passed to program in sys.argv[1:] > Other environment variables: > PYTHONSTARTUP: file executed on interactive startup (no default) > PYTHONPATH : ':'-separated list of directories prefixed to the > default module search path. The result is sys.path. > PYTHONHOME : alternate <prefix> directory (or <prefix>:<exec_prefix>). > The default module search path uses <prefix>/python1.5. > [root at rocks14 birn-oracle1]# Tim Carlson wrote: > On Wed, 10 Dec 2003, V. Rowley wrote: > > Did you remove python by chance?
kickstart.cgi calls python directly in > /usr/bin/python while rocks-dist does an "env python" > > Tim > > >>Yep, I did that, but only *AFTER* getting the error. [Thought it was >>generated by the rocks-dist sequence, but apparently not.] Go ahead.
>>Move it back. Same difference. >> >>Vicky >> >>Mason J. Katz wrote: >> >>>It looks like someone moved the profiles directory to profiles.orig. >>> >>> -mjk >>> >>> >>>[root at rocks14 install]# ls -l >>>total 56 >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >>>drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 >>>ftp.rocksclusters.org >>>drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 >>>ftp.rocksclusters.org.orig >>>-r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi >>>drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >>>drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >>>drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >>>On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >>> >>> >>>>When I run this: >>>> >>>>[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >>>>rocks-dist --dist=cdrom cdrom >>>> >>>>on a server installed with ROCKS 3.0.0, I eventually get this: >>>> >>>> >>>>>Cleaning distribution >>>>>Resolving versions (RPMs) >>>>>Resolving versions (SRPMs) >>>>>Adding support for rebuild distribution from source >>>>>Creating files (symbolic links - fast) >>>>>Creating symlinks to kickstart files >>>>>Fixing Comps Database >>>>>Generating hdlist (rpm database) >>>>>Patching second stage loader (eKV, partioning, ...) >>>>> patching "rocks-ekv" into distribution ... >>>>> patching "rocks-piece-pipe" into distribution ... >>>>> patching "PyXML" into distribution ... >>>>> patching "expat" into distribution ... >>>>> patching "rocks-pylib" into distribution ... >>>>> patching "MySQL-python" into distribution ... >>>>> patching "rocks-kickstart" into distribution ... >>>>> patching "rocks-kickstart-profiles" into distribution ... >>>>> patching "rocks-kickstart-dtds" into distribution ... >>>>> building CRAM filesystem ...
>>>>>Cleaning distribution >>>>>Resolving versions (RPMs) >>>>>Resolving versions (SRPMs)
>>>>>Creating symlinks to kickstart files >>>>>Generating hdlist (rpm database) >>>>>Segregating RPMs (rocks, non-rocks) >>>>>sh: ./kickstart.cgi: No such file or directory >>>>>sh: ./kickstart.cgi: No such file or directory >>>>>Traceback (innermost last): >>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>>>> app.run() >>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>>> eval('self.command_%s()' % (command)) >>>>> File "<string>", line 0, in ? >>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>>> builder.build() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>>> (rocks, nonrocks) = self.segregateRPMS() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>>>>segregateRPMS >>>>> for pkg in ks.getSection('packages'): >>>>>TypeError: loop over non-sequence >>>> >>>> >>>>Any ideas? >>>> >>>>-- >>>>Vicky Rowley email: vrowley at ucsd.edu >>>>Biomedical Informatics Research Network work: (858) 536-5980 >>>>University of California, San Diego fax: (858) 822-0828 >>>>9500 Gilman Drive >>>>La Jolla, CA 92093-0715 >>>> >>>> >>>>See pictures from our trip to China at http://www.sagacitech.com/Chinaweb >>> >>> >>> >>-- >>Vicky Rowley email: vrowley at ucsd.edu >>Biomedical Informatics Research Network work: (858) 536-5980 >>University of California, San Diego fax: (858) 822-0828 >>9500 Gilman Drive >>La Jolla, CA 92093-0715 >> >> >>See pictures from our trip to China at http://www.sagacitech.com/Chinaweb >> >> > > > > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715
See pictures from our trip to China at http://www.sagacitech.com/Chinaweb --__--__-- _______________________________________________ npaci-rocks-discussion mailing list npaci-rocks-discussion at sdsc.edu http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion End of npaci-rocks-discussion Digest From naihh at imcb.a-star.edu.sg Thu Dec 11 00:09:34 2003 From: naihh at imcb.a-star.edu.sg (Nai Hong Hwa Francis) Date: Thu, 11 Dec 2003 16:09:34 +0800 Subject: [Rocks-Discuss]RE: Install rocks on Titan64 Superblade Classic with Dual Opteron 244 Message-ID: <5E118EED7CC277468A275F11EEEC39B94CCDBA@EXIMCB2.imcb.a-star.edu.sg> Hi, Has anyone successfully installed rocks on Titan64 Superblade Classic with Dual Opteron 244? Nai Hong Hwa Francis Institute of Molecular and Cell Biology (A*STAR) 30 Medical Drive Singapore 117609. DID: (65) 6874-6196 -----Original Message----- From: npaci-rocks-discussion-request at sdsc.edu [mailto:npaci-rocks-discussion-request at sdsc.edu] Sent: Thursday, December 11, 2003 11:54 AM To: npaci-rocks-discussion at sdsc.edu Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs Send npaci-rocks-discussion mailing list submissions to npaci-rocks-discussion at sdsc.edu To subscribe or unsubscribe via the World Wide Web, visit http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion or, via email, send a message with subject or body 'help' to
npaci-rocks-discussion-request at sdsc.edu You can reach the person managing the list at npaci-rocks-discussion-admin at sdsc.edu When replying, please edit your Subject line so it is more specific than "Re: Contents of npaci-rocks-discussion digest..." Today's Topics: 1. RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis) 2. Rocks 3.0.0 (Terrence Martin) 3. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley) --__--__-- Message: 1 Date: Thu, 11 Dec 2003 09:45:18 +0800 From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg> To: <npaci-rocks-discussion at sdsc.edu> Subject: [Rocks-Discuss]RE: Do you have a list of the various models of Gigabit Ethernet Interfaces compatible to Rocks 3? Hi All, Do you have a list of the various gigabit Ethernet interfaces that are compatible to Rocks 3? I am changing my nodes connectivity from 10/100 to 1000. Have anyone done that and how are the differences in performance or turnaround time? Have anyone successfully build a set of grid compute nodes using Rocks 3? Thanks and Regards Nai Hong Hwa Francis Institute of Molecular and Cell Biology (A*STAR) 30 Medical Drive Singapore 117609. DID: (65) 6874-6196 -----Original Message----- From: npaci-rocks-discussion-request at sdsc.edu [mailto:npaci-rocks-discussion-request at sdsc.edu] Sent: Thursday, December 11, 2003 9:25 AM To: npaci-rocks-discussion at sdsc.edu Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs Send npaci-rocks-discussion mailing list submissions to npaci-rocks-discussion at sdsc.edu
  • 122.
    To subscribe orunsubscribe via the World Wide Web, visit =09 http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion or, via email, send a message with subject or body 'help' to npaci-rocks-discussion-request at sdsc.edu You can reach the person managing the list at npaci-rocks-discussion-admin at sdsc.edu When replying, please edit your Subject line so it is more specific than "Re: Contents of npaci-rocks-discussion digest..." Today's Topics: 1. Non-homogenous legacy hardware (Chris Dwan (CCGB)) 2. Error during Make when building a new install floppy (Terrence Martin) 3. Re: Error during Make when building a new install floppy (Tim Carlson) 4. Re: Non-homogenous legacy hardware (Tim Carlson) 5. ssh_known_hosts and ganglia (Jag) 6. Re: ssh_known_hosts and ganglia (Mason J. Katz) 7. "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley) 8. Re: one node short in "labels" (Greg Bruno) 9. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Mason J. Katz) 10. Re: "TypeError: loop over non-sequence" when trying to build CD distro (V. Rowley) 11. Re: "TypeError: loop over non-sequence" when trying to build CD distro (Tim Carlson) -- __--__-- Message: 1 Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST) From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu> To: npaci-rocks-discussion at sdsc.edu Subject: [Rocks-Discuss]Non-homogenous legacy hardware I am integrating legacy systems into a ROCKS cluster, and have hit a snag with the auto-partition configuration: The new (old) systems have SCSI disks, while old (new) ones contain IDE. This is a non-issue so long as the initial install does its default partitioning. However, I have a "replace-auto-partition.xml" file which is unworkable for the SCSI based systems since it makes specific reference to "hda" rather than "sda." 
I would like to have a site-nodes/replace-auto-partition.xml file with a conditional such that "hda" or "sda" is used, based on the name of the node (or some other criterion). Is this possible? Thanks, in advance. If this is out there on the mailing list archives,
a pointer would be greatly appreciated. -Chris Dwan The University of Minnesota -- __--__-- Message: 2 Date: Wed, 10 Dec 2003 12:09:11 -0800 From: Terrence Martin <tmartin at physics.ucsd.edu> To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu> Subject: [Rocks-Discuss]Error during Make when building a new install floppy I get the following error when I try to rebuild a boot floppy for rocks. This is with the default CVS checkout, with an update today, according to the rocks userguide. I have not actually attempted to make any changes. make[3]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader' make[2]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3' strip -o loader anaconda-7.3/loader/loader strip: anaconda-7.3/loader/loader: No such file or directory make[1]: *** [loader] Error 1 make[1]: Leaving directory `/home/install/rocks/src/rocks/boot/7.3/loader' make: *** [loader] Error 2 Of course I could avoid all of this altogether and just put my binary module into the appropriate location in the boot image. Would it be correct to modify the following image file with my changes and then write it to a floppy via dd? /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img Basically I am injecting an updated e1000 driver with changes to pcitable to support the address of my gigabit cards. Terrence -- __--__-- Message: 3 Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]Error during Make when building a new install floppy To: Terrence Martin <tmartin at physics.ucsd.edu> Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu> Reply-to: Tim Carlson <tim.carlson at pnl.gov>
On Wed, 10 Dec 2003, Terrence Martin wrote: > I get the following error when I try to rebuild a boot floppy for rocks. > You can't make a boot floppy with Rocks 3.0. That isn't supported. Or at least it wasn't the last time I checked. > Of course I could avoid all of this altogether and just put my binary > module into the appropriate location in the boot image. > > Would it be correct to modify the following image file with my changes > and then write it to a floppy via dd? > > /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/7.3/en/os/i386/images/bootnet.img > > Basically I am injecting an updated e1000 driver with changes to > pcitable to support the address of my gigabit cards. Modifying the bootnet.img is about 1/3 of what you need to do if you go down that path. You also need to work on netstg1.img, and you'll need to update the driver in the kernel rpm that gets installed on the box. None of this is trivial. If it were me, I would go down the same path I took for updating the AIC79XX driver: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support -- __--__-- Message: 4 Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu> Cc: npaci-rocks-discussion at sdsc.edu Reply-to: Tim Carlson <tim.carlson at pnl.gov> On Wed, 10 Dec 2003, Chris Dwan (CCGB) wrote: > > I am integrating legacy systems into a ROCKS cluster, and have hit a > snag with the auto-partition configuration: The new (old) systems have > SCSI disks, while old (new) ones contain IDE. This is a non-issue so
> long as the initial install does its default partitioning. However, I > have a "replace-auto-partition.xml" file which is unworkable for the SCSI > based systems since it makes specific reference to "hda" rather than > "sda." If you have just a single drive, then you should be able to skip the "--ondisk" bits of your "part" command. Otherwise, you would first have to do something ugly like the following: http://penguin.epfl.ch/slides/kickstart/ks.cfg You could probably (maybe) wrap most of that in an <eval sh="bash"> </eval> block in the <main> block. Just guessing... haven't tried this. Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support -- __--__-- Message: 5 From: Jag <agrajag at dragaera.net> To: npaci-rocks-discussion at sdsc.edu Date: Wed, 10 Dec 2003 13:21:07 -0500 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia I noticed a previous post on this list (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the nodes over ganglia. Can anyone enlighten me as to how this is done? I looked through the ganglia docs and didn't see anything indicating how to do this, so I'm assuming Rocks made some changes. Unfortunately the rocks iso images don't seem to contain srpms, so I'm now coming here. What did Rocks do to ganglia to make the distribution of ssh keys work? Also, does anyone know where Rocks SRPMs can be found? I've done quite a bit of searching, but haven't found them anywhere. -- __--__-- Message: 6 Cc: npaci-rocks-discussion at sdsc.edu From: "Mason J. Katz" <mjk at sdsc.edu> Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia Date: Wed, 10 Dec 2003 14:39:15 -0800 To: Jag <agrajag at dragaera.net>
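[Editorial note] Tim's suggestion above — wrapping the partition commands in an <eval sh="bash"> block so the right disk is chosen at install time — could start from a small detection helper along these lines. This is a sketch only, untested against a real Rocks install; the function name and the idea of passing the partitions file as an argument (rather than hard-coding /proc/partitions) are mine, for testability:

```shell
#!/bin/sh
# Pick "sda" or "hda" from a /proc/partitions-style listing, so a
# kickstart fragment can emit the matching "--ondisk" value.
pick_disk() {
    parts="$1"
    if grep -q ' sda$' "$parts"; then
        echo sda
    elif grep -q ' hda$' "$parts"; then
        echo hda
    else
        echo unknown
    fi
}

# Inside the <eval> block one might then write, e.g.:
#   DISK=$(pick_disk /proc/partitions)
#   echo "part / --size 4096 --ondisk $DISK"
```

The "part" line above is illustrative; the real sizes and mount points would come from your existing replace-auto-partition.xml.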
Most of the SRPMS are on our FTP site, but we've screwed this up before. The SRPMS are entirely Rocks specific, so they are of little value outside of Rocks. You can also check out our CVS tree (cvs.rocksclusters.org), where rocks/src/ganglia shows what we add. We have a ganglia-python package we created to allow us to write our own metrics at a higher level than the provided gmetric application. We've also moved from this method to a single cluster-wide ssh key for Rocks 3.1. -mjk On Dec 10, 2003, at 10:21 AM, Jag wrote: > I noticed a previous post on this list > (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001934.html) indicating that Rocks distributes ssh keys for all the > nodes over > ganglia. Can anyone enlighten me as to how this is done? > > I looked through the ganglia docs and didn't see anything indicating > how > to do this, so I'm assuming Rocks made some changes. Unfortunately the > rocks iso images don't seem to contain srpms, so I'm now coming here. > What did Rocks do to ganglia to make the distribution of ssh keys work? > > Also, does anyone know where Rocks SRPMs can be found? I've done quite > a bit of searching, but haven't found them anywhere.
> Creating files (symbolic links - fast) > Creating symlinks to kickstart files > Fixing Comps Database > Generating hdlist (rpm database) > Patching second stage loader (eKV, partioning, ...) > patching "rocks-ekv" into distribution ... > patching "rocks-piece-pipe" into distribution ... > patching "PyXML" into distribution ... > patching "expat" into distribution ... > patching "rocks-pylib" into distribution ... > patching "MySQL-python" into distribution ... > patching "rocks-kickstart" into distribution ... > patching "rocks-kickstart-profiles" into distribution ... > patching "rocks-kickstart-dtds" into distribution ... > building CRAM filesystem ... > Cleaning distribution > Resolving versions (RPMs) > Resolving versions (SRPMs) > Creating symlinks to kickstart files > Generating hdlist (rpm database) > Segregating RPMs (rocks, non-rocks) > sh: ./kickstart.cgi: No such file or directory > sh: ./kickstart.cgi: No such file or directory > Traceback (innermost last): > File "/opt/rocks/bin/rocks-dist", line 807, in ? > app.run() > File "/opt/rocks/bin/rocks-dist", line 623, in run > eval('self.command_%s()' % (command)) > File "<string>", line 0, in ? > File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom > builder.build() > File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > (rocks, nonrocks) = self.segregateRPMS() > File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS > for pkg in ks.getSection('packages'): > TypeError: loop over non-sequence Any ideas? 
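[Editorial note] The two "sh: ./kickstart.cgi: No such file or directory" lines just before the traceback suggest rocks-dist could not run kickstart.cgi from its working directory. A quick, hedged sanity check — the helper name is mine, and the ./kickstart.cgi invocation detail is inferred from the error text, not confirmed from the rocks-dist source:

```shell
#!/bin/sh
# Report whether kickstart.cgi is present and executable in a given
# directory; the error above implies rocks-dist invokes it as
# ./kickstart.cgi, so it must exist relative to the working directory.
check_kickstart_cgi() {
    if [ -x "$1/kickstart.cgi" ]; then
        echo "ok: $1/kickstart.cgi"
    else
        echo "missing or not executable: $1/kickstart.cgi"
    fi
}

# e.g.  check_kickstart_cgi /home/install
```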
-- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb -- __--__-- Message: 8 Cc: rocks <npaci-rocks-discussion at sdsc.edu> From: Greg Bruno <bruno at rocksclusters.org> Subject: Re: [Rocks-Discuss]one node short in "labels" Date: Wed, 10 Dec 2003 15:12:49 -0800
To: Vincent Fox <vincent_b_fox at yahoo.com> > So I go to the "labels" selection on the web page to print out the > pretty labels. What a nice idea by the way! > > EXCEPT....it's one node short! I go up to 0-13 and this stops at > 0-12. Any ideas where I should check to fix this? yeah, we found this corner case -- it'll be fixed in the next release. thanks for the bug report. - gb -- __--__-- Message: 9 Cc: npaci-rocks-discussion at sdsc.edu From: "Mason J. Katz" <mjk at sdsc.edu> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro Date: Wed, 10 Dec 2003 15:16:27 -0800 To: "V. Rowley" <vrowley at ucsd.edu> It looks like someone moved the profiles directory to profiles.orig. -mjk [root at rocks14 install]# ls -l total 56 drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 ftp.rocksclusters.org drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 ftp.rocksclusters.org.orig -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > When I run this: > > [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > rocks-dist --dist=cdrom cdrom > > on a server installed with ROCKS 3.0.0, I eventually get this: > >> Cleaning distribution >> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Adding support for rebuild distribution from source >> Creating files (symbolic links - fast)
    >> Creating symlinksto kickstart files >> Fixing Comps Database >> Generating hdlist (rpm database) >> Patching second stage loader (eKV, partioning, ...) >> patching "rocks-ekv" into distribution ... >> patching "rocks-piece-pipe" into distribution ... >> patching "PyXML" into distribution ... >> patching "expat" into distribution ... >> patching "rocks-pylib" into distribution ... >> patching "MySQL-python" into distribution ... >> patching "rocks-kickstart" into distribution ... >> patching "rocks-kickstart-profiles" into distribution ... >> patching "rocks-kickstart-dtds" into distribution ... >> building CRAM filesystem ... >> Cleaning distribution >> Resolving versions (RPMs) >> Resolving versions (SRPMs) >> Creating symlinks to kickstart files >> Generating hdlist (rpm database) >> Segregating RPMs (rocks, non-rocks) >> sh: ./kickstart.cgi: No such file or directory >> sh: ./kickstart.cgi: No such file or directory >> Traceback (innermost last): >> File "/opt/rocks/bin/rocks-dist", line 807, in ? >> app.run() >> File "/opt/rocks/bin/rocks-dist", line 623, in run >> eval('self.command_%s()' % (command)) >> File "<string>", line 0, in ? >> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >> builder.build() >> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >> (rocks, nonrocks) =3D self.segregateRPMS() >> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in=20 >> segregateRPMS >> for pkg in ks.getSection('packages'): >> TypeError: loop over non-sequence > > Any ideas? > > --=20 > Vicky Rowley email: vrowley at ucsd.edu > Biomedical Informatics Research Network work: (858) 536-5980 > University of California, San Diego fax: (858) 822-0828 > 9500 Gilman Drive > La Jolla, CA 92093-0715 > > > See pictures from our trip to China at=20 > http://www.sagacitech.com/Chinaweb -- __--__-- Message: 10 Date: Wed, 10 Dec 2003 16:50:16 -0800 From: "V. Rowley" <vrowley at ucsd.edu> To: "Mason J. 
Katz" <mjk at sdsc.edu> CC: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when
    trying to buildCD distro Yep, I did that, but only *AFTER* getting the error. [Thought it was=20 generated by the rocks-dist sequence, but apparently not.] Go ahead.=20 Move it back. Same difference. Vicky Mason J. Katz wrote: > It looks like someone moved the profiles directory to profiles.orig. >=20 > -mjk >=20 >=20 > [root at rocks14 install]# ls -l > total 56 > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07=20 > ftp.rocksclusters.org > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38=20 > ftp.rocksclusters.org.orig > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >=20 >> When I run this: >> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;=20 >> rocks-dist --dist=3Dcdrom cdrom >> >> on a server installed with ROCKS 3.0.0, I eventually get this: >> >>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Adding support for rebuild distribution from source >>> Creating files (symbolic links - fast) >>> Creating symlinks to kickstart files >>> Fixing Comps Database >>> Generating hdlist (rpm database) >>> Patching second stage loader (eKV, partioning, ...) >>> patching "rocks-ekv" into distribution ... >>> patching "rocks-piece-pipe" into distribution ... >>> patching "PyXML" into distribution ... >>> patching "expat" into distribution ... >>> patching "rocks-pylib" into distribution ... >>> patching "MySQL-python" into distribution ... >>> patching "rocks-kickstart" into distribution ... >>> patching "rocks-kickstart-profiles" into distribution ... 
>>> patching "rocks-kickstart-dtds" into distribution ... >>> building CRAM filesystem ... >>> Cleaning distribution
    >>> Resolving versions(RPMs) >>> Resolving versions (SRPMs) >>> Creating symlinks to kickstart files >>> Generating hdlist (rpm database) >>> Segregating RPMs (rocks, non-rocks) >>> sh: ./kickstart.cgi: No such file or directory >>> sh: ./kickstart.cgi: No such file or directory >>> Traceback (innermost last): >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>> app.run() >>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>> eval('self.command_%s()' % (command)) >>> File "<string>", line 0, in ? >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>> builder.build() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>> (rocks, nonrocks) =3D self.segregateRPMS() >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in=20 >>> segregateRPMS >>> for pkg in ks.getSection('packages'): >>> TypeError: loop over non-sequence >> >> >> Any ideas? >> >> --=20 >> Vicky Rowley email: vrowley at ucsd.edu >> Biomedical Informatics Research Network work: (858) 536-5980 >> University of California, San Diego fax: (858) 822-0828 >> 9500 Gilman Drive >> La Jolla, CA 92093-0715 >> >> >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb >=20 >=20 >=20 --=20 Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715 See pictures from our trip to China at http://www.sagacitech.com/Chinaweb -- __--__-- Message: 11 Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST) From: Tim Carlson <tim.carlson at pnl.gov> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro
    To: "V. Rowley"<vrowley at ucsd.edu> Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu Reply-to: Tim Carlson <tim.carlson at pnl.gov> On Wed, 10 Dec 2003, V. Rowley wrote: Did you remove python by chance? kickstart.cgi calls python directly in /usr/bin/python while rocks-dist does an "env python" Tim > Yep, I did that, but only *AFTER* getting the error. [Thought it was > generated by the rocks-dist sequence, but apparently not.] Go ahead. > Move it back. Same difference. > > Vicky > > Mason J. Katz wrote: > > It looks like someone moved the profiles directory to profiles.orig. > > > > -mjk > > > > > > [root at rocks14 install]# ls -l > > total 56 > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom > > drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 > > ftp.rocksclusters.org > > drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 > > ftp.rocksclusters.org.orig > > -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi > > drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist > > drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig > > drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src > > drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo > > On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: > > > >> When I run this: > >> > >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; > >> rocks-dist --dist=3Dcdrom cdrom > >> > >> on a server installed with ROCKS 3.0.0, I eventually get this: > >> > >>> Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Adding support for rebuild distribution from source > >>> Creating files (symbolic links - fast) > >>> Creating symlinks to kickstart files > >>> Fixing Comps Database > >>> Generating hdlist (rpm database) > >>> Patching second stage loader (eKV, partioning, ...) > >>> patching "rocks-ekv" into distribution ...
    > >>> patching "rocks-piece-pipe" into distribution ... > >>> patching "PyXML" into distribution ... > >>> patching "expat" into distribution ... > >>> patching "rocks-pylib" into distribution ... > >>> patching "MySQL-python" into distribution ... > >>> patching "rocks-kickstart" into distribution ... > >>> patching "rocks-kickstart-profiles" into distribution ... > >>> patching "rocks-kickstart-dtds" into distribution ... > >>> building CRAM filesystem ... > >>> Cleaning distribution > >>> Resolving versions (RPMs) > >>> Resolving versions (SRPMs) > >>> Creating symlinks to kickstart files > >>> Generating hdlist (rpm database) > >>> Segregating RPMs (rocks, non-rocks) > >>> sh: ./kickstart.cgi: No such file or directory > >>> sh: ./kickstart.cgi: No such file or directory > >>> Traceback (innermost last): > >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? > >>> app.run() > >>> File "/opt/rocks/bin/rocks-dist", line 623, in run > >>> eval('self.command_%s()' % (command)) > >>> File "<string>", line 0, in ? > >>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom > >>> builder.build() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build > >>> (rocks, nonrocks) =3D self.segregateRPMS() > >>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in > >>> segregateRPMS > >>> for pkg in ks.getSection('packages'): > >>> TypeError: loop over non-sequence > >> > >> > >> Any ideas? 
> >> > >> -- > >> Vicky Rowley email: vrowley at ucsd.edu > >> Biomedical Informatics Research Network work: (858) 536-5980 > >> University of California, San Diego fax: (858) 822-0828 > >> 9500 Gilman Drive > >> La Jolla, CA 92093-0715 > >> > >> > >> See pictures from our trip to China at http://www.sagacitech.com/Chinaweb > > > > > > > > -- > Vicky Rowley email: vrowley at ucsd.edu > Biomedical Informatics Research Network work: (858) 536-5980 > University of California, San Diego fax: (858) 822-0828 > 9500 Gilman Drive > La Jolla, CA 92093-0715 > > > See pictures from our trip to China at http://www.sagacitech.com/Chinaweb
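[Editorial note] Tim's observation in this thread — kickstart.cgi hard-codes /usr/bin/python while rocks-dist runs whatever `env python` finds first on $PATH — can be checked by comparing the script's shebang with the resolved interpreter. A small sketch; the helper is mine, not part of Rocks:

```shell
#!/bin/sh
# Print the interpreter named on the first (shebang) line of a script,
# so it can be compared with `command -v python` (what "env python"
# would resolve to). Prints nothing if the file has no shebang.
shebang_of() {
    head -n 1 "$1" | sed -n 's/^#! *//p'
}

# Example (paths assumed):
#   shebang_of /home/install/kickstart.cgi
#   command -v python
```

If the two paths differ, the two tools may be running different Python installations, which would explain one of them failing while the other works.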
> > -- __--__-- _______________________________________________ npaci-rocks-discussion mailing list npaci-rocks-discussion at sdsc.edu http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion End of npaci-rocks-discussion Digest DISCLAIMER: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person as it may be an offence under the Official Secrets Act. Thank you. --__--__-- Message: 2 Date: Wed, 10 Dec 2003 18:03:41 -0800 From: Terrence Martin <tmartin at physics.ucsd.edu> To: npaci-rocks-discussion at sdsc.edu Subject: [Rocks-Discuss]Rocks 3.0.0 I am having a problem on install of rocks 3.0.0 on my new cluster. The python error occurs right after anaconda starts and just before the install asks for the roll CDROM. The error refers to an inability to find or load rocks.file. The error is associated, I think, with the window that pops up and asks you to put the roll CDROM in. The process I followed to get to this point is: Put the Rocks 3.0.0 CDROM into the CDROM drive. Boot the system. At the prompt, type frontend. Wait till anaconda starts. Error referring to being unable to load rocks.file. I have successfully installed rocks on a smaller cluster, but that has different hardware. I used the same CDROM for both installs. Any thoughts? Terrence --__--__--
    Message: 3 Date: Wed,10 Dec 2003 19:52:49 -0800 From: "V. Rowley" <vrowley at ucsd.edu> To: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when trying to build CD distro Looks like python is okay: > [root at rocks14 birn-oracle1]# which python > /usr/bin/python > [root at rocks14 birn-oracle1]# python --help > Unknown option: -- > usage: python [option] ... [-c cmd | file | -] [arg] ... > Options and arguments (and corresponding environment variables): > -d : debug output from parser (also PYTHONDEBUG=x) > -i : inspect interactively after running script, (also PYTHONINSPECT=x) > and force prompts, even if stdin does not appear to be a terminal > -O : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x) > -OO : remove doc-strings in addition to the -O optimizations > -S : don't imply 'import site' on initialization > -t : issue warnings about inconsistent tab usage (-tt: issue errors) > -u : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x) > -v : verbose (trace import statements) (also PYTHONVERBOSE=x) > -x : skip first line of source, allowing use of non-Unix forms of #!cmd > -X : disable class based built-in exceptions > -c cmd : program passed in as string (terminates option list) > file : program read from script file > - : program read from stdin (default; interactive mode if a tty) > arg ...: arguments passed to program in sys.argv[1:] > Other environment variables: > PYTHONSTARTUP: file executed on interactive startup (no default) > PYTHONPATH : ':'-separated list of directories prefixed to the > default module search path. The result is sys.path. > PYTHONHOME : alternate <prefix> directory (or <prefix>:<exec_prefix>). > The default module search path uses <prefix>/python1.5. > [root at rocks14 birn-oracle1]# Tim Carlson wrote: > On Wed, 10 Dec 2003, V. Rowley wrote: > > Did you remove python by chance? 
kickstart.cgi calls python directly in > /usr/bin/python while rocks-dist does an "env python" > > Tim > > >>Yep, I did that, but only *AFTER* getting the error. [Thought it was >>generated by the rocks-dist sequence, but apparently not.] Go ahead.
    >>Move it back.Same difference. >> >>Vicky >> >>Mason J. Katz wrote: >> >>>It looks like someone moved the profiles directory to profiles.orig. >>> >>> -mjk >>> >>> >>>[root at rocks14 install]# ls -l >>>total 56 >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >>>drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 >>>ftp.rocksclusters.org >>>drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 >>>ftp.rocksclusters.org.orig >>>-r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi >>>drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >>>drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 rocks-dist.orig >>>drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >>>drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >>>On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >>> >>> >>>>When I run this: >>>> >>>>[root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >>>>rocks-dist --dist=cdrom cdrom >>>> >>>>on a server installed with ROCKS 3.0.0, I eventually get this: >>>> >>>> >>>>>Cleaning distribution >>>>>Resolving versions (RPMs) >>>>>Resolving versions (SRPMs) >>>>>Adding support for rebuild distribution from source >>>>>Creating files (symbolic links - fast) >>>>>Creating symlinks to kickstart files >>>>>Fixing Comps Database >>>>>Generating hdlist (rpm database) >>>>>Patching second stage loader (eKV, partioning, ...) >>>>> patching "rocks-ekv" into distribution ... >>>>> patching "rocks-piece-pipe" into distribution ... >>>>> patching "PyXML" into distribution ... >>>>> patching "expat" into distribution ... >>>>> patching "rocks-pylib" into distribution ... >>>>> patching "MySQL-python" into distribution ... >>>>> patching "rocks-kickstart" into distribution ... >>>>> patching "rocks-kickstart-profiles" into distribution ... >>>>> patching "rocks-kickstart-dtds" into distribution ... >>>>> building CRAM filesystem ... 
>>>>>Cleaning distribution >>>>>Resolving versions (RPMs) >>>>>Resolving versions (SRPMs)
    >>>>>Creating symlinks tokickstart files >>>>>Generating hdlist (rpm database) >>>>>Segregating RPMs (rocks, non-rocks) >>>>>sh: ./kickstart.cgi: No such file or directory >>>>>sh: ./kickstart.cgi: No such file or directory >>>>>Traceback (innermost last): >>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>>>> app.run() >>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>>> eval('self.command_%s()' % (command)) >>>>> File "<string>", line 0, in ? >>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>>> builder.build() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>>> (rocks, nonrocks) = self.segregateRPMS() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>>>>segregateRPMS >>>>> for pkg in ks.getSection('packages'): >>>>>TypeError: loop over non-sequence >>>> >>>> >>>>Any ideas? >>>> >>>>-- >>>>Vicky Rowley email: vrowley at ucsd.edu >>>>Biomedical Informatics Research Network work: (858) 536-5980 >>>>University of California, San Diego fax: (858) 822-0828 >>>>9500 Gilman Drive >>>>La Jolla, CA 92093-0715 >>>> >>>> >>>>See pictures from our trip to China at http://www.sagacitech.com/Chinaweb >>> >>> >>> >>-- >>Vicky Rowley email: vrowley at ucsd.edu >>Biomedical Informatics Research Network work: (858) 536-5980 >>University of California, San Diego fax: (858) 822-0828 >>9500 Gilman Drive >>La Jolla, CA 92093-0715 >> >> >>See pictures from our trip to China at http://www.sagacitech.com/Chinaweb >> >> > > > > -- Vicky Rowley email: vrowley at ucsd.edu Biomedical Informatics Research Network work: (858) 536-5980 University of California, San Diego fax: (858) 822-0828 9500 Gilman Drive La Jolla, CA 92093-0715
See pictures from our trip to China at http://www.sagacitech.com/Chinaweb --__--__-- _______________________________________________ npaci-rocks-discussion mailing list npaci-rocks-discussion at sdsc.edu http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion End of npaci-rocks-discussion Digest From wyzhong78 at msn.com Thu Dec 11 07:27:39 2003 From: wyzhong78 at msn.com (zhong wenyu) Date: Thu, 11 Dec 2003 23:27:39 +0800 Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node? Message-ID: <BAY3-F25UBUhr3ukkwu000156fe@hotmail.com> I have built a rocks cluster with four dual-Xeon computers to run namd: one frontend and the other three as compute nodes. With Intel's hyper-threading technology I have 16 CPUs in all. Now I have some trouble; maybe someone can help me. I created the PBS script below, named mytask. #!/bin/csh #PBS -N NAMD #PBS -m be #PBS -l ncpus=8 #PBS -l nodes=2 # cd $PBS_O_WORKDIR /charmrun namd2 +p8 mytask.namd I typed: qsub mytask qrun N then I used qstat -f N. The message feedback showed (I'm sorry, I can't copy the original message, just the meaning): host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1 cpu used: 8 It's strange: why 4 hosts and 8 CPUs used?
But when I saw ganglia (the cluster status), it showed me only one node used (for example, compute-0-0); the other two are idle. I want to know whether the job was being done by one node or two. So I created a new task specified to compute-0-1; the message feedback showed no resource available. When the task ended, I checked the information and found that the CPU time per step is half that of 4 CPUs (1 node), but the whole time (including wall time) is equal. Does my namd job get allocated to each node? Please help me! thanks _________________________________________________________________ MSN Messenger: http://messenger.msn.com/cn From bruno at rocksclusters.org Thu Dec 11 07:55:17 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Thu, 11 Dec 2003 07:55:17 -0800 Subject: [Rocks-Discuss]ATLAS rpm build problems on PII platform In-Reply-To: <20031211064321.41781.qmail@web14801.mail.yahoo.com> References: <20031211064321.41781.qmail@web14801.mail.yahoo.com> Message-ID: <6A67C95F-2BF2-11D8-B821-000A95C4E3B4@rocksclusters.org> outstanding -- thanks for the patch! i just committed the change to cvs. the fix will be reflected in the upcoming release (or immediately for anyone who has the rocks source tree checked out on their local frontend). - gb On Dec 10, 2003, at 10:43 PM, Vincent Fox wrote: > Okay, here's the context diff as plain text. I test-applied it using > "patch -p0 < atlas.patch" and did a compile on my PII box > successfully. I can send it as attachment or submit to CVS or some > other way if you need: > > *** atlas.spec.in.orig Thu Dec 11 06:27:13 2003 > --- atlas.spec.in Thu Dec 11 06:30:46 2003 > *************** > *** 111,117 **** > --- 111,133 ---- > y > " | make > + elif [ $CPUID -eq 4 ] > + then > + # > + # Pentium II > + # > + echo "0 > + y > + y > + n > + y > + linux
> + 0 > + /usr/bin/g77 > + -O > + y > + " | make > else > # > > > Greg Bruno <bruno at rocksclusters.org> wrote: > > Okay, came up with my own quick hack: > > > > Edit atlas.spec.in, go to "other x86" section, remove > > 2 lines right above "linux", seems to make rpm now. > > > > A more formal patch would be to put in a section for > > cpuid eq 4 with this correction, I suppose. > > if you provide the patch, we'll include it in our next release. > > - gb > > Do you Yahoo!? > New Yahoo! Photos - easier uploading and sharing From phil at sdsc.edu Thu Dec 11 08:00:06 2003 From: phil at sdsc.edu (Philip Papadopoulos) Date: Thu, 11 Dec 2003 12:00:06 -0400 Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node? Message-ID: <1920451470-1071158479-cardhu_blackberry.rim.net-21416-@engine05> The important thing to understand is that PBS only gives an allocation of nodes (listed in the PBS_NODES environment variable) when the job is run. It is the user's responsibility to actually start the code on multiple nodes. This is the way PBS works on all platforms, not just Rocks. PBS will start the submitted code (usually a script) on the first node listed in PBS_NODES. This environment variable is only available once the queued job is running. Your mytask script must explicitly start on the allocated nodes. PBS (actually maui) will pack jobs onto nodes by default, so allocating 8 cpu jobs to four nodes is normal, but changeable. -p -----Original Message----- From: "zhong wenyu" <wyzhong78 at msn.com> Date: Thu, 11 Dec 2003 23:27:39 To:npaci-rocks-discussion at sdsc.edu Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node? I have built a rocks cluster with four dual-Xeon computers to run namd: one frontend and the other three as compute nodes. With Intel's hyper-threading technology I have 16 CPUs in all. Now I have some trouble; maybe someone can help me. I created the PBS script below, named mytask. #!/bin/csh #PBS -N NAMD
#PBS -m be
#PBS -l ncpus=8
#PBS -l nodes=2
#
cd $PBS_O_WORKDIR
/charmrun namd2 +p8 mytask.namd

i typed: qsub mytask qrun N then i use qstat -f N the message feedback showed(i'm sorry i can't copy the orgin message,just the meaning) host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1 cpu used: 8 it's strange why 4 hosts and 8 cpu used? but when i saw ganlia, the cluster status. it show me only one node used (fore example ,compute-0-0).both the other two are idle. i want to know whether the job was doing by one or two node. so i creat a new task specify to compute-0-1,message feedback show no resource availabe. while the task ended,i checked the information, found that the cpu time per step is half of 4 cpus (1 nodes),but the whole time(include wall time) is equal. Does my namd job allocate to each node? please help me! thanks _________________________________________________________________ MSN Messenger: http://messenger.msn.com/cn Sent via BlackBerry - a service from AT&T Wireless. From jlkaiser at fnal.gov Thu Dec 11 08:28:08 2003 From: jlkaiser at fnal.gov (Joe Kaiser) Date: Thu, 11 Dec 2003 10:28:08 -0600 Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ... In-Reply-To: <1071007177.18100.58.camel@squash.scalableinformatics.com> References: <1071007177.18100.58.camel@squash.scalableinformatics.com> Message-ID: <1071160088.18486.25.camel@nietzsche.fnal.gov> Hi, I'm sorry, I thought I sent email to the list reporting how I did this. You have not said what motherboard you are using or what the error exactly is. The instructions below are for the X5DPA-GG and the error isn't reported as an error, I just get prompted to insert my driver. If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to make a change to the pcitable on the initrd.img. The current pcitable on the initrd.img does NOT have the proper deviceId for the e1000 for this board.
If you look in /etc/sysconfig/hwconf and search for the
e1000, you will find this:

class: NETWORK
bus: PCI
detached: 0
device: eth
driver: e1000
desc: "Unknown vendor|Generic e1000 device"
vendorId: 8086
deviceId: 1013
subVendorId: 8086
subDeviceId: 1213
pciType: 1

The device ID is 1013. If you look in the pcitable that comes off of the initrd.img you will see that the highest the e1000 device id's go is 1012. Just add in the proper line to the initrd.img in your /tftpboot directory and it should work. Instructions are below.

Here are the instructions: This should be done on the frontend:

cd /tftpboot/X86PC/UNDI/pxelinux/
cp initrd.img initrd.img.orig
cp initrd.img /tmp
cd /tmp
mv initrd.img initrd.gz
gunzip initrd.gz
mkdir /mnt/loop
mount -o loop initrd /mnt/loop
cd /mnt/loop/modules/
vi pcitable

Search for the e1000 drivers and add the following line:

0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet Controller"

write the file

cd /tmp
umount /mnt/loop
gzip initrd
mv initrd.gz initrd.img
mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/

Then boot the node. Hope this helps. Thanks, Joe

On Tue, 2003-12-09 at 15:59, Joe Landman wrote: > Folks: > > As indicated previously, I am wrestling with a Supermicro based
> cluster. None of the RH distributions come with the correct E1000 > driver, so a new kernel is needed (in the boot CD, and for > installation). > > The problem I am running into is that it isn't at all obvious/easy how > to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable > this thing to work. Following the examples in the documentation have > not met with success. Running "rocks-dist cdrom" with the new kernels > (2.4.23 works nicely on the nodes) in the force/RPMS directory generates > a bootable CD with the original 2.4.18BOOT kernel. > > What I (and I think others) need, is a simple/easy to follow method > that will generate a bootable CD with the correct linux kernel, and the > correct modules. > > Is this in process somewhere? What would be tremendously helpful is > if we can generate a binary module, and put that into the boot process > by placing it into the force/modules/binary directory (assuming one > exists) with the appropriate entry of a similar name in the > force/modules/meta directory as a simple XML document giving pci-ids, > description, name, etc. > > Anything close to this coming? Modules are killing future ROCKS > installs, the inability to easily inject a new module in there has > created a problem whereby ROCKS does not function (as the underlying RH > does not function). > > > -- =================================================================== Joe Kaiser - Systems Administrator Fermi Lab CD/OSS-SCS Never laugh at live dragons. 
630-840-6444 jlkaiser at fnal.gov =================================================================== From jghobrial at uh.edu Thu Dec 11 08:41:42 2003 From: jghobrial at uh.edu (Joseph) Date: Thu, 11 Dec 2003 10:41:42 -0600 (CST) Subject: [Rocks-Discuss]Re: Rocks Pythone Error with rocks.file In-Reply-To: <3FD82F68.9070600@physics.ucsd.edu> References: <3FD82F68.9070600@physics.ucsd.edu> Message-ID: <Pine.LNX.4.56.0312111001150.9106@mail.tlc2.uh.edu> On Thu, 11 Dec 2003, Terrence Martin wrote: > I am having the exact same error that you reported to the list on my > cluster when I try to install rocks 3.0.0. > > X tries to start, fails, then just before the HPC roll is supposed to > start I get the python error about not being able to load the rocks.file. > > The thing is that my system is a dual Xeon supermicro not AMD, so it > must not be an AMD specific issue.
> > Did you ever find a resolution to the problem? > > Thanks, > > Terrence > Yes, I guess you should check your memory as Greg suggests, but my solution was to install the frontend on a different machine and then take the HD back to the original frontend. The only problem that I had was that the build box was a single processor setup so when I went back to the dual-AMD pvfs fails because it was built against a non-SMP kernel. I installed the SMP kernel and noticed this problem. It seems the problem may be related to an SMP issue do to the fact we both have an SMP setup. I did not check the frontend's memory so this may still be a factor, but I have had no trouble with the box after the installation. My initial problem was a booting problem on the frontend due to a cdrom issue. All my other attempts at installing failed with the error you mentioned, but as I posted early I tried 3 different AMD single processor boxes and they failed. The boxes are up all the time and stressed pretty hard so I don't believe it is a memory issue. This is some very strange behaviour. Thanks, Joseph From shewa at inel.gov Thu Dec 11 10:02:59 2003 From: shewa at inel.gov (Andrew Shewmaker) Date: Thu, 11 Dec 2003 11:02:59 -0700 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia Message-ID: <3FD8B153.6000205@inel.gov> "Mason J. Katz" <mjk at sdsc.edu> wrote: > We've also moved from this method to a single cluster-wide ssh key for > Rocks 3.1. How does a single key work? I have successfully set up ssh host based authentication for some non-Rocks systems using http://www.omega.telia.net/vici/openssh/ (Note that OpenSSH_3.7.1p2 requires one more setting in addition to those mentioned in the above url. In <dir-of-ssh-conf-files>/ssh_config: EnableSSHKeysign yes) But I thought it still requires that each host in the has a key... am I wrong? Do you do it differently? Thanks,
Andrew -- Andrew Shewmaker, Associate Engineer Phone: 1-208-526-1415 Idaho National Eng. and Environmental Lab. P.O. Box 1625, M.S. 3605 Idaho Falls, Idaho 83415-3605 From tmartin at physics.ucsd.edu Thu Dec 11 11:13:16 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Thu, 11 Dec 2003 11:13:16 -0800 Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ... In-Reply-To: <1071160088.18486.25.camel@nietzsche.fnal.gov> References: <1071007177.18100.58.camel@squash.scalableinformatics.com> <1071160088.18486.25.camel@nietzsche.fnal.gov> Message-ID: <3FD8C1CC.20700@physics.ucsd.edu> Hi Joe, Do you know if 2.3.2 can also benefit from the same small change? Terrence Joe Kaiser wrote: > Hi, > > I'm sorry, I thought I sent email to the list reporting how I did this. > > You have not said what motherboard you are using or what the error > exactly is. The instructions below are for the X5DPA-GG and the error > isn't reported as an error, I just get prompted to insert my driver. > > If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to > make a change to the pcitable on the initrd.img. The current pcitable > on the initrd.img does NOT have the proper deviceId for the e1000 for > this board. If you look in /etc/sysconfig/hwconf and search for the > e1000, you will find this: > > class: NETWORK > bus: PCI > detached: 0 > device: eth > driver: e1000 > desc: "Unknown vendor|Generic e1000 device" > vendorId: 8086 > deviceId: 1013 > subVendorId: 8086 > subDeviceId: 1213 > pciType: 1 > > The device ID is 1013. If you look in the pcitable that comes off of > the initrd.img you will see that the highest the e1000 device id's go is > 1012. Just add in the proper line to the initrd.img in your /tftpboot > directory and it should work. Instructions are below.
    > > Here arethe instructions: > > This should be done on the frontend: > > cd /tftpboot/X86PC/UNDI/pxelinux/ > cp initrd.img initrd.img.orig > cp initrd.img /tmp > cd /tmp > mv initrd.img initrd.gz > gunzip initrd.gz > mkdir /mnt/loop > mount -o loop initrd /mnt/loop > cd /mnt/loop/modules/ > vi pcitable > > Search for the e1000 drivers and add the following line: > > 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet > Controller" > > write the file > > cd /tmp > umount /mnt/loop > gzip initrd > mv initrd.gz initrd.img > mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/ > > Then boot the node. > > Hope this helps. > > Thanks, > > Joe > > On Tue, 2003-12-09 at 15:59, Joe Landman wrote: > >>Folks: >> >> As indicated previously, I am wrestling with a Supermicro based >>cluster. None of the RH distributions come with the correct E1000 >>driver, so a new kernel is needed (in the boot CD, and for >>installation). >> >> The problem I am running into is that it isn't at all obvious/easy how >>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable >>this thing to work. Following the examples in the documentation have >>not met with success. Running "rocks-dist cdrom" with the new kernels >>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates >>a bootable CD with the original 2.4.18BOOT kernel. >> >> What I (and I think others) need, is a simple/easy to follow method >>that will generate a bootable CD with the correct linux kernel, and the >>correct modules. >> >> Is this in process somewhere? What would be tremendously helpful is >>if we can generate a binary module, and put that into the boot process
    >>by placing itinto the force/modules/binary directory (assuming one >>exists) with the appropriate entry of a similar name in the >>force/modules/meta directory as a simple XML document giving pci-ids, >>description, name, etc. >> >> Anything close to this coming? Modules are killing future ROCKS >>installs, the inability to easily inject a new module in there has >>created a problem whereby ROCKS does not function (as the underlying RH >>does not function). >> >> >> From tmartin at physics.ucsd.edu Thu Dec 11 11:19:55 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Thu, 11 Dec 2003 11:19:55 -0800 Subject: [Rocks-Discuss]Re: Rocks Pythone Error with rocks.file In-Reply-To: <Pine.LNX.4.56.0312111001150.9106@mail.tlc2.uh.edu> References: <3FD82F68.9070600@physics.ucsd.edu> <Pine.LNX.4.56.0312111001150.9106@mail.tlc2.uh.edu> Message-ID: <3FD8C35B.2090309@physics.ucsd.edu> I am fairly certain it is not the memory even without memtest86. I have in my office the same Supermicro 613A-Xi (SB-613A-Xi-B) with a SUPER X5DPA-GG motherboard as the ones at the SDSC but it is from a different vendor and completely different ram from another manufacturer. When I put rocks 3.0.0 into it I get the crash of the installer in the same spot, right after the system attempts to start Xwindows and fails (either it fails because it just fails to start X or if a mouse is not present) a python error comes up complaining that the rocks.file could not be found. On the exact same system rocks 2.3.2 installs fine. Terrence Joseph wrote: > On Thu, 11 Dec 2003, Terrence Martin wrote: > > >>I am having the exact same error that you reported to the list on my >>cluster when I try to install rocks 3.0.0. >> >>X tries to start, fails, then just before the HPC roll is supposed to >>start I get the python error about not being able to load the rocks.file. >> >>The thing is that my system is a dual Xeon supermicro not AMD, so it >>must not be an AMD specific issue. 
>> >>Did you ever find a resolution to the problem? >> >>Thanks, >> >>Terrence >>
    > > > Yes, Iguess you should check your memory as Greg suggests, but my > solution was to install the frontend on a different machine and then take > the HD back to the original frontend. The only problem that I had was that > the build box was a single processor setup so when I went back to the > dual-AMD pvfs fails because it was built against a non-SMP kernel. > I installed the SMP kernel and noticed this problem. > > It seems the problem may be related to an SMP issue do to the fact we both > have an SMP setup. I did not check the frontend's memory so this may still > be a factor, but I have had no trouble with the box after the installation. > > My initial problem was a booting problem on the frontend due to a cdrom > issue. All my other attempts at installing failed with the error you mentioned, but as I > posted early I tried 3 different AMD single processor boxes and they > failed. The boxes are up all the time and stressed pretty hard so I don't > believe it is a memory issue. > > This is some very strange behaviour. > > Thanks, > Joseph > From landman at scalableinformatics.com Thu Dec 11 11:42:14 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 11 Dec 2003 14:42:14 -0500 Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ... In-Reply-To: <3FD8C1CC.20700@physics.ucsd.edu> References: <1071007177.18100.58.camel@squash.scalableinformatics.com> <1071160088.18486.25.camel@nietzsche.fnal.gov> <3FD8C1CC.20700@physics.ucsd.edu> Message-ID: <1071171734.6164.12.camel@squash.scalableinformatics.com> Hi Terrence and Joe: These are indeed X5DPA-GG. I am working on a device driver disk for 3.0 ROCKS. If this works, it is a weak hack, but it might be fine. More later (testing it now as we speak).. Joe On Thu, 2003-12-11 at 14:13, Terrence Martin wrote: > Hi Joe, > > Do you know if 2.3.2 can also benefit from the same small change? > > Terrence > > Joe Kaiser wrote: > > Hi,
    > > > > I'm sorry, I thought I sent email to the list reporting how I did this. > > > > You have not said what motherboard you are using or what the error > > exactly is. The instructions below are for the X5DPA-GG and the error > > isn't reported as an error, I just get prompted to insert my driver. > > > > If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to > > make a change to the pcitable on the initrd.img. The current pcitable > > on the initrd.img does NOT have the proper deviceId for the e1000 for > > this board. If you look in /etc/sysconfig/hwconf and search for the > > e1000, you will find this: > > > > class: NETWORK > > bus: PCI > > detached: 0 > > device: eth > > driver: e1000 > > desc: "Unknown vendor|Generic e1000 device" > > vendorId: 8086 > > deviceId: 1013 > > subVendorId: 8086 > > subDeviceId: 1213 > > pciType: 1 > > > > The device ID is 1013. If you look in the pcitable that comes off of > > the initrd.img you will see that the highest the e1000 device id's go is > > 1012. Just add in the proper line to the initrd.img in your /tftpboot > > directory and it should work. Instructions are below. > > > > Here are the instructions: > > > > This should be done on the frontend: > > > > cd /tftpboot/X86PC/UNDI/pxelinux/ > > cp initrd.img initrd.img.orig > > cp initrd.img /tmp > > cd /tmp > > mv initrd.img initrd.gz > > gunzip initrd.gz > > mkdir /mnt/loop > > mount -o loop initrd /mnt/loop > > cd /mnt/loop/modules/ > > vi pcitable > > > > Search for the e1000 drivers and add the following line: > > > > 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet > > Controller" > > > > write the file > > > > cd /tmp > > umount /mnt/loop > > gzip initrd > > mv initrd.gz initrd.img > > mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/ > > > > Then boot the node.
    > > > >Hope this helps. > > > > Thanks, > > > > Joe > > > > On Tue, 2003-12-09 at 15:59, Joe Landman wrote: > > > >>Folks: > >> > >> As indicated previously, I am wrestling with a Supermicro based > >>cluster. None of the RH distributions come with the correct E1000 > >>driver, so a new kernel is needed (in the boot CD, and for > >>installation). > >> > >> The problem I am running into is that it isn't at all obvious/easy how > >>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable > >>this thing to work. Following the examples in the documentation have > >>not met with success. Running "rocks-dist cdrom" with the new kernels > >>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates > >>a bootable CD with the original 2.4.18BOOT kernel. > >> > >> What I (and I think others) need, is a simple/easy to follow method > >>that will generate a bootable CD with the correct linux kernel, and the > >>correct modules. > >> > >> Is this in process somewhere? What would be tremendously helpful is > >>if we can generate a binary module, and put that into the boot process > >>by placing it into the force/modules/binary directory (assuming one > >>exists) with the appropriate entry of a similar name in the > >>force/modules/meta directory as a simple XML document giving pci-ids, > >>description, name, etc. > >> > >> Anything close to this coming? Modules are killing future ROCKS > >>installs, the inability to easily inject a new module in there has > >>created a problem whereby ROCKS does not function (as the underlying RH > >>does not function). > >> > >> > >> -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From jlkaiser at fnal.gov Thu Dec 11 11:33:03 2003 From: jlkaiser at fnal.gov (Joe Kaiser) Date: Thu, 11 Dec 2003 13:33:03 -0600 Subject: [Rocks-Discuss]a name for pain ... modules/kernels/ethernets ... 
In-Reply-To: <3FD8C1CC.20700@physics.ucsd.edu> References: <1071007177.18100.58.camel@squash.scalableinformatics.com> <1071160088.18486.25.camel@nietzsche.fnal.gov> <3FD8C1CC.20700@physics.ucsd.edu> Message-ID: <1071171183.18486.28.camel@nietzsche.fnal.gov>
    I am notsure. Presumably, yes.... On Thu, 2003-12-11 at 13:13, Terrence Martin wrote: > Hi Joe, > > Do you know if 2.3.2 can also benefit from the same small change? > > Terrence > > Joe Kaiser wrote: > > Hi, > > > > I'm sorry, I thought I sent email to the list reporting how I did this. > > > > You have not said what motherboard you are using or what the error > > exactly is. The instructions below are for the X5DPA-GG and the error > > isn't reported as an error, I just get prompted to insert my driver. > > > > If it is the X5DPA-GG then 3.0.0 will support the e1000 but you have to > > make a change to the pcitable on the initrd.img. The current pcitable > > on the initrd.img does NOT have the proper deviceId for the e1000 for > > this board. If you look in /etc/sysconfig/hwconf and search for the > > e1000, you will find this: > > > > class: NETWORK > > bus: PCI > > detached: 0 > > device: eth > > driver: e1000 > > desc: "Unknown vendor|Generic e1000 device" > > vendorId: 8086 > > deviceId: 1013 > > subVendorId: 8086 > > subDeviceId: 1213 > > pciType: 1 > > > > The device ID is 1013. If you look in the pcitable that comes off of > > the initrd.img you will see that the highest the e1000 device id's go is > > 1012. Just add in the proper line to the initrd.img in your /tftpboot > > directory and it should work. Instructions are below. > > > > Here are the instructions: > > > > This should be done on the frontend: > > > > cd /tftpboot/X86PC/UNDI/pxelinux/ > > cp initrd.img initrd.img.orig > > cp initrd.img /tmp > > cd /tmp > > mv initrd.img initrd.gz > > gunzip initrd.gz > > mkdir /mnt/loop > > mount -o loop initrd /mnt/loop > > cd /mnt/loop/modules/ > > vi pcitable > > > > Search for the e1000 drivers and add the following line: > >
> > 0x8086 0x1013 "e1000" "Intel Corp.|82546EB Gigabit Ethernet > > Controller" > > > > write the file > > > > cd /tmp > > umount /mnt/loop > > gzip initrd > > mv initrd.gz initrd.img > > mv initrd.img /tftpboot/X86PC/UNDI/pxelinux/ > > > > Then boot the node. > > > > Hope this helps. > > > > Thanks, > > > > Joe > > > > On Tue, 2003-12-09 at 15:59, Joe Landman wrote: > > > >>Folks: > >> > >> As indicated previously, I am wrestling with a Supermicro based > >>cluster. None of the RH distributions come with the correct E1000 > >>driver, so a new kernel is needed (in the boot CD, and for > >>installation). > >> > >> The problem I am running into is that it isn't at all obvious/easy how > >>to install a new kernel/modules into ROCKS (3.0 or otherwise) to enable > >>this thing to work. Following the examples in the documentation have > >>not met with success. Running "rocks-dist cdrom" with the new kernels > >>(2.4.23 works nicely on the nodes) in the force/RPMS directory generates > >>a bootable CD with the original 2.4.18BOOT kernel. > >> > >> What I (and I think others) need, is a simple/easy to follow method > >>that will generate a bootable CD with the correct linux kernel, and the > >>correct modules. > >> > >> Is this in process somewhere? What would be tremendously helpful is > >>if we can generate a binary module, and put that into the boot process > >>by placing it into the force/modules/binary directory (assuming one > >>exists) with the appropriate entry of a similar name in the > >>force/modules/meta directory as a simple XML document giving pci-ids, > >>description, name, etc. > >> > >> Anything close to this coming? Modules are killing future ROCKS > >>installs, the inability to easily inject a new module in there has > >>created a problem whereby ROCKS does not function (as the underlying RH > >>does not function). 
> >> > >> > >> -- =================================================================== Joe Kaiser - Systems Administrator Fermi Lab CD/OSS-SCS Never laugh at live dragons.
630-840-6444 jlkaiser at fnal.gov =================================================================== From landman at scalableinformatics.com Thu Dec 11 11:51:51 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 11 Dec 2003 14:51:51 -0500 Subject: [Rocks-Discuss]driver disk for e1000 for rocks 3.0.0 Message-ID: <1071172311.6164.18.camel@squash.scalableinformatics.com> Folks: I have built a slightly modified RedHat 7.3 driver disk with the updated 5.2.22 e1000 driver. I verified that this does indeed work on my systems (during initial portion of ROCKS install, I can now insmod e1000 in the shell window and see the ethernet... this is a big change from before). If you want the driver disk grab it from http://scalableinformatics.com/downloads/newdrv.img . To use it while installing a front end, type frontend dd at the boot prompt (not just frontend). I believe it should work for the compute nodes as well (i will test it soon). Now it is time to work around the rest of the Supermicro "features". -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From dtwright at uiuc.edu Thu Dec 11 12:32:54 2003 From: dtwright at uiuc.edu (Dan Wright) Date: Thu, 11 Dec 2003 14:32:54 -0600 Subject: [Rocks-Discuss]3.0.0 problem:Does my namd job allocate to each node? In-Reply-To: <BAY3-F25UBUhr3ukkwu000156fe@hotmail.com> References: <BAY3-F25UBUhr3ukkwu000156fe@hotmail.com> Message-ID: <20031211203254.GP6476@uiuc.edu> NAMD2 needs some more information to be started on multiple nodes like that. You need to give it a nodelist, in particular, so it knows where to run itself. We run namd2 on several clusters here (UIUC chemistry department). Below is a script used to exec namd2 with the right options, etc, on a cluster. Below that is a script that automates the PBS job submission. Hope this helps! 
- Dan Wright (dtwright at uiuc.edu) (http://www.scs.uiuc.edu/) (UNIX Systems Administrator, School of Chemical Sciences) (333-1728)
    -- namd2.csh -- #!/bin/csh #Script to run NAMD2 on the cluster automatically. # Courtesy of Jim Phillips. setenv CONV_RSH ssh setenv TMPDIR /tmp setenv BINDIR /home/NAMD if ( $?PBS_JOBID ) then if ( $?PBS_NODEFILE ) then set nodes = `cat $PBS_NODEFILE` else set nodes = localhost endif set nodefile = $TMPDIR/namd2.nodelist.$PBS_JOBID echo group main >! $nodefile foreach node ( $nodes ) echo host $node >> $nodefile end $BINDIR/charmrun $BINDIR/namd2 +p$#nodes ++nodelist $nodefile $* else $BINDIR/charmrun $BINDIR/namd2 ++local $* endif ------------- Here's an example script using this to start namd2 on 8 uniprocessor nodes; you'd just run it as "namd2-8p <jobfile>" to automatically do the PBS job submission and everything. -- namd2-8p -- #!/bin/bash # This script runs namd2 on 8 nodes. # echo echo "Please remember to specify the FULL PATH to your namd2 job file." echo "If you haven't done that, please press ctrl-c now and re-run" echo "this command with the full path." echo sleep 10 export SCRIPTFILE=/tmp/namd2-script.$USER.`date "+%s"` export NAMD_SCRIPT=/usr/local/bin/namd2.csh NAMD_CMD="$NAMD_SCRIPT $* > $HOME/namd2.out.`date '+%d%b%Y-%H:%M:%S'` 2>&1" cat >$SCRIPTFILE <<EOF #!/bin/bash #PBS -l nodes=8 EOF echo $NAMD_CMD >> $SCRIPTFILE echo "exit" >> $SCRIPTFILE /usr/apps/pbs/bin/qsub -V $SCRIPTFILE
sleep 5
rm -f $SCRIPTFILE
-------------- zhong wenyu said: > I have build a rocks cluster with four double Xeon computer to run namd.one > frontend and the other three to be compute.with intel's hyper threading > tecnology i have 16 cpus at all. > now I have some troubles. Maybe someone can help me. > I created bellow pbs script named mytask. > #!/bin/csh > #PBS -N NAMD > #PBS -m be > #PBS -l ncpus=8 > #PBS -l nodes=2 > # > cd $PBS_O_WORKDIR > /charmrun namd2 +p8 mytask.namd > > i typed: > qsub mytask > qrun N > > then i use > qstat -f N > > the message feedback showed(i'm sorry i can't copy the orgin message,just > the meaning) > > host: compute-0-0/0+compute-0-0/1+compute-0-1/0+compute-0-1/1 > cpu used: 8 > > it's strange why 4 hosts and 8 cpu used? > but when i saw ganlia, the cluster status. it show me only one node used > (fore example ,compute-0-0).both the other two are idle. > i want to know whether the job was doing by one or two node. > so i creat a new task specify to compute-0-1,message feedback show no > resource availabe. > while the task ended,i checked the information, found that the cpu time per > step is half of 4 cpus (1 nodes),but the whole time(include wall time) is > equal. > Does my namd job allocate to each node? > please help me! > thanks > > _________________________________________________________________ > MSN Messenger: http://messenger.msn.com/cn > - Dan Wright (dtwright at uiuc.edu) (http://www.uiuc.edu/~dtwright) -] ------------------------------ [-] -------------------------------- [- ``Weave a circle round him thrice, / And close your eyes with holy dread, For he on honeydew hath fed, / and drunk the milk of Paradise.'' Samuel Taylor Coleridge, Kubla Khan
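[Editor's note] The mechanics that Philip Papadopoulos describes above — PBS only hands the job an allocation, and the job script itself must fan the work out over the hosts listed in the nodefile — can be sketched in bash. This is an illustrative sketch of the same nodelist-building step that namd2.csh performs in csh; the file names are made up, and since it may run outside PBS it fabricates a sample 2-node / 4-slot allocation instead of reading "$PBS_NODEFILE":

```shell
#!/bin/bash
# Build a charmrun nodelist from a PBS-style nodefile (sketch only).
# Under a real PBS job you would use "$PBS_NODEFILE" here; we fabricate
# a 2-node, 4-slot allocation so the script runs anywhere.
NODEFILE=$(mktemp)
printf 'compute-0-0\ncompute-0-0\ncompute-0-1\ncompute-0-1\n' > "$NODEFILE"

NODELIST="${TMPDIR:-/tmp}/namd2.nodelist.$$"
echo "group main" > "$NODELIST"          # charmrun nodelist header
while read -r node; do
    echo "host $node" >> "$NODELIST"     # one "host" line per allocated slot
done < "$NODEFILE"

NP=$(grep -c '^host ' "$NODELIST")       # one namd2 process per listed slot
echo "charmrun namd2 +p$NP ++nodelist $NODELIST"   # the command one would exec
```

In a real job script you would replace the fabricated NODEFILE with "$PBS_NODEFILE" and exec the printed charmrun command instead of echoing it.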
    -------------- next part-------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : https://lists.sdsc.edu/pipermail/npaci-rocks- discussion/attachments/20031211/417e39b4/attachment-0001.bin From mjk at sdsc.edu Thu Dec 11 13:16:45 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Thu, 11 Dec 2003 13:16:45 -0800 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia In-Reply-To: <3FD8B153.6000205@inel.gov> References: <3FD8B153.6000205@inel.gov> Message-ID: <52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu> Download 3.1 (out very soon now) and poke around. Basically there is a single SSH host key, and all the nodes have a copy. This kills the "man in the middle" warning every time you reinstall. -mjk On Dec 11, 2003, at 10:02 AM, Andrew Shewmaker wrote: > "Mason J. Katz" <mjk at sdsc.edu> wrote: > > > We've also moved from this method to a single cluster-wide ssh key > for > > Rocks 3.1. > > How does a single key work? I have successfully set up ssh host > based authentication for some non-Rocks systems using > > http://www.omega.telia.net/vici/openssh/ > > (Note that OpenSSH_3.7.1p2 requires one more setting in addition > to those mentioned in the above url. > > In <dir-of-ssh-conf-files>/ssh_config: > EnableSSHKeysign yes) > > But I thought it still requires that each host in the has a key... > am I wrong? Do you do it differently? > > Thanks, > > Andrew > > -- > Andrew Shewmaker, Associate Engineer > Phone: 1-208-526-1415 > Idaho National Eng. and Environmental Lab. > P.0. Box 1625, M.S. 3605 > Idaho Falls, Idaho 83415-3605 From landman at scalableinformatics.com Thu Dec 11 13:36:44 2003
    From: landman atscalableinformatics.com (Joe Landman) Date: Thu, 11 Dec 2003 16:36:44 -0500 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia In-Reply-To: <52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu> References: <3FD8B153.6000205@inel.gov> <52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu> Message-ID: <1071178604.6164.46.camel@squash.scalableinformatics.com> Hi Mason: Eta? I have a non-functional cluster I think I can make function with 3.1. I would be happy to be a real world beta/gamma tester for it (immediately, eg. today). Please send me a URL. ... Joe On Thu, 2003-12-11 at 16:16, Mason J. Katz wrote: > Download 3.1 (out very soon now) and poke around. Basically there is a > single SSH host key, and all the nodes have a copy. This kills the > "man in the middle" warning every time you reinstall. > > -mjk > > On Dec 11, 2003, at 10:02 AM, Andrew Shewmaker wrote: > > > "Mason J. Katz" <mjk at sdsc.edu> wrote: > > > > > We've also moved from this method to a single cluster-wide ssh key > > for > > > Rocks 3.1. > > > > How does a single key work? I have successfully set up ssh host > > based authentication for some non-Rocks systems using > > > > http://www.omega.telia.net/vici/openssh/ > > > > (Note that OpenSSH_3.7.1p2 requires one more setting in addition > > to those mentioned in the above url. > > > > In <dir-of-ssh-conf-files>/ssh_config: > > EnableSSHKeysign yes) > > > > But I thought it still requires that each host in the has a key... > > am I wrong? Do you do it differently? > > > > Thanks, > > > > Andrew > > > > -- > > Andrew Shewmaker, Associate Engineer > > Phone: 1-208-526-1415 > > Idaho National Eng. and Environmental Lab. > > P.0. Box 1625, M.S. 3605 > > Idaho Falls, Idaho 83415-3605 -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com
    web : http://scalableinformatics.com phone:+1 734 612 4615 From mjk at sdsc.edu Thu Dec 11 13:34:30 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Thu, 11 Dec 2003 13:34:30 -0800 Subject: [Rocks-Discuss]ssh_known_hosts and ganglia In-Reply-To: <1071178604.6164.46.camel@squash.scalableinformatics.com> References: <3FD8B153.6000205@inel.gov> <52B4A71C-2C1F-11D8-832A-000A95DA5638@sdsc.edu> <1071178604.6164.46.camel@squash.scalableinformatics.com> Message-ID: <CD814510-2C21-11D8-832A-000A95DA5638@sdsc.edu> We're too close to send out more beta's right now, but if something bad happens before friday we'll reconsider. We are shooting for next week - but absolutely before the holidays. ho ho ho. We recognize that our delay on getting a current release out there is hurting new clusters, and just having the latest redhat kernel is going to fix most of these issues. -mjk On Dec 11, 2003, at 1:36 PM, Joe Landman wrote: > Hi Mason: > > Eta? I have a non-functional cluster I think I can make function > with > 3.1. I would be happy to be a real world beta/gamma tester for it > (immediately, eg. today). Please send me a URL. ... > > Joe > > On Thu, 2003-12-11 at 16:16, Mason J. Katz wrote: >> Download 3.1 (out very soon now) and poke around. Basically there is >> a >> single SSH host key, and all the nodes have a copy. This kills the >> "man in the middle" warning every time you reinstall. >> >> -mjk >> >> On Dec 11, 2003, at 10:02 AM, Andrew Shewmaker wrote: >> >>> "Mason J. Katz" <mjk at sdsc.edu> wrote: >>> >>>> We've also moved from this method to a single cluster-wide ssh key >>> for >>>> Rocks 3.1. >>> >>> How does a single key work? I have successfully set up ssh host >>> based authentication for some non-Rocks systems using >>> >>> http://www.omega.telia.net/vici/openssh/ >>> >>> (Note that OpenSSH_3.7.1p2 requires one more setting in addition >>> to those mentioned in the above url.
    >>> >>> In <dir-of-ssh-conf-files>/ssh_config: >>>EnableSSHKeysign yes) >>> >>> But I thought it still requires that each host in the has a key... >>> am I wrong? Do you do it differently? >>> >>> Thanks, >>> >>> Andrew >>> >>> -- >>> Andrew Shewmaker, Associate Engineer >>> Phone: 1-208-526-1415 >>> Idaho National Eng. and Environmental Lab. >>> P.0. Box 1625, M.S. 3605 >>> Idaho Falls, Idaho 83415-3605 > -- > Joseph Landman, Ph.D > Scalable Informatics LLC, > email: landman at scalableinformatics.com > web : http://scalableinformatics.com > phone: +1 734 612 4615 From purikk at hotmail.com Thu Dec 11 15:06:17 2003 From: purikk at hotmail.com (Purushotham Komaravolu) Date: Thu, 11 Dec 2003 18:06:17 -0500 Subject: [Rocks-Discuss]Kernal of Rocks 3.0 References: <200312112001.hBBK1IJ18815@postal.sdsc.edu> Message-ID: <BAY1-DAV391Zg8eBpx700008b71@hotmail.com> Hi, I am a newbie to Rocks and have a few questions. I would appreciate help with those. 1) what kernel does latest rocks use, if its not latest can I use latest kernal and how? 2) is there any way to have more than 1 fronend nodes for failover redundancy? 3) did anybody install penguin compilers over the cluster Thanks Regards, Puru From bruno at rocksclusters.org Thu Dec 11 15:42:27 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Thu, 11 Dec 2003 15:42:27 -0800 Subject: [Rocks-Discuss]Kernal of Rocks 3.0 In-Reply-To: <BAY1-DAV391Zg8eBpx700008b71@hotmail.com> References: <200312112001.hBBK1IJ18815@postal.sdsc.edu> <BAY1- DAV391Zg8eBpx700008b71@hotmail.com> Message-ID: <AD988A9F-2C33-11D8-B821-000A95C4E3B4@rocksclusters.org> > 1) what kernel does latest rocks use, if its not latest can I use > latest > kernal and how?
    our upcoming release(scheduled to release next week) has kernel version 2.4.21. additionally, the new release includes documentation on how to build your own kernel RPM from a kernel.org tarball. > 2) is there any way to have more than 1 fronend nodes for failover > redundancy? no, that has not yet been implemented. > 3) did anybody install penguin compilers over the cluster i apologize, but i'm not familiar with the penguin compiler. we do have experience with gnu compilers, intel compilers and the portland group compilers. additionally, some folks in the rocks community have also successfully deployed the lahey compiler. - gb From oconnor at ucsd.edu Thu Dec 11 14:29:46 2003 From: oconnor at ucsd.edu (Edward O'Connor) Date: Thu, 11 Dec 2003 14:29:46 -0800 Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends? In-Reply-To: <ddptix48s6.fsf@oecpc11.ucsd.edu> (Edward O'Connor's message of "Fri, 22 Aug 2003 15:39:05 -0700") References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu> <ddptix48s6.fsf@oecpc11.ucsd.edu> Message-ID: <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu> Hi everybody, I'm trying to bring up some ia64 compute nodes in a cluster with an ia32 frontend. Normally, `cd /home/install; rocks-dist mirror dist` only sets up the frontend to handle ia32 compute nodes. I tried to manhandle `rocks-dist mirror` into mirroring the ia64 stuff from ftp.rocksclusters.org by giving it the --arch=ia64 option, but that didn't work, so I went ahead and did the mirroring step by hand. After having done so, `rocks-dist dist` still doesn't do the right thing. 
So, adding --arch=ia64 to that command yields this error output:

,----
| # rocks-dist --arch=ia64 dist
| Cleaning distribution
| Resolving versions (RPMs)
| Resolving versions (SRPMs)
| Adding support for rebuild distribution from source
| Creating files (symbolic links - fast)
| Creating symlinks to kickstart files
| Fixing Comps Database
| error - comps file is missing, skipping this step
| Generating hdlist (rpm database)
| error - could not find rpm anaconda-runtime
| error - could not find genhdlist
| Patching second stage loader (eKV, partioning, ...)
| error - could not find second stage, skipping this step
`----
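
[Editor's note: the three "could not find" lines suggest the ia64 side of the hand-built mirror lacks the anaconda pieces rocks-dist patches in. A hedged pre-flight check along these lines can confirm what is missing before rerunning — the relative paths below are illustrative guesses modeled on the error text, not the real Rocks 3.0 tree layout:]

```python
import os

def missing_pieces(distroot):
    """Report which pieces named in the rocks-dist errors are absent
    from a mirrored distribution tree. Paths are assumptions."""
    checks = {
        "comps file": os.path.join(distroot, "RedHat", "base", "comps.xml"),
        "anaconda-runtime": os.path.join(distroot, "usr", "lib", "anaconda-runtime"),
        "genhdlist": os.path.join(distroot, "usr", "lib", "anaconda-runtime", "genhdlist"),
    }
    return [name for name, path in checks.items() if not os.path.exists(path)]

# Hypothetical mirror location; substitute the real ia64 tree.
print(missing_pieces("/home/install/rocks-dist/ia64"))
```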
    So my questionis, what do I need to do to the ia32 frontend to enable it to kickstart an ia64 compute node? Thanks. Ted -- Edward O'Connor oconnor at ucsd.edu From gotero at linuxprophet.com Thu Dec 11 21:14:33 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Thu, 11 Dec 2003 21:14:33 -0800 Subject: Fwd: [Rocks-Discuss]RE: Have anyone successfully build a set of grid compute nodes using Rocks? Message-ID: <1279F870-2C62-11D8-AAC6-000A95CD8EC8@linuxprophet.com> > > > We put two Itanium clusters and an x86 cluster together on a grid at > SC2003 using Rocks 3.1 beta and the Grid Roll. Simple CA is installed > on the cluster frontends for you, so all one has to do is create and > exchange certificates and update the grid-mapfiles. This grid was a > joint collaboration between SDSC, Promicro Systems and Callident. > > On Dec 11, 2003, at 12:08 AM, Nai Hong Hwa Francis wrote: > >> >> >> >> Hi, >> >> Have anyone successfully build a set of grid compute nodes using Rocks >> 3? >> Anyone care to share? >> >> >> Nai Hong Hwa Francis >> Institute of Molecular and Cell Biology (A*STAR) >> 30 Medical Drive >> Singapore 117609. >> DID: (65) 6874-6196 >> >> -----Original Message----- >> From: npaci-rocks-discussion-request at sdsc.edu >> [mailto:npaci-rocks-discussion-request at sdsc.edu] >> Sent: Thursday, December 11, 2003 11:54 AM >> To: npaci-rocks-discussion at sdsc.edu >> Subject: npaci-rocks-discussion digest, Vol 1 #642 - 4 msgs >> >> Send npaci-rocks-discussion mailing list submissions to >> npaci-rocks-discussion at sdsc.edu >> >> To subscribe or unsubscribe via the World Wide Web, visit >> >> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion >> or, via email, send a message with subject or body 'help' to
    >> npaci-rocks-discussion-request at sdsc.edu >> >> You can reach the person managing the list at >> npaci-rocks-discussion-admin at sdsc.edu >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of npaci-rocks-discussion digest..." >> >> >> Today's Topics: >> >> 1. RE: Do you have a list of the various models of Gigabit Ethernet >> Interfaces compatible to Rocks 3? (Nai Hong Hwa Francis) >> 2. Rocks 3.0.0 (Terrence Martin) >> 3. Re: "TypeError: loop over non-sequence" when trying >> to build CD distro (V. Rowley) >> >> --__--__-- >> >> Message: 1 >> Date: Thu, 11 Dec 2003 09:45:18 +0800 >> From: "Nai Hong Hwa Francis" <naihh at imcb.a-star.edu.sg> >> To: <npaci-rocks-discussion at sdsc.edu> >> Subject: [Rocks-Discuss]RE: Do you have a list of the various models >> of >> Gigabit Ethernet Interfaces compatible to Rocks 3? >> >> >> >> Hi All, >> >> Do you have a list of the various gigabit Ethernet interfaces that are >> compatible to Rocks 3? >> >> I am changing my nodes connectivity from 10/100 to 1000. >> >> Have anyone done that and how are the differences in performance or >> turnaround time? >> >> >> >> Thanks and Regards >> >> Nai Hong Hwa Francis >> Institute of Molecular and Cell Biology (A*STAR) >> 30 Medical Drive >> Singapore 117609. >> DID: (65) 6874-6196 >> >> -----Original Message----- >> From: npaci-rocks-discussion-request at sdsc.edu >> [mailto:npaci-rocks-discussion-request at sdsc.edu]=20 >> Sent: Thursday, December 11, 2003 9:25 AM >> To: npaci-rocks-discussion at sdsc.edu >> Subject: npaci-rocks-discussion digest, Vol 1 #641 - 13 msgs >> >> Send npaci-rocks-discussion mailing list submissions to >> npaci-rocks-discussion at sdsc.edu >>
    >> To subscribe or unsubscribe via the World Wide Web, visit >> =09 >> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion >> or, via email, send a message with subject or body 'help' to >> npaci-rocks-discussion-request at sdsc.edu >> >> You can reach the person managing the list at >> npaci-rocks-discussion-admin at sdsc.edu >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of npaci-rocks-discussion digest..." >> >> >> Today's Topics: >> >> 1. Non-homogenous legacy hardware (Chris Dwan (CCGB)) >> 2. Error during Make when building a new install floppy (Terrence >> Martin) >> 3. Re: Error during Make when building a new install floppy (Tim >> Carlson) >> 4. Re: Non-homogenous legacy hardware (Tim Carlson) >> 5. ssh_known_hosts and ganglia (Jag) >> 6. Re: ssh_known_hosts and ganglia (Mason J. Katz) >> 7. "TypeError: loop over non-sequence" when trying to build CD >> distro (V. Rowley) >> 8. Re: one node short in "labels" (Greg Bruno) >> 9. Re: "TypeError: loop over non-sequence" when trying to build CD >> distro (Mason J. Katz) >> 10. Re: "TypeError: loop over non-sequence" when trying >> to build CD distro (V. Rowley) >> 11. Re: "TypeError: loop over non-sequence" when trying to >> build CD distro (Tim Carlson) >> >> -- __--__-- >> Message: 1 >> Date: Wed, 10 Dec 2003 14:04:53 -0600 (CST) >> From: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu> >> To: npaci-rocks-discussion at sdsc.edu >> Subject: [Rocks-Discuss]Non-homogenous legacy hardware >> >> >> I am integrating legacy systems into a ROCKS cluster, and have hit a >> snag with the auto-partition configuration: The new (old) systems >> have >> SCSI disks, while old (new) ones contain IDE. This is a non-issue so >> long as the initial install does its default partitioning. However, I >> have a "replace-auto-partition.xml" file which is unworkable for the >> SCSI >> based systems since it makes specific reference to "hda" rather than >> "sda." 
>> >> I would like to have a site-nodes/replace-auto-partition.xml file >> with a >> conditional such that "hda" or "sda" is used, based on the name of the >> node (or some other criterion). >> >> Is this possible? >> >> Thanks, in advance. If this is out there on the mailing list
    >> archives, >> a >> pointer would be greatly appreciated. >> >> -Chris Dwan >> The University of Minnesota >> >> -- __--__-- >> Message: 2 >> Date: Wed, 10 Dec 2003 12:09:11 -0800 >> From: Terrence Martin <tmartin at physics.ucsd.edu> >> To: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu> >> Subject: [Rocks-Discuss]Error during Make when building a new install >> floppy >> >> I get the following error when I try to rebuild a boot floppy for >> rocks. >> >> This is with the default CVS checkout with an update today according >> to=20 >> the rocks userguide. I have not actually attempted to make any >> changes. >> >> make[3]: Leaving directory=20 >> `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3/loader' >> make[2]: Leaving directory=20 >> `/home/install/rocks/src/rocks/boot/7.3/loader/anaconda-7.3' >> strip -o loader anaconda-7.3/loader/loader >> strip: anaconda-7.3/loader/loader: No such file or directory >> make[1]: *** [loader] Error 1 >> make[1]: Leaving directory >> `/home/install/rocks/src/rocks/boot/7.3/loader' >> make: *** [loader] Error 2 >> >> Of course I could avoid all of this together and just put my binary=20 >> module into the appropriate location in the boot image. >> >> Would it be correct to modify the following image file with my >> changes=20 >> and then write it to a floppy via dd? >> >> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/ >> 7.3 >> /en/os/i386/images/bootnet.img >> >> Basically I am injecting an updated e1000 driver with changes to=20 >> pcitable to support the address of my gigabit cards. >> >> Terrence >> >> >> -- __--__-- >> Message: 3 >> Date: Wed, 10 Dec 2003 12:40:41 -0800 (PST) >> From: Tim Carlson <tim.carlson at pnl.gov> >> Subject: Re: [Rocks-Discuss]Error during Make when building a new >> install floppy >> To: Terrence Martin <tmartin at physics.ucsd.edu> >> Cc: npaci-rocks-discussion <npaci-rocks-discussion at sdsc.edu>
    >> Reply-to: TimCarlson <tim.carlson at pnl.gov> >> >> On Wed, 10 Dec 2003, Terrence Martin wrote: >> >>> I get the following error when I try to rebuild a boot floppy for >> rocks. >>> >> >> You can't make a boot floppy with Rocks 3.0. That isn't supported. Or >> at >> least it wasn't the last time I checked >> >>> Of course I could avoid all of this together and just put my binary >>> module into the appropriate location in the boot image. >>> >>> Would it be correct to modify the following image file with my >>> changes >>> and then write it to a floppy via dd? >>> >>> >> /home/install/ftp.rocksclusters.org/pub/rocks/rocks-3.0.0/rocks-dist/ >> 7.3 >> /en/os/i386/images/bootnet.img >>> >>> Basically I am injecting an updated e1000 driver with changes to >>> pcitable to support the address of my gigabit cards. >> >> Modifiying the bootnet.img is about 1/3 of what you need to do if you >> go >> down that path. You also need to work on netstg1.img and you'll need >> to >> update the drive in the kernel rpm that gets installed on the box. >> None >> of >> this is trivial. >> >> If it were me, I would go down the same path I took for updating the >> AIC79XX driver >> >> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/ >> 003 >> 533.html >> >> Tim >> >> Tim Carlson >> Voice: (509) 376 3423 >> Email: Tim.Carlson at pnl.gov >> EMSL UNIX System Support >> >> >> -- __--__-- >> Message: 4 >> Date: Wed, 10 Dec 2003 12:52:38 -0800 (PST) >> From: Tim Carlson <tim.carlson at pnl.gov> >> Subject: Re: [Rocks-Discuss]Non-homogenous legacy hardware >> To: "Chris Dwan (CCGB)" <cdwan at mail.ahc.umn.edu> >> Cc: npaci-rocks-discussion at sdsc.edu >> Reply-to: Tim Carlson <tim.carlson at pnl.gov>
    >> >> On Wed,10 Dec 2003, Chris Dwan (CCGB) wrote: >> >>> >>> I am integrating legacy systems into a ROCKS cluster, and have hit a >>> snag with the auto-partition configuration: The new (old) systems >> have >>> SCSI disks, while old (new) ones contain IDE. This is a non-issue so >>> long as the initial install does its default partitioning. However, >>> I >>> have a "replace-auto-partition.xml" file which is unworkable for the >> SCSI >>> based systems since it makes specific reference to "hda" rather than >>> "sda." >> >> If you have just a single drive, then you should be able to skip the >> "--ondisk" bits of your "part" command >> >> Otherwise, you would have first to do something ugly like the >> following: >> >> http://penguin.epfl.ch/slides/kickstart/ks.cfg >> >> You could probably (maybe) wrap most of that in an >> <eval sh=3D"bash"> >> </eval> >> >> block in the <main> block. >> >> Just guessing.. haven't tried this. >> >> Tim >> >> Tim Carlson >> Voice: (509) 376 3423 >> Email: Tim.Carlson at pnl.gov >> EMSL UNIX System Support >> >> >> -- __--__-- >> Message: 5 >> From: Jag <agrajag at dragaera.net> >> To: npaci-rocks-discussion at sdsc.edu >> Date: Wed, 10 Dec 2003 13:21:07 -0500 >> Subject: [Rocks-Discuss]ssh_known_hosts and ganglia >> >> I noticed a previous post on this list >> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/ >> 001934 >> .html) indicating that Rocks distributes ssh keys for all the nodes >> over >> ganglia. Can anyone enlighten me as to how this is done? >> >> I looked through the ganglia docs and didn't see anything indicating >> how >> to do this, so I'm assuming Rocks made some changes. Unfortunately >> the >> rocks iso images don't seem to contain srpms, so I'm now coming >> here.=20
    >> What didRocks do to ganglia to make the distribution of ssh keys >> work? >> >> Also, does anyone know where Rocks SRPMs can be found? I've done >> quite >> a bit of searching, but haven't found them anywhere. >> >> >> -- __--__-- >> Message: 6 >> Cc: npaci-rocks-discussion at sdsc.edu >> From: "Mason J. Katz" <mjk at sdsc.edu> >> Subject: Re: [Rocks-Discuss]ssh_known_hosts and ganglia >> Date: Wed, 10 Dec 2003 14:39:15 -0800 >> To: Jag <agrajag at dragaera.net> >> >> Most of the SRPMS are on our FTP site, but we've screwed this up =20 >> before. The SRPMS are entirely Rocks specific so they are of little >> =20 >> value outside of Rocks. You can also checkout our CVS tree =20 >> (cvs.rocksclusters.org) where rocks/src/ganglia shows what we add. We >> =20 >> have a ganglia-python package we created to allow us to write our own >> =20 >> metrics at a high level than the provide gmetric application. We've >> =20 >> also moved from this method to a single cluster-wide ssh key for Rocks >> =20 >> 3.1. >> >> -mjk >> >> On Dec 10, 2003, at 10:21 AM, Jag wrote: >> >>> I noticed a previous post on this list >>> (https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/=20 >>> 001934.html) indicating that Rocks distributes ssh keys for all the >> =20 >>> nodes over >>> ganglia. Can anyone enlighten me as to how this is done? >>> >>> I looked through the ganglia docs and didn't see anything indicating >> =20 >>> how >>> to do this, so I'm assuming Rocks made some changes. Unfortunately >> the >>> rocks iso images don't seem to contain srpms, so I'm now coming here. >>> What did Rocks do to ganglia to make the distribution of ssh keys >> work? >>> >>> Also, does anyone know where Rocks SRPMs can be found? I've done >> quite >>> a bit of searching, but haven't found them anywhere. >> >> >> -- __--__-- >> Message: 7 >> Date: Wed, 10 Dec 2003 14:43:49 -0800 >> From: "V. Rowley" <vrowley at ucsd.edu>
    >> To: npaci-rocks-discussionat sdsc.edu >> Subject: [Rocks-Discuss]"TypeError: loop over non-sequence" when >> trying >> to build CD distro >> >> When I run this: >> >> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >> rocks-dist >> >> --dist=3Dcdrom cdrom >> >> on a server installed with ROCKS 3.0.0, I eventually get this: >> >>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Adding support for rebuild distribution from source >>> Creating files (symbolic links - fast) >>> Creating symlinks to kickstart files >>> Fixing Comps Database >>> Generating hdlist (rpm database) >>> Patching second stage loader (eKV, partioning, ...) >>> patching "rocks-ekv" into distribution ... >>> patching "rocks-piece-pipe" into distribution ... >>> patching "PyXML" into distribution ... >>> patching "expat" into distribution ... >>> patching "rocks-pylib" into distribution ... >>> patching "MySQL-python" into distribution ... >>> patching "rocks-kickstart" into distribution ... >>> patching "rocks-kickstart-profiles" into distribution ... >>> patching "rocks-kickstart-dtds" into distribution ... >>> building CRAM filesystem ... >>> Cleaning distribution >>> Resolving versions (RPMs) >>> Resolving versions (SRPMs) >>> Creating symlinks to kickstart files >>> Generating hdlist (rpm database) >>> Segregating RPMs (rocks, non-rocks) >>> sh: ./kickstart.cgi: No such file or directory >>> sh: ./kickstart.cgi: No such file or directory >>> Traceback (innermost last): >>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>> app.run() >>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>> eval('self.command_%s()' % (command)) >>> File "<string>", line 0, in ? 
>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom
>>> builder.build()
>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build
>>> (rocks, nonrocks) = self.segregateRPMS()
>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in segregateRPMS
>>> for pkg in ks.getSection('packages'):
>>> TypeError: loop over non-sequence
>>
>> Any ideas?
>>
>> --
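
[Editor's note: the failing line iterates whatever ks.getSection('packages') returns, and in Python 1.5 iterating None produced exactly "loop over non-sequence" — plausible here, since the two preceding lines show kickstart.cgi failed to run at all. A stripped-down illustration; getSection below is a stand-in, not the real rocks code, and modern Pythons word the error differently:]

```python
# Stand-in for the lookup in build.py: returns None when the requested
# section is missing, which is the likely trigger of the traceback.
def getSection(name, _sections={"packages": ["rocks-ekv", "expat"]}):
    return _sections.get(name)

try:
    for pkg in getSection("nonexistent"):  # iterating None raises TypeError
        pass
except TypeError as err:
    print("iteration failed:", err)

# Defensive version: treat a missing section as an empty package list.
for pkg in getSection("packages") or []:
    print(pkg)
```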
    >> Vicky Rowley email: vrowley at ucsd.edu >> Biomedical Informatics Research Network work: (858) 536-5980 >> University of California, San Diego fax: (858) 822-0828 >> 9500 Gilman Drive >> La Jolla, CA 92093-0715 >> >> >> See pictures from our trip to China at >> http://www.sagacitech.com/Chinaweb >> >> >> -- __--__-- >> Message: 8 >> Cc: rocks <npaci-rocks-discussion at sdsc.edu> >> From: Greg Bruno <bruno at rocksclusters.org> >> Subject: Re: [Rocks-Discuss]one node short in "labels" >> Date: Wed, 10 Dec 2003 15:12:49 -0800 >> To: Vincent Fox <vincent_b_fox at yahoo.com> >> >>> So I go to the "labels" selection on the web page to print out = >> the=3D20 >>> pretty labels. What a nice idea by the way! >>> =3DA0 >>> EXCEPT....it's one node short! I go up to 0-13 and this stops at=3D20 >>> 0-12.=3DA0 Any ideas where I should check to fix this? >> >> yeah, we found this corner case -- it'll be fixed in the next release. >> >> thanks for bug report. >> >> - gb >> >> >> -- __--__-- >> Message: 9 >> Cc: npaci-rocks-discussion at sdsc.edu >> From: "Mason J. Katz" <mjk at sdsc.edu> >> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when >> trying to build CD distro >> Date: Wed, 10 Dec 2003 15:16:27 -0800 >> To: "V. Rowley" <vrowley at ucsd.edu> >> >> It looks like someone moved the profiles directory to profiles.orig. >> >> -mjk >> >> >> [root at rocks14 install]# ls -l >> total 56 >> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig >> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07=20 >> ftp.rocksclusters.org >> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38=20 >> ftp.rocksclusters.org.orig >> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi >> drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig >> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38
    >> rocks-dist.orig >> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >> >>> When I run this: >>> >>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;=20 >>> rocks-dist --dist=3Dcdrom cdrom >>> >>> on a server installed with ROCKS 3.0.0, I eventually get this: >>> >>>> Cleaning distribution >>>> Resolving versions (RPMs) >>>> Resolving versions (SRPMs) >>>> Adding support for rebuild distribution from source >>>> Creating files (symbolic links - fast) >>>> Creating symlinks to kickstart files >>>> Fixing Comps Database >>>> Generating hdlist (rpm database) >>>> Patching second stage loader (eKV, partioning, ...) >>>> patching "rocks-ekv" into distribution ... >>>> patching "rocks-piece-pipe" into distribution ... >>>> patching "PyXML" into distribution ... >>>> patching "expat" into distribution ... >>>> patching "rocks-pylib" into distribution ... >>>> patching "MySQL-python" into distribution ... >>>> patching "rocks-kickstart" into distribution ... >>>> patching "rocks-kickstart-profiles" into distribution ... >>>> patching "rocks-kickstart-dtds" into distribution ... >>>> building CRAM filesystem ... >>>> Cleaning distribution >>>> Resolving versions (RPMs) >>>> Resolving versions (SRPMs) >>>> Creating symlinks to kickstart files >>>> Generating hdlist (rpm database) >>>> Segregating RPMs (rocks, non-rocks) >>>> sh: ./kickstart.cgi: No such file or directory >>>> sh: ./kickstart.cgi: No such file or directory >>>> Traceback (innermost last): >>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>>> app.run() >>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>> eval('self.command_%s()' % (command)) >>>> File "<string>", line 0, in ? 
>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>> builder.build() >>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>> (rocks, nonrocks) =3D self.segregateRPMS() >>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in=20 >>>> segregateRPMS >>>> for pkg in ks.getSection('packages'): >>>> TypeError: loop over non-sequence >>> >>> Any ideas? >>> >>> --=20 >>> Vicky Rowley email: vrowley at ucsd.edu >>> Biomedical Informatics Research Network work: (858) 536-5980
    >>> University ofCalifornia, San Diego fax: (858) 822-0828 >>> 9500 Gilman Drive >>> La Jolla, CA 92093-0715 >>> >>> >>> See pictures from our trip to China at=20 >>> http://www.sagacitech.com/Chinaweb >> >> >> -- __--__-- >> Message: 10 >> Date: Wed, 10 Dec 2003 16:50:16 -0800 >> From: "V. Rowley" <vrowley at ucsd.edu> >> To: "Mason J. Katz" <mjk at sdsc.edu> >> CC: npaci-rocks-discussion at sdsc.edu >> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when >> trying >> to build CD distro >> >> Yep, I did that, but only *AFTER* getting the error. [Thought it >> was=20 >> generated by the rocks-dist sequence, but apparently not.] Go >> ahead.=20 >> Move it back. Same difference. >> >> Vicky >> >> Mason J. Katz wrote: >>> It looks like someone moved the profiles directory to profiles.orig. >>> =20 >>> -mjk >>> =20 >>> =20 >>> [root at rocks14 install]# ls -l >>> total 56 >>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig >>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07=20 >>> ftp.rocksclusters.org >>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38=20 >>> ftp.rocksclusters.org.orig >>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 kickstart.cgi >>> drwxr-xr-x 3 root root 4096 Dec 10 20:38 profiles.orig >>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 >> rocks-dist.orig >>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >>> =20 >>>> When I run this: >>>> >>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ;=20 >>>> rocks-dist --dist=3Dcdrom cdrom >>>> >>>> on a server installed with ROCKS 3.0.0, I eventually get this: >>>> >>>>> Cleaning distribution >>>>> Resolving versions (RPMs)
    >>>>> Resolving versions(SRPMs) >>>>> Adding support for rebuild distribution from source >>>>> Creating files (symbolic links - fast) >>>>> Creating symlinks to kickstart files >>>>> Fixing Comps Database >>>>> Generating hdlist (rpm database) >>>>> Patching second stage loader (eKV, partioning, ...) >>>>> patching "rocks-ekv" into distribution ... >>>>> patching "rocks-piece-pipe" into distribution ... >>>>> patching "PyXML" into distribution ... >>>>> patching "expat" into distribution ... >>>>> patching "rocks-pylib" into distribution ... >>>>> patching "MySQL-python" into distribution ... >>>>> patching "rocks-kickstart" into distribution ... >>>>> patching "rocks-kickstart-profiles" into distribution ... >>>>> patching "rocks-kickstart-dtds" into distribution ... >>>>> building CRAM filesystem ... >>>>> Cleaning distribution >>>>> Resolving versions (RPMs) >>>>> Resolving versions (SRPMs) >>>>> Creating symlinks to kickstart files >>>>> Generating hdlist (rpm database) >>>>> Segregating RPMs (rocks, non-rocks) >>>>> sh: ./kickstart.cgi: No such file or directory >>>>> sh: ./kickstart.cgi: No such file or directory >>>>> Traceback (innermost last): >>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>>>> app.run() >>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>>> eval('self.command_%s()' % (command)) >>>>> File "<string>", line 0, in ? >>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>>> builder.build() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>>> (rocks, nonrocks) =3D self.segregateRPMS() >>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in=20 >>>>> segregateRPMS >>>>> for pkg in ks.getSection('packages'): >>>>> TypeError: loop over non-sequence >>>> >>>> >>>> Any ideas? 
>>>> >>>> --=20 >>>> Vicky Rowley email: vrowley at ucsd.edu >>>> Biomedical Informatics Research Network work: (858) 536-5980 >>>> University of California, San Diego fax: (858) 822-0828 >>>> 9500 Gilman Drive >>>> La Jolla, CA 92093-0715 >>>> >>>> >>>> See pictures from our trip to China at >> http://www.sagacitech.com/Chinaweb >>> =20 >>> =20 >>> =20 >> >> --=20 >> Vicky Rowley email: vrowley at ucsd.edu
    >> Biomedical InformaticsResearch Network work: (858) 536-5980 >> University of California, San Diego fax: (858) 822-0828 >> 9500 Gilman Drive >> La Jolla, CA 92093-0715 >> >> >> See pictures from our trip to China at >> http://www.sagacitech.com/Chinaweb >> >> >> -- __--__-- >> Message: 11 >> Date: Wed, 10 Dec 2003 17:23:25 -0800 (PST) >> From: Tim Carlson <tim.carlson at pnl.gov> >> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when >> trying to >> build CD distro >> To: "V. Rowley" <vrowley at ucsd.edu> >> Cc: "Mason J. Katz" <mjk at sdsc.edu>, npaci-rocks-discussion at sdsc.edu >> Reply-to: Tim Carlson <tim.carlson at pnl.gov> >> >> On Wed, 10 Dec 2003, V. Rowley wrote: >> >> Did you remove python by chance? kickstart.cgi calls python directly >> in >> /usr/bin/python while rocks-dist does an "env python" >> >> Tim >> >>> Yep, I did that, but only *AFTER* getting the error. [Thought it was >>> generated by the rocks-dist sequence, but apparently not.] Go ahead. >>> Move it back. Same difference. >>> >>> Vicky >>> >>> Mason J. Katz wrote: >>>> It looks like someone moved the profiles directory to profiles.orig. >>>> >>>> -mjk >>>> >>>> >>>> [root at rocks14 install]# ls -l >>>> total 56 >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >>>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 contrib.orig >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 >>>> ftp.rocksclusters.org >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 >>>> ftp.rocksclusters.org.orig >>>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 >> kickstart.cgi >>>> drwxr-xr-x 3 root root 4096 Dec 10 20:38 >> profiles.orig >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >>>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 >> rocks-dist.orig >>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >>>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote:
    >>>> >>>>> When Irun this: >>>>> >>>>> [root at rocks14 install]# rocks-dist mirror ; rocks-dist dist ; >>>>> rocks-dist --dist=3Dcdrom cdrom >>>>> >>>>> on a server installed with ROCKS 3.0.0, I eventually get this: >>>>> >>>>>> Cleaning distribution >>>>>> Resolving versions (RPMs) >>>>>> Resolving versions (SRPMs) >>>>>> Adding support for rebuild distribution from source >>>>>> Creating files (symbolic links - fast) >>>>>> Creating symlinks to kickstart files >>>>>> Fixing Comps Database >>>>>> Generating hdlist (rpm database) >>>>>> Patching second stage loader (eKV, partioning, ...) >>>>>> patching "rocks-ekv" into distribution ... >>>>>> patching "rocks-piece-pipe" into distribution ... >>>>>> patching "PyXML" into distribution ... >>>>>> patching "expat" into distribution ... >>>>>> patching "rocks-pylib" into distribution ... >>>>>> patching "MySQL-python" into distribution ... >>>>>> patching "rocks-kickstart" into distribution ... >>>>>> patching "rocks-kickstart-profiles" into distribution ... >>>>>> patching "rocks-kickstart-dtds" into distribution ... >>>>>> building CRAM filesystem ... >>>>>> Cleaning distribution >>>>>> Resolving versions (RPMs) >>>>>> Resolving versions (SRPMs) >>>>>> Creating symlinks to kickstart files >>>>>> Generating hdlist (rpm database) >>>>>> Segregating RPMs (rocks, non-rocks) >>>>>> sh: ./kickstart.cgi: No such file or directory >>>>>> sh: ./kickstart.cgi: No such file or directory >>>>>> Traceback (innermost last): >>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>>>>> app.run() >>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>>>> eval('self.command_%s()' % (command)) >>>>>> File "<string>", line 0, in ? 
>>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>>>> builder.build() >>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>>>> (rocks, nonrocks) =3D self.segregateRPMS() >>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>>>>> segregateRPMS >>>>>> for pkg in ks.getSection('packages'): >>>>>> TypeError: loop over non-sequence >>>>> >>>>> >>>>> Any ideas? >>>>> >>>>> -- >>>>> Vicky Rowley email: vrowley at ucsd.edu >>>>> Biomedical Informatics Research Network work: (858) 536-5980 >>>>> University of California, San Diego fax: (858) 822-0828 >>>>> 9500 Gilman Drive >>>>> La Jolla, CA 92093-0715
    >>>>> >>>>> >>>>> See picturesfrom our trip to China at >> http://www.sagacitech.com/Chinaweb >>>> >>>> >>>> >>> >>> -- >>> Vicky Rowley email: vrowley at ucsd.edu >>> Biomedical Informatics Research Network work: (858) 536-5980 >>> University of California, San Diego fax: (858) 822-0828 >>> 9500 Gilman Drive >>> La Jolla, CA 92093-0715 >>> >>> >>> See pictures from our trip to China at >> http://www.sagacitech.com/Chinaweb >>> >>> >> >> >> >> >> -- __--__-- >> _______________________________________________ >> npaci-rocks-discussion mailing list >> npaci-rocks-discussion at sdsc.edu >> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion >> >> >> End of npaci-rocks-discussion Digest >> >> >> DISCLAIMER: >> This email is confidential and may be privileged. If you are not the = >> intended recipient, please delete it and notify us immediately. >> Please = >> do not copy or use it for any purpose, or disclose its contents to >> any = >> other person as it may be an offence under the Official Secrets Act. = >> Thank you. >> >> --__--__-- >> >> Message: 2 >> Date: Wed, 10 Dec 2003 18:03:41 -0800 >> From: Terrence Martin <tmartin at physics.ucsd.edu> >> To: npaci-rocks-discussion at sdsc.edu >> Subject: [Rocks-Discuss]Rocks 3.0.0 >> >> I am having a problem on install of rocks 3.0.0 on my new cluster. >> >> The python error occurs right after anaconda starts and just before >> the >> install asks for the roll CDROM. >> >> The error refers to an inability to find or load rocks.file. The error >> is associated I think with the window that pops up and asks you in put
    >> the rollCDROM in. >> >> The process I followed to get to this point is >> >> Put the Rocks 3.0.0 CDROM into the CDROM drive >> Boot the system >> At the prompt type frontend >> Wait till anaconda starts >> Error referring to unable to load rocks.file. >> >> I have successfully installed rocks on a smaller cluster but that has >> different hardware. I used the same CDROM for both installs. >> >> Any thoughts? >> >> Terrence >> >> >> >> --__--__-- >> >> Message: 3 >> Date: Wed, 10 Dec 2003 19:52:49 -0800 >> From: "V. Rowley" <vrowley at ucsd.edu> >> To: npaci-rocks-discussion at sdsc.edu >> Subject: Re: [Rocks-Discuss]"TypeError: loop over non-sequence" when >> trying >> to build CD distro >> >> Looks like python is okay: >> >>> [root at rocks14 birn-oracle1]# which python >>> /usr/bin/python >>> [root at rocks14 birn-oracle1]# python --help >>> Unknown option: -- >>> usage: python [option] ... [-c cmd | file | -] [arg] ... >>> Options and arguments (and corresponding environment variables): >>> -d : debug output from parser (also PYTHONDEBUG=x) >>> -i : inspect interactively after running script, (also >> PYTHONINSPECT=x) >>> and force prompts, even if stdin does not appear to be a >> terminal >>> -O : optimize generated bytecode (a tad; also PYTHONOPTIMIZE=x) >>> -OO : remove doc-strings in addition to the -O optimizations >>> -S : don't imply 'import site' on initialization >>> -t : issue warnings about inconsistent tab usage (-tt: issue >> errors) >>> -u : unbuffered binary stdout and stderr (also >>> PYTHONUNBUFFERED=x) >>> -v : verbose (trace import statements) (also PYTHONVERBOSE=x) >>> -x : skip first line of source, allowing use of non-Unix forms of >> #!cmd >>> -X : disable class based built-in exceptions >>> -c cmd : program passed in as string (terminates option list) >>> file : program read from script file >>> - : program read from stdin (default; interactive mode if a tty) >>> arg ...: arguments passed to program in sys.argv[1:] >>> Other 
environment variables: >>> PYTHONSTARTUP: file executed on interactive startup (no default)
    >>> PYTHONPATH : ':'-separated list of directories prefixed to the >>> default module search path. The result is sys.path. >>> PYTHONHOME : alternate <prefix> directory (or >> <prefix>:<exec_prefix>). >>> The default module search path uses >>> <prefix>/python1.5. >>> [root at rocks14 birn-oracle1]# >> >> >> >> Tim Carlson wrote: >>> On Wed, 10 Dec 2003, V. Rowley wrote: >>> >>> Did you remove python by chance? kickstart.cgi calls python directly >> in >>> /usr/bin/python while rocks-dist does an "env python" >>> >>> Tim >>> >>> >>>> Yep, I did that, but only *AFTER* getting the error. [Thought it >>>> was >>>> generated by the rocks-dist sequence, but apparently not.] Go >>>> ahead. >>>> Move it back. Same difference. >>>> >>>> Vicky >>>> >>>> Mason J. Katz wrote: >>>> >>>>> It looks like someone moved the profiles directory to >>>>> profiles.orig. >>>>> >>>>> -mjk >>>>> >>>>> >>>>> [root at rocks14 install]# ls -l >>>>> total 56 >>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:16 cdrom >>>>> drwxrwsr-x 5 root wheel 4096 Dec 10 20:38 >>>>> contrib.orig >>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:07 >>>>> ftp.rocksclusters.org >>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 20:38 >>>>> ftp.rocksclusters.org.orig >>>>> -r-xrwsr-x 1 root wheel 19254 Sep 3 12:40 >>>>> kickstart.cgi >>>>> drwxr-xr-x 3 root root 4096 Dec 10 20:38 >>>>> profiles.orig >>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:15 rocks-dist >>>>> drwxrwsr-x 3 root wheel 4096 Dec 10 20:38 >> rocks-dist.orig >>>>> drwxr-sr-x 3 root wheel 4096 Dec 10 21:02 src >>>>> drwxr-sr-x 4 root wheel 4096 Dec 10 20:49 src.foo >>>>> On Dec 10, 2003, at 2:43 PM, V. Rowley wrote: >>>>> >>>>> >>>>>> When I run this: >>>>>>
    >>>>>> [root atrocks14 install]# rocks-dist mirror ; rocks-dist dist ; >>>>>> rocks-dist --dist=cdrom cdrom >>>>>> >>>>>> on a server installed with ROCKS 3.0.0, I eventually get this: >>>>>> >>>>>> >>>>>>> Cleaning distribution >>>>>>> Resolving versions (RPMs) >>>>>>> Resolving versions (SRPMs) >>>>>>> Adding support for rebuild distribution from source >>>>>>> Creating files (symbolic links - fast) >>>>>>> Creating symlinks to kickstart files >>>>>>> Fixing Comps Database >>>>>>> Generating hdlist (rpm database) >>>>>>> Patching second stage loader (eKV, partioning, ...) >>>>>>> patching "rocks-ekv" into distribution ... >>>>>>> patching "rocks-piece-pipe" into distribution ... >>>>>>> patching "PyXML" into distribution ... >>>>>>> patching "expat" into distribution ... >>>>>>> patching "rocks-pylib" into distribution ... >>>>>>> patching "MySQL-python" into distribution ... >>>>>>> patching "rocks-kickstart" into distribution ... >>>>>>> patching "rocks-kickstart-profiles" into distribution ... >>>>>>> patching "rocks-kickstart-dtds" into distribution ... >>>>>>> building CRAM filesystem ... >>>>>>> Cleaning distribution >>>>>>> Resolving versions (RPMs) >>>>>>> Resolving versions (SRPMs) >>>>>>> Creating symlinks to kickstart files >>>>>>> Generating hdlist (rpm database) >>>>>>> Segregating RPMs (rocks, non-rocks) >>>>>>> sh: ./kickstart.cgi: No such file or directory >>>>>>> sh: ./kickstart.cgi: No such file or directory >>>>>>> Traceback (innermost last): >>>>>>> File "/opt/rocks/bin/rocks-dist", line 807, in ? >>>>>>> app.run() >>>>>>> File "/opt/rocks/bin/rocks-dist", line 623, in run >>>>>>> eval('self.command_%s()' % (command)) >>>>>>> File "<string>", line 0, in ? 
>>>>>>> File "/opt/rocks/bin/rocks-dist", line 736, in command_cdrom >>>>>>> builder.build() >>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1223, in build >>>>>>> (rocks, nonrocks) = self.segregateRPMS() >>>>>>> File "/opt/rocks/lib/python/rocks/build.py", line 1107, in >>>>>>> segregateRPMS >>>>>>> for pkg in ks.getSection('packages'): >>>>>>> TypeError: loop over non-sequence >>>>>> >>>>>> >>>>>> Any ideas? >>>>>> >>>>>> -- >>>>>> Vicky Rowley email: vrowley at ucsd.edu >>>>>> Biomedical Informatics Research Network work: (858) 536-5980 >>>>>> University of California, San Diego fax: (858) 822-0828 >>>>>> 9500 Gilman Drive >>>>>> La Jolla, CA 92093-0715 >>>>>> >>>>>>
    >>>>>> See picturesfrom our trip to China at >> http://www.sagacitech.com/Chinaweb >>>>> >>>>> >>>>> >>>> -- >>>> Vicky Rowley email: vrowley at ucsd.edu >>>> Biomedical Informatics Research Network work: (858) 536-5980 >>>> University of California, San Diego fax: (858) 822-0828 >>>> 9500 Gilman Drive >>>> La Jolla, CA 92093-0715 >>>> >>>> >>>> See pictures from our trip to China at >> http://www.sagacitech.com/Chinaweb >>>> >>>> >>> >>> >>> >>> >> >> -- >> Vicky Rowley email: vrowley at ucsd.edu >> Biomedical Informatics Research Network work: (858) 536-5980 >> University of California, San Diego fax: (858) 822-0828 >> 9500 Gilman Drive >> La Jolla, CA 92093-0715 >> >> >> See pictures from our trip to China at >> http://www.sagacitech.com/Chinaweb >> >> >> >> --__--__-- >> >> _______________________________________________ >> npaci-rocks-discussion mailing list >> npaci-rocks-discussion at sdsc.edu >> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion >> >> >> End of npaci-rocks-discussion Digest >> >> >> DISCLAIMER: >> This email is confidential and may be privileged. If you are not the >> intended recipient, please delete it and notify us immediately. >> Please do not copy or use it for any purpose, or disclose its >> contents to any other person as it may be an offence under the >> Official Secrets Act. Thank you. >> >> > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 > >
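The "TypeError: loop over non-sequence" in the traceback above is Python 1.5's message for iterating over None. A minimal sketch of the failure mode (the `Profile` class here is hypothetical, not the actual rocks/build.py code): when `./kickstart.cgi` fails with "No such file or directory", the profile parse yields nothing, `getSection()` returns None, and the for-loop raises.

```python
# Sketch of the failure mode seen in the traceback, not the Rocks source.
# A section lookup that returns None (rather than an empty list) makes
# any caller that loops over the result crash; Python 1.5 phrases this
# as "loop over non-sequence", newer Pythons as
# "'NoneType' object is not iterable".

class Profile:
    def __init__(self, sections):
        # e.g. {'packages': ['rocks-boot', 'rocks-pylib']}
        self.sections = sections

    def getSection(self, name):
        # Returns None, not [], when the section is absent.
        return self.sections.get(name)

ks = Profile({})  # kickstart.cgi produced no output

try:
    for pkg in ks.getSection('packages'):
        print(pkg)
except TypeError:
    print('loop over non-sequence')
```

A defensive `getSection()` returning an empty list would turn the crash into a no-op loop, but the actual fix discussed in the thread is restoring kickstart.cgi and the profiles directory so the section exists at all.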
    Glen Otero, Ph.D. LinuxProphet 619.917.1772 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 35605 bytes Desc: not available Url : https://lists.sdsc.edu/pipermail/npaci-rocks- discussion/attachments/20031211/1a0b38fb/attachment-0001.bin From tmartin at physics.ucsd.edu Fri Dec 12 10:26:58 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Fri, 12 Dec 2003 10:26:58 -0800 Subject: [Rocks-Discuss]ftp.rocksclusters.org mirror? Message-ID: <3FDA0872.8010405@physics.ucsd.edu> I was wondering, does the command rocks-dist do anything else besides call wget on the correct tree at ftp.rocksclusters.org? I ask because some firewall restrictions on a system I am hesitant to fiddle are preventing me from running rocks-dist mirror from my head node. I would like to download the mirror of the rocks distro on another system, transfer the tree and then run rocks-dist dist to rebuild the rocks for my compute nodes. Is this reasonable? Also am I going to run into any problems with rocks 3.0.0 having installed the head node on a UP system but my compute nodes are SMP? I am making an assumption that once I get all of the packages into rocks (currently there is no smp kernels on the head node) the compute nodes will install the right kernel? BTW thanks for the help so far, the trick it seems to getting Rocks 3.0.0 on these supermicro systems is to install rocks on the hard drive in a separate computer and then install the hard disk. Thanks, Terrence From mjk at sdsc.edu Fri Dec 12 10:48:17 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Fri, 12 Dec 2003 10:48:17 -0800 Subject: [Rocks-Discuss]ftp.rocksclusters.org mirror? In-Reply-To: <3FDA0872.8010405@physics.ucsd.edu> References: <3FDA0872.8010405@physics.ucsd.edu> Message-ID: <BF99287A-2CD3-11D8-A2DC-000A95DA5638@sdsc.edu> - Yes, "rocks-dist mirror" does a python system() call to run the wget application. 
It does this several times for the various directories it needs. - No, the compute nodes do not need to be the same SMPness of the
    frontend. All installationsare done with Red Hat Kickstart (plus our pixie dust) so hardware is auto detected for you. This is not disk imaging :) -mjk On Dec 12, 2003, at 10:26 AM, Terrence Martin wrote: > I was wondering, does the command rocks-dist do anything else besides > call wget on the correct tree at ftp.rocksclusters.org? > > I ask because some firewall restrictions on a system I am hesitant to > fiddle are preventing me from running rocks-dist mirror from my head > node. I would like to download the mirror of the rocks distro on > another system, transfer the tree and then run rocks-dist dist to > rebuild the rocks for my compute nodes. Is this reasonable? > > Also am I going to run into any problems with rocks 3.0.0 having > installed the head node on a UP system but my compute nodes are SMP? I > am making an assumption that once I get all of the packages into rocks > (currently there is no smp kernels on the head node) the compute nodes > will install the right kernel? > > BTW thanks for the help so far, the trick it seems to getting Rocks > 3.0.0 on these supermicro systems is to install rocks on the hard > drive in a separate computer and then install the hard disk. > > Thanks, > > Terrence > > From mjk at sdsc.edu Fri Dec 12 10:54:03 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Fri, 12 Dec 2003 10:54:03 -0800 Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends? In-Reply-To: <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu> References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu> <ddptix48s6.fsf@oecpc11.ucsd.edu> <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu> Message-ID: <8E405599-2CD4-11D8-A2DC-000A95DA5638@sdsc.edu> We haven't done this for a while, and since our 3.0 release using different version of Red Hat for x86 and IA64 cross-building distribution may not work. 3.1.0 (since you are on campus you'll get a CD set from us next week) uses the same base RH for all architecture so this should be possible again. 
The mirror should have worked: # rocks-dist --arch=ia64 mirror Should be the ia64 tree from ftp.rocksclusters.org, you can also use your IA64 DVD, mount it on /mnt/cdrom and do a "rocks-dist copycd" to create the IA64 mirror. If this works you will need to use the --genhdlist flag w/ rocks-dist.
    For example: # cd /home/install # rocks-dist dist --- build the x86 distribution # rocks-dist --arch=ia64 --genhdlist=rocks-dist/.../i386/.../genhdlist You'll need to use find to determine the path of the genhdlist executable in you x86 distribution. This may still fail (since RH version differ), but it does work when the version are the same for both archs. -mjk On Dec 11, 2003, at 2:29 PM, Edward O'Connor wrote: > Hi everybody, > > I'm trying to bring up some ia64 compute nodes in a cluster with an > ia32 > frontend. Normally, `cd /home/install; rocks-dist mirror dist` only > sets > up the frontend to handle ia32 compute nodes. I tried to manhandle > `rocks-dist mirror` into mirroring the ia64 stuff from > ftp.rocksclusters.org by giving it the --arch=ia64 option, but that > didn't work, so I went ahead and did the mirroring step by hand. > > After having done so, `rocks-dist dist` still doesn't do the right > thing. So, adding --arch=ia64 to that command yields this error output: > > ,---- > | # rocks-dist --arch=ia64 dist > | Cleaning distribution > | Resolving versions (RPMs) > | Resolving versions (SRPMs) > | Adding support for rebuild distribution from source > | Creating files (symbolic links - fast) > | Creating symlinks to kickstart files > | Fixing Comps Database > | error - comps file is missing, skipping this step > | Generating hdlist (rpm database) > | error - could not find rpm anaconda-runtime > | error - could not find genhdlist > | Patching second stage loader (eKV, partioning, ...) > | error - could not find second stage, skipping this step > `---- > > So my question is, what do I need to do to the ia32 frontend to enable > it to kickstart an ia64 compute node? Thanks. > > > Ted > > -- > Edward O'Connor > oconnor at ucsd.edu From mjk at sdsc.edu Fri Dec 12 11:12:59 2003
    From: mjk atsdsc.edu (Mason J. Katz) Date: Fri, 12 Dec 2003 11:12:59 -0800 Subject: [Rocks-Discuss]I can't use xpbs in rocks In-Reply-To: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com> References: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com> Message-ID: <32F6A3BA-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu> Unfortunately we don't have a fix here. We've moved to SGE (your can now use QMon). We do have a PBS roll but we plan to release 3.1 before the PBS roll is complete. -mjk On Dec 10, 2003, at 8:44 PM, zhong wenyu wrote: > Hi,everyone! > I have installed rocks 2.3.2 and 3.0.0,xpbs can not be use in both of > them. > typed:xpbs[enter] > showed:xpbs: initialization failed! output: invalid command name > "Pref_Init" > thanks! > > _________________________________________________________________ > ?????????????? MSN Messenger: http://messenger.msn.com/cn From fparnold at chem.northwestern.edu Fri Dec 12 06:52:45 2003 From: fparnold at chem.northwestern.edu (Fred P. Arnold) Date: Fri, 12 Dec 2003 08:52:45 -0600 (CST) Subject: [Rocks-Discuss]Gig E on HP ZX6000 Message-ID: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu> Hello, I know this is a hardware question, not technically a Rocks one, but I can't find the answer in my HP manuals: On the ZX6000, there are two ethernet ports, a 10/100 basic/management port, and a 1000 which is designated the primary interface. Unfortunately, rocks always identifies the 10/100 as eth0. Does anyone know how to disable the 10/100 on a ZX6000? On an IA32, I'd go into the bios, but these don't technically have one. We'd like to run ours on a pure Gig network. Thanks. -Fred Frederick P. Arnold, Jr. NUIT, Northwestern U. f-arnold at northwestern.edu From mjk at sdsc.edu Fri Dec 12 11:16:42 2003 From: mjk at sdsc.edu (Mason J. Katz)
    Date: Fri, 12Dec 2003 11:16:42 -0800 Subject: [Rocks-Discuss]ScalablePBS. In-Reply-To: <200311212352.27000.Roy.Dragseth@cc.uit.no> References: <200311212352.27000.Roy.Dragseth@cc.uit.no> Message-ID: <B83C8894-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu> hi Roy, This should become the basis of the PBS roll (currently openpbs). We are seeking developers who would like to help write and maintain this -- I'm not singling you out Roy, although you would be more than welcome, rather I'm taking advantage of your message to solicit other volunteers. Anyone? -mjk On Nov 21, 2003, at 2:52 PM, Roy Dragseth wrote: > Hi folks. > > I've been testing ScalablePBS (SPBS) from supercluster.org for a few > weeks now > and it seems like a fairly good replacement for OpenPBS. Only a few > minor > changes to the OpenPBS infrastructure were needed to accomplish the > neccessary changes in the kickstart generation to make the nodes > switch to > SPBS. > > SPBS is based on OpenPBS 2.3.12, but incorporates most provided patches > (sandia etc) and is actively developed by the same maintainers that > develop > maui. It scales better than OpenPBS, to around 2K nodes, has better > fault > tolerance and communicates better with maui. It has, as far as I can > see, no > user visible changes from OpenPBS. > > I know, a lot of people are moving away from pbs and into sge, I was > thinking > about making the switch too. The emergence SPBS seems to make the > switch > unneccessary and I don't have to teach myself (and the users) a new > queueing > interface... > > Configuration tested: > Rocks 3.0.0 > SPBS 1.0.1p0 (should leave beta phase next month) > Maui 3.2.6p6 (available for "Early Access Production") > > SPBS and Maui can be downloaded from http://www.supercluster.org/ > > Have a nice weekend, > r. > > -- >
> The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway. > phone:+47 77 64 41 07, fax:+47 77 64 41 00 > Roy Dragseth, High Performance Computing System Administrator > Direct call: +47 77 64 62 56. email: royd at cc.uit.no From jlkaiser at fnal.gov Fri Dec 12 11:25:58 2003 From: jlkaiser at fnal.gov (Joseph L. Kaiser) Date: Fri, 12 Dec 2003 13:25:58 -0600 Subject: [Rocks-Discuss](no subject) Message-ID: <1071257158.3719.9.camel@ajax.kaisergroup.net> My install of 3.0.0 is crapping out here: "/usr/src/build/90289-i386/install// a x x usr/lib/anaconda/comps.py", line a x x 153, in __getitem__ a x x KeyError: PyXML # x x Even though PyXML is in the distribution I have built. Is there anything that can cause this other than the missing RPM? Thanks, Joe From oconnor at soe.ucsd.edu Fri Dec 12 11:36:04 2003 From: oconnor at soe.ucsd.edu (Edward O'Connor) Date: Fri, 12 Dec 2003 11:36:04 -0800 Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends? In-Reply-To: <8E405599-2CD4-11D8-A2DC-000A95DA5638@sdsc.edu> (Mason J. Katz's message of "Fri, 12 Dec 2003 10:54:03 -0800") References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu> <ddptix48s6.fsf@oecpc11.ucsd.edu> <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu> <8E405599-2CD4-11D8-A2DC-000A95DA5638@sdsc.edu> Message-ID: <ddiskl4ymz.fsf@oecpc11.ucsd.edu> > We haven't done this for a while, and since our 3.0 release using > different version of Red Hat for x86 and IA64 cross-building > distribution may not work. Ahh. After further travails (read below), I'm pretty willing to suspect that this indeed does not work in Rocks 3.0.0. I'm looking forward to those 3.1.0 CDs and DVDs next week! :) > you can also use your IA64 DVD mount it on /mnt/cdrom and do a > "rocks-dist copycd" to create the IA64 mirror. Unfortunately, the ia32 frontend machine doesn't have a DVD drive in it. So I mounted the ia64 ISO image on /mnt/cdrom via a loopback device and that worked fine.
    However, `rocks-dist copycd`seemed to have nuked the ia32 stuff under /home/install/ftp.rocksclusters.org/, or, if it didn't entirely nuke it, it made the bare `rocks-dist dist` of your next instructions fail: > If this works you will the to use the --genhdlist flag w/ rocks-dist. > For example: > > # cd /home/install > # rocks-dist dist --- build the x86 distribution As this failed, I went ahead and also ran a `rocks-dist mirror`, which proceeded to download a whole lot of stuff from you guys. After it finished, `rocks-dist dist` completed without error. I double-checked and the ia64 mirror from the `rocks-dist copycd` command still appears to be there. > # rocks-dist --arch=ia64 --genhdlist=rocks-dist/.../i386/.../genhdlist Should there be a `dist` at the end of that? The above command (with the substitution of the appropriate genhdlist path) appears to be a no-op. So I appended a `dist` as the idea is for it to create the appropriate symlinks for ia64 as well, and it bombs out too, in the same way as before: ,---- | # rocks-dist --arch=ia64 --genhdlist=rocks-dist/7.3/en/os/i386/usr/lib/anaconda- runtime/genhdlist dist | Cleaning distribution | Resolving versions (RPMs) | Resolving versions (SRPMs) | Adding support for rebuild distribution from source | Creating files (symbolic links - fast) | Creating symlinks to kickstart files | Fixing Comps Database | error - comps file is missing, skipping this step | Generating hdlist (rpm database) | error creating file /home/install/rocks- dist/desktop/7.3/en/os/ia64/RedHat/base/hdlist: No such file or directory | Patching second stage loader (eKV, partioning, ...) | error - could not find second stage, skipping this step `---- > You'll need to use find to determine the path of the genhdlist > executable in you x86 distribution. This may still fail (since RH > version differ), but it does work when the version are the same for > both archs. 
I suppose at this point that it's still failing due to the RH version mismatch, and that getting this to work in 3.0.0 is a lost cause. Ted -- Edward O'Connor oconnor at ucsd.edu
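Ted's observation that `rocks-dist` with only flags "appears to be a no-op" matches the dispatch style visible in the earlier rocks-dist traceback (`eval('self.command_%s()' % (command))` inside `run()`): options are parsed, but nothing executes unless a command word such as `dist` is left on the command line. A toy sketch of that pattern (the `App` class is hypothetical, not the rocks-dist source):

```python
# Toy command dispatcher in the style shown by the rocks-dist
# traceback earlier in the thread: each non-option argument is
# dispatched via eval('self.command_%s()').  With only --flags
# present, the command list is empty and run() does nothing.

class App:
    def __init__(self, args):
        self.flags = [a for a in args if a.startswith('--')]
        self.commands = [a for a in args if not a.startswith('--')]

    def command_dist(self):
        return 'building distribution'

    def run(self):
        results = []
        for command in self.commands:
            # getattr(self, 'command_' + command)() would be the safer
            # idiom; eval is used here only to mirror the traceback.
            results.append(eval('self.command_%s()' % command))
        return results

print(App(['--arch=ia64']).run())          # [] -- flags alone are a no-op
print(App(['--arch=ia64', 'dist']).run())  # ['building distribution']
```

This is why appending `dist`, as Ted did, is what makes the invocation actually do work; the remaining errors in his output come from the missing ia64 comps/anaconda pieces, not from argument handling.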
    From jared_hodge atiat.utexas.edu Fri Dec 12 12:07:32 2003 From: jared_hodge at iat.utexas.edu (Jared Hodge) Date: Fri, 12 Dec 2003 14:07:32 -0600 Subject: [Rocks-Discuss]I can't use xpbs in rocks References: <BAY3-F24QLayI4TY7zD00009bf1@hotmail.com> <32F6A3BA-2CD7-11D8- A2DC-000A95DA5638@sdsc.edu> Message-ID: <3FDA2004.3020203@iat.utexas.edu> OK, I've got a fix for this one. The problem is that xpbs thinks that it's in the directory /var/tmp/OpenPBS-buildroot/opt/OpenPBS/ Anyway, the path is mangled to get to some of the subroutines. The rocks guys can figure out a way to prevent this in future releases, but here's how you can get it working (and pbsmon while were at it): First fix the scripts: /opt/OpenPBS/bin/xpbs Need's the following changes: #set libdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs #set appdefdir /var/tmp/OpenPBS-buildroot/opt/OpenPBS/lib/xpbs set libdir /opt/OpenPBS/lib/xpbs set appdefdir /opt/OpenPBS/lib/xpbs /opt/OpenPBS/bin/xpbsmon Needs the same thing plus the first line needs changed now do the following: cd /opt/OpenPBS/lib/xpbs rm tclIndex ./buildindex `pwd` cd /opt/OpenPBS/lib/xpbsmon rm tclIndex ./buildindex `pwd` That should fix it all up. I tested this on a 2.3.2 cluster, I assume it's the same on 3.0. -- Jared Hodge The Institute for Advanced Technology The University of Texas at Austin 3925 W. Braker Lane, Suite 400 Austin, Texas 78759 Phone: 512-232-4460 Fax: 512-471-9096 Email: jared_hodge at iat.utexas.edu Mason J. Katz wrote: > Unfortunately we don't have a fix here. We've moved to SGE (your can > now use QMon). We do have a PBS roll but we plan to release 3.1 > before the PBS roll is complete. > > -mjk
    > > On Dec10, 2003, at 8:44 PM, zhong wenyu wrote: > >> Hi,everyone! >> I have installed rocks 2.3.2 and 3.0.0,xpbs can not be use in both of >> them. >> typed:xpbs[enter] >> showed:xpbs: initialization failed! output: invalid command name >> "Pref_Init" >> thanks! >> >> _________________________________________________________________ >> ?????????????? MSN Messenger: http://messenger.msn.com/cn > > From jlkaiser at fnal.gov Fri Dec 12 14:39:42 2003 From: jlkaiser at fnal.gov (Joe Kaiser) Date: Fri, 12 Dec 2003 16:39:42 -0600 Subject: [Rocks-Discuss](no subject) In-Reply-To: <1071257158.3719.9.camel@ajax.kaisergroup.net> References: <1071257158.3719.9.camel@ajax.kaisergroup.net> Message-ID: <1071268782.22030.0.camel@nietzsche.fnal.gov> Sorry, creating extra links where they don't belong. Nevermind. On Fri, 2003-12-12 at 13:25, Joseph L. Kaiser wrote: > My install of 3.0.0 is crapping out here: > > "/usr/src/build/90289-i386/install// a x > x usr/lib/anaconda/comps.py", line a > x > x 153, in __getitem__ a > x > x KeyError: PyXML # > x > x > > > Even though PyXML is in the distribution I have built. Is there > anything that can cause this other than the missing RPM? > > Thanks, > > Joe -- =================================================================== Joe Kaiser - Systems Administrator Fermi Lab CD/OSS-SCS Never laugh at live dragons. 630-840-6444 jlkaiser at fnal.gov ===================================================================
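The xpbs repair Jared describes a couple of messages back (replacing the stale `/var/tmp/OpenPBS-buildroot` prefix in `/opt/OpenPBS/bin/xpbs` and `xpbsmon`) can be scripted rather than edited by hand. A hedged sketch of just the substitution step; the two paths are taken from his message, and the demo below runs on a scratch file rather than a real OpenPBS install:

```python
import os
import tempfile

# Stale RPM buildroot prefix and the real install prefix, as given
# in Jared's message; adjust for your own install.
BAD = '/var/tmp/OpenPBS-buildroot/opt/OpenPBS'
GOOD = '/opt/OpenPBS'

def fix_script(path):
    """Rewrite BAD -> GOOD in place; return True if anything changed."""
    f = open(path)
    text = f.read()
    f.close()
    fixed = text.replace(BAD, GOOD)
    if fixed != text:
        f = open(path, 'w')
        f.write(fixed)
        f.close()
        return True
    return False

# Demo on a scratch file standing in for /opt/OpenPBS/bin/xpbs.
fd, demo = tempfile.mkstemp()
os.write(fd, ('set libdir %s/lib/xpbs\n' % BAD).encode())
os.close(fd)
print(fix_script(demo))  # True
os.remove(demo)
```

After patching the scripts, the tclIndex files still have to be rebuilt (`rm tclIndex; ./buildindex \`pwd\`` in /opt/OpenPBS/lib/xpbs and /opt/OpenPBS/lib/xpbsmon), exactly as in the original message.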
    From jholland atcs.uh.edu Fri Dec 12 14:52:10 2003 From: jholland at cs.uh.edu (Jason Holland) Date: Fri, 12 Dec 2003 16:52:10 -0600 (CST) Subject: [Rocks-Discuss]Gig E on HP ZX6000 In-Reply-To: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu> References: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu> Message-ID: <Pine.GSO.4.58.0312121650350.4139@leibnitz.cs.uh.edu> Fred, Try flipping the modules in /etc/modules.conf. Flip eth0 with eth1 so that the gige interface comes up as eth0. Or, just turn off eth0 altogether with 'alias eth0 off'. I think thats the right syntax. We have 60 zx6000's and I have personally have never found a way to disable the port. Jason P Holland Texas Learning and Computation Center http://www.tlc2.uh.edu University of Houston Philip G Hoffman Hall rm 207A tel: 713-743-4850 On Fri, 12 Dec 2003, Fred P. Arnold wrote: > Hello, > > I know this is a hardware question, not technically a Rocks one, but I > can't find the answer in my HP manuals: > > On the ZX6000, there are two ethernet ports, a 10/100 basic/management > port, and a 1000 which is designated the primary interface. > Unfortunately, rocks always identifies the 10/100 as eth0. > > Does anyone know how to disable the 10/100 on a ZX6000? On an IA32, I'd > go into the bios, but these don't technically have one. We'd like to run > ours on a pure Gig network. > > Thanks. > > -Fred > > Frederick P. Arnold, Jr. > NUIT, Northwestern U. > f-arnold at northwestern.edu > From jian at appro.com Fri Dec 12 17:27:51 2003 From: jian at appro.com (Jian Chang) Date: Fri, 12 Dec 2003 17:27:51 -0800 Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro Message-ID: <4AE58AD63966B24B99F95CA24C02EB1903414F@hawk.appro.com> Hello Mason / Puru, I got your contact information from Bryan Littlefield.
    I would liketo discuss with you regarding benchmark test systems you might need down the road. We can also share with you our findings as to what is compatible in the Opteron systems. Please reply with your phone number where I can reach you, and I will call promptly. Bryan, Thank you for the referral. Best regards, Jian Chang Regional Sales Manager (408) 941-8100 x 202 (800) 927-5464 x 202 (408) 941-8111 Fax jian at appro.com www.appro.com -----Original Message----- From: Bryan Littlefield [mailto:bryan at UCLAlumni.net] Sent: Tuesday, December 09, 2003 12:14 PM To: npaci-rocks-discussion at sdsc.edu; mjk at sdsc.edu Cc: Jian Chang Subject: Rocks-Discuss] AMD Opteron - Contact Appro Hi Mason, I suggest contacting Appro. We are using Rocks on our Opteron cluster and Appro would likely love to help. I will contact them as well to see if they could help getting a opteron machine for testing. Contact info below: Thanks --Bryan Jian Chang - Regional Sales Manager (408) 941-8100 x 202 (800) 927-5464 x 202 (408) 941-8111 Fax jian at appro.com http://www.appro.com npaci-rocks-discussion-request at sdsc.edu wrote: From: "Mason J. Katz" <mailto:mjk at sdsc.edu> <mjk at sdsc.edu> Subject: Re: [Rocks-Discuss]AMD Opteron Date: Tue, 9 Dec 2003 07:28:51 -0800 To: "purushotham komaravolu" <mailto:purikk at hotmail.com> <purikk at hotmail.com> We have a beta right now that we have sent to a few people. We plan on a release this month, and AMD_64 will be part of this release along with the usual x86, IA64 support. If you want to help accelerate this process please talk to your vendor about loaning/giving us some hardware for testing. Having access to a variety of Opteron hardware (we own two boxes) is the only way we can
    have good supportfor this chip. -mjk On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote: Cc: <mailto:npaci-rocks-discussion at sdsc.edu> <npaci-rocks-discussion at sdsc.edu> Hello, I am a newbie to ROCKS cluster. I wanted to setup clusters on 32-bit Architectures( Intel and AMD) and 64-bit Architecture( Intel and AMD). I found the 64-bit download for Intel on the website but not for AMD. Does it work for AMD opteron? if not what is the ETA for AMD-64. We are planning to but AMD-64 bit machines shortly, and I would like to volunteer for the beta testing if needed. Thanks Regards, Puru _______________________________________________ npaci-rocks-discussion mailing list npaci-rocks-discussion at sdsc.edu http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion End of npaci-rocks-discussion Digest -------------- next part -------------- An HTML attachment was scrubbed... URL: https://lists.sdsc.edu/pipermail/npaci-rocks- discussion/attachments/20031212/dec7e41b/attachment-0001.html From landman at scalableinformatics.com Sat Dec 13 07:50:02 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Sat, 13 Dec 2003 10:50:02 -0500 Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0 Message-ID: <1071330602.4444.56.camel@protein.scalableinformatics.com> Folks: Finally built the 2.4.23 kernel into an RPM via the RedHat tools. Had to hack up the spec file a bit, but you can see the results at http://scalableinformatics.com/downloads/kernels/2.4.23/ These are 2.4.23 with the 2.4.24-pre1 patch (e.g. xfs is in there, woo hoo!). I had to strip out most of the previous patches as they were incompatible with .23 (and I don't want to spend time forward porting them). The spec file, the sources, etc are released under the normal licenses (GPL). No warranties, use at your own risk, and these are NOT
    official Redhat kernels.Don't ask them for support for these, they won't do it, and they will look at you funny. That said, I had also checked out the cvs tree to start the "Carlson" process :) indicated in the list a few months ago (see https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003533.html) to build a more customized distribution. I got to the Build the boot RPM cd rocks/src/rocks/boot make rpm point, and lo and behold this is what I see ... rm version.mk rm arch rm -f /local/rocks/src/rocks/boot/.rpmmacros rm -f /usr/src/redhat/SOURCES/rocks-boot-3.1.0.tar rm -f /usr/src/redhat/SOURCES/rocks-boot-3.1.0.tar.gz ... Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer has a strong sense of urgency and little time to wait for an operational cluster). I checked out the system from CVS earlier this week. Is there any way to switch the build back to 3.0.0? Or am I really out of luck at this moment??? Clues/hints welcome. These kernels might work, though I don't have a method to try them in the distro yet. They work on the build machine. [root at head root]# uname -a Linux head.public 2.4.23-1 #1 SMP Sat Dec 13 14:41:06 GMT 2003 i686 unknown [root at head root]# rpm -qa | grep -i kernel kernel-2.4.23-1 kernel-BOOT-2.4.23-1 rocks-kernel-3.0.0-0 pvfs-kernel-1.6.0-1 kernel-doc-2.4.23-1 kernel-source-2.4.23-1 kernel-smp-2.4.23-1 The spec file is in the above download section, along with a .src.rpm and other stuff. If anyone does have a clue as to how to build with 3.0.0 given the current cvs, or if there is a tagged set I needed to get, please let me know. Joe -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615
    From tim.carlson atpnl.gov Sat Dec 13 08:31:03 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Sat, 13 Dec 2003 08:31:03 -0800 (PST) Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0 In-Reply-To: <1071330602.4444.56.camel@protein.scalableinformatics.com> Message-ID: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov> On Sat, 13 Dec 2003, Joe Landman wrote: > That said, I had also checked out the cvs tree to start the "Carlson" > process :) indicated in the list a few months ago (see yikes.. ! :) > > Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer > has a strong sense of urgency and little time to wait for an operational > cluster). I checked out the system from CVS earlier this week. You needed to check out the 3.0.0 tagged version ROCKS_3_0_0_i386 Off thread, but it would seem to me that the numbering scheme for ROCKS got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3 based and the new 3.1 will be RH 3.0 based. Not that it matters. Just curious. Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support From phil at sdsc.edu Sat Dec 13 08:51:29 2003 From: phil at sdsc.edu (Philip Papadopoulos) Date: Sat, 13 Dec 2003 08:51:29 -0800 Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0 In-Reply-To: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov> References: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov> Message-ID: <3FDB4391.4080405@sdsc.edu> Tim Carlson wrote: >On Sat, 13 Dec 2003, Joe Landman wrote: > > > >>That said, I had also checked out the cvs tree to start the "Carlson"
    >>process :) indicatedin the list a few months ago (see >> >> > >yikes.. ! :) > > > >>Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer >>has a strong sense of urgency and little time to wait for an operational >>cluster). I checked out the system from CVS earlier this week. >> >> > >You needed to check out the 3.0.0 tagged version > >ROCKS_3_0_0_i386 > this is correct. > >Off thread, but it would seem to me that the numbering scheme for ROCKS >got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new >3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3 >based and the new 3.1 will be RH 3.0 based. Not that it matters. Just >curious. > I blame Bruno ... We moved to 3.0 because rolls is very different from the way 2.3.2 was put together -- this wasn't a minor change and so a subminor revision number didn't make sense. 3.0 --> 3.1 change from 7.3 to recompiled RHEL, change from PBS as default to SGE as default. .... OK, you could argue that this is also a major change and shouldn't have a minor version #. We didn't want to go from 3.0 to 4.0 for some non-definable reasons :-), but mostly it's that 3.0 and 3.1 feel pretty similar in terms of the way they are put together (with rolls). -P > >Tim > >Tim Carlson >Voice: (509) 376 3423 >Email: Tim.Carlson at pnl.gov >EMSL UNIX System Support > > -------------- next part -------------- An HTML attachment was scrubbed... URL: https://lists.sdsc.edu/pipermail/npaci-rocks- discussion/attachments/20031213/69aa41fa/attachment-0001.html From landman at scalableinformatics.com Sat Dec 13 11:14:51 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Sat, 13 Dec 2003 14:14:51 -0500
Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0 In-Reply-To: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov> References: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov> Message-ID: <1071342891.4445.58.camel@protein.scalableinformatics.com> Thanks. Magic incantations, and I have the "Carlson" process implemented. Ok, next step is the roll-my-own ... more later On Sat, 2003-12-13 at 11:31, Tim Carlson wrote: > On Sat, 13 Dec 2003, Joe Landman wrote: > > > That said, I had also checked out the cvs tree to start the "Carlson" > > process :) indicated in the list a few months ago (see > > yikes.. ! :) > > > > > Ok... I wanted to rebuild 3.0.0, as I cannot wait for 3.1.0 (my customer > > has a strong sense of urgency and little time to wait for an operational > > cluster). I checked out the system from CVS earlier this week. > > You needed to check out the 3.0.0 tagged version > > ROCKS_3_0_0_i386 > > Off thread, but it would seem to me that the numbering scheme for ROCKS > got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new > 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH 7.3 > based and the new 3.1 will be RH 3.0 based. Not that it matters. Just > curious. > > Tim > > Tim Carlson > Voice: (509) 376 3423 > Email: Tim.Carlson at pnl.gov > EMSL UNIX System Support
From wyzhong78 at msn.com Mon Dec 15 00:02:15 2003 From: wyzhong78 at msn.com (zhong wenyu) Date: Mon, 15 Dec 2003 16:02:15 +0800 Subject: [Rocks-Discuss]about add-extra-nic Message-ID: <BAY3-F40JRkRy9Iwgel00056a6d@hotmail.com>
Hi, everyone! My compute node's motherboard is an MSI 9141, which has one 1000M NIC and one 100M NIC. I plan to use the 100M network for control and the 1000M network for applications, so I use a 100M switch to connect the compute nodes to the frontend, and a 1000M switch to connect the compute nodes to each other, not including the frontend. 
The first time I installed a compute node, it sat at "waiting for dhcp ip information" for too long and the install could not finish. I figured the 1000M NIC was responsible, so I disabled it in the BIOS. After that the install worked and the compute nodes appeared. Then I wanted to add the extra NIC: I ran the add-extra-nic command and shoot-node, and the compute node went to reboot (during the reboot I re-enabled the NIC) but came back to "waiting for dhcp ip information" again.
So I disabled it again and restarted; the node reinstalled fine and finished with no trouble. I can even see the boot message "start eth1....[ok]"! But "ifconfig eth1" still reports an error, even after I enable the 1000M NIC again. Thanks and regards!
From Roy.Dragseth at cc.uit.no Mon Dec 15 02:31:51 2003 From: Roy.Dragseth at cc.uit.no (Roy Dragseth) Date: Mon, 15 Dec 2003 11:31:51 +0100 Subject: [Rocks-Discuss]ia64 compute nodes with ia32 frontends? In-Reply-To: <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu> References: <793188FE-D411-11D7-8529-000393C7898E@sdsc.edu> <ddptix48s6.fsf@oecpc11.ucsd.edu> <ddwu930yzp.fsf_-_@oecpc11.ucsd.edu> Message-ID: <200312151131.51410.Roy.Dragseth@cc.uit.no>
Hi. I've been running a setup like this for over a year now; it will not (ever?) work right out of the box due to some kernel problems. rocks-dist --arch ia64 dist will most likely crash an ia32 frontend: the ia32 kernel doesn't like to mount a cramfs image generated on an ia64 machine, and it gives me a kernel panic. Here is a rough guide to get this kind of setup going.
1. Set up the ia32 as usual, but allow root write access to /export by inserting "no_root_squash" as an option in /etc/exports.
2. Create a "fake" ia64 frontend using one of the ia64 nodes: let it configure eth0 by dhcp and let the ia32 frontend think it is a compute node.
3. On the fake frontend, turn off the NIS daemons except ypbind.
4. Edit /etc/auto.home to mount /home from the ia32 frontend and restart autofs.
5. On the fake frontend, do a rocks-dist copycd to dump the ia64 DVD into /home/install.
6. Now you can do a rocks-dist dist on the ia64 box.
7. Finally, you need a symlink to make the ia32 frontend happy: ln -s enterprise/2.1AW/en/os/ia64 rocks-dist/7.3/en/os/ia64
Now you can boot up your ia64 nodes from the ia32 frontend. 
After you are confident that your ia64 nodes are installed correctly you can reinstall the ia64 frontend as a regular compute node. Subsequent rocks-dist dist can be run on any ia64 compute node as long as it has the anaconda-runtime and rocks-dist rpms installed. Hope this helps,
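Roy's numbered steps above, restated as a hedged shell transcript. This is a sketch only: the service names and the point at which each command runs are assumptions on my part, while the rocks-dist commands and the symlink come straight from his steps (the symlink path is presumably relative to /home/install).

```shell
## On the ia32 frontend (step 1):
# add "no_root_squash" as an option on the /export line of /etc/exports,
# then re-export the filesystems:
exportfs -ra

## On the "fake" ia64 frontend (steps 2-6) -- an ia64 node that gets eth0
## via dhcp and looks like a compute node to the ia32 frontend:
service ypserv stop          # assumption: stop NIS daemons except ypbind
# edit /etc/auto.home to mount /home from the ia32 frontend, then:
service autofs restart
cd /home/install
rocks-dist copycd            # dump the ia64 DVD into /home/install
rocks-dist dist              # build the distribution on the ia64 box

## Back on the ia32 frontend (step 7), assumed from /home/install:
ln -s enterprise/2.1AW/en/os/ia64 rocks-dist/7.3/en/os/ia64
```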
r. -- The Computer Center, University of Tromsø, N-9037 TROMSØ Norway. phone:+47 77 64 41 07, fax:+47 77 64 41 00 Roy Dragseth, High Performance Computing System Administrator Direct call: +47 77 64 62 56. email: royd at cc.uit.no
From Roy.Dragseth at cc.uit.no Mon Dec 15 04:28:15 2003 From: Roy.Dragseth at cc.uit.no (Roy Dragseth) Date: Mon, 15 Dec 2003 13:28:15 +0100 Subject: [Rocks-Discuss]Gig E on HP ZX6000 In-Reply-To: <Pine.GSO.4.58.0312121650350.4139@leibnitz.cs.uh.edu> References: <Pine.GSO.4.33.0312120850030.4235-100000@mercury.chem.northwestern.edu> <Pine.GSO.4.58.0312121650350.4139@leibnitz.cs.uh.edu> Message-ID: <200312151328.15826.Roy.Dragseth@cc.uit.no>
I had similar problems on our HP rx2600 boxes and found a way to make the kernel ignore the 100Mb/s NIC by adding this append line in elilo.conf: append="reserve=0xd00,64" See my post https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-October/003483.html for details on how to figure out this parameter. Remark: this has to be modified both in elilo.conf and elilo-ks.conf in /boot/efi/efi/redhat/. The problem is that cluster-kickstart overwrites these files at every reboot, and the setup is hardcoded into the cluster-kickstart executable, so you need to figure out a way to work around this. I grabbed cluster-kickstart.c from cvs, did the necessary mods to it and installed the new one on every compute node. r. -- The Computer Center, University of Tromsø, N-9037 TROMSØ Norway. phone:+47 77 64 41 07, fax:+47 77 64 41 00 Roy Dragseth, High Performance Computing System Administrator Direct call: +47 77 64 62 56. 
email: royd at cc.uit.no From fds at sdsc.edu Mon Dec 15 11:31:01 2003 From: fds at sdsc.edu (Federico Sacerdoti) Date: Mon, 15 Dec 2003 11:31:01 -0800 Subject: [Rocks-Discuss]Trying to integrate a new kernel into 3.0.0 In-Reply-To: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov> References: <Pine.GSO.4.44.0312130824450.27600-100000@poincare.emsl.pnl.gov> Message-ID: <37508BEC-2F35-11D8-804D-000393A4725A@sdsc.edu> We did indeed move version scheming. We used to be "Redhat minus 5" so a RH 7.3-based Rocks was called 2.3.x. This became mute when Redhat
quickly went from 8 to 9 to Enterprise 3. So we decided to be selfish and move to 3.0.0 when we made a big internal change (Rolls and the end of monolithic Rocks). 3.1.0 is a minor number revision, which corresponds to how much has changed in the Rocks code, not the underlying Redhat system. A bugfix release would be 3.1.1, etc... We hope this versioning scheme will be more resilient to linux system changes (which are out of our control), while keeping the focus on the Rocks structure. On Dec 13, 2003, at 8:31 AM, Tim Carlson wrote: > Off thread, but it would seem to me that the numbering scheme for ROCKS > got out of whack somewhere. Shouldn't 3.0.0 have been 2.3.3 and the new > 3.1 been 3.0? The reasoning being that the current 3.0.0 is still RH > 7.3 > based and the new 3.1 will be RH 3.0 based. Not that it matters. Just > curious. > Federico Rocks Cluster Group, San Diego Supercomputing Center, CA
From jlkaiser at fnal.gov Mon Dec 15 11:43:43 2003 From: jlkaiser at fnal.gov (Joseph L. Kaiser) Date: Mon, 15 Dec 2003 13:43:43 -0600 Subject: [Rocks-Discuss]problem forcing a kernel Message-ID: <1071517423.3719.4.camel@ajax.kaisergroup.net>
Hi, I am trying to install this kernel: kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following whether I put it in the force directory of my distro or the regular RPMS directory or contrib: During package installation it gives me this: /mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be opened. This is due to a missing file, a bad package, or bad media. Press <return> to try again. The file is there. The media is the network. I have installed the package on other systems by hand. Any ideas? Thanks, Joe
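Greg Bruno's follow-up later in the thread asks whether the distribution was rebuilt after the RPM was copied in; that is the usual missing step. A hedged sketch of the sequence — the exact force-directory path is an assumption (check the Rocks 3.0.0 docs for your layout); only the `rocks-dist dist` rebuild is confirmed by the thread:

```shell
# Copy the kernel RPM into the distribution's force directory.
# NOTE: /home/install/force/RPMS/ is an assumed path, not from the thread.
cp kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm /home/install/force/RPMS/

# Rebuild the distribution so the installer's package metadata actually
# picks up the new RPM (this step is from Greg's reply).
cd /home/install
rocks-dist dist
```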
    From tmartin atphysics.ucsd.edu Mon Dec 15 15:58:51 2003 From: tmartin at physics.ucsd.edu (Terrence Martin) Date: Mon, 15 Dec 2003 15:58:51 -0800 Subject: [Rocks-Discuss]removing a node from the cluster Message-ID: <3FDE4ABB.6030302@physics.ucsd.edu> How does one go about removing a node from the cluster? Is there a straight forward way to do this? Terrence From ebpeele2 at pams.ncsu.edu Mon Dec 15 16:42:47 2003 From: ebpeele2 at pams.ncsu.edu (Elliot Peele) Date: Mon, 15 Dec 2003 19:42:47 -0500 Subject: [Rocks-Discuss]removing a node from the cluster In-Reply-To: <3FDE4ABB.6030302@physics.ucsd.edu> References: <3FDE4ABB.6030302@physics.ucsd.edu> Message-ID: <1071535367.1871.1.camel@localhost.localdomain> insert-ethers --replace hostname Select compute from the menu then exit insert-ethers. Elliot On Mon, 2003-12-15 at 18:58, Terrence Martin wrote: > How does one go about removing a node from the cluster? Is there a > straight forward way to do this? > > Terrence -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031215/ ebf9581b/attachment-0001.bin From phil at sdsc.edu Mon Dec 15 16:44:29 2003 From: phil at sdsc.edu (Philip Papadopoulos) Date: Mon, 15 Dec 2003 16:44:29 -0800 Subject: [Rocks-Discuss]removing a node from the cluster In-Reply-To: <3FDE4ABB.6030302@physics.ucsd.edu> References: <3FDE4ABB.6030302@physics.ucsd.edu> Message-ID: <3FDE556D.4040100@sdsc.edu> insert-ethers --replace "compute-0-0" select "compute" from the menu and then hit f1 to exit. This will re-create all of the files that have host names and remove the node (you are essentially replacing the node named "compute-0-0" with the empty set). PBS will likely be unhappy with this change -- If I remember correctly, it has an
additional file that it creates when a node is added to the queuing system -- when the node doesn't appear in the host table, it gets cranky. You should look in /opt/OpenPBS/server_priv/nodes to solve this problem -- suppose you want to delete compute-0-0. # qmgr -c "delete node compute-0-0" # insert-ethers --replace "compute-0-0" -P Terrence Martin wrote: > How does one go about removing a node from the cluster? Is there a > straight forward way to do this? > > Terrence -- == Philip Papadopoulos, Ph.D. == Program Director for San Diego Supercomputer Center == Grid and Cluster Computing 9500 Gilman Drive == Ph: (858) 822-3628 University of California, San Diego == FAX: (858) 822-5407 La Jolla, CA 92093-0505
From gotero at linuxprophet.com Mon Dec 15 16:52:23 2003 From: gotero at linuxprophet.com (Glen Otero) Date: Mon, 15 Dec 2003 16:52:23 -0800 Subject: [Rocks-Discuss]removing a node from the cluster In-Reply-To: <1071535367.1871.1.camel@localhost.localdomain> References: <3FDE4ABB.6030302@physics.ucsd.edu> <1071535367.1871.1.camel@localhost.localdomain> Message-ID: <1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com>
On Dec 15, 2003, at 4:42 PM, Elliot Peele wrote: > insert-ethers --replace hostname > > Select compute from the menu then exit insert-ethers. Then run: # insert-ethers --update to update the database Check the database entries with:
    # dbreport hosts Glen > >Elliot > > On Mon, 2003-12-15 at 18:58, Terrence Martin wrote: >> How does one go about removing a node from the cluster? Is there a >> straight forward way to do this? >> >> Terrence >> Glen Otero, Ph.D. Linux Prophet 619.917.1772 From landman at scalableinformatics.com Mon Dec 15 17:13:29 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 15 Dec 2003 20:13:29 -0500 Subject: [Rocks-Discuss]removing a node from the cluster In-Reply-To: <1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com> References: <3FDE4ABB.6030302@physics.ucsd.edu> <1071535367.1871.1.camel@localhost.localdomain> <1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com> Message-ID: <3FDE5C39.1030503@scalableinformatics.com> Harumph: rmnode nasty_compute_node insert-ethers --update (rmnode at http://scalableinformatics.com/downloads/rmnode.gz). I thought insert-ethers had a simple version of this. All rmnode is, is a hacked version of one of the other rocks tools. Joe Glen Otero wrote: > > On Dec 15, 2003, at 4:42 PM, Elliot Peele wrote: > >> insert-ethers --replace hostname >> >> Select compute from the menu then exit insert-ethers. > > > Then run: > > # insert-ethers --update > > to update the database >
    > Check thedatabase entries with: > > # dbreport hosts > > Glen > >> >> Elliot >> >> On Mon, 2003-12-15 at 18:58, Terrence Martin wrote: >> >>> How does one go about removing a node from the cluster? Is there a >>> straight forward way to do this? >>> >>> Terrence >>> > Glen Otero, Ph.D. > Linux Prophet > 619.917.1772 -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From csamuel at vpac.org Mon Dec 15 18:06:47 2003 From: csamuel at vpac.org (Chris Samuel) Date: Tue, 16 Dec 2003 13:06:47 +1100 Subject: [Rocks-Discuss]ScalablePBS. In-Reply-To: <B83C8894-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu> References: <200311212352.27000.Roy.Dragseth@cc.uit.no> <B83C8894-2CD7-11D8- A2DC-000A95DA5638@sdsc.edu> Message-ID: <200312161306.55651.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 13 Dec 2003 06:16 am, Mason J. Katz wrote: > This should become the basis of the PBS roll (currently openpbs). We > are seeking developers who would like to help write and maintain this > -- I'm not singling you out Roy, although you would be more than > welcome, rather I'm taking advantage of your message to solicit other > volunteers. Anyone? I think we might be interested in getting involved with this, we migrated from OpenPBS to ScalablePBS some time ago and spent quite a bit of time tracking down memory leaks and the like with DJ and friends at SuperCluster. We've also started using Rocks on a cluster that we manage for one of our member institutions and a colleague of mine is having fun trying to get it to go onto an Itanium cluster at the moment plus we should have some Opteron boxes arriving in a month or so for a mini-cluster which we'd like to run
    Rocks on. Currently weinstall Rocks on the cluster and then remove PBS and MAUI RPM's and install SPBS and the 3.2.6 version of MAUI we have access to, so a version that came with SPBS ready to go would make life a lot simpler for us. :-) cheers! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/3mi3O2KABBYQAh8RAuSLAJ9Bx/5aCF8kRjHFapUpiASQUJeCTwCcD9y7 Y/ZM38t0J8r5dAYj1MdiUWA= =bCIS -----END PGP SIGNATURE----- From bruno at rocksclusters.org Mon Dec 15 18:30:03 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 15 Dec 2003 18:30:03 -0800 Subject: [Rocks-Discuss]removing a node from the cluster In-Reply-To: <3FDE5C39.1030503@scalableinformatics.com> References: <3FDE4ABB.6030302@physics.ucsd.edu> <1071535367.1871.1.camel@localhost.localdomain> <1C2131BE-2F62-11D8-9436-000A95CD8EC8@linuxprophet.com> <3FDE5C39.1030503@scalableinformatics.com> Message-ID: <C13C5DE4-2F6F-11D8-B821-000A95C4E3B4@rocksclusters.org> > Harumph: > > rmnode nasty_compute_node > insert-ethers --update > > (rmnode at http://scalableinformatics.com/downloads/rmnode.gz). > > I thought insert-ethers had a simple version of this. All rmnode is, > is a hacked version of one of the other rocks tools. actually, since v3.0.0, i think it does: http://www.rocksclusters.org/rocks-documentation/3.0.0/faq- configuration.html#REMOVE-NODE - gb From bruno at rocksclusters.org Mon Dec 15 19:40:49 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 15 Dec 2003 19:40:49 -0800 Subject: [Rocks-Discuss]problem forcing a kernel In-Reply-To: <1071517423.3719.4.camel@ajax.kaisergroup.net>
    References: <1071517423.3719.4.camel@ajax.kaisergroup.net> Message-ID: <A3F73894-2F79-11D8-B821-000A95C4E3B4@rocksclusters.org> > I am trying to install this kernel: > > kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following > whether I put it in the force directory of my distro or the regular > RPMS > directory or contrib: > > During package installation it gives me this: > > > /mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be > opened. This is due to a missing file, a bad package, or bad media. > Press <return> to try again. > > > The file is there. The media is the network. I have installed the > package on other systems by hand. Any ideas? just to be sure, do you run the following after you copy the RPM into the force directory: # cd /home/install # rocks-dist dist - gb From bruno at rocksclusters.org Mon Dec 15 19:56:51 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 15 Dec 2003 19:56:51 -0800 Subject: [Rocks-Discuss]Adding partitions that are not reformatted under hard boots or shoot-node In-Reply-To: <3FD68B06.9010709@phys.ufl.edu> References: <3FD68B06.9010709@phys.ufl.edu> Message-ID: <E12881B4-2F7B-11D8-B821-000A95C4E3B4@rocksclusters.org> sorry for the late response. i recently tested the manual partitioning procedure on our upcoming release and there was a bug. a fix has been committed for the next release -- so manual partitioning will work on 3.1.0 as explained in the 3.0.0 documentation. - gb On Dec 9, 2003, at 6:55 PM, Jorge L. Rodriguez wrote: > Hi, > > How do I add an extra partition to my compute nodes and retain the > data on all non / partitions when system hard boots or is shot? > I tried the suggestion in the documentation under "Customizing your > ROCKS Installation" where you replace the auto-partition.xml but hard > boots or shoot-nodes on these reformat all partitions instead of just
    > the /. I have also tried to modify the installclass.xml so that an > extra partition is added into the python code see below. This does > mostly what I want but now I can't shoot-node even though a hard boot > reinstalls without reformatting all but /. Is this the right approach? > I'd rather avoid having to replace installclass since I don't really > want to partition all nodes this way but if I must I will. > > Jorge > > # > # set up the root partition > # > args = [ "/" , "--size" , "4096", > "--fstype", "&fstype;", > "--ondisk", devnames[0] ] > KickstartBase.definePartition(self, id, args) > > # ---- Jorge, I added this args > args = [ "/state/partition1" , "--size" , > "55000", > "--fstype", "&fstype;", > "--ondisk", devnames[0] ] > KickstartBase.definePartition(self, id, args) > # ----- > args = [ "swap" , "--size" , "1000", > "--ondisk", devnames[0] ] > KickstartBase.definePartition(self, id, args) > > # > # greedy partitioning > # > # ----- Jorge, I change this from i = 1 > i = 2 > # ----- > for devname in devnames: > partname = "/state/partition%d" % (i) > args = [ partname, "--size", "1", > "--fstype", "&fstype;", > "--grow", "--ondisk", devname ] > KickstartBase.definePartition(self, id, > args) > > i = i + 1 > > > From jlkaiser at fnal.gov Mon Dec 15 20:17:52 2003 From: jlkaiser at fnal.gov (Joseph L. Kaiser) Date: Mon, 15 Dec 2003 22:17:52 -0600 Subject: [Rocks-Discuss]problem forcing a kernel In-Reply-To: <A3F73894-2F79-11D8-B821-000A95C4E3B4@rocksclusters.org> References: <1071517423.3719.4.camel@ajax.kaisergroup.net> <A3F73894-2F79-11D8-B821-000A95C4E3B4@rocksclusters.org> Message-ID: <1071548271.3720.0.camel@ajax.kaisergroup.net> yup
    On Mon, 2003-12-15at 21:40, Greg Bruno wrote: > > I am trying to install this kernel: > > > > kernel-smp-2.4.20-20.XFS1.3.1.i686.rpm and keep getting the following > > whether I put it in the force directory of my distro or the regular > > RPMS > > directory or contrib: > > > > During package installation it gives me this: > > > > > > /mnt/sysimage/var/tmpkernel-smp-2.4.20-20.9.XFS1.3.1.i686.rpm cannot be > > opened. This is due to a missing file, a bad package, or bad media. > > Press <return> to try again. > > > > > > The file is there. The media is the network. I have installed the > > package on other systems by hand. Any ideas? > > just to be sure, do you run the following after you copy the RPM into > the force directory: > > # cd /home/install > # rocks-dist dist > > - gb > From Roy.Dragseth at cc.uit.no Tue Dec 16 02:13:50 2003 From: Roy.Dragseth at cc.uit.no (Roy Dragseth) Date: Tue, 16 Dec 2003 11:13:50 +0100 Subject: [Rocks-Discuss]ScalablePBS. In-Reply-To: <B83C8894-2CD7-11D8-A2DC-000A95DA5638@sdsc.edu> References: <200311212352.27000.Roy.Dragseth@cc.uit.no> <B83C8894-2CD7-11D8- A2DC-000A95DA5638@sdsc.edu> Message-ID: <200312161113.50076.Roy.Dragseth@cc.uit.no> On Friday 12 December 2003 20:16, Mason J. Katz wrote: > This should become the basis of the PBS roll (currently openpbs). We > are seeking developers who would like to help write and maintain this > -- I'm not singling you out Roy, although you would be more than > welcome, rather I'm taking advantage of your message to solicit other > volunteers. Anyone? > I talked to my boss and he gave me thumbs up, so I'll be glad to take care of the Maui/PBS roll of rocks. I'd love to see some more hands in the air as maintainers/testers... r. -- The Computer Center, University of Troms?, N-9037 TROMS? Norway.
    phone:+47 77 6441 07, fax:+47 77 64 41 00 Roy Dragseth, High Performance Computing System Administrator Direct call: +47 77 64 62 56. email: royd at cc.uit.no From daniel.kidger at quadrics.com Tue Dec 16 07:08:44 2003 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Tue, 16 Dec 2003 15:08:44 +0000 Subject: [Rocks-Discuss]custom-kernels : naming conventions ? (Rocks 3.0.0) In-Reply-To: <20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net> References: <20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net> Message-ID: <3FDF1FFC.60501@quadrics.com> Glen et al. >I recently had the same problem when building a quadrics cluster on Rocks 2.3.2 >with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. The problem is >definitely in the naming of the rpms, in that anaconda running on the compute >nodes is not going to recognize kernel rpms that begin with 'qsnet' as potential >boot options. Unfortunately, being under a severe time contraint, I resorted to >manually installing the qsnet kernel on all nodes of the cluster, which isn't >the Rocks way. The long term solution is to mangle the kernel makefiles so that >the qsnet kernel rpms have conventional kernel rpm names, which is what Greg's >post referred to. I have been thinking about this. I reckon that the long term solution is *not* to rename the kernel that we use. (nor indeed to change the naming convention of any other kernels that people want to work on). As well as the triplet version numbering and the architecture, the kernel naming that we use includes the kernel source tree (Redhat, Suse, LSY, Vanilia, ..) and our partch level version numering triplet. Quadrics cannot be the only people who need freedom to include extra information in our naming convention for kernels. The solution must lie in either annaconda itself or more likely a cleaner way to include extra kernel(s) as well as the stock one in the compute node install process. 
Using extend-nodes.xml this works, apart from niggles about the /boot/grub/menu.lst that our kernel post-install configures getting clobbered by Rocks. Yours, Daniel. gotero at linuxprophet.com wrote: >Daniel- > > > -- Yours, Daniel.
    -------------------------------------------------------------- Dr. Dan Kidger,Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From mjk at sdsc.edu Tue Dec 16 07:09:56 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 16 Dec 2003 07:09:56 -0800 Subject: [Rocks-Discuss]ScalablePBS. In-Reply-To: <200312161113.50076.Roy.Dragseth@cc.uit.no> References: <200311212352.27000.Roy.Dragseth@cc.uit.no> <B83C8894-2CD7-11D8- A2DC-000A95DA5638@sdsc.edu> <200312161113.50076.Roy.Dragseth@cc.uit.no> Message-ID: <E89F1F82-2FD9-11D8-A2DC-000A95DA5638@sdsc.edu> Fanstastic! I think this puts us at three people that have volunteered to help out on this. I will followup on this and help organize, support, and do some of the development also. But I'm going to push this back until after we get 3.1 out which looks like monday. -mjk On Dec 16, 2003, at 2:13 AM, Roy Dragseth wrote: > On Friday 12 December 2003 20:16, Mason J. Katz wrote: >> This should become the basis of the PBS roll (currently openpbs). We >> are seeking developers who would like to help write and maintain this >> -- I'm not singling you out Roy, although you would be more than >> welcome, rather I'm taking advantage of your message to solicit other >> volunteers. Anyone? >> > > I talked to my boss and he gave me thumbs up, so I'll be glad to take > care of > the Maui/PBS roll of rocks. > > I'd love to see some more hands in the air as maintainers/testers... > > r. > > > -- > > The Computer Center, University of Troms?, N-9037 TROMS? Norway. > phone:+47 77 64 41 07, fax:+47 77 64 41 00 > Roy Dragseth, High Performance Computing System Administrator > Direct call: +47 77 64 62 56. email: royd at cc.uit.no From mjk at sdsc.edu Tue Dec 16 07:37:04 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 16 Dec 2003 07:37:04 -0800 Subject: [Rocks-Discuss]custom-kernels : naming conventions ? 
(Rocks 3.0.0) In-Reply-To: <3FDF1FFC.60501@quadrics.com> References:
    <20031209180224.24711.h014.c001.wm@mail.linuxprophet.com.criticalpath.net> <3FDF1FFC.60501@quadrics.com> Message-ID: <B3192AFA-2FDD-11D8-A2DC-000A95DA5638@sdsc.edu> If you rename the linux kernel to include other arbitrary strings the RedHat Kickstart installer will not recognize it as a kernel. This means you loose probing for the correct x86 cpu (386/486/585/686) and probing for SMP vs. uni. This implies you would need to re-write the anaconda code to do this for arbitrarily named packages, if you could convince RedHat to do this great, but it's not worth our development time to do this ourselves when properly named kernel packages work wonderfully. The unfortunate reality is the kernel RPM is not just another package -- it has some special installation logic to optimize for you hardware. Sure they could have done this better, but they do a darn good job as is. This is not a Rocks issue, it means you have created a package that does not work with RedHat. I understand why you need to include extra strings in the kernel name, but suggest that there are several alternatives to this that don't break RedHat kickstart. For example, you could: - Write a kernel version module to report on /proc/qsnet_kernel the same information. - Have the kernel RPM install a /usr/doc/qsnet/VERSION file - Have a subpackage of the kernel rpm that include the extra strings (and extra docs). - Stop patching the kernel and only use a module. True some things require kernel patches, but almost all driver changes can go into modules only. This was not always true a few years ago, the module system has improved a lot. We've faced numerous issues like this with RedHat in creating Rocks, and for every issue we have found a work around that keeps us w/in the RedHat way of doing things. This is not always optimal for development but always yields a simpler, and more supportable, system. -mjk On Dec 16, 2003, at 7:08 AM, Dan Kidger wrote: > Glen et al. 
> >> I recently had the same problem when building a quadrics cluster on >> Rocks 2.3.2 >> with the qsnet-RedHat-kernel-2.4.18-27.3.4qsnet.i686.rpms. The >> problem is >> definitely in the naming of the rpms, in that anaconda running on the >> compute >> nodes is not going to recognize kernel rpms that begin with 'qsnet' >> as potential >> boot options. Unfortunately, being under a severe time contraint, I >> resorted to >> manually installing the qsnet kernel on all nodes of the cluster, >> which isn't
    >> the Rocksway. The long term solution is to mangle the kernel >> makefiles so that >> the qsnet kernel rpms have conventional kernel rpm names, which is >> what Greg's >> post referred to. > > I have been thinking about this. > > I reckon that the long term solution is *not* to rename the kernel > that we use. (nor indeed to change the naming convention of any other > kernels that people want to work on). As well as the triplet version > numbering and the architecture, the kernel naming that we use includes > the kernel source tree (Redhat, Suse, LSY, Vanilia, ..) and our partch > level version numering triplet. > Quadrics cannot be the only people who need freedom to include extra > information in our naming convention for kernels. > The solution must lie in either annaconda itself or more likely a > cleaner way to include extra kernel(s) as well as the stock one in the > compute node install process. Using extend-nodes.xml this works apart > from niggles about the /boot/grub/menu.lst that our kernel > post-instal;l configures getting clobbered by Rocks. > > Yours, > Daniel. > > > gotero at linuxprophet.com wrote: > >> Daniel- >> >> > > -- > Yours, > Daniel. > > -------------------------------------------------------------- > Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com > One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 > ----------------------- www.quadrics.com -------------------- > From dtwright at uiuc.edu Tue Dec 16 11:45:55 2003 From: dtwright at uiuc.edu (Dan Wright) Date: Tue, 16 Dec 2003 13:45:55 -0600 Subject: [Rocks-Discuss]a minor ganglia question Message-ID: <20031216194554.GH26246@uiuc.edu> Hello all, I'm in the process of setting up a 3.0.0 cluster and have a question about the "Physical view" in ganglia. In this view (which is quite cool BTW :) is shows higher-numbered nodes on top and lower-numbered nodes on bottom: compute-0-12 ... compute-0-2
compute-0-1
compute-0-0

and my cluster is physically reversed from that:

compute-0-0
compute-0-1
compute-0-2
...
compute-0-12

Is there an easy way to switch this display around so it matches the real
physical layout? I poked around in ganglia for a few minutes and didn't
see anything obvious, so I thought I'd ask before I actually start
wasting time on this :)

Thanks,

- Dan Wright
(dtwright at uiuc.edu)
(http://www.scs.uiuc.edu/)
(UNIX Systems Administrator, School of Chemical Sciences, UIUC)
(333-1728)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031216/28f3eb5a/attachment-0001.bin

From purikk at hotmail.com Tue Dec 16 12:34:51 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 16 Dec 2003 15:34:51 -0500
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
References: <200312162016.hBGKGuJ05160@postal.sdsc.edu>
Message-ID: <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>

Hi All,

We are trying to set up a Rocks cluster with 1 frontend and 20 compute
nodes.

Frontend:
 1) Dual Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
 2) Dual port Gigabit Ethernet
 3) 1 GB DDR RAM
 4) 3* 200 GB EIDE ULTRA ATA 100

Compute nodes:
 1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
 2) Dual port Gigabit Ethernet
 3) 1 GB DDR RAM
 4) 41 GB UDMA EIDE

1 HP Procurve 24 port switch

Does the setup look ok?
Does Rocks support the following features?

Remote power monitoring for individual nodes
*Temperature monitoring of individual processors
*Power sequencing on startup to prevent possible power spiking
*Remote power-down and reset of system and nodes
*Serial access to nodes
*Disk cloning
*Plug-In Extensible Architecture
*Image Manager

and also: how should the disks be set up? Do all the disks need to be
attached to the frontend, with compute nodes having small 3 or 4 GB
disks?

Can someone point me to clustering software which supports all the above
features if Rocks doesn't support them?

thanks a lot

Regards,
Puru

From purikk at hotmail.com Tue Dec 16 12:39:19 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 16 Dec 2003 15:39:19 -0500
Subject: [Rocks-Discuss]Java Rocks cluster
Message-ID: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>

I am a newbie to ROCKS. I have a question about running Java on a
Rockster. Is it possible that I can start only one JVM on one machine and
have the task run distributed on the cluster? It is a multi-threaded
application. Say I have an application with 100 threads: can I have 50
threads run on one machine and 50 on another by launching the
application (JVM) on one machine (similar to Sun Firebird)? I don't want
to use MPI or any special code.

Thanks
Sincerely
Puru
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031216/ee12ac80/attachment-0001.html

From mjk at sdsc.edu Tue Dec 16 13:20:24 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 13:20:24 -0800
Subject: [Rocks-Discuss]Java Rocks cluster
In-Reply-To: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
References: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
Message-ID: <A9849F18-300D-11D8-A2DC-000A95DA5638@sdsc.edu>

There are a few research projects that do map Java threads onto processes
on cluster compute nodes. At the IEEE Cluster '03 conference a couple of
weeks ago in Hong Kong there were a few interesting Java talks on this
subject. You can see the schedule at the following link and do some
Google research for more info. I think the papers will be online soon...

http://www.csis.hku.hk/cluster2003/advance-program.html

Rocks 3.1 will include a Java Roll, but this is nothing more than Sun's
Java sdk/rte and doesn't do any cluster magic for you.

    -mjk

On Dec 16, 2003, at 12:39 PM, Purushotham Komaravolu wrote:

> I am a newbie to ROCKS. I have a question about running Java on a
> Rockster. Is it possible that I can start only one JVM on one machine
> and have the task run distributed on the cluster? It is a
> multi-threaded application. Say I have an application with 100 threads:
> can I have 50 threads run on one machine and 50 on another by launching
> the application (JVM) on one machine (similar to Sun Firebird)? I don't
> want to use MPI or any special code.
> Thanks
> Sincerely
> Puru

From phil at sdsc.edu Tue Dec 16 13:38:48 2003
From: phil at sdsc.edu (Philip Papadopoulos)
Date: Tue, 16 Dec 2003 13:38:48 -0800
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
In-Reply-To: <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>
References: <200312162016.hBGKGuJ05160@postal.sdsc.edu> <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>
Message-ID: <3FDF7B68.3030302@sdsc.edu>

Purushotham Komaravolu wrote:

> Hi All,
>    We are trying to setup rocks cluster with 1 front and 20 computing
> nodes.
> Frontend:
>  1) Dual Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
>  2) Dual port Gigabit Ethernet
>  3) 1 GB DDR RAM
>  4) 3* 200 GB EIDE ULTRA ATA 100
>
> Compute nodes:
>  1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
>  2) Dual port Gigabit Ethernet
>  3) 1 GB DDR RAM
>  4) 41 GB UDMA EIDE
> 1 HP Procurve 24 port switch
>
> Does the setup look ok?

Setup looks fine.

> Does Rocks support the following features
> Remote power monitoring for individual nodes
>
> *Temperature monitoring of individual processors

Not directly -- there isn't a completely general solution to this --
though lm_sensors is good for non-server boards. However, nothing
prevents you from adding the proper software. It's fairly easy to add
metrics to ganglia if you have the baseline drivers for your particular
temperature monitoring software.

> *Power sequencing on startup to prevent possible power spiking
>
> *Remote power-down and reset of system and nodes
>
> *Serial access to nodes

All of these generally require another network (serial, lights-out
management, etc). We don't assume any of these extra networks exist.
Again, layering that functionality atop Rocks is very, very
straightforward. See the FAQ for how to add packages to nodes.

> *Disk cloning

No. Emphatically no. Disk cloning is not anywhere in the Rocks
vocabulary. We have distributions (Redhat + Rocks + cluster tools + your
own software) and a way to generate a kickstart file in a programmatic
way. Disk cloning assumes homogeneity of hardware (we don't), requires a
custom aftermarket installer to fix up a node after an image is put on it
(we use Redhat as the installer), and requires a completely different
image for every different functional type of node (frontend, compute,
nfs, web, pvfs, etc).

> *Plug-In Extensible Architecture
Uh. Yeah. That's the whole point. Again, see the FAQ for how you add
packages. Rolls are an additional extension mechanism that allows you to
add larger chunks of functionality at cluster build time. We extend base
Rocks with grid software, schedulers, Java, and community-specific
software stacks. You should wait (about 5 days) for the final release of
3.1.0 to see how rolls work.

> *Image Manager

Definitely no. There are no images in Rocks. We have distributions and
appliance types. A graph description of appliances is melded with
distributions to define a complete node. Shared configuration is truly
shared. None of that happens with images -- the base software and the
configuration are locked together.

> and also
>
> How should be the disk setup, does all the disks need to be attached to
> frontend and compute nodes have small 3 or 4 GB disks?

Nodes must be disk-full, of any type and size (8GB is probably minimal
given the size of Linux these days). You can put as many disks as you
want on your frontend and have it double as an NFS server for your
cluster (the default). You can build other NFS servers easily (and manage
them as easily as you do a compute node).

> Can someone point me to a clustering software which supports all above
> features if Rocks doesn't support them.

Sorry. Doesn't exist. Pick the things that you can live without today
(but would want to add tomorrow).

    -P

> thanks a lot
>
> Regards,
>
> Puru
--
== Philip Papadopoulos, Ph.D.          Program Director for
== San Diego Supercomputer Center      Grid and Cluster Computing
== 9500 Gilman Drive                   Ph: (858) 822-3628
== University of California, San Diego FAX: (858) 822-5407
== La Jolla, CA 92093-0505

From mjk at sdsc.edu Tue Dec 16 13:38:59 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 13:38:59 -0800
Subject: [Rocks-Discuss]hardware-setup for the Rocks cluster
In-Reply-To: <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>
References: <200312162016.hBGKGuJ05160@postal.sdsc.edu> <BAY1-DAV575EPSM0omP0000cb94@hotmail.com>
Message-ID: <421F6254-3010-11D8-A2DC-000A95DA5638@sdsc.edu>

On Dec 16, 2003, at 12:34 PM, Purushotham Komaravolu wrote:

> Hi All,
>    We are trying to setup rocks cluster with 1 front and 20 computing
> nodes.
> Frontend:
>  1) Dual Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
>  2) Dual port Gigabit Ethernet
>  3) 1 GB DDR RAM
>  4) 3* 200 GB EIDE ULTRA ATA 100
>
> Compute nodes:
>  1) Pentium Xeon 2.4 GHz PC 533 and 512k L2 Cache
>  2) Dual port Gigabit Ethernet
>  3) 1 GB DDR RAM
>  4) 41 GB UDMA EIDE
> 1 HP Procurve 24 port switch
>
> Does the setup look ok?

Sounds good. If you have device driver issues, just wait until next week
when 3.1 comes out; it will have a new kernel and more supported
hardware.

> Does Rocks support the following features
> Remote power monitoring for individual nodes

Ethernet addressable power strips can be used for this.

> *Temperature monitoring of individual processors

No, although a ganglia module can be created to do this. The problem is
there isn't a common standard out there for *all* hardware right now.

> *Power sequencing on startup to prevent possible power spiking
Ethernet addressable power strips can be used for this.

> *Remote power-down and reset of system and nodes

Yes (using software). For hardware control you would need a remote
management board in every node, or ethernet addressable power strips.

> *Serial access to nodes

No, Rocks uses ssh and ethernet for this. But you can add your own serial
port concentrator if you need one.

> *Disk cloning

Nope, this doesn't scale in both system and people time. Rocks uses
RedHat's Kickstart to build the disk image on each node in a cluster
programmatically. This is extremely fast -- in fact a 128 node cluster
can be built from scratch (including hardware integration) in under 2
hours, and the entire cluster can be reinstalled in around 12 minutes. We
did this as a demonstration of Rocks' scalability at SC'03 (we even have
a movie of it).

> *Plug-In Extensible Architecture

Yes. You can add to the cluster database and extend our utilities.
Everything is open.

> *Image Manager

Rocks does not do system imaging. We have a utility called rocks-dist
that builds distributions for you. This, combined with the XML profile
graph, gives you what you want here.

> How should be the disk setup, does all the disks need to be attached to
> frontend and compute nodes have small 3 or 4 GB disks?

Buy the smallest modern HD you can for the compute node (4 GB is fine).
By default the frontend serves user directories over NFS, so you should
have more storage on the frontend node.

    -mjk

From landman at scalableinformatics.com Tue Dec 16 13:43:51 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 16 Dec 2003 16:43:51 -0500
Subject: [Rocks-Discuss]Java Rocks cluster
In-Reply-To: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
References: <BAY1-DAV62R0rmTIVvL0000cc3a@hotmail.com>
Message-ID: <1071611031.9903.77.camel@squash.scalableinformatics.com>

Hi Puru:

Java threads are shared memory objects at this moment.
You would need to look at thread-migration schemes to layer atop the process, and a distributed shared memory model to handle memory issues. I don't think Java natively supports this, so you will likely have to appeal to some
other method. Moreover, shared memory across slower cluster network
fabrics is painful at best. If you are going to work on a single system
image machine with shared memory, you want the fastest/best fabric you
can get.

If it is easier to re-architect your code as independent worker
processes, you could write it using JVMs and simple sockets or similar.
If it is threaded, you may have problems parallelizing it on a cluster.

Joe

On Tue, 2003-12-16 at 15:39, Purushotham Komaravolu wrote:

> I am a newbie to ROCKS. I have a question about running Java on a
> Rockster. Is it possible that I can start only one JVM on one machine
> and have the task run distributed on the cluster? It is a
> multi-threaded application. Say I have an application with 100 threads:
> can I have 50 threads run on one machine and 50 on another by launching
> the application (JVM) on one machine (similar to Sun Firebird)? I don't
> want to use MPI or any special code.
> Thanks
> Sincerely
> Puru

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From rscarce at caci.com Tue Dec 16 10:56:18 2003
From: rscarce at caci.com (Reed Scarce)
Date: Tue, 16 Dec 2003 13:56:18 -0500
Subject: [Rocks-Discuss]grub / boot / fdisk problem
Message-ID: <OF2C6AD168.EB3D778E-ON85256DFE.0067CF1C-85256DFE.006812B4@caci.com>

I installed Rocks on a primary master hard drive. It became necessary to
re-install. I took an identical hd and made it primary master. The first
drive, which boots fine, was left off the system to act as an archive, to
mount after the new system was up and running. The new system was
installed and works great; now to correctly install the old drive as
primary slave, reboot, mount, and copy the scripts and configs to the new
system! There the problem began.

When I boot either drive as primary master and only primary drive, they
boot fine.
When I connect either drive, correctly configured and recognized by the
BIOS, as primary or secondary slave, grub gives a GRUB prompt and won't
boot. Something interesting: when booted from a floppy (mkbootdisk) made
from the new disk, in /var/log/dmesg both drives are visible, but fdisk
reports that the partition table is empty - so I can't mount the drive
from a floppy boot.
dmesg is like this: (my comments)

hda: ST34321A, ... (pri master)
hdb: ST34321A, ... (pri slave)
hdc: FX4010M, ATAPI CD/DVD-ROM drive (secnd master)
hdd: ST320420A, ... (secnd slave)
ide0 at ... (ide pri chain)
ide1 at ... (ide secnd chain)
hda: 8404830 sectors ... (good)
hdb: 8404830 sectors ... (good)
hdd: 39851760 sectors ... (good)
ide-floppy driver ... (ok)
Partition check: (<---<<< this is where it gets interesting)
hda:
hdb:
hdd: hdd1 hdd2 hdd3 (<---<<< that's right, hdd is now the boot drive.
Even if I boot without the floppy, hdd is the boot drive.)

Any suggestions?

Reed Scarce
Systems Engineer
CACI, Inc.
1100 N. Glebe Rd
Arlington, VA 22201
(703) 841-3045
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031216/498124c7/attachment-0001.html

From ShiYi.Yue at astrazeneca.com Tue Dec 16 14:05:46 2003
From: ShiYi.Yue at astrazeneca.com (ShiYi.Yue at astrazeneca.com)
Date: Tue, 16 Dec 2003 23:05:46 +0100
Subject: [Rocks-Discuss]hardware compatibility check with Rocks 3.00
Message-ID: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>

hi,

I was wondering if there is a way to set up a hardware compatibility
check in the kickstart of Rocks, and give us an opportunity to add the
drivers once incompatible hardware is detected.

I have some PCs with Broadcom Gbit 10/100/1000 network cards, and it
looks like Rocks 3.0 was not happy with these network cards. The only
thing I can do now (without rebuilding the distribution) is to replace
these cards. I am afraid this type of situation will happen again and
again, since RH7.3 is getting older and older. I hope I am wrong and
someone can point me to a solution.

Shi-Yi
shiyi.yue at astrazeneca.com

From mjk at sdsc.edu Tue Dec 16 14:55:38 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 16 Dec 2003 14:55:38 -0800
Subject: [Rocks-Discuss]hardware compatibility check with Rocks 3.00
In-Reply-To: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>
References: <D2A2B86E8730D711B8560008028AC980257A2E@camrd9.camrd.astrazeneca.net>
Message-ID: <F7910D2D-301A-11D8-A2DC-000A95DA5638@sdsc.edu>

We've been thinking about this off and on for over a year -- it's a
pretty hard problem. The real trick to supporting all hardware is keeping
the boot kernel current. We've let our releases get old, and more and
more people are seeing hardware support issues. Rocks 3.1 (out next week)
will include the latest RedHat kernel from RHEL 3.0. This will fix most
of the hardware support issues out there. When we release, please
download 3.1 and try it with your hardware; if it still fails, please let
us know. Thanks.

    -mjk

On Dec 16, 2003, at 2:05 PM, ShiYi.Yue at astrazeneca.com wrote:

> hi,
>
> I was wondering if there is a way to set up a hardware compatibility
> check in the kickstart of Rocks, and give us an opportunity to add the
> drivers once incompatible hardware is detected.
>
> I have some PCs with Broadcom Gbit 10/100/1000 network cards, and it
> looks like Rocks 3.0 was not happy with these network cards. The only
> thing I can do now (without rebuilding the distribution) is to replace
> these cards. I am afraid this type of situation will happen again and
> again, since RH7.3 is getting older and older.
> I hope I am wrong and someone can point me to a solution.
> Shi-Yi
> shiyi.yue at astrazeneca.com

From msherman at informaticscenter.info Tue Dec 16 16:25:45 2003
From: msherman at informaticscenter.info (Mark Sherman)
Date: Tue, 16 Dec 2003 17:25:45 -0700
Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
Message-ID: <20031217002545.17912.qmail@webmail-2-2.mesa1.secureserver.net>

Hello,

I'm an administrator on a pure i386 cluster under Rocks 3.0.0, and our
clients are pushing us to include some Opteron nodes.
I'm trying to find out the feasibility of such an addition. I know
there's been a lot of talk about Opterons on the rocks list, so I'm
wondering if someone can give a boiled-down can-do / can't-do /
maybe-but-we-haven't-tested-it-yet kind of status. With that, I'd say I'm
probably willing to be a pseudo-beta site and give feedback on how the
system works.

Thank you very much, and keep up the good work. I love the Rocks system.

~M
______________________________________________
Mark Sherman
Computing Systems Administrator
Informatics Center
Massachusetts Biomedical Initiatives
Worcester MA 01605
508-797-4200
msherman at informaticscenter.info
----------------------~-----------------------

> -------- Original Message --------
> Subject: [Rocks-Discuss]RE: Rocks-Discuss] AMD Opteron - Contact Appro
> From: "Jian Chang" <jian at appro.com>
> Date: Fri, December 12, 2003 6:27 pm
> To: "Bryan Littlefield" <bryan at UCLAlumni.net>,
> npaci-rocks-discussion at sdsc.edu, mjk at sdsc.edu
>
> Hello Mason / Puru,
>
> I got your contact information from Bryan Littlefield.
> I would like to discuss with you regarding benchmark test systems you
> might need down the road.
> We can also share with you our findings as to what is compatible in the
> Opteron systems.
> Please reply with your phone number where I can reach you, and I will
> call promptly.
>
> Bryan,
>
> Thank you for the referral.
>
> Best regards,
>
> Jian Chang
> Regional Sales Manager
> (408) 941-8100 x 202
> (800) 927-5464 x 202
> (408) 941-8111 Fax
> jian at appro.com
> www.appro.com
>
> -----Original Message-----
> From: Bryan Littlefield [mailto:bryan at UCLAlumni.net]
> Sent: Tuesday, December 09, 2003 12:14 PM
> To: npaci-rocks-discussion at sdsc.edu; mjk at sdsc.edu
> Cc: Jian Chang
> Subject: Rocks-Discuss] AMD Opteron - Contact Appro
>
> Hi Mason,
>
> I suggest contacting Appro. We are using Rocks on our Opteron cluster
> and Appro would likely love to help. I will contact them as well to see
> if they could help getting an Opteron machine for testing. Contact info
> below:
>
> Thanks --Bryan
>
> Jian Chang - Regional Sales Manager
> (408) 941-8100 x 202
> (800) 927-5464 x 202
> (408) 941-8111 Fax
> jian at appro.com
> http://www.appro.com
>
> npaci-rocks-discussion-request at sdsc.edu wrote:
>
>> From: "Mason J. Katz" <mjk at sdsc.edu>
>> Subject: Re: [Rocks-Discuss]AMD Opteron
>> Date: Tue, 9 Dec 2003 07:28:51 -0800
>> To: "purushotham komaravolu" <purikk at hotmail.com>
>>
>> We have a beta right now that we have sent to a few people. We plan on
>> a release this month, and AMD_64 will be part of this release along
>> with the usual x86, IA64 support.
>>
>> If you want to help accelerate this process please talk to your vendor
>> about loaning/giving us some hardware for testing. Having access to a
>> variety of Opteron hardware (we own two boxes) is the only way we can
>> have good support for this chip.
>>
>> -mjk
>>
>> On Dec 8, 2003, at 8:23 PM, purushotham komaravolu wrote:
>>
>>> Cc: <npaci-rocks-discussion at sdsc.edu>
>>>
>>> Hello,
>>> I am a newbie to ROCKS cluster. I wanted to set up clusters on 32-bit
>>> architectures (Intel and AMD) and 64-bit architectures (Intel and
>>> AMD). I found the 64-bit download for Intel on the website but not
>>> for AMD. Does it work for AMD Opteron? If not, what is the ETA for
>>> AMD-64? We are planning to buy AMD-64 bit machines shortly, and I
>>> would like to volunteer for the beta testing if needed.
>>> Thanks
>>> Regards,
>>> Puru
>
> _______________________________________________
> npaci-rocks-discussion mailing list
> npaci-rocks-discussion at sdsc.edu
> http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
> End of npaci-rocks-discussion Digest

From fds at sdsc.edu Tue Dec 16 18:04:47 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Tue, 16 Dec 2003 18:04:47 -0800
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <20031216194554.GH26246@uiuc.edu>
References: <20031216194554.GH26246@uiuc.edu>
Message-ID: <63C818CD-3035-11D8-8652-000393A4725A@sdsc.edu>

Dan,

Good question. Unfortunately this behavior is hardwired into stock
Ganglia, not the Rocks-specific pages that we have more control over.

The good news is that I wrote the code for this page :) It's easy to fix
if you would like to do it yourself.

Edit the file /var/www/html/ganglia/functions.php. On line 386, you
should see:

krsort($racks[$rack]);

To get the ordering you desire, change this to:

ksort($racks[$rack]);

That's it. You should see the high-numbered compute nodes at the bottom
of the rack. I will see if we can get a config file button on the page to
give this option for a later release of Ganglia.

-Federico

On Dec 16, 2003, at 11:45 AM, Dan Wright wrote:

> Hello all,
>
> I'm in the process of setting up a 3.0.0 cluster and have a question
> about the "Physical view" in ganglia. In this view (which is quite cool
> BTW :) it shows higher-numbered nodes on top and lower-numbered nodes
> on bottom:
>
> compute-0-12
> ...
> compute-0-2
> compute-0-1
> compute-0-0
>
> and my cluster is physically reversed from that:
>
> compute-0-0
> compute-0-1
> compute-0-2
> ...
> compute-0-12
>
> Is there an easy way to switch this display around so it matches the
> real physical layout? I poked around in ganglia for a few minutes and
> didn't see anything obvious, so I thought I'd ask before I actually
> start wasting time on this :)
>
> Thanks,
>
> - Dan Wright
> (dtwright at uiuc.edu)
> (http://www.scs.uiuc.edu/)
> (UNIX Systems Administrator, School of Chemical Sciences, UIUC)
> (333-1728)

Federico

Rocks Cluster Group, San Diego Supercomputing Center, CA

From csamuel at vpac.org Tue Dec 16 18:49:22 2003
From: csamuel at vpac.org (Chris Samuel)
Date: Wed, 17 Dec 2003 13:49:22 +1100
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <20031216194554.GH26246@uiuc.edu>
References: <20031216194554.GH26246@uiuc.edu>
Message-ID: <200312171349.24485.csamuel@vpac.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 17 Dec 2003 06:45 am, Dan Wright wrote:

> Is there an easy way to switch this display around so it matches the
> real physical layout?

I think this is why they tell you to install the compute nodes from the
bottom of the rack. :-)

cheers,
Chris
- --
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/38QyO2KABBYQAh8RAo+vAJ0XcP6tBJpwjxYnicEQkysRslWmmQCcDpeb
K8bNCLgiF5umMiJ/59ICN70=
=57YJ
-----END PGP SIGNATURE-----
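Federico's one-line fix above amounts to sorting the rack's node array by
key ascending (PHP's ksort) instead of descending (krsort) before drawing
it. A minimal Python sketch of the same idea; the slot numbers and the
`render` helper are made up for illustration, not part of Ganglia:

```python
# PHP's krsort() orders an array's keys descending (stock Ganglia:
# high-numbered nodes drawn at the top); ksort() orders them ascending
# (the suggested edit: low-numbered nodes drawn at the top).
# The rack here is a hypothetical mapping of slot number -> node name.
rack = {12: "compute-0-12", 0: "compute-0-0",
        2: "compute-0-2", 1: "compute-0-1"}

def render(rack, reverse):
    """Return node names in display order, top of the rack first."""
    return [rack[slot] for slot in sorted(rack, reverse=reverse)]

# krsort-style (stock): highest-numbered node appears first.
assert render(rack, reverse=True)[0] == "compute-0-12"
# ksort-style (after the edit): lowest-numbered node appears first.
assert render(rack, reverse=False)[0] == "compute-0-0"
```

The same reversal could of course be applied in any templating layer; the
point is only that the display order is a one-line sort-direction choice.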
From hermanns at tupi.dmt.upm.es Wed Dec 17 00:08:19 2003
From: hermanns at tupi.dmt.upm.es (Miguel Hermanns)
Date: Wed, 17 Dec 2003 09:08:19 +0100
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
Message-ID: <3FE00EF3.4020809@tupi.dmt.upm.es>

Since one of the strong features of Rocks is the possibility of fast
deployment of clusters, wouldn't it be of interest to create a hardware
compatibility list on the web page of Rocks? This list could be filled in
by the users of Rocks with their experience and the hardware they have.
In this way somebody interested in building a cluster as fast as possible
could check the list and buy something absolutely 100% compatible with
Rocks.

I know that in principle one could check the compatibility list of RH,
but my own experience was negative in that aspect (I installed an Adaptec
IDE RAID controller, supported by RH7.3, but Rocks 2.3 was unable to
recognize it).

Miguel

From mjk at sdsc.edu Wed Dec 17 09:03:00 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 17 Dec 2003 09:03:00 -0800
Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
In-Reply-To: <3FE00EF3.4020809@tupi.dmt.upm.es>
References: <3FE00EF3.4020809@tupi.dmt.upm.es>
Message-ID: <DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu>

We have thought about this, and have some ideas on how to set up a useful
page. Something like the old Linux laptop hardware list, but simpler to
mine for data. It's been on our long list of things to do for a while
now :)

    -mjk

On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote:

> Since one of the strong features of Rocks is the possibility of fast
> deployment of clusters, wouldn't it be of interest to create a hardware
> compatibility list on the web page of Rocks? This list could be filled
> in by the users of Rocks with their experience and the hardware they
> have.
> In this way somebody interested in building a cluster as fast as
> possible could check the list and buy something absolutely 100%
> compatible with Rocks.
>
> I know that in principle one could check the compatibility list of RH,
> but my own experience was negative in that aspect (I installed an
> Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was
> unable to recognize it).
>
> Miguel
From junkscarce at hotmail.com Wed Dec 17 09:31:21 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Wed, 17 Dec 2003 17:31:21 +0000
Subject: [Rocks-Discuss]fdisk reports all zeros, need actual
Message-ID: <BAY1-F978XKPl5GDrPi0003db4e@hotmail.com>

Good ol' fdisk "print" on my compute node gives me a line:

Device Boot Start End Blocks Id System

but no data.

Extra functionality's "print" reports:

Nr AF  Hd Sec Cyl  Hd Sec Cyl  Start  Size  ID
 1 00   0   0   0   0   0   0      0     0   0
 2 00   0   0   0   0   0   0      0     0   0
 3 00   0   0   0   0   0   0      0     0   0
 4 00   0   0   0   0   0   0      0     0   0

How can I retrieve the information necessary for scripting at node
installation time?

TIA

--RRS

_________________________________________________________________
Enjoy the holiday season with great tips from MSN.
http://special.msn.com/network/happyholidays.armx

From dtwright at uiuc.edu Wed Dec 17 11:49:53 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Wed, 17 Dec 2003 13:49:53 -0600
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <200312171349.24485.csamuel@vpac.org>
References: <20031216194554.GH26246@uiuc.edu> <200312171349.24485.csamuel@vpac.org>
Message-ID: <20031217194953.GS26246@uiuc.edu>

Eh... whatever ;-) I started using Rocks with 2.2.1 (when there was no
physical layout display) and haven't read the manual again since :)

Chris Samuel said:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, 17 Dec 2003 06:45 am, Dan Wright wrote:
>
>> Is there an easy way to switch this display around so it matches the
>> real physical layout?
>
> I think this is why they tell you to install the compute nodes from the
> bottom of the rack. :-)
>
> cheers,
> Chris
> - --
> Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
> Victorian Partnership for Advanced Computing http://www.vpac.org/
> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2 (GNU/Linux)
>
> iD8DBQE/38QyO2KABBYQAh8RAo+vAJ0XcP6tBJpwjxYnicEQkysRslWmmQCcDpeb
> K8bNCLgiF5umMiJ/59ICN70=
> =57YJ
> -----END PGP SIGNATURE-----

- Dan Wright
(dtwright at uiuc.edu) (http://www.uiuc.edu/~dtwright)
-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
For he on honeydew hath fed, / and drunk the milk of Paradise.''
Samuel Taylor Coleridge, Kubla Khan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031217/a3718aef/attachment-0001.bin

From dtwright at uiuc.edu Wed Dec 17 11:51:00 2003
From: dtwright at uiuc.edu (Dan Wright)
Date: Wed, 17 Dec 2003 13:51:00 -0600
Subject: [Rocks-Discuss]a minor ganglia question
In-Reply-To: <63C818CD-3035-11D8-8652-000393A4725A@sdsc.edu>
References: <20031216194554.GH26246@uiuc.edu> <63C818CD-3035-11D8-8652-000393A4725A@sdsc.edu>
Message-ID: <20031217195100.GT26246@uiuc.edu>

Federico,

Thanks! That'll make this easy enough... maybe next time I'll read the
manual and install the machines in the Rocks-recommended order, as
another poster suggested :)

Federico Sacerdoti said:

> Dan,
>
> Good question. Unfortunately this behavior is hardwired into stock
> Ganglia, not the Rocks-specific pages that we have more control over.
>
> The good news is that I wrote the code for this page :) Its easy to fix
> if you would like to do it yourself.
>
> Edit the file /var/www/html/ganglia/functions.php. On line 386, you
> should see:
>
> krsort($racks[$rack]);
>
> To get the ordering you desire, change this to:
>
> ksort($racks[$rack]);
>
> That's it. You should see the high-numbered compute nodes at the bottom
> of the rack. I will see if we can get a config file button on the page
> to give this option for a later release of Ganglia.
>
> -Federico
>
> On Dec 16, 2003, at 11:45 AM, Dan Wright wrote:
>
>> Hello all,
>>
>> I'm in the process of setting up a 3.0.0 cluster and have a question
>> about the "Physical view" in ganglia. In this view (which is quite
>> cool BTW :) it shows higher-numbered nodes on top and lower-numbered
>> nodes on bottom:
>>
>> compute-0-12
>> ...
>> compute-0-2
>> compute-0-1
>> compute-0-0
>>
>> and my cluster is physically reversed from that:
>>
>> compute-0-0
>> compute-0-1
>> compute-0-2
>> ...
>> compute-0-12
>>
>> Is there an easy way to switch this display around so it matches the
>> real physical layout? I poked around in ganglia for a few minutes and
>> didn't see anything obvious, so I thought I'd ask before I actually
>> start wasting time on this :)
>>
>> Thanks,
>>
>> - Dan Wright
>> (dtwright at uiuc.edu)
>> (http://www.scs.uiuc.edu/)
>> (UNIX Systems Administrator, School of Chemical Sciences, UIUC)
>> (333-1728)
>
> Federico
>
> Rocks Cluster Group, San Diego Supercomputing Center, CA

- Dan Wright
(dtwright at uiuc.edu) (http://www.uiuc.edu/~dtwright)
-] ------------------------------ [-] -------------------------------- [-
``Weave a circle round him thrice, / And close your eyes with holy dread,
For he on honeydew hath fed, / and drunk the milk of Paradise.''
Samuel Taylor Coleridge, Kubla Khan
-------------- next part --------------
    A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20031217/620937b3/attachment-0001.bin From bruno at rocksclusters.org Wed Dec 17 12:52:30 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 17 Dec 2003 12:52:30 -0800 Subject: [Rocks-Discuss]fidsk reports all zeros, need actual In-Reply-To: <BAY1-F978XKPl5GDrPi0003db4e@hotmail.com> References: <BAY1-F978XKPl5GDrPi0003db4e@hotmail.com> Message-ID: <EDF0DAE8-30D2-11D8-B821-000A95C4E3B4@rocksclusters.org> > Good ol' fdisk "print" on my compute node gives me a line: > Device Boot Start End Blocks Id System > > but no data. > > Extra Functionality's "print" reports > Nr AF Hd Sec Cyl Hd Sec Cyl Start Size ID > 1 00 0 0 0 0 0 0 0 0 0 > 2 00 0 0 0 0 0 0 0 0 0 > 3 00 0 0 0 0 0 0 0 0 0 > 4 00 0 0 0 0 0 0 0 0 0 > > How can I retrieve the information necessary for scripted information > at node installation time? this should answer your question: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-February/001388.html - gb From anand at novaglobal.com.sg Wed Dec 17 20:14:45 2003 From: anand at novaglobal.com.sg (Anand Vaidya) Date: Wed, 17 Dec 2003 23:14:45 -0500 Subject: [Rocks-Discuss]Creation of a hardware compatibility list? In-Reply-To: <DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu> References: <3FE00EF3.4020809@tupi.dmt.upm.es> <DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu> Message-ID: <200312172314.48434.anand@novaglobal.com.sg> Why not create a Wiki? Wiki is easy enough to install (60 seconds?) and just the right tool for user-driven projects like Rocks. Nice example of wiki wiki webs are http://en.wikipedia.org/ or even my favourite GentooServer project has a very nice wiki at http://www.subverted.net/wakka/wakka.php?wakka=MainPage (Though not related to clustering) Regards, Anand
    On Wednesday 17 December 2003 12:03, Mason J. Katz wrote: > We have thought about this, and have some ideas on how to setup a > useful page. Something like the old Linux laptop hardware list but > simpler to mine for data. It's been on our long list of things to do > for a while now :) > > -mjk > > On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote: > > Since one of the strong features of Rocks is the possibility of fast > > deployment of clusters, wouldn't it be of interest to create a > > hardware compatibility list on the web page of Rocks? This list could > > be filled in by the users of Rocks with their experience and the > > hardware they have. In this way somebody interested in building a > > cluster as fast as possible could check the list and buy something > > absolutely 100% compatible with Rocks. > > > > I know that in principle one could check the compatibility list of RH, > > but my own experience was negative in that aspect (I installed an > > Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was > > unable to recognize it). > > > > Miguel - From mjk at sdsc.edu Thu Dec 18 08:02:14 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Thu, 18 Dec 2003 08:02:14 -0800 Subject: [Rocks-Discuss]Creation of a hardware compatibility list? In-Reply-To: <200312172314.48434.anand@novaglobal.com.sg> References: <3FE00EF3.4020809@tupi.dmt.upm.es> <DEEE58E0-30B2-11D8-9543-000A95DA5638@sdsc.edu> <200312172314.48434.anand@novaglobal.com.sg> Message-ID: <8BA1598E-3173-11D8-9543-000A95DA5638@sdsc.edu> I've been thinking about a rocks wiki for a few months now, but I'm a bit paranoid about the lack of authentication for updates (basically anyone can modify your site). If there is interest out there, we could just set one up, leave it alone, and let our users worry about the content.
Done well this could have information on:
- hardware issues
- bug reports
- feature requests
- contributed documentation (to be moved into our users manual)
- etc.
Basically a simple version of sourceforge (we have no plans to move to sourceforge -- the interface and bandwidth both stink). Ideas....? -mjk On Dec 17, 2003, at 8:14 PM, Anand Vaidya wrote:
    > Why not create a Wiki? Wiki is easy enough to install (60 seconds?) and > just > the right tool for user-driven projects like Rocks. > > Nice example of wiki wiki webs are http://en.wikipedia.org/ or even my > favourite GentooServer project has a very nice wiki at http:// > www.subverted.net/wakka/wakka.php?wakka=MainPage (Though not related to > clustering) > > Regards, > Anand > > On Wednesday 17 December 2003 12:03, Mason J. Katz wrote: >> We have thought about this, and have some ideas on how to setup a >> useful page. Something like the old Linux laptop hardware list but >> simpler to mine for data. It's been on our long list of things to do >> for a while now :) >> >> -mjk >> >> On Dec 17, 2003, at 12:08 AM, Miguel Hermanns wrote: >>> Since one of the strong features of Rocks is the possibility of fast >>> deployment of clusters, wouldn't it be of interest to create a >>> hardware compatibility list on the web page of Rocks? This list could >>> be filled in by the users of Rocks with their experience and the >>> hardware they have. In this way somebody interested in building a >>> cluster as fast as possible could check the list and buy something >>> absolutely 100% compatible with Rocks. >>> >>> I know that in principle one could check the compatibility list of >>> RH, >>> but my own experience was negative in that aspect (I installed an >>> Adaptec IDE RAID controller, supported by RH7.3, but Rocks 2.3 was >>> unable to recognize it). >>> >>> Miguel > > - From hermanns at tupi.dmt.upm.es Fri Dec 19 00:47:11 2003 From: hermanns at tupi.dmt.upm.es (Miguel Hermanns) Date: Fri, 19 Dec 2003 09:47:11 +0100 Subject: [Rocks-Discuss]Creation of a hardware compatibility list?
Message-ID: <3FE2BB0F.4060908@tupi.dmt.upm.es> >>I've been thinking about a rocks wiki for a few months now, but I'm a bit paranoid about the lack of authentication for updates (basically anyone can modify your site).<< One possible filter could be that only the users of the registered clusters can modify the wiki (So that when you submit the data of the cluster you also include a user and a password), although in that case I would be excluded, since our cluster has been unable to work with Rocks yet :-(. >> - hardware issues
>> - bug reports
>> - feature requests
>> - contributed documentation (to be moved into our users manual)
>> - etc.
So for example the cluster register could be editable by the registered users (each one only its entry) and could include a description of the installed hardware (not just the processor, but also the motherboard model, the hard disks, NICs, etc). So everybody interested in building a cluster could go to the register, have a look and click on the different clusters that are similar to the one in mind. After that with just a click the user could review the hardware configuration and the encountered problems. This would also be great if the Rocks clusters get updated, because then their builders could go and update their entry without needing to submit an email to the Rocks team, hence avoiding giving them extra work. In order to include the not yet working Rocks clusters, the database of clusters (with the corresponding users and passwords) could be extended by them, but their entries would not be shown on the Rocks register until they are fully working. In this way information on the hardware incompatibilities can be collected and could be shown on a different part of www.rocksclusters.org. The feature requests would still be handled through the mailing list, and for the contributed documentation I would place the source files in read-only mode on the ftp server; if somebody goes and makes modifications on them, then the new version should be emailed to the persons in charge of the docs to give their approval. Miguel From jkreuzig at uci.edu Fri Dec 19 16:58:58 2003 From: jkreuzig at uci.edu (James Kreuziger) Date: Fri, 19 Dec 2003 16:58:58 -0800 (PST) Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <1062015636.6781.100.camel@babylon.physics.ncsu.edu> References: <1062015636.6781.100.camel@babylon.physics.ncsu.edu> Message-ID: <Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu> Ok, I need some help here.
I've managed to setup my frontend node, and it is up and running. I have my 8 nodes all connected up to a Dell Power Connect 5224. I can access the switch through a serial terminal and get a command line interface. The little lights on the front of the switch are blinking, so that's good. However, I can't get the switch recognized by insert-ethers. I've even managed to change the IP of the switch through the CLI, but I can't see the switch from the frontend node. I can't telnet, get the web interface or anything. I haven't saved the configuration, so a reboot of the switch will reset the values. I'm grasping at straws here. I'm not a network engineer, so I could use some help getting this thing configured.
    If anybody can help me out, contact me by email. Thanks, -Jim ************************************************* Jim Kreuziger jkreuzig at uci.edu 949-824-4474 ************************************************* From tim.carlson at pnl.gov Fri Dec 19 17:24:22 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Fri, 19 Dec 2003 17:24:22 -0800 (PST) Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu> Message-ID: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> On Fri, 19 Dec 2003, James Kreuziger wrote: I think we need a Rocks FAQ https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-August/002762.html You need to turn on fast-link. > Ok, I need some help here. I've managed to setup > my frontend node, and it is up and running. I have > my 8 nodes all connected up to a Dell Power Connect 5224. > I can access the switch through a serial terminal and > get a command line interface. The little lights on the > front of the switch are blinking, so that's good. > > However, I can't get the switch recognized by insert-ethers. > I've even managed to change the IP of the switch through > the CLI, but I can't see the switch from the frontend node. > I can't telnet, get the web interface or anything. I haven't > saved the configuration, so a reboot of the switch will > reset the values. > > I'm grasping at straws here. I'm not a network engineer, > so I could use some help getting this thing configured. > > If anybody can help me out, contact me by email. > > Thanks, > > -Jim > > ************************************************* > Jim Kreuziger > jkreuzig at uci.edu > 949-824-4474 > *************************************************
    > > > Tim Carlson Voice: (509)376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support From Georgi.Kostov at umich.edu Fri Dec 19 17:34:15 2003 From: Georgi.Kostov at umich.edu (Georgi Kostov) Date: Fri, 19 Dec 2003 20:34:15 -0500 Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu> References: <1062015636.6781.100.camel@babylon.physics.ncsu.edu> <Pine.GSO.4.58.0312191642260.19504@massun.ucicom.uci.edu> Message-ID: <1071884055.3fe3a717b3efc@carrierpigeon.mail.umich.edu> Jim, I have a 5224 here. What are your config settings on the switch? I.e. IP, sub-net mask, gateway settings - for both the switch and the interface of the head-node on which the 5224 is connected (I assume it's on the private subnet, so the subnet is something like 10.0.0.0/255.0.0.0 with the frontend internal interface (eth0) as 10.0.1.1, right?) One thing to try on the head node is use (as root) "tcpdump eth0", and watch for packets. To avoid clutter, I would either turn the rest (compute nodes, etc.) off, or filter them out with settings on tcpdump. With some more info we should be able to tease this out. --Georgi Michigan Center for Biological Information (MCBI) University of Michigan 3600 Green Court, Suite 700 Ann Arbor, MI 48105-1570 Phone/Fax: (734) 998-9236/8571 kostov at umich.edu www.ctaalliance.org Quoting James Kreuziger <jkreuzig at uci.edu>: > Ok, I need some help here. I've managed to setup > my frontend node, and it is up and running. I have > my 8 nodes all connected up to a Dell Power Connect 5224. > I can access the switch through a serial terminal and > get a command line interface. The little lights on the > front of the switch are blinking, so that's good. > > However, I can't get the switch recognized by insert-ethers. > I've even managed to change the IP of the switch through > the CLI, but I can't see the switch from the frontend node. > I can't telnet, get the web interface or anything. I haven't
    > saved the configuration, so a reboot of the switch will > reset the values. > > I'm grasping at straws here. I'm not a network engineer, > so I could use some help getting this thing configured. > > If anybody can help me out, contact me by email. > > Thanks, > > -Jim > > ************************************************* > Jim Kreuziger > jkreuzig at uci.edu > 949-824-4474 > ************************************************* > > > From daniel.kidger at quadrics.com Mon Dec 22 01:45:47 2003 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Mon, 22 Dec 2003 09:45:47 +0000 Subject: Fwd: Re: [Rocks-Discuss]Dell Power Connect 5224 Message-ID: <200312220945.47665.daniel.kidger@quadrics.com> ---------- Forwarded Message ---------- Subject: Re: [Rocks-Discuss]Dell Power Connect 5224 Date: Mon, 22 Dec 2003 09:38:41 +0000 From: Dan Kidger <daniel.kidger at quadrics.com> To: Georgi Kostov <Georgi.Kostov at umich.edu> Cc: paci-rocks-discussion at sdsc.edu > Quoting James Kreuziger <jkreuzig at uci.edu>: > > Ok, I need some help here. I've managed to setup > > my frontend node, and it is up and running. I have > > my 8 nodes all connected up to a Dell Power Connect 5224. > > I can access the switch through a serial terminal and > > get a command line interface. The little lights on the > > front of the switch are blinking, so that's good. > > > > However, I can't get the switch recognized by insert-ethers. > > I've even managed to change the IP of the switch through > > the CLI, but I can't see the switch from the frontend node. > > I can't telnet, get the web interface or anything. I haven't > > saved the configuration, so a reboot of the switch will > > reset the values. I don't know much about the 5224 per se, but I do know that much of the time embedded devices *have* to be rebooted to pick up new settings for their IP.
Once done, I would try pinging the switch's IP and then doing 'arp -a' to see its MAC address (which should match that on the white sticky label on the back).
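[Editor's note] Before the ping/arp step, it is worth sanity-checking the config values Georgi asks about: the switch's static IP has to fall inside the frontend's private subnet at all, or nothing on eth0 will reach it. A quick sketch in modern Python (the addresses are made-up examples; Rocks' default private network of 10.0.0.0/255.0.0.0 is taken from the thread):

```python
# A managed switch is only reachable from the frontend's internal
# interface if its IP is inside the cluster's private network.
import ipaddress

private_net = ipaddress.ip_network("10.0.0.0/255.0.0.0")   # default Rocks layout
switch_ip = ipaddress.ip_address("10.1.1.254")             # hypothetical setting
stray_ip = ipaddress.ip_address("192.168.1.254")           # common factory default

print(switch_ip in private_net)  # True: the frontend can route to it
print(stray_ip in private_net)   # False: ping/telnet from eth0 will fail
```

A switch left on its factory-default address (often a 192.168.x.x value) would fail this check and explain the symptoms Jim reports.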
    Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- ------------------------------------------------------- -- Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From daniel.kidger at quadrics.com Mon Dec 22 09:03:56 2003 From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com) Date: Mon, 22 Dec 2003 17:03:56 -0000 Subject: [Rocks-Discuss]RE:Writing a Roll ? Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com> Folks, I have made good headway in adding software and its configuration using extend-compute.xml and now have a robust system. (the head node install is still rather manual though :-( ) I would now like to move to doing this as a Roll. However I am not sure of the best way of proceeding - there appears to be little documentation - either on HOWTO or on the underlying concepts. I have mounted the HPC_roll.iso and browsed around: - the image seems to consist of 2 subdirectories - in the same style as RedHat CD's - as expected ./SRPMS contains the source RPMs, and ./RedHat/RPMS contains binary RPMs ( the latter contains many more RPMs than there is an SRPM for. ) There is no obvious configuration information until you explore: roll-hpc-kickstart-3.0.0-0.noarch.rpm This seems to contain lots of XML which at first glance is hard to decipher. So my question is: Should we be writing our own rolls, and if so how ? (examples?) Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
    ----------------------- www.quadrics.com -------------------- From daniel.kidger at quadrics.com Mon Dec 22 09:08:21 2003 From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com) Date: Mon, 22 Dec 2003 17:08:21 -0000 Subject: [Rocks-Discuss]shucks. Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622461C9@tardis0.quadrics.com> # rpm -ql roll-hpc-kickstart |xargs -l grep -inH sucks /export/home/install/profiles/current/nodes/force-smp.xml:21: IBM sucks /export/home/install/profiles/current/nodes/ganglia-server.xml:134: perl sucks /export/home/install/profiles/current/nodes/ganglia-server.xml:148: Switch from ISC to RedHat's pump. Pump sucks but it is standard so /export/home/install/profiles/current/nodes/sendmail-masq.xml:31: m4 sucks :-) Have a good Christmas, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From fds at sdsc.edu Mon Dec 22 10:22:54 2003 From: fds at sdsc.edu (Federico Sacerdoti) Date: Mon, 22 Dec 2003 10:22:54 -0800 Subject: [Rocks-Discuss]RE:Writing a Roll ? In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com> References: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com> Message-ID: <DBF30128-34AB-11D8-8652-000393A4725A@sdsc.edu> You are right, we have little documentation on creating new rolls. I have lamented to Greg about this, and he has done the same to me. Basically we have been so busy trying to get the 3.1.0 release out that we haven't put our nose to the grindstone about the Developer docs. Here is a little primer since it sounds like you are indeed ready. 1. The first thing to realize is that rolls are not built from "scratch", but are done from the safe confines of our build environment.
This environment is the directory: [your local rocks CVS sandbox]/src/roll/ You must check out the Rocks CVS tree to get this. Instructions about how to do this (anonymously) are at http://cvs.rocksclusters.org/. Once you have this build environment on your frontend system, you are ready for the next step to building your roll. You should make a new directory here called "quadrics" - the name matters as it will be the identifier for your roll from now on.
    2. Now the best thing I can tell you is to look at the "hpc" and "sge" roll (two of our most mature) for the directory structure in "quadrics". It's fairly straightforward, and mirrors what we do for the base. The "nodes" directory will hold your "extend-compute.xml", etc. (more on this later). The "roll-quadrics-kickstart.noarch.rpm" is made automatically for you from information in these directories. 3. The "src" dir holds anything you need to compile. Anything in src should deposit an RPM package in the "RPMS" directory when its build is finished. 4. You type "make roll" to start the build process. It will take a bit of study for you to get things correct, but suffice it to say that you will have an iso file suitable for burning when you are done. Thank bruno for this sweet fact - everything is automatic except your intellectual property :) One more word on your XML files. Our philosophy of rolls is not to use the "extend/replace" strategy that we advocate for customization. As a roll builder, you are at the grass-roots level, and can rise above simple customization techniques. Your roll should define a "quadrics.xml" node in the kickstart graph. You define the node in the file "roll/quadrics/nodes/quadrics.xml" and the edges in the file "roll/quadrics/graphs/default/quadrics.xml". Look at the SGE roll for a good example of this. By defining your configuration this way, you have more power to do complex tasks (different configuration for different appliance types), and to leave room for future growth. Good luck, and we hope and pray for a good technical writer that will do this process justice. -Federico On Dec 22, 2003, at 9:03 AM, daniel.kidger at quadrics.com wrote: > Folks, > I have made good headway in adding software and its configuration > using extend-compute.xml and now have a robust system. (the head node > install is still rather manual though :-( ) > > I would now like to move to doing this as a Roll.
However I am not > sure of the best way of proceeding - there appears to be little > documentation - either on HOWTO or on the underlying concepts. > > I have mounted the HPC_roll.iso and browsed around: > - the image seems to consist of 2 subdirectories - in the same style > as RedHat CD's > - as expected ./SRPMS contains the source RPMs, and ./RedHat/RPMS > contains binary RPMs > ( the latter contains many more RPMs than there is an SRPM for. ) > > There is no obvious configuration information until you explore: > roll-hpc-kickstart-3.0.0-0.noarch.rpm > This seems to contain lots of XML which at first glance is hard to > decipher. >
    > So my question is: > Should we be writing our own rolls, and if so how ? (examples?) > > > Yours, > Daniel. > > -------------------------------------------------------------- > Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com > One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 > ----------------------- www.quadrics.com -------------------- > >> >> Federico Rocks Cluster Group, San Diego Supercomputing Center, CA From mjk at sdsc.edu Mon Dec 22 11:07:32 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Mon, 22 Dec 2003 11:07:32 -0800 Subject: [Rocks-Discuss]shucks. In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622461C9@tardis0.quadrics.com> References: <30062B7EA51A9045B9F605FAAC1B4F622461C9@tardis0.quadrics.com> Message-ID: <18168448-34B2-11D8-8AD9-000A95DA5638@sdsc.edu> If these are the worst CVS log comments you've found you aren't looking very hard. The only one here I'm compelled to clarify is IBM. There are around 3-5 ways of probing the chipset to determine if the box is SMP; RedHat supports the most common ones, which everyone in the world except IBM uses. This forced us to patch anaconda to detect SMP for IBM hardware (or in this case just force it) -- didn't these guys invent the PC? -mjk On Dec 22, 2003, at 9:08 AM, daniel.kidger at quadrics.com wrote: > > # rpm -ql roll-hpc-kickstart |xargs -l grep -inH sucks > > /export/home/install/profiles/current/nodes/force-smp.xml:21: IBM > sucks > /export/home/install/profiles/current/nodes/ganglia-server.xml:134: > perl sucks > /export/home/install/profiles/current/nodes/ganglia-server.xml:148: > Switch from ISC to RedHat's pump. Pump sucks but it is standard so > /export/home/install/profiles/current/nodes/sendmail-masq.xml:31: m4 > sucks > > :-) > > Have a good Christmas, > Daniel. > > -------------------------------------------------------------- > Dr. Dan Kidger, Quadrics Ltd. daniel.kidger at quadrics.com
    > One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 > ----------------------- www.quadrics.com -------------------- From mjk at sdsc.edu Mon Dec 22 11:13:30 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Mon, 22 Dec 2003 11:13:30 -0800 Subject: [Rocks-Discuss]RE:Writing a Roll ? In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com> References: <30062B7EA51A9045B9F605FAAC1B4F622357D0@tardis0.quadrics.com> Message-ID: <EDBC4D7D-34B2-11D8-9250-000A95DA5638@sdsc.edu> http://cvs.rocksclusters.org In the rocks/src/roll directory you can see several roll examples, all of which are built by typing "make roll". The roll-*-kickstart.*.noarch.rpm is the real magic that includes the XML profiles that are grafted onto the base kickstart graph. -mjk On Dec 22, 2003, at 9:03 AM, daniel.kidger at quadrics.com wrote: > Folks, > I have made good headway in adding software and its configuration > using extend-compute.xml and now have a robust system. (the head node > install is still rather manual though :-( ) > > I would now like to move to doing this as a Roll. However I am not > sure of the best way of proceeding - there appears to be little > documentation - either on HOWTO or on the underlying concepts. > > I have mounted the HPC_roll.iso and browsed around: > - the image seems to consist of 2 subdirectories - in the same style > as RedHat CD's > - as expected ./SRPMS contains the source RPMs, and ./RedHat/RPMS > contains binary RPMs > ( the latter contains many more RPMs than there is an SRPM for. ) > > There is no obvious configuration information until you explore: > roll-hpc-kickstart-3.0.0-0.noarch.rpm > This seems to contain lots of XML which at first glance is hard to > decipher. > > So my question is: > Should we be writing our own rolls, and if so how ? (examples?) > > > Yours, > Daniel. > > -------------------------------------------------------------- > Dr. Dan Kidger, Quadrics Ltd.
daniel.kidger at quadrics.com > One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 > ----------------------- www.quadrics.com -------------------- > >>
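[Editor's note] Pulling Federico's primer and mjk's pointer together, the roll skeleton they describe can be sketched in a few shell commands. This is a sketch only: the directory names follow the primer, and the CVS checkout itself (anonymous instructions at http://cvs.rocksclusters.org/) is deliberately left as a comment.

```shell
# Inside your checked-out Rocks CVS sandbox, rolls live under src/roll/.
# Create a "quadrics" roll skeleton next to the hpc and sge examples;
# the directory name becomes the roll's identifier.
mkdir -p src/roll/quadrics/nodes           # quadrics.xml kickstart node lives here
mkdir -p src/roll/quadrics/graphs/default  # edges grafted into the kickstart graph
mkdir -p src/roll/quadrics/src             # anything that must compile into an RPM
mkdir -p src/roll/quadrics/RPMS            # built (or prebuilt) binary RPMs land here

ls src/roll/quadrics
# With the skeleton populated, "make roll" in src/roll/quadrics drives the
# ISO build, per Federico's step 4.
```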
From daniel.kidger at quadrics.com Mon Dec 22 11:12:17 2003 From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com) Date: Mon, 22 Dec 2003 19:12:17 -0000 Subject: [Rocks-Discuss]RE:Writing a Roll ? Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622357D1@tardis0.quadrics.com> Federico, > Here is a little primer since it sounds like you are indeed ready. > --- many very informative lines deleted --- thanks for that long reply. :-) I am currently pulling a copy of the source tree from cvs.rocksclusters.org (194MB of rocks/doc alone !) Just a couple of questions for now:
1. Do rolls have to be CD based ? (during development I would probably get through a lot of CDROMs - but more importantly it would get a bit fiddly - to keep walking round to the CD-writer - then nipping off to the room with the cluster in every time)
2. Do I have to reinstall the headnode from scratch each time I want to test a roll ? (even if the roll only affects RPMs that get installed on compute nodes)
3. Can a CD contain multiple rolls? (Once mature - a cluster may have quite a few rolls: pbs, sge, gm, IB, etc. and Quadrics would probably have two - the (open-source) hardware drivers, MPI, etc. and also RMS - our (closed-source) cluster Resource Manager.)
4. What subset of the cvs tree does a Roll developer need? The whole tree is clearly rather excessive.
5. I am a little concerned about the amount of bloat needed to install our five RPMs as a Roll. (The RPMs are already prebuilt by our own internal build procedures). So taking another case - let's say the Intel Compilers - These have 4 RPMs (plus a little sed-ery of their config files and pasting in the license file). Would these be best installed as a Roll or as a simple extend-compute.xml as I have currently?
Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd.
daniel.kidger at quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From sjenks at uci.edu Mon Dec 22 11:17:07 2003 From: sjenks at uci.edu (Stephen Jenks) Date: Mon, 22 Dec 2003 11:17:07 -0800 Subject: [Rocks-Discuss]rocks-dist suggestion Message-ID: <6F2FB100-34B3-11D8-88FD-000A95B96C68@uci.edu>
    Hi ROCKS folks, Just a suggestion for when you guys are bored after the 3.1 release 8-) I ran into some trouble installing some updates to a ROCKS 3.0 cluster that could easily be solved with some checking in rocks-dist: I put the openssh and other updates in the proper contrib directory under /home/install and ran "rocks-dist dist" which properly updated the distribution. The problem occurred when I tried to reload the compute nodes - the install failed when it hit any of the RPMs in the contrib directory. It turns out the protections on those RPMs were set to 600 because I had copied them out of root's home directory, thus they couldn't be read by the server to send them down to the compute nodes. After fixing the permissions, all was well. So rocks-dist should check (and possibly fix) permissions on files that will be included in the kickstart distribution. I realize that the mistake was entirely mine, but I'm probably not the only one to ever forget to set permissions correctly and the tool could easily catch such mistakes. Thanks for putting together such a useful cluster distribution! Steve Jenks From msherman at informaticscenter.info Mon Dec 22 11:50:03 2003 From: msherman at informaticscenter.info (Mark Sherman) Date: Mon, 22 Dec 2003 12:50:03 -0700 Subject: [Rocks-Discuss]MPI and memory + node rescue Message-ID: <20031222195003.7688.qmail@webmail4.mesa1.secureserver.net> just for future consideration... any time I need to look at a system without booting it or its ability to boot I just throw in the knoppix cd.
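[Editor's note] Steve's rocks-dist suggestion above amounts to a permissions walk over the distribution tree. A rough sketch of such a check (the function name and paths are illustrative, not the actual rocks-dist code):

```python
# Flag RPMs that lack world-read permission. Apache serves the kickstart
# distribution over http, so a mode-600 file fails exactly as Steve describes.
import os
import stat

def unreadable_rpms(tree):
    """Return paths of .rpm files under `tree` that 'other' users cannot read."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            if name.endswith(".rpm"):
                path = os.path.join(dirpath, name)
                if not os.stat(path).st_mode & stat.S_IROTH:
                    bad.append(path)
    return sorted(bad)
```

Running something like this over /home/install/contrib before (or from) "rocks-dist dist" would have caught the 600-mode files. Note it only checks file modes; as Dan Kidger's follow-up points out, directory traverse bits along symlinked paths matter too and are not covered here.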
www.knoppix.org ______________________________________________ Mark Sherman Computing Systems Administrator Informatics Center Massachusetts Biomedical Initiatives Worcester MA 01605 508-797-4200 msherman at informaticscenter.info ----------------------~----------------------- > -------- Original Message -------- > Subject: Re: [Rocks-Discuss]MPI and memory + node rescue > From: "Trond SAUE" <saue at quantix.u-strasbg.fr> > Date: Thu, November 27, 2003 1:38 am > To: "Stephen P. Lebeau" <lebeau at openbiosystems.com> > Cc: npaci-rocks-discussion at sdsc.edu > > On 2003.11.26 16:52, Stephen P. Lebeau wrote:
    > > If you go here, they talk about creating a Linux floppy > > repair disk. Make sure to read the README file... they > > require that you make a 1.68MB floppy ( README explains how ) > > > > http://www.tux.org/pub/people/kent-robotti/looplinux/rip/ > > > > If that doesn't work... > > > > http://www.toms.net/rb/download.html > > > > I've actually used this one before. > > > > -S > > > In order to have a look at the disk of my crashed node, I downloaded > RIP-2.2-1680.bin from the first site, but I was not able to boot > properly. However, tomsrtbt-2.0.103 from the second site worked very > well and allowed me to reboot the node as well as mount its disk to > look at messages. Unfortunately, they did not really tell me anything > more... However, it might be an idea for a future release of ROCKS to > include a second "standalone" boot option for the compute nodes, so > that one can access them independent of the frontend.... > All the best, > Trond Saue > -- > Trond SAUE (DIRAC: > http://dirac.chem.sdu.dk/) > Laboratoire de Chimie Quantique et Modélisation Moléculaire > Université Louis Pasteur ; 4, rue Blaise Pascal ; F-67000 STRASBOURG > tél: 03 90 24 13 01 fax: 03 90 24 15 89 email: saue at quantix.u- > strasbg.fr From daniel.kidger at quadrics.com Mon Dec 22 11:51:16 2003 From: daniel.kidger at quadrics.com (daniel.kidger at quadrics.com) Date: Mon, 22 Dec 2003 19:51:16 -0000 Subject: [Rocks-Discuss]rocks-dist suggestion Message-ID: <30062B7EA51A9045B9F605FAAC1B4F622461CD@tardis0.quadrics.com> > Just a suggestion for when you guys are bored after the 3.1 > release 8-) > The problem occurred when I tried to reload the compute nodes - the > install failed when it hit any of the RPMs in the contrib > directory. It > turns out the protections on those RPMs were set to 600 because I had > copied them out of root's home directory, thus they couldn't > be read by > the server to send them down to the compute nodes. After fixing the > permissions, all was well.
This is a 'me-too' reply. Rocks reads the RPMs using http - hence they
need to be readable by user apache. With symlinks it is all too easy,
even if the RPMs themselves are 644, for part of the directory tree to
be somewhere not walkable by a third-party userid like apache.

Yours,
Daniel.
--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From fds at sdsc.edu Mon Dec 22 15:26:01 2003
From: fds at sdsc.edu (Federico Sacerdoti)
Date: Mon, 22 Dec 2003 15:26:01 -0800
Subject: [Rocks-Discuss]RE: Writing a Roll?
In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F622357D1@tardis0.quadrics.com>
References: <30062B7EA51A9045B9F605FAAC1B4F622357D1@tardis0.quadrics.com>
Message-ID: <34B2A95C-34D6-11D8-8652-000393A4725A@sdsc.edu>

On Dec 22, 2003, at 11:12 AM, daniel.kidger at quadrics.com wrote:

> Federico,
>
>> Here is a little primer since it sounds like you are indeed ready.
>> --- many very informative lines deleted ---
>
> Just a couple of questions for now:
> 1. Do rolls have to be CD based?
> (during development I would probably get through a lot of CDROMs -
> but more importantly it would get a bit fiddly to keep walking round
> to the CD-writer - then nipping off to the room with the cluster
> every time)

For distribution, the rolls should probably be CD based. For
development, however, that is not necessary. There is a make target
which will compile your source and "install" the roll into your local
distribution. This is "make intodist" and assumes you are building on a
frontend node. You would follow this call with a call to "rocks-dist
dist" in the "/home/install" directory. Of course, this makes most
sense for rolls that affect compute nodes. To test parts of your roll
that affect frontend functionality, you still need to use the CDs.

> 2. Do I have to reinstall the headnode from scratch each time I want
> to test a roll?
> (even if the roll only affects RPMs that get installed on compute
> nodes)

See comment above. We're working on a way to fully install frontends
over the network, but it will not make it into the new release.

> 3.
> Can a CD contain multiple rolls?
> (Once mature - a cluster may have quite a few rolls: pbs, sge, gm,
> IB, etc., and Quadrics would probably have two - the (open-source)
> hardware drivers, MPI, etc., and also RMS - our (closed-source)
> cluster Resource
> Manager.)

There is some support for this; we call them "Metarolls". We know they
are important, and we have some support for them now. The build process
for them is a bit different, and won't arrive for this release but soon
after.

> 4. What subset of the cvs tree does a Roll developer need? The whole
> tree is clearly rather excessive.

There are definitely areas of the tree not necessary for roll building.
It's always safest to have everything, but you're welcome to crop and
test.

> 5. I am a little concerned about the amount of bloat needed to
> install our five RPMs as a Roll. (The RPMs are already prebuilt by
> our own internal build procedures.)
> So taking another case - let's say the Intel Compilers - These have 4
> RPMs (plus a little sed-ery of their config files and pasting in the
> license file). Would these be best installed as a Roll or as a simple
> extend-compute.xml as I have currently?

It is better to put them in a roll. We have ways to combine,
distribute, sort, etc. these rolls, and they form a nice capsule of
software to introduce into the system. I understand that pulling the
whole source tree seems a bit excessive, but it is rather standard
practice for working on an open project. Plus only the developer needs
the source, the consumer does not. Good luck, and we're glad someone is
asking the questions. Rolls are intended for outside construction, and
we need to document the process. :)

-Federico

>
> Yours,
> Daniel.
>
> --------------------------------------------------------------
> Dr. Dan Kidger, Quadrics Ltd.
> daniel.kidger at quadrics.com
> One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505
> ----------------------- www.quadrics.com --------------------

Federico
Rocks Cluster Group, San Diego Supercomputing Center, CA

From tlinden at pcu.helsinki.fi Tue Dec 23 05:28:35 2003
From: tlinden at pcu.helsinki.fi (Tomas Lindén)
Date: Tue, 23 Dec 2003 15:28:35 +0200 (EET)
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
Message-ID: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>

To reinstall a cluster I use the command
cluster-fork /boot/kickstart/cluster-kickstart

Now since all 32 nodes have been PXE installed, this means that the
reinstallation is performed by first doing a PXE boot to load the
installation kernel. My problem is that sometimes a few nodes fail
during this reinstallation process. The failing nodes seem to be
different whenever this problem occurs. The really strange thing is
that after more than a day or so some nodes somehow manage to finish
the reinstallation process!

Sometimes the whole cluster comes up fine without any lost node.

The problematic nodes _seem_ to get the installation kernel with PXE,
so it might not be a PXE problem but something odd that happens later?

Has anyone seen anything like this before?

I'm aware of a bug in the RedHat installation kernel on Athlon systems
when trying to run with a serial console.

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/001988.html

This is why I run the installation kernel without a serial console, but
this makes debugging difficult because the serial console only shows
output during the PXE boot process. No output is generated by the
installation kernel itself. The next output is generated when the node
has finished the installation and loads the final kernel, which runs
fine with a serial console.

This is using Rocks 2.3.2 on a 32-node cluster with Tyan Tiger MPX
S2466N-4M motherboards and dual Athlon MP CPUs with no graphics
adapters, so the system has a 32-port serial console switch. The
motherboards have integrated 100 Mb/s 3Com 3C920 NICs (in practice a
3C905 NIC). The switch is made by Enterasys. The frontend private NIC
is also running at 100 Mb/s. When doing the cluster reinstallation the
network bandwidth over the frontend NIC saturates at 12.5 MB/s. Maybe
some packets are lost because of this?

The frontend private ethernet connection will be upgraded to Gb/s.
Hopefully this will solve this reinstallation problem.

Do you have any other ideas how to solve this problem?
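The saturation figure quoted above is exactly the ceiling of a fully loaded fast-ethernet link, which supports the idea that the 100 Mb/s frontend NIC is the bottleneck. A back-of-the-envelope check, nothing Rocks-specific:

```shell
# A saturated 100 Mb/s link moves 100/8 = 12.5 MB/s, matching the
# reported saturation point exactly.
awk 'BEGIN { printf "%.1f MB/s\n", 100 / 8 }'
```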
Best regards, Tomas Lindén
--------------------------------------------------------------------------
 Tomas Linden                  Helsinki Institute of Physics (HIP)
 Tomas.Linden at Helsinki.FI   P.O. Box 64 (Gustaf Hällströmin katu 2)
 phone: +358-9-191 505 63      FIN-00014 UNIVERSITY OF HELSINKI
 fax:   +358-9-191 505 53      Finland
 WWW: http://www.physics.helsinki.fi/~tlinden/eindex.html
--------------------------------------------------------------------------

From kjcruz at ece.uprm.edu Tue Dec 23 05:31:26 2003
From: kjcruz at ece.uprm.edu (Kennie Cruz)
Date: Tue, 23 Dec 2003 09:31:26 -0400 (AST)
Subject: [Rocks-Discuss]Error installing the compute node
Message-ID: <Pine.LNX.4.58.0312230921290.23333@alambique.ece.uprm.edu>

Hi,
I am trying to kickstart the compute nodes with Rocks 3.0.0; the
frontend is already working. I revised FAQ question 7.1.2, and the
services (dhcpd, httpd, mysqld and autofs) are running, but running
kickstart.cgi from the command line gives an error:

error - cannot kickstart external nodes

I made a quick search on the list, but without any success.

The compute node gets the assigned IP and insert-ethers detects the
appliance without any trouble, but it fails to run the kickstart.cgi
from the frontend. The web server error log says something like this:

[Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed header from script. Bad header=# @Copyright@: /var/www/html/install/kickstart.cgi

While the access log says this:

10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0" 500 587 "-" "-"

I ran insert-ethers with the Ethernet Switches option. My nodes are
connected via 3 managed ethernet switches.

Any help will be appreciated.

Thanks in advance.

--
Kennie J. Cruz Gutierrez, System Administrator
Department of Electrical and Computer Engineering
University of Puerto Rico, Mayaguez Campus
Work Phone: (787) 832-4040 x 3798
Email: Kennie.Cruz at ece.uprm.edu
Web: http://ece.uprm.edu/~kennie/

[2003-12-23/09:21]
Black holes are created when God divides by zero!

From bruno at rocksclusters.org Tue Dec 23 08:33:39 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 23 Dec 2003 08:33:39 -0800
Subject: [Rocks-Discuss]Error installing the compute node
In-Reply-To: <Pine.LNX.4.58.0312230921290.23333@alambique.ece.uprm.edu>
References: <Pine.LNX.4.58.0312230921290.23333@alambique.ece.uprm.edu>
Message-ID: <C33DF11A-3565-11D8-B821-000A95C4E3B4@rocksclusters.org>

just to be clear, did you execute:

# cd /home/install
# ./kickstart.cgi --client compute-0-0

 - gb
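For background on the log line Kennie quoted: apache's "malformed header from script" means the CGI's very first output line did not look like an HTTP header (here it was a "# @Copyright@" comment, suggesting the script emitted a banner or error text before its headers). A generic check of a captured first line, not specific to kickstart.cgi:

```shell
# A CGI response must begin with "Name: value" header lines; apache logs
# "malformed header from script" otherwise -- exactly what a leading
# "# @Copyright@" comment triggers. First line below is from the log.
first_line='# @Copyright@: /var/www/html/install/kickstart.cgi'
if printf '%s\n' "$first_line" | grep -qE '^[A-Za-z][A-Za-z0-9-]*: '; then
    echo "valid header line"
else
    echo "malformed header: $first_line"
fi
```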
    On Dec 23,2003, at 5:31 AM, Kennie Cruz wrote: > Hi, > > I am trying to kickstart the compute nodes with Rocks 3.0.0, the > frontend > is already working. I revised the FAQ question 7.1.2, the services > (dhcpd, > httpd, mysqld and autofs) are running, but running kickstar.cgi from > the > command line give an error: > > error - cannot kickstart external nodes > > I made a quick search on the list, but without any success. > > The compute node gets the assigned IP and insert-ethers detect the > appliance without any trouble, but fails to run the kickstart.cgi from > the > frontend. The web server error log says something like this: > > [Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed > header > from script. Bad header=# @Copyright@: > /var/www/html/install/kickstart.cgi > > While the access log says this: > > 10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET > /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0" > 500 587 "-" "-" > > I ran insert-ethers with the Ethernet Switches option. My nodes are > connected via 3 managed ethernet switches. > > Any help will be appreciated. > > Thanks in advance. > > -- > Kennie J. Cruz Gutierrez, System Administrator > Department of Electrical and Computer Engineering > University of Puerto Rico, Mayaguez Campus > Work Phone: (787) 832-4040 x 3798 > Email: Kennie.Cruz at ece.uprm.edu > Web: http://ece.uprm.edu/~kennie/ > > [2003-12-23/09:21] > Black holes are created when God divides by zero! From daniel.kidger at quadrics.com Tue Dec 23 09:03:49 2003 From: daniel.kidger at quadrics.com (Daniel Kidger) Date: Tue, 23 Dec 2003 17:03:49 +0000 Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart? In-Reply-To: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi> References: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>
Message-ID: <3FE87575.5060807@quadrics.com>

Tomas Lindén wrote:

>To reinstall a cluster I use the command
> cluster-fork /boot/kickstart/cluster-kickstart
>Now since all 32 nodes have been PXE installed this means that the
>reinstallation is performed by first doing a PXE-boot to load the
>installation kernel. My problem is that sometimes a few nodes fail
>during this reinstallation process.
>

Although I haven't PXE installed a Rocks cluster of this size, I have
done PXE-based installs of (larger) RedHat clusters using a customised
kickstart file. What can go wrong is that I have seen timeouts if too
many nodes dhcp/tftp for their installer kernel simultaneously. You
could try to increase the timeout or, better, not do too many at once -
say start 8 at a time every 30 seconds. There is plenty of precedent
for this in, say, the automated installer of the AlphaServer SC Tru64
clusters.

Also, outside of Rocks, I have seen folk use multiple 'sub-master'
nodes to act as tftp/http fileservers during the install process.

It would be interesting to see what the Rocks developers' vision is for
the scalable installation of large clusters.

--
Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From mjk at sdsc.edu Tue Dec 23 09:44:14 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 23 Dec 2003 09:44:14 -0800
Subject: [Rocks-Discuss]Lost nodes during cluster-kickstart?
In-Reply-To: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>
References: <Pine.OSF.4.58.0312231440260.353431@rock.helsinki.fi>
Message-ID: <9F7E8D1C-356F-11D8-8281-000A95DA5638@sdsc.edu>

The problem is PXE has an extremely short timeout, and once it fails it
does not retry. Since this is a BIOS thing, there isn't a lot to do. If
you boot your compute nodes off of CDs (and avoid PXE), the problem
goes away.
This is because, even if DHCP times out, we've modified our
installation to be extremely aggressive with DHCP requests, and the
entire installation process will actually watchdog-timeout and restart
itself if needed. Unfortunately, the PXE timeout cannot be fixed in the
same way. Our experience shows PXE scales to 128 nodes for a mass
re-install using current hardware. Older CPUs may show issues. The only
answer right now is to stage your re-install so the PXE server can
handle the load. This load is actually very low, but the PXE server for
Linux is still maturing.

 -mjk
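The staged re-install that Mason recommends (and that Daniel sketched as "8 at a time every 30 seconds") can be scripted. A sketch only: the node names are illustrative, and the echo stands in for whatever per-node command triggers the reinstall (e.g. an ssh to /boot/kickstart/cluster-kickstart):

```shell
# Kick off node reinstalls in small batches so the PXE/tftp server is
# never hit by all 32 nodes at once.
BATCH=8        # nodes started per batch
DELAY=1        # seconds between batches; ~30 is the suggestion above
i=0
for n in $(seq -f 'compute-0-%g' 0 31); do
    echo "kickstarting $n" &   # stand-in for: ssh "$n" /boot/kickstart/cluster-kickstart &
    i=$((i + 1))
    if [ $((i % BATCH)) -eq 0 ]; then
        wait                   # let this batch get going
        sleep "$DELAY"
    fi
done
wait
```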
    On Dec 23,2003, at 5:28 AM, Tomas Lind?n wrote: > To reinstall a cluster I use the command > cluster-fork /boot/kickstart/cluster-kickstart > Now since all 32 nodes have been PXE installed this means that the > reinstallation is performed by first doing a PXE-boot to load the > installation kernel. My problem is that sometimes a few nodes fail > during this reinstallation process. The failing nodes seem to be > different > whenever this problem occurs. The really strange thing is that after > more than a day or so some nodes somehow manage to finish the > reinstallation process! > > Sometimes the whole cluster comes up fine without any lost node. > > The problematic nodes _seem_ to get the installation kernel with PXE, > so > it might be not a PXE problem but something odd that happens later? > > Has anyone seen anything like this before? > > I'm aware of a bug in the RedHat installation kernel > on Athlon systems when trying to run with a serial console. > > https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2003-May/ > 001988.html > This is why I run the installation kernel without a serial console, but > this makes debugging difficult because the serial console only shows > output during the PXE boot process. No output is generated by the > installation kernel itself. The next output is generated when > the node has finished the installation and loads the final kernel which > runs fine with a serial console. > > This is using Rocks 2.3.2 on a 32 node cluster with Tyan Tiger MPX > S2466N-4M motherboards and dual Athlon MP CPUs with no graphics > adapters, so the system has a 32 port serial console switch. The > motherboards have integrated 100 Mb/s 3Com 3C920 NICs (in practice a > 3C905 NIC). The switch is made by Enterasys. The frontend private NIC > is > also running at 100 Mb/s. When doing the cluster reinstallation the > network bandwidth over the frontend NIC saturates at 12,5 MB/s. Maybe > some packets are lost because of this? 
> > The frontend private ethernet connection will be upgraded to Gb/s. > Hopefully this will solve this reinstallation problem. > > Do you have any other ideas how to solve this problem? > > Best regards, Tomas Lind?n > ----------------------------------------------------------------------- > --- > I , > I > I Tomas Linden Helsinki Institute of Physics (HIP) > I > I Tomas.Linden at Helsinki.FI P.O. Box 64 (Gustaf H?llstr?min katu > 2) I > I phone: +358-9-191 505 63 FIN-00014 UNIVERSITY OF HELSINKI
    > I > I fax: +358-9-191 505 53 Finland > I > I WWW: http://www.physics.helsinki.fi/~tlinden/eindex.html > I > ----------------------------------------------------------------------- > --- From Timothy.Carlson at pnl.gov Tue Dec 23 08:57:07 2003 From: Timothy.Carlson at pnl.gov (Carlson, Timothy S) Date: Tue, 23 Dec 2003 08:57:07 -0800 Subject: [Rocks-Discuss]Error installing the compute node Message-ID: <A383F042472668459D642266F8B41692056B9F@pnlmse24.pnl.gov> The problem he is having is that he chose "ethernet switches" when running insert-ethers. He should have chosen "Compute nodes". Only choose "ethernet switches" when you are assigning an IP address to an ethernet switch with DHCP. If your managed switches already have IP addresses, then just install "compute nodes" Tim -----Original Message----- From: Greg Bruno [mailto:bruno at rocksclusters.org] Sent: Tuesday, December 23, 2003 8:34 AM To: Kennie Cruz Cc: npaci-rocks-discussion at sdsc.edu Subject: Re: [Rocks-Discuss]Error installing the compute node just to be clear, did you execute: # cd /home/install # ./kickstart.cgi --client compute-0-0 - gb On Dec 23, 2003, at 5:31 AM, Kennie Cruz wrote: > Hi, > > I am trying to kickstart the compute nodes with Rocks 3.0.0, the > frontend > is already working. I revised the FAQ question 7.1.2, the services > (dhcpd, > httpd, mysqld and autofs) are running, but running kickstar.cgi from > the > command line give an error: > > error - cannot kickstart external nodes > > I made a quick search on the list, but without any success.
> The compute node gets the assigned IP and insert-ethers detects the
> appliance without any trouble, but fails to run the kickstart.cgi
> from the frontend. The web server error log says something like this:
>
> [Tue Dec 23 09:10:08 2003] [error] [client 10.255.255.254] malformed
> header from script. Bad header=# @Copyright@:
> /var/www/html/install/kickstart.cgi
>
> While the access log says this:
>
> 10.255.255.254 - - [23/Dec/2003:09:10:08 -0400] "GET
> /install/kickstart.cgi?arch=i386&np=2&if=eth0&project=rocks HTTP/1.0"
> 500 587 "-" "-"
>
> I ran insert-ethers with the Ethernet Switches option. My nodes are
> connected via 3 managed ethernet switches.
>
> Any help will be appreciated.
>
> Thanks in advance.
>
> --
> Kennie J. Cruz Gutierrez, System Administrator
> Department of Electrical and Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> Work Phone: (787) 832-4040 x 3798
> Email: Kennie.Cruz at ece.uprm.edu
> Web: http://ece.uprm.edu/~kennie/
>
> [2003-12-23/09:21]
> Black holes are created when God divides by zero!

From purikk at hotmail.com Tue Dec 23 12:48:30 2003
From: purikk at hotmail.com (Purushotham Komaravolu)
Date: Tue, 23 Dec 2003 15:48:30 -0500
Subject: [Rocks-Discuss]beowulf and rocks
Message-ID: <BAY1-DAV43JrOq93dSA00011dba@hotmail.com>

Hi,
I keep hearing people mention beowulf and Rocks; can somebody point me
to the difference between them? Are they just two different solutions
for clusters?
Thanks
Regards,
Puru

From tim.carlson at pnl.gov Tue Dec 23 13:19:39 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Tue, 23 Dec 2003 13:19:39 -0800 (PST)
Subject: [Rocks-Discuss]beowulf and rocks
In-Reply-To: <BAY1-DAV43JrOq93dSA00011dba@hotmail.com>
Message-ID: <Pine.LNX.4.44.0312231314420.25800-100000@localhost.localdomain>
On Tue, 23 Dec 2003, Purushotham Komaravolu wrote:

> I keep hearing people mention beowulf and Rocks; can somebody point
> me to the difference between them? Are they just two different
> solutions for clusters?

Beowulf is a loose definition for a cluster of machines (typically off
the shelf hardware). Beowulf is not software. Rocks is a software
solution to manage your beowulf. You can compare rocks/oscar/scyld as
software systems for your beowulf cluster.

Read Robert Brown's book on beowulfs at this URL:

http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book/beowulf_book/index.html

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From dlane at ap.stmarys.ca Tue Dec 23 14:53:51 2003
From: dlane at ap.stmarys.ca (Dave Lane)
Date: Tue, 23 Dec 2003 18:53:51 -0400
Subject: [Rocks-Discuss]beowulf and rocks
In-Reply-To: <BAY1-DAV43JrOq93dSA00011dba@hotmail.com>
Message-ID: <5.2.0.9.0.20031223185219.01b444e8@ap.stmarys.ca>

At 03:48 PM 12/23/2003 -0500, Purushotham Komaravolu wrote:

>Hi,
> I keep hearing people mention beowulf and Rocks; can somebody point
>me to the difference between them? Are they just two different
>solutions for clusters?

Beowulf is a loosely-defined generic term (that I won't attempt to
define now!), while Rocks is one of several software distributions that
implement a beowulf cluster.

... Dave

From junkscarce at hotmail.com Tue Dec 23 15:43:05 2003
From: junkscarce at hotmail.com (Reed Scarce)
Date: Tue, 23 Dec 2003 23:43:05 +0000
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
Message-ID: <BAY1-F147XhOous6jec0001512f@hotmail.com>

Within /export/home/install/profiles/2.3.2/site-nodes,
extend-compute.xml contains code like this:

<post>
/bin/mkdir /mnt/plc/ <-- works -->
/bin/mkdir /mnt/plc/plc_data <-- works -->
/bin/ln -s /mnt/plc_data /data1 <-- works -->
/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, source exists -->
</post>

I don't understand why the ln to a directory succeeds but a ln to a
script fails.

BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
fail with:

Can't stat `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm

(my note: the path ends at RPMS). I swear I thought I saw a solution to
this once but I can't find it again.

Upon reinstallation with the file your tool created
(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda
threw back the exception:

Traceback (innermost last):
  File "/usr/bin/anaconda.real", line 633, in ?
    intf.run(id, dispatch, configFileData)
  File "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427, in run

ok save debug

TIA Reed Scarce

From landman at scalableinformatics.com Tue Dec 23 16:17:58 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 23 Dec 2003 19:17:58 -0500
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <BAY1-F147XhOous6jec0001512f@hotmail.com>
References: <BAY1-F147XhOous6jec0001512f@hotmail.com>
Message-ID: <1072225078.4501.82.camel@protein.scalableinformatics.com>

Hi Reed:

Which version of the finishing server fails on which version of ROCKS?
It looks like 3.0. I am up to 3.1.0 now. With a little bit of
modification I could make it work with 2.3.2. Likely just a single line
to point to the right path. Let me know and I'll see what I can do.

I would recommend using the 3.1.0 environment, as it is a significant
(read as massive) improvement over previous versions.
If you (and others) need it to work with older (pre-3.0) versions of
ROCKS, I think I can handle that. Let me know.

Joe

On Tue, 2003-12-23 at 18:43, Reed Scarce wrote:
> Within /export/home/install/profiles/2.3.2/site-nodes
> extend-compute.xml lies code like this commented code:
> <post>
> /bin/mkdir /mnt/plc/ <-- works -->
> /bin/mkdir /mnt/plc/plc_data <-- works -->
> /bin/ln -s /mnt/plc_data /data1 <-- works -->
> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
> source exists -->
> </post>
>
> I don't understand why the ln to a directory succeeds but a ln to a
> script fails.
>
> BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
> fail with:
> Can't stat `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm

From mjk at sdsc.edu Tue Dec 23 16:35:13 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Tue, 23 Dec 2003 16:35:13 -0800
Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
In-Reply-To: <BAY1-F147XhOous6jec0001512f@hotmail.com>
References: <BAY1-F147XhOous6jec0001512f@hotmail.com>
Message-ID: <09B1C3EA-35A9-11D8-8281-000A95DA5638@sdsc.edu>

"man chkconfig"

If you use chkconfig you do not need to create the rc*.d/* files; they
are put in place for you.

 -mjk

On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:

> Within /export/home/install/profiles/2.3.2/site-nodes
> extend-compute.xml lies code like this commented code:
> <post>
> /bin/mkdir /mnt/plc/ <-- works -->
> /bin/mkdir /mnt/plc/plc_data <-- works -->
> /bin/ln -s /mnt/plc_data /data1 <-- works -->
> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln,
> source exists -->
> </post>
>
> I don't understand why the ln to a directory succeeds but a ln to a
> script fails.
>
> BTW, Dr. Landman, I've attempted to use your build.pl but it seems to
> fail with:
> Can't stat
> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm .
> (my note: the path ends at RPMS) I swear I thought I saw a solution
> to this once but I can't find it again.
> Upon reinstallation with the file your tool created
> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda
> threw back the exception: Traceback (innermost last): file
> "/usr/bin/anaconda.real", line 633, in ?
intf.run(id, dispatch, > configFileData) File > "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line > 427 in run
    > ok save debug > > > TIA Reed Scarce > > _________________________________________________________________ > Tired of slow downloads? Compare online deals from your local > high-speed providers now. https://broadband.msn.com From jkreuzig at uci.edu Tue Dec 23 19:53:16 2003 From: jkreuzig at uci.edu (James Kreuziger) Date: Tue, 23 Dec 2003 19:53:16 -0800 (PST) Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> Message-ID: <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu> Thanks everybody for the info. I was aware of the fast-link issue; However, after enabling it, we still were unable to see the switch from the frontend. We had a laptop hooked up to the switch via serial and ethernet and was able to turn on the fast-link, and assign an IP address. After that, the web-based interface came up on the laptop. Still, no response on the switch from the frontend. So after great gnashing of teeth, and dozens of re-installs of the frontend, success! The problem? The extra nic card on the frontend. We had bought the frontend with a dual 1GB card and a single 100MB card. Whenever the single nic card is installed, the system always takes this as eth0. This is something that was staring us right in the face, so that's why it probably took so long to figure out. After 3 years of trying to find the money, we finally have our first 8 node cluster up! 
-Jim

*************************************************
Jim Kreuziger
jkreuzig at uci.edu
949-824-4474
*************************************************

From landman at scalableinformatics.com Tue Dec 23 20:23:35 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 23 Dec 2003 23:23:35 -0500
Subject: [Rocks-Discuss]Dell Power Connect 5224
In-Reply-To: <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu>
Message-ID: <3FE914C7.3050001@scalableinformatics.com>

Hi James:

One of the things I do the first time I boot up a new head node is to map
the ethernet ports. I take out all but one of the network wires, and
make sure there is real network traffic. A ping on the subnet is fine.
Then I tcpdump the network port. What is surprising to me is how many
times the assumed network eth0 is mapped differently. Then by hand,
after mapping the rest of the ports, I manually modify the
/etc/modules.conf file to reflect what I need.

Just a suggestion. Having been bitten enough, I find simple sanity
checks help reduce the size or dimensionality of the space of possible
problems. This makes these debugging sessions usually faster, and
allows for better characterization of the issue.

Joe

James Kreuziger wrote:

>Thanks everybody for the info. I was aware of the fast-link issue;
>however, after enabling it, we still were unable to see the switch
>from the frontend. We had a laptop hooked up to the switch via serial
>and ethernet and were able to turn on the fast-link and assign an
>IP address. After that, the web-based interface came up on the laptop.
>Still, no response on the switch from the frontend.
>
>So after great gnashing of teeth, and dozens of re-installs of the
>frontend, success! The problem? The extra NIC card on the frontend.
>We had bought the frontend with a dual 1 Gb card and a single 100 Mb
>card. Whenever the single NIC card is installed, the system always
>takes this as eth0. This is something that was staring us right in the
>face, so that's why it probably took so long to figure out.
>
>After 3 years of trying to find the money, we finally have our first
>8-node cluster up!
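Joe's port-mapping check can be partly automated by listing every interface with its hardware address and comparing against the MACs printed on the cards. This sketch uses the modern /sys interface so it is runnable anywhere; on a 2003-era system you would read the same information from ifconfig output:

```shell
# Print each network interface with its MAC address so you can tell
# which physical port the kernel actually named eth0.
for dev in /sys/class/net/*; do
    printf '%-8s %s\n' "$(basename "$dev")" "$(cat "$dev/address")"
done
```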
>
>-Jim
>
>*************************************************
>Jim Kreuziger
>jkreuzig at uci.edu
>949-824-4474
>*************************************************

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

From bruno at rocksclusters.org Tue Dec 23 21:26:08 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 23 Dec 2003 21:26:08 -0800
Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64
Message-ID: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org>
Version 3.1.0 (Matterhorn) of the Rocks cluster distribution is
released and now supports three processor families: Intel IA-32, Intel
Itanium Processor Family, and AMD Opteron. This is the released version
of the software that was used to build a fully-functioning 128-node
grid-enabled cluster in under 2 hours on opening night last month at
SC2003 in Phoenix, AZ. Rocks is developed by the Grid and Cluster
Computing Group at SDSC and by partners at the University of
California, Berkeley, Scalable Systems in Singapore, and individual
open-source software developers.

This is a co-release for x86 (Pentium, Athlon, and others), Itanium2
(IA-64) and Opteron (x86-64) based clusters. Software is freely
available for download to burn onto a bootable CD set for x86 and
x86-64, or a single DVD for Itanium2. Versions for all processor
families are available at http://www.rocksclusters.org/.

Introduced in version 3.0.0, this version enhances the "roll" mechanism
to enable users, communities and others to easily add on optional
software and configuration. These optional "Roll CDs" extend the system
by integrating seamlessly and automatically into the management and
packaging mechanisms used by the base software. For all intents and
purposes, rolls appear as if they are part of the original CD
distribution. A number of defined extension rolls are freely available
and include HPC, Sun Grid Engine, Grid (based on NMI), Java and Intel
Compiler. An important feature is that new rolls can be created or
updated independently of the core distribution. This fundamentally
enables science teams and communities to add on domain-specific
software packages, define a particular grid configuration, or simply
modify any of the default configuration or package settings.
New features in NPACI Rocks 3.1.0 include:
- Opteron support
- Sun Grid Engine as the default queuing system
- Upgraded Ganglia server and client, used for collecting and
  visualizing cluster-wide monitoring metrics
- Upgraded MPICH-GM and Myrinet GM 2.0 for the latest Rev D cards
- Rocks-developed 411 information system to replace Network
  Information Service (NIS)
- Updated SSH version 3.7.1 with no login delay
- Several optional software rolls, including:
  - NSF Middleware Initiative version R4 grid distribution
  - Java 2
  - Intel Compilers for x86 and ia64

Rocks 3.1.0 is derived from Red Hat's publicly available source
packages (SRPMS) used in portions of their Enterprise Linux 3.0 product
line. All SRPMs have been recompiled to enable redistribution. All
available updates for these packages have been pre-applied.
Rocks-specific software and standard cluster and grid community
software is then added to create a complete clustering toolkit. All
Rocks source code is available in a public CVS repository.

From angel at miami.edu Wed Dec 24 13:14:59 2003
From: angel at miami.edu (Angel Li)
Date: Wed, 24 Dec 2003 16:14:59 -0500
    Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64 In-Reply-To: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org> References: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org> Message-ID: <3FEA01D3.8080204@miami.edu> Hi, I currently have a cluster running Rocks 3.0 and I'm considering upgrading to 3.1. Now that SGE is the default batch queue, is maui working? Also, the Intel compiler roll is included. What licensing issues will I encounter? We currently have a license for version 7. Thanks, Angel From bruno at rocksclusters.org Wed Dec 24 14:14:46 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 24 Dec 2003 14:14:46 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 is released for x86, ia64 and x86-64 In-Reply-To: <3FEA01D3.8080204@miami.edu> References: <ADA8CDD0-35D1-11D8-B821-000A95C4E3B4@rocksclusters.org> <3FEA01D3.8080204@miami.edu> Message-ID: <94F9D6F6-365E-11D8-B821-000A95C4E3B4@rocksclusters.org> > I currently have a cluster running Rocks 3.0 and I'm considering > upgrading to 3.1. Now that SGE is the default batch queue, is maui > working? maui and pbs are currently not available in rocks 3.1, but it will be soon. maui and pbs will be included in its own roll -- that effort will be driven by roy dragseth from the University of Tromsø. > Also, the Intel compiler roll is included. What licensing issues will > I encounter? We currently have a license for version 7. i'm not sure how the licenses transfer between versions. after you bring up a frontend with the intel roll, the following link is available on the frontend's home page: http://www.intel.com/software/products/distributors/rock_cluster.htm after you purchase a license, you just need to copy the license into the appropriate directory and then start compiling. 
for fortran, the appropriate directory is: /opt/intel_fc_80/licenses and for C, the appropriate directory is: /opt/intel_cc_80/licenses also, the intel roll contains a pre-built MPICH environment -- it is
    found under /opt/mpich/intel. - gb From cdwan at mail.ahc.umn.edu Wed Dec 24 14:17:28 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Wed, 24 Dec 2003 16:17:28 -0600 (CST) Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <3FE914C7.3050001@scalableinformatics.com> References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu> <3FE914C7.3050001@scalableinformatics.com> Message-ID: <Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu> Once upon a time, I decided to install a third interface in a rocks head node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a data network. At boot time *everything* was broken. To make a long story less long, the system had remapped itself with the new gig card as eth0, and the other two shifted up by one. That was really close to "no fun at all." Happy holidays! I'm burning the new release right now! -C From michal at harddata.com Wed Dec 24 15:05:43 2003 From: michal at harddata.com (Michal Jaegermann) Date: Wed, 24 Dec 2003 16:05:43 -0700 Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu>; from cdwan@mail.ahc.umn.edu on Wed, Dec 24, 2003 at 04:17:28PM -0600 References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu> <3FE914C7.3050001@scalableinformatics.com> <Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu> Message-ID: <20031224160543.A25886@mail.harddata.com> On Wed, Dec 24, 2003 at 04:17:28PM -0600, Chris Dwan (CCGB) wrote: > > Once upon a time, I decided to install a third interface in a rocks head > node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a > data network. At boot time *everything* was broken. I still cannot understand why people insists on NOT using 'nameif' utility. 
All network interfaces can be named whichever way you want and they will not move regardless how many NICs you will add or remove as long as MACs are not changed. If you replace a card with a different one then /etc/mactab needs to be edited to reflect your new configuration. On clients nodes with an automatic reinstall this indeed is not practical but for your front end machine this is another story. It is indeed the case that default startup scripts from Red Hat 7.3 need some simple additions as interface (re)naming need to be done before NICs are brought up for the first time. In RH9 and FC1
    'nameif' will be used "automagically" if HWADDR variable is defined (and with a correct value). Of course if you have different drivers for different NICs, and they are loaded as modules, then names can be assigned by editing /etc/modules.conf Michal From bruno at rocksclusters.org Wed Dec 24 15:41:25 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 24 Dec 2003 15:41:25 -0800 Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <20031224160543.A25886@mail.harddata.com> References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu> <3FE914C7.3050001@scalableinformatics.com> <Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu> <20031224160543.A25886@mail.harddata.com> Message-ID: <AFFB44D8-366A-11D8-B821-000A95C4E3B4@rocksclusters.org> >> Once upon a time, I decided to install a third interface in a rocks >> head >> node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) >> for a >> data network. At boot time *everything* was broken. > > I still cannot understand why people insists on NOT using 'nameif' > utility. All network interfaces can be named whichever way you want > and they will not move regardless how many NICs you will add or > remove as long as MACs are not changed. If you replace a card with > a different one then /etc/mactab needs to be edited to reflect your > new configuration. On clients nodes with an automatic reinstall > this indeed is not practical but for your front end machine this is > another story. > > It is indeed the case that default startup scripts from Red Hat 7.3 > need some simple additions as interface (re)naming need to be done > before NICs are brought up for the first time. In RH9 and FC1 > 'nameif' will be used "automagically" if HWADDR variable is defined > (and with a correct value). 
michal, for this release, we looked at your suggestion of using nameif -- we did a quick prototype and it looks like it will be the right thing to do. we sketched out a design and found that the full solution will require many pieces (database changes, installer changes and the obvious XML file changes). we left this out of 3.1.0 but it is towards the top of our list for the next release. thanks for the suggestion of nameif -- it is suggestions like that which help us to define the direction of rocks. - gb
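[Editor's note: for readers unfamiliar with nameif, the following is a minimal sketch of the /etc/mactab file the discussion above refers to. The interface names and MAC addresses are made up for illustration; they are not from this thread.]

```
# /etc/mactab -- read by nameif: one "name MAC-address" pair per line,
# lines beginning with '#' are ignored
eth0   00:0A:95:11:22:33
eth1   00:0A:95:44:55:66
# third (gig) card pinned to its own name so it can never steal eth0
data0  00:50:04:77:88:99
```

Running nameif before the interfaces are brought up renames each interface to match its MAC address, so adding or removing NICs no longer reshuffles eth0/eth1 the way Chris Dwan described.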
    From landman at scalableinformatics.com Wed Dec 24 16:08:54 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 24 Dec 2003 19:08:54 -0500 Subject: [Rocks-Discuss]Dell Power Connect 5224 In-Reply-To: <20031224160543.A25886@mail.harddata.com> References: <Pine.GSO.4.44.0312191723530.2317-100000@poincare.emsl.pnl.gov> <Pine.GSO.4.58.0312231934470.12276@massun.ucicom.uci.edu> <3FE914C7.3050001@scalableinformatics.com> <Pine.GSO.4.58.0312241612450.25288@lenti.med.umn.edu> <20031224160543.A25886@mail.harddata.com> Message-ID: <3FEA2A96.3060405@scalableinformatics.com> Michal Jaegermann wrote: >On Wed, Dec 24, 2003 at 04:17:28PM -0600, Chris Dwan (CCGB) wrote: > > >>Once upon a time, I decided to install a third interface in a rocks head >>node (Dell SC1400, and a Syskonnect 98x Gig NIC for the interested) for a >>data network. At boot time *everything* was broken. >> >> > >I still cannot understand why people insists on NOT using 'nameif' >utility. All network interfaces can be named whichever way you want >and they will not move regardless how many NICs you will add or >remove as long as MACs are not changed. If you replace a card with >a different one then /etc/mactab needs to be edited to reflect your >new configuration. On clients nodes with an automatic reinstall >this indeed is not practical but for your front end machine this is >another story. > > Agreed, though as far as I can tell, nameif is not used in the /etc/init.d scripts. It is used by ifup, so you would have to set HWADDR on each interface in the /etc/sysconfig/.../ifcfg-eth* files (the ... refers to that RH9 and RHEL3 have moved where these things sit from what we were used to in RH7.x). Still need to map the interfaces though, to see which physical port corresponds to which device/mac address. With that in hand, you can set up the HWADDR or just swap cables. With the advent of the folks making exactly the right length cables (e.g. 
not giving any play, and placing them under tension while plugged in...) the cable swap doesnt work well for mapping on some systems. Moreover, on a fair number of systems I have played with, the BIOS is setup so that if they PXE boot, they are doing so from the address that the installed version of ROCKS would see as eth1. Annoying. -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615
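[Editor's note: as a concrete illustration of the HWADDR mechanism Joe describes, here is a hypothetical RH9/FC1-style interface config file. The MAC address and IP values are invented for the example and are not from this thread.]

```
# /etc/sysconfig/network-scripts/ifcfg-eth0
# ifup checks HWADDR against the device's actual MAC; if another
# interface currently holds this MAC, it is renamed to eth0 first
DEVICE=eth0
HWADDR=00:0A:95:11:22:33
BOOTPROTO=static
IPADDR=10.1.1.1
NETMASK=255.255.255.0
ONBOOT=yes
```

With one such file per interface, each MAC address is pinned to a fixed ethN name regardless of module load order or how many cards are installed.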
    From junkscarce at hotmail.com Fri Dec 26 15:35:57 2003 From: junkscarce at hotmail.com (Reed Scarce) Date: Fri, 26 Dec 2003 23:35:57 +0000 Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails Message-ID: <BAY1-F88Kxt8zPdqJL900052b1b@hotmail.com> The line: chkconfig --level 3 gpm on works great from the command line, not in extend-compute.xml. Thanks for the new tool though, always glad. The line above is in a block without <eval shell="bash"> tags. I'll keep trying and rtm. Is it possible this is a 2.6.2 issue? The live environment restricts me from using a more recent version. >From: "Mason J. Katz" <mjk at sdsc.edu> >To: "Reed Scarce" <junkscarce at hotmail.com> >CC: npaci-rocks-discussion at sdsc.edu >Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails >Date: Tue, 23 Dec 2003 16:35:13 -0800 > >"man chkconfig" > >If you use chkconfig you do not need to create the rc*.d/* files and they >are put in place for you. > > -mjk > >On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote: > >>Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml >>lies code like this commented code: >><post> >>/bin/mkdir /mnt/plc/ <-- works --> >>/bin/mkdir /mnt/plc/plc_data <-- works --> >>/bin/ln -s /mnt/plc_data /data1 <-- works --> >>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, >>source exists --> >></post> >> >>I don't understand why the ln to a directory succeeds but a ln to a script >>fails. >> >>BTW, Dr. Landman, I've attempted to use your build.pl but it seems to >>faill with: >>Can't stat >>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm . >>(my note: the path ends at RPMS) I swear I thought I saw a solution to >>this once but I can't find it again. >>Upon reinstallation with the file your tool created >>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda >>threw back the exception: Traceback (innermost last): file >>"/usr/bin/anaconda.real", line 633, in ? 
intf.run(id, dispatch, >>configFileData) File >>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427 in >>run >>ok save debug
    >> >> >>TIA Reed Scarce >> >>_________________________________________________________________ >>Tired of slow downloads? Compare online deals from your local high-speed >>providers now. https://broadband.msn.com > _________________________________________________________________ Worried about inbox overload? Get MSN Extra Storage now! http://join.msn.com/?PAGE=features/es From mjk at sdsc.edu Fri Dec 26 16:46:22 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Fri, 26 Dec 2003 16:46:22 -0800 Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails In-Reply-To: <BAY1-F88Kxt8zPdqJL900052b1b@hotmail.com> References: <BAY1-F88Kxt8zPdqJL900052b1b@hotmail.com> Message-ID: <1759D2DF-3806-11D8-98D0-000A95DA5638@sdsc.edu> Not sure if this answers your question. But.. The <eval></eval> blocks are for code to be run on the kickstart server (the one the generates the kickstart file). Code outside of the eval blocks is run on the kickstarting host. -mjk On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote: > The line: > > chkconfig --level 3 gpm on > > works great from the command line, not in extend-compute.xml. Thanks > for the new tool though, always glad. The line above is in a block > without <eval shell="bash"> tags. I'll keep trying and rtm. Is it > possible this is a 2.6.2 issue? The live environment restricts me > from using a more recent version. > > >> From: "Mason J. Katz" <mjk at sdsc.edu> >> To: "Reed Scarce" <junkscarce at hotmail.com> >> CC: npaci-rocks-discussion at sdsc.edu >> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation >> fails >> Date: Tue, 23 Dec 2003 16:35:13 -0800 >> >> "man chkconfig" >> >> If you use chkconfig you do not need to create the rc*.d/* files and >> they are put in place for you. >> >> -mjk >>
    >> On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote: >> >>> Within /export/home/install/profiles/2.3.2/site-nodes >>> extend-compute.xml lies code like this commented code: >>> <post> >>> /bin/mkdir /mnt/plc/ <-- works --> >>> /bin/mkdir /mnt/plc/plc_data <-- works --> >>> /bin/ln -s /mnt/plc_data /data1 <-- works --> >>> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, >>> source exists --> >>> </post> >>> >>> I don't understand why the ln to a directory succeeds but a ln to a >>> script fails. >>> >>> BTW, Dr. Landman, I've attempted to use your build.pl but it seems >>> to faill with: >>> Can't stat >>> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm . >>> (my note: the path ends at RPMS) I swear I thought I saw a solution >>> to this once but I can't find it again. >>> Upon reinstallation with the file your tool created >>> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) >>> anaconda threw back the exception: Traceback (innermost last): file >>> "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch, >>> configFileData) File >>> "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line >>> 427 in run >>> ok save debug >>> >>> >>> TIA Reed Scarce >>> >>> _________________________________________________________________ >>> Tired of slow downloads? Compare online deals from your local >>> high-speed providers now. https://broadband.msn.com >> > > _________________________________________________________________ > Worried about inbox overload? Get MSN Extra Storage now! 
> http://join.msn.com/?PAGE=features/es From apseyed at bu.edu Sat Dec 27 12:32:40 2003 From: apseyed at bu.edu (apseyed at bu.edu) Date: Sat, 27 Dec 2003 15:32:40 -0500 Subject: [Rocks-Discuss]Re: npaci-rocks-discussion digest, Vol 1 #663 - 2 msgs In-Reply-To: <200312272013.hBRKDbJ15227@postal.sdsc.edu> References: <200312272013.hBRKDbJ15227@postal.sdsc.edu> Message-ID: <1072557160.3fedec68d07d6@www.bu.edu> For what its worth, Why don't you try specifying the absolute path (/sbin/chkconfig) and setting debug flags and output file. (If you can confirm /sbin is in $PATH during the life of the script nevermind the first suggestion.) echo "got to chkconfig beginning" > /tmp/ks.log
    /sbin/chkconfig --level 3 gpm on echo "go to chkconfig end" >> /tmp/ks.log /sbin/chkconfig --list | grep gpm >> /tmp/ks.log -Patrice Quoting npaci-rocks-discussion-request at sdsc.edu: > Send npaci-rocks-discussion mailing list submissions to > npaci-rocks-discussion at sdsc.edu > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion > or, via email, send a message with subject or body 'help' to > npaci-rocks-discussion-request at sdsc.edu > > You can reach the person managing the list at > npaci-rocks-discussion-admin at sdsc.edu > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of npaci-rocks-discussion digest..." > > > Today's Topics: > > 1. Re: Extend-compute.xml issue, ln creation fails (Reed Scarce) > 2. Re: Extend-compute.xml issue, ln creation fails (Mason J. > Katz) > > --__--__-- > > Message: 1 > From: "Reed Scarce" <junkscarce at hotmail.com> > To: mjk at sdsc.edu > Cc: npaci-rocks-discussion at sdsc.edu > Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation > fails > Date: Fri, 26 Dec 2003 23:35:57 +0000 > > The line: > > chkconfig --level 3 gpm on > > works great from the command line, not in extend-compute.xml. Thanks > for > the new tool though, always glad. The line above is in a block > without > <eval shell="bash"> tags. I'll keep trying and rtm. Is it possible > this is > a 2.6.2 issue? The live environment restricts me from using a more > recent > version. > > > >From: "Mason J. Katz" <mjk at sdsc.edu> > >To: "Reed Scarce" <junkscarce at hotmail.com> > >CC: npaci-rocks-discussion at sdsc.edu > >Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation
    > fails > >Date: Tue, 23 Dec 2003 16:35:13 -0800 > > > >"man chkconfig" > > > >If you use chkconfig you do not need to create the rc*.d/* files and > they > >are put in place for you. > > > > -mjk > > > >On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote: > > > >>Within /export/home/install/profiles/2.3.2/site-nodes > extend-compute.xml > >>lies code like this commented code: > >><post> > >>/bin/mkdir /mnt/plc/ <-- works --> > >>/bin/mkdir /mnt/plc/plc_data <-- works --> > >>/bin/ln -s /mnt/plc_data /data1 <-- works --> > >>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to > ln, > >>source exists --> > >></post> > >> > >>I don't understand why the ln to a directory succeeds but a ln to a > script > >>fails. > >> > >>BTW, Dr. Landman, I've attempted to use your build.pl but it seems > to > >>faill with: > >>Can't stat > >>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm > . > >>(my note: the path ends at RPMS) I swear I thought I saw a > solution to > >>this once but I can't find it again. > >>Upon reinstallation with the file your tool created > >>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) > anaconda > >>threw back the exception: Traceback (innermost last): file > >>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch, > >>configFileData) File > >>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line > 427 in > >>run > >>ok save debug > >> > >> > >>TIA Reed Scarce > >> > >>_________________________________________________________________ > >>Tired of slow downloads? Compare online deals from your local > high-speed > >>providers now. https://broadband.msn.com > > > > _________________________________________________________________
    > Worried about inbox overload? Get MSN Extra Storage now! > http://join.msn.com/?PAGE=features/es > > > --__--__-- > > Message: 2 > Cc: npaci-rocks-discussion at sdsc.edu > From: "Mason J. Katz" <mjk at sdsc.edu> > Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation > fails > Date: Fri, 26 Dec 2003 16:46:22 -0800 > To: "Reed Scarce" <junkscarce at hotmail.com> > > Not sure if this answers your question. But.. > > The <eval></eval> blocks are for code to be run on the kickstart > server > (the one the generates the kickstart file). Code outside of the eval > > blocks is run on the kickstarting host. > > -mjk > > > On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote: > > > The line: > > > > chkconfig --level 3 gpm on > > > > works great from the command line, not in extend-compute.xml. > Thanks > > for the new tool though, always glad. The line above is in a block > > > without <eval shell="bash"> tags. I'll keep trying and rtm. Is it > > > possible this is a 2.6.2 issue? The live environment restricts me > > > from using a more recent version. > > > > > >> From: "Mason J. Katz" <mjk at sdsc.edu> > >> To: "Reed Scarce" <junkscarce at hotmail.com> > >> CC: npaci-rocks-discussion at sdsc.edu > >> Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation > > >> fails > >> Date: Tue, 23 Dec 2003 16:35:13 -0800 > >> > >> "man chkconfig" > >> > >> If you use chkconfig you do not need to create the rc*.d/* files > and > >> they are put in place for you. > >> > >> -mjk > >> > >> On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote:
    > >> > >>> Within /export/home/install/profiles/2.3.2/site-nodes > >>> extend-compute.xml lies code like this commented code: > >>> <post> > >>> /bin/mkdir /mnt/plc/ <-- works --> > >>> /bin/mkdir /mnt/plc/plc_data <-- works --> > >>> /bin/ln -s /mnt/plc_data /data1 <-- works --> > >>> /bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to > ln, > >>> source exists --> > >>> </post> > >>> > >>> I don't understand why the ln to a directory succeeds but a ln to > a > >>> script fails. > >>> > >>> BTW, Dr. Landman, I've attempted to use your build.pl but it > seems > >>> to faill with: > >>> Can't stat > >>> `/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm > . > >>> (my note: the path ends at RPMS) I swear I thought I saw a > solution > >>> to this once but I can't find it again. > >>> Upon reinstallation with the file your tool created > >>> (/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) > >>> anaconda threw back the exception: Traceback (innermost last): > file > >>> "/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch, > > >>> configFileData) File > >>> "/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", > line > >>> 427 in run > >>> ok save debug > >>> > >>> > >>> TIA Reed Scarce > >>> > >>> > _________________________________________________________________ > >>> Tired of slow downloads? Compare online deals from your local > >>> high-speed providers now. https://broadband.msn.com > >> > > > > _________________________________________________________________ > > Worried about inbox overload? Get MSN Extra Storage now! > > http://join.msn.com/?PAGE=features/es > > > > --__--__-- > > _______________________________________________ > npaci-rocks-discussion mailing list > npaci-rocks-discussion at sdsc.edu > http://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion >
    > > End of npaci-rocks-discussion Digest > From rocks_india at yahoo.co.in Sat Dec 27 20:20:40 2003 From: rocks_india at yahoo.co.in (=?iso-8859-1?q?Rocks=20India?=) Date: Sun, 28 Dec 2003 04:20:40 +0000 (GMT) Subject: [Rocks-Discuss]Rocks 3.0 Newbeeeeeeee Message-ID: <20031228042040.88990.qmail@web8301.mail.in.yahoo.com> Hello All, I am new to Rocks, i was able to download and install Rocks 3.0. I am not sure if Globus 3.0 gets installed during the installation process.I tried to use simple ca commands and get command not found error. Do i need to download Globus Tool Kit and install it or would it be installed along with rocks. Or can any one direct me to a site or give me steps that need to be taken after installing rocks what need to be done for manipulating globus Rocks-India ________________________________________________________________________ Yahoo! India Matrimony: Find your partner online. Go to http://yahoo.shaadi.com From bruno at rocksclusters.org Sat Dec 27 21:35:28 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Sat, 27 Dec 2003 21:35:28 -0800 Subject: [Rocks-Discuss]Rocks 3.0 Newbeeeeeeee In-Reply-To: <20031228042040.88990.qmail@web8301.mail.in.yahoo.com> References: <20031228042040.88990.qmail@web8301.mail.in.yahoo.com> Message-ID: <A4E388DE-38F7-11D8-9E96-000A95C4E3B4@rocksclusters.org> > I am new to Rocks, i was able to download > and > install Rocks 3.0. I am not sure if Globus 3.0 gets > installed during the installation process.I tried to > use simple ca commands and get command not found > error. > Do i need to download Globus Tool Kit and > install it or would it be installed along with rocks. > > Or can any one direct me to a site or give me steps > that > need to be taken after installing rocks what need to > be done for manipulating globus here's the steps, but it would require reinstalling your frontend: go to:
    http://www.rocksclusters.org/rocks-documentation/3.1.0/iso-images.html and download: Rocks Base, HPC Roll, SGE Roll and the Grid Roll then burn them all to CD. then follow the directions at: http://www.rocksclusters.org/rocks-documentation/3.1.0/install-frontend.html but, before you get started, you should consult this page too: http://rocks.npaci.edu/roll-documentation/grid/3.0/adding-the-roll.html at the end of the process, your frontend will be configured with globus. - gb From ramonjt at ucia.gov Mon Dec 29 09:08:45 2003 From: ramonjt at ucia.gov (ramonjt) Date: Mon, 29 Dec 2003 12:08:45 -0500 Subject: [Rocks-Discuss]Rocks 3.1.0 Message-ID: <3FF05F9D.F6A122F@ucia.gov> Folks, Which set of Rocks 3.1.0 downloads support Xeon Processors, "Pentium and Athlon" or "Itanium"? Thanks, Ramon From bruno at rocksclusters.org Mon Dec 29 09:31:56 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 29 Dec 2003 09:31:56 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 In-Reply-To: <3FF05F9D.F6A122F@ucia.gov> References: <3FF05F9D.F6A122F@ucia.gov> Message-ID: <E60E6664-3A24-11D8-9E96-000A95C4E3B4@rocksclusters.org> > Which set of Rocks 3.1.0 downloads support Xeon Processors, "Pentium > and Athlon" or "Itanium"? xeons are x86 processors -- so you want the ISO images found under the section: Software for x86 (Pentium and Athlon) - gb
    From landman at scalableinformatics.com Mon Dec 29 10:49:49 2003 From: landman at scalableinformatics.com (landman) Date: Mon, 29 Dec 2003 13:49:49 -0500 Subject: [Rocks-Discuss]3.1.0 surprises Message-ID: <20031229183225.M11961@scalableinformatics.com> Pulled the distro. Burned it after checking md5's. Ok. Booted/installed test cluster, completely vanilla, just defaults. SSH is too slow. Wow. 5-10 seconds to log in. Ok, now out at a customer site with the disks. Unhappily discovered that the following are missing: a) md (e.g. Software RAID): Just try to build one. Anaconda will happily let you do this ... though it will die in the formatting stages. Dropping into the shell (Alt-F2) and looking for the md module (lsmod) shows nothing. Insmod the md also doesn't do anything. Catting /proc/devices shows no md as a character or block device. If md is really not there anymore, it should be removed from anaconda, just like ... b) ext3. There is no ext3 available for the install. Also discovered how incredibly fragile anaconda is. In order to install, you have to wipe the disks. It will not install if there is an md (software raid) device, chosing instead to crap out after you have entered in all the information. To say that this is annoying is a slight understatement. This is an anaconda issue, not a ROCKS issue, though as a result of this issue, ROCKS is less functional than it could be. I also noted that there is no xfs option. This means that I will need to hack new kernels later on after the install. Moreover, I will also need to turn on the ext3 journaling features later on (post install). Hopefully 3.1.1 or 3.2 will fix some of these things. 
Joe -- Joseph Landman, Ph.D Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://scalableinformatics.com phone: +1 734 612 4615 From junkscarce at hotmail.com Mon Dec 29 15:15:52 2003 From: junkscarce at hotmail.com (Reed Scarce) Date: Mon, 29 Dec 2003 23:15:52 +0000 Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails Message-ID: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com>
    Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that work? I need to know the limitations of the distribution. As far as I can tell the commands are available (`which command` locates the commands fine) but they don't necessarily perform the job as expected. I had seen the `eval...` clairification in the archives. As it stands I plan to mkdir, ln and echo in the extend-c... but then run the heart of the customization (scripted) once the nodes are up. It just doesn't seem to be what was intended. As always, thanks for your help --Reed >From: "Mason J. Katz" <mjk at sdsc.edu> >To: "Reed Scarce" <junkscarce at hotmail.com> >CC: npaci-rocks-discussion at sdsc.edu >Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails >Date: Fri, 26 Dec 2003 16:46:22 -0800 > >Not sure if this answers your question. But.. > >The <eval></eval> blocks are for code to be run on the kickstart server >(the one the generates the kickstart file). Code outside of the eval >blocks is run on the kickstarting host. > > -mjk > > >On Dec 26, 2003, at 3:35 PM, Reed Scarce wrote: > >>The line: >> >>chkconfig --level 3 gpm on >> >>works great from the command line, not in extend-compute.xml. Thanks for >>the new tool though, always glad. The line above is in a block without >><eval shell="bash"> tags. I'll keep trying and rtm. Is it possible this >>is a 2.6.2 issue? The live environment restricts me from using a more >>recent version. >> >> >>>From: "Mason J. Katz" <mjk at sdsc.edu> >>>To: "Reed Scarce" <junkscarce at hotmail.com> >>>CC: npaci-rocks-discussion at sdsc.edu >>>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails >>>Date: Tue, 23 Dec 2003 16:35:13 -0800 >>> >>>"man chkconfig" >>> >>>If you use chkconfig you do not need to create the rc*.d/* files and they >>>are put in place for you. 
>>> >>> -mjk >>> >>>On Dec 23, 2003, at 3:43 PM, Reed Scarce wrote: >>> >>>>Within /export/home/install/profiles/2.3.2/site-nodes extend-compute.xml >>>>lies code like this commented code:
    >>>><post> >>>>/bin/mkdir /mnt/plc/ <-- works --> >>>>/bin/mkdir /mnt/plc/plc_data <-- works --> >>>>/bin/ln -s /mnt/plc_data /data1 <-- works --> >>>>/bin/ln /etc/rc.d/init.d/gpm /etc/rc.d/rc3.d/S15gpm <-- fails to ln, >>>>source exists --> >>>></post> >>>> >>>>I don't understand why the ln to a directory succeeds but a ln to a >>>>script fails. >>>> >>>>BTW, Dr. Landman, I've attempted to use your build.pl but it seems to >>>>faill with: >>>>Can't stat >>>>`/usr/src/redhat/RPMS/noarch//finishing-server-"3.0"-1.noarch.rpm . >>>>(my note: the path ends at RPMS) I swear I thought I saw a solution to >>>>this once but I can't find it again. >>>>Upon reinstallation with the file your tool created >>>>(/usr/src/RedHat/RPMS/i386/finishing-scripts-3.00-1.i386.rpm) anaconda >>>>threw back the exception: Traceback (innermost last): file >>>>"/usr/bin/anaconda.real", line 633, in ? intf.run(id, dispatch, >>>>configFileData) File >>>>"/usr/src/build.90289-i386/install//usr/lib/anaconda/text.py", line 427 >>>>in run >>>>ok save debug >>>> >>>> >>>>TIA Reed Scarce >>>> >>>>_________________________________________________________________ >>>>Tired of slow downloads? Compare online deals from your local high-speed >>>>providers now. https://broadband.msn.com >>> >> >>_________________________________________________________________ >>Worried about inbox overload? Get MSN Extra Storage now! >>http://join.msn.com/?PAGE=features/es > _________________________________________________________________ Make your home warm and cozy this winter with tips from MSN House & Home. 
http://special.msn.com/home/warmhome.armx From dlane at ap.stmarys.ca Mon Dec 29 15:44:23 2003 From: dlane at ap.stmarys.ca (Dave Lane) Date: Mon, 29 Dec 2003 19:44:23 -0400 Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails In-Reply-To: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com> Message-ID: <5.2.0.9.0.20031229194312.01ed0f40@ap.stmarys.ca> At 11:15 PM 12/29/2003 +0000, Reed Scarce wrote: >Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that work? Reed, Below is a script that worked fine for me (with 2.3.2). What it does should be fairly explanatory...Dave
    --->>> <post> <!-- Insert your post installation script here. This code will be executed on the destination node after the packages have been installed. Typically configuration files are built and services setup in this section. --> mv /usr/local /usr/local-old ln -s /home/local /usr/local ln -s /home/opt/intel /opt/intel ln -s /home/disc15 /disc15 mkdir /scratch/tmp chmod 1777 /scratch/tmp echo '#!/bin/bash' > /etc/init.d/wait echo 'sleep 60' >> /etc/init.d/wait chmod +x /etc/init.d/wait ln -s /etc/init.d/wait /etc/rc3.d/S11wait ln -s /etc/init.d/wait /etc/rc4.d/S11wait ln -s /etc/init.d/wait /etc/rc5.d/S11wait <eval sh="python"> <!-- This is python code that will be executed on the frontend node during kickstart generation. You may contact the database, make network queries, etc. These sections are generally used to help build more complex configuration files. The 'sh' attribute may point to any language interpreter such as "bash", "perl", "ruby", etc. --> </eval> </post> From bruno at rocksclusters.org Mon Dec 29 19:03:25 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 29 Dec 2003 19:03:25 -0800 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <20031229183225.M11961@scalableinformatics.com> References: <20031229183225.M11961@scalableinformatics.com> Message-ID: <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> > Pulled the distro. Burned it after checking md5's. Ok. > Booted/installed test > cluster, completely vanilla, just defaults. i'm assuming this is an x86 installation, yes? > SSH is too slow. Wow. 5-10 seconds to log in. that is not the case on our clusters. in fact, we tested this on all three architectures and all three are 'fast'. > Ok, now out at a customer site with the disks. > > Unhappily discovered that the following are missing: >
    > a) md (e.g. Software RAID): Just try to build one. Anaconda will > happily let > you do this ... though it will die in the formatting stages. Dropping > into the > shell (Alt-F2) and looking for the md module (lsmod) shows nothing. > Insmod the > md also doesn't do anything. Catting /proc/devices shows no md as a > character > or block device. > > If md is really not there anymore, it should be removed from anaconda, > just like ... > > b) ext3. There is no ext3 available for the install. > > Also discovered how incredibly fragile anaconda is. In order to > install, you > have to wipe the disks. It will not install if there is an md > (software raid) > device, chosing instead to crap out after you have entered in all the > information. To say that this is annoying is a slight understatement. > This is > an anaconda issue, not a ROCKS issue, though as a result of this > issue, ROCKS is > less functional than it could be. we'll look into the above two issues. > I also noted that there is no xfs option. This means that I will need > to hack > new kernels later on after the install. just curious, is xfs offered as an option on other redhat supported products? also (and i'm assuming this will be no consolation to you, but it may be to others), building a new kernel RPM is straightforward in rocks: http://www.rocksclusters.org/rocks-documentation/3.1.0/customization- kernel.html - gb From landman at scalableinformatics.com Mon Dec 29 19:44:16 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 29 Dec 2003 22:44:16 -0500 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> Message-ID: <1072755856.4432.15.camel@protein.scalableinformatics.com> On Mon, 2003-12-29 at 22:03, Greg Bruno wrote: > > Pulled the distro. Burned it after checking md5's. Ok. 
> > Booted/installed test > > cluster, completely vanilla, just defaults. >
> i'm assuming this is an x86 installation, yes? Yes. > > > SSH is too slow. Wow. 5-10 seconds to log in. > > that is not the case on our clusters. in fact, we tested this on all > three architectures and all three are 'fast'. 2 different clusters exhibited the same results. Fixed one by applying dnsmasq to one of them. > > > Ok, now out at a customer site with the disks. > > > > Unhappily discovered that the following are missing: > > > > a) md (e.g. Software RAID): Just try to build one. Anaconda will > > happily let > > you do this ... though it will die in the formatting stages. Dropping > > into the > > shell (Alt-F2) and looking for the md module (lsmod) shows nothing. > > Insmod the > > md also doesn't do anything. Catting /proc/devices shows no md as a > > character > > or block device. > > > > If md is really not there anymore, it should be removed from anaconda, > > just like ... > > > > b) ext3. There is no ext3 available for the install. > > > > Also discovered how incredibly fragile anaconda is. In order to > > install, you > > have to wipe the disks. It will not install if there is an md > > (software raid) > > device, chosing instead to crap out after you have entered in all the > > information. To say that this is annoying is a slight understatement. > > This is > > an anaconda issue, not a ROCKS issue, though as a result of this > > issue, ROCKS is > > less functional than it could be. > > we'll look into the above two issues. Thanks > > > I also noted that there is no xfs option. This means that I will need > > to hack > > new kernels later on after the install. > > just curious, is xfs offered as an option on other redhat supported > products? Nope, nor will Redhat likely do this in the near/mid term. This is fairly common knowledge. All the other major distros do offer XFS. I hope that the defense of the current state isn't that "Redhat doesn't
support it". I might have misunderstood you, but Redhat is almost completely disinterested in clusters, so Redhat supporting/not supporting it is really not relevant. Curiously, cAos which is doing some of the similar things ROCKS is doing in terms of recompiling packages sans Redhat trademarks, has XFS and a number of other useful things in there. Regardless, having ext2 or vfat as your only fs options simply is not reasonable, as neither of these is really appropriate for very large disks, or big file systems. > > also (and i'm assuming this will be no consolation to you, but it may > be to others), building a new kernel RPM is straightforward in rocks: > > http://www.rocksclusters.org/rocks-documentation/3.1.0/customization- > kernel.html I had been planning to use a similar approach to this. I was/am simply quite surprised that the two options for ROCKS file systems are really not very good, and the good choices are unavailable. In all fairness this is more likely a constraint of anaconda than of ROCKS. I fixed the ext2/ext3 by a reboot after a quick tune2fs session and some fixup of the /etc/fstab. I have to say that I get less and less impressed with anaconda as time goes on. I fixed the partitioning problem (anaconda dies when it runs in an md'ed set of partitions) by wiping the disk and using knoppix to fdisk the disks. Autopartitioning is not an option, as the default choices are not all that good (another anaconda-ism).
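The ext2-to-ext3 fix described above is usually just adding a journal and updating /etc/fstab. A minimal sketch (the device name is illustrative, and the fstab edit is demonstrated on a scratch copy so it is safe to run anywhere):

```shell
# tune2fs needs a real block device, so that step is shown commented out:
#   tune2fs -j /dev/hda2     # adds a journal; the fs becomes mountable as ext3
# The matching /etc/fstab change, sketched on a scratch copy:
printf '/dev/hda2 / ext2 defaults 1 1\n' > /tmp/fstab.demo
sed -i 's/ ext2 / ext3 /' /tmp/fstab.demo
cat /tmp/fstab.demo    # entry now reads ext3; a reboot or remount picks it up
```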
> > - gb From cdwan at mail.ahc.umn.edu Mon Dec 29 20:58:20 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Mon, 29 Dec 2003 22:58:20 -0600 (CST) Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <1072755856.4432.15.camel@protein.scalableinformatics.com> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> Message-ID: <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> I also encountered the Software RAID problem today. It made upgrading an existing ROCKS cluster a little tricky. Another behavior I noticed was that the CDs were not ejecting as the node installs finished. It was managable, but required watching to prevent the
endless reinstall cycle. -Chris Dwan From bruno at rocksclusters.org Mon Dec 29 21:48:22 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 29 Dec 2003 21:48:22 -0800 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> Message-ID: <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> > Another behavior I noticed was that the CDs were not ejecting as the > node > installs finished. It was managable, but required watching to prevent > the > endless reinstall cycle. actually, it isn't a problem as the last CD in the frontend will be a roll and rolls are not bootable. - gb From cdwan at mail.ahc.umn.edu Mon Dec 29 21:51:13 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Mon, 29 Dec 2003 23:51:13 -0600 (CST) Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> Message-ID: <Pine.GSO.4.58.0312292350001.4644@lenti.med.umn.edu> > > Another behavior I noticed was that the CDs were not ejecting as the > > node > > installs finished. It was managable, but required watching to prevent > > the > > endless reinstall cycle. > > actually, it isn't a problem as the last CD in the frontend will be a > roll and rolls are not bootable. You're right about the frontend. It was the compute nodes where it gave me trouble. Roll disks never go in those.
-Chris Dwan From landman at scalableinformatics.com Mon Dec 29 22:03:06 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 30 Dec 2003 01:03:06 -0500
Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> Message-ID: <1072764186.4469.16.camel@protein.scalableinformatics.com> What I had noticed is that some CD hardware does not eject when prompting for swapping in the roll. I swapped hardware and that fixed it. Rather odd. Seen this in 3 different systems. Worked ok with previous ROCKS. Is it possible to do something like a frontend askmethod akin to the "linux askmethod" and specifically have the ISO's online in a directory somewhere? Just curious... I find it interesting that 10 years after swapping floppies for OS installs, I am now swapping CDs... There is irony here somewhere. On Tue, 2003-12-30 at 00:48, Greg Bruno wrote: > > Another behavior I noticed was that the CDs were not ejecting as the > > node > > installs finished. It was managable, but required watching to prevent > > the > > endless reinstall cycle. > > actually, it isn't a problem as the last CD in the frontend will be a > roll and rolls are not bootable.
> > - gb From bruno at rocksclusters.org Mon Dec 29 22:28:45 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 29 Dec 2003 22:28:45 -0800 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <1072764186.4469.16.camel@protein.scalableinformatics.com> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072764186.4469.16.camel@protein.scalableinformatics.com> Message-ID: <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org> > Is it possible to do something like a > > frontend askmethod > > akin to the "linux askmethod" and specifically have the ISO's online in > a directory somewhere? Just curious... the ability to install frontends remotely is at the top of our priority list for the next release.
> I find it interesting that 10 > years after swapping floppies for OS installs, I am now swapping CDs... > There is irony here somewhere. sorry, i'm going to have to evangelize rolls a bit. joe, do you not have just a bit of appreciation for rolls and what is going on under the sheets? we now have a formal way for you, that's right you, to augment the installation of a cluster. you get to programmatically interact with the installer at virtually any level. you get to tell the installer what bits you want it to lay down and how to configure them. and this is done completely independently of the core. the core has no idea of your bits, yet, it installs it and configures it to your specification. for you, this could be having the 'scalable informatics' roll that contains all your RPMS and XML configuration files. this ISO image could be completely proprietary, yet, the installer installs it. you could ship your roll worldwide and every one of your customers would, within 2 hours, have a scalable informatics cluster online running the applications you sold them. and, you know it would be running because you embedded the correct configuration into the roll. or, perhaps rolls work so smoothly, it just looks like CD swapping.
:-) - gb From landman at scalableinformatics.com Mon Dec 29 22:50:30 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 30 Dec 2003 01:50:30 -0500 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072764186.4469.16.camel@protein.scalableinformatics.com> <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org> Message-ID: <1072767030.4463.57.camel@protein.scalableinformatics.com> On Tue, 2003-12-30 at 01:28, Greg Bruno wrote: > > There is irony here somewhere. > > sorry, i'm going to have to evangelize rolls a bit. > > joe, do you not have just a bit of appreciation for rolls and what is > going on under the sheets? we now have a formal way for you, that's > right you, to augment the installation of a cluster. you get to > programmatically interact with the installer at virtually any level. > you get to tell the installer what bits you want it to lay down and how > to configure them. and this is done completely independently of the > core. the core has no idea of your bits, yet, it installs it and > configures it to your specification.
Actually I do have a pretty good appreciation for them. I see that they are a different way of solving the problems I have been solving for a while using "other methods" (http://scalableinformatics.com/downloads/finishing/finishing-v3.1.0.tar.gz). What I don't see is how to build them (yes, I did see the "source" messages, and "cvs", ...). The major issue for me is going to be anaconda, all its joy and bugs, and what directions its use forces ROCKS to follow (vis-a-vis file systems, etc). > > for you, this could be having the 'scalable informatics' roll that > contains all your RPMS and XML configuration files. this ISO image > could be completely proprietary, yet, the installer installs it. you > could ship your roll worldwide and every one of your customers would, > within 2 hours, have a scalable informatics cluster online running the > applications you sold them. and, you know it would be running because > you embedded the correct configuration into the roll. This is a nice vision, though it is unfortunately a vision. The customer would have to re-install the cluster head node when a new version of the bits comes out. Right? This is simply not tenable for a production cycle facility that needs to upgrade a package. Please let me know if my understanding is incorrect, I would be quite happy to hear this. The "other method" that I developed doesn't have this as a problem. Just re-install the compute nodes, and load the RPM on the head nodes. In fact I built some tools which simplify both the "other method" and the ROCKS method. As I have to worry about multiple different cluster distros (not just ROCKS, sorry, customers get what they need/want), I have to worry about interfacing with that distro. So I have some tools (the auto-build scripts) which simplify adding/removing packages into the extend-compute.xml.
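As a rough illustration of the kind of auto-build step described above (hypothetical paths and package names, not the actual scalableinformatics tool), a script that emits an extend-compute.xml adding an RPM can be as small as a heredoc:

```shell
# Generate a minimal extend-compute.xml; 'example-app' and the /tmp path are
# placeholders (Rocks itself reads these from the site-profiles tree).
mkdir -p /tmp/site-profiles/nodes
cat > /tmp/site-profiles/nodes/extend-compute.xml <<'EOF'
<?xml version="1.0" standalone="no"?>
<kickstart>
  <package>example-app</package>
  <post>
    /bin/ln -sf /share/apps /apps
  </post>
</kickstart>
EOF
```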
What I am hoping for rolls are two things: 1) insertable/removable from a live cluster without forcing a re-install of the head node (compute nodes, thats fine, not the head nodes) 2) simple documentation on how to build. If they are really quite simple, I see no reason I could not take the same tool I use to automate the building of installable RPMS for the other method actually emit a ROCKS roll. But I need to know how to do this. I am not sure I have sufficient time to "read the source, Luke" for this. I would be happy to do this given time, and customer demand/need. The other method had that, hence its development. > > or, perhaps rolls work so smoothly, it just looks like CD swapping. :-) My point was that after inserting the SGE roll, I had to get up from the console, walk over to the unit, swap in the next roll, iterate.... Felt like CD swapping to me. Rolls wont solve other problems which are anaconda specific (file systems, partitioning, formatting, RAID, network detection, etc). As there are multiple similar RHEL de-redhatifying efforts, some of which are drastically improving the installation process (by not using anaconda), are you folks looking to move away from anaconda any time
    soon? > > - gb -- From bruno at rocksclusters.org Mon Dec 29 23:45:52 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Mon, 29 Dec 2003 23:45:52 -0800 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <1072767030.4463.57.camel@protein.scalableinformatics.com> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072764186.4469.16.camel@protein.scalableinformatics.com> <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072767030.4463.57.camel@protein.scalableinformatics.com> Message-ID: <31636F66-3A9C-11D8-9E96-000A95C4E3B4@rocksclusters.org> > This is a nice vision, though it is unfortunately a vision. The > customer would have to re-install the cluster head node when a new > version of the bits comes out. Right? This is simply not tenable for a > production cycle facility that needs to upgrade a package. Please let > me know if my understanding is incorrect, I would be quite happy to > hear > this. we've talked about this on the list and we've talked with you about this in person. you know the above statement is true. you also know it is a future direction for rolls. > What I am hoping for rolls are two things: 1) insertable/removable from > a live cluster without forcing a re-install of the head node (compute > nodes, thats fine, not the head nodes) 2) simple documentation on how > to > build. If they are really quite simple, I see no reason I could not > take the same tool I use to automate the building of installable RPMS > for the other method actually emit a ROCKS roll. But I need to know > how > to do this. I am not sure I have sufficient time to "read the source, > Luke" for this. I would be happy to do this given time, and customer > demand/need. The other method had that, hence its development. 
a roll developer's guide is in progress. and, as stated above, adding rolls to a live frontend is on our roadmap. > Rolls wont solve other problems which are anaconda specific (file > systems, partitioning, formatting, RAID, network detection, etc). not true. if you wish to get deeply involved with the red hat installer, you can develop a 'patch' roll that will change the installer to do as you wish. > As > there are multiple similar RHEL de-redhatifying efforts, some of which
> are drastically improving the installation process (by not using > anaconda), are you folks looking to move away from anaconda any time > soon? please educate us -- where can we download these installers and find the developer guides that describe how to interact with the installer. as for moving away from anaconda, i don't think that will happen anytime soon. anaconda has served us well. we have all had issues with the installer, but i would rather work with anaconda rather than reinvent it. the boys and girls at redhat have a vested interest in detecting and configuring the latest hardware and i plan on leveraging that. of the issues you mention above, the only one we don't know how to control yet is file system selection (but, we will look into it per your earlier request). we already manipulate anaconda to partition and format the drives to our specifications, and we have ideas on how to handle RAID and network naming (which is what i think you mean by network detection). - gb From landman at scalableinformatics.com Tue Dec 30 00:55:37 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 30 Dec 2003 03:55:37 -0500 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <31636F66-3A9C-11D8-9E96-000A95C4E3B4@rocksclusters.org> References: <20031229183225.M11961@scalableinformatics.com> <BC750EE0-3A74-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072755856.4432.15.camel@protein.scalableinformatics.com> <Pine.GSO.4.58.0312292255360.2986@lenti.med.umn.edu> <C6F470F2-3A8B-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072764186.4469.16.camel@protein.scalableinformatics.com> <6BB0542E-3A91-11D8-9E96-000A95C4E3B4@rocksclusters.org> <1072767030.4463.57.camel@protein.scalableinformatics.com> <31636F66-3A9C-11D8-9E96-000A95C4E3B4@rocksclusters.org> Message-ID: <1072774537.4463.131.camel@protein.scalableinformatics.com> On Tue, 2003-12-30 at 02:45, Greg Bruno wrote: > > This is a nice vision, though it is unfortunately a vision.
The > > customer would have to re-install the cluster head node when a new > > version of the bits comes out. Right? This is simply not tenable for a > > production cycle facility that needs to upgrade a package. Please let > > me know if my understanding is incorrect, I would be quite happy to > > hear > > this. > > we've talked about this on the list and we've talked with you about > this in person. you know the above statement is true. you also know it > is a future direction for rolls. I was simply responding to the evangelism which seemed to imply the functionality existed today. It doesn't, and we both agree that it is necessary. Although the vision will provide innumerable benefits ... ROCKS is not there yet, and won't be for a while.
That's ok though, as I have a reasonable workaround for some of these issues. And when I can insert and delete rolls live into a cluster, I'll modify my tools to emit rolls. Until then, it is as you said, a vision for the future. [...] > a roll developer's guide is in progress. and, as stated above, adding > rolls to a live frontend is on our roadmap. Adding and removing are needed as we have discussed. > > > Rolls wont solve other problems which are anaconda specific (file > > systems, partitioning, formatting, RAID, network detection, etc). > > not true. if you wish to get deeply involved with the red hat > installer, you can develop a 'patch' roll that will change the > installer to do as you wish. I guess I am at a loss to understand what it is you are doing then. If you are telling me I can hack around anaconda to my heart's content, why do you tell me later on that ROCKS is deeply wedded to anaconda and will not change soon? I will assume I am missing something here. Can I replace anaconda? This is what I think you are saying. If you are instead saying, no don't replace, just hack it, I am not sure I want to do that. It is a very large and complex beast, with one system doing the job of many. Jack of all trades. More than half of the pain I have experienced deploying ROCKS is directly attributable to anaconda. I would like to work around it. If I can completely replace it under ROCKS this could be of interest. If I cannot, and ROCKS will always remain closely tied to RedHat specific technology (e.g. anaconda), that is also important to know. > > > As > > there are multiple similar RHEL de-redhatifying efforts, some of which > > are drastically improving the installation process (by not using > > anaconda), are you folks looking to move away from anaconda any time > > soon? > > please educate us -- where can we download these installers and find > the developer guides that describe how to interact with the installer.
If you are serious about this, I would be happy to help you find more development info and help make introductions to some of the people doing this stuff. If you are not serious about this, thats fine too. > as for moving away from anaconda, i don't think that will happen > anytime soon. anaconda has served us well. we have all had issues with > the installer, but i would rather work with anaconda rather than > reinvent it. the boys and girls at redhat have a vested interest in > detecting and configuring the latest hardware and i plan on leveraging > that. Knoppix makes good use of the anaconda detection routines without using anaconda. You do not need anaconda in its entirety for the detection routines.
While Redhat has a vested interest in making sure it detects hardware well, the software that does its installation has been getting more and more fragile compared to other installation systems. Simple failures of one item or the other in the SUSE YAST tool, or the Mandrake installer, or for that matter, most of the non-anaconda based installers do not force you to start over from the beginning. Stack traces are not given, and you are not asked to debug an arcane and complex python program from a highly limited command window. You are brought back to a well known and well defined state, and you have a finite and non zero chance of recovering from the failure. This is different than the anaconda experience, where the slightest hiccup, which would be trivially correctable given the opportunity, results in a complete failure of the process. This has resulted in our discovery of the RH9/RHEL fragility and sensitivity (and lack of ability) to software raid, partitioning, and related. This has wasted many hours of our collective time, and the inability to use the upgrade option for those of us with software RAID systems. As ROCKS depends critically upon this bit of technology that you indicate later on is so important, ROCKS happens to share in its pitfalls, even though these are not ROCKS problems. I am not sure if you understand how much time I have to spend explaining to customers and end users why what they are seeing are not ROCKS problems but Redhat artifacts. Part of the reason I am raising this issue in this forum is that I have spent altogether too much time trying to explain this to various users. > of the issues you mention above, the only one we don't know how to > control yet is file system selection (but, we will look into it per > your earlier request).
> we already manipulate anaconda to partition and > format the drives to our specifications, and we have ideas on how to > handle RAID and network naming (which is what i think you mean by > network detection). Network detection is a) getting the right network driver config, 1) by detection, 2) from floppy/usb/whatever; b) getting the correct network interface ordering (what you call naming). The point you (somewhat whimsically) made was that I could create Scalable Informatics rolls and ship them around the world for people to use in 2 hours. Great. Good vision, and that is something like what I am looking at. I have that now with my tools, but I can always expand their functionality. Now the problem is, if after shipping out my roll, when my end users install it, anaconda barfs in some new and exciting manner (has happened already with the finishing scripts, and I have worked hard to try to figure out what is broken in anaconda to work around its bugs), who are the customers going to blame? My experience thus far is that ROCKS is taking more than its fair share of heat over bugs that it has nothing to do with.
From fds at sdsc.edu Tue Dec 30 05:53:48 2003 From: fds at sdsc.edu (fds at sdsc.edu) Date: Tue, 30 Dec 2003 05:53:48 -0800 (PST) Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails In-Reply-To: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com> References: <BAY1-F661p8yTXBgtnm0000a4dd@hotmail.com> Message-ID: <1291.194.125.171.53.1072792428.squirrel@uhura.sdsc.edu> Code in the <post> section of an xml file (extend-compute or otherwise) can be almost anything. When the script is run, the environment is not as full as usual, which is why we always recommend specifying the full path to commands. As you saw, /bin and /usr/bin are in the path, so certain things like "which sed" will work, for example. Remember that everything in the eval tags gets run at kickstart generation time (on the frontend). Everything else (the naked commands in the post section) are run by the node being installed. We do intend for the heart of the customization to be performed at kickstart time. I would be surprised if you had to postpone many tasks until the node was up, although this does happen occasionally. The globus and condor post configuration contain tasks that cannot be done at install time. Send us the scripts in question and we will take a look. -Federico > Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that > work? > I need to know the limitations of the distribution. As far as I can tell > the commands are available (`which command` locates the commands fine) but > they don't necessarily perform the job as expected. I had seen the > `eval...` clarification in the archives. > > As it stands I plan to mkdir, ln and echo in the extend-c... but then run > the heart of the customization (scripted) once the nodes are up. It just > doesn't seem to be what was intended.
> > As always, thanks for your help > --Reed > From purikk at hotmail.com Tue Dec 30 06:03:02 2003 From: purikk at hotmail.com (Purushotham Komaravolu) Date: Tue, 30 Dec 2003 09:03:02 -0500 Subject: [Rocks-Discuss]Licensing References: <200312300711.hBU7BeJ14002@postal.sdsc.edu> Message-ID: <BAY1-DAV14HJL2WZcXm0000fc27@hotmail.com> Hi All, I would like to know the list of the components that have to be
licensed, when we install ROCKS as a commercial solution. Thanks Happy Holidays Puru From doug at seismo.berkeley.edu Tue Dec 30 10:53:36 2003 From: doug at seismo.berkeley.edu (Doug Neuhauser) Date: Tue, 30 Dec 2003 10:53:36 -0800 (PST) Subject: [Rocks-Discuss]Rocks 3.1.0 install problems Message-ID: <200312301853.hBUIragp015469@perry.geo.berkeley.edu> I am having a problem upgrading Rocks 2.3.2 to 3.1.0. Both my head node and compute nodes are dual XEON 2.4 GHz boxes. We burned the CDs from the following images: rocks-base-3.1.0.i386.iso roll-hpc-3.1.0-0.i386.iso roll-grid-3.1.0-0.any.iso roll-intel-3.1.0-0.any.iso roll-sge-3.1.0-0.any.iso I verified the md5s both on the downloaded images from the rocks web site and the md5s on the burned cds. They are fine. I have run the upgrade several times -- at least once with all of the rolls, and once with just the rocks base and hpc roll. The head node installs with no problem using the command frontend upgrade I can login and run insert-ethers, telling it to look for compute nodes. When I power on a compute node, it boots grub, selects the only kernel on its local disk Rocks Reinstall and runs through the /sbin/loader. The blue screen comes up, the compute node requests and receives a dynamic IP address from the head node, but then within a few seconds aborts with the messages: install exited abnormally - received signal 11 sending termination signals ... done sending kill signals ... done disabling swap ... unmounting filesystems ... /proc/bus/usb done /proc done /dev/pts done You may safely reboot your system It appears that the "Rocks Reinstall" kernel on the disk is not compatible with Rocks 3.1.0. When I changed the compute node boot order to perform a PXE boot before the hard disks, it properly downloads the 3.1.0 kernel from the head node, reformats the disk, and installs 3.1.0 properly. I have to catch it in the reboot, and change the boot order to use the disk before PXE, or I get into an infinite loop.
Is there any better way to address this problem? The procedure of:
  set PXE boot first
  boot from net, install rocks 3.1.0 on disk
  reboot
  catch node during reboot, change boot order to floppy,disk,net
reboot for each node is tedious. Did I do something wrong in how I shut my 2.3.2 cluster down before the upgrade? If so, some notes about this in the install instructions would be useful. - Doug N ------------------------------------------------------------------------ Doug Neuhauser University of California, Berkeley doug at seismo.berkeley.edu Berkeley Seismological Laboratory Phone: 510-642-0931 215 McCone Hall # 4760 Fax: 510-643-5811 Berkeley, CA 94720-4760 From bruno at rocksclusters.org Tue Dec 30 11:29:14 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Tue, 30 Dec 2003 11:29:14 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <200312301853.hBUIragp015469@perry.geo.berkeley.edu> References: <200312301853.hBUIragp015469@perry.geo.berkeley.edu> Message-ID: <73E3933E-3AFE-11D8-9E96-000A95C4E3B4@rocksclusters.org> On Dec 30, 2003, at 10:53 AM, Doug Neuhauser wrote: > I am having a problem upgrading Rocks 2.3.2 to 3.1.0. > Both my head node and compute nodes are dual XEON 2.4 GHz boxes. > > We burned the CDs from the following images: > rocks-base-3.1.0.i386.iso > roll-hpc-3.1.0-0.i386.iso > roll-grid-3.1.0-0.any.iso > roll-intel-3.1.0-0.any.iso > roll-sge-3.1.0-0.any.iso > I verified the md5s both on the downloaded images from the rocks > web site and the md5s on the burned cds. They are fine. > I have run the upgrade several times -- at least once with all of the > rolls, and once with just the rocks base and hpc roll. > > The head node installs with no problem using the command > frontend upgrade > I can login and run insert-ethers, telling it to look for compute > nodes. > > When I power on a compute node, it boots grub, selects the only > kernel on its local disk > Rocks Reinstall > and runs through the /sbin/loader. 
> The blue screen comes up, the compute node requests and receives a > dynamic IP address from the head node, but then within a few seconds > aborts with the messages: > install exited abnormally - received signal 11 > sending termination signals ... done > sending kill signals ... done > disabling swap ... > unmounting filesystems ... > /proc/bus/usb done
> /proc done > /dev/pts done > You may safely reboot your system > > It appears that the "Rocks Reinstall" kernel on the disk is not > compatible > with Rocks 3.1.0. When I changed the compute node boot order to > perform > a PXE boot before the hard disks, it properly downloads the 3.1.0 > kernel > from the head node, reformats the disk, and installs 3.1.0 properly. > I have to catch it in the reboot, and change the boot order to use the > disk before PXE, or I get into an infinite loop. > > Is there any better way to address this problem? The procedure of: > set PXE boot first > boot from net, install rocks 3.1.0 on disk > reboot > catch node during reboot, change boot order to floppy,disk,net > reboot > for each node is tedious. > > Did I do something wrong in how I shut my 2.3.2 cluster down before the > upgrade? If so, some notes about this in the install instructions > would > be useful. you're right, the 2.3.2 installer (anaconda from redhat's version 7.3) is not compatible with the installer on rocks 3.1 (anaconda from redhat's enterprise linux 3.0). you will have to reinstall your cluster in one of two ways: 1) if your compute nodes support PXE that is enabled from the keyboard -- that is, when you boot the node, in BIOS you see a message that says "Press F12 for Network Boot (PXE)". if your nodes have that, then you'll have to boot the nodes, one by one and, when you see the message, press the F12 key, then move to the next node. 2) use the rocks base CD to boot each compute node. when insert-ethers reports that it discovered the node, take the CD out and put it in the next compute node. but, if your compute nodes were initially installed with PXE, the fastest way to upgrade the compute nodes is to simply turn all the compute nodes off, upgrade the frontend, run insert-ethers, then turn the compute nodes on one by one. the compute nodes should be set for PXE boot which will pull the installer from the frontend and therefore get the updated installer. 
as you state above, we need to document this. thanks for the bug report. - gb
From doug at seismo.berkeley.edu Tue Dec 30 11:45:59 2003 From: doug at seismo.berkeley.edu (Doug Neuhauser) Date: Tue, 30 Dec 2003 11:45:59 -0800 (PST) Subject: [Rocks-Discuss]Rocks 3.1.0 install problems Message-ID: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> Greg, 1. I don't have cdroms on my compute nodes, only floppy. :( 2. My boot order on the compute nodes is normally: floppy, disk, PXE 3. I don't have a hot-key override to force PXE boot. I have to change the BIOS boot order to enable PXE boot. > but, if your compute nodes were initially installed with PXE, the > fastest way to upgrade the compute nodes is to simply turn all the > compute nodes off, upgrade the frontend, run insert-ethers, then turn > the compute nodes on one by one. the compute nodes should be set for > PXE boot which will pull the installer from the frontend and therefore > get the updated installer. I don't understand this. I can't leave the compute nodes with PXE boot first, or it will create an endless loop. The compute node will boot via PXE, install rocks 3.1.0, and then reboot via PXE and repeat the process ad nauseam. Can I use the old floppy boot image found at: ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img to force a network boot? The 3.1.0 online manual has a link in the section 1.3 Install your Compute Nodes to ftp://www.rocksclusters.org/pub/rocks/bootnet.img but this does not exist. - Doug N ------------------------------------------------------------------------ Doug Neuhauser University of California, Berkeley doug at seismo.berkeley.edu Berkeley Seismological Laboratory Phone: 510-642-0931 215 McCone Hall # 4760 Fax: 510-643-5811 Berkeley, CA 94720-4760 From junkscarce at hotmail.com Tue Dec 30 11:57:16 2003 From: junkscarce at hotmail.com (Reed Scarce) Date: Tue, 30 Dec 2003 19:57:16 +0000 Subject: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails Message-ID: <BAY1-F39DSuCcN0o41B0005872e@hotmail.com> I tested your echo ... wait and ln wait... 
S11wait lines. They worked perfectly. Then I tried the same with gpm and left wait in the script. Wait worked as before, and gpm didn't work - like before. I've given up on doing anything very fancy and have started to make a script to run the first time it boots, with hand removal. Thanks for the perspective, --Reed
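Reed's fallback, a script that runs the first time a node boots and is then removed by hand, can be made self-removing so no hand cleanup is needed. A minimal sketch (the file names are illustrative and it writes to the current directory so it can be tried safely; on a real compute node the script would live in /etc/init.d with rc symlinks, as in Dave Lane's example quoted below):

```shell
#!/bin/sh
# Sketch: a "first boot" script that deletes itself after its only run.
cat > firstboot.sh <<'EOF'
#!/bin/sh
# one-time setup goes here (symlinks, gpm startup, etc.)
echo "first-boot tasks ran" > firstboot.log
# remove ourselves so we never run again
# (on a real node, also remove the /etc/rc?.d symlinks)
rm -f "$0"
EOF
chmod +x firstboot.sh
./firstboot.sh                              # first run: works, then unlinks itself
[ -f firstboot.sh ] || echo "script removed itself"
```

Running it prints "script removed itself", confirming the second boot would find nothing to execute.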
>From: Dave Lane <dlane at ap.stmarys.ca>
>To: "Reed Scarce" <junkscarce at hotmail.com>
>CC: npaci-rocks-discussion at sdsc.edu
>Subject: Re: [Rocks-Discuss]Extend-compute.xml issue, ln creation fails
>Date: Mon, 29 Dec 2003 19:44:23 -0400
>
>At 11:15 PM 12/29/2003 +0000, Reed Scarce wrote:
>>Are there any examples of Rocks 2.3.2 extend-compute.xml scripts that
>>work?
>
>Reed,
>
>Below is a script that worked fine for me (with 2.3.2). What it does should
>be fairly explanatory...Dave
>
>--->>>
>
><post>
> <!-- Insert your post installation script here. This
> code will be executed on the destination node after the
> packages have been installed. Typically configuration files
> are built and services setup in this section. -->
>
>mv /usr/local /usr/local-old
>ln -s /home/local /usr/local
>ln -s /home/opt/intel /opt/intel
>ln -s /home/disc15 /disc15
>mkdir /scratch/tmp
>chmod 1777 /scratch/tmp
>echo '#!/bin/bash' > /etc/init.d/wait
>echo 'sleep 60' >> /etc/init.d/wait
>chmod +x /etc/init.d/wait
>ln -s /etc/init.d/wait /etc/rc3.d/S11wait
>ln -s /etc/init.d/wait /etc/rc4.d/S11wait
>ln -s /etc/init.d/wait /etc/rc5.d/S11wait
>
> <eval sh="python">
> <!-- This is python code that will be executed on
> the frontend node during kickstart generation. You
> may contact the database, make network queries, etc.
> These sections are generally used to help build
> more complex configuration files.
> The 'sh' attribute may point to any language interpreter
> such as "bash", "perl", "ruby", etc.
> -->
> </eval>
></post>
>
_________________________________________________________________
Get reliable dial-up Internet access now with our limited-time introductory offer. http://join.msn.com/?page=dept/dialup

From landman at scalableinformatics.com Tue Dec 30 12:01:44 2003 From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 30 Dec 2003 15:01:44 -0500 Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> Message-ID: <1072814503.4469.196.camel@protein.scalableinformatics.com> Hi Doug: As long as pxe is in there, you should be able to do this (semi)-automatically. All you need to do is to wipe the partition tables and boot sectors of the compute nodes. I seem to remember a really simple single floppy that did this. See http://paud.sourceforge.net/ and http://dban.sourceforge.net/ I think dban is the right one. After that (only on compute nodes) you should be able to pxe boot. Joe On Tue, 2003-12-30 at 14:45, Doug Neuhauser wrote: > Greg, > > 1. I don't have cdroms on my compute nodes, only floppy. :( > 2. My boot order on the compute nodes is normally: > floppy, disk, PXE > 3. I don't have a hot-key override to force PXE boot. > I have to change the BIOS boot order to enable PXE boot. > > > but, if your compute nodes were initially installed with PXE, the > > fastest way to upgrade the compute nodes is to simply turn all the > > compute nodes off, upgrade the frontend, run insert-ethers, then turn > > the compute nodes on one by one. the compute nodes should be set for > > PXE boot which will pull the installer from the frontend and therefore > > be updated installer. > > I don't understand this. > > I can't leave the compute nodes with PXE boot first, or it will create an > endless loop. The compute node will boot via PXE, install rocks 3.1.0, > and then reboot via PXE and repeat the process ad-nauseum. > > Can I use the old floppy boot image found at: > ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img > to force a network boot? > > The 3.1.0 online manual has a link in the section > 1.3 Install your Compute Nodes > to ftp://www.rocksclusters.org/pub/rocks/bootnet.img > but this does not exist. 
> > - Doug N > ------------------------------------------------------------------------ > Doug Neuhauser University of California, Berkeley > doug at seismo.berkeley.edu Berkeley Seismological Laboratory > Phone: 510-642-0931 215 McCone Hall # 4760 > Fax: 510-643-5811 Berkeley, CA 94720-4760
From bruno at rocksclusters.org Tue Dec 30 12:07:34 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Tue, 30 Dec 2003 12:07:34 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> Message-ID: <CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org> On Dec 30, 2003, at 11:45 AM, Doug Neuhauser wrote: > Greg, > > 1. I don't have cdroms on my compute nodes, only floppy. :( > 2. My boot order on the compute nodes is normally: > floppy, disk, PXE > 3. I don't have a hot-key override to force PXE boot. > I have to change the BIOS boot order to enable PXE boot. > >> but, if your compute nodes were initially installed with PXE, the >> fastest way to upgrade the compute nodes is to simply turn all the >> compute nodes off, upgrade the frontend, run insert-ethers, then turn >> the compute nodes on one by one. the compute nodes should be set for >> PXE boot which will pull the installer from the frontend and therefore >> be updated installer. > > I don't understand this. i'll try to give a better explanation. when compute nodes are installed via PXE, rocks detects this and manipulates the boot sector of the disk drive on the compute node that makes the disk non-bootable. that way, if the compute node is reset, it will try to PXE boot. it will PXE boot even if your boot order is: hard disk, cd/floppy, PXE. this occurs because the hard disk is non-bootable so the BIOS boot loader will skip the hard disk and move on to the other boot devices. > I can't leave the compute nodes with PXE boot first, or it will create > an > endless loop. The compute node will boot via PXE, install rocks 3.1.0, > and then reboot via PXE and repeat the process ad-nauseum. > > Can I use the old floppy boot image found at: > ftp://rocksclusters.org/pub/rocks/current/i386/bootnet.img > to force a network boot? 
> > The 3.1.0 online manual has a link in the section > 1.3 Install your Compute Nodes > to ftp://www.rocksclusters.org/pub/rocks/bootnet.img > but this does not exist. we are no longer supporting the boot floppy as it was problematic to make one that contained the appropriate device drivers that worked on most compute nodes. - gb
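Greg's description suggests the mechanism at work: a BIOS will only boot a disk whose master boot record ends in the 0x55AA boot signature, so clearing it makes the BIOS fall through to the next device in the boot order (here, PXE). The thread does not say exactly which bytes Rocks rewrites, so the signature detail is an assumption; this sketch inspects a scratch image rather than a real device (substitute a device like /dev/hda at your own risk):

```shell
#!/bin/sh
# Sketch (assumption: BIOS fall-through is driven by the 0x55AA MBR
# signature at byte offsets 510-511). Works on a scratch file, not a disk.
check_bootable() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' ')
    if [ "$sig" = "55aa" ]; then echo "$1: bootable"; else echo "$1: not bootable"; fi
}

dd if=/dev/zero of=disk.img bs=512 count=1 2>/dev/null    # blank "MBR"
check_bootable disk.img            # not bootable: BIOS would move on to PXE
printf '\125\252' | dd of=disk.img bs=1 seek=510 conv=notrunc 2>/dev/null
check_bootable disk.img            # bootable: BIOS would boot the disk
```

Zeroing those bytes (or the whole first sector, as suggested later in the thread) is what turns a "boot from disk first" node into a PXE-booting one.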
From doug at seismo.berkeley.edu Tue Dec 30 12:28:46 2003 From: doug at seismo.berkeley.edu (Doug Neuhauser) Date: Tue, 30 Dec 2003 12:28:46 -0800 (PST) Subject: [Rocks-Discuss]Rocks 3.1.0 install problems Message-ID: <200312302028.hBUKSkgp017318@perry.geo.berkeley.edu> Greg, Thanks for the detailed boot/reboot explanation. My problem dates back to my initial rocks 2.3.2 installation. My compute node motherboards have 3 ethernet interfaces (1 100Mb, 2 1Gb), but initially only the 100 Mb supported PXE. When I used that for PXE boot, Linux would then remap the interfaces so that it tried to use one of the Gbit interfaces on the next reboot. Needless to say, the head node did not respond to DHCP because the MAC address was unknown to it. My solution was to get a new BIOS from Tyan that supported PXE on all interfaces. However, since my cluster was initially installed using the boot floppy, my compute nodes have the vestiges of floppy boot config, not PXE boot config. I'll try Joe Landman's suggestion of a scrub floppy to scrub the boot sector of the boot disk on the compute nodes. If I can't do that, I CAN go through the manual process of setting and resetting the boot order on each compute node, but it is a slow and sequential process. 
- Doug N ------------------------------------------------------------------------ Doug Neuhauser University of California, Berkeley doug at seismo.berkeley.edu Berkeley Seismological Laboratory Phone: 510-642-0931 215 McCone Hall # 4760 Fax: 510-643-5811 Berkeley, CA 94720-4760 From sjenks at uci.edu Tue Dec 30 12:37:26 2003 From: sjenks at uci.edu (Stephen Jenks) Date: Tue, 30 Dec 2003 12:37:26 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org> References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> <CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org> Message-ID: <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu> On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote: > when compute nodes are installed via PXE, rocks detects this and > manipulates the boot sector of the disk drive on the compute node that > makes the disk non-bootable. that way, if the compute node is reset, > it will try to PXE boot. it will PXE boot even if your boot order is: > hard disk, cd/floppy, PXE. this occurs because the hard disk is > non-bootable so the BIOS boot loader will skip the hard disk and move > on to the other boot devices. Hi Greg, et al. Is there any way to force this behavior even if I initially used a CD to install the compute nodes? My nodes are capable of PXE boot, but
since I didn't use that, I presume they didn't do the non-bootable disk trick upon install. Now that I'm clear about how the PXE install works, I'd prefer to move to that, but don't really want to have to corrupt the disks to cause the PXE install. The nodes are currently loaded with 3.0, so perhaps that will work with 3.1's kickstart, but I'm curious about the PXE issue. Thanks, Steve Jenks From bruno at rocksclusters.org Tue Dec 30 12:48:08 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Tue, 30 Dec 2003 12:48:08 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu> References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> <CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org> <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu> Message-ID: <796DAC35-3B09-11D8-9E96-000A95C4E3B4@rocksclusters.org> On Dec 30, 2003, at 12:37 PM, Stephen Jenks wrote: > > On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote: >> when compute nodes are installed via PXE, rocks detects this and >> manipulates the boot sector of the disk drive on the compute node >> that makes the disk non-bootable. that way, if the compute node is >> reset, it will try to PXE boot. it will PXE boot even if your boot >> order is: hard disk, cd/floppy, PXE. this occurs because the hard >> disk is non-bootable so the BIOS boot loader will skip the hard disk >> and move on to the other boot devices. > > Hi Greg, et al. > > Is there any way to force this behavior even if I initially used a CD > to install the compute nodes? 
> > The nodes are currently loaded with 3.0, so perhaps that will work > with 3.1's kickstart, but I'm curious about the PXE issue. 3.0 is based on redhat 7.3 and 3.1 is based on redhat enterprise linux 3.0 -- so you'll hit a similar problem as doug did when you perform an upgrade. give me a bit of time to cook up a procedure for forcing your compute nodes to PXE boot. - gb
From cdwan at mail.ahc.umn.edu Tue Dec 30 14:22:18 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Tue, 30 Dec 2003 16:22:18 -0600 (CST) Subject: [Rocks-Discuss]NIS outside, 411 inside? Message-ID: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> Is there a preferred way to have the 411 server on the head node replicate information (passwd and auto.whatever) from an external NIS server to the compute nodes? It seems to me that a cron job like the one below does the trick, but it feels crufty to me: ypcat passwd > yp.passwd; cat /etc/passwd yp.passwd > 411.passwd ** build the 411 distributed passwd from the file above instead of ** /etc/passwd. I'd love to hear suggestions for a more elegant solution. -Chris Dwan The University of Minnesota From bruno at rocksclusters.org Tue Dec 30 15:16:36 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Tue, 30 Dec 2003 15:16:36 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu> References: <200312301945.hBUJjxgp016489@perry.geo.berkeley.edu> <CEAFCC25-3B03-11D8-9E96-000A95C4E3B4@rocksclusters.org> <FAA5CF63-3B07-11D8-BF62-000A95B96C68@uci.edu> Message-ID: <3737B584-3B1E-11D8-9E96-000A95C4E3B4@rocksclusters.org> On Dec 30, 2003, at 12:37 PM, Stephen Jenks wrote: > > On Dec 30, 2003, at 12:07 PM, Greg Bruno wrote: >> when compute nodes are installed via PXE, rocks detects this and >> manipulates the boot sector of the disk drive on the compute node >> that makes the disk non-bootable. that way, if the compute node is >> reset, it will try to PXE boot. it will PXE boot even if your boot >> order is: hard disk, cd/floppy, PXE. this occurs because the hard >> disk is non-bootable so the BIOS boot loader will skip the hard disk >> and move on to the other boot devices. > > Hi Greg, et al. > > Is there any way to force this behavior even if I initially used a CD > to install the compute nodes? 
> My nodes are capable of PXE boot, but > since I didn't use that, I presume they didn't do the non-bootable > disk trick upon install. Now that I'm clear about how the PXE install > works, I'd prefer to move to that, but don't really want to have to > corrupt the disks to cause the PXE install. > > The nodes are currently loaded with 3.0, so perhaps that will work > with 3.1's kickstart, but I'm curious about the PXE issue. here's a procedure to ensure that your non-3.1.0 compute nodes PXE
install after a frontend upgrade. this assumes your compute nodes support PXE installs. before you upgrade the frontend, login to the frontend and execute: # ssh-agent $SHELL # ssh-add # cluster-fork 'touch /boot/grub/pxe-install' # cluster-fork '/boot/kickstart/cluster-kickstart --start' # cluster-fork '/sbin/chkconfig --del rocks-grub' now you can shutdown your compute nodes. then upgrade your frontend. after you login to your new frontend, run insert-ethers, then reset each compute node, one at a time. doug, you'll have a bit harder time. if you can find a bootable floppy, after the compute node boots, you can chroot to the root partition on the disk and run the three cluster-fork commands above. i apologize for making this procedure tough on you. - gb From mjk at sdsc.edu Tue Dec 30 15:32:20 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 30 Dec 2003 15:32:20 -0800 Subject: [Rocks-Discuss]Licensing In-Reply-To: <BAY1-DAV14HJL2WZcXm0000fc27@hotmail.com> References: <200312300711.hBU7BeJ14002@postal.sdsc.edu> <BAY1-DAV14HJL2WZcXm0000fc27@hotmail.com> Message-ID: <69879D2D-3B20-11D8-98D0-000A95DA5638@sdsc.edu> Nothing! Rocks is entirely open source with various GNU, BSD, Artistic, etc open source licenses attached. The underlying RedHat OS (as of Rocks 3.1.0 -- available now) is recompiled from RedHat's publicly available SRPMS. You are of course welcome to send us money and hardware to help further the causes. Several vendors do in fact do this, and this helps us support them. -mjk On Dec 30, 2003, at 6:03 AM, Purushotham Komaravolu wrote: > Hi All,
> I would like to know the list of the components that have > to be > licensed, when we install ROCKS as a commercial solution. > Thanks > Happy Holidays > Puru From mjk at sdsc.edu Tue Dec 30 15:35:39 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 30 Dec 2003 15:35:39 -0800 Subject: [Rocks-Discuss]NIS outside, 411 inside? In-Reply-To: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> Message-ID: <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu> As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in some ways this job is simpler now, although no one has done this yet. A simple ypcat like you have will do most of the right thing and 411 will pick up the changes and send them around the cluster. But, you need to figure out how to merge the cluster information with the external NIS information. This will include things like the IP address for the cluster compute nodes. -mjk On Dec 30, 2003, at 2:22 PM, Chris Dwan (CCGB) wrote: > > Is there a preferred way to have the 411 server on the head node > replicate > information (passwd and auto.whatever) from an external NIS server to > the > compute nodes? It seems to me that a cron job like the one below does > the > trick, but it feels crufty to me: > > ypcat passwd > yp.passwd; > cat /etc/passwd yp.passwd > 411.passwd > ** build the 411 distributed passwd from the file above instead of > ** /etc/passwd. > > I'd love to hear suggestions for a more elegant solution. > > -Chris Dwan > The University of Minnesota > From mitchskin at comcast.net Tue Dec 30 17:13:44 2003 From: mitchskin at comcast.net (Mitchell Skinner) Date: Tue, 30 Dec 2003 17:13:44 -0800 Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <200312302028.hBUKSkgp017318@perry.geo.berkeley.edu> References: <200312302028.hBUKSkgp017318@perry.geo.berkeley.edu> Message-ID: <1072833146.8645.1114.camel@zeitgeist>
On Tue, 2003-12-30 at 12:28, Doug Neuhauser wrote: > I'll try Joe Landman's suggestion of a scrub floppy to scrub the boot > sector of the boot disk on the compute nodes. If I can't do that, I > CAN go through the manual process of setting and resetting the boot > order on each compute node, but it is a slow and sequential process. Something I'm going to try and implement at our site is support for the pxelinux 'localboot' option. If the hard drives have a valid boot sector, I can leave the BIOS set to PXE boot before the hard drive, and by changing the pxelinux configuration on the head node, I can set a particular node to boot from the network or from the local disk. In other words, when a node PXE boots, it might get either the kickstart instructions or the 'boot from hard drive' instructions. That will take some fiddling, I think, because the head node then has to maintain some more state for all of the compute nodes. I really want to avoid going through the BIOS setup on all my nodes more than once, though. Is this something that the ROCKS mainline would be interested in? Mitch From doug at seismo.berkeley.edu Tue Dec 30 17:51:49 2003 From: doug at seismo.berkeley.edu (Doug Neuhauser) Date: Tue, 30 Dec 2003 17:51:49 -0800 (PST) Subject: [Rocks-Discuss]Rocks 3.1.0 install problems Message-ID: <200312310151.hBV1pngp026060@perry.geo.berkeley.edu> My solution to force PXE boot is outlined below. 1. Boot dban floppy (floppy image at http://dban.sourceforge.net/ ). 2. Run "quick" purge of disks on system (I only have 1 disk on compute nodes). I let the disk purge get far enough into the disk to overwrite the boot sectors and filesystem -- I didn't wait for it to completely erase the entire disk. 3. Reset the system, and CYCLE POWER on the compute node. NOTE: If you don't cycle power, the BIOS sees the disk, but reports that it has a fatal error reading from it. This caused the following problems: a. PXE boot worked, but Rocks install also did not see the disk. 
It asked whether you want to manually configure the disk, but the configuration failed immediately regardless of whether I answered yes or no. The Rocks developers may want to look into this bug. b. By the time that I figured out that I needed to cycle power, the BIOS had already removed the disk from the boot order. My boot order was now: floppy, PXE, disk Rocks installed properly once, twice, .... until I reset the boot order to: floppy, disk, PXE. 4. Compute node will now perform PXE boot, install Rocks 3.1.0, and subsequent "controlled reboots" will boot from disk. If the node
is powered down or reset with reset button, no boot block is left on disk, and the system will perform PXE boot and reinstall Rocks. ------------------------------------------------------------------------ Doug Neuhauser University of California, Berkeley doug at seismo.berkeley.edu Berkeley Seismological Laboratory Phone: 510-642-0931 215 McCone Hall # 4760 Fax: 510-643-5811 Berkeley, CA 94720-4760 From tim.carlson at pnl.gov Tue Dec 30 19:17:11 2003 From: tim.carlson at pnl.gov (Tim Carlson) Date: Tue, 30 Dec 2003 19:17:11 -0800 (PST) Subject: [Rocks-Discuss]Rocks 3.1.0 install problems In-Reply-To: <200312310151.hBV1pngp026060@perry.geo.berkeley.edu> Message-ID: <Pine.GSO.4.44.0312301914310.23660-100000@poincare.emsl.pnl.gov> On Tue, 30 Dec 2003, Doug Neuhauser wrote: > 2. Run "quick" purge of disks on system (I only have 1 disk on compute nodes). > I let the disk purge get far enough into the disk to overwrite the boot > sectors and filesystem -- I didn't wait for it to completely erase the > entire disk. Here is something that is a bit quicker cluster-fork dd if=/dev/zero of=/dev/hda bs=1k count=512 Then either power cycle or cluster-fork reboot Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson at pnl.gov EMSL UNIX System Support From cdwan at mail.ahc.umn.edu Tue Dec 30 19:44:11 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Tue, 30 Dec 2003 21:44:11 -0600 (CST) Subject: [Rocks-Discuss]NIS outside, 411 inside? In-Reply-To: <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu> References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu> Message-ID: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> > As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in > some ways this job is simpler now, although no one has done this yet. > A simple ypcat like you have will do most of the right thing and 411 > will pick up the changes and send them around the cluster. 
> But, you > need to figure out how to merge the cluster information with the > external NIS information. This will include things like the IP address > for the cluster compute nodes.
The shuffling below would work, I think, but it still gives me the willies to be mucking with the passwd file every hour: mv /etc/passwd /etc/passwd.local ypcat /etc/passwd > /etc/passwd.nis cat /etc/passwd.local /etc/passwd.nis > /etc/passwd service 411 commit cp /etc/passwd.local /etc/passwd Am I missing the simple way? I seem to have an affinity for finding the maximally complex way to do things... -Chris Dwan The University of Minnesota From mjk at sdsc.edu Tue Dec 30 19:58:43 2003 From: mjk at sdsc.edu (Mason J. Katz) Date: Tue, 30 Dec 2003 19:58:43 -0800 Subject: [Rocks-Discuss]NIS outside, 411 inside? In-Reply-To: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu> <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> Message-ID: <A04191F8-3B45-11D8-98D0-000A95DA5638@sdsc.edu> This sounds reasonable, but you still have a chance of conflicting UIDs in your password file. If you only issue accounts from your LAN NIS server then you should be fine. I'd suggest adding the accounts created by Rocks into your server (just look at the initial passwd file). The SGE roll creates an SGE user, others may also exist. You can also try setting up your frontend as an NIS client of your external server, with the same UID issues above. The bad news is we don't have a canned answer, and need someone to give us one. The good news is with 411 in place only the frontend need be changed and the compute node will still function as stock Rocks. -mjk On Dec 30, 2003, at 7:44 PM, Chris Dwan (CCGB) wrote: > >> As of Rocks 3.1.0 we no longer use NIS "inside" the cluster. So in >> some ways this job is simpler now, although no one has done this yet. >> A simple ypcat like you have will do most of the right thing and 411 >> will pick up the changes and send them around the cluster. 
But, you >> need to figure out how to merge the cluster information with the >> external NIS information. This will include things like the IP >> address >> for the cluster compute nodes. > > The shuffling below would work, I think, but it still gives me the > willies to be mucking with the passwd file every hour: > > mv /etc/passwd /etc/passwd.local > ypcat /etc/passwd > /etc/passwd.nis
> cat /etc/passwd.local /etc/passwd.nis > /etc/passwd > service 411 commit > cp /etc/passwd.local /etc/passwd > > Am I missing the simple way? I seem to have an affinity for finding > the > maximally complex way to do things... > > -Chris Dwan > The University of Minnesota From csamuel at vpac.org Tue Dec 30 19:59:51 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 31 Dec 2003 14:59:51 +1100 Subject: [Rocks-Discuss]NIS outside, 411 inside? In-Reply-To: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu> <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> Message-ID: <200312311459.54054.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 31 Dec 2003 02:44 pm, Chris Dwan (CCGB) wrote: > mv /etc/passwd /etc/passwd.local > ypcat /etc/passwd > /etc/passwd.nis > cat /etc/passwd.local /etc/passwd.nis > /etc/passwd Hmm, how about: ypcat passwd > /etc/passwd.nis cat /etc/passwd /etc/passwd.nis > /etc/passwd.tmp cp /etc/passwd /etc/passwd.local mv /etc/passwd.tmp /etc/passwd service 411 commit mv /etc/passwd.local /etc/passwd That should mean that you're never operating without a password file and the overwrites should be approaching atomic (I hope). Of course, it'd be nice if you could do whatever the 411 init file does on something else than /etc/passwd :-) Disclaimer: I have not tried this myself & don't (yet) have a 3.1 system to test with, caveat emptor, batteries not included, IANAL, etc.. cheers! Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
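The open questions in this thread are the merge itself (Mason's "conflicting UIDs" caveat) and the window where the passwd file is incomplete. A hedged sketch of one possible merge step, run here on scratch files rather than /etc/passwd: NIS entries whose login name or UID already exist locally are dropped, and the result is installed with a single mv, which is atomic within one filesystem. The merge_passwd helper, the awk filter, and the file names are illustrative, not anything 411 or Rocks provides.

```shell
#!/bin/sh
# Sketch: merge local + NIS passwd entries, skipping NIS lines whose
# login name or UID collide with a local entry, then rename atomically.
merge_passwd() {    # usage: merge_passwd LOCAL NIS DEST
    tmp=$(mktemp merge.XXXXXX)
    awk -F: 'NR==FNR { name[$1]; uid[$3]; print; next }
             !($1 in name) && !($3 in uid)' "$1" "$2" > "$tmp"
    mv "$tmp" "$3"  # readers see either the old or the new file, never a partial one
}

# demo data: "sge" (the user the SGE roll creates) exists on both sides
printf 'root:x:0:0::/root:/bin/bash\nsge:x:400:400::/opt/sge:/bin/bash\n' > passwd.local
printf 'alice:x:1001:100::/home/alice:/bin/bash\nsge:x:400:400::/opt/sge:/bin/bash\n' > passwd.nis
merge_passwd passwd.local passwd.nis passwd.merged
cat passwd.merged   # root and sge from local, alice from NIS; duplicate sge dropped
```

On a frontend the LOCAL input would come from the stock Rocks accounts and the NIS input from ypcat passwd, with the merged file handed to 411 instead of /etc/passwd itself.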
    -----BEGIN PGP SIGNATURE----- Version:GnuPG v1.2.2 (GNU/Linux) iD8DBQE/8km3O2KABBYQAh8RAnpPAJ9a9oRdGXeBUBAokdX6wmwrVbgXkQCeKD0C xh8eT6qTbZpxhu8+FHPSt90= =lhiY -----END PGP SIGNATURE----- From csamuel at vpac.org Tue Dec 30 20:01:39 2003 From: csamuel at vpac.org (Chris Samuel) Date: Wed, 31 Dec 2003 15:01:39 +1100 Subject: [Rocks-Discuss]NIS outside, 411 inside? In-Reply-To: <200312311459.54054.csamuel@vpac.org> References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> <200312311459.54054.csamuel@vpac.org> Message-ID: <200312311501.43675.csamuel@vpac.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wed, 31 Dec 2003 02:59 pm, Chris Samuel wrote: > cp /etc/passwd /etc/passwd.local should be: cp -p /etc/passwd /etc/passwd.local Oh, and what happens if users overlap ? :-) cheers, Chris - -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) iD8DBQE/8kojO2KABBYQAh8RAmWTAJwNhpm77IclXcWLoAuhp2/B4/GsCgCfZWek me3Lk2I7VDmRj4ygTSLSaaY= =Pv8G -----END PGP SIGNATURE----- From cdwan at mail.ahc.umn.edu Tue Dec 30 20:12:34 2003 From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB)) Date: Tue, 30 Dec 2003 22:12:34 -0600 (CST) Subject: [Rocks-Discuss]NIS outside, 411 inside? In-Reply-To: <200312311459.54054.csamuel@vpac.org> References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu> <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> <200312311459.54054.csamuel@vpac.org>
Message-ID: <Pine.GSO.4.58.0312302206370.25976@lenti.med.umn.edu>

> Of course, it'd be nice if you could do whatever the 411 init file does on
> something else than /etc/passwd :-)

That would be a really big step. I'm deeply wary of cron jobs that overwrite my passwd file.

The next step might be to put this functionality into 411 itself. It would be truly cool to have an automatic, non-NIS way to make the passwd, group, autofs, and host lookup stuff be consistent and static across the cluster nodes. On the other hand, I appreciate that this is probably a complex enough system without trying to reinvent NIS but leave out the brittle server bits. We can work around for the time being.

-Chris Dwan

From doug at seismo.berkeley.edu Tue Dec 30 20:34:25 2003
From: doug at seismo.berkeley.edu (Doug Neuhauser)
Date: Tue, 30 Dec 2003 20:34:25 -0800 (PST)
Subject: [Rocks-Discuss]Mozilla / ssh DISPLAY problem with Rocks 3.1.0
Message-ID: <200312310434.hBV4YPgp028521@perry.geo.berkeley.edu>

I am having a problem using mozilla with the default Rocks monitor web page over an ssh session to my headnode from a Sun workstation with a 24-bit display. My workstation is a Sun Blade 150 running Solaris 8, and I am using SSH Secure Shell 3.2.5 (non-commercial version).

When I ssh to my frontend and try to run mozilla, I get an empty Mozilla frame. Running mozilla with the debugging option "--g-fatal-warnings" I get:

Gdk-WARNING **: Attempt to draw a drawable with depth 24 to a drawable with depth 8
aborting...
xwininfo shows the following window characteristics:

xwininfo: Window id: 0x9400034 "GCLCluster Cluster - Mozilla"
  Absolute upper-left X: 175
  Absolute upper-left Y: 150
  Relative upper-left X: 0
  Relative upper-left Y: 0
  Width: 1021
  Height: 738
  Depth: 8
  Visual Class: PseudoColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x22 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
  Corners:  +175+150  -84+150  -84-136  +175-136
  -geometry 1021x738-78+125

Is there a way to configure mozilla to use only an 8-bit drawable? If I ssh from a workstation with an 8-bit display, mozilla starts up OK, and creates an 8-bit window.

- Doug N
------------------------------------------------------------------------
Doug Neuhauser                 University of California, Berkeley
doug at seismo.berkeley.edu    Berkeley Seismological Laboratory
Phone: 510-642-0931            215 McCone Hall # 4760
Fax: 510-643-5811              Berkeley, CA 94720-4760

From qian1129 at yahoo.com Tue Dec 30 22:47:57 2003
From: qian1129 at yahoo.com (li lee)
Date: Tue, 30 Dec 2003 22:47:57 -0800 (PST)
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
Message-ID: <20031231064757.52813.qmail@web41508.mail.yahoo.com>

Hi,

I want to install Rocks v3.1.0 on PCs, but I do not want so many CDs:

roll-grid-3.1.0-0.any.iso
roll-intel-3.1.0-0.any.iso
roll-sge-3.1.0-0.any.iso
......

So, how to install all these after Rocks and HPC installation on clusters?

Thanks

Li

__________________________________
Do you Yahoo!?
Find out what made the Top Yahoo! Searches of 2003
http://search.yahoo.com/top2003

From bruno at rocksclusters.org Tue Dec 30 23:35:28 2003
From: bruno at rocksclusters.org (Greg Bruno)
Date: Tue, 30 Dec 2003 23:35:28 -0800
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
In-Reply-To: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
References: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
Message-ID: <E7D709AA-3B63-11D8-9E96-000A95C4E3B4@rocksclusters.org>

> I want to install Rocks v3.1.0 on PCs, but I do not
> want so many CDs:
> roll-grid-3.1.0-0.any.iso
> roll-intel-3.1.0-0.any.iso
> roll-sge-3.1.0-0.any.iso
> ......
> So, how to install all these after Rocks and HPC
> installation on clusters?

for now, we do not have a systematic way in which to incorporate rolls after the frontend is up. this is on our 'todo' list.

- gb

From tim.carlson at pnl.gov Wed Dec 31 07:29:21 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 31 Dec 2003 07:29:21 -0800 (PST)
Subject: [Rocks-Discuss]Mozilla / ssh DISPLAY problem with Rocks 3.1.0
In-Reply-To: <200312310434.hBV4YPgp028521@perry.geo.berkeley.edu>
Message-ID: <Pine.GSO.4.44.0312310727220.9033-100000@poincare.emsl.pnl.gov>

On Tue, 30 Dec 2003, Doug Neuhauser wrote:

> I am having a problem using mozilla with the default Rocks monitor web page
> over an ssh session to my headnode from a Sun workstation with a 24-bit
> display. My workstation is Sun Blade 150 running Solaris 8, and I am
> using SSH Secure Shell 3.2.5 (non-commercial version).
>
> When I ssh to my frontend and to run mozilla, I get an empty Mozilla frame.
> Running mozilla with debugging options "--g-fatal-warnings" I get:

This sounds like an X tunnel problem. I see X tunnel errors all the time (OpenGL, colormap, etc). What happens if you just set the DISPLAY variable back to your Sun box and do the proper xhost command on the Sun?

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From mjk at sdsc.edu Wed Dec 31 09:45:49 2003
From: mjk at sdsc.edu (Mason J. Katz)
Date: Wed, 31 Dec 2003 09:45:49 -0800
Subject: [Rocks-Discuss]How to install Roll CDs in Rocks 3.1.0
In-Reply-To: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
References: <20031231064757.52813.qmail@web41508.mail.yahoo.com>
Message-ID: <2BEBEC90-3BB9-11D8-9BE3-000A95DA5638@sdsc.edu>

For this release you need all these CDs (if you want this functionality). Think of Rolls as add-on packs to Rocks, and remember that software belongs on a CD (not a tar ball, or ftp site). CDs are the accepted commercial way of releasing software, and they are very nice.
But we have some issues with this that we are addressing right now: - Meta-Rolls. That is how do you merge multiple Rolls into a single CD image. This is actually very easy to do, and we have some early code for this, it will be there in the next release. For IA64 we merge the
HPC Roll onto the base DVD, so we have a proof of concept here.

- Rolls cannot be added after a cluster is installed, and must be used during installation.

- Rolls cannot be uninstalled.

Rolls are maturing pretty quickly, and we know where they need to go.

-mjk

On Dec 30, 2003, at 10:47 PM, li lee wrote:

> Hi,
>
> I want to install Rocks v3.1.0 in PCs, but I do not
> want to so many CDs:
> roll-grid-3.1.0-0.any.iso
> roll-intel-3.1.0-0.any.iso
> roll-sge-3.1.0-0.any.iso
> ......
> So, how to install all these after Rocks and HPC
> installation on clusters?
>
> Thanks
>
> Li
>
> __________________________________
> Do you Yahoo!?
> Find out what made the Top Yahoo! Searches of 2003
> http://search.yahoo.com/top2003

From michal at harddata.com Wed Dec 31 10:05:26 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 31 Dec 2003 11:05:26 -0700
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>; from cdwan@mail.ahc.umn.edu on Tue, Dec 30, 2003 at 09:44:11PM -0600
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <E05DB9B2-3B20-11D8-98D0-000A95DA5638@sdsc.edu> <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu>
Message-ID: <20031231110526.B11252@mail.harddata.com>

On Tue, Dec 30, 2003 at 09:44:11PM -0600, Chris Dwan (CCGB) wrote:
>
> The shuffling below would work, I think, but it still gives me the
> willies to be mucking with the passwd file every hour:
>
> mv /etc/passwd /etc/passwd.local
> ypcat /etc/passwd > /etc/passwd.nis
> cat /etc/passwd.local /etc/passwd.nis > /etc/passwd
> service 411 commit
> cp /etc/passwd.local /etc/passwd
>
> Am I missing the simple way?
cp -p /etc/passwd /etc/passwd.local
ypcat passwd >> /etc/passwd
service 411 commit
mv /etc/passwd.local /etc/passwd

unless 'service 411' can be told to use another file. That way you minimize the time gap when you are without /etc/passwd, you make sure that the file attributes on /etc/passwd will remain intact, and you are not left with extra files. You can also play with (symbolic) links, but I am not sure if every possible /etc/passwd reader will indeed follow a link.

   Michal

From michal at harddata.com Wed Dec 31 10:16:18 2003
From: michal at harddata.com (Michal Jaegermann)
Date: Wed, 31 Dec 2003 11:16:18 -0700
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <200312311501.43675.csamuel@vpac.org>; from csamuel@vpac.org on Wed, Dec 31, 2003 at 03:01:39PM +1100
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> <200312311459.54054.csamuel@vpac.org> <200312311501.43675.csamuel@vpac.org>
Message-ID: <20031231111618.C11252@mail.harddata.com>

On Wed, Dec 31, 2003 at 03:01:39PM +1100, Chris Samuel wrote:
> should be:
>
> cp -p /etc/passwd /etc/passwd.local
>
> Oh, and what happens if users overlap ? :-)

'sort -u' over relevant fields after replacing ':'s with blanks? But this is getting a tad more involved, and an "automatic conflict resolution" still may screw up. A bit of coordination between whomever maintains NIS and local user data, like reserving some names and uid ranges for one or another, is likely more effective in practice.
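The overlap problem can also be handled mechanically by keeping only the first entry per username when the files are concatenated (local first, so local wins); a sketch using made-up sample entries rather than the real files:

```shell
# Keep only the first entry per username in a merged passwd stream.
# awk splits on ':'; seen[] remembers usernames already printed, so
# whichever source is concatenated first (here the "local" entry)
# shadows a later NIS entry of the same name.  Sample data only.
merged=$(printf '%s\n' \
    'alice:x:500:500:local alice:/home/alice:/bin/bash' \
    'bob:x:501:501::/home/bob:/bin/bash' \
    'alice:x:7500:7500:nis alice:/home/alice:/bin/csh' |
  awk -F: '!seen[$1]++')
echo "$merged"
```

This only de-duplicates by name; two different users sharing a numeric uid still needs the kind of administrative coordination (reserved names and uid ranges) suggested above.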
Michal From bruno at rocksclusters.org Wed Dec 31 10:42:21 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 31 Dec 2003 10:42:21 -0800 Subject: [Rocks-Discuss]Roll Documentation posted on the web site Message-ID: <117308FA-3BC1-11D8-9E96-000A95C4E3B4@rocksclusters.org> just posted documentation for some of the rolls on the web site -- see the left-hand side of the web page: http://www.rocksclusters.org/Rocks/ and here are the links to the roll documentation:
HPC Roll: http://www.rocksclusters.org/rocks-documentation/3.1.0/
SGE Roll: http://www.rocksclusters.org/roll-documentation/sge/3.1.0/
Grid Roll: http://www.rocksclusters.org/roll-documentation/grid/3.1.0/
Intel Roll: http://www.rocksclusters.org/roll-documentation/intel/3.1.0/

as a side note, for every one of the rolls you install above, the documentation will be available on your frontend at:

http://localhost/roll-documentation/

- gb

From cdwan at mail.ahc.umn.edu Wed Dec 31 11:07:37 2003
From: cdwan at mail.ahc.umn.edu (Chris Dwan (CCGB))
Date: Wed, 31 Dec 2003 13:07:37 -0600 (CST)
Subject: [Rocks-Discuss]NIS outside, 411 inside?
In-Reply-To: <20031231111618.C11252@mail.harddata.com>
References: <Pine.GSO.4.58.0312301614500.554@lenti.med.umn.edu> <Pine.GSO.4.58.0312302131450.24366@lenti.med.umn.edu> <200312311459.54054.csamuel@vpac.org> <200312311501.43675.csamuel@vpac.org> <20031231111618.C11252@mail.harddata.com>
Message-ID: <Pine.GSO.4.58.0312311239310.3992@lenti.med.umn.edu>

> this is getting a tad more involved and an "automatic
> conflict resolution" still may screw up.

I agree with this assessment. The key is to keep the local passwd file as small as possible, and to remove redundant accounts on the frontend node. Since it consists mostly of non-login accounts anyway, this shouldn't be too difficult...and it's a one-time task anyway.

I've settled on the hourly cron job below. I'll report any weirdness as appropriate. Thanks for all the suggestions and discussion.

#!/bin/sh
ypcat auto.master > /etc/auto.master
ypcat auto.home > /etc/auto.home
ypcat auto.net > /etc/auto.net
ypcat auto.web > /etc/auto.web

ypcat passwd > /etc/passwd.nis
cat /etc/passwd.local /etc/passwd.nis > /etc/passwd.combined
cp /etc/passwd.combined /etc/passwd

ypcat group > /etc/group.nis
cat /etc/group.local /etc/group.nis > /etc/group.combined
cp /etc/group.combined /etc/group

-Chris Dwan
The University of Minnesota
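One way to take some of the risk out of a cron job like this is to refuse to touch the merged file whenever the NIS dump comes back empty (e.g. ypserv is down). A sketch of that guard, with a fake `ypcat` and scratch files standing in for the real `ypcat passwd` and the files under /etc, so the logic can be exercised safely:

```shell
#!/bin/sh
# Guarded variant of the merge: never overwrite the combined file with
# the result of an empty NIS dump.  fake_ypcat stands in for
# `ypcat passwd`; "$work" stands in for /etc.  Illustration only.
fake_ypcat() { printf '%s' "$NIS_DATA"; }

merge() {  # merge <local-file> <nis-scratch-file> <combined-file>
    fake_ypcat > "$2" || return 1
    [ -s "$2" ] || return 1                 # empty dump: keep the old file
    cat "$1" "$2" > "$3.tmp" && mv "$3.tmp" "$3"
}

work=$(mktemp -d)
echo 'root:x:0:0::/root:/bin/bash' > "$work/passwd.local"
echo 'stale:x:1:1::/:/bin/sh'      > "$work/passwd"    # last good merge

NIS_DATA=''            # simulate ypserv being down: merge must refuse
merge "$work/passwd.local" "$work/passwd.nis" "$work/passwd" || true
after_outage=$(cat "$work/passwd")                     # still intact

NIS_DATA='alice:x:500:500::/home/alice:/bin/bash'      # NIS back up
merge "$work/passwd.local" "$work/passwd.nis" "$work/passwd"
```

In the real cron job the same guard would simply wrap the `ypcat passwd > /etc/passwd.nis` step, leaving the previous /etc/passwd in place when the map cannot be fetched.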
From maz at tempestcomputers.com Wed Dec 31 11:37:09 2003
From: maz at tempestcomputers.com (John Mazza)
Date: Wed, 31 Dec 2003 14:37:09 -0500
Subject: [Rocks-Discuss]Rocks 3.1.0 with Adaptec I2O RAID
Message-ID: <200312311937.hBVJb9J25828@postal.sdsc.edu>

Does anyone know of a way to make the 3.1.0 (x86-64) version work with an Adaptec 2100S SCSI RAID card? My master node needs to use this card, but it doesn't appear to be in the kernel on the CD. Also, does it support the SysKonnect SK-9821 (Ver 2.0) Gig cards?

Thanks!

From tim.carlson at pnl.gov Wed Dec 31 12:49:25 2003
From: tim.carlson at pnl.gov (Tim Carlson)
Date: Wed, 31 Dec 2003 12:49:25 -0800 (PST)
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <20031229183225.M11961@scalableinformatics.com>
Message-ID: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>

On Mon, 29 Dec 2003, landman wrote:

> SSH is too slow. Wow. 5-10 seconds to log in.

Just getting around to this. I did a clean install on our test cluster (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal user, a "cluster-fork date" command on 4 nodes took under .6 seconds.

Sounds like you have some type of DNS issue. Did you get a bad /etc/resolv.conf file on the nodes for some reason?

> a) md (e.g. Software RAID): Just try to build one. Anaconda will
> happily let you do this ... though it will die in the formatting stages.
> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
> shows nothing. Insmod the md also doesn't do anything. Catting
> /proc/devices shows no md as a character or block device.

The odd bit here is that you can do a

modprobe raid0

on a running frontend and it gets installed but there is no associated "md" module. Was "md" built directly into the kernel? Very odd.

> b) ext3. There is no ext3 available for the install.

This is a bit annoying. Nobody really uses ext2 anymore do they?
:) Not having ext3 as an install option isn't a show stopper for me since I can do a tune2fs after the fact. But ext3 should be there. Having version 2.0.8 of the myrinet drivers up and running is a big + in
my book. SGE 5.3p5 is also nice to see.

It will be some time before I upgrade any production clusters given the differences between RH 7.3 and WS 3.0. Too big of a jump for me right now. We first need to convert a couple hundred desktop boxes :)

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson at pnl.gov
EMSL UNIX System Support

From James_ODell at Brown.edu Wed Dec 31 13:09:25 2003
From: James_ODell at Brown.edu (James O'Dell)
Date: Wed, 31 Dec 2003 16:09:25 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov>
Message-ID: <9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu>

For whatever it's worth, MPICH works MUCH better when run over rsh than ssh. It seems as if ssh doesn't pass along signals nearly as well as rsh. Since enabling rsh and configuring MPICH to use it, we have had no Zombie jobs on our compute nodes. When using SSH they were a common occurrence. In fact, if you look at the MPICH implementation for myrinet, you'll see the contortions that they use to try and clean up compute nodes when using ssh.

Jim

On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote:

> On Mon, 29 Dec 2003, landman wrote:
>
>> SSH is too slow. Wow. 5-10 seconds to log in.
>
> Just getting around to this. I did a clean install on our test cluster
> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
> user, a "cluster-fork date" command on 4 nodes took under .6 seconds
>
> Sounds like you have some type of DNS issue. Did you get a bad
> /etc/resolv.conf file on the nodes for some reason?
>
>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>> happily let you do this ... though it will die in the formatting
>> stages.
>> Dropping into the shell (Alt-F2) and looking for the md module (lsmod)
>> shows nothing. Insmod the md also doesn't do anything. Catting
>> /proc/devices shows no md as a character or block device.
> > The odd bit here is that you can do a > > modprobe raid0 > > on a running frontend and it gets installed but there is no associated > "md" module. Was "md" built directly into the kernel? very odd.
    > >> b) ext3.There is no ext3 available for the install. > > This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not > having ext3 as an install option isn't a show stopper for me since I > can > do a tune2fs after the fact. But ext3 should be there. > > Having version 2.0.8 of the myrinet drivers up and running is a big + > in > my book. SGE 5.3p5 is also nice to see. > > It will be some time before I upgrade any production clusters given the > differences between Rh 7.3 and WS 3.0. Too big of a jump for me right > now. > We first need to convert a couple hundred desktop boxes :) > > Tim Carlson > Voice: (509) 376 3423 > Email: Tim.Carlson at pnl.gov > EMSL UNIX System Support > From landman at scalableinformatics.com Wed Dec 31 14:46:22 2003 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 31 Dec 2003 17:46:22 -0500 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov> References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov> Message-ID: <1072910782.4470.268.camel@protein.scalableinformatics.com> On Wed, 2003-12-31 at 15:49, Tim Carlson wrote: > On Mon, 29 Dec 2003, landman wrote: > > > SSH is too slow. Wow. 5-10 seconds to log in. > > Just getting around to this. I did a clean install on our test cluster > (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal > user, a "cluster-fork date" command on 4 nodes took under .6 seconds Yeah, some weirdness in DNS. Re-load on one cluster head took care of it, on the other applying dnsmasq helped. > > Sounds like you have some type of DNS issue. Did you get a bad > /etc/resolv.conf file on the nodes for some reason? > > > a) md (e.g. Software RAID): Just try to build one. Anaconda will > > happily let you do this ... though it will die in the formatting stages. > > Dropping into the shell (Alt-F2) and looking for the md module (lsmod) > > shows nothing. Insmod the md also doesn't do anything. 
Catting > > /proc/devices shows no md as a character or block device. > > The odd bit here is that you can do a > > modprobe raid0 >
> on a running frontend and it gets installed but there is no associated
> "md" module. Was "md" built directly into the kernel? very odd.

True, but I wanted to do a raid 1. I tried the insmod raid1 but it didn't work; from what I can see, the module was not in the build. This is ok, as some of it can be done later.

> > b) ext3. There is no ext3 available for the install.
>
> This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not
> having ext3 as an install option isn't a show stopper for me since I can
> do a tune2fs after the fact. But ext3 should be there.

That's what I did. I'll post a quick set of instructions for this a little later.

> Having version 2.0.8 of the myrinet drivers up and running is a big + in
> my book. SGE 5.3p5 is also nice to see.

I agree, though I would like to see people do a

cluster-fork "/etc/init.d/rcsge stop"
cluster-fork "chown -R root:root /opt/gridengine/bin /opt/gridengine/utilbin"
cluster-fork "/etc/init.d/rcsge start"

to fix the compute node sge permissions. Some of the utils don't work otherwise.

> It will be some time before I upgrade any production clusters given the
> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right now.
> We first need to convert a couple hundred desktop boxes :)

:)

> Tim Carlson
> Voice: (509) 376 3423
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support

From landman at scalableinformatics.com Wed Dec 31 14:48:08 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 31 Dec 2003 17:48:08 -0500
Subject: [Rocks-Discuss]3.1.0 surprises
In-Reply-To: <9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu>
References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov> <9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu>
Message-ID: <1072910887.4464.271.camel@protein.scalableinformatics.com>

Hi James:

Did you rebuild MPICH for this? I noticed the signal handling bit
    using mpiBLAST. Lots of zombies to deal with. Joe On Wed, 2003-12-31 at 16:09, James O'Dell wrote: > For whatever its worth, MPICH works MUCH better when run over rsh that > ssh. It seems as if ssh doesn't pass along > signals nearly as well as rsh. Since enabling rsh and configuring MPICH > to use it, we have had no Zombie jobs > on our compute nodes. When using SSH they were a common occurrence. In > fact, if you look at the MPICH implementation for myrinet, you'll see > the contortions that they use to try and clean up compute nodes when > using ssh. > > Jim > > On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote: > > > On Mon, 29 Dec 2003, landman wrote: > > > >> SSH is too slow. Wow. 5-10 seconds to log in. > > > > Just getting around to this. I did a clean install on our test cluster > > (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal > > user, a "cluster-fork date" command on 4 nodes took under .6 seconds > > > > Sounds like you have some type of DNS issue. Did you get a bad > > /etc/resolv.conf file on the nodes for some reason? > > > >> a) md (e.g. Software RAID): Just try to build one. Anaconda will > >> happily let you do this ... though it will die in the formatting > >> stages. > >> Dropping into the shell (Alt-F2) and looking for the md module (lsmod) > >> shows nothing. Insmod the md also doesn't do anything. Catting > >> /proc/devices shows no md as a character or block device. > > > > The odd bit here is that you can do a > > > > modprobe raid0 > > > > on a running frontend and it gets installed but there is no associated > > "md" module. Was "md" built directly into the kernel? very odd. > > > >> b) ext3. There is no ext3 available for the install. > > > > This is a bit annoying. Nobody really uses ext2 anymore do they? :) Not > > having ext3 as an install option isn't a show stopper for me since I > > can > > do a tune2fs after the fact. But ext3 should be there. 
> > > > Having version 2.0.8 of the myrinet drivers up and running is a big + > > in > > my book. SGE 5.3p5 is also nice to see. > > > > It will be some time before I upgrade any production clusters given the > > differences between Rh 7.3 and WS 3.0. Too big of a jump for me right > > now. > > We first need to convert a couple hundred desktop boxes :) > >
    > > Tim Carlson > > Voice: (509) 376 3423 > > Email: Tim.Carlson at pnl.gov > > EMSL UNIX System Support > > From James_ODell at Brown.edu Wed Dec 31 15:12:59 2003 From: James_ODell at Brown.edu (James O'Dell) Date: Wed, 31 Dec 2003 18:12:59 -0500 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <1072910887.4464.271.camel@protein.scalableinformatics.com> References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov> <9CBB7CF1-3BD5-11D8-9574-0030656A27CC@Brown.edu> <1072910887.4464.271.camel@protein.scalableinformatics.com> Message-ID: <DFF94A81-3BE6-11D8-9574-0030656A27CC@Brown.edu> The cheap way to do it is to grep the bin directory and look for SSH in the execution scripts. You can change them to RSH and MPICH will use RSH to execute. An alternative is to set RSHCOMMAND=rsh during a rebuild. I'm pretty sure that this method accomplishes precisely the same thing as simply editing the execution scripts. Jim On Dec 31, 2003, at 5:48 PM, Joe Landman wrote: > Hi James: > > Did you rebuild MPICH for this? I noticed the signal handling bit > using mpiBLAST. Lots of zombies to deal with. > > Joe > > On Wed, 2003-12-31 at 16:09, James O'Dell wrote: >> For whatever its worth, MPICH works MUCH better when run over rsh that >> ssh. It seems as if ssh doesn't pass along >> signals nearly as well as rsh. Since enabling rsh and configuring >> MPICH >> to use it, we have had no Zombie jobs >> on our compute nodes. When using SSH they were a common occurrence. >> In >> fact, if you look at the MPICH implementation for myrinet, you'll see >> the contortions that they use to try and clean up compute nodes when >> using ssh. >> >> Jim >> >> On Dec 31, 2003, at 3:49 PM, Tim Carlson wrote: >> >>> On Mon, 29 Dec 2003, landman wrote: >>> >>>> SSH is too slow. Wow. 5-10 seconds to log in. >>>
>>> Just getting around to this. I did a clean install on our test
>>> cluster
>>> (Dell 1550 and 1750 boxes). No delays with ssh. As root or a normal
>>> user, a "cluster-fork date" command on 4 nodes took under .6 seconds
>>>
>>> Sounds like you have some type of DNS issue. Did you get a bad
>>> /etc/resolv.conf file on the nodes for some reason?
>>>
>>>> a) md (e.g. Software RAID): Just try to build one. Anaconda will
>>>> happily let you do this ... though it will die in the formatting
>>>> stages.
>>>> Dropping into the shell (Alt-F2) and looking for the md module
>>>> (lsmod)
>>>> shows nothing. Insmod the md also doesn't do anything. Catting
>>>> /proc/devices shows no md as a character or block device.
>>>
>>> The odd bit here is that you can do a
>>>
>>> modprobe raid0
>>>
>>> on a running frontend and it gets installed but there is no
>>> associated
>>> "md" module. Was "md" built directly into the kernel? very odd.
>>>
>>>> b) ext3. There is no ext3 available for the install.
>>>
>>> This is a bit annoying. Nobody really uses ext2 anymore do they? :)
>>> Not
>>> having ext3 as an install option isn't a show stopper for me since I
>>> can
>>> do a tune2fs after the fact. But ext3 should be there.
>>>
>>> Having version 2.0.8 of the myrinet drivers up and running is a big +
>>> in
>>> my book. SGE 5.3p5 is also nice to see.
>>>
>>> It will be some time before I upgrade any production clusters given
>>> the
>>> differences between Rh 7.3 and WS 3.0. Too big of a jump for me right
>>> now.
>>> We first need to convert a couple hundred desktop boxes :) >>> >>> Tim Carlson >>> Voice: (509) 376 3423 >>> Email: Tim.Carlson at pnl.gov >>> EMSL UNIX System Support >>> From bruno at rocksclusters.org Wed Dec 31 15:46:23 2003 From: bruno at rocksclusters.org (Greg Bruno) Date: Wed, 31 Dec 2003 15:46:23 -0800 Subject: [Rocks-Discuss]3.1.0 surprises In-Reply-To: <1072910782.4470.268.camel@protein.scalableinformatics.com> References: <Pine.LNX.4.44.0312311230120.18826-100000@roach.emsl.pnl.gov> <1072910782.4470.268.camel@protein.scalableinformatics.com> Message-ID: <8ABA2E3A-3BEB-11D8-83CE-000A95C4E3B4@rocksclusters.org>
>> Having version 2.0.8 of the myrinet drivers up and running is a big +
>> in
>> my book. SGE 5.3p5 is also nice to see.
>
> I agree, though I would like to see people do a
>
> cluster-fork "/etc/init.d/rcsge stop"
> cluster-fork "chown -R root:root /opt/gridengine/bin /opt/gridengine/utilbin"
> cluster-fork "/etc/init.d/rcsge start"
>
> to fix the compute node sge permissions. Some of the utils don't work
> otherwise.

so we can test the fixes, what utilities need the above changes?

- gb

From landman at scalableinformatics.com Wed Dec 31 21:04:14 2003
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 01 Jan 2004 00:04:14 -0500
Subject: [Rocks-Discuss]looking for a work-around
Message-ID: <1072933453.4463.293.camel@protein.scalableinformatics.com>

Ok, this one is weird. On two different clusters using the same replace-auto-partition.xml I get two completely different behaviors. I am positive this is an anaconda issue, but it could be something else.

Both systems have IDE hard disks. I made the second one (my office system) match the other system, so the IDE hard disks are hda and hdb. Yes, I know this is not ideal, and I know that this should be changed. I am simply trying to match their system.

First the partitioning:

<main>
 <clearpart>--all</clearpart>
 <part> / --size 4096 --ondisk hda </part>
 <part> swap --size 1024 --ondisk hda </part>
 <part> raid.00 --size 1 --grow --ondisk hda </part>
 <part> /tmp --size 4096 --ondisk hdb </part>
 <part> swap --size 1024 --ondisk hdb </part>
 <part> raid.01 --size 1 --grow --ondisk hdb </part>
</main>

On one cluster (my office), this works perfectly. On the other cluster, it fails with:

An unhandled exception has occurred. This is most likely a bug. Please copy the full text of this exception or save the crash dump to a floppy, then file a detailed bug report against anaconda at http://bugzilla.redhat.com/bugzilla/
Traceback (most recent call last):
  File "/usr/bin/anaconda.real", line 1081, in ?
    intf.run(id, dispatch, configFileData)
  File "/var/tmp/anaconda-9.1//usr/lib/anaconda/text.py", line 448, in run
  File "/tmp/ksclass.py", line 799, in __call__
KeyError: swap

[ OK ]  [ Save ]  [ Debug ]

(The question marks in the capture were box-drawing characters from the install dialog.) It appears that this is a python KeyError, which occurs when the element being sought has not been found.

Any ideas?

Joe

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615
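One quick thing worth ruling out (this is only a guess at the failure mode, not a confirmed diagnosis of the anaconda KeyError) is whether the kickstart generator chokes on a repeated target name: in the XML above, "swap" appears once per disk. A sketch that lists the <part> targets and flags repeats, with the partitioning XML inlined for the demo:

```shell
# List the <part> targets in a replace-auto-partition.xml fragment and
# flag any that repeat.  Repeats are only a hint, not proof of a bug:
# one swap per disk is a perfectly normal layout.
xml='<main>
 <clearpart>--all</clearpart>
 <part> / --size 4096 --ondisk hda </part>
 <part> swap --size 1024 --ondisk hda </part>
 <part> raid.00 --size 1 --grow --ondisk hda </part>
 <part> /tmp --size 4096 --ondisk hdb </part>
 <part> swap --size 1024 --ondisk hdb </part>
 <part> raid.01 --size 1 --grow --ondisk hdb </part>
</main>'

# $2 is the first token after "<part>": the mount point / raid target.
dupes=$(printf '%s\n' "$xml" | awk '/<part>/ { print $2 }' | sort | uniq -d)
echo "repeated targets: $dupes"
```

If the cluster that fails has something extra in its generated kickstart (or only differs in package/driver set), comparing the two generated ks.cfg files directly would be the next step.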