CIS 210 February 2013
Sun/Oracle Grid Engine is:
• A quick and easy way to set up a multi-cluster system using existing hardware
• Oracle Grid Engine is billed as the most widely deployed workload
  management solution in the industry, with unmatched scalability. On top of
  a rich set of advanced scheduling capabilities and the flexibility to adapt
  to any computing environment and application workload, it offers
  comprehensive support for the cloud computing model.
How to Install
• Via webappl.blogspot.com:
  http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
Install SGE on master node:
    mpiuser@ub0:~$ sudo apt-get install gridengine-client gridengine-common \
        gridengine-master gridengine-qmon gridengine-exec
    # Remove gridengine-exec from the list if the master node is not supposed to run jobs.
    # During the installation, we need to set the cluster CELL name (such as 'default').
Install SGE on other nodes:
    mpiuser@ub1:~$ sudo apt-get install gridengine-client gridengine-exec
• Set the CELL name to the same value as on the master node.
Set SGE_ROOT and SGE_CELL
• $SGE_ROOT is the installation path of SGE.
• $SGE_CELL is the cell name, which is 'default' on our machines.
• Edit /etc/profile and /etc/bash.bashrc and add the following two lines:
    export SGE_ROOT=/var/lib/gridengine    # this is the path on our machines
    export SGE_CELL=default
• Source the script:
    source /etc/profile
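The edit above can also be scripted. The following is a minimal sketch, not part of the original notes: the function name add_sge_env is hypothetical, and the default values are simply the ones quoted for the machines in these slides. It appends each export line only when the file does not already contain one, so running it twice does not duplicate the entries.

```shell
#!/bin/bash
# Hypothetical helper: append the SGE variables to a profile file if missing.
# $1 = profile file, $2 = SGE_ROOT (optional), $3 = SGE_CELL (optional).
# The defaults are the values from these slides and may differ on your system.
add_sge_env() {
    local profile="$1"
    local root="${2:-/var/lib/gridengine}"
    local cell="${3:-default}"
    # Only append a line when no matching export is already present.
    grep -q '^export SGE_ROOT=' "$profile" 2>/dev/null ||
        echo "export SGE_ROOT=$root" >> "$profile"
    grep -q '^export SGE_CELL=' "$profile" 2>/dev/null ||
        echo "export SGE_CELL=$cell" >> "$profile"
}
```

Run as root, the calls would be add_sge_env /etc/profile and add_sge_env /etc/bash.bashrc, followed by source /etc/profile.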
Configure SGE with qmon
(This section is modified from a note by Junjun Mao.)
• Invoke qmon as superuser:
    mpiuser@ub0:~$ sudo qmon
• On our machine, qmon failed to start due to missing fonts ('-adobe-helvetica-…').
• To solve the fonts problem:
    mpiuser@ub0:~$ sudo apt-get install xfs xfstt
    mpiuser@ub0:~$ sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree \
        ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi
    mpiuser@ub0:~$ sudo reboot
    # After the reboot, the problem is gone.
Configure hosts
• "Host Configuration" -> "Administration Host" -> Add the master node and
  other administrative nodes
• "Host Configuration" -> "Submit Host" -> Add the master node and other
  submit nodes
• "Host Configuration" -> "Execution Host" -> Add the slave nodes
• Click "Done" to finish
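For clusters without a working qmon, the administrative and submit host steps can also be done with qconf on the command line. The sketch below is an illustration rather than part of the original notes: the function name print_host_setup is hypothetical, and it only prints the commands so they can be reviewed before being run by an SGE administrator.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the qconf commands that would register
# each host given as an argument. Nothing is executed against the cluster.
print_host_setup() {
    local host
    for host in "$@"; do
        echo "qconf -ah $host"   # -ah: add an administrative host
        echo "qconf -as $host"   # -as: add a submit host
    done
}
```

For example, print_host_setup ub0 prints the two commands for the master node. Execution hosts are added with qconf -ae, which opens an editor on a host template, so that step is left out of the sketch.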
Configure the user
• Add or delete the users that are allowed to access SGE here. In this
  example, a user is added to an existing group, and this group is later
  allowed to submit jobs. Everything else is left at its default value.
• "User Configuration" -> "Userset" -> Highlight userset "arusers" and click
  "Modify" -> Enter the user name in the "User/Group" field
• Click "Done" to finish
Configure the queue
• Host Configuration deals with which computing resources are available, and
  User Configuration defines who has access to those resources. Queue Control
  defines how hosts and users are connected.
Queue Control
• "Queue Control" -> "Hosts" -> Confirm that the execution hosts show up there.
• "Queue Control" -> "Cluster Queues" -> Click "Add" -> Name the queue and add
  the execution nodes to the Hostlist; then:
  - "User Access" -> Allow access to user group arusers
  - "General Configuration" -> Field "Slots" -> Raise the number to the total
    number of CPU cores on the slave nodes (a number larger than the actual
    core count is OK)
• "Queue Control" -> "Queue Instances" -> This is the place to manually assign
  hosts to queues and to control the state (active, suspended, ...) of hosts.
Configure parallel environment
• "Queue Control" -> "Cluster Queues" -> Select a queue that will run parallel
  jobs -> Click "Modify" -> "Parallel Environment" -> Click the "PE" icon
  below the right and left arrows -> Click "Add" and fill in:
    name            = (name of the PE)
    slots           = 999
    start_proc_args = $SGE_ROOT/mpi/startmpi.sh $pe_hostfile
    stop_proc_args  = $SGE_ROOT/mpi/stopmpi.sh
    allocation_rule = $fill_up
    "Control slaves" checked
• Make sure the configured PE is moved from "Available PE" to "Referenced PE".
• Confirm and close all config windows, then open "Queue Control" ->
  "Cluster Queues" -> "Parallel Environment" again; the named PE should show up.
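The same parallel environment can be described in a text file and loaded with qconf instead of clicking through qmon. The sketch below is an assumption-laden illustration: the function name write_pe_config is hypothetical, and the fields not mentioned in the steps above (user_lists, xuser_lists, job_is_first_task, urgency_slots, accounting_summary) are filled with what are believed to be the usual defaults. It is worth comparing the result against the output of qconf -sp for an existing PE on your version before loading it.

```shell
#!/bin/bash
# Hypothetical helper: write a PE definition to a file.
# $1 = PE name, $2 = output file.
# Values for slots, start/stop_proc_args, allocation_rule and control_slaves
# come from the qmon steps; the remaining fields are assumed defaults.
write_pe_config() {
    local pe_name="$1" out="$2"
    local root="${SGE_ROOT:-/var/lib/gridengine}"
    # \$ keeps $pe_hostfile and $fill_up literal; SGE expands them itself.
    cat > "$out" <<EOF
pe_name            $pe_name
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    $root/mpi/startmpi.sh \$pe_hostfile
stop_proc_args     $root/mpi/stopmpi.sh
allocation_rule    \$fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
}
```

After write_pe_config mpi mpi.pe, an administrator could load the file with qconf -Ap mpi.pe and attach the PE to a queue with qconf -aattr queue pe_list mpi myqueue, where mpi and myqueue are placeholder names.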
• Once created and linked to a queue, the PE can also be edited from
  "Queue Control" -> "PE".
Check whether SGE hosts are running properly
    mpiuser@ub0:~$ qhost        # should list the system info from all nodes
    mpiuser@ub0:~$ qconf -sel   # should list the hostnames of the execution nodes
    mpiuser@ub0:~$ qconf -sql   # should list the queues
    mpiuser@ub0:~$ ps aux | grep sge_qmaster | grep -v grep   # check master daemon
    mpiuser@ub0:~$ ps aux | grep sge_execd | grep -v grep     # check execute daemon
    mpiuser@ub1:~$ ps aux | grep sge_execd | grep -v grep     # check execute daemon
    # If the sge_qmaster or sge_execd daemon is not running, try starting it as a service:
    # mpiuser@ub0:~$ sudo service gridengine-master start
    # mpiuser@ub1:~$ sudo service gridengine-exec start
    …
    # Reboot the node(s) if sge_qmaster or sge_execd fails to start.
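The daemon checks can be wrapped in a small function so that a missing daemon produces a non-zero exit status, which is handy in a cron job or a login script. This is a sketch under the assumption that pgrep is installed (it ships with procps on Ubuntu); the function name check_daemon is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether a process with exactly this name exists.
# Returns 0 if it is running and 1 otherwise.
check_daemon() {
    local name="$1"
    if pgrep -x "$name" > /dev/null; then
        echo "$name: running"
    else
        echo "$name: NOT running"
        return 1
    fi
}
```

On the master node one would call check_daemon sge_qmaster, and on every execution node check_daemon sge_execd.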
Run a test script
• Make a script named 'test' with the following content:
    #!/bin/bash
    ### Request Bash as the shell for the job
    #$ -S /bin/bash
    ### Use the current directory as the working directory
    #$ -cwd
    ### Name the job:
    #$ -N test
    echo "Running environment:"
    env
    echo "============================="
    ### end of script
Job Submission
• To submit the job:
    qsub test
    # A job id is returned if the submission succeeds.
• To query the job status:
    qstat
• If the job runs successfully, two output files are produced in the current
  working directory: test.oXXX (the standard output) and test.eXXX (the
  standard error), where test is the job name and XXX is the job id.
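To locate the two output files from a script, the job id can be pulled out of the confirmation line that qsub prints. The sketch below assumes the usual single-job message of the form Your job 42 ("test") has been submitted; array jobs use a different wording and are not handled. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read qsub's confirmation line on stdin and print the
# numeric job id. Prints nothing if the line does not match.
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Hypothetical helper: print the stdout and stderr file names for a job,
# following the <name>.o<id> and <name>.e<id> convention described above.
# $1 = job name, $2 = job id.
output_files() {
    local name="$1" id="$2"
    echo "$name.o$id $name.e$id"
}
```

A typical use would be id=$(qsub test | job_id_from_qsub) followed by output_files test "$id".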
Always check your logs
• Check the log messages if an error occurs:
    mpiuser@ub0:~$ less /var/spool/gridengine/qmaster/messages      # master node
    mpiuser@ub0:~$ less /var/spool/gridengine/execd/ub0/messages    # exec node
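When the messages files grow long, it can help to show only the serious entries. The sketch below assumes that each line is pipe-separated with the severity letter in the fourth field (I, W, E or C), which matches the files we have seen but should be checked against your own logs. The function name sge_errors is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print only error (E) and critical (C) log lines.
# Assumes the layout date time|thread|host|level|message.
# Reads the files given as arguments, or stdin if none are given.
sge_errors() {
    awk -F'|' '$4 == "E" || $4 == "C"' "$@"
}
```

For example: sge_errors /var/spool/gridengine/qmaster/messages | less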
Possible Errors
• Question: My output file has the warning "Warning: no access to tty (Bad
  file descriptor). Thus no job control in this shell."
• Answer: This warning appears when tcsh or csh is used as the shell for the
  job. It is safe to ignore. Alternatively, submit with qsub -S /bin/bash to
  run your program in a different shell, or add the line '#$ -S /bin/bash' to
  the job script.
Possible Errors
• Question: The master host failed to respond properly. The error message is:
    error: commlib error: access denied (client IP resolved to host name 'ub0…'. This is not identical to clients host name 'ub0')
    error: unable to contact qmaster using port 6444 on host 'ub0'
• Answer: Reboot the master node, or install SGE from source code on the
  master node (these solutions are not confirmed yet). The error can also
  occur when the gethostname utility (full path
  '/usr/lib/gridengine/gethostname' on our machines) returns a different
  hostname from the command 'hostname -f'. If that is the case (e.g., the
  host has multiple network interfaces), create a file named 'host_aliases'
  under '$SGE_ROOT/$SGE_CELL/common' and populate it as follows:
    # cat host_aliases
    ub0 ub0.my.com ub0-grid
    ub1 ub1.my.com ub1-grid
    ub2 ub2.my.com ub2-grid
    ub3 ub3.my.com ub3-grid
  Then restart the gridengine daemon (see the man page of sge_host_aliases
  for details) and check the aliases:
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0-grid
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0
    # Both of them should return ub0.
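Before restarting the daemons, the aliases file itself can be sanity-checked offline. The sketch below is only a local approximation of the lookup and not a replacement for the gethostname utility, which also consults the system resolver. The function name resolve_alias is hypothetical; it prints the first name on the line that contains the given name.

```shell
#!/bin/bash
# Hypothetical helper: look a host name up in a host_aliases-style file and
# print the primary name (first column) of the matching line.
# $1 = name to look up, $2 = path to the aliases file.
resolve_alias() {
    local name="$1" file="$2"
    awk -v n="$name" '
        /^#/ { next }                      # skip comment lines
        { for (i = 1; i <= NF; i++)        # compare against every column
              if ($i == n) { print $1; exit } }
    ' "$file"
}
```

For example: resolve_alias ub0-grid "$SGE_ROOT/$SGE_CELL/common/host_aliases"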
Sources
• http://manpages.ubuntu.com/manpages/jaunty/man5/sge_conf.5.html
• http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
• http://pka.engr.ccny.cuny.edu/~jmao/node/49
• http://webappl.blogspot.com/2011/05/setting-up-mpich2-cluster-with-ubuntu.html

Working with Oracle/Sun Grid Engine
