The PBS queuing software is used to manage access to the myrinet-4 and
myrinet-3 compute clusters. Reference material on PBS can be found on
the web site http://www.openpbs.org.
Online help is also available, as described below.

The PBS configuration on
the MITgcm facility allows a job script to be written that requests
a set of nodes be made available for exclusive use by that job. If the
nodes (or other resources) that are requested by the job script are
unavailable then the script will sit in a queue and only start executing
once the resources become available.

The PBS software resides in the directory /usr/pbs. Adding the directory
/usr/pbs/bin to your PATH environment variable and the directory
/usr/pbs/man to your MANPATH environment variable will allow you to run
the PBS commands and to get online help for them through the man
command.

A simple example PBS job script is shown below:

    #!/bin/csh
    #
    # Example PBS script to run a job on the myrinet-3 cluster.
    # The lines beginning #PBS set various queuing parameters.
    #
    # o -N Job Name
    #PBS -N examplejob
    #
    #
    # o -l resource lists that control where job goes
    #      here we ask for 3 nodes, each with the attribute "p4".
    #PBS -l nodes=3:p4
    #
    # o Where to write output
    #PBS -e stderr
    #PBS -o stdout
    #
    # o Export all my environment variables to the job
    #PBS -V
    #
    echo $PBS_NODEFILE
    cat $PBS_NODEFILE
    echo 'The list above shows the nodes this job has exclusive access to.'
    echo 'The list can be found in the file named in the variable $PBS_NODEFILE'

Typing this job script into a file and then submitting the file using
the command

    qsub filename

where filename is the name of the file into which the job script
was typed (the command qsub is in the directory /usr/pbs/bin),
should give the following output in a file called stdout:

    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.
    /usr/spool/PBS/aux/22703.cg01
    myrinet-3-03
    myrinet-3-01
    myrinet-3-02
    The list above shows the nodes this job has exclusive access to.
    The list can be found in the file named in the variable $PBS_NODEFILE

This output shows that the job script was allocated the nodes
myrinet-3-03, myrinet-3-01 and myrinet-3-02. The list of nodes
allocated to the job script is read from the file named in the
environment variable $PBS_NODEFILE, which for the test job
above happened to be /usr/spool/PBS/aux/22703.cg01. This path
name will be different for every PBS job, so scripts should always
refer to the $PBS_NODEFILE variable rather than to a fixed path.

The example script does not execute any programs. A real script would
either
- use rsh to log in to the nodes listed in $PBS_NODEFILE and execute
  programs on those nodes, or
- use MPI to start a job that runs in parallel across the set of nodes
  listed in $PBS_NODEFILE.
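
As a minimal sketch of the rsh approach, the job script below runs one command on each node allocated to the job. It is written for /bin/sh rather than csh, on the assumption that the cluster's PBS accepts sh job scripts. The job name rshexample, the helper function run_on_nodes and the RSH override variable are hypothetical names introduced for illustration, not part of PBS.

```shell
#!/bin/sh
#PBS -N rshexample
#PBS -l nodes=3:p4
#PBS -e stderr
#PBS -o stdout
#PBS -V

# Remote shell command to use. It can be overridden (for example with
# RSH=echo) so that the loop can be tried without a cluster.
RSH=${RSH:-rsh}

# run_on_nodes NODEFILE COMMAND [ARGS...]
# Runs COMMAND once on every distinct host listed in NODEFILE.
run_on_nodes() {
    nodefile=$1
    shift
    # sort -u drops repeated host names, in case a host is listed twice
    sort -u "$nodefile" | while read -r node; do
        # Take stdin from /dev/null so the remote shell cannot
        # consume the rest of the node list
        "$RSH" "$node" "$@" < /dev/null
    done
}

# Inside a PBS job PBS_NODEFILE is set; outside a job nothing is run.
if [ -n "${PBS_NODEFILE:-}" ]; then
    run_on_nodes "$PBS_NODEFILE" hostname
fi
```

Here hostname stands in for the real program. For the MPI approach, the MPI launcher installed on the cluster would normally be pointed at $PBS_NODEFILE as its list of machines; the exact option depends on the MPI implementation in use.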
The full list of PBS commands can be found in the directory /usr/pbs/bin,
with online manual information in /usr/pbs/man. The most useful ones are
summarized here:
qsub
Submits a job script. With a file name, qsub reads the job script from
that file; without a file name, it reads the job script from standard
input. Useful options include -I, which starts an interactive job.
A job can also be specified in a single command line. For example,

    qsub -I -l nodes=4:p4

will start an interactive job on four nodes with the attribute "p4"
(myrinet-3, Pentium 4 cluster nodes). Similarly, the command

    qsub -I -l nodes=4:ppn=1:p4

will start an interactive job on four nodes with the attribute "p4",
requesting one processor per node. In both cases the nodes listed in
$PBS_NODEFILE will be exclusive to the job.
qstat
Queries the current job queue and lists its contents. Useful
options include -a, which lists all jobs, and -f, which
gives a full listing of queued jobs. With -f, the listing
includes information on why a job is not currently running. The
meaning of the output from qstat is described on the qstat man page.
qdel jobid
This command will stop a running or queued PBS job identified by jobid.
The parameter jobid is the identifier listed by qstat
and returned by qsub.
pbsnodes -a
This command will list all the nodes that PBS can send jobs
to, together with their attributes. Attributes are arbitrary keys listed
in the properties field. They can be used to control which set of nodes a
job executes on. The node name (for example myrinet-4-01) is also an
attribute that can be specified. Specifying node names as
attributes requests that a job run on those specific nodes. For
example, to run interactively on the specific nodes myrinet-3-27 and
myrinet-3-18, give them in the resource list passed to qsub:

    qsub -I -l nodes=1:myrinet-3-27+1:myrinet-3-18
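
The commands summarized above can be combined in a short sh script. The sketch below is illustrative only: nodes_spec and submit_and_wait are hypothetical helper names, not PBS commands. nodes_spec builds the value for -l nodes= from a list of node names, and submit_and_wait submits a script and polls qstat until the job is gone. It assumes that qstat, given a job identifier, exits with a non-zero status once the job is no longer known to the server.

```shell
#!/bin/sh

# nodes_spec NODE...
# Prints a value for "-l nodes=" that requests each named node once,
# e.g. 1:myrinet-3-27+1:myrinet-3-18
nodes_spec() {
    spec=""
    for node in "$@"; do
        # Join entries with "+", with no separator before the first one
        spec="${spec:+$spec+}1:$node"
    done
    printf '%s\n' "$spec"
}

# submit_and_wait SCRIPT [INTERVAL]
# Submits SCRIPT with qsub, then polls qstat every INTERVAL seconds
# (default 30) until the job is no longer known to the server.
submit_and_wait() {
    script=$1
    interval=${2:-30}
    # qsub prints the job identifier on standard output
    jobid=$(qsub "$script") || return 1
    echo "submitted $jobid"
    # Assumption: "qstat JOBID" exits non-zero once the job has left the queue
    while qstat "$jobid" > /dev/null 2>&1; do
        sleep "$interval"
    done
    echo "finished $jobid"
}
```

With these definitions loaded, an interactive job on two named nodes could be requested with qsub -I -l nodes="$(nodes_spec myrinet-3-27 myrinet-3-18)", and a batch job followed with submit_and_wait filename 60. A job that is no longer wanted can be removed by passing the printed identifier to qdel.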