MITgcm cluster facility

Using the PBS queue software  

The queue software PBS is used to manage access to the myrinet-4 and myrinet-3 compute clusters. Reference material regarding PBS can be found on the web site http://www.openpbs.org, and online help is also available as described below.

The PBS configuration on the MITgcm facility allows a job script to request a set of nodes for exclusive use by that job. If the nodes (or other resources) requested by the job script are unavailable, the script waits in a queue and only starts executing once the resources become available.

The PBS software resides in the directory /usr/pbs. Adding the directory /usr/pbs/bin to your PATH environment variable and the directory /usr/pbs/man to your MANPATH environment variable gives you access to the PBS commands and to their online help through the man command. A simple example PBS job script is shown below:
 
#!/bin/csh
#
# Example PBS script to run a job on the myrinet-3 cluster.
# The lines beginning #PBS set various queuing parameters.
#
# o -N Job Name
#PBS -N examplejob
#
#
# o -l resource list that controls where the job runs
#      here we ask for 3 nodes, each with the attribute "p4".
#PBS -l nodes=3:p4
#
# o Where to write output
#PBS -e stderr
#PBS -o stdout
#
# o Export all my environment variables to the job
#PBS -V
#
echo $PBS_NODEFILE
cat  $PBS_NODEFILE
echo 'The list above shows the nodes this job has exclusive access to.'
echo 'The list can be found in the file named in the variable $PBS_NODEFILE'

Typing this job script into a file and submitting the file with the command
qsub filename
where filename is the name of the file containing the job script (the qsub command is in the directory /usr/pbs/bin), should produce output like the following in a file called stdout:
 
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/usr/spool/PBS/aux/22703.cg01
myrinet-3-03
myrinet-3-01
myrinet-3-02
The list above shows the nodes this job has exclusive access to.
The list can be found in the file named in the variable $PBS_NODEFILE

The first two lines are a harmless warning from csh, printed because the job runs without a terminal. The remaining output shows that the job script was allocated the nodes myrinet-3-03, myrinet-3-01 and myrinet-3-02. The list of nodes allocated to a job script is accessed through the file named in the environment variable $PBS_NODEFILE, which for the test job above happened to be /usr/spool/PBS/aux/22703.cg01. This path name is different for every PBS job, so scripts should always refer to the node list through the $PBS_NODEFILE variable. The example script does not execute any programs; a real script would either
- use rsh to log in to the nodes given in $PBS_NODEFILE and execute programs on those nodes, or
- use MPI to start a job that runs in parallel across the set of nodes listed in $PBS_NODEFILE.
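A minimal sketch of the rsh approach is given here. It is written for /bin/sh rather than the csh used in the example above (the #PBS -S /bin/sh line asks PBS to interpret the script with sh), and it assumes that rsh between the compute nodes works without a password. REMOTE_SHELL, unique_nodes, count_slots and run_on_nodes are hypothetical names introduced for illustration, not part of PBS. When ppn is greater than 1 a node name appears in $PBS_NODEFILE once per processor, so the sketch runs the command once per distinct node. The commented mpirun line only indicates the shape of the MPI route; its flags are an MPICH-style assumption and may differ on this cluster.

```shell
#!/bin/sh
#PBS -N nodeloop
#PBS -l nodes=3:p4
#PBS -S /bin/sh
#PBS -e stderr
#PBS -o stdout
#
# Sketch: run a command once on each node allocated to this job.
# Assumes password-free rsh between cluster nodes. REMOTE_SHELL,
# unique_nodes, count_slots and run_on_nodes are hypothetical names.

REMOTE_SHELL=${REMOTE_SHELL:-rsh}

# Print each node name in the node file once, in first-seen order
# (with ppn>1 a node name is repeated once per processor).
unique_nodes() {
    awk '!seen[$0]++' "$1"
}

# Count the lines (processor slots) in the node file.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Run the command given after the node file on every distinct node.
run_on_nodes() {
    nodefile=$1
    shift
    for node in $(unique_nodes "$nodefile"); do
        echo "== $node"
        "$REMOTE_SHELL" "$node" "$@"
    done
}

# Only act inside a PBS job, where PBS sets PBS_NODEFILE.
if [ -n "$PBS_NODEFILE" ]; then
    # PBS_O_WORKDIR is the directory qsub was run from.
    cd "${PBS_O_WORKDIR:-.}" || exit 1
    run_on_nodes "$PBS_NODEFILE" hostname
    np=$(count_slots "$PBS_NODEFILE")
    echo "$np processor slot(s) allocated"
    # MPI route (MPICH-style flags are an assumption; check the
    # Parallel Execution page for the launcher used on this cluster):
    # mpirun -np "$np" -machinefile "$PBS_NODEFILE" ./mitgcmuv
fi
```

Setting REMOTE_SHELL=echo before calling run_on_nodes prints the commands instead of executing them, which is a cheap way to check the node handling outside a job.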
The full list of PBS commands can be found in the directory /usr/pbs/bin, with online manual information in /usr/pbs/man. The most useful ones are summarized below:
 
qsub Submits a job script. Given a file name, qsub reads the job script from that file; without a file name it reads the job script from standard input. A useful option is -I, which starts an interactive job, so a job can be specified entirely on a single command line. For example, qsub -I -l nodes=4:p4 starts an interactive job on four nodes with the attribute "p4" (myrinet-3, Pentium 4 cluster nodes). The command qsub -I -l nodes=4:ppn=1:p4, which also states explicitly that one processor per node is wanted, likewise starts an interactive job on four nodes with the attribute "p4". In both cases the nodes listed in $PBS_NODEFILE are reserved exclusively for the job.
qstat Queries the current job queue and lists its contents. Useful options include -a, which lists all jobs, and -f, which gives a full listing of queued jobs. With -f the listing includes information on why a job is not currently running. The meaning of the qstat output is described on the qstat man page.
qdel jobid Removes the queued or running PBS job identified by jobid. The jobid parameter is the job identifier that qsub prints when the job is submitted and that qstat lists.
pbsnodes -a Lists all the nodes that PBS can send jobs to, together with their attributes. Attributes are arbitrary keys listed in the properties field, and they can be used to control which set of nodes a job executes on. The node name (for example myrinet-4-01) is also an attribute that can be specified; specifying node names as attributes requests that a job run on those specific nodes. For example, to run interactively on the specific nodes myrinet-3-27 and myrinet-3-18, list them in the node specification given to qsub: qsub -I -l nodes=1:myrinet-3-27+1:myrinet-3-18
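Node specifications like the one above can be built by a script rather than typed by hand. The following is a minimal sketch for /bin/sh, assuming the usual OpenPBS layout of pbsnodes -a output: each node name at the start of a line, followed by indented "key = value" lines such as "state = free" and "properties = p4". The helper names node_spec and free_nodes_with_property are hypothetical, not PBS commands.

```shell
#!/bin/sh
# Sketch of two node-selection helpers (hypothetical names).
# Assumes "pbsnodes -a" prints each node name in column 1 followed
# by indented "key = value" lines, e.g. "properties = p4,myrinet3".

# Build a value for qsub -l nodes= asking for one slot on each node,
# e.g. node_spec myrinet-3-27 myrinet-3-18 -> 1:myrinet-3-27+1:myrinet-3-18
node_spec() {
    spec=
    for n in "$@"; do
        spec=${spec:+$spec+}1:$n
    done
    printf '%s\n' "$spec"
}

# Read "pbsnodes -a" output on stdin and print, sorted, the nodes whose
# comma-separated properties include the given property and whose
# state is exactly "free".
free_nodes_with_property() {
    awk -v want="$1" '
        /^[^ \t]/          { node = $1; next }
        $1 == "state"      { state[node] = $3 }
        $1 == "properties" {
            n = split($3, p, ",")
            for (i = 1; i <= n; i++) if (p[i] == want) has[node] = 1
        }
        END { for (nd in has) if (state[nd] == "free") print nd }
    ' | sort
}

# Typical use on the cluster (commented out so the file can be sourced anywhere):
# pbsnodes -a | free_nodes_with_property p4
# qsub -I -l nodes="$(node_spec myrinet-3-27 myrinet-3-18)"
```

A node reported as free may still be taken by another job before yours starts, so a specific-node request can wait in the queue like any other.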