NAME
bsub - submit a job for batched execution by the lsbatch system
SYNOPSIS
bsub [ -h ] [ -V ] [ -x ] [ -r ] [ -N ] [ -B ]
[ -q queue_name ... ] [ -m host_name ... ] [ -n num_processors ] [ -R res_req ]
[ -J job_name ] [ -b begin_time ] [ -t term_time ]
[ -i in_file ] [ -o out_file ] [ -e err_file ] [ [ -f "lfile op [ rfile ]" ] ... ]
[ -c cpu_limit[/host_spec] ] [ -W run_limit[/host_spec] ] [ -F file_limit ]
[ -M mem_limit ] [ -D data_limit ] [ -S stack_limit ] [ -C core_limit ]
[ -k chkpnt_dir [ chkpnt_period ] ] [ -s sigval ]
[ -w depend_cond ] [ -E "pre_exec_command [ argument ... ]" ]
[ -L login_shell ]
[ command [ argument ... ] ]
DESCRIPTION
Submit a job for batched execution on host(s) that satisfy the job's resource requirements and can provide a fast turnaround time. If the load on all the candidate hosts is too high, or some condition configured in lsbatch is not satisfied, the job is held in a queue and executed later, when system resources become available and the conditions are satisfied. This allows the system to restrict the number of jobs executed simultaneously so as to keep system overhead low, and to adjust the number of jobs based on the current system load. Jobs are started and suspended according to the current system load.
The job is submitted to a batch job queue configured in the lsbatch system in the local LSF cluster. To get information about the queues, see bqueues(1). lsbatch may automatically select an appropriate queue for your job if you do not specify one (see the -q option below). If the job is successfully submitted, a unique job ID (a positive number) is printed together with the name of the queue to which the job has been submitted. You can later use this job ID to operate on the job.
The batch job can be specified by the command line argument command, or through the standard input if the command is not present on the command line. The command can be anything that is provided to a UNIX Bourne shell (see sh(1) ). command is assumed to begin with the first word that is not part of a bsub option. All arguments that follow command are provided as the arguments to the command.
If the batch job is not given on the command line, bsub reads the job commands from standard input. If the standard input is a controlling terminal, the user is prompted with "bsub>" for the commands of the job. The input is terminated by entering CTRL-D on a new line. You can submit multiple commands through standard input; the commands are executed in the order in which they are given. bsub options can also be specified in the standard input if the line begins with #BSUB, e.g., "#BSUB -x". If an option is given both on the bsub command line and in the standard input, the command line option overrides the option in the standard input. The user can specify the shell to run the commands by giving the shell path name in the first line of the standard input, e.g., "#! /bin/csh". If the shell is not given in the first line, the Bourne shell is used. The standard input facility can be used to spool a user's job script, e.g., "bsub < script". See EXAMPLES below for examples of specifying commands through standard input.
The user's execution environment, including the current working directory, file creation mask, and all the environment variables, is set for the batch job. In addition, the following lsbatch environment variables are set before starting the batch job:
LSB_JOBID
This is the ID of the job assigned by the lsbatch system, as shown by bjobs(1).
LSB_HOSTS
This is the list of hosts that are used to run the batch job. For sequential jobs, it contains only one host name. For parallel jobs, it contains multiple host names separated by spaces. A host name may be repeated if multiple components of a parallel job are allocated on the same host.
LSB_QUEUE
This is the name of the queue from which the job is dispatched.
LSB_JOBNAME
This is the name of the job. The name of a job can be specified explicitly when the user submits the job (see the -J option below). If the user does not specify a job name, the job name is the last 60 characters of the job's command line.
When a job is executed, command line and stdout/stderr buffers are stored in the directory home_directory/.lsbatch on the execution host. The directory given in the /etc/passwd file on the execution host is used as the job's home directory. If this directory is not accessible, /tmp/lsbtmp<userId> is used as the job's home directory. If the current working directory is under the home directory on the submission host, then the current working directory is also set to be the same relative directory under the home directory on the execution host. The job is run in /tmp if the current working directory is not accessible on the execution host.
Parallel jobs are typically submitted with the -n option having a value greater than one. This allows the job to initially use multiple processors. The job is dispatched to, and started on, the first host chosen by the lsbatch system, and the environment variable LSB_HOSTS contains the list of chosen host names.
OPTIONS
-h Print command usage to stderr and exit.
-V Print LSF release version to stderr and exit.
-x Exclusive execution mode. The job is exclusively executed on a host. There are no other batch jobs running on the host when the job starts, and no further job, batch or interactive, is dispatched to run on the host until the job completes. You cannot submit an exclusive job unless the queue is configured to allow exclusive jobs.
-r Specify that the job can be rerun. If the execution host of the job is considered unavailable, the lsbatch system requeues this job in the same job queue and reruns it from the start when a suitable host is found, as if it were submitted as a new job. A new job ID is assigned. The user who submitted the failed job receives mail informing them of the job failure, the requeueing of the job, and the new job ID.
For a job that is checkpointed (see -k option and bchkpnt(1) ) before the execution host becomes unavailable, the job is restarted from the last checkpoint. The restarted job is requeued for execution in the same manner as for a job that is restarted using the brestart command (see brestart(1) ). In order for the job to be successfully restarted, the job's checkpoint directory must reside in a shared filesystem accessible to the previous host and the host receiving the restarted job.
-N Send notification of the job's termination to the submitter by mail when the job finishes. For the default, see the -o option below.
-B Send mail to the submitter when the job is dispatched and begins execution. The default is not to send such mail.
-q queue_name ...
Submit the job to one of the queues specified by queue_name .... This can be either a single queue name, or a list of queue names defined in the lsbatch system. In the latter case, the list must be enclosed in quotation marks (" " or ' '). Queues are usually named to correspond to the type of jobs usually submitted to them, or to the type of services they provide.
When a list of queue names is specified, lsbatch selects an appropriate queue in the list for the job based on the job's resource limits and other restrictions, such as the requested host(s), the user's accessibility to a queue, queue status (closed or open), whether a queue can accept exclusive jobs, etc. The queues are considered in the order in which they are listed; the queue listed first is considered first.
If this option is absent, the user default queue list specified by the user's environment variable LSB_DEFAULTQUEUE is used for the queue selection. If neither this option nor LSB_DEFAULTQUEUE is present, the system default queue list specified by the LSF administrator in the lsb.params configuration file is used (see lsbatch(5) for parameter DEFAULT_QUEUE).
-m host_name ...
Limit the candidate hosts for executing this job to those specified by host_name .... This can be either a single host name, or a list of host names or host group names defined by the lsbatch system. In the latter case, the list must be enclosed in quotation marks (" " or ' '). You can find the membership of a host group using the bmgroup command. If a job queue is specified using the -q option, then the host list of that queue (see bqueues(1)) must include all the hosts specified by this option for the job to be acceptable. The default is to use those hosts of the queue that satisfy the -R option as candidates.
-n num_processors
The initial number of processors needed by a (parallel) job. The
default is 1. After accepting a parallel job, lsbatch searches for
hosts that both meet the resource requirements of the job and are
lightly loaded. Once the specified number of such processors is
available (some may be on the same multiprocessor host), the job is
dispatched to the first host selected, with the list of selected host
names for the job specified in the environment variable LSB_HOSTS.
The job itself is expected to start parallel components on these hosts
and establish communication among them, optionally using the Remote
Execution Server (RES) of LSF.
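For example, a parallel job script might read LSB_HOSTS to start its components, here using the LSF lsrun command (a minimal sketch; my_component is a hypothetical program name):
#! /bin/sh
# Start one copy of the component on each host allocated by lsbatch;
# LSB_HOSTS contains the allocated host names separated by spaces.
for host in $LSB_HOSTS
do
    lsrun -m $host my_component &
done
# Wait for all remote components to finish.
wait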
-R res_req
Resource requirement. If this option is not specified, the lsbatch system tries to obtain resource requirement information for the command from the remote task list that is maintained by the load sharing library (see lsfintro(1)). If the command is not listed in the remote task list, the default resource requirement is to run the command on a host or hosts of the same host type (see lshosts(1)) as the submission host.
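For example, the following submits my_program to run on a host with more than 20 megabytes of available swap space and 10 megabytes of available memory (an illustrative requirement; the resource names available depend on your cluster's LSF configuration, see lsinfo(1)):
% bsub -R "swp>20 && mem>10" my_program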
-J job_name
Assign the character string specified by job_name to the job. You can
later use this job_name to identify this job. The default job_name is
the command. The job name need not be unique.
-b begin_time
Dispatch the job for execution on or after begin_time. begin_time is
in the form of [[month:]day:]hour:minute where month is 1-12, day is
1-31, hour is 0-23, and minute is 0-59. The time refers to the next
matching wall clock time. The default is to start the job as soon as
possible. If -b is used, then at least two fields must be given.
These fields are assumed to be hour:minute. If three fields are given,
they are assumed to be day:hour:minute, and four fields are assumed to
be month:day:hour:minute.
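For example, the following dispatches my_program on or after 13:30 on the next December 25, giving all four fields as month:day:hour:minute:
% bsub -b 12:25:13:30 my_program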
-t term_time
The job termination deadline. If the job is still running by
term_time, then it is sent a USR2 signal. If the job does not terminate
within 10 minutes after being sent this signal, then it is
killed. term_time is in the same form as begin_time for the -b
option. The default is to allow the job to run as long as its
resource limits permit.
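For example, the following sets a termination deadline at the next 18:00 wall clock time; the job is sent a USR2 signal at that time, and killed 10 minutes later if it is still running:
% bsub -t 18:00 my_program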
-i in_file
The batch job gets its standard input from the file in_file. in_file is a file path name. The default is /dev/null (no input). If the file in_file is not found on the execution host, the file is copied from the submission host to a temporary file in the user's $HOME/.lsbatch directory on the execution host. This file is removed when the job completes. The file copy can be performed only if RES is running on the submission host, or if the user has allowed rcp access (see rcp(1)).
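For example, the following runs my_program with its standard input taken from the file my_input:
% bsub -i my_input my_program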
-o out_file
Store the standard output of the job in the file out_file. If the file out_file already exists, the job output is appended to it. out_file is a file path name. If -e is not present, the standard error of the job is also stored in the file out_file. If -N is not present, the information about the job's termination is written as the header of the file out_file and no mail is sent to the submitter of the job. If -o is not specified, then the same information that would otherwise be stored in out_file is sent by mail to the submitter.
-e err_file
Store the standard error output of the job in the file err_file. For the default, see the -o option.
-f lfile op [ rfile ]
Copy a file between the local (submission) host and the remote (execution)
host. lfile/rfile can be an absolute or a relative path name of
a file that is available on the local/remote host. If rfile is not
specified, rfile defaults to lfile. Use multiple -f options to
specify multiple files.
op is an operator that specifies whether the file is copied to the remote host, or whether it is copied back from the remote host. op must be surrounded by white space. The following describes the op operators:
`>' copy lfile to rfile before job starts. rfile is overwritten if it exists.
`<' copy rfile to lfile after the job completes. lfile is overwritten if it exists.
`<<' append rfile to lfile after the job completes. lfile is created if it does not exist.
`><' and `<>' : equivalent to performing `>' and then the `<' operation. `<>' is the same as `><'.
The stdin file is copied to a temporary file on the remote host at execution time if it is not found on that host (see the -i option description). The stdout and stderr files must be explicitly specified using the -f option if the user wants those files to be copied back to the submission host when job execution completes.
If the local and remote hosts have different file name spaces, you must always specify relative path names. If the local and remote hosts do not share the same file system, you must ensure that the directory containing rfile exists. It is recommended that the file name be given only for rfile when running in heterogeneous file systems; this places the file in the job's current working directory. If the file is shared between the submission and execution hosts, then no file transfer is performed.
Files can be transferred only if RES is running on the local host at execution time, or if the user has allowed rcp access (see rcp(1) ). Jobs that are submitted from LSF client hosts should specify the -f option only if rcp is allowed.
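For example, the following copies the file data.in from the submission host to the execution host before the job starts, and copies results.out back after the job completes (illustrative file names; rfile defaults to lfile in both cases):
% bsub -f "data.in >" -f "results.out <" my_program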
-c cpu_limit[/host_spec]
Set the per-process UNIX soft CPU time limit to cpu_limit for each of the processes belonging to this batch job (see getrlimit(2)). The default is no limit. This option is useful for preventing erroneous jobs from running away or using up too many resources. A SIGXCPU signal is sent to a process by UNIX when it has accumulated the specified amount of CPU time. If the job has no signal handler for SIGXCPU, this causes it to be killed. On HP-UX, a CPU limit cannot be set, so this option has no effect.
cpu_limit is in the form [hour:]minute, where minute can be greater than 59; three and a half hours can be specified either as 3:30 or as 210. Optionally, a host name or a host model name defined in LSF can be given as host_spec following cpu_limit and a `/' character. (See lsinfo(1) to get host model information.) host_spec is also used in the -W option. In its absence, the system default is assumed (see the description of DEFAULT_HOST_SPEC in the "lsb.queues file" section of lsbatch(5)); if the system default is not defined, the host model of the local machine is assumed. The CPU scaling factors defined in LSF are used to adjust the actual CPU time limit at the execution host: the limit is multiplied by the factor of host_spec and divided by the factor of the execution host.
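For example, the following sets a CPU time limit of three and a half hours, normalized to the host model sparc (a hypothetical model name; see lsinfo(1) for the host models defined in your cluster):
% bsub -c 3:30/sparc my_program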
-W run_limit[/host_spec]
Set the wall-clock run time limit of this batch job. The default is
no limit. If the accumulated time the job has spent in RUN state
exceeds this limit, the job is sent a USR2 signal. If the job does not
terminate within 10 minutes after being sent this signal, it is
killed. run_limit is in the same form as cpu_limit of the -c option.
host_spec is the same as in the -c option.
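For example, the following limits the job to two and a half hours of wall-clock run time:
% bsub -W 2:30 my_program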
-F file_limit
Set a per-process (soft) file size limit for each of the processes that belong to this batch job (see getrlimit(2)). The default is no soft limit. If a process of this job attempts to write to a file such that the file size would increase beyond file_limit Kbytes, that process is sent a SIGXFSZ signal. This condition normally terminates the process, but may be caught. On HP-UX, a file size limit cannot be set, so this option has no effect.
-M mem_limit
Set a per-process (soft) resident set size limit to mem_limit Kbytes for each of the processes that belong to this batch job (see getrlimit(2)). The default is no soft limit. Exceeding this limit when free physical memory is in short supply results in a low scheduling priority being assigned to the process; that is, the process is re-niced. On HP-UX and Sun Solaris 2.x, a resident set size limit cannot be set, so this option has no effect.
-D data_limit
Set a per-process (soft) data segment size limit for each of the processes that belong to this batch job (see getrlimit(2)). The default is no soft limit. An sbrk(2) call to extend the data segment beyond data_limit Kbytes will return an error. On HP-UX, a data size limit cannot be set, so this option has no effect.
-S stack_limit
Set a per-process (soft) stack segment size limit for each of the processes that belong to this batch job (see getrlimit(2)). The default is no soft limit. On HP-UX, a stack size limit cannot be set, so this option has no effect.
-C core_limit
Set a per-process (soft) core file size limit for all the processes that belong to this batch job (see getrlimit(2)). The default is no soft limit. If a process of this job attempts to create a core file larger than core_limit Kbytes, then, depending on the UNIX platform, that process is either sent a SIGXFSZ signal or the writing of the core file is truncated at this limit. On HP-UX, a core file size limit cannot be set, so this option has no effect.
-k chkpnt_dir [ chkpnt_period ]
The job is specified as checkpointable. Optionally, a checkpoint period of chkpnt_period minutes may be specified. When the job is checkpointed, the checkpoint information is stored in chkpnt_dir. chkpnt_period must be a positive integer. If chkpnt_period is given, the running job is checkpointed automatically every chkpnt_period minutes. The checkpoint period can also be changed after the job has been submitted (see bchkpnt(1)). Because checkpointing is a heavyweight operation, it is suggested that the checkpoint period be greater than half an hour. Quotation marks (") or (') must surround chkpnt_dir and chkpnt_period if the checkpoint period is given, e.g., -k "job1chkdir 10". The checkpoint directory can be a relative or absolute path name, and is used for restarting the job (see brestart(1)). Process checkpointing is not available on all host types, and may require linking programs with a special library (see libckpt.a(3)). If this option is not specified, the job is considered non-checkpointable.
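For example, the following submits my_program as a checkpointable job whose state is saved in the directory my_chkdir every 30 minutes (quotation marks are required because a checkpoint period is given):
% bsub -k "my_chkdir 30" my_program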
-s sigval
This option applies to the job only if it is submitted to a queue that has run windows (see bqueues(1)). lsbatch sends the signal sigval to this job ten minutes before the run window closes. This allows the job to clean up or checkpoint itself, if desired. If the job does not terminate within 10 minutes after being sent this signal, it is suspended.
-w depend_cond
depend_cond specifies the dependency condition of a batch job. Only
when depend_cond is satisfied (TRUE), will the job be considered for
dispatch.
depend_cond is defined as an arbitrary AND/OR logic expression of batch job conditions (see below for the definitions). Use the logic operators `&&' (AND) and `||' (OR), and parentheses `(' and `)' to compose the depend_cond expression. If there is a space character or any one of the logic operators or parenthesis in the expression string, the string must be enclosed by quotation marks (") or ('). When a positive integer is used as a job name, it must be enclosed by quotes. Otherwise, it would be interpreted as a job ID.
Batch job conditions:
started( jobId | jobName )
If the specified batch job has started running or has
already finished, the condition is TRUE;
done( jobId | jobName )
If the specified batch job has finished successfully with
DONE state, the condition is TRUE, otherwise FALSE.
exit( jobId | jobName )
If the specified batch job has finished with EXIT state, the
condition is TRUE, otherwise FALSE.
If only the jobId or jobName is specified, the system assumes it means done( jobId | "jobName" | 'jobName' ). Note that the job name should be enclosed in quotation marks, e.g., -w "'210'", since the shell treats -w "210" the same as -w 210, which would be interpreted as a job ID.
While a jobId may be used to specify any user's jobs, a jobName can only be used to specify the user's own jobs. If more than one job uses the same job name, the most recently submitted job is assumed.
If any one of the specified batch jobs is not found, the job is not submitted.
The following are examples of depend_cond:
"done(1351) && (started('job2') || exit('job3'))"
The job with this condition is eligible for dispatch only if job 1351 has finished successfully and either job2 has started or job3 has finished with an error.
"1351 || 'job2' || started('job3')"
The condition is TRUE if job 1351 or job2 has finished successfully, or job3 has started.
-E pre_exec_command [ arguments ... ]
Execute pre_exec_command on the host to which the batch job is dispatched (or on the first host selected for a parallel batch job) before actually running the batch job. If pre_exec_command exits with 0, then the real job is started on the host; otherwise the job goes back to PEND status and is rescheduled later. If pre_exec_command does not exist or cannot be executed, the batch job is aborted and an error report is mailed to the user.
If the pre_exec_command is not in the user's normal execution path (the $PATH variable), the full path name of the command must be specified. Though this is a general-purpose interface, lsbatch assumes that the pre_exec_command can be run many times without having side effects. The pre_exec_command can be any executable with a CPU time consumption of less than 60 seconds and an execution duration of less than 180 seconds. If it runs beyond either of these limits, the command is terminated by the system and the batch job will be aborted and an error report will be mailed to the user. No standard input and output support is provided for the execution of the pre_exec_command.
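For example, the following runs my_program only on a host where the directory /scratch/mydata exists; if the test fails, the job returns to PEND status and is rescheduled (an illustrative pre-execution command):
% bsub -E "test -d /scratch/mydata" my_program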
-L login_shell
Specify the name of the login shell under which the job is executed. If this option is given, lsbatch starts login_shell as though it were the login shell, so the system and user startup files are sourced and the job runs in that environment. After the startup files are sourced, the login_shell process is overlaid by the process of the job file. For example, given -L csh, csh is started as a login shell, /etc/login, ~/.cshrc and ~/.login are sourced, and then the csh process is overlaid when the job starts. The default is not to start a login shell but simply to run the job file; the execution environment is then the environment at the time the job was submitted.
Note that the environment variable LSB_QUEUE is set by lsbatch so that shell scripts (say the user's .profile or .cshrc script) can test for batch job execution when appropriate, and not (for example) perform any setting of terminal characteristics, since a batch job is not connected to a terminal. For example, if your login shell is C-shell, the following .login file prevents stty and tset from being run during batch jobs.
if (! $?LSB_QUEUE) then
    stty erase ^H kill ^U
    tset -S
endif
If your login shell is Bourne shell, the following .profile file has the same effect.
if test "$LSB_QUEUE" = "" ; then
    stty erase ^H kill ^U
    tset -S
fi
EXAMPLES
% bsub sleep 100
Submit the UNIX command sleep together with its argument 100 as a batch job.
% bsub -q short -o my_output_file "pwd; ls"
Submit the UNIX commands pwd and ls as a batch job to the queue named short and store the job output in my_output_file.
% bsub -m "host1 host3 host8 host9" my_program
Submit my_program to run on one of the candidate hosts: host1, host3, host8 and host9.
% bsub -q "queue1 queue2 queue3" -c 5 my_program
Submit my_program to one of the candidate queues queue1, queue2 and queue3, which are selected according to the CPU time limit specified by -c 5.
% bsub -b 20:00 -J my_job_name my_program
Submit my_program to run after 8pm and assign it the job name
my_job_name.
% bsub my_script
Submit my_script as a batch job. Since my_script is specified as a
command line argument, the my_script file is not spooled. Later
changes to the my_script file before the job completes may affect this
job.
% bsub < default_shell_script
where default_shell_script contains:
sim1.exe
sim2.exe
The file default_shell_script is spooled, and the commands will be run under the Bourne shell since a shell specification is not given in the first line of the script.
% bsub < csh_script
where csh_script contains:
#! /bin/csh
sim1.exe
sim2.exe
csh_script is spooled and the commands will be run under /bin/csh.
% bsub -q night < my_script
where my_script contains:
#! /bin/sh
#BSUB -q test
The job is submitted using the bsub options given in the my_script file. The job is submitted to the night queue instead of the test queue, as command line options override options given in the script.
% bsub -b 20:00 -J my_job_name
bsub> sleep 1800
bsub> my_program
bsub> CTRL-D
The job commands are entered interactively.
SEE ALSO
lsbatch(1), bjobs(1), bqueues(1), bhosts(1), bmgroup(1), bchkpnt(1), brestart(1), sh(1), getrlimit(2), sbrk(2), libckpt.a(3), lsbatch(5), mbatchd(8)