NAME

lsf - load sharing facility

DESCRIPTION

LSF, or Load Sharing Facility, is load sharing software built on top of UNIX. It consists of the Load Information Manager (LIM), the Remote Execution Server (RES), the Load Sharing LIBrary (LSLIB), and a variety of load sharing applications and utilities. LSF interoperates on many UNIX platforms. Some of the load sharing applications (each with its own man page) are as follows:

lstcsh(1)
load sharing version of tcsh(1). Transparently send interactive jobs to remote hosts for execution when the remote hosts are faster or have a lower load than the local host.

lsbatch(1)
load sharing batch utility. Distribute parallel as well as sequential batch jobs to the hosts in a distributed system for execution.

lsmake(1)
load sharing version of GNU make. Use multiple hosts to process make tasks (such as compilations) in parallel.

lslogin(1)
load sharing version of login(1).

xlsmon(1)
Motif graphical user interface for monitoring the load on the hosts in an LSF cluster.

LSF also has a set of commands that can be used as tools to convert binary programs into load sharing versions without recompiling or relinking. The currently available tools are:

lseligible(1)
display the remote execution eligibility of a task.

lshosts(1)
display configuration information about hosts participating in load sharing.

lsid(1)
display the name of the local LSF cluster and the name of its master LIM host.

lsinfo(1)
display load sharing configuration information.

lsclusters(1)
display general configuration information about LSF clusters.

lsload(1)
display the load information of load sharing hosts.

lsloadadj(1)
adjust the load condition data of load sharing hosts.

lsltasks(1)
display or update the local task list.

lsrtasks(1)
display or update the remote task list.

lsplace(1)
display the currently best host or hosts for executing one or more load sharing tasks.

lsrun(1)
run a task using LSF load sharing, with possible host selection.

lsgrun(1)
run a task using LSF load sharing on a list of hosts.

ch(1)
change the host on which subsequent commands will be executed.

lsmon(1)
full-screen LSF monitoring utility that displays and updates the load information of hosts in the local cluster.

RESOURCE REQUIREMENT STRINGS

Many of the above commands and utilities accept a resource requirement string. A resource requirement string contains the information used to query the LIM about hosts or to request task placement decisions.

A resource requirement string is divided into three sections: a selection section, an ordering section, and a resource usage section. The selection section specifies the criteria for selecting hosts from the system. The ordering section indicates how the hosts that meet the selection criteria should be sorted. The resource usage section specifies the expected resource consumption of the task. The syntax of a resource requirement expression is

"select[ selectstring ] order[ orderstring ] rusage[ usagestring ]"

where `select', `order' and `rusage' are the section names. Any character in the resource requirement expression that is not within one of the above sections is ignored. If a section appears more than once in a resource requirement expression, only the first occurrence is considered. The syntax for each of `selectstring', `orderstring' and `usagestring' is defined below. Depending on the command, one or more of these sections may be ignored. For example, lshosts(1) only selects hosts and does not order them, lsload(1) selects and orders the hosts, and lsplace(1) uses the information in all three sections to select an appropriate host for a task. lsloadadj(1) uses the resource usage section to determine how the load information should be adjusted on a host. Sections other than these are ignored. If no section name is given, the string is treated as a `selectstring'. The `select' keyword may be omitted if the `selectstring' appears as the first string in the resource requirement.
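The section rules above can be sketched in Python. This is only an illustration, not how LSF itself parses the string: the function name split_resreq is hypothetical, and treating any text before the first named section as the unnamed selection string is an assumption drawn from the last two sentences of the paragraph above.

```python
import re

# Matches one named section such as select[...], order[...] or rusage[...].
SECTION_RE = re.compile(r"\b(select|order|rusage)\[([^\]]*)\]")

def split_resreq(resreq):
    """Split a resource requirement string into its named sections."""
    matches = list(SECTION_RE.finditer(resreq))
    if not matches:
        # No section name at all: the whole string is a selection string.
        text = resreq.strip()
        return {"select": text} if text else {}
    sections = {}
    # Assumption: text before the first named section is a selection
    # string whose `select' keyword was omitted.
    head = resreq[:matches[0].start()].strip()
    if head:
        sections["select"] = head
    for m in matches:
        # Only the first occurrence of each section is kept; characters
        # between sections are never captured, so they are ignored.
        sections.setdefault(m.group(1), m.group(2).strip())
    return sections
```

For instance, split_resreq("type==local order[swp]") yields a `select' entry of "type==local" and an `order' entry of "swp".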

Selection String

The selection string specifies the characteristics a host must have to be returned. It is a logical expression built from a set of resource names. The resource names and their descriptions can be obtained by running the LSF utility program lsinfo(1). The resource names `swap', `idle', `login', and `cpu' are aliases for `swp', `it', `ls' and `r1m', respectively, which are returned by lsinfo(1).

Resource names correspond to information maintained by the LIM about hosts. Some resources correspond to dynamic information about a host, such as its CPU queue length, available memory, and available swap space. These resources are referred to as load indices and can be retrieved via lsload(1). Other resources correspond to static information about a host, such as its type, host model, relative CPU speed, total memory and total swap space. This information can be retrieved via lshosts(1). The system administrator can define other resources in the system in addition to those built in to LIM.

An arbitrary expression that combines resource names with logical or mathematical operators can be specified. Valid operators include `&&' (logical AND), `||' (logical OR), `!' (logical NOT), `+' (addition), `-' (subtraction), `*' (multiplication), and `/' (division). The selection expression is evaluated for each host; if the result is non-zero, then that host is selected.

For example,

"select[ (swp>50 && mem>=10 && type==MIPS) || (swp>35 && type==ALPHA) ]"

"select[ ((2*r15s + 3*r1m + r15m)/6 < 1.0) && fs && (cpuf > 4.0) ]"

are valid selection expressions. Resource names which are of the boolean type (e.g. `fs' for a file server resource) have a value of 1 if they are defined for a host, and 0 otherwise. The default is to select all configured hosts in the cluster. For the string valued resources `type' and `model', special values of `any' and `local' can be used to select any value or the same value as that of the local host, respectively. For example, "type==local" would select hosts of the same type as the local host. When the run queue lengths `r15s', `r1m', or `r15m' are specified, the effective value of the queue length is used.
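To make the per-host evaluation concrete, the two example expressions can be hand-translated into Python predicates and applied to a small table of hosts. The host names and all values below are made up for illustration, and the function names are hypothetical; the LIM does not evaluate expressions this way.

```python
# Hypothetical host data; boolean resources such as `fs' are 1 or 0.
HOSTS = {
    "hostA": {"swp": 60, "mem": 16, "type": "MIPS",
              "r15s": 0.5, "r1m": 0.8, "r15m": 1.0, "fs": 1, "cpuf": 5.0},
    "hostB": {"swp": 40, "mem": 8, "type": "ALPHA",
              "r15s": 2.0, "r1m": 1.5, "r15m": 1.2, "fs": 1, "cpuf": 6.0},
    "hostC": {"swp": 30, "mem": 32, "type": "MIPS",
              "r15s": 0.1, "r1m": 0.1, "r15m": 0.1, "fs": 0, "cpuf": 8.0},
}

def by_type_and_swap(h):
    # (swp>50 && mem>=10 && type==MIPS) || (swp>35 && type==ALPHA)
    return ((h["swp"] > 50 and h["mem"] >= 10 and h["type"] == "MIPS")
            or (h["swp"] > 35 and h["type"] == "ALPHA"))

def by_load_and_fs(h):
    # ((2*r15s + 3*r1m + r15m)/6 < 1.0) && fs && (cpuf > 4.0)
    return ((2 * h["r15s"] + 3 * h["r1m"] + h["r15m"]) / 6 < 1.0
            and h["fs"] and h["cpuf"] > 4.0)

def select_hosts(predicate, hosts=HOSTS):
    # A host is selected when the expression evaluates to a non-zero value.
    return [name for name, h in hosts.items() if predicate(h)]
```

With this data the first expression selects hostA and hostB, while the second selects only hostA: hostB is too heavily loaded and hostC does not have the `fs' resource.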

For tasks where an arbitrary selection string is not required, a restricted syntax is provided. The restricted syntax allows resource names to be combined using `:' (logical AND) and `-' (logical NOT). For example, "r15m=1.5:mem=20:swp=12:-ultrix" is a valid selection string in the restricted syntax. It is equivalent to "r15m <= 1.5 && mem >= 20 && swp >= 12 && !ultrix" in the unrestricted syntax. A selection string in the restricted syntax is of the form

"res[=value]:res[=value]: ... :res[=value]"

where each `res' is a resource name. A value may only be specified for resources whose value type is numeric or string. The semantics of `=' depend on the value type and sorting order of the resource. For string resources, `=' is equivalent to `=='. For numeric resources, `=' is equivalent to `>=' if the sorting order for the resource is decreasing, and to `<=' if it is increasing. A `-' may only be used in front of a boolean resource name or in isolation. In isolation, `-' is equivalent to "type==any". If no value is given for a numeric resource, the resource must have a non-zero value. Other examples of selection strings in the restricted syntax are "-:swp=12", "type=MIPS:maxmem=20", and "status=busy".
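The mapping from the restricted to the unrestricted syntax can be sketched as a small translator. The sorting orders and the set of string resources in the tables below are assumptions made for the example (on a real cluster they are reported by lsinfo(1)), and the function name expand_restricted is hypothetical.

```python
# Assumed sorting orders: "inc" makes `=' read as `<=', "dec" as `>='.
NUMERIC_ORDER = {"r15s": "inc", "r1m": "inc", "r15m": "inc",
                 "mem": "dec", "swp": "dec", "maxmem": "dec"}
# Assumed string valued resources, for which `=' reads as `=='.
STRING_RESOURCES = {"type", "model", "status"}

def expand_restricted(selectstring):
    """Rewrite a restricted selection string in the unrestricted syntax."""
    terms = []
    for item in selectstring.split(":"):
        if item == "-":
            # A `-' in isolation selects hosts of any type.
            terms.append("type==any")
        elif item.startswith("-"):
            # `-' in front of a boolean resource is logical NOT.
            terms.append("!" + item[1:])
        elif "=" in item:
            res, value = item.split("=", 1)
            if res in STRING_RESOURCES:
                op = "=="
            else:
                op = "<=" if NUMERIC_ORDER[res] == "inc" else ">="
            terms.append(f"{res} {op} {value}")
        else:
            # A bare name is true when the resource is non-zero.
            terms.append(item)
    # `:' is logical AND.
    return " && ".join(terms)
```

Applied to "r15m=1.5:mem=20:swp=12:-ultrix", the sketch reproduces the unrestricted form quoted earlier in this section. A numeric resource missing from the order table raises KeyError rather than guessing a direction.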

Order String

The order string allows the selected hosts to be sorted according to the value(s) of resource(s). The syntax of the order string is "[-]res:[-]res: ... :[-]res", where each `res' is a resource name with a numeric value type. Currently only load indices such as `mem', `swp', and `tmp', which are returned by lsload(1), are considered for sorting. For example, "swp:r1m:tmp:r15s" is a valid order string. The order string is used as input to a multi-level sorting algorithm, where each sorting phase orders the hosts according to one particular load index and discards some hosts. The remaining hosts are passed on to the next phase. The first phase begins with the last index and proceeds from right to left. The final phase of sorting orders the hosts according to their status, with hosts that are currently not available for load sharing (i.e., not in the ok state) listed at the end. When sorting is done on a particular index, the direction in which the hosts are sorted (increasing vs. decreasing values) is determined by the default order returned by lsinfo(1) for that index. This direction is chosen such that after sorting, the hosts are ordered from best to worst on that index. A `-' before the index name reverses the order.

If no sorting string is specified, the default sorting string is "r15s:pg".

When the run queue lengths `r15s', `r1m', or `r15m' are specified, the normalized value of the queue length is used when sorting.
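The right-to-left, multi-phase ordering can be approximated with a sequence of stable sorts. This is a simplified sketch under stated assumptions: the default directions in the table are assumed rather than read from lsinfo(1), the per-phase discarding of hosts is omitted because its rule is not described here, and the values are used as given rather than normalized. The function name order_hosts is hypothetical.

```python
# Assumed default directions: True means larger values are better.
LARGER_IS_BETTER = {"mem": True, "swp": True, "tmp": True,
                    "r15s": False, "r1m": False, "r15m": False, "pg": False}

def order_hosts(hosts, orderstring="r15s:pg"):
    """Return the host records sorted best first.

    Each host is a dict holding "status" plus the load indices used.
    """
    ranked = list(hosts)
    # Phases run from the last index to the first. Because each sort is
    # stable, the leftmost index ends up as the most significant key.
    for term in reversed(orderstring.split(":")):
        reversed_by_user = term.startswith("-")
        index = term.lstrip("-")
        descending = LARGER_IS_BETTER[index] != reversed_by_user
        ranked.sort(key=lambda h: h[index], reverse=descending)
    # Final phase: hosts that are not in the ok state go to the end.
    ranked.sort(key=lambda h: h["status"] != "ok")
    return ranked
```

With "swp:r1m", two ok hosts that have equal `swp' are ordered by `r1m', and a busy host is listed last even if it has the most swap space.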

Resource Usage String

This string specifies the expected resource usage of the task. The resource usage string is used in adjusting the load and for mapping tasks onto hosts during a placement decision (see lsplace(1) and lsloadadj(1)). The syntax of the resource usage string is "res[=value]:res[=value]: ... :res[=value]", where `res' is one of the resources whose value is returned by lsload(1). For example, "r1m=0.5:mem=20:swp=40" indicates that the task is expected to increase the 1-minute run queue length by 0.5, consume 20 Mbytes of memory and 40 Mbytes of swap space. If no value is specified, the task is assumed to be intensive in using that resource. In this case no more than one task will be assigned to a host regardless of how many CPUs it has. External indices are not considered in the resource usage string.

The default resource usage for a task is assumed to be "r15s=1.0:r1m=1.0:r15m=1.0", which indicates a CPU intensive task which consumes few other resources.
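As a small illustration of the syntax, a resource usage string can be broken into a table of expected consumption. The function name parse_rusage is hypothetical, and representing a resource given without a value as None (meaning the task is intensive in that resource) is a choice made for this sketch.

```python
# The default stated above: a CPU intensive task.
DEFAULT_RUSAGE = "r15s=1.0:r1m=1.0:r15m=1.0"

def parse_rusage(usagestring=DEFAULT_RUSAGE):
    """Map each resource name to its expected usage, or to None."""
    usage = {}
    for item in usagestring.split(":"):
        res, sep, value = item.partition("=")
        # No `=value' part: record None to mark the resource as one the
        # task uses intensively.
        usage[res.strip()] = float(value) if sep else None
    return usage
```

Here parse_rusage("r1m=0.5:mem=20:swp=40") gives 0.5 for `r1m', 20.0 for `mem' and 40.0 for `swp'.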

RUN QUEUE LENGTHS

The raw CPU queue length is collected by the LIM from the kernel of the host operating system every 5 seconds. This number represents the total number of processes that are contending for the CPU(s) on the host. The raw queue length is averaged over 15 seconds, 1 minute, and 15 minutes to produce the `r15s', `r1m', and `r15m' load indices, respectively. The raw queue lengths can be viewed using lsload(1).
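The relationship between the 5-second samples and the three indices can be sketched as follows. The averaging method is not stated here, so a plain arithmetic mean over the most recent samples is an assumption; the LIM may use a different scheme. The class name RunQueueAverager is hypothetical.

```python
from collections import deque

SAMPLE_INTERVAL = 5  # seconds between raw samples
WINDOWS = {"r15s": 15, "r1m": 60, "r15m": 900}  # window lengths in seconds

class RunQueueAverager:
    """Keep recent raw queue lengths and average them per window."""

    def __init__(self):
        # The 15-minute window is the longest, so keep that many samples.
        self.samples = deque(maxlen=WINDOWS["r15m"] // SAMPLE_INTERVAL)

    def add_sample(self, raw_length):
        self.samples.append(raw_length)

    def indices(self):
        recent = list(self.samples)
        result = {}
        for name, seconds in WINDOWS.items():
            # Take the samples that fall inside this window; fewer are
            # used while the history is still filling up.
            window = recent[-(seconds // SAMPLE_INTERVAL):]
            result[name] = sum(window) / len(window) if window else 0.0
        return result
```

Under this assumption `r15s' averages the last 3 samples, `r1m' the last 12, and `r15m' the last 180.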

In order to compare queue lengths on hosts having different numbers of CPUs and relative CPU speeds, two variations of the raw queue length are defined. The effective queue length attempts to account for multiprocessor hosts by considering the number of CPUs. It is calculated by taking the multiprocessor's multitasking feature into consideration, such that even if many of the processors are busy, the host's effective queue length may appear to be as good as that of an idle uniprocessor (as long as at least one processor is idle). The effective queue length is the same as the raw queue length on uniprocessor hosts. Effective queue lengths are listed when using the -E option of lsload. The effective queue length is used by LIM when testing whether the host has exceeded its busy thresholds. When `r15s', `r1m', or `r15m' are specified in the selection section of resource requirement strings, they refer to the effective queue length. The effective queue length is also used by lsbatch(1) when comparing the values specified for queue and host thresholds against the current load.

The normalized queue length is used by the LIM when making a placement decision about where to send a job (see lsplace(1)). It considers both the number of CPUs and the CPU factor of a host. This is also the value returned by lsload when using the -N option. The normalized queue length attempts to estimate what the load would be on a host if an additional CPU bound job was dispatched to that host.

NOTES

If lsf.conf (see lsf(5)) is not in the default /etc directory, set the environment variable LSF_ENVDIR to the name of the directory where lsf.conf is stored.
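The lookup rule described in this note can be sketched as follows. The function name lsf_conf_path is hypothetical, and the sketch only builds the path; it does not check that the file exists or model how LSF itself reads it.

```python
import os
import posixpath

def lsf_conf_path(environ=os.environ):
    """Return where lsf.conf is expected, honouring LSF_ENVDIR."""
    # Fall back to the default /etc directory when LSF_ENVDIR is unset.
    envdir = environ.get("LSF_ENVDIR", "/etc")
    return posixpath.join(envdir, "lsf.conf")
```

Passing a plain dict instead of os.environ makes the rule easy to check without changing the real environment.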

SEE ALSO

lsf.conf(5), lim(8), res(8), nios(8), lslib(3), tcsh(1), login(1)

