While the vast majority of distributed memory jobs use MPI to handle any required communication among the tasks, there are occasional exceptions. We present two possible methods of launching such jobs on this page.
We actually consider hybrid distributed memory and shared memory parallelism as the most general case (if you wish to adapt this to a distributed memory only case, the examples are largely the same except that the number of threads would be set to 1). We use the non-MPI, multi-threaded version of hello-umd as our code. This is cheating a bit, as it really does not do any distributed memory communication, so we are basically running a bunch of distinct multi-threaded codes. However, even the MPI-enabled version of hello-umd really does not do any inter-task communication, so it is only a bit of a stretch. A real code would need to implement some means for the tasks to communicate with each other.
We present two submission scripts: one which launches the tasks with the Slurm srun command, and one which launches them via ssh.
In both cases, we basically just invoke 3 instances of hello-umd
each with 15 threads, using the same sbatch
parameters that we
would use if this were a
hybrid
MPI
and
multi-threaded
code. We
chose this combination of instances and threads as this will require multiple
nodes on the Deepthought2 cluster but still fit in the debug queue, which
makes for a better example. It might or might not require multiple nodes on
Juggernaut, as some nodes have quite a few cores. As mentioned above, these
are actually three independent processes, but we pretend that it is
a single job communicating over something other than
MPI
. We change the default
message from hello-umd
for each "task" to make it clear which
task the message is coming from.
The hello-umd
example code is just a variant of a
Hello World! program.
Both examples basically consist of a job submission script submit.sh which gets submitted to the cluster via the sbatch command. The srun example also includes a small wrapper script around the hello-umd command to change the default message for each task. Listings and explanations of these scripts are given below.
The script is designed to show many good practices.
Many of the practices above are rather overkill for such a simple job --- indeed, the vast majority of lines are for these "good practices" rather than the running of the intended code, but are included for educational purposes.
srun
command
For this case, we use the Slurm srun
command to
spawn the different instances of hello-umd
. Being a
Slurm command, srun already knows which nodes were allocated to the job (and how many tasks are allocated to each node). In order to change the message printed by the hello-umd
command for each instantiation of the code, we use a small helper script.
The submission script submit.sh can be downloaded as plain text. A sketch of the script is shown below, followed by a discussion of its main parts:
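Below is a minimal sketch of what such a submit.sh looks like, assembled from the discussion that follows (download the plain-text version above for the actual script). The openmpi module version, the exact startup file names, and the location of hello-wrapper.sh are assumptions; the gcc and hello-umd versions match the sample output shown later on this page.

```bash
#!/bin/bash
# Minimal sketch of the srun-based submit.sh (see discussion below).
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=15
#SBATCH -t 5
#SBATCH --mem-per-cpu=1024
#SBATCH --exclusive
#SBATCH --partition=debug
#SBATCH --export=NONE

# Make sure the module command is available even if our login shell is not bash
unalias tap 2> /dev/null
if [ -f ~/.bash_profile ]; then
        . ~/.bash_profile
elif [ -f ~/.profile ]; then
        . ~/.profile
fi

# Share this script's environment with processes spawned by Slurm commands
export SLURM_EXPORT_ENV=ALL

# Set up the software environment (use hpcc/juggernaut on Juggernaut;
# the openmpi version below is a placeholder)
module purge
module load hpcc/deepthought2
module load gcc/8.4.0
module load openmpi/3.1.5
module load hello-umd/1.5

# Create a job-specific work directory on the lustre filesystem
TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}"
mkdir "$TMPWORKDIR"
cd "$TMPWORKDIR"

# Print some diagnostics
echo "Slurm job ${SLURM_JOBID}: ${SLURM_NTASKS} tasks on ${SLURM_JOB_NUM_NODES} nodes"
echo "Nodes: ${SLURM_JOB_NODELIST}"
date
pwd
module list

# Find the executable and export it for the wrapper script
MYEXE=`which hello-umd`
export MYEXE
echo "Using executable $MYEXE"

# Launch one instance of the wrapper per task allocated to the job.
# The wrapper is assumed to live in the directory the job was submitted from.
srun /bin/bash ${SLURM_SUBMIT_DIR}/hello-wrapper.sh > hello.out 2>&1
ECODE=$?

echo "Job finished with exit code $ECODE"
date

exit $ECODE
```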
Like most of our examples, this shebang uses the /bin/bash interpreter, which is the bash (Bourne-again) shell. This is a compatible replacement for, and enhancement of, the original Unix Bourne shell. You can opt to specify another shell or interpreter if you so desire; common choices are:

- the Bourne shell (/bin/sh) in your shebang (note that this basically just uses bash in a restricted mode)
- the C shell (/bin/csh or /bin/tcsh)

However, we recommend the use of the bash shell, as it has good support for scripting; this might not matter for most job submission scripts because of their simplicity, but might if you start to need more advanced features. The examples generally use the bash shell for this reason.
Comment lines beginning with #SBATCH are used to control the Slurm scheduler and are discussed below. Other than these special cases (the shebang line and #SBATCH lines), feel free to use comment lines to remind yourself (and maybe others reading your script) of what the script is doing.
Lines beginning with #SBATCH can be used to control how the Slurm sbatch command submits the job. Basically, any command line flags to sbatch can instead be provided on a #SBATCH line in the script, and you can mix and match command line options and options on #SBATCH lines. NOTE: any #SBATCH lines must precede any "executable lines" in the script. It is recommended that you have nothing but the shebang line, comments, and blank lines before any #SBATCH lines.
We request 3 tasks (--ntasks=3 or -n 3), with each task having 15 CPU cores (--cpus-per-task=15 or -c 15).
Note that we do not specify a number of nodes, and we recommend that you do not specify a node count for most jobs --- by default Slurm will allocate enough nodes to satisfy this job's needs, and if you specify a value which is incorrect it will only cause problems.
We choose 3 tasks of 15 cores as this will usually require multiple nodes on both Deepthought2 and Juggernaut (although some of the larger Juggernaut nodes can support this request on a single node), and so makes a better demonstration, but will still fit in the debug partition.
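In the submission script these requests take the form (either the long or short form of each flag may be used):

```bash
#SBATCH --ntasks=3          # 3 tasks (equivalently: -n 3)
#SBATCH --cpus-per-task=15  # 15 CPU cores per task (equivalently: -c 15)
```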
The #SBATCH -t TIME line sets the time limit for the job. The requested TIME value can take any of a number of formats, including: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds.
It is important to set the time limit appropriately. It must be set longer than you expect the job to run, preferably with a modest cushion for error --- when the time limit is up, the job will be canceled.
You do not want to make the requested time excessive, either. Although you are only charged for the actual time used (i.e. if you requested 12 hours and the job finished in 11 hours, your job is only charged for 11 not 12 hours), there are other downsides to requesting too much wall time. Among them, the job may spend more time in the queue, and might not run at all if your account is low on funds (the scheduler uses the requested wall time to estimate the number of SUs the job will consume, and will not start a job unless it and all currently running jobs are projected to have sufficient SUs to complete). And if it does start, an excessive walltime might block other jobs from running for a similar reason.
In general, you should estimate the maximum run time, and pad it by 10% or so.
In this case, hello-umd will run very quickly, in much less than 5 minutes.
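A five minute limit, for example, can be requested in any of these equivalent formats (only one -t line should appear in a real script):

```bash
#SBATCH -t 5            # 5 minutes
#SBATCH -t 5:00         # 5 minutes, 0 seconds
#SBATCH -t 0-00:05:00   # 0 days, 0 hours, 5 minutes, 0 seconds
```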
There are several parameters you can give to Slurm/sbatch to specify the
memory to be allocated for the job. It is recommended that you always include
a memory request for your job --- if omitted it will default to 6GB per CPU
core. The recommended way to request memory is with the
--mem-per-cpu=N
flag. Here N is in MB.
This will request N MB of RAM for each CPU core allocated to the job.
Since you often wish to ensure each process in the job has sufficient memory,
this is usually the best way to do so.
An alternative is the --mem=N flag. This sets the maximum memory use per node. Again, N is in MB. This could be useful for single node jobs, especially multithreaded jobs, as there is only a single node and threads generally share significant amounts of memory. But for MPI jobs the --mem-per-cpu flag is usually more appropriate and convenient.
For MPI codes, we recommend using --mem-per-cpu
instead of
--mem
since you generally wish to ensure each MPI task has
sufficient memory.
The hello-umd
does not use much memory, so 1 GB per core
is plenty.
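For example, to request 1 GB (1024 MB) per CPU core, as used here:

```bash
#SBATCH --mem-per-cpu=1024   # 1 GB of RAM per CPU core allocated
```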
The lines #SBATCH --share, #SBATCH --oversubscribe, or #SBATCH --exclusive decide whether or not other jobs are able to run on the same node(s) as your job.
NOTE: The Slurm scheduler changed the name of the flag for "shared" mode. The proper flag is now #SBATCH --oversubscribe. You must use the "oversubscribe" flag on Juggernaut. You can currently use either form on Deepthought2, but the #SBATCH --share form is deprecated and at some point will no longer be supported. Both forms effectively do the same thing.
In exclusive mode, no other jobs are able to run on a node allocated to your job while your job is running. This greatly reduces the possibility of another job interfering with the running of your job. But if you are not using all of the resources of the node(s) your job is running on, it is also wasteful of resources. In exclusive mode, we charge your job for all of the cores on the nodes allocated to your job, regardless of whether you are using them or not.
In share/oversubscribe mode, other jobs (including those of other users) may run on the same node as your job as long as there are sufficient resources for both. We make efforts to try to prevent jobs from interfering with each other, but such methods are not perfect, so while the risk of interference is small, it is much greater in share mode than in exclusive mode. However, in share mode you are only charged for the requested number of cores (not all cores on the node unless you requested such), and your job might spend less time in the queue (since it can avail itself of nodes which are in use by other jobs but have some unallocated resources).
Our recommendation is that large (many-core/many-node) and/or long running jobs use exclusive mode, as the potential cost of adverse interference is greatest here. Plus large jobs tend to use most if not all cores of most of the nodes they run on, so the cost of exclusive mode tends to be less. Smaller jobs, and single core jobs in particular, generally benefit from share/oversubscribe mode, as they tend to less fully utilize the nodes they run on (indeed, on a standard Deepthought2 node, a single core job will only use 5% of the CPU cores).
Unless you specify otherwise, the cluster defaults single core jobs to share mode, and multicore/multinode jobs to exclusive mode. This is not an ideal choice, and might change in the future. We recommend that you always explicitly request either share/oversubscribe or exclusive as appropriate.
Again, as a multi-core job, #SBATCH --exclusive
is the default, but we recommend explicitly stating this.
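For this example the job runs in exclusive mode:

```bash
#SBATCH --exclusive     # do not share our nodes with other jobs
# (a job which can share nodes would instead use: #SBATCH --oversubscribe)
```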
This line requests the debug partition. For real production work, the debug queue is probably not adequate, in which case it is recommended that you just omit this line and let the scheduler select an appropriate partition for you.
The --export=NONE flag tells sbatch not to let the job process inherit the environment of the process which invoked the sbatch command. This requires the job script to explicitly set up its required environment, as it can no longer depend on environmental settings you had when you ran the sbatch command. While this may require a few more lines in your script, it is a good practice and improves the reproducibility of the job script --- without this it is possible the job would only run correctly if you had a certain module loaded or variable set when you submitted the job.
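In the script this is simply:

```bash
#SBATCH --export=NONE   # do not inherit the submitting shell's environment
```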
These lines ensure that the module command is available in your script. They are generally only required if the shell specified in the shebang line does not match your default login shell, in which case the proper startup files likely did not get invoked.
The unalias line is to ensure that there is no vestigial tap command. It is sometimes needed on RHEL6 systems, and should not be needed on the newer platforms, but is harmless when not needed. The remaining lines will read in the appropriate dot files for the bash shell --- the if, then, elif construct enables this script to work on both the Deepthought2 and Juggernaut clusters, which have slightly different names for the bash startup file.
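A sketch of such a block is shown below; the exact startup file names checked here are assumptions:

```bash
# Make sure the module command is available even if the shebang shell
# does not match the login shell.  The unalias is only needed on older
# (RHEL6) systems but is harmless elsewhere.
unalias tap 2> /dev/null
if [ -f ~/.bash_profile ]; then
        . ~/.bash_profile
elif [ -f ~/.profile ]; then
        . ~/.profile
fi
```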
This line sets the environment variable SLURM_EXPORT_ENV to the value ALL, which causes the environment to be shared with other processes spawned by Slurm commands (which also includes mpirun and similar).
At first this might seem to contradict our recommendation to use #SBATCH --export=NONE, but it really does not. The #SBATCH --export=NONE setting will cause the job script not to inherit the environment of the shell in which you ran the sbatch command. But we are now in the job script, which because of the --export=NONE flag has its own environment which was set up in the script. We want this environment to be shared with the other tasks and processes spawned by this job. These tasks and processes will inherit the environment set up in this script, not the environment from which the sbatch command ran.
This is important for jobs like this, because otherwise the tasks spawned by srun (or mpirun in MPI jobs) might not start with the proper environment.
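In the script this is just:

```bash
# Share the environment set up by this script with processes spawned
# by srun (and mpirun, etc.)
export SLURM_EXPORT_ENV=ALL
```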
To begin with, we do a module purge
to clear out any previously loaded
modules. This prevents them from interfering with subsequent module loads. Then we load
the default module for the cluster with module load hpcc/deepthought2
; this line
should be adjusted for the cluster being used (e.g. module load hpcc/juggernaut
for the Juggernaut cluster).
We then load the desired compiler and MPI library,
finally the hello-umd
. We recommend that you always
load the compiler module first, and then if needed the MPI library,
and then any higher
level applications. Many packages have different builds for different
compilers, MPI libraries, etc., and the module command is smart enough to
load the correct versions of these. Note that we specify the versions; if you omit the version, the module command will usually try to load the most recent version installed.
We recommend that you always specify the specific version you want in your job scripts --- this makes your job more reproducible. Systems staff may add newer versions of existing packages without notification, and if you do not specify a version, the default version may change without your expecting it. In particular, a job that runs fine today using today's default version might crash unexpectedly when you try running it again in six months because the packages it uses were updated and your inputs are not compatible with the new version of the code.
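A sketch of this block is shown below; the gcc and hello-umd versions match the sample output shown later on this page, while the openmpi version is a placeholder:

```bash
module purge
module load hpcc/deepthought2   # use hpcc/juggernaut on Juggernaut
module load gcc/8.4.0           # compiler first
module load openmpi/3.1.5       # then the MPI library (version is a placeholder)
module load hello-umd/1.5       # then the application
```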
These lines create a work directory for the job. The /tmp filesystem is specific to a single node, so it is usually not suitable for multi-node jobs such as this one (or MPI jobs). The lustre filesystem is accessible by all of the compute nodes of the cluster, so it is a good choice for such jobs.
The TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}" or similar line
defines an environmental variable containing the name of our chosen work
directory. The ${SLURM_JOBID}
references another environmental
variable which is automatically set by Slurm (when the job starts to run) to
the job number for this job --- using this in our work directory names
helps ensure it will not conflict with any other job. The
mkdir
command creates this work directory, and the
cd
changes our working directory to that directory---
note in those last commands the use of the environmental variable we just
created to hold the directory name.
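A sketch of these lines:

```bash
# Job-specific work directory on the lustre filesystem
TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}"
mkdir "$TMPWORKDIR"
cd "$TMPWORKDIR"
```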
These lines print the values of the environment variables SLURM_JOBID, SLURM_NTASKS, SLURM_JOB_NUM_NODES, and SLURM_JOB_NODELIST, which are set by Slurm at the start of the job to the job number, the number of tasks, the number of nodes, and the names of the nodes allocated to the job. It also prints the time and date that
and the names of the nodes allocated to the job. It also prints the time and date that
the job started (the date
command), the working directory (the
pwd
command), and the list of loaded modules (the module list
command). Although you are probably not interested in any of that information if the
job runs as expected, they can often be helpful in diagnosing why things did not work
as expected.
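A sketch of these diagnostic lines (the exact wording of the echo messages is an assumption):

```bash
echo "Slurm job ${SLURM_JOBID}: ${SLURM_NTASKS} tasks on ${SLURM_JOB_NUM_NODES} nodes"
echo "Nodes: ${SLURM_JOB_NODELIST}"
date
pwd
module list
```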
These lines find the full path to the hello-umd command, store it in an environmental variable named MYEXE, and then output the path for added diagnostics. We find that jobs run better when you provide the absolute path to the executable to the srun, mpirun, or similar command.
In this case, we include the bash export
qualifier so that
the value of MYEXE
will be passed to the srun
command.
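A sketch of these lines:

```bash
# Record the full path to the executable and export it for the wrapper script
MYEXE=`which hello-umd`
export MYEXE
echo "Using executable $MYEXE"
```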
We use the Slurm srun command for this. It launches the specified command on every node allocated to the job, as many times as the node has tasks allocated to it.
As this is a non-MPI job, we are using a version of hello-umd
which does not support MPI. Therefore, from the standpoint of the
hello-umd
, it is always MPI task number 0, and if we simply
ran hello-umd
directly, it would be difficult to distinguish
between the different tasks. So we use a wrapper script
hello-wrapper.sh
which uses the environment variable
SLURM_PROCID
, automatically set by the srun
command
for each instance of the code to the number of the task (starting at zero).
The wrapper script invokes hello-umd
with a different message
to identify the task.
We actually directly pass the /bin/bash
binary to
srun
, and give the wrapper script as an argument to
bash
to avoid any issues with execute permissions on the script.
We run the code so as to save the output in the file
hello.out
in the current working directory.
The > operator does output redirection, meaning that all of the standard output goes to the specified file (hello.out in this case). The 2>&1 operator causes the standard error output to be sent to the standard output stream (1 is the number for the standard output stream, 2 for standard error), and since standard output was redirected to the file, so will the standard error be.
For this simple case, we could have omitted the redirection of standard output and standard error, in which case any such output would end up in the Slurm output file (usually named slurm-JOBNUMBER.out).
However, if your job produces a lot (many MBs) of output to standard
output/standard error, this can be problematic. It is good practice
to redirect output if you know your code will produce more than 1000 or so
lines of output.
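A sketch of this line; the assumption that the wrapper script sits in the directory the job was submitted from (SLURM_SUBMIT_DIR) is ours:

```bash
# One instance of the wrapper is started per task allocated to the job;
# all output is collected in hello.out
srun /bin/bash ${SLURM_SUBMIT_DIR}/hello-wrapper.sh > hello.out 2>&1
```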
The special shell variable $?
stores the exit code from the last command.
Normally it will be 0 if the command was successful, and non-zero otherwise. But it only
works for the last command, so we save it in the variable ECODE
.
These lines print the value of ECODE, and then print the date/time of completion using the date command.
The final line exits the script with the exit code stored in ECODE. This means that the script will have the same exit code as the application, which will allow Slurm to better determine if the job was successful or not. (If we did not do this, the exit code of the script would be the exit code of the last command that ran, in this case the date command, which should never fail. So even if your application aborted, the script would return a successful (0) exit code, and Slurm would think the job succeeded.)
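A sketch of these closing lines:

```bash
ECODE=$?      # save the exit code of the srun command
echo "Job finished with exit code $ECODE"
date
exit $ECODE   # return the application's exit code to Slurm
```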
We also recommend ending the script with one or more blank lines. The reason for this is that if the last line does not have the proper line termination character, it will be ignored by the shell. Over the years, we have had many users confused as to why their job ended as soon as it started without error, etc. --- it turns out the last line of their script was the line which actually ran their code, and it was missing the correct line termination character. Therefore, the job ran, did some initialization and module loads, and exited without running the command they were most interested in because of a missing line termination character (which can be easily overlooked).
This problem most frequently occurs when transferring files between Unix/Linux and Windows operating systems. While there are utilities that can add the correct line termination characters, the easy solution in our opinion is to just add one or more blank lines at the end of your script --- if the shell ignores the blank lines, you do not care.
The submission script uses a small helper script, hello-wrapper.sh, which can also be downloaded as plain text. A sketch of it is shown below:
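Below is a minimal sketch of what the wrapper looks like, based on the discussion that follows (download the plain-text version for the actual script); the "Starting task" message matches the sample output shown later on this page.

```bash
#!/bin/bash
# hello-wrapper.sh: run one instance of hello-umd, labeling it with the
# Slurm-assigned task number so the output of each task can be identified.
echo "Starting task ${SLURM_PROCID}, using executable ${MYEXE}"
${MYEXE} -t ${SLURM_CPUS_PER_TASK} -m "Hello from task ${SLURM_PROCID}"
```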
Like most of our examples, this shebang uses the /bin/bash interpreter, which is the bash (Bourne-again) shell. This is a compatible replacement for, and enhancement of, the original Unix Bourne shell. You can opt to specify another shell or interpreter if you so desire; common choices are:

- the Bourne shell (/bin/sh) in your shebang (note that this basically just uses bash in a restricted mode)
- the C shell (/bin/csh or /bin/tcsh)

However, we recommend the use of the bash shell, as it has good support for scripting; this might not matter for most job submission scripts because of their simplicity, but might if you start to need more advanced features. The examples generally use the bash shell for this reason.
This line runs hello-umd. We use the argument -t ${SLURM_CPUS_PER_TASK} to have it use the number of threads we declared in the submit.sh script. We also use the -m flag to change the default message of hello-umd to identify the pseudo-task.
The environmental variable SLURM_CPUS_PER_TASK
was set by Slurm before the submit.sh
script
ran. Because we set the variable SLURM_EXPORT_ENV
to the value ALL
in the script, it gets exported
to our wrapper script by srun
.
The environmental variable SLURM_PROCID
is set
by srun
to an integer (starting at 0) to identify
which process within the task is being launched. We use this
to identify our "task".
ssh
command
For this case, we use the ssh
command to
spawn the different instances of hello-umd
on all the
nodes
allocated to the
job, once for each task allocated to the node. This information is
contained in the environment variables SLURM_JOB_NODELIST
and SLURM_TASKS_PER_NODE
which are set by Slurm automatically
when your job script starts. However, Slurm uses a condensed format for
these variables, which is difficult to use. So we make use of an external
script /software/acigs-utilities/bin/slurm_hostnames_by_tasks
to convert these to a more useful format, listing every node allocated
to the job as many times as it has tasks allocated to it.
NOTE: In order for this example using ssh to work, you must have previously enabled password-less ssh between the nodes of the cluster; instructions for setting up password-less ssh between nodes of the cluster are available. This only needs to be done once, but the job will not successfully run without this.
The submission script submit.sh can be downloaded as plain text. A sketch of the script is shown below, followed by a discussion of its main parts:
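Below is a minimal sketch of this script, assembled from the discussion that follows (download the plain-text version above for the actual script). The #SBATCH header, startup file sourcing, module loads (minus the MPI library), work directory setup, and diagnostics mirror the srun example above and are abbreviated here; the variable names NODELIST and PIDS are assumptions.

```bash
#!/bin/bash
# (#SBATCH header, startup file sourcing, module loads, work directory
#  creation, and diagnostics are the same as in the srun example above
#  and are omitted here for brevity.)

launch_ssh_tasks()
{
    # Full path to the executable
    MYEXE=`which hello-umd`
    echo "Using executable $MYEXE"

    # List each allocated node once per task allocated to it
    NODELIST=$(/software/acigs-utilities/bin/slurm_hostnames_by_tasks \
        -r ' ' -n hosts_by_tasks)

    tasknum=0
    PIDS=
    for node in $NODELIST; do
        taskname=$tasknum
        # Launch one instance per task, in the background so they run in parallel
        ssh -q $node $MYEXE -t ${SLURM_CPUS_PER_TASK} \
            -m "'Hello from task $taskname'" &
        PIDS="$PIDS $!"
        tasknum=$((tasknum + 1))
    done

    # Wait for every background ssh to finish and collect its exit code
    ECODE=0
    for pid in $PIDS; do
        wait $pid
        tmpecode=$?
        if [ $tmpecode -ne 0 ]; then
            echo "WARNING: task with pid $pid exited with code $tmpecode"
            ECODE=$tmpecode
        fi
    done
    return $ECODE
}

# ... environment and work directory setup as in the srun example ...

launch_ssh_tasks > hello.out 2>&1
ECODE=$?

echo "Job finished with exit code $ECODE"
date
exit $ECODE
```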
Like most of our examples, this shebang uses the /bin/bash interpreter, which is the bash (Bourne-again) shell. This is a compatible replacement for, and enhancement of, the original Unix Bourne shell. You can opt to specify another shell or interpreter if you so desire; common choices are:

- the Bourne shell (/bin/sh) in your shebang (note that this basically just uses bash in a restricted mode)
- the C shell (/bin/csh or /bin/tcsh)

However, we recommend the use of the bash shell, as it has good support for scripting; this might not matter for most job submission scripts because of their simplicity, but might if you start to need more advanced features. The examples generally use the bash shell for this reason.
Comment lines beginning with #SBATCH are used to control the Slurm scheduler and are discussed below. Other than these special cases (the shebang line and #SBATCH lines), feel free to use comment lines to remind yourself (and maybe others reading your script) of what the script is doing.
Lines beginning with #SBATCH can be used to control how the Slurm sbatch command submits the job. Basically, any command line flags to sbatch can instead be provided on a #SBATCH line in the script, and you can mix and match command line options and options on #SBATCH lines. NOTE: any #SBATCH lines must precede any "executable lines" in the script. It is recommended that you have nothing but the shebang line, comments, and blank lines before any #SBATCH lines.
We request 3 tasks (--ntasks=3 or -n 3), with each task having 15 CPU cores (--cpus-per-task=15 or -c 15). Because we are not using MPI, we use the term "task" in a more general sense, as an instance of the (multithreaded) program being run.
Note that we do not specify a number of nodes, and we recommend that you do not specify a node count for most jobs --- by default Slurm will allocate enough nodes to satisfy this job's needs, and if you specify a value which is incorrect it will only cause problems.
We choose 3 tasks of 15 cores as this will usually require multiple nodes on both Deepthought2 and Juggernaut (although some of the larger Juggernaut nodes can support this request on a single node), and so makes a better demonstration, but will still fit in the debug partition.
The #SBATCH -t TIME line sets the time limit for the job. The requested TIME value can take any of a number of formats, including: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds.
It is important to set the time limit appropriately. It must be set longer than you expect the job to run, preferably with a modest cushion for error --- when the time limit is up, the job will be canceled.
You do not want to make the requested time excessive, either. Although you are only charged for the actual time used (i.e. if you requested 12 hours and the job finished in 11 hours, your job is only charged for 11 not 12 hours), there are other downsides to requesting too much wall time. Among them, the job may spend more time in the queue, and might not run at all if your account is low on funds (the scheduler uses the requested wall time to estimate the number of SUs the job will consume, and will not start a job unless it and all currently running jobs are projected to have sufficient SUs to complete). And if it does start, an excessive walltime might block other jobs from running for a similar reason.
In general, you should estimate the maximum run time, and pad it by 10% or so.
In this case, hello-umd will run very quickly, in much less than 5 minutes.
There are several parameters you can give to Slurm/sbatch to specify the
memory to be allocated for the job. It is recommended that you always include
a memory request for your job --- if omitted it will default to 6GB per CPU
core. The recommended way to request memory is with the
--mem-per-cpu=N
flag. Here N is in MB.
This will request N MB of RAM for each CPU core allocated to the job.
Since you often wish to ensure each process in the job has sufficient memory,
this is usually the best way to do so.
An alternative is the --mem=N flag. This sets the maximum memory use per node. Again, N is in MB. This could be useful for single node jobs, especially multithreaded jobs, as there is only a single node and threads generally share significant amounts of memory. But for MPI jobs the --mem-per-cpu flag is usually more appropriate and convenient.
For MPI codes, we recommend using --mem-per-cpu
instead of
--mem
since you generally wish to ensure each MPI task has
sufficient memory.
The hello-umd
does not use much memory, so 1 GB per core
is plenty.
The lines #SBATCH --share, #SBATCH --oversubscribe, or #SBATCH --exclusive decide whether or not other jobs are able to run on the same node(s) as your job.
NOTE: The Slurm scheduler changed the name of the flag for "shared" mode. The proper flag is now #SBATCH --oversubscribe. You must use the "oversubscribe" flag on Juggernaut. You can currently use either form on Deepthought2, but the #SBATCH --share form is deprecated and at some point will no longer be supported. Both forms effectively do the same thing.
In exclusive mode, no other jobs are able to run on a node allocated to your job while your job is running. This greatly reduces the possibility of another job interfering with the running of your job. But if you are not using all of the resources of the node(s) your job is running on, it is also wasteful of resources. In exclusive mode, we charge your job for all of the cores on the nodes allocated to your job, regardless of whether you are using them or not.
In share/oversubscribe mode, other jobs (including those of other users) may run on the same node as your job as long as there are sufficient resources for both. We make efforts to try to prevent jobs from interfering with each other, but such methods are not perfect, so while the risk of interference is small, it is much greater in share mode than in exclusive mode. However, in share mode you are only charged for the requested number of cores (not all cores on the node unless you requested such), and your job might spend less time in the queue (since it can avail itself of nodes which are in use by other jobs but have some unallocated resources).
Our recommendation is that large (many-core/many-node) and/or long running jobs use exclusive mode, as the potential cost of adverse interference is greatest here. Plus large jobs tend to use most if not all cores of most of the nodes they run on, so the cost of exclusive mode tends to be less. Smaller jobs, and single core jobs in particular, generally benefit from share/oversubscribe mode, as they tend to less fully utilize the nodes they run on (indeed, on a standard Deepthought2 node, a single core job will only use 5% of the CPU cores).
Unless you specify otherwise, the cluster defaults single core jobs to share mode, and multicore/multinode jobs to exclusive mode. This is not an ideal choice, and might change in the future. We recommend that you always explicitly request either share/oversubscribe or exclusive as appropriate.
Again, as a multi-core job, #SBATCH --exclusive
is the default, but we recommend explicitly stating this.
This line requests the debug partition. For real production work, the debug queue is probably not adequate, in which case it is recommended that you just omit this line and let the scheduler select an appropriate partition for you.
The --export=NONE flag tells sbatch not to let the job process inherit the environment of the process which invoked the sbatch command. This requires the job script to explicitly set up its required environment, as it can no longer depend on environmental settings you had when you ran the sbatch command. While this may require a few more lines in your script, it is a good practice and improves the reproducibility of the job script --- without this it is possible the job would only run correctly if you had a certain module loaded or variable set when you submitted the job.
These lines define a bash function launch_ssh_tasks to handle launching, via ssh, the instances of hello-umd on the nodes allocated to our job. We need to launch as many instances per node as there were tasks allocated to the node.
This section only defines the function; the code here does not actually run until the function is invoked later in the script.
The function first finds the full path to the hello-umd command, stores it in an environmental variable named MYEXE, and then outputs the path for added diagnostics. We find that most jobs run better when you provide the absolute path to the executable to the ssh command.
We need to launch a hello-umd process for every task we requested. The information we want is contained in the environment variables set by the Slurm scheduler, SLURM_JOB_NODELIST and SLURM_TASKS_PER_NODE, but Slurm uses an abbreviated form in these variables. So we use a helper script installed at /software/acigs-utilities/bin/slurm_hostnames_by_tasks to do the work of converting the contents of these variables to the more useful form described above. The argument -r ' ' causes the node names to be separated by a space (which makes it easy to use in a bash for loop), and the argument -n hosts_by_tasks causes it to print a list of hostnames with each hostname occurring once for each task allocated to that node. The "\" character at the end of the first line is a continuation character --- it allows us to split the command across two lines but still have the shell interpret it as a single command.
We also construct a name for each task, which will be passed to hello-umd to show which task is producing the output.
We then use ssh to spawn an instance of hello-umd on the node. We give the -q argument to ssh to suppress the standard ssh output (e.g. the unauthorized use warnings, etc.). The "\" characters at the end of the line allow us to split the line to make it more readable, but still have the shell interpret it as a single line.
The ssh
will launch the command on the named $node
,
and will run our hello-umd
code (whose path is stored in the
variable $MYEXE
) with the argument
-t ${SLURM_CPUS_PER_TASK}
to have it run with the
requested number of threads. The argument
-m "'Hello from task $taskname'"
is also given --- this changes
the default hello-umd
message so as to identify which task
generated the message (we are using a non-MPI build of hello-umd
,
so the hello-umd
code assumes it is
task
0 of 1 in all cases,
therefore the only way to distinguish is by changing the message text.).
The "&" symbol at the end of the ssh
command causes the
ssh
command to run in the background --- this means that the
script continues immediately with the command after the ssh
command while the ssh
command continues to run. This is needed
in order to get the required parallelism. Without it, only one instance
of hello-umd
would be running at a time --- we would use
ssh
to spawn one instance of hello-umd
on the
first node, wait for it to complete, and once it is done continue the loop
and spawn an instance on the second node, etc.
Immediately after the ssh
command, we store the process id (pid)
of the ssh
command we just ran in the background. This is done
using the special shell variable $!
. We need this value so that
we can collect the exit code for that process. Finally, we increment
the variable tasknum
.
These lines initialize a variable ECODE in which we will store the overall exit code for the function. We initialize it to 0, indicating a successful run, but if any of the spawned processes have a non-zero exit code, we change it to match. So it will only remain 0 if all of the spawned processes have zero exit codes (indicating success).
We now need to wait for the spawned background processes to complete, and to collect their exit codes. Even in our quick hello-umd example, the background processes are likely to run for longer than the time needed to finish the rest of this script, so it is especially important to wait for the background tasks to complete before exiting the submission script (and terminating the job, which would cause the background tasks to be prematurely terminated).
The first reason could be satisfied with a simple wait command without the process ID number --- that would wait until all child processes have completed. But that only returns the exit code of the last process to complete, and if errors occur in some of the processes, those processes would tend to terminate earlier and so would be missed.
In this loop, we loop over each of the process IDs of the background
ssh
processes. The first line in the loop, wait $pid
,
will cause the processing of the script to halt until the process corresponding
to $pid
has terminated. The exit code of the wait
command is then stored in the variable tmpecode
. The
wait
normally returns the exit code of the specified
background process. This exit code is the exit code of the
ssh
process, but ssh
will return the exit code of the hello-umd
command it was
running on the allocated node. So this will tell us whether the command
being run on the node was successful or not.
We then check whether the exit code of the background process was zero (which would indicate success) or not. If not, we set the exit code of the function to non-zero also, and print a warning message.
These lines ensure that the module command is available in your script. They are generally only required if the shell specified in the shebang line does not match your default login shell, in which case the proper startup files likely did not get invoked.
The unalias line is to ensure that there is no vestigial tap command. It is sometimes needed on RHEL6 systems, and should not be needed on the newer platforms, but is harmless when not needed. The remaining lines will read in the appropriate dot files for the bash shell --- the if, then, elif construct enables this script to work on both the Deepthought2 and Juggernaut clusters, which have slightly different names for the bash startup file.
To begin with, we do a module purge
to clear out any previously loaded
modules. This prevents them from interfering with subsequent module loads. Then we load
the default module for the cluster with module load hpcc/deepthought2
; this line
should be adjusted for the cluster being used (e.g. module load hpcc/juggernaut
for the Juggernaut cluster).
We then load the desired compiler, and
finally the hello-umd
. We recommend that you always
load the compiler module first and then any higher
level applications. Many packages have different builds for different
compilers, etc., and the module command is smart enough to
load the correct versions of these. Note that we specify the versions; if you omit the version, the module command will usually try to load the most recent version installed.
We recommend that you always specify the specific version you want in your job scripts --- this makes your job more reproducible. Systems staff may add newer versions of existing packages without notification, and if you do not specify a version, the default version may change without your expecting it. In particular, a job that runs fine today using today's default version might crash unexpectedly when you try running it again in six months because the packages it uses were updated and your inputs are not compatible with the new version of the code.
These lines create a work directory for the job. The /tmp filesystem is specific to a single node, so it is usually not suitable for multi-node jobs such as this one (or MPI jobs). The lustre filesystem is accessible by all of the compute nodes of the cluster, so it is a good choice for such jobs.
The TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}" or similar line
defines an environmental variable containing the name of our chosen work
directory. The ${SLURM_JOBID}
references another environmental
variable which is automatically set by Slurm (when the job starts to run) to
the job number for this job --- using this in our work directory names
helps ensure it will not conflict with any other job. The
mkdir
command creates this work directory, and the
cd
changes our working directory to that directory---
note in those last commands the use of the environmental variable we just
created to hold the directory name.
These lines print the values of the environment variables SLURM_JOBID, SLURM_NTASKS, SLURM_JOB_NUM_NODES, and SLURM_JOB_NODELIST, which are set by Slurm at the start of the job to the job number, the number of tasks, the number of nodes, and the names of the nodes allocated to the job. It also prints the time and date that
and the names of the nodes allocated to the job. It also prints the time and date that
the job started (the date
command), the working directory (the
pwd
command), and the list of loaded modules (the module list
command). Although you are probably not interested in any of that information if the
job runs as expected, they can often be helpful in diagnosing why things did not work
as expected.
This line invokes the launch_ssh_tasks function that we defined back at the start of this script. We run it so as to save the output in the file hello.out in our work directory.
The > operator does output redirection, meaning that all of the standard output goes to the specified file (hello.out in this case). The 2>&1 operator causes the standard error output to be sent to the standard output stream (1 is the number for the standard output stream, 2 for standard error), and since standard output was redirected to the file, so will the standard error be.
For this simple case, we could have omitted the redirection of standard output and standard error, in which case any such output would end up in the Slurm output file (usually named slurm-JOBNUMBER.out).
However, if your job produces a lot (many MBs) of output to standard
output/standard error, this can be problematic. It is good practice
to redirect output if you know your code will produce more than 1000 or so
lines of output.
The special shell variable $?
stores the exit code from the last command.
Normally it will be 0 if the command was successful, and non-zero otherwise. But it only
works for the last command, so we save it in the variable ECODE
.
These lines print the value of ECODE, and then print the date/time of completion using the date command.
The final line exits the script with the exit code stored in ECODE. This means that the script will have the same exit code as the application, which will allow Slurm to better determine if the job was successful or not. (If we did not do this, the exit code of the script would be the exit code of the last command that ran, in this case the date command, which should never fail. So even if your application aborted, the script would return a successful (0) exit code, and Slurm would think the job succeeded.)
We also recommend ending the script with one or more blank lines. The reason for this is that if the last line does not have the proper line termination character, it will be ignored by the shell. Over the years, we have had many users confused as to why their job ended as soon as it started without error, etc. --- it turns out the last line of their script was the line which actually ran their code, and it was missing the correct line termination character. Therefore, the job ran, did some initialization and module loads, and exited without running the command they were most interested in because of a missing line termination character (which can be easily overlooked).
This problem most frequently occurs when transferring files between Unix/Linux and Windows operating systems. While there are utilities that can add the correct line termination characters, the easy solution in our opinion is to just add one or more blank lines at the end of your script --- if the shell ignores the blank lines, you do not care.
The easiest way to run this example is with the
Job Composer of
the OnDemand portal, using
the HelloUMD-HybridSrun
and
HelloUMD-HybridSsh
templates.
NOTE: In order to successfully run the ssh example, you must have previously enabled password-less ssh between the nodes of the cluster; instructions for setting up password-less ssh between nodes of the cluster are available. This only needs to be done once, but the job will not successfully run without this.
To submit the examples from the command line, just run sbatch submit.sh (for the ssh example, you must enable, or have previously enabled, password-less ssh between the nodes of the cluster). This will submit the job
to the scheduler, and should return a message like
Submitted batch job 23767
--- the number will vary (and is the
job number for this job). The job number can be used to reference
the job in Slurm, etc. (Please always give the job number(s) when requesting
help about a job you submitted).
Whichever method you used for submission, the job will be queued for the
debug partition and should run within 15 minutes or so. When it finishes
running, the file slurm-JOBNUMBER.out should contain
should contain
the output from our diagnostic commands (time the job started, finished,
module list, etc). The output of the hello-umd
will be in
the file hello.out
in the directory from which you submitted
the job. If you used OnDemand, these files will appear listed in the Folder contents section on the right.
The hello.out file should look something like:
Starting task 0, using executable /software/spack-software/2020.05.14/linux-rhel7-broadwell/gcc-8.4.0/hello-umd-1.5-iimg3e3xjkofyozl4yda7b6mm7tshgvd/bin/hello-umd
Starting task 1, using executable /software/spack-software/2020.05.14/linux-rhel7-broadwell/gcc-8.4.0/hello-umd-1.5-iimg3e3xjkofyozl4yda7b6mm7tshgvd/bin/hello-umd
Starting task 2, using executable /software/spack-software/2020.05.14/linux-rhel7-broadwell/gcc-8.4.0/hello-umd-1.5-iimg3e3xjkofyozl4yda7b6mm7tshgvd/bin/hello-umd
hello-umd: Version 1.5
Built for compiler: gcc/8.4.0
hello-umd: Version 1.5
Built for compiler: gcc/8.4.0
'Hello from task 2' from thread 0 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 8 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 7 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 14 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 13 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 11 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 1 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 5 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 10 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 12 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 4 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 9 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 3 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 6 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
'Hello from task 2' from thread 2 of 15, task 0 of 1 (pid=46349 on host compute-10-1.juggernaut.umd.edu
hello-umd: Version 1.5
Built for compiler: gcc/8.4.0
'Hello from task 0' from thread 4 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 0 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 11 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 12 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 7 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 6 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 5 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 9 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 8 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 1 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 10 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 2 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 3 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 13 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 0' from thread 14 of 15, task 0 of 1 (pid=162280 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 3 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 12 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 9 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 0 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 1 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 13 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 7 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 2 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 14 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 8 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 10 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 5 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 6 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 4 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
'Hello from task 1' from thread 11 of 15, task 0 of 1 (pid=162279 on host compute-10-0.juggernaut.umd.edu
The lines beginning Starting task ...
are from the
hello-wrapper.sh
script, and so will only be present in the
srun
case. The rest of the file should be basically the same
in both cases: a pair of lines for each of the 3 tasks identifying the
hello-umd
version and the compiler it was built with. Then there
should be 45 lines (15 lines for each of the 3 tasks) identifying the task
and thread.
Note that the identifying lines from hello-umd
do
not state an MPI library --- we are running the non-MPI
version of hello-umd
. Note also that for each of the lines
identifying the thread, after listing the thread number it lists
task 0 of 1
--- this is due to the fact that we are running a non-MPI version of hello-umd, so each instance does not know about the other tasks and reports itself as task 0 of 1. For that reason, we modified the default message to have it identify the task.
For any particular task number (according to the modified hello message), the pid and the hostname should be the same, but the pids should be different for different task ids (and will likely not match the pids shown above). The hostnames should be different between tasks if run on the Deepthought2 cluster, but might not be on Juggernaut (since most Juggernaut nodes have at least 30 cores, and some have over 45, and so can fit 2 or even 3 of our 15-core tasks on a single node).