This page provides some examples of submitting a simple MPI job using both the OpenMPI and Intel MPI libraries. It is based on the Basic_MPI_Job and similar job templates in the OnDemand portal.
This job makes use of a simple
Hello World! program called hello-umd,
available in the
UMD HPC cluster software library, which supports sequential, multithreaded,
and MPI modes of operation. The code simply prints an identifying message
from each thread of each task --- for this pure MPI case each task consists
of a single thread, so it will print one message from each MPI task.
These examples are treated together because they are nearly identical.
Each example basically consists of a single file, the job script
submit.sh
(see below for a listing and explanation
of the script), which gets submitted to the cluster via the
sbatch
command.
The scripts are designed to show many good practices, including:
* explicitly requesting the needed tasks, memory, and wall time
* explicitly choosing exclusive or shared node access
* using --export=NONE and setting up the environment within the script itself
* loading specific versions of the needed modules
* creating and using a job-specific work directory
* printing diagnostic information about the job
* redirecting the output of the code to a file
* saving the exit code of the application and returning it as the exit code of the script
Many of the practices above are rather overkill for such a simple job --- indeed, the vast majority of lines are for these "good practices" rather than for running the intended code --- but they are included for educational purposes.
This code runs hello-umd
in
MPI
mode, saving the output
to a file in a job-specific work directory (which the script creates).
A symbolic link to this work directory is placed in the submission directory
to allow users of the OnDemand portal
to easily access the work directory. We could have forgone all that and
simply let the output of hello-umd
go to
standard output,
which would be available in the slurm-JOBNUMBER.out
file (or whatever file you instructed Slurm to use instead). Doing so is
acceptable as long as the code does not produce an excessive amount (many
MBs) of output --- if the code produces a lot of output, having it all sent
to the Slurm output file can cause problems, and it is better to redirect it to
a file.
We present two cases, one using the OpenMPI library with the GNU compiler suite, and the other using the Intel MPI library with the Intel compiler suite. Afterwards, we give a detailed, line-by-line commentary on the scripts.
As can be seen from the similarities between the two scripts, there is
little difference between the two cases. For the most part, the choice
of which MPI library to use will depend on the code being run ---
the mpirun
or similar command must
come from the same MPI library (down to the version of the MPI library,
and even the version of the compiler used to compile the MPI library)
as the one the MPI application was built with. I.e., if your
application was built against the OpenMPI libraries, use the same
build of OpenMPI at runtime. Likewise, if your application was
built against the Intel MPI libraries, use the same build of the
Intel MPI libraries (i.e. the same Intel compiler) at runtime.
For codes built by system staff and made available via the
module
command, the module command will generally
enforce this. (You might be able to get around this, but
it will take some effort. Do not circumvent this.)
These sample jobs use hello-umd,
which has builds
with both Intel MPI and OpenMPI, and the module command automatically
selects the appropriate version for you based on the previously loaded
compiler and MPI library.
For codes you are building yourself, the choice of compiler and MPI library to use in the build stage is generally up to you. You should look at the recommendations of the authors of the software for guidance. Very broadly speaking, the Intel compilers and Intel MPI generally have the best optimizations on Intel processors, but these "bleeding edge" optimizations can sometimes cause problems. The GNU compilers and OpenMPI are likely to be better supported by most open-source packages, but are usually not quite as highly optimized.
We first look at the submission script for the case when using the GCC compiler suite and OpenMPI. You can download the source code as plain text; the main pieces of the script are discussed in detail below.
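A minimal sketch of such a script, following the practices discussed in the commentary below, is shown here --- the compiler and OpenMPI module versions, the time limit, and the startup-file handling are illustrative placeholders, so refer to the downloadable script for the exact contents:

```bash
#!/bin/bash
# Resource requests (values are those used in this example)
#SBATCH --ntasks=60
#SBATCH --cpus-per-task=1
#SBATCH -t 5:00
#SBATCH --mem-per-cpu=1024
#SBATCH --exclusive
#SBATCH --partition=debug
#SBATCH --export=NONE

# Ensure the module command is available (startup file names vary by cluster)
unalias tap >& /dev/null
if [ -f ~/.bash_profile ]; then
    . ~/.bash_profile
elif [ -f ~/.profile ]; then
    . ~/.profile
fi

# Share the environment set up in this script with processes spawned by Slurm
export SLURM_EXPORT_ENV=ALL

# Load the compiler, then the MPI library, then the application
# (version numbers here are placeholders)
module load gcc/8.4.0
module load openmpi/3.1.5
module load hello-umd/1.5

# Create a job-specific work directory on the lustre filesystem and cd to it
TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}"
mkdir $TMPWORKDIR
cd $TMPWORKDIR

# Print some diagnostic information
echo "Slurm job ${SLURM_JOBID} running on"
hostname
echo "To run on ${SLURM_NTASKS} CPU cores across ${SLURM_JOB_NUM_NODES} nodes"
echo "All nodes: ${SLURM_JOB_NODELIST}"
date
pwd
echo "Loaded modules are:"
module list

# Get the absolute path to the executable
MYEXE=`which hello-umd`
echo "Using executable ${MYEXE}"

# Run the code, redirecting stdout and stderr to a file in the work directory
mpirun ${MYEXE} > hello.out 2>&1

# Save the exit code, print the finish time, and return the code to Slurm
ECODE=$?
echo "Job finished with exit code $ECODE"
date

exit $ECODE

```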
Next we look at the submission script for the case when using the Intel compiler suite and Intel MPI. You can download the source code as plain text; the script is nearly identical to the OpenMPI version and is discussed in detail below.
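The script differs from the OpenMPI version mainly in the module load section; a sketch of that section is shown here (the hello-umd version is that used in these examples, while the Intel module version is taken from the sample output below and may differ on your cluster):

```bash
# Loading the Intel compiler suite also sets up Intel MPI,
# so no separate MPI module is needed
module load intel/2020.1
module load hello-umd/1.5
```

The rest of the script follows the same pattern as the OpenMPI version, and the line-by-line commentary below applies to both scripts except where noted.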
Like most of our examples, this shebang uses the /bin/bash
interpreter, which is the
bash (Bourne-again) shell.
This is a compatible replacement for, and enhancement of, the original Unix
Bourne shell.
You can opt to specify another shell or interpreter if you so desire;
common choices are:
* the Bourne shell (/bin/sh) in your shebang (note that on Linux this basically just uses bash in a restricted mode)
* the C shell family (/bin/csh or /bin/tcsh)
However, we recommend the use of the bash shell, as it has the best support for scripting; this might not matter for most job submission scripts because of their simplicity, but might if you start to need more advanced features. The examples generally use the bash shell for this reason.
Lines beginning with #SBATCH
are used to control
the Slurm scheduler and will be discussed elsewhere.
But other than the cases above, feel free to use comment lines to remind yourself (and maybe others reading your script) of what the script is doing.
Lines beginning with #SBATCH
can be used to control
how the Slurm sbatch
command submits the job. Basically, any
command line flags to sbatch can be provided with a #SBATCH
line in the
script, and you can mix and match command line options and options in
#SBATCH
lines. NOTE: any #SBATCH
lines must precede any "executable lines" in the script. It is recommended that
you have nothing but the shebang line, comments, and blank lines before any
#SBATCH
lines.
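For example, a script might begin like this (the values are those used in this example); any #SBATCH lines appearing after the first executable line are ignored by sbatch:

```bash
#!/bin/bash
# Comments and blank lines may appear before the #SBATCH lines
#SBATCH --ntasks=60
#SBATCH -t 5:00

module load hello-umd/1.5   # first executable line; #SBATCH lines after this are not honored
```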
These lines request 60 MPI tasks (--ntasks=60
or -n 60
), with one
CPU core
for each MPI task
(--cpus-per-task=1
or -c 1
).
Note that we do not specify a number of nodes, and we recommend that you do not for MPI jobs --- by default Slurm will allocate enough nodes to satisfy the job's needs, and if you specify a value which is incorrect it will only cause problems.
We choose 60 MPI tasks as this will require multiple nodes on both Deepthought2 and Juggernaut, which makes for a better demonstration, but it will still fit in the debug partition.
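For this example, the request is:

```bash
#SBATCH --ntasks=60        # 60 MPI tasks (equivalent to -n 60)
#SBATCH --cpus-per-task=1  # one CPU core per task (equivalent to -c 1)
```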
The #SBATCH -t TIME
line sets the time limit for
the job. The requested TIME value can take any of a number of
formats, including: MM (minutes), MM:SS, HH:MM:SS, D-HH, D-HH:MM, and D-HH:MM:SS (where D is days).
It is important to set the time limit appropriately. It must be set longer than you expect the job to run, preferably with a modest cushion for error --- when the time limit is up, the job will be canceled.
You do not want to make the requested time excessive, either. Although you are only charged for the actual time used (i.e. if you requested 12 hours and the job finished in 11 hours, your job is only charged for 11 hours, not 12), there are other downsides to requesting too much wall time. Among them, the job may spend more time in the queue, and might not run at all if your account is low on funds (the scheduler uses the requested wall time to estimate the number of SUs the job will consume, and will not start a job unless it and all currently running jobs are projected to have sufficient SUs to complete). And if it does start, an excessive walltime might block other jobs from running for a similar reason.
In general, you should estimate the maximum run time, and pad it by 10% or so.
In this case, hello-umd
will run very quickly, in much less
than 5 minutes.
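For example, any one of the following forms would be a valid time request (only one -t line is used in a script):

```bash
#SBATCH -t 5:00        # 5 minutes
#SBATCH -t 12:00:00    # 12 hours
#SBATCH -t 2-00:00:00  # 2 days
```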
There are several parameters you can give to Slurm/sbatch to specify the
memory to be allocated for the job. It is recommended that you always include
a memory request for your job --- if omitted it will default to 6GB per CPU
core. The recommended way to request memory is with the
--mem-per-cpu=N
flag. Here N is in MB.
This will request N MB of RAM for each CPU core allocated to the job.
Since you often wish to ensure each process in the job has sufficient memory,
this is usually the best way to do so.
An alternative is the --mem=N
flag. This sets
the maximum memory use per node. Again, N is in MB. This
could be useful for single node jobs, especially multithreaded jobs, as there
is only a single node and threads generally share significant amounts of memory.
But for MPI jobs the --mem-per-cpu
flag is usually more
appropriate and convenient.
For MPI codes, we recommend using --mem-per-cpu
instead of
--mem
since you generally wish to ensure each MPI task has
sufficient memory.
hello-umd
does not use much memory, so 1 GB per core
is plenty.
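For this example, 1 GB per core is requested with:

```bash
#SBATCH --mem-per-cpu=1024   # 1024 MB (1 GB) per CPU core
```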
The lines #SBATCH --share
, #SBATCH --oversubscribe
,
or #SBATCH --exclusive
decide whether or not other jobs are able to run on the same node(s) as
your job.
NOTE: The Slurm scheduler changed the name of the
flag for "shared" mode. The proper flag is now
#SBATCH --oversubscribe
. You must use the "oversubscribe"
flag on Juggernaut. You can currently use either form on Deepthought2, but
the #SBATCH --share
form is deprecated and at some point will
no longer be supported. Both forms effectively do the same thing.
In exclusive mode, no other jobs are able to run on a node allocated to your job while your job is running. This greatly reduces the possibility of another job interfering with the running of your job. But if you are not using all of the resources of the node(s) your job is running on, it is also wasteful of resources. In exclusive mode, we charge your job for all of the cores on the nodes allocated to your job, regardless of whether you are using them or not.
In share/oversubscribe mode, other jobs (including those of other users) may run on the same node as your job as long as there are sufficient resources for both. We make efforts to try to prevent jobs from interfering with each other, but such methods are not perfect, so while the risk of interference is small, it is much greater risk in share mode than in exclusive mode. However, in share mode you are only charged for the requested number of cores (not all cores on the node unless you requested such), and your job might spend less time in the queue (since it can avail itself of nodes which are in use by other jobs but have some unallocated resources).
Our recommendation is that large (many-core/many-node) and/or long running jobs use exclusive mode, as the potential cost of adverse interference is greatest here. Plus large jobs tend to use most if not all cores of most of the nodes they run on, so the cost of exclusive mode tends to be less. Smaller jobs, and single core jobs in particular, generally benefit from share/oversubscribe mode, as they tend to utilize the nodes they run on less fully (indeed, on a standard Deepthought2 node, a single core job will only use 5% of the CPU cores).
Unless you specify otherwise, the cluster defaults single core jobs to share mode, and multicore/multinode jobs to exclusive mode. This is not an ideal choice, and might change in the future. We recommend that you always explicitly request either share/oversubscribe or exclusive mode as appropriate.
Again, as this is a multi-core job, #SBATCH --exclusive
is the default, but we recommend explicitly stating it.
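Only one of these flags appears in a given script; for this multi-core example we use:

```bash
#SBATCH --exclusive   # do not let other jobs run on our node(s)
```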
For a simple job like this, the debug partition would be a good choice, and we use that on the Deepthought2 cluster. However, on the Juggernaut cluster the debug partition does not have access to the high performance (lustre) filesystem, which this job script uses for the working directory. So for the Juggernaut cluster we use the default partition instead (it does not really need to be specified, since it is the default partition on Juggernaut anyway).
For real production work, the restrictions of the debug partition (the limited compute resources and maximum 15 minute walltime) will likely make it unsuitable. So for production work, on both clusters, you will likely not be able to use the debug partition; it is probably best to just omit this line and let the scheduler choose an appropriate default partition for you.
The #SBATCH --export=NONE
line tells sbatch
not to let the job process inherit the
environment of the process which invoked the sbatch
command. This requires the job script to explicitly set up its required
environment, as it can no longer depend on the environmental settings you had
when you ran the sbatch
command. While this may require a few
more lines in your script, it is a good practice and improves the
reproducibility of the job script --- without this, it is possible the job
would only run correctly if you had a certain module loaded or variable set
when you submitted the job.
The next lines ensure that the module
command is available
in your script. They are generally only required if the shell specified
in the shebang line does not match your default login shell, in which
case the proper startup files likely did not get invoked.
The unalias
line is to ensure that there is no vestigial
tap
command. It is sometimes needed on RHEL6 systems;
it should not be needed on newer platforms, but is harmless when not
needed. The remaining lines read in the appropriate dot files for
the bash shell --- the if
, then
, elif
construct enables this script to work on both the Deepthought2 and
Juggernaut clusters, which have slightly different names for the bash
startup file.
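A sketch of these lines is shown below; the exact startup file names vary between the clusters, so treat the file names here as placeholders:

```bash
# Ensure the module command is defined
unalias tap >& /dev/null
if [ -f ~/.bash_profile ]; then
    . ~/.bash_profile
elif [ -f ~/.profile ]; then
    . ~/.profile
fi
```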
The next line sets the environmental variable SLURM_EXPORT_ENV
to the value ALL
,
which causes the environment to be shared with other processes
spawned by Slurm commands (which also includes mpirun
and similar).
At first this might seem to contradict our recommendation to
use #SBATCH --export=NONE
, but it really does not.
The #SBATCH --export=NONE
setting will cause the
job script not to inherit the environment of
the shell in which you ran the sbatch
command.
But we are now in the job script, which, because of the
--export=NONE
flag, has its own environment which
was set up in the script. We want this environment to
be shared with the other MPI tasks and processes spawned by this
job. These MPI tasks and processes will inherit the environment
set up in this script, not the environment from which the
sbatch
command ran.
This is important for MPI jobs like this, because otherwise the
processes spawned by mpirun
might not start properly.
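The line in question is simply:

```bash
# Pass the environment set up in this script to processes spawned by Slurm
export SLURM_EXPORT_ENV=ALL
```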
We recommend that you always load the compiler module first, then the MPI library if needed, and then any higher level modules for applications, etc. Many packages have different builds for different compilers, MPI libraries, etc., and the module command is smart enough to load the correct versions of these if you load the modules in the aforementioned order.
For codes using the OpenMPI library, you should module load the compiler,
then the appropriate OpenMPI library, and then your application
(hello-umd
in this case).
For codes using the Intel MPI library, the environment for this is set up automatically when you load the Intel compiler suite. Thus in these cases you do not need to explicitly module load an MPI library.
We recommend that you always specify the specific version you want in your job scripts --- this makes your job more reproducible. Systems staff may add newer versions of existing packages without notification, and if you do not specify a version, the default version may change without your expecting it. In particular, a job that runs fine today using today's default version might crash unexpectedly when you try running it again in six months because the packages it uses were updated and your inputs are not compatible with the new version of the code.
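For example, the module load sections for the two cases might look like this (the compiler and MPI version numbers are illustrative placeholders; hello-umd/1.5 is the version used in these examples):

```bash
# OpenMPI case: compiler first, then the MPI library, then the application
module load gcc/8.4.0
module load openmpi/3.1.5
module load hello-umd/1.5

# Intel MPI case: the Intel compiler module also sets up Intel MPI
module load intel/2020.1
module load hello-umd/1.5
```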
The /tmp
filesystem is specific to a single node,
so it is usually not suitable for MPI jobs. The lustre filesystem is
accessible by all of the compute nodes of the cluster, so it is a good choice
for MPI jobs.
The TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}"
line
defines an environmental variable containing the name of our chosen work
directory. The ${SLURM_JOBID}
references another environmental
variable which is automatically set by Slurm (when the job starts to run) to
the job number for this job --- using this in our work directory names
helps ensure it will not conflict with any other job. The
mkdir
command creates this work directory, and the
cd
changes our working directory to that directory---
note in those last commands the use of the environmental variable we just
created to hold the directory name.
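Putting these together:

```bash
# Create the job-specific work directory on lustre and make it the working directory
TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}"
mkdir $TMPWORKDIR
cd $TMPWORKDIR
```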
The next lines print the values of the environmental variables
SLURM_JOBID
, SLURM_NTASKS
,
SLURM_JOB_NUM_NODES
, and SLURM_JOB_NODELIST
, which are set by Slurm
at the start of the job to the job number, the number of MPI tasks, the number of nodes,
and the names of the nodes allocated to the job. It also prints the time and date that
the job started (the date
command), the working directory (the
pwd
command), and the list of loaded modules (the module list
command). Although you are probably not interested in any of that information if the
job runs as expected, they can often be helpful in diagnosing why things did not work
as expected.
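A sketch of such diagnostic lines (the exact wording of the echoed messages is illustrative):

```bash
echo "Slurm job ${SLURM_JOBID} running on"
hostname
echo "To run on ${SLURM_NTASKS} CPU cores across ${SLURM_JOB_NUM_NODES} nodes"
echo "All nodes: ${SLURM_JOB_NODELIST}"
date
pwd
echo "Loaded modules are:"
module list
```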
Because this sets an OpenMPI parameter, it is only relevant for job scripts using the OpenMPI libraries.
The next line finds the full path to the hello-umd
command,
stores it in an environmental variable named MYEXE
, and then
outputs the path for added diagnostics. We find that MPI jobs run better when
you provide the absolute path of the executable to the mpirun
or
similar command.
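For example:

```bash
# Record the absolute path to the executable for use with mpirun
MYEXE=`which hello-umd`
echo "Using executable ${MYEXE}"
```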
The next line runs the mpirun
command with the absolute
path to our hello-umd
executable as the argument. Each MPI
task runs hello-umd
in a single thread, so no arguments
are needed for the hello-umd
command. (If you needed to pass
arguments, they would go after the path to your executable on the
mpirun
line.)
We run the code so as to save the output in the file
hello.out
in the current working directory.
The >
operator does output redirection, meaning that all
of the standard output goes to the specified file
(hello.out
in this case). The 2>&1
operator
causes the standard error output to be sent to the standard output stream
(1 is the number for the standard output stream), and since standard output
was redirected to the file, so will the standard error be.
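Putting this together, the line resembles:

```bash
# Run the MPI code, sending stdout and stderr to hello.out in the work directory
mpirun ${MYEXE} > hello.out 2>&1
```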
For this simple case, we could have omitted the redirection of standard
output and standard error, in which case any such output would end up in the
Slurm output file (usually named slurm-JOBNUMBER.out
).
However, if your job produces a lot (many MBs) of output to standard
output/standard error, this can be problematic. It is good practice
to redirect output if you know your code will produce more than 1000 or so
lines of output.
The special shell variable $?
stores the exit code from the last command.
Normally it will be 0 if the command was successful, and non-zero otherwise. But it only
works for the last command, so we save it in the variable ECODE
.
The script then prints the exit code stored in ECODE
,
and then the date/time of completion using the date
command.
The final line exits the script with the value stored in ECODE
. This
means that the script will have the same exit code as the application, which allows
Slurm to better determine whether the job was successful or not. (If we did not do this, the
exit code of the script would be the exit code of the last command that ran, in this
case the date
command, which should never fail. So even if your application
aborted, the script would return a successful (0) exit code, and Slurm would think the
job succeeded.)
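A sketch of these closing lines:

```bash
# Save the exit code of the application, report it, and return it to Slurm
ECODE=$?
echo "Job finished with exit code $ECODE"
date

exit $ECODE
```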
The reason for this is that if the last line does not have the proper line termination character, it will be ignored by the shell. Over the years, we have had many users confused as to why their job ended as soon as it started, without error, etc. --- it turns out the last line of their script was the line which actually ran their code, and it was missing the correct line termination character. Therefore, the job ran, did some initialization and module loads, and exited without running the command they were most interested in, because of a missing line termination character (which can be easily overlooked).
This problem most frequently occurs when transferring files between Unix/Linux and Windows operating systems. While there are utilities that can add the correct line termination characters, the easy solution in our opinion is to just add one or more blank lines at the end of your script --- if the shell ignores the blank lines, you do not care.
The easiest way to run this example is with the
Job Composer
of the OnDemand portal,
using the HelloUMD-MPI_gcc_openmpi
template for
the GNU compiler suite and OpenMPI library case, or the
HelloUMD-MPI_intel_intelmpi
template for the
Intel compiler suite and Intel MPI library case.
To submit from the command line, just run
sbatch submit.sh
. This will submit the job
to the scheduler, and should return a message like
Submitted batch job 23767
--- the number will vary (and is the
job number for this job). The job number can be used to reference
the job in Slurm, etc. (Please always give the job number(s) when requesting
help about a job you submitted).
Whichever method you used for submission, the job will be queued for the
debug partition and should run within 15 minutes or so. When it finishes
running, the slurm-JOBNUMBER.out
file should contain
the output from our diagnostic commands (the time the job started and finished,
the module list, etc.). The output of hello-umd
will be in
module list, etc). The output of the hello-umd
will be in
the file hello.out
in the job specific work directory
created in your lustre directory. For the convenience of users of the
OnDemand portal, a symlink to this directory is created in the submission
directory. So if you used OnDemand, a symlink to the work directory will
appear in the Folder contents
section on the right.
The slurm-JOBNUMBER.out
file will resemble
(from an Intel MPI example):
Slurm job 23868 running on
compute-10-0.juggernaut.umd.edu
To run on 60 CPU cores across 2 nodes
All nodes: compute-10-0
Thu Mar 11 13:29:20 EST 2021
/lustre/jn10/payerle/ood-job.23868
Loaded modules are:
Currently Loaded Modulefiles:
1) hpcc/juggernaut
2) intel/2020.1
3) hello-umd/1.5/intel/2020.1/intelmpi/broadwell(default)
Job will be started out of /lustre/jn10/payerle/ood-job.23868
Most of the details in your file will be different than in the example above, but you should get the drift.
The output in the hello.out
file will resemble
(from an OpenMPI example):
Hello UMD from thread 0 of 1, task 3 of 60 (pid=87441 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 9 of 60 (pid=87447 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 1 of 60 (pid=87439 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 10 of 60 (pid=87448 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 15 of 60 (pid=87454 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 24 of 60 (pid=87464 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 26 of 60 (pid=87466 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 27 of 60 (pid=87467 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 29 of 60 (pid=87470 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 7 of 60 (pid=87445 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 8 of 60 (pid=87446 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 2 of 60 (pid=87440 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 12 of 60 (pid=87450 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 13 of 60 (pid=87451 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 19 of 60 (pid=87458 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 20 of 60 (pid=87459 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 21 of 60 (pid=87461 on host compute-10-0.juggernaut.umd.edu
hello-umd: Version 1.5
Built for compiler: intel/20.0.1
with MPI support( usgin MPI library intel-parallel-studio/cluster.2020.1)
Hello UMD from thread 0 of 1, task 17 of 60 (pid=87456 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 11 of 60 (pid=87449 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 25 of 60 (pid=87465 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 0 of 60 (pid=87438 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 23 of 60 (pid=87463 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 28 of 60 (pid=87468 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 5 of 60 (pid=87443 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 16 of 60 (pid=87455 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 18 of 60 (pid=87457 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 22 of 60 (pid=87462 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 4 of 60 (pid=87442 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 6 of 60 (pid=87444 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 14 of 60 (pid=87453 on host compute-10-0.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 35 of 60 (pid=269472 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 42 of 60 (pid=269479 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 44 of 60 (pid=269481 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 40 of 60 (pid=269477 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 31 of 60 (pid=269468 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 56 of 60 (pid=269493 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 57 of 60 (pid=269494 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 59 of 60 (pid=269496 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 37 of 60 (pid=269474 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 49 of 60 (pid=269486 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 36 of 60 (pid=269473 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 53 of 60 (pid=269490 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 58 of 60 (pid=269495 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 43 of 60 (pid=269480 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 48 of 60 (pid=269485 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 55 of 60 (pid=269492 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 50 of 60 (pid=269487 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 46 of 60 (pid=269483 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 52 of 60 (pid=269489 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 47 of 60 (pid=269484 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 41 of 60 (pid=269478 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 34 of 60 (pid=269471 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 39 of 60 (pid=269476 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 33 of 60 (pid=269470 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 38 of 60 (pid=269475 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 45 of 60 (pid=269482 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 51 of 60 (pid=269488 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 54 of 60 (pid=269491 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 32 of 60 (pid=269469 on host compute-10-1.juggernaut.umd.edu
Hello UMD from thread 0 of 1, task 30 of 60 (pid=269467 on host compute-10-1.juggernaut.umd.edu
Basically, you should see a message from each task 0 to 59, all from thread 0 of 1 (since this is a pure MPI code), in some random order. The identifying comments (version number, compiler and MPI library built for) will appear somewhere in the mix. Because everything is running in parallel, the order will not be constant. Note that the tasks will be divided across multiple nodes (in this case compute-10-0 and compute-10-1). On Juggernaut, the 60 cores require two nodes, and on Deepthought2 they would require three nodes.