Matlab
Contents
- Summary and Version Information
- Running a MATLAB script from the command line
- MATLAB and HPC
- Built-in multithreaded functions
- MATLAB Parallel Computing Toolbox
- MATLAB Parallel Server/Distributed Computing Server
- Installing add-ons/packages/etc
- External resources
Summary and Version Information
Package | Matlab |
---|---|
Description | Matlab |
Categories | Numerical Analysis |
Version | Module tag | Availability* | GPU Ready | Notes |
---|---|---|---|---|
2009b | matlab/2009b | Non-HPC Glue systems, All OSes | Y | |
2010b | matlab/2010b | Non-HPC Glue systems, Evergreen HPCC, Linux | Y | |
2011a | matlab/2011a | Non-HPC Glue systems, Bswift HPCC, Linux | Y | |
2011b | matlab/2011b | Non-HPC Glue systems, Evergreen HPCC, Bswift HPCC, Linux | Y | |
2012b | matlab/2012b | Non-HPC Glue systems, Bswift HPCC, Linux | Y | |
2013b | matlab/2013b | Non-HPC Glue systems, RedHat6 | Y | |
2014a | matlab/2014a | Non-HPC Glue systems, RedHat6 | Y | |
2014b | matlab/2014b | Non-HPC Glue systems, Deepthought HPCC, Bswift HPCC, Deepthought2 HPCC, RedHat6 | Y | |
2015a | matlab/2015a | Non-HPC Glue systems, RedHat6 | Y | |
2015b | matlab/2015b | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC, RedHat6 | Y | |
2016a | matlab/2016a | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC, RedHat6 | Y | |
2016b | matlab/2016b | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC, RedHat6 | Y | |
2017a | matlab/2017a | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC, RedHat6 | Y | |
2018a | matlab/2018a | Non-HPC Glue systems, RedHat6 | Y | |
2018b | matlab/2018b | Non-HPC Glue systems, RedHat6 | Y | |
2019a | matlab/2019a | Non-HPC Glue systems, RedHat6 | Y | |
2019b | matlab/2019b | Non-HPC Glue systems, RedHat6 | Y | |
Notes:
*: A package labelled as "available" on an HPC cluster can be used on the compute nodes of that cluster. Even software not listed as available on an HPC cluster is generally available on the login nodes of the cluster (assuming it is available for the appropriate OS version; e.g. RedHat Linux 6 for the two Deepthought clusters). This is because the compute nodes do not use AFS and instead have local copies of the AFS software tree, onto which we only install packages as requested. Contact us if you need a version listed as not available on one of the clusters.
In general, you need to prepare your Unix environment to be able to use this software. To do this, run either:

tap TAPFOO
module load MODFOO

where TAPFOO and MODFOO are one of the tags in the tap and module columns above, respectively. The tap command will print a short usage text (use -q to suppress this; that is needed in startup dot files); you can get a similar text with module help MODFOO. See the documentation on the tap and module commands for more information.

For packages that are libraries against which other codes are built, see the section on compiling codes for more help.

Tap/module commands listed with a version of current will set up what we consider the most current stable and tested version of the package installed on the system. The exact version is subject to change with little if any notice, and might be platform dependent. Versions labelled new represent a newer version of the package which is still being tested by users; if stability is not a primary concern, you are encouraged to use it. Those with versions listed as old set up an older version of the package; you should only use this if the newer versions are causing issues. Old versions may be dropped after a while. Again, the exact versions are subject to change with little if any notice.

In general, you can abbreviate the module tags. If no version is given, the default current version is used. For packages with compiler/MPI/etc. dependencies, if a compiler module or MPI library was previously loaded, it will try to load the correct build of the package for those dependencies. If you specify the compiler/MPI dependency, it will attempt to load the compiler/MPI library for you if needed.
Running a MATLAB script from the command line
While most people use MATLAB interactively, there are times when you might wish to run a MATLAB script from the command line or from within a shell script. Usually in this situation, you have a file containing MATLAB commands, one command per line, and you want to start up MATLAB, run the commands in that file, save the output to another file, and do all of this without the MATLAB GUI starting up (often the process will be running in a fashion where there might not be a screen readily available to display the GUI).
NOTE: If you are running Matlab jobs on one of the Deepthought or Juggernaut high-performance computing clusters, please include a

#SBATCH -L matlab

directive near the top of your job script. This is because we have been having issues with HPC users depleting the campus Matlab license pool. The above directive will ask Slurm for a matlab license, which will be used to throttle the number of simultaneous Matlab jobs running on the clusters. If all the matlab users on the cluster abide by this policy, hopefully there will be no more issues with license depletion. If such an issue occurs, we will regrettably have to kill some matlab jobs (starting with those that did NOT request a license) to free up licenses. We are hoping in the next several months to obtain a truly unlimited matlab license on campus, but until then we ask that HPC users include the above directive in their matlab jobs.
This can be broken down into several distinct parts:
- Get MATLAB to run without the GUI, etc.
- Get MATLAB to start running your script, and exit when your script is done.
- Get the output of the MATLAB command saved to a file.
The first part is handled with the following options to be passed to the MATLAB command: -nodisplay and -nosplash. The first disables the GUI; the latter disables the MATLAB splash screen that gets displayed before the GUI starts up.

The second step is handled using the -r option, which specifies a command which MATLAB should run when it starts up. You can give it any valid MATLAB command, but typically you just want to tell it to read commands from your file, and then tell it to exit; otherwise it will just sit at the prompt waiting for additional commands. One reason to keep it simple like that is that the command string has to be quoted to keep the Unix shell from interpreting it, and that can get tricky for complicated commands.

Typically, you would give an argument like

matlab -r "run('./myscript.m'); exit"

(and you would include the -nodisplay and -nosplash arguments before the -r if you wanted to disable the GUI as well); where myscript.m is your script file, located in the current working directory. The exit causes MATLAB to exit once the script completes.
The third part is handled with standard Unix file redirection.

Putting it all together, if you had a script myscript.m in the directory ~/my-matlab-stuff, and you want to run it from a shell script putting the output in myscript.out in the same directory, you could do something like:

#!/bin/tcsh
module load matlab
cd ~/my-matlab-stuff
matlab -nodisplay -nosplash -r "run('./myscript.m'); exit" > ./myscript.out
MATLAB and HPC
Mathworks currently provides two products to help with parallelization:

- Parallel Computing Toolbox (PCT): This provides support for parallel for loops (the parfor command), as well as some CUDA support for using GPUs. However, without the MATLAB Parallel Server (formerly named Distributed Computing Server), there are limits on the number of workers that can be created, and all workers must be on the same node.
- MATLAB Parallel Server (known as Distributed Computing Server (MDCS) prior to 2019): This extends MATLAB desktop workflows to the cluster hardware, and allows you to submit MATLAB jobs to the cluster without having to learn anything about the cluster command line interface.
In addition, some of the built-in linear algebra and numerical functions are multithreaded as well.
NOTE: If you are running Matlab jobs on one of the Deepthought or Juggernaut high-performance computing clusters, please include a

#SBATCH -L matlab

directive near the top of your job script. (This is NOT needed for Matlab DCS jobs.) This is because we have been having issues with HPC users depleting the campus Matlab license pool. The above directive will ask Slurm for a matlab license, which will be used to throttle the number of simultaneous Matlab jobs running on the clusters. If all the matlab users on the cluster abide by this policy, hopefully there will be no more issues with license depletion. If such an issue occurs, we will regrettably have to kill some matlab jobs (starting with those that did NOT request a license) to free up licenses. We are hoping in the next several months to obtain a truly unlimited matlab license on campus, but until then we ask that HPC users include the above directive in their matlab jobs.
Built-in multithreaded functions
A number of the Matlab built-in functions, especially linear algebra and numerical functions, are multithreaded and will automatically parallelize in that way.
This parallelization is shared memory, via threads, and so is restricted to within a single compute node. So normally your job submission scripts should explicitly specify that you want all your cores on a single node.
For example, if your matlab code is in the file myjob.m, you might use a job submission script like:

#!/bin/bash
#SBATCH -t 2:00
#SBATCH -N 1
#SBATCH -n 12
#SBATCH --mem-per-cpu=1024
#SBATCH -L matlab

. ~/.profile
module load matlab
matlab -nodisplay -nosplash -r "run('myjob.m'); exit" > myjob.out

and your matlab script should contain the line

maxNumCompThreads(12);
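If you would rather not hard-code the thread count, the following sketch (our addition, not part of the original example) derives it from the Slurm allocation instead; SLURM_CPUS_ON_NODE is an environment variable Slurm sets inside a job:

% Match Matlab's thread count to the cores Slurm allocated to this job
ncpus = str2double(getenv('SLURM_CPUS_ON_NODE'));
if isnan(ncpus)
    % Not running under Slurm; keep Matlab's default thread count
    ncpus = maxNumCompThreads;
end
maxNumCompThreads(ncpus);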
MATLAB Parallel Computing Toolbox
The MATLAB Parallel Computing Toolbox allows you to parallelize your MATLAB jobs, to take advantage of multiple CPUs on either your desktop or on an HPC cluster. This toolbox provides parallel-optimized built-in MATLAB functions, including the parfor parallel loop command.
A simple example matlab script would be
% Allocate a pool
% We use the default pool, which will consist of all cores on your current
% node (up to 12 for MATLABs before R2014a)
parpool
% For MATLAB versions before R2013b, use "matlabpool open"
%Pre-allocate a vector
A = zeros(1,100000);
xfactor = 1/100;
% Assign values in a parallel for loop
parfor i = 1:length(A)
A(i) = xfactor*i*sin(xfactor*i);
end
Assuming the above MATLAB script is in a file ptest1.m in the directory /lustre/payerle/matlab-tests, we can submit it with the following script to sbatch:

#!/bin/tcsh
#SBATCH -n 20
#SBATCH -N 1
#SBATCH -L matlab

module load matlab
matlab -nodisplay -nosplash \
    -r "run('/lustre/payerle/matlab-tests/ptest1.m'); exit" \
    > /lustre/payerle/matlab-tests/ptest1.out
You would probably want to add directives to specify other job submission parameters, such as the walltime, memory requirements, and allocation account.
NOTE: It is important that you specify a single node in all of the above, as without using Matlab Parallel Server/Distributed Computing Server the parallelization above is restricted to a single node.
MATLAB Parallel Server/Distributed Computing Server
The MATLAB Parallel Server (known as Distributed Computing Server (MDCS) before 2019) allows you to extend your MATLAB workflows from your desktop to a High Performance Computing (HPC) cluster without having to learn the details of submitting jobs to the cluster. This tool allows you to run Matlab on your desktop workstation and submit jobs from that Matlab session to the Deepthought2 or Juggernaut HPC cluster to run on its compute nodes, thereby reducing the need to directly interact with the Unix environment on the HPC clusters. Parallel Server/MDCS works with the Parallel Computing Toolbox discussed above, and extends the functionality to allow for jobs spanning multiple compute nodes.
NOTE: The MATLAB Parallel Server/Distributed Computing Server is currently only available on the Deepthought2 and Juggernaut clusters. It is NOT currently available on MARCC/Bluecrab
This section is divided into several subsections:
- Instructions on configuring Parallel Server/MDCS
- Instructions on using Parallel Server/MDCS
- Links to additional information
Instructions on configuring MATLAB Parallel Server/Distributed Computing Server (MDCS)
Because Parallel Server/MDCS allows one to submit jobs to the UMD HPC clusters right from your desktop workstation, some configuration of Matlab on your workstation is required. This section discusses the configuration process, starting with the UMD provided configuration scripts and profiles, followed by a more general discussion.
This configuration will need to be done once on your workstation before you can use Parallel Server/MDCS. If you are running Parallel Server/MDCS from multiple workstations, you will need to perform this configuration once per workstation. You will also need to redo the configuration if you upgrade the version of Matlab on your workstation. But otherwise, it is a one-time configuration process (although it does not hurt to redo the configuration process).
In all cases, note that you MUST be running the same release of Matlab on your desktop and on the cluster. We have over half a dozen Matlab versions available on the cluster, and generally update at least once a year. If you need a newer version than is available on the cluster, please contact us and we will work on adding the newer version to the cluster. If you have an old version that is not supported on the cluster, please upgrade the version on your workstation.
Configuring Parallel Server/MDCS using UMD configuration scripts/profiles
The UMD setup for Parallel Server/MDCS is distributed in two parts. The main part is a zip file or tarball containing the actual scripts to integrate Parallel Server/MDCS with the Slurm scheduler on the HPC clusters. Although these are mostly independent of the version of Matlab being run, there was a change in Matlab between versions R2016b and R2017a, and so there are two sets of files: one for before this change (versions 1.x) and one for after (versions 2.x). Also, some changes were made in R2019a that require use of version 2.1.0 for R2019a and later versions (version 2.1.0 is backwards compatible with R2017a-R2018b). In each case (R2016b or before, or R2017a or later), we provide both zip and tar files --- both contain the same scripts, and you only need one; the two formats are just provided for your convenience. Windows users will likely prefer the zip files; Unix users will likely prefer the tarballs. The second part is a small file containing the profile settings --- this is dependent on the Matlab version and the cluster you wish to connect to.
A note on versioning: Over the years, there have been a number of tweaks to these scripts. The original version did not have any version number associated with it, which could lead to confusion as the settings files sometimes require a specific version of the scripts. Starting with version 1.1.0 of the scripts, we have added a text file called README.Glue.MDCS-Slurm-Integration containing some information about the scripts and a version number. Currently we have the following versions (and release dates):
- Version 0.1: Released Fall 2014. Oldest version, no longer supported.
- Version 1.1.0: Released 30 Mar 2018. Deprecated.
- Version 1.2.0: Released 20 Dec 2018. Supports Matlab versions up to R2016b.
- Version 2.0.0: Released 20 Dec 2018. Deprecated. Supports Matlab versions R2017a through R2018b.
- Version 2.1.0: Released 26 Jun 2019. Supports Matlab versions R2017a and later (tarball/zipfile renamed from umd_deepthought2 to umd_mdcs).
The current versions of the script files are listed below. Note: The *.zip and *.tar.gz files have the same contents; you generally should just download the one which is most convenient for your system. Windows users will likely want the zip file, Unix users likely the tarball.

- Version 1.2.0: For Matlab versions R2014a-R2016b: zip format / tarball
- Version 2.1.0: For Matlab versions R2017a or newer: zip format / tarball
You will need to unzip/untar the contents of the above integration scripts zipfile/tarball into a directory in your Matlab path on your workstation. To find out what directories are in your Matlab user path, you can issue the command userpath from within Matlab. Typically, this will be one of:

- My Documents\MATLAB or Documents\MATLAB on Windows systems, or
- ~/Documents/MATLAB or $matlab/toolbox/local on Linux systems.
If you are reinstalling the scripts (e.g. you are installing a newer version of the scripts and/or upgraded your Matlab version), please delete (or at least rename) all of the old versions. You should check all of the directories in the userpath above. In particular, make sure that you delete:

- configCluster.m
- ClusterInfo.m
- The +profiles/+umd/+deepthought2 and +profiles\+umd\+deepthought2 directories (unless you added other profiles for other clusters, it should be safe to just delete the whole +profiles directory tree).
- The UMD-Slurm-Integration-Scripts directory.
To confirm that you successfully deleted the old version, you can restart Matlab and make sure that the command configCluster returns an "Undefined function or variable" error. You should also find the Create and Manage Clusters menu entry under the Parallel menu bar and delete any old UMD cluster profiles (you should NOT delete the local or Matlab Parallel Cloud profiles if present).
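Another quick check (our suggestion, not part of the official cleanup steps) is to ask Matlab for every copy of the script it can still see on its path:

>> which -all configCluster

If the old files were removed correctly, this should report that 'configCluster' was not found.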
Once you determine which directory in your Matlab userpath you wish to use, you should unzip/untar the zipfile/tarball into that directory. The configCluster.m file should be at the top level of that directory. You will also need to place a profile settings file in this same directory. For version 2.x of the scripts, there is a directory UMD-Slurm-Integration-Scripts placed in this directory as well. If you really wish to, you can move that directory elsewhere (but then you will need to enter the path to that new directory in the configCluster script discussed below).
In addition to the integration scripts in the zipfile or tarball above, you will need a profile settings file. You should choose the version that matches the release of Matlab you are running on your workstation. If the Matlab release on your workstation is newer than any releases listed below, please contact us and we will work to install the new release of Matlab on the cluster. If the version of Matlab you are using is not listed below, but a newer one is, then please upgrade your Matlab. (Depending on your browser, you probably need to do something like right click on the link and use "Save link as ..." to save these to a file):
- For Matlab R2019b on Deepthought2
- For Matlab R2019b on Juggernaut
- For Matlab R2019a on Deepthought2
- For Matlab R2019a on Juggernaut
- For Matlab R2018b on Deepthought2
- For Matlab R2018b on Juggernaut
- For Matlab R2018a on Deepthought2
- For Matlab R2018a on Juggernaut
- For Matlab R2017a on Deepthought2
- For Matlab R2016b on Deepthought2 (Deprecated)
- For Matlab R2016a on Deepthought2 (Deprecated)
- For Matlab R2015b on Deepthought2 (Deprecated)
- For Matlab R2014a on Deepthought2 (Deprecated)
NOTE: The settings files for Matlab releases R2014a through R2016b require 1.x versions of the integration scripts zipfile/tarball.
Once you have downloaded one or more of the settings files above, you need to place them in the same directory where you unzipped/untarred the integration scripts. I.e., your deepthought2_remote_r20*.settings or juggernaut_remote_r20*.settings files must be in the same directory as the configCluster.m script.
NOTE: There was an issue in the MDCS-Slurm integration scripts which manifested itself in the 2016a, 2016b and 2017a releases of Matlab. If you are trying to use Matlab Parallel Server/MDCS with Matlab version 2016a or newer, you will need to use a newer version of the scripts in the umd_deepthought2.tar.gz or umd_deepthought2.zip files (version 1.1.0/30 March 2018 or later). This is an insidious issue: with the older versions of the integration scripts, a multicore job will be submitted to the HPC cluster, but Matlab will only avail itself of a single core. Thus without the newer, fixed version, your Matlab job will run, but poorly, and waste CPU and SUs. If in doubt, please upgrade. You do NOT need to update the profile settings files (they are not impacted by this issue).
If you upgrade the version of Matlab on your workstation, you will need to fetch the corresponding settings file for the new Matlab version. If you are upgrading from a Matlab version before R2017a to R2017a or later, you will need to delete the integration scripts and install the new version of the integration scripts from the zip file/tarball as described above. If you are only upgrading from and to versions older than 2017a, or from and to versions 2017a or newer, then reinstalling the integration scripts from the zipfile/tarball is probably not strictly needed. However, if a newer version of the integration scripts is available, it is recommended that you update.
Once the settings file and the files from the zip file/tarball are installed, you need to configure things. The configCluster.m script, included in the zipfile/tarball, tries to make this easier. It will prompt you for various configuration parameters (e.g. how to authenticate to the cluster) and save these to make things easier when you start using Parallel Server/MDCS. You should only need to run this script once per workstation, the first time you wish to use Parallel Server/MDCS with that version of Matlab for a specific cluster. (If you upgrade the Matlab version on your workstation, you will need to run configCluster again.) It is safe to run this multiple times if you wish to change parameters; note, however, that any settings you made in ClusterInfo will be lost if you rerun the configCluster command.
- From the Matlab command prompt, type configCluster. This will print the version of the configCluster command being used, and search for all profiles matching the version of Matlab you are running. If there are multiple matching profiles (i.e. you have profiles for multiple clusters), it will prompt you as to which one to use. If only one profile is found, it assumes that you want to use it. If for some reason it cannot find any profiles, it will list all the appropriate settings files it can find and ask your help in choosing one; usually this means you forgot to install the correct settings file.
- The version of configCluster for Matlab R2017a and later (version 2.x of the configCluster.m code) will then ask for the location of the Slurm integration scripts. Unless you moved the UMD-Slurm-Integration-Scripts subdirectory after unzipping or untarring the zipfile/tarball, you can just hit return for the default. If you did move the directory, give the path that you moved it to.
- Next, the script will prompt for your username on the cluster. Enter your username. Remember that your username is all lowercase.
- Next, the script will ask how you wish to authenticate. There are three options:
  - password: If you select this, the first time you use Parallel Server/MDCS in any Matlab session, Matlab will prompt you to enter your password on the cluster. (It will remember the password for the remainder of that Matlab session, so you only need to enter it once per session.) This is probably the simplest option, and it is recommended for beginning users.
  - identity: If you select this, you will use RSA public key authentication. This requires additional setup on both your workstation and the cluster. The configCluster script will ask some more questions if this option is chosen. Although additional setup is required, once done, it makes passwordless authentication between your workstation and the cluster possible, and is possibly the most convenient option for advanced users.
  - ask: If you select this, the choice between password and identity file authentication is deferred until you actually use Parallel Server/MDCS, which will prompt you whether you want to use an identity file each time. This mimics the behavior of older versions of the integration scripts. I only recommend this option if you are experimenting with setting up the identity key authentication but are not sure if you know how to do it properly, as this does not lock you in. Once you figure out how to do it, I recommend rerunning configCluster and choosing identity.
- If you selected identity file based authentication, you will need to have set up a private-public key pair and enabled SSH public-key authentication to the HPC cluster using the private key created. The configCluster script will ask you at this time for the path to the private key "identity file" corresponding to the public key you authorized on the HPC cluster. It will also ask you if the private key file is passphrase encrypted. Encrypting the private key is strongly recommended for security reasons --- otherwise anyone who gets access to the private key file can access the cluster as you. If you encrypted the private key, type 'y' and you will be prompted by Matlab for the passphrase once per Matlab session. Otherwise type 'n'.
- The script will then proceed to create a profile named Deepthought2 Remote MATLAB_VERSION (or Juggernaut Remote MATLAB_VERSION for the Juggernaut cluster), where MATLAB_VERSION reflects the version of Matlab being run (e.g. R2018b), and make it the default profile. If a profile of that name existed previously, it will be overwritten. It also resets the ClusterInfo structure, setting values as appropriate based on the responses you gave.
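As a quick sanity check after configCluster finishes (our suggestion, not part of the script's own output), you can confirm from the Matlab prompt that the new profile is now the default and build a cluster object from it:

>> parallel.defaultClusterProfile
>> c = parcluster

The first command should print the name of the profile just created (e.g. Deepthought2 Remote R2018b).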
To make full use of Matlab DCS, you might also need to open some ports in your workstation firewall. The parpool and related functionality need to be able to communicate with your workstation. The range of ports required to be opened for this to work is complicated, but typically I find enabling the TCP ports 27370 through 27470 should work for most if not all Matlab versions. These ports should be allowed for the subnet containing the Deepthought2 compute nodes (10.103.128.0/19).
You should now be ready to use Matlab DCS. You can open Create and Manage Clusters in the Parallel dropdown and see your new profile, and you can "validate" the profile if you wish. Note that validating the profile submits several jobs to the cluster, requesting up to the number of workers you selected on the validation page (in newer Matlabs; older versions do not give this option) or, if not selected, the number of workers available to the cluster (NumWorkers) in the profile definition. As this number does not appear to limit the number of workers you can use in real production work, we normally set it to 20 so that your validation does not wait forever in the queue. But be aware that even so it can take a while for all the validation jobs to run.
Manually configuring Parallel Server/MDCS
The Division of Information Technology and the High Performance Computing group do not officially support manual configuration of Parallel Server/MDCS; only the above scripts are officially supported. However, we provide below some basic pointers should you want to tweak things. We limit this discussion to Matlab versions R2017a and later (earlier versions use a different profile structure which we do not cover).
There are basically two parts to the configuration, somewhat distinct but with many interconnections. The first part is a set of scripts (mostly Matlab *.m scripts, plus a couple of Bourne shell scripts): basically everything in the umd_deepthought2.*.zip or equivalent tarball except for configCluster.m. The Matlab scripts mostly define functions that are needed by the DCS and Parallel Computing Toolbox codes for interacting with the batch scheduler, transferring files, etc. There are also basic Bourne shell scripts which are used for submission to the Slurm sbatch command.

We use tweaked versions of the standard Slurm integration scripts from MathWorks. We use the nonshared variant since we do not expect workstations to have access to the standard Slurm commands or a shared filesystem with the cluster. The tweaks mostly apply to having the ClusterInfo structure save information about the desired authentication parameters to make things more user friendly.
The various scripts are described in the MathWorks documentation, but basically include:

- ClusterInfo.m: defines a structure allowing one to pass additional parameters for jobs (like allocation account, walltime, etc).
- communicatingJobWrapper.sh: the job script used for parallel jobs
- independentJobWrapper.sh: the job script used for serial jobs
- cancelJobFcn.m: defines a function to cancel a Slurm job
- deleteJobFcn.m: defines a function to delete a Slurm job
- getJobStateFcn.m: defines a function to determine the state of a Slurm job
- communicatingSubmitFcn.m: defines a function to submit a parallel job
- independentSubmitFcn.m: defines a function to submit a serial job
- extractJobId.m: defines a function to extract job ids from Slurm output
- createSubmitScript.m: defines a function to create job scripts
- getSubmitString.m: defines a function to generate the actual sbatch command and arguments
- getCommonSubmitArgs.m: defines a function to generate some of the arguments to sbatch
- getRemoteConnection.m: defines a function to get a connection to the cluster
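To give a flavor of what these functions do, here is a minimal sketch (our illustration, not the actual UMD/MathWorks code) of the job id extraction: sbatch normally prints a line of the form "Submitted batch job 12345", and extractJobId.m just pulls the number out of that output:

function jobId = extractJobId(cmdOut)
% Sketch only: extract the Slurm job id from sbatch output of the
% form "Submitted batch job 12345"; the real script differs in detail.
token = regexp(cmdOut, 'Submitted batch job (\d+)', 'tokens', 'once');
jobId = token{1};
end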
The ClusterInfo.m script must reside somewhere in your Matlab userpath. The others all go in a directory of your choosing, which is then referenced in the profile. You will probably want to use either the UMD or the MathWorks variants of the above.
The other part of the configuration is the actual cluster profile. Again, we use a generic scheduler profile because it is expected that the Slurm commands are not available on your workstation. The MathWorks documentation covers the configuration of the profile in more detail, but basically:
- JobStorageLocation: this is the path on the local system where job data should be stored.
- NumWorkers: I believe that this is simply the default number of workers to use when validating the profile. I generally use something like 20 to minimize the resources requested during validation (so that validation does not take forever).
- ClusterMatlabRoot: this defines the root of the Matlab installation on the cluster. I.e., it should be the value of the environment variable MATLAB_ROOT on the cluster after you run module load matlab/MATLAB_VERSION. Remember that you must use the same version of Matlab as is running on your workstation.
- RequiresMathWorksHostedLicensing: should be false on our clusters.
- LicenseNumber: leave unset on our clusters.
- OperatingSystem: should be set to 'unix' (the OS running on the cluster, not your workstation).
- HasSharedFilesystem: should be false, as your workstation does not share a filesystem with the cluster.
- IntegrationScriptsLocation: this defines where the integration scripts above are located.
- AdditionalProperties: this defines a structure with additional properties for the scheduler. It must define:
  - ClusterHost: the login host for the cluster, e.g. login.deepthought2.umd.edu
  - RemoteJobStorageLocation: this is the path on the cluster where job data should be stored. You need to have write access to the directory specified. We normally use something under your lustre directory.
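To make the above concrete, here is a hedged sketch (all paths and the profile name are hypothetical; the officially supported route remains the configCluster script) of building such a generic profile programmatically in Matlab R2017a or later:

% Sketch only: build a generic cluster profile with the properties above.
c = parallel.cluster.Generic;
c.JobStorageLocation = 'C:\Temp\matlab-jobs';      % local job data (hypothetical path)
c.NumWorkers = 20;                                 % keep validation cheap
c.ClusterMatlabRoot = '/usr/local/matlab-r2018b';  % MATLAB_ROOT on the cluster (hypothetical)
c.OperatingSystem = 'unix';
c.HasSharedFilesystem = false;
c.RequiresMathWorksHostedLicensing = false;
c.IntegrationScriptsLocation = 'C:\Users\me\Documents\MATLAB\UMD-Slurm-Integration-Scripts';
c.AdditionalProperties.ClusterHost = 'login.deepthought2.umd.edu';
c.AdditionalProperties.RemoteJobStorageLocation = '/lustre/me/matlab-jobs';  % hypothetical
saveAsProfile(c, 'Deepthought2 Manual R2018b');    % hypothetical profile name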
The following URLs contain additional information you might find useful:
- The Slurm integration scripts from Mathworks
- Mathworks documentation on the integration scripts
- Mathworks documentation on generic profiles
Using Matlab DCS with the DT2 Cluster
The following is a quick guide to using Matlab DCS to submit jobs to the DT2 cluster. It is assumed that you have already done the setup above on your workstation at some point (downloading and extracting the integration scripts to the appropriate location, downloading the correct profile to the appropriate location, and running configCluster). These steps should only need to be done once per user per workstation (although you will need to repeat at least getting the new profile and rerunning configCluster if/when you upgrade Matlab).
- First, you need to define a "cluster" to submit jobs to. This holds the information about the parallel workers, etc. For most cases, it will suffice to enter a command like:

>> c = parcluster;

You can choose whatever variable you like instead of c, but if so be sure to change it in the following examples as well.
- You will generally need to define additional parameters (e.g. WallTime, etc.) needed for your job. This is done via the ClusterInfo object in Matlab and is discussed further below. The settings are "sticky", so you probably will not need to update them very often. For this simple example, the default values will work, so you can skip this step.
- You can then create and submit jobs to be run on the remote cluster. The following is a simple example:
>> j = c.batch(@pwd, 1, {});
>> j.wait
>> j.fetchOutputs{:}

ans =

/a/fs-3/export/home/deepthought2/mltrain

>> j.delete
The variable j holds the "job"; you can use whatever variable you like. In this case, the "job" is created when we create a batch job on our parcluster c. For this example, we are simply running the builtin pwd command; in most cases you would probably be giving a string with the name of a user defined function (e.g. the name of a "*.m" file without the ".m" extension). The 1 in the batch command means that the function is expected to return 1 output argument. The braces {} contain a list of input values to the function; in this case, pwd does not take input arguments, so we do not provide any.

The first time you submit a job to the HPC cluster in a particular Matlab session, a pop-up message will be displayed asking if you wish to "Use an identity file to login to login.deepthought2.umd.edu?" (or whatever the login node for the cluster is). If you answer "No", you will be prompted for your password on the HPC cluster; this is the recommended response for new users. Answering "Yes" requires one to set up RSA public key authentication on the HPC login nodes; you will be prompted to provide the location of the identity file and asked if the file requires a passphrase. In all cases, Matlab will remember this information (your password, or the location and/or passphrase of the identity file) for the remainder of your Matlab session.
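In real use you will typically submit your own function rather than pwd, and request a pool of workers so that any parfor loops inside your function run in parallel. A hedged sketch (mysim and inputData are hypothetical names):

% mysim.m is a hypothetical user function taking one input and returning
% one output; 'Pool',19 requests 19 workers in addition to the task
% running the function itself (20 cores total).
j = c.batch(@mysim, 1, {inputData}, 'Pool', 19);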
When you issue the batch command, a job is submitted to the scheduler to run on the HPC cluster compute nodes. Depending on how busy the cluster is, the job might or might not start immediately, and even if it starts immediately, it will generally (except in overly simple test cases such as this) take a while to run. The j.wait will not return until the job is completed. You might instead wish to use the c.Jobs command to see the state of all of your jobs. Although you can submit other jobs (be sure to store the jobs in different variables) and perform other calculations while your job(s) are pending/running, you cannot examine their output until they complete.

To examine the results of a job (after it has completed), you can use the j.fetchOutputs{:} function as shown in the example. In the above example, you can see that it returned the path to the home directory of the Matlab test login account that it was run from. If the job does not finish successfully, you probably will not be able to get anything useful from the fetchOutputs function. In such cases, you should look at the error logs (which can be lengthy) using the getDebugLog function. There are separate logs for each worker in the job, so you will need to do something like:

>> j.Parent.getDebugLog(j.Tasks(1))
Note: The fetchOutputs function will only return the values returned from the function you called; data which has been written to files will not be returned. For such data, you will need to manually log into the HPC cluster to retrieve the information.

The above example is unrealistically simple. In practice, you will generally need to set some more job parameters --- although Matlab DCS hides some of the complexity of submitting jobs to an HPC cluster from the user, it cannot hide all of it. In general, the settings for your job will be obtained from the ClusterInfo object in Matlab. You can use the command ClusterInfo.state() to see all of the current settings, and in general the commands ClusterInfo.getFOO() and ClusterInfo.setFOO(VALUE) can be used to query the value of a particular setting FOO, or set it to VALUE. Notable fields are:
can be used to query the value of a particular setting FOO, or set such to VALUE. Notable fields are:- WallTime: this sets the maximum wall time for the job.
If not set, the default is 15 minutes, which is probably too short for real
jobs.
This can be given using one of the following formats:
- MINUTES
- DAYS-HOURS:MINUTES
- DAYS-HOURS
- HOURS:MINUTES:SECONDS
- MemUsage: this sets the memory per CPU-core/task to be reserved. This should be given as a number of MB per core.
- ProjectName: this specifies the allocation account to which the job will be charged. Your default allocation account will be charged if none is specified.
- QueueName: this specifies the partition the job should run on. Normally you will not wish to set this unless you wish to run on the debug or scavenger partitions.
- UseGpu: If you wish for your job to use GPUs, you should set this to the number of GPUs to use. That will cause Slurm to schedule your job on a node with GPUs; additional work may be needed to get Matlab to actually use the GPUs.
- GpusPerNode: If you set UseGpu, the system will by default request a single GPU per node. You can set GpusPerNode to the number of GPUs you wish to request per node. Obviously, on Deepthought2 the only other value that makes sense is 2 (as 1 is the default, and at most 2 GPUs are available per node).
- EmailAddress: if set, it will cause Slurm to send email to the address provided on all job state changes. The default is not to send any email.
- Reservation: if set, the job will use the specified reservation.
- UserDefinedOptions: This is a catch-all for any other options you need to provide to Slurm for your job. You should just present sbatch flags as you would on the command line. E.g., to specify that you wish to allow other jobs to run on the same node as your job, you can provide the value --share. You can provide multiple Slurm arguments in this string by just putting spaces between the arguments.
The following example shows how to set a walltime of 4 hours and request 4 GB/core (4096 MB/core):
>> ClusterInfo.setWallTime('4:00:00')
>> ClusterInfo.setMemUsage('4096')
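Each set call has a matching get call (per the getFOO/setFOO pattern described above), so you can confirm the values took effect:

>> ClusterInfo.getWallTime
>> ClusterInfo.state()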
Links with additional information related to MATLAB Parallel Server/Distributed Computing Server (MDCS)
The sections above should provide some basics to help you get started with MATLAB Parallel Server/Distributed Computing Server (MDCS), but a comprehensive discussion is well beyond the scope of this page. We provide here a few links to more information on Parallel Server/MDCS for your convenience.
In the fall of 2014, we had a tutorial on MDCS led by an instructor from MathWorks. The documentation from that is provided below --- although it is a little dated and there have been some minor changes since then, the basic concepts might still be useful.
- MDCS at UMD: A one-page overview of MDCS.
- Getting Started with Serial and Parallel MATLAB on Deepthought2: Instructions on how to set up your workstation to submit MATLAB jobs to Deepthought2.
MathWorks also has a significant amount of web documentation on Parallel Server/MDCS available at https://www.mathworks.com/help/mdce/.
Installing add-ons/packages/etc
The campus MATLAB license includes a fair number of licensed toolboxes. However, there is also a large number of free and community provided toolboxes --- far too many for the Division of Information Technology to install all of them. For the most part, any individual toolkit/toolbox/package add-on is only used by at most a handful of people, so it is more efficient for the users to install these themselves.
This is relatively simple to do in the more recent MATLAB versions; from your main MATLAB screen, click on the "Add-Ons" drop down, and select "Get Add-ons". This might take a little while to open up due to the large number of add-ons available, but once open there are a number of ways to look for add-ons. If you know what add-on you want, the search bar on the top right might be the easiest way to find the add-on. Find the add-on you desire and click on it.
Once the window for the particular add-on opens, there should be a button labeled "Install" in the upper right. Click on that, and the add-on should be installed into the appropriate location in your home directory.
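To confirm what ended up installed, newer Matlab releases (R2017b or later, if we recall correctly) also provide a programmatic listing:

>> matlab.addons.installedAddons

This returns a table with the name, version, and identifier of each installed add-on.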
You will likely need an account with Mathworks/Matlab in order to download the add-ons. You can create such an account at https://www.mathworks.com/mwaccount/register; it is advised that you register with your "@umd.edu" email address to get the full benefits of your association with the University.
External Resources
For your convenience, we provide some links to some non-University resources about MATLAB which you might find useful.
Tutorials from Mathworks
These are free tutorials from Mathworks, the company which produces MATLAB.