Differences between the new Zaratan and old Deepthought2 HPC clusters
On this page we document some of the major differences you may encounter when moving from the old Deepthought2 cluster to the new Zaratan cluster. The intent is to help users of Deepthought2 and other previous UMD clusters get up to speed on the new cluster quickly.
There are a number of changes we are making with the Zaratan cluster, and although we believe these will make for an overall better experience for all HPC users, we want to enumerate them to help users with the transition. To help you navigate, we have divided the changes into the categories below (with some changes listed in multiple categories).
- Major Differences
- Hardware Differences
- Storage Differences
- Login/Access/Accounting Differences
- Environment Differences
- Differences pertaining to the Software Library
Major Differences
This is just an overview of the differences we consider most significant, for users who do not have time to review the full list. We encourage all users of the Deepthought2 cluster who are transitioning to Zaratan to find time to look over the full list, but until then, here are what we believe are the most critical differences. (Note: we will update this list if we receive many questions on items not in it, so it might be worth returning to this page if you run into issues.)
Hardware Differences
Of course, the hardware on the Zaratan cluster is significantly newer than what was available on the old Deepthought2 cluster. The differences likely to have the most impact on users are:
- AMD CPUs: The Zaratan cluster will consist of AMD CPUs when it debuts, as opposed to the Intel CPUs used in the Deepthought2 cluster. This choice was made to maximize performance per unit cost. It might require users to change some compiler flags in order to get the best optimizations; see the compiler-flag sketch after this list.
- Increased number of cores per node: The standard compute nodes on the Zaratan cluster have dual AMD Zen3 EPYC 7763 CPUs with 64 cores per CPU, for a total of 128 cores per node, whereas the Deepthought2 standard compute nodes had dual Intel IvyBridge 2680v2 CPUs with 10 cores per CPU, for a total of 20 cores per node. So the Zaratan nodes have over 6 times the number of cores of a standard Deepthought2 compute node. This means there are possibilities for increased scaling of shared-memory parallelization schemes (e.g. OpenMP, TBB, and other thread-based paradigms). But it also means that users must take care when submitting jobs so that they do not accidentally reserve an entire node when they only need a handful of cores; a sample job script requesting only part of a node is sketched after this list. We will be changing the default exclusive-mode policy so that most jobs will default to being shared.
- Memory per core: The standard compute nodes on Zaratan will have 512 GB of RAM. While this means that jobs that do not need all of the cores on a node can use significantly more memory than on Deepthought2 (which had only 128 GB of RAM per standard compute node), the average memory per core is only 4 GB/core, as opposed to a bit over 6 GB/core on Deepthought2. Also note that the charging mechanism on Zaratan has been modified so that jobs requesting more than the average memory per CPU core will be charged additionally --- this is to prevent unfair situations such as a job requesting 1 CPU core and 500 GB of RAM, thereby effectively monopolizing the node while only being charged for 1/128 of a node (the sample job script after this list shows an explicit per-core memory request).
- GPUs: The Zaratan cluster includes 20 nodes, each with four NVIDIA A100 Tensor Core GPUs (using the Ampere architecture, supporting CUDA compute capability 8.0). These are much more powerful than the K20 GPUs available on Deepthought2. Indeed, because the new GPUs might be more powerful than many jobs require, we are looking into splitting some of the GPUs into multiple, smaller GPUs (using NVIDIA Multi-Instance GPU technology) to significantly increase the number of GPUs available on the cluster. Some of the GPUs will remain unsplit for jobs requiring the full power of the A100 GPUs --- we will be adjusting the ratio of split/unsplit GPUs to maximize their utilization as we observe the jobs being sent to them. To better utilize the GPUs, we have created a partition for GPU jobs. You will still want to specify a GPU GRES when you submit a job, and we will be creating new GPU-model-specific GRESes to allow jobs to specify exactly what GPU is required. This will allow one to distinguish between the split and full A100s, as well as any future GPU models added to the cluster. A sample GPU job script is sketched after this list.
- Non-infiniband nodes: The Zaratan cluster also includes about 19 nodes, each with dual AMD Zen3 EPYC 7502 CPUs with 32 cores per CPU, for a total of 64 cores per node, and 1 TB of RAM per node (or 16 GB/core). These nodes do not have high-speed infiniband interconnects, only dual 25 Gb/sec ethernet. As such, we are targeting these for jobs which do not require the high bandwidth and low latency of the standard (infiniband) compute nodes. This includes sequential jobs, high-throughput computing, and similar jobs which will fit onto a single node. These nodes also have large amounts of local flash storage which could be useful in some data-intensive jobs. These nodes will be accessible using the 'nonib' queue; see the note after the sample job script below.
- Interconnects: The networking interconnects between most of the nodes are in general significantly faster than those on Deepthought2. The standard compute nodes have HDR100 infiniband interconnects, running at 100 Gb/sec. The fileservers and GPU nodes have full HDR interconnects, for 200 Gb/sec. The sequential nodes do not have infiniband, so they use 10 Gb/sec ethernet.
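To make the compiler-flag point above concrete, here is a minimal sketch of building a code for the Zen3 CPUs with GCC. The file names are placeholders, and the exact compilers and modules you use may differ; consult the software library documentation for the recommended settings.

    # Hypothetical build of a C code tuned for the AMD Zen3 CPUs in the
    # standard compute nodes.  With GCC 11 or newer, -march=znver3 targets
    # Zen3 directly; -march=native picks the host CPU when compiling on the
    # same type of node you will run on.
    gcc -O3 -march=znver3 -o mycode mycode.c

    # Older GCC releases without znver3 support can fall back to znver2:
    gcc -O3 -march=znver2 -o mycode mycode.c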
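To illustrate the shared-node and memory-charging points above, here is a minimal sketch of a Slurm batch script that requests only the cores and memory it needs rather than a whole node. The job name, time limit, and program name are placeholders, and your allocation account must still be specified as usual.

    #!/bin/bash
    #SBATCH --job-name=threaded-job    # placeholder name
    #SBATCH --ntasks=1                 # one task ...
    #SBATCH --cpus-per-task=8          # ... using 8 of the 128 cores on a node
    #SBATCH --mem-per-cpu=4096         # 4 GB per core; staying at or below the
                                       # average avoids the extra memory charge
    #SBATCH --time=1:00:00             # placeholder walltime

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # for OpenMP/threaded codes
    ./mycode

A single-node job like this which also does not need the infiniband fabric could be sent to the non-infiniband nodes instead by adding a line such as #SBATCH --partition=nonib (matching the 'nonib' queue mentioned above); check the main documentation for the exact partition name.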
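Similarly, here is a sketch of a batch script for the GPU partition. The partition name and GRES string below (gpu, a100) are assumptions for illustration; the actual partition and GRES names (including any for the MIG-split GPUs) will be given in the main Zaratan documentation.

    #!/bin/bash
    #SBATCH --job-name=gpu-job         # placeholder name
    #SBATCH --partition=gpu            # assumed name of the GPU partition
    #SBATCH --gres=gpu:a100:1          # assumed model-specific GRES requesting
                                       # one full (unsplit) A100
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --time=1:00:00             # placeholder walltime

    module load cuda                   # assumed module name
    ./my_gpu_code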
Storage Differences
- High Performance File System/Scratch Storage:
The high performance filesystem on Zaratan will be 2 PB in size, double what was available on Deepthought2, and will use BeeGFS as the underlying filesystem.
- SHELL (Medium Term) Storage:
Zaratan will include some 10 PB of SHELL medium-term storage. This storage space is intended for important files and data which is part of ongoing research but is not being actively used by jobs running on the cluster. It will be accessible from the login nodes, but not from the compute nodes. The SHELL storage system leverages the Auristor File System, basically an enhanced version of the AFS filesystem which has long been used at UMD. This is a global filesystem with clients available for all modern operating systems, which makes the content securely accessible (secured by Kerberos credentials) wherever it is needed. For more information, see our SHELL documentation.
- Project Directories:
On Deepthought2, users had individual work directories directly under the root of the high performance filesystem (HPFS), e.g. /lustre/job. On Zaratan, storage will generally be organized by projects. Users will have a single home directory regardless of how many projects/allocations they belong to. Users will also have a personal directory underneath their project's scratch and/or SHELL storage directory (assuming the project has such resources). Symlinks to these personal directories (scratch-$project and shell-$project) will be created in the user's home directory; an illustrative layout is sketched after this list.
NOTE: All data stored by users on the UMD-maintained HPC systems is considered to belong to the principal investigator (PI) responsible for the allocation/project. Do not store any information on these clusters which is not to be shared with your PI.
- Storage Quotas:
All storage systems on Zaratan will have enforced quotas, and these quotas will be attached to projects. Projects and allocations will receive quotas for the scratch and medium-term storage resources when the allocation/project is created, and the quotas are only valid for the duration of the respective allocation/project. Note that there are separate quotas for the scratch and SHELL storage systems.
These storage quotas are evaluated and assigned in a similar fashion to CPU time --- for allocations from the AAC, this requires the requestor to specify and justify their storage needs to the satisfaction of the committee. As with CPU time, the more that is requested, the better the justification required, and beyond a certain threshold, payment will be required.
The quotas for the project apply to the combined usage of all members of the project. Additional quotas may be applied to usage by individual members of the project --- these can be adjusted on request of the PI (as long as the combined usage remains within the project limit).
NOTE: Even though the storage systems have quotas enforced, users are still expected to adhere to University and HPC cluster policies regarding the use of storage resources. This includes moving or deleting data on the scratch filesystem which is no longer needed for active jobs/research on the cluster, and compressing data on the medium-term storage tier when it is not going to be accessed for a while. We will verify proper use of the disk resources before any requests for additional storage are considered, and may perform spot checks.
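To make the project-based layout and symlinks concrete, here is a hypothetical view of a home directory for a user jdoe in a project named foo; the symlink names follow the scratch-$project and shell-$project pattern described above, while the target paths shown are purely illustrative.

    # Hypothetical listing for user 'jdoe' in project 'foo'
    $ ls -l ~
    lrwxrwxrwx ... scratch-foo -> /path/to/scratch/foo/jdoe   # illustrative target
    lrwxrwxrwx ... shell-foo   -> /path/to/shell/foo/jdoe     # illustrative target

    # Job input/output belongs on the scratch space, e.g.:
    $ cd ~/scratch-foo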
Login/Access/Accounting Differences
With Zaratan, we are introducing a number of changes related to
accessing the system and accounting.
- Logging into the system:
To access the login nodes on Zaratan, ssh to login.zaratan.umd.edu; a brief example is sketched after this list.
- Multifactor Authentication:
Access to the Zaratan cluster will require multi-factor authentication, using the standard campus DUO MFA system. Since the standard campus VPN already performs multi-factor authentication, ssh connections to the login nodes from the VPN will not prompt you for a second factor. But if you are not on the campus VPN, when you ssh to the Zaratan login nodes you will first be prompted for your password and then prompted to enter a passcode (or a single digit for a "push"). See the main documentation about MFA logins to the cluster.
- Allocation Re-structuring:
In order to make HPC at UMD more self-sustaining financially, we are (at the behest of the provost's office) implementing some changes in the process for creating allocations. Requests for new allocations and the renewal of existing allocations are still made via the same application form to the Allocations and Advisory Committee (AAC), but the allocation levels have changed. All faculty are eligible to receive a basic allocation, with additional compute time available from the AAC up to some threshold. If your computing needs exceed this threshold, additional time can be purchased --- monies from such purchases will be used for upgrading the cluster. These are all renewable annually. All of these requests can be made via the aforementioned form; only minimal information is needed for the basic allocation, with more information needed as more compute time is requested. Note that since storage is now subject to quotas, allocations will now also include (and allocation requests will need to request and justify) storage quotas on the high performance/scratch and SHELL/medium-term storage systems. Please see the following page on allocation levels and costs for more detailed information, including pricing information that can be included in grants.
In addition, some colleges (e.g. CMNS and Engineering) may have their own pools of compute time which they have paid for and which they may award to their faculty. The allotment of these college-level pools is up to the colleges.
Allocations will now consist of a single allocation account, instead of the two-tier (standard and high-priority allocation accounts) system used on Deepthought2. The non-paid allocations from the AAC will be allotted annually; paid allocations will be allotted quarterly.
- Allocation Management:
We are introducing a ColdFront portal which faculty members can use to review and manage their allocations. Faculty members who have received an allocation on any of the DIT-maintained clusters will be able to log in to the portal and see information about their project and allocations. PIs can see current usage statistics (both for compute and storage), see which people have access to their allocation, and manage that access. Users of the cluster must still be in the campus LDAP directory and have active Glue/TerpConnect accounts.
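A brief sketch of the login sequence described above; replace username with your own directory ID.

    # ssh to the Zaratan login nodes
    ssh username@login.zaratan.umd.edu
    # Off the campus VPN you will be prompted for your password and then for a
    # DUO passcode (or a single digit to trigger a push to your device).
    # On the campus VPN, which already performs MFA, only the password prompt
    # appears.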
Differences pertaining to the Software Library
- Versioning in the Software Library:
If you do not specify a version in the module command, you will now generally get the latest installed version of the requested package which is compatible with any previously loaded packages (e.g. compiler, MPI libraries). An illustrative example is sketched after this list.
- Toolchains:
We plan to support several 'toolchains' consisting of compilers and basic libraries. We plan to update these toolchains roughly annually, and will give them versions based on the year of the update. The basic toolchain families will be:
- gnu: This toolchain consists of free/open-source packages based on the GNU Compiler Collection. It includes OpenMPI, OpenBLAS, FFTW, etc.
- intel: This toolchain is based on the legacy Intel compilers (icc, ifort, etc.) along with the Intel Math Kernel Library.
- oneapi: This toolchain is based on the new clang-based Intel compilers (icx, ifx, etc.) along with the Intel Math Kernel Library.
- aocc: This toolchain is based on the AMD optimizing compilers (AOCC clang, flang, etc.) along with the AMD-optimized math libraries.
- nvchpc: This toolchain is based on the compilers from the NVidia HPC toolkit (nvc, nvfortran, etc.) and their MPI and libraries.
Most installed packages are built with the gnu toolchain, but a basic set of libraries (BLAS, NetCDF, etc.) is provided for all of the toolchains.
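An illustrative sketch of how the version defaulting and toolchains work together; the specific versions and package names below are placeholders, not a list of what is actually installed.

    # Load a compiler from one of the toolchain families first (version shown
    # is a placeholder) ...
    module load gcc/11.3.0
    # ... then load packages without versions: each resolves to the newest
    # installed build compatible with the already-loaded compiler/MPI.
    module load openmpi
    module load fftw

    # An explicit version can still be requested:
    module load fftw/3.3.10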
Environment Differences
- Change of default shell:
The default login shell on the Deepthought2 cluster was tcsh; this was inherited from the legacy Glue/TerpConnect environment, which chose csh over the Bourne shell sh as being more user-friendly when that decision was made some 30+ years ago. Today, the bash shell is at least as user-friendly as tcsh, is more commonly used in the Linux community, and is much better suited for scripting than the csh variants. As such, we have decided to change the default login shell on Zaratan to bash. You can still change your login shell to tcsh, etc., but the default will now be bash in most cases.
If you have a Glue/TerpConnect/Deepthought2 account and your login shell was something other than csh/tcsh, and the corresponding shell exists on Zaratan, on the creation of your Zaratan account we will set your login shell to match what it was on Glue/TerpConnect/Deepthought2. Otherwise your login shell will be set to bash.
Unfortunately, we cannot distinguish between users who did not care which login shell they used (and/or did not know how to change it) and those who actually have a real desire to use the tcsh or csh shells, so if you are in the latter category, your default shell on Zaratan will not be your desired tcsh. Although this could present a good opportunity to re-evaluate your choice of shell, if you really wish to use tcsh, you can change your login shell.
- Dot files/startup files:
On Deepthought and Deepthought2, it was recommended that you not edit your .cshrc and .bashrc files directly, but instead edit .cshrc.mine and/or .bashrc.mine. This is no longer the case; feel free to create and edit .cshrc, .bashrc, etc. to customize your environment. Indeed, by default, any customizations you place in a .cshrc.mine or .bashrc.mine, etc., will not be processed on login and so will have no effect. A brief illustrative example follows.
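Since the startup files are now processed directly, customizations can go straight into .bashrc. A minimal illustrative sketch (the module and alias shown are placeholders, not recommendations):

    # ~/.bashrc on Zaratan -- edited directly; .bashrc.mine is no longer read
    case $- in
        *i*) ;;          # interactive shell: continue with the setup below
        *)   return ;;   # non-interactive shell: skip it
    esac

    module load gcc                  # placeholder: modules you routinely use
    alias sq='squeue -u $USER'       # placeholder: a convenient Slurm alias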