Using the HPC clusters: The Basics

This page covers preliminary information about using the environment on the HPC clusters, i.e., what you need to know before we can even start talking about submitting jobs to the system using the command line.

Note: Users who are uncomfortable or unfamiliar with the Unix/Linux command line should consider using the OnDemand Web portal for accessing the cluster. While the command line is the more powerful interface to the cluster, the OnDemand portal has a gentler learning curve. The OnDemand portal is currently available only for the Zaratan and Juggernaut clusters.

  1. Logging into the system
    1. Multifactor Authentication
    2. Common ssh warning/error messages
    3. Ssh Fingerprints
    4. Setting up ssh public-key authentication
      1. Setting up public-key authentication for passwordless ssh among the nodes of the HPC cluster.
      2. Setting up public-key authentication for passwordless ssh into the HPC cluster from your workstation, etc.
    5. Using ssh public-key authentication
      1. Using public-key authentication on Windows systems
      2. Using public-key authentication on Linux or Mac systems
    6. Using kinit for (mostly) passwordless authentication
      1. Installing/configuring Kerberos client on Windows systems
      2. Installing/configuring Kerberos client on Mac systems
      3. Installing Kerberos client on Linux systems
      4. Configuring Kerberos client on Linux systems
  2. Graphics
  3. Basic unix commands
    1. Changing your default shell
  4. Setting up your environment
    1. Preventing output from your dot files
  5. Files and storage basics
  6. Transferring files to/from the HPCCs and other systems
    1. Using scp/sftp
    2. Using Globus
    3. To/from Cloud Storage Providers
  7. Compiling codes
    1. Optimization of Code
    2. Compiling with OpenMP

Logging into the system

Each cluster has at least two nodes available for users to log into. From these login nodes, you can submit and monitor your jobs, compile codes, look at the results of jobs, etc. These nodes can also be used for transferring files and data between the cluster and other systems. However, for large data transfers, there are data transfer nodes (listed below) which should be used instead of the login nodes.

WARNING
DO NOT RUN computationally intensive processes on the login nodes! Such processes violate policy, interfere with other users of the clusters, and will be killed without warning. Repeated offenses can lead to suspension of your privilege to use the clusters.

For most tasks you will wish to accomplish, you will start by logging into one of the login nodes for the appropriate cluster. To log into the cluster, you need to use the Secure Shell protocol (SSH). An ssh client is normally installed by default on Unix systems, and clients are available for Windows, Mac, and even Android; however, on non-Unix systems you typically must install an SSH client yourself.

Once an ssh client is installed on your system, you just tell it to connect to one of the login nodes for the desired cluster. Assuming your official UMD email address is johndoe@umd.edu, the login nodes and usernames are:

Cluster      Login Node                  Username
Zaratan      login.zaratan.umd.edu       johndoe
Juggernaut   login.juggernaut.umd.edu    johndoe
WARNING
NOTE: Zaratan requires multifactor authentication (MFA) using the standard campus DUO MFA system. Either you must be on the standard campus VPN (which requires MFA to authenticate), or, after entering your password when you ssh, you will be prompted to enter a passcode or a single digit to send a "push" to a phone.

On a TerpConnect/Glue system, you would just issue the command ssh login.zaratan.umd.edu to connect to Zaratan, or similarly ssh login.juggernaut.umd.edu to connect to Juggernaut. The unix ssh command by default assumes your login name on the remote system is the same as on the local system, which is true for the UMD HPC clusters and for TerpConnect/Glue systems. From other Unix systems, you might need to specify your cluster username, e.g. ssh USERNAME@login.zaratan.umd.edu or ssh -l USERNAME login.zaratan.umd.edu, where USERNAME is your Zaratan username.
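
The two ways of giving a username are equivalent. As a quick offline check, the sketch below uses ssh -G, which (in OpenSSH 6.8 and later) prints the settings ssh would use for a host without actually connecting. The username johndoe is the placeholder from the table above, and -F /dev/null is used here only so that personal and system-wide ssh configuration files do not affect the output.

```shell
#!/bin/sh
# Placeholder values from the table above; substitute your own username.
CLUSTER_USER="johndoe"
LOGIN_NODE="login.zaratan.umd.edu"

# "ssh -G" only resolves and prints the client configuration; it does not
# open a network connection, so no password or Duo prompt appears.
# "-F /dev/null" makes ssh ignore ~/.ssh/config and the system-wide config.
form_at=$(ssh -G -F /dev/null "${CLUSTER_USER}@${LOGIN_NODE}" | grep '^user ')
form_l=$(ssh -G -F /dev/null -l "${CLUSTER_USER}" "${LOGIN_NODE}" | grep '^user ')

echo "USER@host form resolves to: ${form_at}"
echo "-l USER form resolves to:   ${form_l}"
```

Both lines should report the same user, which is why either form can be used when your local login name differs from your cluster username.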

Multifactor Authentication

Starting with Zaratan we are requiring multifactor authentication to access the HPC clusters. We are using the standard campus DUO MFA system.

Since the standard campus VPN does multifactor authentication, if you are on the VPN, ssh connections to the login nodes will not prompt you for multifactor --- you just need to enter your password as before.

If you are not on campus VPN, when you ssh to one of the Zaratan login nodes, you will first be prompted for your password and then you will be prompted to enter a passcode or a single digit from a menu for a "push" or phone call for verification. E.g., you will see something like the session below, and at the passcode prompt you can enter a passcode from the Duo app on your mobile phone, or have Duo send a push or make a phone call to a previously registered device for the second authentication factor.

For more information, see the web page on the campus Duo Multifactor Authentication System.

my-workstation:~: ssh login.zaratan.umd.edu

                              * * * WARNING * * * 

   Unauthorized access to this computer is in violation of Md.
   Annotated Code, Criminal Law Article sections 8-606 and 7-302 and the 
   Computer Fraud and Abuse Act, 18 U.S.C. sections 1030 et seq. The University 
   may monitor use of its computing resources as permitted by state 
   and federal law, including the Electronic Communications Privacy Act, 
   18 U.S.C. sections 2510-2521 and the Md. Annotated Code, Courts and Judicial 
   Proceedings Article, Section 10, Subtitle 4.  Anyone using this system 
   acknowledges that all use is subject to University of Maryland Policy 
   on the Acceptable Use of Information Technology Resources available at 
   http://www.umd.edu/aup.

   By logging in I acknowledge and agree to all terms and conditions
   regarding my access and the information contained therein.

To report problems or request assistance call the Help Desk at 301-405-1500

Password: 
Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-1234
 2. Phone call to XXX-XXX-1234
 3. Phone call to XXX-XXX-4444

Passcode or option (1-3):

Common SSH warning/error messages

The first time you connect to the system via ssh, you might receive a warning message like:

The authenticity of host 'login.zaratan.umd.edu' can't be established.
RSA key fingerprint is e8:41:71:ac:fc:4c:08:c7:bc:0f:f0:33:95:5b:c4:e0
Are you sure you want to continue connecting (yes/no)?

The message sounds scary, but it is normal. SSH tries very hard to protect you from all sorts of hacking, and one such mechanism is to try to ensure you are talking to the system you think you are talking to. For every server you connect to, it remembers a secret (the RSA fingerprint) that proves the identity of the server, and verifies it on each later connection (this is a gross oversimplification, for brevity). But it cannot verify it the very first time you connect (unless, as is the case on some campus systems, systems staff have pre-populated SSH with information about the system you are connecting to). This message is just to inform you about that.

The hostname (and any IP address shown) may vary, although the hostname should match the name of the system you want to connect to. The key type and the fingerprint will depend on the system you are trying to communicate with. To be secure, you should verify that the fingerprint matches one of the fingerprints published for that system (see the Ssh Fingerprints section).

If the fingerprint does NOT match, you should NOT enter your password, and you should contact system staff ASAP. It is possible that someone is performing a man-in-the-middle attack and could obtain your password if you enter it.
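
When comparing fingerprints, it helps to know that the same key can be displayed in more than one format. The sketch below generates a throwaway key in a scratch directory (purely so the example runs without a network connection) and prints its fingerprint both in the colon-separated MD5 format shown in the warning above and in the SHA256 format that newer OpenSSH clients display by default. The file names and the key comment are made up for the illustration; the -E option assumes OpenSSH 6.8 or later.

```shell
#!/bin/sh
# Demo only: create a throwaway RSA key so there is something to fingerprint.
scratch=$(mktemp -d)
ssh-keygen -q -t rsa -b 2048 -N "" -C "demo-host-key" -f "${scratch}/demo_key"

# -l lists the fingerprint of a public key file; -E selects the hash format.
md5_fp=$(ssh-keygen -l -E md5 -f "${scratch}/demo_key.pub")
sha256_fp=$(ssh-keygen -l -E sha256 -f "${scratch}/demo_key.pub")

echo "MD5 format:    ${md5_fp}"
echo "SHA256 format: ${sha256_fp}"

# Remove the throwaway key.
rm -rf "${scratch}"
```

If your client shows a SHA256 fingerprint while the published value is in the MD5 format (or the reverse), make sure you are comparing like with like before deciding whether the fingerprints match.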

If you see a message like

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
5c:9b:16:56:a6:cd:11:10:3a:cd:1b:a2:91:cd:e5:1c.
Please contact your system administrator.

you should always refrain from logging in and contact system staff as soon as possible. This message means that the server you are connecting to did not know the secret remembered by SSH for that system, as described above, so either system staff changed the keys or someone is attacking you. System staff do not change the keys often, and would email everyone well in advance of any change, so unless you received such an email, this likely means someone is attacking you. Do NOT enter your password, and contact system staff.

Setting up ssh public-key authentication

WARNING

NOTE: If you login to Zaratan using public-key authentication, you will not have Kerberos tickets or AFS tokens, so certain commands will not work. For example, you will not be able to access the contents of your SHELL directory as the underlying filesystem uses AFS tokens for authentication. If you do find yourself needing such, you can just issue the renew command (which will prompt you for your password, defeating the goal of "password-less" ssh, etc).

We recommend using kinit on your workstation to achieve a mostly "password-less" ssh capability which will provide you with Kerberos tickets and AFS tokens after ssh-ing into the cluster.

This section discusses how to set up and use SSH with public-key based authentication to allow for access to the cluster without typing your password for every ssh session. It can also be used to allow passwordless ssh between the various login and compute nodes; this is useful if you are using ssh to spawn processes on the different nodes allocated to your job.

The procedures listed in this section are NOT required for you to access the system, and you can use normal password based authentication instead. It also is NOT required for most multinode jobs (using MPI or srun). It is only for users who wish to set up public-key authentication, either to allow passwordless access to the cluster from your workstation, or to allow passwordless ssh between the nodes of the cluster.

If you are new to Unix/Linux, it is recommended that you skip this section and just use password based authentication.

Public-key authentication uses asymmetric cryptography for its security. In asymmetric cryptographic systems, there exist distinct private and public keys. Data digitally signed with the private key can be verified by anyone holding the public key, but a valid signature can only be produced with the private key. The private key is kept secure on your workstation, and the public key is copied to the HPC cluster. When you log in with public-key authentication from your workstation, the ssh client on your workstation will digitally sign a standard message and send this to the sshd server process on the cluster along with the appropriate public key. The remote sshd process verifies that the public key is authorized, and if so, checks whether the digital signature is valid. If the signature is valid, the ssh client must be in possession of the private key corresponding to the public key, and because the public key is authorized, sshd grants it access to the system.

WARNING
NOTE: the public-key authentication process grants login access to anyone/any system/etc. that knows your private key. Thus you need to ensure the security of your private key. Protect your private key at least as strongly as you would your password.

Setting up public-key authentication with ssh

Instructions for setting up public-key authentication are given below. We break them down into two cases:

  1. for passwordless ssh among the nodes of the cluster
  2. for passwordless ssh from your workstation to the login nodes of the cluster

Although the process is essentially the same in each case, because of the shared home directory among the nodes of the cluster, the first case is a bit simpler and will be treated separately. The second case is slightly more complicated because there are steps which need to be done on both your workstation and on the HPC cluster.

Depending on your needs, you can do neither of the cases, either one of them, or both.

Setting up public-key authentication for passwordless ssh among the nodes of the HPC cluster

In certain cases, it might be necessary to enable passwordless ssh between the nodes of the cluster. The most typical such case is if you must use the ssh command to launch processes on multiple nodes as part of your job. (Most multinode jobs use MPI and/or the srun command, and so do not require this, but some may.) Because you cannot feasibly enter a password with a batch script, you need to enable passwordless ssh in such cases.

Because your home directory is shared among all nodes in the HPC cluster, everything for this process can be done on one of the HPC login nodes. Just log into a login node on the cluster, and then:

  1. Generate your host key, if needed.
    1. Run the command ls -l ~/.ssh/id_rsa.
    2. If ~/.ssh/id_rsa already exists, you should already have the host keys and should not need to generate them again; you can skip running the ssh-keygen command, although you should still verify the ownership and permissions of the file.
    3. If you are having problems with passwordless ssh among the cluster nodes, you can safely regenerate the host key with the next step. If you arranged for passwordless ssh from the cluster to other systems, you will likely need to fix the authorized_keys file on the other systems if you regenerate the HPC host key.
    4. To (re-)generate the host key, run the ssh-keygen command. Accept the default value for the name of the file in which to save the key; the public key will be stored in the same directory, with a .pub extension. Generally, you should not enter a passphrase (just hit return at the two passphrase prompts) --- the most common case is to enable passwordless ssh among nodes in the cluster for use in job scripts, and that will not work if the key has a passphrase.
    5. Repeat the ls -l ~/.ssh/id_rsa command from above. It should now show the existence of the file. Please verify that the file is owned by you, and that no one can access it but you. (Permission flags should be -rw-------.)
  2. Add the public host key to the authorized_keys file.
    1. If ~/.ssh/authorized_keys already exists, you should append the contents of ~/.ssh/id_rsa.pub to the file; you can do this with the command 'cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys'.
    2. If the authorized_keys file does not exist, you can create it with the proper contents with the command: 'cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys'.
WARNING
Please ensure that your id_rsa file is only readable by you. This host key is all that is needed to access any system which has the corresponding public key added to its authorized_keys file.

You should now be able to ssh between nodes in the cluster without entering a password. Note that you do not have access to ssh to compute nodes unless you have a job currently running on the node. Typically this passwordless ssh is only needed to ssh between nodes allocated to a job within your job script.
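
The steps above can be sketched as a short script. This is an illustration under stated assumptions, not a site-provided tool: it works in a scratch directory so that it cannot touch real keys (on a login node the directory would be ~/.ssh), and it passes -t rsa explicitly because recent OpenSSH releases may default to a different key type, in which case the key file would not be named id_rsa.

```shell
#!/bin/sh
# Demo only: a scratch directory stands in for ~/.ssh so real keys are
# never touched. On a login node you would use SSH_DIR="$HOME/.ssh".
SSH_DIR=$(mktemp -d)
chmod 700 "${SSH_DIR}"

# Step 1: generate the key pair only if the private key is missing.
if [ ! -f "${SSH_DIR}/id_rsa" ]; then
    # -N "" sets an empty passphrase, as needed for use inside job scripts.
    ssh-keygen -q -t rsa -N "" -f "${SSH_DIR}/id_rsa"
fi

# Step 2: append the public key; ">>" also creates the file if it is absent.
cat "${SSH_DIR}/id_rsa.pub" >> "${SSH_DIR}/authorized_keys"
chmod 600 "${SSH_DIR}/authorized_keys"

# Verify: the private key should show -rw------- (readable only by you).
key_perms=$(ls -l "${SSH_DIR}/id_rsa" | cut -c1-10)
authorized_count=$(wc -l < "${SSH_DIR}/authorized_keys" | tr -d ' ')
echo "id_rsa permissions: ${key_perms}"
echo "entries in authorized_keys: ${authorized_count}"

# Remove the scratch directory used for the demo.
rm -rf "${SSH_DIR}"
```

Using the append form in step 2 covers both sub-cases of the list (file exists or not) with a single command.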

Setting up public-key authentication for passwordless ssh into the HPC cluster from your workstation, etc.

Although the process in this case uses basically the same steps as above, your workstation and the HPC cluster do not share a common home directory, so the different steps need to be performed on different systems, and a file transfer is required.

Some of the steps below are dependent on the ssh application being used on your workstation/desktop/laptop/etc. We assume PuTTY on Windows systems and the standard command line OpenSSH client on Macs or Linux systems. If you are using something else, hopefully it has documentation describing how to set up public-key authentication, which together with the instructions below can be used to figure out what you need to do.

  1. First, you need to generate a private-public key pair. On
    • Windows systems: you should use the PuTTYgen utility to generate the key pair. Open the PuTTYgen utility, and
      1. At the bottom of the PuTTY Key Generator window, in the Parameters section, select the type of key to generate. It is recommended to use "RSA" (perhaps called "SSH-2 RSA" in some older versions) using the default (2048) number of bits.
      2. Click Generate in Actions section. You will be prompted to use your mouse/etc. to generate entropy that will help make the private key secure. Move the cursor around until the utility has generated the key (it will display in the area under Key).
      3. If desired, you can enter a passphrase to be used in encrypting the private key in the Key passphrase and Confirm passphrase boxes. Encrypting the private key with a passphrase will increase security, and with the Pageant utility provided with PuTTY you only need to enter the passphrase once per login session on your workstation.
      4. Save the generated public key by clicking Save public key under Actions (next to Save the generated key). Enter the name (e.g. putty_public_key) and folder, and click Save.
      5. Save the generated private key by clicking Save private key under Actions (next to Save the private key). If you did not opt to encrypt the private key, it will ask if you are sure that you wish to save an unencrypted private key. "Yes" will proceed to save; "No" will allow you to go back and specify a passphrase to use. Use the default format/Save as type (PuTTY Private Key Files (*.ppk)), enter the name (e.g. putty_private_key) and folder, and click Save.
    • Mac or Linux systems: you should use something like ssh-keygen -t rsa.
      1. This will prompt you for a filename to save the private key to; the default is id_rsa in the .ssh subdirectory of your home directory. It is recommended that you use this default; otherwise you need to tell the ssh or ssh-add command which identity file to use when logging in.
      2. The public key will be placed in the same directory as the private key, with the same name but with a .pub suffix added.
      3. You will also be prompted for a passphrase to encrypt the private key with. Encrypting your private key with a passphrase will increase security, and with the use of ssh-agent (described below) you only need to enter the passphrase once per login session on your workstation. If you opt not to use a passphrase, just hit return without entering anything to leave the private key unencrypted.
      4. You will be prompted a second time for the passphrase (to ensure it was entered correctly). Type the passphrase from the previous step, or just hit return again if you opted for no encryption of private key.
  2. Next, you need to authorize that key to log into the HPC cluster (e.g. Zaratan) as your identity. This will require logging into the cluster using your password because the public key authentication is not set up yet.
    1. Use a scp or sftp client to transfer the public key file created in the previous step to a login node of the cluster. For Linux or Mac users that kept the default name, the public key file will be ~/.ssh/id_rsa.pub. For Windows users, it will be the name used when you saved the public key (e.g. putty_public_key). DO NOT transfer the private key file --- that should remain in a protected spot on your workstation.
    2. Ssh to a login node of the cluster using password authentication.
    3. Use the command mkdir -p ~/.ssh to create the .ssh directory under your home directory if it does not already exist (the command will not harm anything if it does already exist).
    4. Use the command touch ~/.ssh/authorized_keys. If there is no authorized_keys file in the .ssh subdirectory of your home directory, this command will create an empty file. If there was one, it does not change the contents of the file.
    5. Use the command chmod 600 ~/.ssh/authorized_keys to ensure the proper permissions on the file. No one but you should be able to read or write to the file.
    6. Use the command cat PUBLIC_KEY_FILE >> ~/.ssh/authorized_keys to append the public key file to the authorized_keys file. Be sure to use TWO > characters without space between them in the above command (otherwise you might overwrite the file and lose previous contents). The PUBLIC_KEY_FILE in the above command should be replaced by the name of the public key file you just copied to the cluster; e.g. id_rsa.pub for Linux or Macs or putty_public_key or whatever you saved as on Windows.
    7. Delete the public key file you copied to the cluster using rm PUBLIC_KEY_FILE as it is no longer necessary.
WARNING
NOTE: Your private key is all that someone needs to access the cluster as you. KEEP THE PRIVATE KEY FILE SECURE. It is strongly suggested that you encrypt the private key file with a passphrase, so that both the passphrase and the file are needed to access the cluster.

You should now be able to ssh to the Zaratan cluster from your workstation using public key authentication, as described below.
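
The cluster-side commands from step 2 can be sketched as follows, mainly to show why the append operator (>>) matters when an authorized_keys file already exists. The scratch directory stands in for your cluster home directory, and the two key strings are dummy placeholders rather than real keys.

```shell
#!/bin/sh
# Demo only: a scratch directory stands in for your cluster home directory.
FAKE_HOME=$(mktemp -d)
PUBLIC_KEY_FILE="${FAKE_HOME}/id_rsa.pub"   # the file you copied over

# Simulate a previously authorized key and a newly transferred public key.
# Both strings are dummy placeholders, not usable keys.
mkdir -p "${FAKE_HOME}/.ssh"
echo "ssh-rsa AAAAexistingkey old-workstation" > "${FAKE_HOME}/.ssh/authorized_keys"
echo "ssh-rsa AAAAnewkey new-workstation" > "${PUBLIC_KEY_FILE}"

# The commands from the list above, in order.
mkdir -p "${FAKE_HOME}/.ssh"                    # harmless if it already exists
touch "${FAKE_HOME}/.ssh/authorized_keys"       # creates the file only if missing
chmod 600 "${FAKE_HOME}/.ssh/authorized_keys"   # readable and writable by you only
cat "${PUBLIC_KEY_FILE}" >> "${FAKE_HOME}/.ssh/authorized_keys"   # append, never ">"
rm "${PUBLIC_KEY_FILE}"                         # the copy is no longer needed

key_count=$(wc -l < "${FAKE_HOME}/.ssh/authorized_keys" | tr -d ' ')
echo "authorized_keys now holds ${key_count} keys"

# Remove the scratch directory used for the demo.
rm -rf "${FAKE_HOME}"
```

Because the new key is appended, the earlier entry survives and the file ends up with two keys; a single > would have left only the new one.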

Using public-key authentication with ssh

In this section we discuss how to use public-key authentication with ssh. Because it is good practice to encrypt your private key, typical usage involves starting an agent that runs in the background. You start the agent when you log into your workstation, and it loads your un-encrypted private key into memory at that time (it will need to ask you for the passphrase to decrypt it if it was encrypted). Then for the rest of the time you are logged into your workstation, it will provide the key to the ssh client so that you can connect to the cluster without providing a password. The agent differs according to what ssh client you use.

Using public-key authentication on Windows systems

On Windows systems, there should be a Pageant SSH authentication agent installed with PuTTY.

  1. First, ensure that PuTTY is configured to use Pageant. This is the default, but just to be certain:
    1. Open PuTTY and in the left hand "Category" panel click on the small plus (+) next to SSH under Connection to expand the SSH options.
    2. Click on the Auth subtree that now appears under SSH
    3. Ensure that the box "Attempt authentication using Pageant" is checked.
    4. Exit PuTTY
  2. Open the Pageant SSH authentication agent. It runs in the background, so when it is open you will just see a new icon (a computer wearing a hat) for it in the Windows notification tray. Double click on that icon to open up the Pageant window.
  3. The main window will list the keys Pageant is holding, which is probably none at this time. Press the Add Key button to add the private key created in the previous section. This will bring up a file dialog, so find the private key you created (e.g. putty_private_key.ppk) and "open" it.
  4. If the key was encrypted using a passphrase, you will be prompted to enter the passphrase now. You will not need to enter it again as long as Pageant is running.

You can now use PuTTY to log in to the cluster login nodes as before, and it will use public-key authentication and not ask for your password on the cluster.

You might wish to have Pageant start up whenever you login to your Windows workstation. To do that:

  1. Right-click inside the Startup folder, and select New and Shortcut.
  2. In the Type the location of the item box, you should enter the path to pageant.exe followed by a space and the full path to your private key (e.g. putty_private_key.ppk). You should put both paths (the executable and the private key) in double quotes.
  3. Then click Next and enter a name for the shortcut (e.g. Pageant).

E.g.

"C:\Program Files (x86)\PuTTy\pageant.exe" "C:\Users\user_profile\ssh_key\putty_private_key.ppk"
All of that should be on a single line.

Then, every time you log into the Windows workstation, it will start Pageant (prompting you for the passphrase for the private key if it is encrypted) and you can use PuTTY to ssh to the cluster login nodes without entering any additional passwords for the remainder of your workstation session.

Using public-key authentication on Linux or Mac systems

On Linux or Mac systems, there are ssh-agent and ssh-add commands that should be standard. You can start the background agent by issuing the command ssh-agent; because ssh-agent prints the environment settings that ssh and ssh-add need in order to find it, it is normally started as eval "$(ssh-agent -s)" in Bourne-style shells (or eval `ssh-agent -c` in csh-style shells). You can then add private keys to the agent using the ssh-add command. If you use the default location for your private key (e.g. ~/.ssh/id_rsa) you can just issue the command ssh-add and it will add any private keys in the default locations. If you saved the key in a non-standard path, just use ssh-add PATH_TO_PRIVATE_KEY where PATH_TO_PRIVATE_KEY is the path to the private key file to use. If the private key is encrypted, ssh-add will prompt you for the passphrase to decrypt it.

At this point, you can just ssh USERNAME@login.zaratan.umd.edu (where USERNAME is your username on the Zaratan cluster) and you will be logged into a login node using public key authentication; i.e. no additional password needed.

You can combine the ssh-agent and ssh-add commands into a single shell script if you want, or add to your .cshrc or .bashrc initialization scripts if you want them to start automatically when you login to your workstation.
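
A minimal sketch of such a script is shown below. To keep it runnable without touching your real keys, it creates a throwaway key without a passphrase in a scratch directory and stops the agent at the end; the file name demo_key and the comment agent-demo are made up for the illustration. With your real key you would run ssh-add (or ssh-add PATH_TO_PRIVATE_KEY) and leave the agent running.

```shell
#!/bin/sh
# Demo only: a throwaway, passphrase-less key in a scratch directory.
scratch=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -C "agent-demo" -f "${scratch}/demo_key"

# ssh-agent prints shell commands that set SSH_AUTH_SOCK and SSH_AGENT_PID;
# eval applies them to this shell so ssh-add and ssh can find the agent.
eval "$(ssh-agent -s)" > /dev/null

# Load the key into the agent, then list what the agent is holding.
ssh-add "${scratch}/demo_key" 2> /dev/null
loaded_keys=$(ssh-add -l)
echo "Keys held by the agent: ${loaded_keys}"

# Clean up: stop the demo agent and remove the throwaway key.
eval "$(ssh-agent -k)" > /dev/null
rm -rf "${scratch}"
```

Note that an agent started this way is only visible to the shell that evaluated its output (and to processes started from that shell), which is why it is usually started from a login initialization script.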

Using "kinit" for (mostly) passwordless authentication

We recognize that people will often start multiple ssh sessions to the cluster, and that typing in your password for each such session is annoying. While SSH public key authentication will allow you to log into the cluster without typing a password, you will not get Kerberos tickets or AFS tokens that way, and therefore you will not be able to access any data on the SHELL filesystem unless you subsequently issue the renew command, which will require you to enter a password, defeating the goal of passwordless login.

The recommended approach is to install a Kerberos client on your workstation and configure your SSH client to use GSSAPI/Kerberos authentication when connecting to the cluster. Then when you log onto your workstation each day you can issue a command to obtain a new set of Kerberos tickets, and when you ssh into the login nodes of the cluster it will not request a password, and the system will automatically obtain AFS tokens for accessing your SHELL storage based on the Kerberos tickets.

The configuration process depends on the operating system of your workstation:

Installing/configuring Kerberos client on Windows systems

WARNING
This section is still under construction. The instructions which follow have not been fully tested, but we expect that they should at least mostly work. Please let us know if you experience any difficulties.

On Windows systems, you will need to install a Kerberos client in order to get valid Kerberos tickets on your workstation. We recommend installing the Auristor OpenAFS client, as that will not only provide a suitable Kerberos client but also allow you (if so desired) to access SHELL storage from your Windows workstation.

  1. Download the latest version of the OpenAFS Client Installer from https://www.auristor.com/openafs/client-installer. Browse to the installer page, find the section titled "Windows Installer (64-bit)" and click on the yellow button with a label starting with "yfs-openafs".
  2. Clicking the button will open a registration page. You can skip this.
  3. Download the installer, and then run it. If it does not start automatically, open the File Explorer application, go to your Downloads folder, and double click on the file (whose name should start with "yfs-openafs").
  4. The installer will start with a license agreement. You should read the agreement, and if you have no objections, accept the agreement in order to continue with the installation.
  5. The next page will ask for some options. Please set
    1. Default Cell should be set to shell.umd.edu (for the SHELL storage tier on Zaratan)
    2. Integrated logon should be set to Disable
    3. Cache size: keep the default
  6. The next page (Custom Setup) gives you options for what to install. Just use the defaults.
  7. The next page is to confirm that you really wish to install. Click the Install button to proceed.
  8. The Windows OS might also pop up a confirmation window asking if you wish to install new software. If so, click the Yes button to proceed.
  9. The package should be installing, and a window with a progress bar will be displayed. When done, you can click on the Finish button to exit the setup wizard.
  10. You will be prompted to restart the system to complete the installation.

After the system reboots, you can open a command prompt from the Start Menu and issue the command: kinit MYUSERNAME@UMD.EDU replacing MYUSERNAME with your login name on Zaratan (which should be the part of your @umd.edu or @terpmail.umd.edu email address to the left of the "at" sign (@), and will normally be all lowercase). The @UMD.EDU must be all uppercase. This will give you Kerberos tickets on your Windows workstation. This kinit step will need to be repeated every time you reboot your workstation (at least if you plan to use password-less ssh in that session), or when your Kerberos tickets expire (typically one day).

Although the above kinit step will obtain Kerberos tickets for you, you still need to configure your ssh client to authenticate to the remote system using these tickets. The steps to accomplish this depend on the specific ssh client you are using.

For the putty ssh client, do the following:

  1. Start putty, and go to the configuration menu.
  2. In the configuration menu, select SSH, then Auth, and then the GSSAPI pane. On this pane, make sure the two boxes Attempt GSSAPI authentication and Allow GSSAPI credential delegation are checked.
  3. Then find Connection and then Data in the configuration menu, and in the field Auto-login username enter your username on the Zaratan cluster.
  4. Save the configuration.

The above ssh client configuration should only need to be done once. After doing that, and assuming you have valid Kerberos tickets, you should be able to ssh into the Zaratan login nodes without an additional password prompt (although you will see a multi-factor prompt if not on the campus VPN).

Installing/configuring Kerberos client on Mac systems

The Kerberos client should already be installed on recent MacOS systems, so you should not need to do anything to install it.

The process to configure SSH to use Kerberos for authentication is the same on Macs as it is for linux systems.

Installing Kerberos client on Linux systems

Many modern linux distributions come with Kerberos clients automatically installed. You can verify this by issuing the command which kinit --- if that returns a path to the kinit command, you should have the required packages already installed.

If the which kinit command returns an error saying something like kinit: Command not found, then a Kerberos client is not properly installed on your system. You should use whatever packaging system is appropriate for your distribution (e.g. dnf or yum for RedHat and Fedora-like systems, dpkg, apt or similar commands for Debian, Ubuntu, Mint and related systems) to find and install the appropriate package. The package names are distribution dependent, but typical names are:

The proper package name will usually be something like one of the above; you probably just need one package, and likely will not find all of the above packages. You should not need to install any packages with server in the name for this. The krb5 packages typically use MIT's implementation of Kerberos, and the heimdal named packages use the Heimdal implementation --- for this purpose you can use either implementation, and we recommend using whichever one is best supported for your distribution.
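The check described above can be sketched as a small shell snippet. This is only an illustration: it uses the POSIX command -v builtin in place of which (the two can behave differently across shells), and the KINIT_PATH and KINIT_STATUS variable names are our own.

```shell
# Report whether a Kerberos client (kinit) is on the PATH.
if KINIT_PATH="$(command -v kinit 2>/dev/null)"; then
    KINIT_STATUS="found"
    echo "kinit found at ${KINIT_PATH}"
else
    KINIT_STATUS="missing"
    echo "kinit not found; install a Kerberos client package for your distribution"
fi
```

If kinit is found, running kinit MYUSERNAME@UMD.EDU (with your own username) should then obtain tickets, just as on Windows.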

Configuring Kerberos client on Linux and MacOS systems

Once a Kerberos client is installed, you need to configure ssh to send your Kerberos credentials to the login nodes of the cluster. To do that, you can edit (or create if needed) a file named config in the directory .ssh under your home directory and add the following lines:

Host *.umd.edu
	GSSAPIAuthentication yes
	GSSAPIDelegateCredentials yes

The first line Host *.umd.edu specifies which hosts the configuration is restricted to, and the next two lines instruct ssh to attempt to log in via the Kerberos tickets on your workstation when connecting to the named hosts. If the attempt to log in via Kerberos tickets fails, it will fall back to requesting a password. If it can authenticate using Kerberos, it will also (due to the GSSAPIDelegateCredentials directive) forward your Kerberos tickets to the system you are logging into; this means that your login session on the remote system will have valid Kerberos tickets.

You should not forward your Kerberos tickets to untrusted systems; although Kerberos tickets are encrypted and do not contain your password, if a malicious user can access your Kerberos tickets, they can impersonate you until the tickets expire. Since your Kerberos credentials should not grant you access to any non-UMD system, in the snippet above we restrict ssh to only do Kerberos authentication and only forward tickets when connecting to a system in the umd.edu domain. You could change the string after the Host directive to e.g. tighten it only to a specific HPC cluster, but leaving it at *.umd.edu should be safe and will cover all UMD HPC clusters as well as the Glue/TerpConnect/GRACE systems.
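As an illustration of tightening the Host pattern, the following sketch appends a section for a single host, but only if no section for that host exists yet. It is a sketch under stated assumptions, not a required procedure: the SSH_CONFIG and HOST_PATTERN names are our own, and by default the snippet writes to a scratch file rather than your real configuration (set SSH_CONFIG=~/.ssh/config to apply it for real).

```shell
# Default to a throwaway file so that trying this out cannot damage a
# real configuration; set SSH_CONFIG=~/.ssh/config to apply it for real.
SSH_CONFIG="${SSH_CONFIG:-$(mktemp)}"
HOST_PATTERN="login.zaratan.umd.edu"

# Append the section only if no "Host" line for this exact pattern exists.
if ! grep -qxF "Host ${HOST_PATTERN}" "$SSH_CONFIG" 2>/dev/null; then
    cat >> "$SSH_CONFIG" <<EOF
Host ${HOST_PATTERN}
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
EOF
fi

# ssh expects the config file to be writable only by its owner.
chmod 600 "$SSH_CONFIG"
echo "Updated ${SSH_CONFIG}"
```

Running the snippet a second time against the same file leaves it unchanged, so it is safe to repeat.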

If a section for the desired host expression already exists, you can just add the GSSAPIAuthentication and GSSAPIDelegateCredentials lines into the existing section. You might also wish to add the lines:

ServerAliveInterval 60
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes

The first line will help reduce SSH timeouts due to inactivity of the terminal. The various Forward* lines have to do with forwarding X11 graphical connections and similar. None of these lines is required for "passwordless" authentication, but you might find them useful.

Graphics

This section discusses running graphical applications on the login nodes of the cluster with the graphics appearing on your desktop using the network features of the X11 Windowing System. In order for this to work, you need to be running an X11 server on your desktop, which we discuss below. However, while the procedures described below should still work, we recommend instead that you look into using the Interactive Desktop of the OnDemand portal; this will allow you to start up an interactive graphical job on a compute node with the graphics displaying in a window on your web browser. This is usually significantly easier to use than setting up X11 as described below.

The exact procedure for installing and running an X11 server on your local workstation depends on your desktop operating system and setup, and you might wish to contact the UMD helpdesk for assistance.

Some software packages have their own protocols for remote visualization, including:

Basic unix commands

This is a very short list of some very basic Unix commands to help you get started in the HPC environment.

Hopefully, the above list will get you through most of the basic tasks you need to complete. A full tutorial on using Unix is beyond the scope of this document. However, there are many tutorials for beginning to use Unix on the web. A few tutorials we recommend are:

Changing your default shell

The shell is the command line prompt that you interact with. It also is what processes shell scripts. Although there are a number of shells available on DIT maintained Unix systems, the most common are tcsh and bash. Your default shell is the shell you get when you log into a system; by default on the Zaratan cluster it is set to /bin/bash. (NOTE: This is a change from the previous clusters, e.g. Deepthought2)

To change your default shell on Zaratan, currently you need to submit a help ticket to HPC staff; be sure to include your username on the cluster and the new login shell you wish to use. At some point in the future, we hope to implement a process to allow you to change your shell without administrator intervention.

Setting Up Your Environment

In order to provide an environment that can support a wide variety of users, many software packages are available for your use. However, in order to simplify each individual user's environment, a large number of those packages are not included in your default PATH.

Your account as provided gives you access to the basic tools needed to submit and monitor jobs, access basic Gnu compilers, etc. If you need to make modifications to your login environment, you can do so by modifying your .bashrc file as necessary. (If you changed your default shell to tcsh or csh you should edit your .cshrc file instead.)

If you do change these files, it is HIGHLY advised that you DO NOT remove the part near the beginning that sources the global definitions, e.g. the part like

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

as doing so will cause you to lose any site-wide setup.

For packages that are not included in your default environment (which is almost everything beyond basic Unix commands like ls, cat, and editors), the module command is provided. When run, this command will modify your current session by adding the appropriate entries to your PATH and whatever other variables are necessary to ensure the proper functioning of the package in question. Note that these changes are temporary and only exist until you log out. If you want a package to be loaded automatically in every session, add the command module load PACKAGE to your shell initialization scripts (e.g. .bashrc or .cshrc).

Because many research codes are complex and depend on many libraries, and loading different versions of the same library in a given package can cause serious or subtle errors, the software library often has multiple builds of packages to ensure a consistent set of libraries. The module command has some intelligence built into it to try to ensure that a consistent set of packages is loaded. In general, it works best if you first load the desired compiler (and version), then the MPI library (if you will be using MPI versions of packages), and then any other packages you wish to use.

The full names of the module files to load are actually rather long, and include the compiler and some other dependencies or variants of the build. However, it is usually sufficient to just give the package name in a module load. You might still wish to specify a version, especially in job scripts, as without a version the module load command will normally load the latest version (compatible with a previously loaded compiler and/or MPI library) installed on the cluster, which can change without notice.
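The recommended load order (compiler first, then MPI library, then other packages) can be sketched as follows for a bash job script. The module names and the version number are hypothetical placeholders; substitute names and versions that actually exist on the cluster. The MODULES_TO_LOAD variable and load_my_modules function are our own illustrative names, and the guard simply lets the snippet run harmlessly on a machine without the module command.

```shell
# Hypothetical names/versions, listed in load order:
# compiler, then MPI library, then everything else.
MODULES_TO_LOAD="gcc/11.3.0 openmpi netcdf"

load_my_modules() {
    # Skip quietly on systems where the module command is not defined.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available; skipping" >&2
        return 0
    fi
    for m in $MODULES_TO_LOAD; do
        module load "$m" || echo "could not load $m" >&2
    done
}

load_my_modules
```

Pinning the compiler version, as in the first entry, is what protects a job script from a newer default version appearing on the cluster without notice.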

The following additional subcommands for the module command are often useful:

The software page contains a listing of the various packages available. If you click on the package name, you will get detailed information as to which versions are available, and whether or not they are available on the compute nodes of a particular cluster.

For example, if you want to run Matlab, you'll want to do the following. Notice that Matlab is not available until after the module load command has been run.

f20-l1:~: matlab
matlab: Command not found.
f20-l1:~: module whatis matlab
matlab               : Matlab 2014b
f20-l1:~: module load matlab
f20-l1:~: matlab

                              < M A T L A B >
		  Copyright 1984-2014 The MathWorks, Inc.
                   R2014b (8.4.0.150421) 64-bit (glnxa64)
                             September 15, 2014

 
  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.
 
>> 
WARNING
If you are running a bourne or bash script under sbatch but your default shell is csh or tcsh, remember that you must include a . ~/.bashrc (on the Zaratan cluster) or a . ~/.bash_profile (on the Juggernaut cluster) in your script to enable the module load and/or tap commands.

For more information, see the section on the module command.

Preventing output from your dot files

It is strongly recommended that your dot files (e.g. .cshrc and/or .bashrc) do not produce any terminal output, or at least, only do so when run in an interactive shell. Output in non-interactive shells can cause problems, most notably with some file transfer protocols like scp and sftp --- in most cases, the stray output will confuse the file transfer protocol, usually causing it to abort.

WARNING
If commands in your .cshrc, .bashrc, or other dot files might produce output to the terminal, you should take measures to ensure such output is only produced in interactive shells. Otherwise, you might break file transfer protocols like sftp or scp.

If you have commands in your dot files which might produce output, you should consider running them only in interactive shells so as not to confuse file transfer protocols. The method varies by the type of shell; for csh and/or tcsh style shells, something like:

if ( $?prompt ) then
	#Only execute the following in interactive shells
	echo "Hello, today is "
	date
endif

#Here we redirect output (if any) to /dev/null
some_command >& /dev/null

For Bourne-like shells (e.g. sh and/or bash) something like:

if [ ! "x$PS1" = "x" ]; then
	#Only execute the following in interactive shells
	echo "Hello, today is "
	date
fi

#Here we redirect output (if any) to /dev/null
some_command > /dev/null 2> /dev/null

You can also redirect output to /dev/null, as is done in the case of some_command in the above examples. Be sure to remember to redirect both stdout and stderr. In some cases, the command has options to silence it, which can also be used (e.g. the old tap has a -q option), but this generally will still produce output when errors occur.
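For bash users, another common idiom (not the one used in the example above) is to test whether the special variable $- contains the letter i, which bash sets for interactive shells. The following is a minimal sketch; the is_interactive function name is our own.

```shell
# Succeed only when the current shell is interactive: bash includes the
# letter "i" in the special parameter $- for interactive shells.
is_interactive() {
    case "$-" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive; then
    # Only execute the following in interactive shells
    echo "Hello, today is $(date)"
fi
```

When this is sourced by a non-interactive shell (e.g. one started by scp or sftp), the greeting is skipped and nothing is printed.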

Files and Storage Basics

On each cluster, you have several options available to you regarding where files are stored. A whole section is devoted to this on another page, but it is important and basic enough that a short discussion is merited here.

Files and Storage Basics: the Zaratan cluster

With access to the cluster, you are granted a home directory, which is the directory you see when you first log in. It is distinct from your standard TerpConnect/Glue home directory. Also, if you have access to more than one HPC cluster, your home directory on each is distinct from the other(s).

Your home directory is, by default, private to you, and should be used as little as possible for data storage. In particular, you should not run jobs out of your home directory --- run your jobs from a scratch filesystem; these are optimized to provide better read and write performance to improve the speed of your job. After the job is finished, you might wish to copy the more critical results files back to your home directory, which gets backed up nightly. (The scratch filesystems are NOT backed up.)

You should run jobs out of a scratch filesystem. On Zaratan, you have two choices of where to locate your data. The first space is shared with the rest of the users in your project, the second is private to you. All users will have at least these two spaces, and users that are part of more than one project may have additional spaces.

where USERNAME is your Zaratan username and PROJECT is your Zaratan project name. A link in your home directory has been provided to give you easy access to your private scratch space. You can access it via the ~/scratch symbolic link (which just provides a different name with which you can access the contents of the directory). If you are a member of multiple projects, you'll have multiple links of the form ~/scratch.PROJECT.

In addition to your home and scratch spaces, you also have SHELL spaces. These spaces are intended for medium-term storage of data, and are part of a networked filesystem that is also made available to machines outside the cluster. You may install a client on your local workstation or laptop that will provide you direct access to these spaces. See the documentation on accessing your SHELL data remotely for more information. The SHELL filesystem is not optimized for high performance, and for that and other reasons it is not mounted on the compute nodes, so your SHELL directories are not accessible from within jobs. For each project you belong to, you normally will have a symbolic link ~/SHELL.PROJECT pointing to your personal subfolder of PROJECT's SHELL tree.

All users should belong to at least one project, and so will have at least a personal scratch directory and personal SHELL directory for the project.

Transferring files to/from the HPC clusters and other systems

Before long, you will need to transfer files to the HPC clusters from your computer, or from one of the HPC clusters to your computer, or otherwise move data to/from one of the HPC clusters to another location. There are several options for this.

Note: The compute nodes of the cluster are not able to access the public internet. This is by design, both for security reasons and because the inherent latencies in using resources off the public internet generally are contrary to the goals of "high performance" computing. The login and data transfer nodes are generally able to access the public internet. It is recommended that you transfer the data using the login or data transfer nodes, and then run any computations on the compute nodes against local copies of the data. The compute nodes generally will have access to the campus network.

Using scp or sftp protocols

The "standard" unix-like utilities for transferring data between nodes are the scp and sftp programs, which implement the Secure Copy (SCP) and Secure File Transfer Protocol (SFTP), respectively. These are typically preinstalled on Unix-like systems. Windows or Mac users might need to install a scp/sftp client on their machine.

If you are transferring between clusters, generally both sides have the server processes running, and you can initiate the transfer from either side. When dealing with your workstation, it most likely does NOT have the server running, so regardless of which way you wish to transfer data, you will likely want to initiate the transfer from your workstation.

Assuming you are on your workstation and you have the client installed, just open up the client and point it to the Zaratan login nodes as described in the section on logging into the clusters, i.e.,

unix-prompt> scp -o User=payerle myfile login.zaratan.umd.edu:
Password:
myfile                               100%  867     0.9KB/s   00:00
unix-prompt>
unix-prompt> sftp -o User=payerle login.zaratan.umd.edu:
Connecting to login.zaratan.umd.edu...
Password:
sftp> put morefiles* 
Uploading morefiles1 to /home/payerle/morefiles1
morefiles1                              100%  132     0.1KB/s   00:00    
Uploading morefiles2 to /home/payerle/morefiles2
morefiles2                              100%  308     0.3KB/s   00:00    
sftp> quit
unix-prompt>

The above example shows how to transfer files using scp and sftp for the user payerle; you will obviously need to replace that username with your own.

This will by default allow you to move files to and from your home directory. For larger data sizes (more than a few GB) you will almost certainly want to place them in scratch space or in your SHELL space.
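As a sketch of sending a larger file to scratch space instead of the home directory, the snippet below builds an scp command whose destination is the scratch link in your home directory. It assumes the ~/scratch symbolic link exists on the cluster as described earlier; the copy_to_scratch function, the CLUSTER_USER, CLUSTER_HOST and DRY_RUN variables, and the file name are our own placeholders. By default it only prints the command; set DRY_RUN=0 to actually run it.

```shell
CLUSTER_USER="${CLUSTER_USER:-USERNAME}"   # replace with your cluster username
CLUSTER_HOST="login.zaratan.umd.edu"
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the command

copy_to_scratch() {
    # $1 is the local file. A remote path without a leading slash is
    # relative to the remote home directory, so "scratch/" goes through
    # the ~/scratch symbolic link.
    set -- scp -o "User=${CLUSTER_USER}" "$1" "${CLUSTER_HOST}:scratch/"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

copy_to_scratch big_dataset.tar
```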

Some more detailed information regarding the use of the scp command can be found in the section on basic Unix commands.

WARNING
The compute nodes are not on the public internet. File transfers should be to or from the login or data transfer nodes.

Using globus

For large amounts of data, you might wish to use Globus for transferring files. Globus can automatically use multiple streams (speeding up the transfers) and can automatically restart failed transfers (very useful when dealing with many GBs of data), and is supported by most HPC clusters.

You can also install the free Globus Connect Personal; see instructions here.

To use Globus, go to the login page https://globus.org/login, and log in. You can select "University of Maryland College Park" as your organization and log in with your University ID and password. Then select your endpoints (you might need to provide a username and password for the selected endpoint) and start transferring files.

More detailed instructions re using Globus

Transferring files to/from Cloud Storage Providers

The use of Cloud Storage providers for storing data is becoming increasingly popular. The University provides free access to

These Cloud Storage providers can be particularly useful for storing archival data. Please see the service catalog entries above for more information on the services, and in particular for a discussion of what data is permissible to store there.

Each Cloud storage provider backend has slightly different procedures for uploading and downloading data. However, the rclone utility is a nice utility that can access many different Cloud service providers (including both Google Drive and Box) with a common command line interface.

More information about using rclone can be found here.

WARNING
The compute nodes are not on the public internet. File transfers should be to or from the login or data transfer nodes.

Compiling codes

Compiling codes in an HPC environment can be tricky, as they often involve MPI and other external libraries. Furthermore, the various HPC clusters offer multiple compiler families, multiple versions of the compilers within each family, and multiple versions of many libraries. All of this can make compilation even more complicated.

Note: By design, the compute nodes of the cluster are not able to access the public internet. You should download source codes, etc. from the login or data transfer nodes. In general, we allow the compilation of codes on the login nodes, the only exception to the policy of no compute intensive processes on the login nodes. We do ask that you restrict any compilations to use no more than 8 threads.

We have simplified this a bit with the introduction of "toolchains". These are collections of compilers and related libraries. The various toolchains are:

Once you decide which toolchain you want to use, you should module load it. You can module load the entire toolchain, or individually module load the various components it includes (the toolchain module does not do anything special, it is just a shortcut for loading a bunch of modules).

The following compiler commands are available on the Zaratan HPC cluster:

Compiler Family   MPI library   C compiler   C++ compiler   Fortran77 compiler   Fortran90 compiler
GNU               none          gcc          g++            gfortran             gfortran
GNU               OpenMPI       mpicc        mpic++         mpifort              mpifort
GNU               Intel MPI     *** Illegal combination ***
Intel (legacy)    none          icc          icpc           ifort                ifort
Intel             Intel MPI     mpiicc       mpiicc         mpiifort             mpiifort
Intel (new)       none          icx          icpx           ifx                  ifx
Intel             Intel MPI     mpiicc       mpiicc         mpiifort             mpiifort

NOTE: mpiifort is spelled with a doubled i.

If you have any external libraries you need to use, you need to module load or tap these as well. Some libraries have specific versions compiled with and for a specific compiler/MPI library combination; in such cases you need to pick a version which matches what you are using. Not all combinations exist; if yours does not you can submit a help ticket requesting that combination. We generally try to avoid doing this for old versions of compilers/packages/etc. unless there are extraordinary reasons, so you are generally advised to try to use the latest versions available on the system. Fortran90 codes are particularly sensitive to this, and the *.mod files between different versions of the same compiler might not be compatible (and are definitely not compatible across compilers).

For packages which are libraries used by other packages (e.g. LAPACK, NETCDF, etc), the module load command is not enough. The package being compiled needs to know where to find the libraries, and the way to inform the package of that depends on the build system used by the package.

Doing a module load or tap for an external library generally only defines some environmental variables for you; you still need to instruct the compiler where to find any needed include files and where to find the libraries. Generally, "module help MODULENAME" or "tap TAPNAME" will give a brief usage summary giving the names of these variables. Typically there is a variable with a name like "*INC" or "*INCDIR", and another one with a name like "*LIB" or "*LIBDIR". E.g. the netcdf package defines NETCDF_INCDIR and NETCDF_LIBDIR. The package fftw/3 on the Deepthought clusters defines FFTWINC and FFTWLIB.

The "INC" variable provides the location of the include directory for the package. You will generally want to add arguments with these variables, preceded by the -I flag, either to your compilation command, e.g.

login-1:~: gcc -c -I$NETCDF_INCDIR -I$FFTWINC my_netcdf_code.c

or to the CFLAGS and FFLAGS variables in your Makefile, e.g.

	CFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
	FFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)

The -I flag takes a single path, so you should repeat it if you have multiple library packages that you are using.

The "LIB" variables work similarly, except these are needed when the compiler/linker creates the executable. For small codes, the compilation and linking usually occur in a single step; larger codes, especially those with makefiles, usually break this into separate steps. Also, the dynamic linker needs to know where the library files are when you run the code, so it is usually easiest to set the "rpath" during compilation. (Otherwise you will need to set the environmental variable LD_LIBRARY_PATH each time before you run the code.) To tell the compiler where to find the library for linking, you provide the "LIB" variables as arguments to the "-L" flag. To set the rpath, you provide them as arguments to the "-Wl,-rpath," flag. Both the -L and -Wl,-rpath, flags take a single path, so you should repeat these flags for each library package.

For the simple case, you would do something like

login-1:~: ifort -o mycode -I$NETCDF_INCDIR -I$FFTWINC \
	-L$NETCDF_LIBDIR -L$FFTWLIB \
	-Wl,-rpath,$NETCDF_LIBDIR  -Wl,-rpath,$FFTWLIB my_netcdf_code.f90

Since this compiles and links my_netcdf_code.f90 in a single step, we need to provide both the compile stage (-I) flags and the link stage (-L and -Wl,-rpath) flags.
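To see the effect of the rpath without depending on any installed library module, the following self-contained sketch builds a tiny shared library in a temporary directory and links a program against it. Everything here is illustrative: libdemo, demo_answer, DEMO_INCDIR and DEMO_LIBDIR are made-up names standing in for a real package and its "INC"/"LIB" variables, and the sketch assumes a working gcc on a Linux system (it reports that it was skipped otherwise).

```shell
# Build in a throwaway directory so nothing on the system is touched.
WORK="$(mktemp -d)"
DEMO_INCDIR="$WORK/include"   # stands in for e.g. $NETCDF_INCDIR
DEMO_LIBDIR="$WORK/lib"       # stands in for e.g. $NETCDF_LIBDIR
mkdir -p "$DEMO_INCDIR" "$DEMO_LIBDIR"

# A one-function library and a program that calls it.
printf 'int demo_answer(void);\n' > "$DEMO_INCDIR/demo.h"
printf 'int demo_answer(void) { return 42; }\n' > "$WORK/demo.c"
cat > "$WORK/main.c" <<'EOF'
#include <stdio.h>
#include "demo.h"
int main(void) { printf("%d\n", demo_answer()); return 0; }
EOF

RESULT="skipped (no working gcc found)"
if command -v gcc >/dev/null 2>&1 &&
   gcc -shared -fPIC -o "$DEMO_LIBDIR/libdemo.so" "$WORK/demo.c" &&
   gcc -c -I"$DEMO_INCDIR" -o "$WORK/main.o" "$WORK/main.c" &&
   gcc -o "$WORK/myprog" "$WORK/main.o" \
       -L"$DEMO_LIBDIR" -Wl,-rpath,"$DEMO_LIBDIR" -ldemo
then
    # Run with LD_LIBRARY_PATH removed: the rpath alone locates libdemo.so.
    RESULT="$(env -u LD_LIBRARY_PATH "$WORK/myprog")"
fi
echo "$RESULT"
rm -rf "$WORK"
```

Here the compile step uses only the -I flag and the link step uses only the -L and -Wl,-rpath, flags, mirroring the split between compile-stage and link-stage flags described above.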

More complicated cases typically use makefiles, and here you typically will just do something like:

	CFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
	FFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
	LDFLAGS= -L$(NETCDF_LIBDIR) -Wl,-rpath,$(NETCDF_LIBDIR)
	LDFLAGS+= -L$(FFTWLIB) -Wl,-rpath,$(FFTWLIB)

Here we have the CFLAGS and FFLAGS definitions from above (which will be used in the compilation stage), and we put the -L and -Wl,-rpath flags in the LDFLAGS variable (we did this in two steps to make it more readable).

If you opt not to go the "rpath" route, and instead compile the code with something like

	CFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
	FFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
	LDFLAGS= -L$(NETCDF_LIBDIR) -L$(FFTWLIB)

(note that the -Wl,-rpath arguments are missing in the above), then before running the resulting binary (which we will call myprog) you will need to set LD_LIBRARY_PATH appropriately. This (and the module loads which precede it) will need to be done in every interactive or batch session which plans to run the code. E.g.

login-1: echo $LD_LIBRARY_PATH
LD_LIBRARY_PATH: Undefined variable.
login-1: ./myprog
./myprog:  error while loading shared libraries: libfftw.so.2: cannot open shared object file: No such file or directory
login-1: module load fftw/2.1.5
login-1: module load netcdf
login-1: setenv LD_LIBRARY_PATH "${FFTWLIB}:${NETCDF_LIBDIR}"
login-1: ./myprog
Program runs successfully

If you do NOT use the "rpath" arguments shown earlier, every time you run the program the variable LD_LIBRARY_PATH must be properly defined to point to the library locations, or you will get an error like the one shown above. In general, you will need to load the modules and issue the setenv command once in every interactive login session in which you will use it, and once in every batch script. And you MUST set the directories correctly; if you, e.g., give the $FFTWLIB path for a different version of FFTW than the one the code was compiled with, the binary might run, but it might crash with a difficult to debug error at some seemingly arbitrary place. Or perhaps even worse, it might run to a seemingly successful conclusion but produce erroneous output.
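The session above uses csh/tcsh syntax (setenv). Since bash is the default shell on Zaratan, here is a sketch of the bash equivalent. The two placeholder paths are assumptions that exist only so the snippet runs anywhere; on the cluster, FFTWLIB and NETCDF_LIBDIR would already have been set by the corresponding module load commands.

```shell
# Placeholder values, used only if the variables are not already set
# (on the cluster the module load commands would define them).
: "${FFTWLIB:=/path/to/fftw/lib}"
: "${NETCDF_LIBDIR:=/path/to/netcdf/lib}"

# Prepend the library directories, keeping any value that was already
# present; the ${VAR:+...} form avoids a stray trailing colon when
# LD_LIBRARY_PATH was previously empty or unset.
export LD_LIBRARY_PATH="${FFTWLIB}:${NETCDF_LIBDIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "$LD_LIBRARY_PATH"
```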

We strongly recommend that for your own code, or for code that you are compiling, you use the rpath arguments shown earlier. The LD_LIBRARY_PATH option is, in our opinion, best used when you do not have the option of compiling the code with the rpath settings.

Optimization of Code

THIS SECTION STILL NEEDS TO BE UPDATED FOR ZARATAN

Since you are using a High Performance Computing cluster, you most likely have long, compute intensive jobs. These can generally benefit greatly from various optimization techniques. The general topic of code optimization is quite broad, far too large to give more than a cursory discussion here. You are encouraged to look at the myriad of resources on the topic that exist on the web. Here we just discuss some specific details related to the UMD clusters.

A major performance feature of modern processors is the ability to vectorize certain operations. In this mode, an instruction is decoded once and operates on a stream of data. This allows for significant performance boosts. There are various vectorization command sets available on Intel processors, known by names like SSE and AVX (with numerical suffixes to specify the version of the command sets). Selecting the correct optimization level can be tricky, as different processors support different vectorization command sets. This is further complicated by the mix of Intel and AMD processors on the cluster.

Currently the Zaratan cluster is fairly homogeneous, consisting of AMD Epyc chips with 128 cores per node which support AVX and AVX2, but not AVX512. We expect that this could change in the future as more nodes are added to the cluster.

Compiling with OpenMP

OpenMP is an API for shared memory parallelization, i.e. for splitting a code into multiple threads running on the same node. Because the parallelization is limited to a single node, it is less powerful than other APIs (e.g. MPI) which can span multiple nodes, but is also much easier to code for.

Indeed, OpenMP is implemented by the compiler, and is generally invoked through various compiler directives and/or pragmas. E.g., if you have a C code with a for loop, you can add a pragma just before the start of the for loop which will tell the compiler to try to split the loop into multiple threads each running in parallel on a different processor core.

If you have a code with OpenMP directives, you need to tell the compiler to compile it with the OpenMP extensions enabled. The exact mechanism is compiler dependent:

Compiler Family        Flag to use OpenMP   Default number of threads (if OMP_NUM_THREADS not set, etc)
GNU compiler suite     -fopenmp             number of available cores
Intel Compiler Suite   -openmp              number of available cores
PGI Compiler Suite     -mp                  1 thread

NOTE: If you are using the Intel compiler suite and the Math Kernel Libraries (MKL), some of the MKL routines might use OpenMP even if you did not request OpenMP in the main compilation. You can set the environmental variable OMP_NUM_THREADS to 1 to effectively disable that if really desired.

By default, OpenMP will attempt to use as many threads as there are cores on the system (except with PGI compilers, which default to one thread). This can be problematic in some cases. At runtime, you can set the environmental variable OMP_NUM_THREADS to an integer to control the maximum number of threads that OpenMP will use.
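In a batch job, one common way to do this is to tie the thread count to the number of cores the job actually requested. The following is a minimal sketch for a bash job script, assuming the job was submitted through Slurm with the --cpus-per-task option, in which case Slurm sets the SLURM_CPUS_PER_TASK variable; the fallback of one thread is our own choice.

```shell
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# fall back to a single thread when it is absent (e.g. outside a job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OpenMP will use at most ${OMP_NUM_THREADS} thread(s)"
```

This keeps an OpenMP code from starting one thread per core on the node when the job was only allocated a few of those cores.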

NOTE: Be sure to use the -openmp or -fopenmp flag (as appropriate for the compiler you are using) on all of the compilation AND link stages for your code.





