Monitoring and Managing Your Jobs

  1. Seeing what jobs are running/queued
  2. When will my job start?
  3. Detailed information about your jobs
  4. Viewing output of jobs in progress
  5. Cancelling your jobs
  6. Monitoring the cluster

Seeing what jobs are running/queued

The Slurm command to list running and queued jobs is squeue, e.g.:

login-1: squeue
             JOBID PARTITION     USER ST       TIME  NODES NODELIST(REASON)
           1243530 standard  payerle  R   18:47:23      2 compute-b18-[2-3]
           1244127 standard    kevin  R    1:15:47      1 compute-b18-4
           1230562 standard  payerle PD       0:00      1 (Resources)
           1244242 standard  payerle PD       0:00      2 (Resources)
           1244095 standard    kevin PD       0:00      1 (ReqNodeNotAvail)

The ST column gives the state of the job, with the following codes:

  • R for Running
  • PD for PenDing
  • TO for TimedOut
  • PR for PReempted
  • S for Suspended
  • CD for CompleteD
  • CA for CAncelled
  • F for Failed
  • NF for jobs terminated due to Node Failure

The NODELIST(REASON) field shows which nodes a running job is running on. If the job is pending (i.e. not yet running), it instead gives a short explanation of why the job is not running (as of the last time the scheduler examined the job). Typically one might see something like:

  • (Resources): the scheduler is unable to find sufficient idle resources to run your job (i.e. the cluster is too busy to run your job at this time). The job should run once resources become available (i.e. some currently running jobs complete, freeing resources).
  • (Priority): there are other jobs with higher priority ahead of yours in the queue. The job should run once the jobs ahead of it get scheduled.
  • (AssocGrpCPUMinsLimit) or (AssociationJobLimit): these generally mean that your allocation account has insufficient funds available to complete this job and all currently running jobs charging against that account. See the relevant FAQ entry for more information. This job will only run if the currently running jobs complete using far fewer SUs than predicted (based on their wall time limits) and/or if the allocation account is replenished.
  • (QOSResourceLimit): this generally occurs only if you have submitted a large number of jobs. Some of those jobs will be held in a pending state to prevent adverse impact on the rest of the cluster. These jobs will typically run once the job count drops (as currently running jobs complete). See the relevant FAQ entry for more information.
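A quick way to see why your pending jobs are waiting is to tally the REASON column. The sketch below runs on a saved sample of such output; on the cluster you would pipe `squeue -u $USER -t PD -h -o '%R'` into the same `sort | uniq -c` pipeline (the sample reasons here are hypothetical):

```shell
# Tally pending-job reasons. The sample data stands in for the output of:
#   squeue -u $USER -t PD -h -o '%R'     (-h suppresses the header line)
reasons='(Resources)
(Resources)
(Priority)
(ReqNodeNotAvail)'
# Count how many pending jobs share each reason, most common first
printf '%s\n' "$reasons" | sort | uniq -c | sort -rn
```

This prints each distinct reason with the number of pending jobs reporting it, e.g. 2 for (Resources) with the sample data above.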

Typically, if you see something not in the above list, there is a problem and you should contact systems staff for assistance.

The squeue command also takes a wide range of options, including options to control what is output and how. See the man page (man squeue) for more information.

For example, if you add the following to your ~/.aliases file (assuming you are using a C-shell variant):

alias sqp 'squeue -S -Q -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %Q %R"'

then the next time you log in, the command sqp will list jobs in the queue in order of descending priority.
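If you use bash rather than a C-shell variant, the equivalent line in your ~/.bashrc would be (the name sqp is, of course, arbitrary):

```shell
# bash equivalent of the csh alias above; add to ~/.bashrc
alias sqp='squeue -S -Q -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %Q %R"'
```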

When will my job start?

The scheduler tries to schedule all jobs as quickly as possible, subject to cluster policies, available hardware, allocation priority (contributors to the cluster get higher priority allocations), etc. Typically jobs start within a day or so, but this can vary, and usage of the cluster varies widely at times.

The squeue command, with the appropriate arguments, can show you the scheduler's estimate of when a pending/idle job will start running. It is, of course, just the scheduler's best estimate given current conditions; the actual start time might be earlier or later depending on factors such as the behavior of currently running jobs, the submission of new jobs, hardware issues, etc.

To see this, request that squeue show the %S (start time) field in the output format option (alternatively, squeue --start lists pending jobs along with their expected start times), e.g.:

login-1> squeue -o "%.9i %.9P %.8j %.8u %.2t %.10M %.6D %S"
    JOBID PARTITION     NAME     USER ST       TIME  NODES START_TIME
      473  standard  payerle PD       0:00      4 2014-05-08T12:44:34
      479  standard    kevin PD       0:00      4 N/A
      489  standard tptest1.  payerle PD       0:00      2 N/A

Obviously, the times given are estimates. The job could start earlier if jobs ahead of it in the queue do not use their full walltime, or could be delayed if higher-priority jobs are submitted before your job's estimated start time.

Detailed information about your jobs

To get more detailed information about your job, use the scontrol show job JOBNUMBER command, which prints many details about the job, e.g.:

login-2> scontrol show job 486
   UserId=payerle(34676) GroupId=glue-staff(8675)
   Priority=33 Account=test QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:03:00 TimeMin=N/A
   SubmitTime=2014-05-06T11:20:20 EligibleTime=2014-05-06T11:20:20
   StartTime=Unknown EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=standard AllocNode:Sid=pippin:31236
   ReqNodeList=(null) ExcNodeList=(null)
   NumNodes=2 NumCPUs=8 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
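Because scontrol prints Field=Value pairs, its output is easy to use from a script. Below is a minimal sketch run against a saved excerpt of the output above; get_field is a hypothetical helper (not a Slurm command), and on the cluster you would capture `scontrol show job JOBNUMBER` into the variable instead:

```shell
# Saved excerpt of the "scontrol show job 486" output shown above
job_info='JobState=PENDING Reason=Priority Dependency=(null)
RunTime=00:00:00 TimeLimit=00:03:00 TimeMin=N/A'

# get_field NAME: print the value of one Field=Value pair.
# Splits the pairs onto separate lines, then matches the field name.
get_field() {
  printf '%s\n' "$job_info" | tr ' ' '\n' | awk -F= -v k="$1" '$1 == k { print $2 }'
}

get_field JobState    # PENDING
get_field TimeLimit   # 00:03:00
```

On the cluster, `job_info=$(scontrol show job 486)` would make the same helper work on live data.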

Viewing output of jobs in progress

Slurm writes your job's stdout and stderr streams in real time to the files you specified on the shared filesystem. There is no need for an extra command like qpeek, which was required under the PBS/Moab/Torque environment.
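Since the files are written as the job runs, ordinary tools work for watching them. The sketch below simulates a job output file; the name slurm-1243530.out is hypothetical, following Slurm's default slurm-&lt;jobid&gt;.out naming:

```shell
# Simulate a job writing to its output file, then read the latest line,
# just as you would for a real job's file (the filename is hypothetical).
outfile=slurm-1243530.out
printf 'step 1 done\nstep 2 done\n' > "$outfile"
tail -n 1 "$outfile"    # step 2 done
# To follow a running job's output live:  tail -f slurm-<jobid>.out
```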

Cancelling Your Jobs

Sometimes one needs to kill a job. To cancel a job, whether it is waiting in the queue or already running, use the scancel command:

login-1> scancel -i 122488
Cancel job_id=122488 partition=standard [y/n]? y

The -i flag asks for confirmation before cancelling. To cancel all of your jobs at once, use scancel -u USERNAME; to cancel only your pending jobs, add --state=PENDING.

Monitoring the Cluster

Notices of scheduled and unscheduled outages, issues, etc. on the clusters are announced on the appropriate mailing lists (e.g. HPCC Announce for the Deepthought* clusters); users are automatically subscribed to these lists when they get access to the cluster. MARCC/Bluecrab users can also look at the MARCC Twitter feed.

Sometimes you want a broader overview of the cluster. The squeue command can give you information on what jobs are running on the cluster, and the sinfo -N command can show you attributes of its nodes. But both of these use a text-oriented display which, while fairly information-dense, is often difficult to digest.

The smap command tries to present this more graphically. While still text based, the display starts with a representation of the nodes in the cluster, showing letters indexed to running jobs in a list below. More information can be found in its man page.

The sview command is even prettier, but as it uses real (not text mode) graphics it requires an X server running on the computer you are sitting at. This will present a graphical overview of the nodes in the cluster and their state, as well as the job queue.

PLEASE SET THE REFRESH INTERVAL to something like 300 seconds (5 minutes): select Options | Set Refresh Interval. The application default is far too frequent and causes performance issues.

For an even prettier view, there are HPC dashboards available online for the clusters at:

The above items show the current state of the cluster, but sometimes one wants a more historical perspective, e.g. how was my allocation used over the past year? Historical metrics for the Deepthought clusters are available from the Open XDMoD (XD Metrics on Demand) package.

Finally, sometimes you wish to look in more detail at how a node or set of nodes is performing, e.g. to get a better idea of how much memory your job is using. We provide various metrics for the nodes in the cluster at the ganglia sites: