showq. For example:
login-1:~: showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
4178 kevin Running 4 00:01:00 Mon Jan 22 11:13:09
1 Active Job 4 of 236 Processors Active (1.69%)
1 of 59 Nodes Active (1.69%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 1 Active Jobs: 1 Idle Jobs: 0 Blocked Jobs: 0
If your job shows up in the ACTIVE JOBS section as shown above,
your job should be off and running.
If your job shows up in the IDLE JOBS section, there are currently
not enough free resources to run it. Check that you haven't
requested more processors than you need and that you've specified
a reasonable walltime. If you see lots of jobs in the ACTIVE JOBS
section, you will probably just need to wait for someone else's
job to finish before yours can start.
If your job shows up in the BLOCKED JOBS section, you most likely
do not have enough time remaining in your CPU allocation to run
the job. Either specify a smaller walltime or obtain an additional
allocation. See the section Diagnosing Job Problems for further
information.
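The three cases above can be checked mechanically. The following is a minimal Python sketch, not part of the cluster's tooling, that scans saved showq output and reports which section a job ID appears in. The function name find_job_section is hypothetical, and the parsing assumes the layout shown in the sample above (a section header line followed by one row per job, with a time value in the fifth column).

```python
SHOWQ_SAMPLE = """\
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
4178 kevin Running 4 00:01:00 Mon Jan 22 11:13:09
1 Active Job 4 of 236 Processors Active (1.69%)
1 of 59 Nodes Active (1.69%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 1 Active Jobs: 1 Idle Jobs: 0 Blocked Jobs: 0
"""


def find_job_section(showq_output, job_id):
    """Return 'ACTIVE', 'IDLE', or 'BLOCKED' for job_id, or None if not listed."""
    section = None
    for line in showq_output.splitlines():
        stripped = line.strip()
        # Section headers look like "ACTIVE JOBS-----", "IDLE JOBS-----", etc.
        for name in ("ACTIVE", "IDLE", "BLOCKED"):
            if stripped.startswith(name + " JOBS"):
                section = name
        fields = stripped.split()
        # Assumption: a job row has the job ID first and a time value
        # (REMAINING or WCLIMIT, containing ':') in the fifth column.
        # This keeps summary lines such as "1 Active Job ..." from matching.
        if (section and len(fields) >= 5
                and fields[0] == str(job_id) and ":" in fields[4]):
            return section
    return None


print(find_job_section(SHOWQ_SAMPLE, "4178"))
```

On a login node the text could instead come from running showq and capturing its output; the sample string is used here only so the sketch runs anywhere.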
The scheduler tries to schedule all jobs as quickly as possible, subject to cluster policies, available hardware, allocation priority (contributors to the cluster get higher-priority allocations), etc. Jobs typically start within a day or so, but this varies, and cluster usage can fluctuate widely.
The command showstart can give you a general idea of
when your job will start. Use it as follows:
login-2:~/work/hpcc-tests/test: showstart 1770833
job 1770833 requires 16 procs for 8:00:00
Estimated Rsv based start in 3:15:58 on Fri Feb 7 21:50:13
Estimated Rsv based completion in 11:15:58 on Sat Feb 8 05:50:13
Best Partition: deepthought
Obviously, the times given are estimates. The job could start earlier if other jobs ahead of it in the queue do not use their full walltime, or could get delayed if jobs with a higher priority than yours are submitted before your start time.
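As a rough illustration of how to read these estimates, the Python sketch below pulls the two "Estimated Rsv based" offsets out of the showstart output above and converts them to durations. The function name parse_showstart is hypothetical, and the sketch assumes offsets are printed as HH:MM:SS with an optional leading days field. It does not handle other forms the command might print.

```python
import re
from datetime import timedelta

SHOWSTART_SAMPLE = """\
job 1770833 requires 16 procs for 8:00:00
Estimated Rsv based start in 3:15:58 on Fri Feb 7 21:50:13
Estimated Rsv based completion in 11:15:58 on Sat Feb 8 05:50:13
Best Partition: deepthought
"""


def parse_showstart(output):
    """Return {'start': timedelta, 'completion': timedelta} from showstart text."""
    result = {}
    pattern = re.compile(r"Estimated Rsv based (start|completion) in\s+([\d:]+)")
    for kind, offset in pattern.findall(output):
        parts = [int(p) for p in offset.split(":")]
        # Assumption: fields are [days:]hours:minutes:seconds; pad missing
        # leading fields with zeros so that four values are always unpacked.
        days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
        result[kind] = timedelta(days=days, hours=hours,
                                 minutes=minutes, seconds=seconds)
    return result


estimate = parse_showstart(SHOWSTART_SAMPLE)
# The gap between the two estimates is the requested walltime.
print(estimate["completion"] - estimate["start"])
```

For the sample, the difference between the completion and start estimates equals the 8:00:00 walltime the job requested, which is a quick sanity check that the output was read correctly.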
To find out more detailed information about your job, use the
checkjob command. This command will show you which
specific nodes were allocated to your job, and it will also show you
the job requirements you specified when you submitted the job.
login-1:~: checkjob 4209
checking job 4209
State: Running
Creds: user:kevin group:wheel account:kevin class:serial qos:serial
WallTime: 00:00:00 of 00:01:00
SubmitTime: Tue Jan 23 10:33:55
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
StartTime: Tue Jan 23 10:33:56
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [prod]
Allocated Nodes:
[compute-2-39.deeptho:1]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Flags: RESTARTABLE PREEMPTEE PREEMPTOR
Attr: PREEMPTEE
Reservation '4209' (00:00:00 -> 00:01:00 Duration: 00:01:00)
PE: 1.00 StartPriority: 200
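If you only need a few fields from this report, they can be extracted with a short script. The Python sketch below is illustrative only: summarize_checkjob is a hypothetical helper, and it assumes the field labels (State, WallTime, Allocated Nodes) and the [host:tasks] node notation appear as in the sample output above.

```python
import re

CHECKJOB_SAMPLE = """\
checking job 4209
State: Running
Creds: user:kevin group:wheel account:kevin class:serial qos:serial
WallTime: 00:00:00 of 00:01:00
SubmitTime: Tue Jan 23 10:33:55
StartTime: Tue Jan 23 10:33:56
Total Tasks: 1
Allocated Nodes:
[compute-2-39.deeptho:1]
IWD: [NONE] Executable: [NONE]
PartitionMask: [ALL]
"""


def summarize_checkjob(output):
    """Extract state, walltime used/limit, and allocated nodes from checkjob text."""
    state = re.search(r"^\s*State:\s*(\S+)", output, re.MULTILINE)
    wall = re.search(r"^\s*WallTime:\s*(\S+)\s+of\s+(\S+)", output, re.MULTILINE)
    # Only look for node entries after the "Allocated Nodes:" label.
    _, _, tail = output.partition("Allocated Nodes:")
    # Node entries look like "[hostname:taskcount]"; bracketed values such
    # as "[NONE]" or "[ALL]" have no ":<digits>" suffix and are skipped.
    nodes = {host: int(tasks)
             for host, tasks in re.findall(r"\[([\w.\-]+):(\d+)\]", tail)}
    return {
        "state": state.group(1) if state else None,
        "walltime_used": wall.group(1) if wall else None,
        "walltime_limit": wall.group(2) if wall else None,
        "nodes": nodes,
    }


print(summarize_checkjob(CHECKJOB_SAMPLE))
```

Node names are reported exactly as checkjob prints them, so a hostname that the command abbreviates stays abbreviated in the result.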
If you want to view the output of your job while it is running, you
can use the command qpeek. This command can be used to view
both the standard output and standard error streams from your job, and
can also be used to follow the output as it occurs.
login-1:~: qpeek
qpeek: Peek into a job's output spool files
Usage: qpeek [options] JOBID
Options:
-c Show all of the output file ("cat", default)
-h Show only the beginning of the output file ("head")
-t Show only the end of the output file ("tail")
-f Show only the end of the file and keep listening ("tail -f")
-f Show only the last lines and keep listening ("tail -f")
+0f Show all of the file and keep listening ("tail +0f")
-# Show only # lines of output ("tail -")
-e Show the stderr file of the job
-o Show the stdout file of the job (default)
-? Display this help message
login-1:~: qpeek 4209
...this is sample output from job 4209...
login-1:~: qpeek -e 4209
...this is sample error messages from job 4209...
login-1:~: qpeek -f 4209
...this is sample output from job 4209, the command will not exit, and
will continue to show output as it is generated...
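When calling qpeek from a script, it can help to build the argument list in one place. The Python sketch below is a hypothetical helper (qpeek_command is not part of qpeek) that uses only flags listed in the help text above. That one display flag and one stream flag can be combined in a single call is an assumption based on the "[options]" usage line.

```python
def qpeek_command(job_id, stream="stdout", mode="cat"):
    """Build an argument list for qpeek from flags shown in its help text."""
    # Stream selection: -o is stdout (the default), -e is stderr.
    stream_flags = {"stdout": "-o", "stderr": "-e"}
    # Display mode: -c whole file, -h beginning, -t end, -f end and follow.
    mode_flags = {"cat": "-c", "head": "-h", "tail": "-t", "follow": "-f"}
    if stream not in stream_flags:
        raise ValueError("stream must be 'stdout' or 'stderr'")
    if mode not in mode_flags:
        raise ValueError("mode must be 'cat', 'head', 'tail', or 'follow'")
    return ["qpeek", mode_flags[mode], stream_flags[stream], str(job_id)]


print(qpeek_command(4209, stream="stderr", mode="follow"))
```

The returned list can be passed to subprocess.run on a login node where qpeek is installed. With mode="follow" the command does not exit on its own, so the caller has to interrupt it.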
To cancel your job before it completes, use the canceljob command.
login-1:~: canceljob 7274
job '7274' cancelled