Idl
NOTE: You might also wish to look at the page for the ENVI package.
Summary and Version Information
Package | Idl |
---|---|
Description | IDL interactive data language |
Categories | Programming/Development, Research |
Version | Module tag | Availability* | GPU Ready | Notes |
---|---|---|---|---|
8.3 | idl/8.3 | Non-HPC Glue systems, Bswift, HPCC Linux, sun4x_510 | N | |
Notes:
*: A package labelled as "available" on an HPC cluster can be used on the compute nodes of that cluster. Even software not listed as available on an HPC cluster is generally available on the login nodes of the cluster (assuming it is available for the appropriate OS version; e.g. RedHat Linux 6 for the two Deepthought clusters). This is because the compute nodes do not use AFS and instead have local copies of the AFS software tree, so we only install packages there as requested. Contact us if you need a version listed as not available on one of the clusters.
In general, you need to prepare your Unix environment to be able to use this software. To do this, run either:
tap TAPFOO
or
module load MODFOO
where TAPFOO and MODFOO are tags from the tap and module columns above, respectively. The tap command will print a short usage text (use -q to suppress this; the quiet form is needed in startup dot files); you can get a similar text with module help MODFOO. See the documentation for the tap and module commands for more information.
For packages that are libraries against which other codes get built, see the section on compiling codes for more help.
Tap/module commands listed with a version of current will set up what we consider the most current stable and tested version of the package installed on the system. The exact version is subject to change with little if any notice, and might be platform dependent. Versions labelled new represent a newer version of the package which is still being tested by users; if stability is not a primary concern you are encouraged to use it. Those with versions listed as old set up an older version of the package; you should only use this if the newer versions are causing issues. Old versions may be dropped after a while. Again, the exact versions are subject to change with little if any notice.
In general, you can abbreviate the module tags. If no version is given, the default current version is used. For packages with compiler/MPI/etc. dependencies, if a compiler module or MPI library was previously loaded, module will try to load the build of the package matching that compiler/MPI library. If you specify the compiler/MPI dependency, it will attempt to load the compiler/MPI library for you if needed.
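For example, to set up the version of IDL listed in the table above (a sketch; the exact module tags on your system may differ):
module load idl/8.3    # load the specific version from the table above
module help idl/8.3    # print the package's short help text
module load idl        # or just load the default (current) version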
Running IDL in batch mode
Although for short computations on a personal workstation the interactive mode of IDL is nice, sometimes one wishes to have IDL work in a batch-style mode. This is a requirement for using IDL on the HPC clusters, where computationally intensive processes must be submitted to the scheduler for running on the compute nodes.
The easiest method is probably to invoke idl in batch mode on a simple script file which then uses the .RUN IDL executive command to run a main program file containing the real code you wish to run. I.e., create a main program file containing the code you wish to run. For example, the following is a simple main program to print factorials, which we will place in a file called factorial_test.pro:
f = 1                     ; running product (k!)
max = 7                   ; compute factorials through max
for k = 1, max do begin
  f = k * f
  print, k, f             ; print k and k!
endfor
end
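For reference, running this program prints each value of k alongside k factorial; the output (formatting approximate) should be:
       1       1
       2       2
       3       6
       4      24
       5     120
       6     720
       7    5040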
You should also create a simple IDL batch script to invoke this program, e.g. the batch_test.pro file below:
.run factorial_test.pro
exit
NOTE: Be sure to include the exit command if you wish for IDL to terminate when the main program is finished. This is especially important if submitting to the HPC compute nodes via sbatch, as otherwise your job will not terminate when the calculation is finished, but will instead wait until it hits the wall time limit, wasting CPU cycles (and charging your allocation account).
You can then invoke your main program from the unix command line with something like idl batch_test.pro. (NOTE: you can omit the .pro extension in the idl and .run commands if so desired; the .pro extension will be used by default.) To use with sbatch, a job script like the following can be used:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH -t 15
#SBATCH --mem-per-cpu=1024
. ~/.profile
module load idl
echo "Starting job ..."
idl batch_test
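Assuming you save this job script in a file named idl_job.sh (a name used here for illustration), you would submit and monitor it along these lines:
sbatch idl_job.sh      # submit the job script to the scheduler
squeue -u $USER        # check the status of your pending/running jobs
Once the job finishes, its output will be in the slurm-<jobid>.out file in the submission directory by default.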
The two separate .pro files are required in general because in IDL batch mode, which batch_test.pro runs under, each line is read and executed immediately. Control statements, e.g. the for loop in our factorial_test example, often span multiple lines; although you can use ampersands and dollar signs to continue lines, this quickly becomes messy (see the sketch below). In main program parsing mode, such as used for factorial_test, the entire program unit is read and compiled as a single unit. Since IDL code run in batch mode or submitted to the HPC compute nodes is assumed to be complicated, it is probably best to use this two file approach.
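For illustration, the sketch below shows roughly what the factorial example looks like when forced into a single batch file using the ampersand and dollar-sign continuations; it works, but the two file approach reads much better:
; factorial example crammed into a single batch file (illustrative sketch)
f = 1 & max = 7
for k = 1, max do begin & $
  f = k * f & $
  print, k, f & $
endfor
exit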
Multithreading and IDL
Recent versions of IDL support multi-threading, at least to some degree. This means that on systems with more than one CPU and/or multiple cores per CPU socket, IDL will use multiple threads to do work in parallel when the application determines it is advantageous to do so. This is automatic, and invisible to the user except for hopefully improved performance.
By default, when IDL encounters a calculation that would benefit from multi-threading, it will generate a thread for every CPU core on the system on which it is running. This is probably the desired behavior when IDL is the only (or the main) program running on a system, e.g. on a desktop or a dedicated compute node.
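If you want to see or override what IDL decides, the !CPU system variable and the CPU procedure expose the thread pool settings from within an IDL session; a quick sketch:
print, !CPU.HW_NCPU          ; number of CPU cores IDL detected on the system
print, !CPU.TPOOL_NTHREADS   ; number of threads in IDL's thread pool
CPU, TPOOL_NTHREADS = 4      ; restrict the thread pool to 4 threads for this session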
But on some HPC systems the available cores per node can be somewhat large (e.g. nodes on DT2 have at least 20 cores), which might be more than the optimal degree of parallelization for certain problems. On these systems, for certain problems, you might not wish to allocate all the cores on the node to your job (since you will be charged for the time on those cores). However, if you restrict the number of cores you request, you must inform IDL of this restriction as well; otherwise IDL will just try to use all the cores anyway, which could adversely affect performance.
E.g., assume that you have determined through trial and error that for the particular type of problem you are working on, the greatest efficiency occurs for 6 cores (e.g., a 6 core job is 40% faster than a 4 core job, but an 8 core job is only a few percent faster than a 6 core job). So you submit a bunch of 6 core idl jobs, with the --share flag so that you are not charged for all the cores on the node. And suppose that 3 of your jobs end up on the same 20 core node (since you told slurm these are 6 core jobs, three will fit on a 20 core node). If you do not tell IDL to restrict itself to 6 cores, each job will determine that there are 20 cores on the node, assume it can use all of them, and split multithreaded calculations into 20 tasks. In addition to any performance hit from using too many tasks for each job, you also have up to 60 tasks trying to run on a 20 core system, which will further degrade performance.
To tell IDL to use fewer than all the cores it finds on the system, you need to set the IDL_CPU_TPOOL_NTHREADS environment variable. By default it is 0, which means use all cores on the system. You should set it equal to the number of cores you requested from Slurm; we recommend that you set it to the SLURM_NTASKS environment variable to ensure consistency between what you requested from Slurm and what you tell IDL. E.g.,
#!/bin/tcsh
#SBATCH -n 8
#SBATCH --share
#SBATCH -t 2:00
#SBATCH --mem=8000
module load idl
setenv IDL_CPU_TPOOL_NTHREADS $SLURM_NTASKS
idl myprogram.idl
If you change the number of cores (the value after -n), the value of IDL_CPU_TPOOL_NTHREADS will automatically agree.
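If your job script uses bash rather than tcsh (as in the earlier batch example), the equivalent sketch uses export instead of setenv:
#!/bin/bash
#SBATCH -n 8
#SBATCH --share
#SBATCH -t 2:00
#SBATCH --mem=8000
module load idl
# bash-family shells use export rather than setenv
export IDL_CPU_TPOOL_NTHREADS=$SLURM_NTASKS
idl myprogram.idl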