Python
Contents
- Summary and Version Information
- Installing modules
- Assorted Tips and Tricks
- Numba and GPU Support
- Using python with MPI
Summary and Version Information
Package | Python
---|---
Description | Python
Categories | Programming/Development

Version | Module tag | Availability* | GPU Ready | Notes
---|---|---|---|---
2.4.1 | python/2.4.1 | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (All OSes) | N |
2.7.3 | python/2.7.3 | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (All OSes) | N | Deprecated. Use 2.7.8 instead
2.7.8 | python/2.7.8 | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (All OSes) | N | Python/packages built using gcc/4.6.1, openmpi/1.6.5
3.2.3 | python/3.2.3 | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (All OSes) | N | Python/packages built using gcc/4.6.1, openmpi/1.6.5
3.5.1 | python/3.5.1 | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (RedHat6) | Y | Python/packages built using gcc/4.9.3, openmpi/1.8.6. Has numba
3.7.3 | python/3.7.3 | Non-HPC Glue systems, Deepthought HPCC, Deepthought2 HPCC (RedHat6) | N |
Notes:
*: A package labelled as "available" on an HPC cluster can be used on the compute nodes of that cluster. Even software not listed as available on an HPC cluster is generally available on the login nodes of that cluster (assuming it is available for the appropriate OS version, e.g. RedHat Linux 6 for the two Deepthought clusters). This is because the compute nodes do not use AFS and instead have copies of the AFS software tree, so we only install packages on them as requested. Contact us if you need a version listed as not available on one of the clusters.
In general, you need to prepare your Unix environment to be able to use this software. To do this, run either:
tap TAPFOO
module load MODFOO
where TAPFOO and MODFOO are one of the tags in the tap and module columns above, respectively. The tap command will print a short usage text (use -q to suppress this; this is needed in startup dot files); you can get a similar text with module help MODFOO. See the documentation on the tap and module commands for more information.
For packages that are libraries against which other codes get built, see the section on compiling codes for more help.
Tap/module commands listed with a version of current will set up what we consider the most current stable and tested version of the package installed on the system. The exact version is subject to change with little if any notice, and might be platform dependent. Versions labelled new represent a newer version of the package which is still being tested by users; if stability is not a primary concern, you are encouraged to use it. Those with versions listed as old set up an older version of the package; you should only use this if the newer versions are causing issues. Old versions may be dropped after a while. Again, the exact versions are subject to change with little if any notice.
In general, you can abbreviate the module tags. If no version is given, the default current version is used. For packages with compiler/MPI/etc. dependencies, if a compiler module or MPI library was previously loaded, the module command will try to load the build of the package matching that compiler or MPI library. Conversely, if you specify a compiler/MPI-dependent build, it will attempt to load the matching compiler/MPI library for you if needed.
When using python in conjunction with your own code, you might wish to
note the compiler and MPI libraries used when the python binaries and
packages were built. MPI in particular can be fussy and generate strange
errors if the different parts of the code are linked against different
MPI libraries (even different versions of OpenMPI or the same version of
OpenMPI built with a different compiler), or if the mpirun
command used to start the code is from a different MPI version or was built
with a different compiler. In general, it is best to ensure everything is
built with the same compiler and, if used, the same MPI library.
Installing modules
Python's capabilities can be significantly enhanced through the addition
of modules. Code can import
a module to enable its functionality.
The supported python interpreters on the system have a selection of modules preinstalled. If a module you are interested in is not in that list, you can either install a personal copy of the module for yourself, or request that it be installed site-wide. We will make reasonable efforts to accommodate such requests as staffing resources allow.
Installing modules yourself
The mechanism for installing a module is of course dependent on the module being installed, but most modern python modules support the setup.py mechanism described below. Assuming that is the case, the standard procedure for installing your own copy of a module is:
- module load python/X.Y.Z to select the version of python you wish to use.
- Create a directory to contain your python modules, if not already done. Typically, you will want one directory to house all of the modules you are installing, so something like mkdir ~/.mypython will work. You should also create lib and lib/python directories beneath it, e.g. mkdir ~/.mypython/lib ~/.mypython/lib/python.
- Tell python where to look for your modules. Assuming you are putting your modules under ~/.mypython, something like setenv PYTHONPATH ~/.mypython/lib/python will do this (bash/bourne shell users should do PYTHONPATH=~/.mypython/lib/python; export PYTHONPATH). You probably want to add this to your .cshrc.mine or .bashrc.mine.
- Download and unzip/untar the module sources, then cd to the main module source directory (it should contain the file setup.py).
- Run python setup.py install --home ~/.mypython

If all goes well, the module should now be installed under ~/.mypython or wherever you specified. If there are executables associated with it, they should be in ~/.mypython/bin. You should now be able to import the module in python (this assumes that PYTHONPATH is set as indicated above).
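As a quick sanity check (a sketch only; the module name foo below is a hypothetical stand-in for whatever module you just installed), you can verify that python is searching your personal directory and can find the new module:

import sys
print(sys.path)      # ~/.mypython/lib/python should appear in this list

import foo           # hypothetical: replace with the module you actually installed
print(foo.__file__)  # should point somewhere under ~/.mypython/lib/python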
Of course, not all modules install easily. Unfortunately, the install process can fail in far more ways than can reasonably be enumerated. If you are comfortable with building modules, the error messages may provide reasonable guidance in getting the module to build, but it is probably easiest to just request that the module be installed in the system libraries.
Installing modules yourself: virtualenv and pip method
Although the standard procedure described above works for most cases, there
are cases where more separation is required. The virtualenv
scheme allows you to create a fully independent virtual python environment,
copying the python executables and standard and system libraries to your own
directory, and allowing you to add/update/delete from there. This has the
advantage that the virtualenv is almost completely isolated; so changes made
in the system installation of python are unlikely to impact your virtualenv.
This can be important if you have a code or application which requires, e.g., version 1.6 of the foo package, but will break if it is upgraded to 1.7. (It appears that when using the standard PYTHONPATH scheme above, the system library directories are ALWAYS searched before PYTHONPATH, meaning that method can be used to add modules, but not to upgrade or downgrade them.)
However, a virtualenv takes up a significant amount of disk space, and the isolation from the system python can be a negative as well: upgrades and/or new modules added to the system python will NOT be visible. This is good when, as in the example above, an upgrade would break something, but most of the time the upgrades are desirable.
The choice of which mechanism to use is up to the user. Note, however, that
the virtualenv
mechanism is a recent addition on Glue systems,
and so might have some issues.
To install a package with the virtualenv mechanism, you must first create a virtual python environment:
- module load python/X.Y.Z to select the version of python you wish to use in this virtual environment.
- Select a directory where your virtual python environments should live. Each virtual environment you create will be a subdirectory of this directory. Create the directory if needed, and cd to it, e.g. mkdir ~/.python-venvs; cd ~/.python-venvs
- Important: the module load command sets the PYTHONHOME environmental variable, which is incompatible with the virtualenv mechanism, so you must delete it. I.e. unsetenv PYTHONHOME for c-shell types, or unset PYTHONHOME for bourne shell flavors.
- Run virtualenv NewEnvName OR virtualenv --system-site-packages NewEnvName. The latter variant gives your virtual environment access to the system-installed packages like numpy; this is probably useful in most cases, but is optional (and you may need to omit it if there are conflicts).
In order to use this virtual python environment, you must activate it. You need to activate the environment before installing modules into it, as well as before running python to take advantage of the modules you installed. To activate the environment, you must:
- unsetenv PYTHONHOME (or unset PYTHONHOME if you are using the Bourne shell or bash). This is IMPORTANT; otherwise the virtual python environment will not see its own libraries.
- Then either source ~/.python-venvs/NewEnvName/bin/activate.csh (for c-style shell users) or . ~/.python-venvs/NewEnvName/bin/activate (for Bourne shell/bash users).
Once the virtual environment is created and activated, installation is usually relatively simple using the pip command. You should just be able to do pip install NameOfPackage, and pip should take care of downloading the package and installing it for you.
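As a quick check that the virtual environment is the one actually being used (a sketch; numpy is just an example package here), you can ask python where it is loading a module from:

import numpy           # or whatever package you just pip-installed
print(numpy.__file__)  # a path under ~/.python-venvs/NewEnvName means the virtualenv copy is in use

If the printed path points into the system tree instead, the environment was probably not activated (or PYTHONHOME was not unset) before python was started.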
Of course, not all modules install easily. Unfortunately, the install process can fail in far more ways than can reasonably be enumerated. If you are comfortable with building modules, the error messages may provide reasonable guidance in getting the module to build, but it is probably easiest to just request that the module be installed in the system libraries.
Assorted Tips and Tricks
Matplotlib Tricks
- Using matplotlib in batch jobs/without an X server:
By default, the matplotlib package in Python expects to work with a graphical user interface (GUI), which on Unix-like systems means an X server must be running. This can be problematic if one wishes to use matplotlib in batch jobs (e.g. on an HPC cluster), because typically a display will not be available. The easiest way around this is to specify a non-interactive backend. There are several ways to do this, but since you probably want to continue using an interactive backend when using python interactively, the best approach is to have your batch code select a non-interactive backend. A common choice is Agg (for the Anti-Grain Geometry engine), which can produce PNG files; Cairo and Gdk are other options. Use would be something like:

import matplotlib
# This needs to be done *before* importing pyplot or pylab
matplotlib.use('Agg')
import matplotlib.pyplot as plt
# Do your plotting, e.g.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(10))
fig.savefig('test.png')
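If you are unsure which backend ended up in effect, calling matplotlib.get_backend() after the imports will report it.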
Numba and GPU Support
The most recent versions of Python installed (e.g. 3.5.1) provide a python module called "numba". Numba allows certain portions of python code to be compiled to lower-level machine code to improve performance, in many cases simply by adding the decorator "@jit" before the function to be compiled. Depending on the function, one might achieve order-of-magnitude performance gains. E.g. (example taken from Wikipedia):
from numba import jit
@jit
def sum1d(my_array):
    total = 0.0
    for i in range(my_array.shape[0]):
        total += my_array[i]
    return total
Here, the addition of the "@jit" (for just-in-time compilation) can result in code running 100-200 times faster than the original on a long Numpy array, and up to 30% faster than Numpy's builtin "sum()" function, on standard CPU cores.
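For example, usage would look something like the following (a sketch continuing from the function above; the array size and dtype are arbitrary choices, and the first call is slower because it includes the one-time compilation):

import numpy as np

my_array = np.arange(10000000, dtype=np.float64)
print(sum1d(my_array))   # first call compiles sum1d for float64 arrays
print(sum1d(my_array))   # subsequent calls run at compiled speed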
Some codes can perform even better on GPUs, and Numba can make this fairly simple by importing "cuda" from numba and using "cuda.jit" in place of "jit". There are constraints imposed when using GPUs, so not every code can be easily converted for GPU use.
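As a minimal sketch of what the GPU variant looks like (an illustration only; the kernel, array names, and launch parameters here are made up for the example, not taken from any particular code):

from numba import cuda
import numpy as np

@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)            # this thread's global index
    if i < arr.size:            # guard: the grid may have more threads than elements
        arr[i] = arr[i] * factor

data = np.arange(1000000, dtype=np.float64)
d_data = cuda.to_device(data)   # copy the array to GPU memory
threads_per_block = 256
blocks = (data.size + threads_per_block - 1) // threads_per_block
scale[blocks, threads_per_block](d_data, 2.0)   # launch the kernel
result = d_data.copy_to_host()  # copy the scaled array back to the CPU

Note that, unlike @jit, a @cuda.jit kernel returns nothing, and you must specify the launch configuration (blocks and threads per block) yourself.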
To use Numba with GPUs on the Deepthought clusters, you will need to
- Request a GPU-enabled node
- Load an appropriate version of CUDA. Currently, cuda/7.0.28 or cuda/7.5.18 will work with Numba.
The details of using Numba, and especially of using Numba with CUDA, are well beyond the scope of this document. Some useful links for more information are:
- SciPy 2016 Tutorial (video) (Git repo of files for the tutorial)
- SciPy 2015 Tutorial (video)
- Intel HPC Tutorial
- Official Numba introduction
- Numba for CUDA GPUs
- NYU's Intro to Numba including a section on CUDA programming with Numba
Using python with MPI
If you wish to take advantage of the multiple cores and even many nodes available on High Performance Computing (HPC) clusters, it is useful to use the Message Passing Interface (MPI), a standard and ubiquitous programming methodology for distributed-memory parallelism, to coordinate and communicate among the various processes.
There is a package mpi4py, available for all Pythons installed system-wide on the Deepthought clusters, which makes the various MPI calls available to python code. Because mpi4py closely mimics the function calls in the standard MPI library/API, it makes the task of transcribing algorithms between python and C much easier.
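To give a flavor of what such code looks like, here is a minimal sketch of a script along the lines of my-mpi4py-script.py (the computation itself is invented for the example):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()     # this process's id, from 0 to size-1
size = comm.Get_size()     # total number of MPI processes

# Each rank computes a partial sum; rank 0 collects the grand total
local = sum(range(rank, 1000, size))
total = comm.reduce(local, op=MPI.SUM, root=0)

if rank == 0:
    print("Total over", size, "ranks:", total)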
When you have python code (e.g. my-mpi4py-script.py
)
designed to use MPI via mpi4py, you will normally
wish to execute the python code using the mpirun
command.
It is important that you use the mpirun
command from the SAME
MPI library as was used to build mpi4py
for the python version
you are running --- typically this will mean using module load
to load the correct gcc compiler and openmpi version as used in building
the python interpreter and modules, as listed in the
version information table at the top of this
document. E.g., a job submission script to launch
my-mpi4py-script.py
on 40 cores using python/3.5.1 might look
like:
#!/bin/bash
#Assume will be finished in no more than 8 hours
#SBATCH -t 8:00:00
#Launch on 40 cores distributed over as many nodes as needed
#SBATCH -n 40
#Assume need 6 GB/core (6144 MB/core)
#SBATCH --mem-per-cpu=6144
#Make sure module cmd gets defined
. ~/.profile
#Load required modules
module load python/3.5.1
#Load correct gcc (4.9.3) and mpi (openmpi/1.8.6) for python/3.5.1
module load gcc/4.9.3
module load openmpi/1.8.6
#Normally do not need to give -n 40, as openmpi will determine from Slurm
#environment variables
mpirun python my-mpi4py-script.py
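Assuming the script above is saved as, e.g., my-mpi4py-job.sh (the file name is arbitrary), it would be submitted to the scheduler with sbatch my-mpi4py-job.sh.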
Although exploring mpi4py is beyond the scope of this document, we do provide links to some on-line tutorials, etc., to help if you wish to explore mpi4py further:
- TACC HPC Python Tutorial
- MPI4PY Documentation
- HowToForge tutorial on MPI4PY
- mpi4py example/quick start