Python

Contents

  1. Summary and Version Information
  2. Installing modules
  3. Assorted Tips and Tricks
    1. Matplotlib Tricks
  4. Numba and GPU Support
  5. Using python with MPI

Summary and Version Information

Package Python
Description Python
Categories Programming/Development
Version  Module tag    Availability*                                                        GPU Ready  Notes
2.4.1    python/2.4.1  Non-HPC Glue systems; Deepthought and Deepthought2 HPCC (all OSes)   N
2.7.3    python/2.7.3  Non-HPC Glue systems; Deepthought and Deepthought2 HPCC (all OSes)   N          Deprecated. Use 2.7.8 instead
2.7.8    python/2.7.8  Non-HPC Glue systems; Deepthought and Deepthought2 HPCC (all OSes)   N          Python/packages built using gcc/4.6.1 and openmpi/1.6.5
3.2.3    python/3.2.3  Non-HPC Glue systems; Deepthought and Deepthought2 HPCC (all OSes)   N          Python/packages built using gcc/4.6.1 and openmpi/1.6.5
3.5.1    python/3.5.1  Non-HPC Glue systems; Deepthought and Deepthought2 HPCC (RedHat6)    Y          Python/packages built using gcc/4.9.3 and openmpi/1.8.6; has numba
3.7.3    python/3.7.3  Non-HPC Glue systems; Deepthought and Deepthought2 HPCC (RedHat6)    N

Notes:
*: A package labelled as "available" on an HPC cluster can be used on the compute nodes of that cluster. Even software not listed as available on an HPC cluster is generally available on the login nodes of that cluster (assuming it is available for the appropriate OS version, e.g. RedHat Linux 6 for the two Deepthought clusters). This is because the compute nodes do not use AFS and instead carry local copies of the AFS software tree, and we only copy packages onto the compute nodes as requested. Contact us if you need a version listed as not available on one of the clusters.

In general, you need to prepare your Unix environment to be able to use this software. To do this, either:

  • tap TAPFOO
OR
  • module load MODFOO

where TAPFOO and MODFOO are one of the tags in the tap and module columns above, respectively. The tap command will print a short usage text (use -q to suppress this; this is needed in startup dot files); you can get a similar text with module help MODFOO. For more information, see the documentation on the tap and module commands.

For packages which are libraries which other codes get built against, see the section on compiling codes for more help.

Tap/module commands listed with a version of current will set up for what we considered the most current stable and tested version of the package installed on the system. The exact version is subject to change with little if any notice, and might be platform dependent. Versions labelled new would represent a newer version of the package which is still being tested by users; if stability is not a primary concern you are encouraged to use it. Those with versions listed as old set up for an older version of the package; you should only use this if the newer versions are causing issues. Old versions may be dropped after a while. Again, the exact versions are subject to change with little if any notice.

In general, you can abbreviate the module tags. If no version is given, the default current version is used. For packages with compiler/MPI/etc dependencies, if a compiler module or MPI library was previously loaded, it will try to load the correct build of the package for those packages. If you specify the compiler/MPI dependency, it will attempt to load the compiler/MPI library for you if needed.

When using in conjunction with your own code, you might wish to note the compiler and MPI libraries used when the python binaries and packages were built. MPI in particular can be fussy and generate strange errors if the different parts of the code are linked against different MPI libraries (even different versions of OpenMPI or the same version of OpenMPI built with a different compiler), or if the mpirun command used to start the code is from a different MPI version or was built with a different compiler. In general, it is best to ensure everything is built with the same compiler and, if used, the same MPI library.

Installing modules

Python's capabilities can be significantly enhanced through the addition of modules. Code can import a module to enable its functionality.
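
For example, importing the standard-library math module makes its functions and constants available to your code:

import math

# Once imported, the module's contents are accessed via its name
print(math.sqrt(2.0))
print(math.pi)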

The supported python interpreters on the system have a selection of modules preinstalled. If a module you are interested in is not in that list, you can either install a personal copy of the module for yourself, or request that it be installed site wide. We will make reasonable efforts to accommodate such requests as staffing resources allow.

Installing modules yourself

The mechanism for installing a module is of course dependent on the module being installed, but most modern python modules support the setup.py mechanism described below. Assuming that is the case, the standard procedure for installing your own copy of a module is:

  1. module load python/X.Y.Z to select the version of python you wish to use.
  2. Create a directory to contain your python module, if not already done. Typically, you will want one directory to house all of the modules you are installing, so something like mkdir ~/.mypython will work. You should also create lib and lib/python directories beneath it, e.g. mkdir ~/.mypython/lib ~/.mypython/lib/python.
  3. You will need to tell python where to look for your modules. Assuming you are putting your modules under ~/.mypython, something like setenv PYTHONPATH ~/.mypython/lib/python (bash/bourne shell users should do PYTHONPATH=~/.mypython/lib/python; export PYTHONPATH ). You probably want to add this to your .cshrc.mine or .bashrc.mine.
  4. Download and unzip/untar/etc. the module sources. Cd to the main module source directory (it should contain the file setup.py).
  5. Run python setup.py install --home ~/.mypython

If all goes well, the module should now be installed under ~/.mypython or wherever you specified. If there are executables associated with it, they should be in ~/.mypython/bin. You should be able to import the module in python now (this assumes that PYTHONPATH is set as indicated above).
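
As a quick check, assuming PYTHONPATH is set as above, you can verify from within python that your directory is on the module search path and that the module imports (mymodule below is a hypothetical placeholder for whatever module you installed):

import sys
print(sys.path)     # your ~/.mypython/lib/python directory should appear in this list

import mymodule     # hypothetical: replace with the name of the module you installed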

Of course, not all modules install easily. Unfortunately, the install process can fail in far more ways than can reasonably be enumerated. If you are comfortable with building modules, you might find reasonable guidance from error messages to assist you in getting the module to build, but it is probably easiest to just request the module be installed to the system libraries.

Installing modules yourself: virtualenv and pip method

Although the standard procedure described above works in most cases, sometimes more separation is required. The virtualenv scheme allows you to create a fully independent virtual python environment, copying the python executables and the standard and system libraries to your own directory and allowing you to add/update/delete packages from there. The advantage is that the virtualenv is almost completely isolated, so changes made in the system installation of python are unlikely to impact it. This can be important if you have a code or application which requires, e.g., version 1.6 of the foo package but will break if it is upgraded to 1.7. (With the standard PYTHONPATH scheme above, the system library directories are ALWAYS searched before PYTHONPATH, so that method can be used to add modules, but not to upgrade or downgrade them.)

However, a virtualenv takes up a significant amount of disk space, and the isolation from the system python can be a negative as well: upgrades and/or new modules added to the system python will NOT be visible in your virtualenv. This is good when, as in the example above, an upgrade would break something, but most of the time upgrades are desirable.

The choice of which mechanism to use is up to the user. Note, however, that the virtualenv mechanism is a recent addition on Glue systems, and so might have some issues.

To install a package with the virtualenv mechanism, you must first create a virtual python environment.

  1. module load python/X.Y.Z to select the version of python you wish to use in this virtual environment.
  2. You should select a directory where the virtual python environment should live. Each virtual environment you create will be a subdirectory of this directory. Create the directory if needed, and cd to it. E.g. mkdir ~/.python-venvs; cd ~/.python-venvs
  3. Important: The module load command sets the PYTHONHOME environment variable; this is incompatible with the virtualenv mechanism, so you must delete it. I.e., unsetenv PYTHONHOME for C-shell flavors, and unset PYTHONHOME for Bourne shell flavors.
  4. virtualenv NewEnvName
    OR
    virtualenv --system-site-packages NewEnvName
    The latter variant gives your virtual environment access to the system installed packages like numpy; this is probably useful in most cases, but is optional (and you may need to omit it if there are conflicts).

In order to use this virtual python environment, you must activate it. You need to activate the environment before installing modules into it, as well as before running python to take advantage of the modules you installed. To activate the environment, you must:

  1. unsetenv PYTHONHOME (or unset PYTHONHOME if you are using the Bourne shell or bash). This is IMPORTANT because otherwise the virtual python environment will not see the libraries for the virtual environment.
  2. You must then either
    1. source ~/.python-venvs/NewEnvName/bin/activate.csh (for c-style shell users) or
    2. . ~/.python-venvs/NewEnvName/bin/activate (for Bourne shell/bash users).

Once the virtual environment is created and activated, installation is usually relatively simple using the pip command. You should just be able to do pip install NameOfPackage. Pip should take care of downloading the package and installing it for you.
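
Once activated, you can confirm from within python that you are running in the virtual environment; sys.prefix should point at your virtualenv directory rather than the system installation:

import sys

# When the virtualenv is active, this prints your environment's directory,
# e.g. something under ~/.python-venvs/NewEnvName, not the system prefix
print(sys.prefix)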

Of course, as noted above, not all modules install easily, and the install process can fail in far more ways than can reasonably be enumerated. If pip fails and the error messages do not lead you to a fix, it is probably easiest to just request the module be installed to the system libraries.

Assorted Tips and Tricks

Matplotlib Tricks

  • Using matplotlib in batch jobs/without an X server: By default, the matplotlib package in Python expects to work with a graphical user interface (GUI), which on Unix-like systems means a running X server. This is problematic if you wish to use matplotlib in batch jobs (e.g. on an HPC cluster), because typically no display is available. The easiest solution is to specify a non-interactive backend. There are several ways to do this, but since you probably want to continue using an interactive backend when using python interactively, the best approach is to have your batch code select a non-interactive backend. A common choice is Agg (the Anti-Grain Geometry engine), which can produce PNG files; Cairo and Gdk are other options. Use would be something like:
    import matplotlib
    # This needs to be done *before* importing pyplot or pylab
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt

    # Do your plotting, e.g.
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(range(10))
    fig.savefig('test.png')
    For more information, see: Matplotlib Documentation on running without a GUI

Numba and GPU Support

The most recent versions of Python installed (e.g. 3.5.1) provide a python module called "numba". Numba allows certain portions of python code to be compiled to lower-level machine code to improve performance, in many cases simply by adding the decorator "@jit" before the function to compile. Depending on the function, one might achieve order-of-magnitude performance gains. E.g. (example taken from Wikipedia):

from numba import jit

@jit
def sum1d(my_array):
    total = 0.0
    for i in range(my_array.shape[0]):
        total += my_array[i]
    return total

Here, the addition of the "@jit" (for just-in-time compilation) can result in code running 100-200 times faster than the original on a long Numpy array, and up to 30% faster than Numpy's builtin "sum()" function, on standard CPU cores.
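
Continuing the example above, a minimal usage sketch (the array length here is arbitrary, and timings will vary by machine):

import numpy as np

arr = np.arange(1000000, dtype=np.float64)
sum1d(arr)            # the first call triggers the just-in-time compilation
total = sum1d(arr)    # subsequent calls run the compiled machine code
print(total)          # 499999500000.0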

Some codes can perform even better on GPUs, and Numba can make this fairly simple by importing "cuda" from numba and using "cuda.jit" in place of "jit". There are constraints imposed when using GPUs, so not every code can be easily converted for GPU use.
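
A minimal sketch of a GPU kernel using numba's cuda module; the kernel, array size, and launch configuration below are illustrative choices, not a prescribed pattern:

from numba import cuda
import numpy as np

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)      # absolute index of this thread within the grid
    if i < arr.size:      # guard: the grid may be larger than the array
        arr[i] += 1.0

data = np.zeros(1024, dtype=np.float64)
threads_per_block = 256
blocks_per_grid = (data.size + threads_per_block - 1) // threads_per_block
# Launch the kernel; numba copies data to the GPU and back automatically
add_one[blocks_per_grid, threads_per_block](data)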

To use Numba with GPUs on the Deepthought clusters, you will need to

  1. Request a GPU-enabled node
  2. Load an appropriate version of CUDA. Currently, cuda/7.0.28 or cuda/7.5.18 will work with Numba.

The details of using Numba, and especially of using Numba with CUDA, are well beyond the scope of this document; the Numba documentation is a good source of further information.

Using python with MPI

If you wish to take advantage of the multiple cores and the many nodes available on High Performance Computing (HPC) clusters, it is useful to use the Message Passing Interface (MPI), a standard and ubiquitous programming methodology for distributed-memory parallelism, to coordinate and communicate among the various processes.

There is a package mpi4py, available with all Pythons installed system-wide on the Deepthought clusters, which basically makes the various MPI calls available to python code. Because mpi4py closely mimics the function calls of the standard MPI library/API, it makes transcribing algorithms between python and C much easier.
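
For illustration, a minimal script like my-mpi4py-script.py might look like the following (the reduction here is just an example, not a prescribed structure):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id: 0, 1, ..., size-1
size = comm.Get_size()   # total number of MPI processes started by mpirun

# Each rank contributes one value; reduce() sums them onto rank 0
total = comm.reduce(rank, op=MPI.SUM, root=0)
if rank == 0:
    print("Sum of ranks 0..%d is %d" % (size - 1, total))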

When you have python code (e.g. my-mpi4py-script.py) designed to use MPI via mpi4py, you will normally wish to execute it using the mpirun command. It is important that you use the mpirun command from the SAME MPI library as was used to build mpi4py for the python version you are running --- typically this means using module load to load the same gcc compiler and openmpi version as were used in building the python interpreter and modules, as listed in the version information table at the top of this document. E.g., a job submission script to launch my-mpi4py-script.py on 40 cores using python/3.5.1 might look like:

#!/bin/bash
#Assume will be finished in no more than 8 hours
#SBATCH -t 8:00:00
#Launch on 40 cores distributed over as many nodes as needed
#SBATCH -n 40
#Assume need 6 GB/core (6144 MB/core)
#SBATCH --mem-per-cpu=6144

#Make sure module cmd gets defined
. ~/.profile

#Load required modules
module load python/3.5.1
#Load correct gcc (4.9.3) and mpi (openmpi/1.8.6) for python/3.5.1
module load gcc/4.9.3
module load openmpi/1.8.6

#Normally do not need to give -n 40, as openmpi will determine from Slurm
#environment variables
mpirun python my-mpi4py-script.py

Although exploring mpi4py is beyond the scope of this document, on-line tutorials and the mpi4py documentation are available to help if you wish to explore mpi4py further.