IT-Service: Using Python with Linux

Quick guide

The programming language Python is installed on all Linux machines at GSI/FAIR. It can be used by anyone who has access to such a machine, which only requires a Linux account.

Working with Python often requires installing third-party code called modules. GSI/FAIR takes IT security very seriously and therefore does not install arbitrary code on its farm machines, but this page describes methods to satisfy the needs of your program.

Service description

If you require Python modules that are not yet available on the farm machines, there are four possible ways to obtain them. See the sections below for each method.

  1. Check if the module has been packaged by Debian and if it is available for the current release.
  2. Install it in your home directory and use the current interpreter from Debian.
  3. Use virtual environments to install your modules independently from the ones already available on your Linux installation.
  4. Use a complete Python distribution to manage the used interpreter together with all modules.

Python module packages in Debian are named python-MODULENAME, e.g. python-sphinx. As a general rule the IT department can only provide modules and versions that are available in the official Debian package repositories. If the module you want is available as a Debian package you can check the available version by running apt-cache policy PACKAGENAME.

$ apt-cache policy python-sphinx
python-sphinx:
  Installed: 1.2.3+dfsg-1
  Candidate: 1.2.3+dfsg-1
  Version table:
 *** 1.2.3+dfsg-1 0
        500 http://mirror.gsi.de/distrib/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status

The output of the command states that Debian package version 1.2.3+dfsg-1 of python-sphinx is currently installed which corresponds to upstream version 1.2.3. In the following case Debian package version 0.11-1 is available but not installed; 0.11-1 corresponds to upstream 0.11.

$ apt-cache policy python-sphinx-issuetracker
python-sphinx-issuetracker:
  Installed: (none)
  Candidate: 0.11-1
  Version table:
     0.11-1 0
        500 http://mirror.gsi.de/distrib/debian/ jessie/main amd64 Packages

If you require a python library that is available but not installed, send an e-mail to linux-service @ gsi.de and include the name of the package and your local machine.
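
The check described above can also be scripted. The following is a minimal sketch, not an official tool: it assumes the output format of apt-cache policy shown in the two examples, and the function name parse_apt_policy is made up for this illustration.

```python
import re


def parse_apt_policy(text):
    """Return (installed, candidate) parsed from `apt-cache policy` output.

    A value of None means "(none)" was reported or the field is missing.
    """
    def field(name):
        match = re.search(r"^\s*%s:\s*(\S+)" % name, text, re.MULTILINE)
        if match is None or match.group(1) == "(none)":
            return None
        return match.group(1)

    return field("Installed"), field("Candidate")


# Output copied from the second example above.
SAMPLE = """python-sphinx-issuetracker:
  Installed: (none)
  Candidate: 0.11-1
  Version table:
     0.11-1 0
        500 http://mirror.gsi.de/distrib/debian/ jessie/main amd64 Packages
"""

if __name__ == "__main__":
    installed, candidate = parse_apt_policy(SAMPLE)
    if installed is None and candidate is not None:
        print("available but not installed: version %s" % candidate)
```

A package that reports a candidate but no installed version is the case in which you would contact the Linux service.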

If you decide to install python libraries to your home directory you will probably need to extend your python load path. This can be done via the environment variable PYTHONPATH which by default is not set. As long as it is not set, your python interpreter will usually only look for modules in certain system directories. You can find out what these are by running the following command:

$ python -c "import sys; print sys.path"
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu',
  '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old',
  '/usr/lib/python2.7/lib-dynload', 'HOMEDIR/.local/lib/python2.7/site-packages',
  '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages',
  '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gtk-2.0',
  '/usr/lib/pymodules/python2.7']
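
The command above uses Python 2 syntax. As a small sketch of the same check that also runs under Python 3, the following script prints one search path entry per line; the helper name search_path_entries is only illustrative.

```python
import sys


def search_path_entries():
    """Return a copy of the interpreter's module search path."""
    return list(sys.path)


if __name__ == "__main__":
    for entry in search_path_entries():
        # An empty string, if present, means the current working directory.
        print(repr(entry))
```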

If you install modules to e.g. ~/mypythonmodules and want them to be included in your load path you can do it like this:

$ PYTHONPATH=~/mypythonmodules python -c "import sys; print sys.path"
['', 'HOMEDIR/mypythonmodules', '/usr/lib/python2.7',
  '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk',
  '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload',
  'HOMEDIR/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/dist-packages',
  '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PIL',
  '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7']

To avoid setting PYTHONPATH before every command you can set it in your current shell by running export PYTHONPATH=~/mypythonmodules or set it for every new shell by adding that line to your ~/.profile or ~/.bashrc.
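
To confirm that a PYTHONPATH setting reaches the interpreter, you can start a child interpreter with the variable set and inspect its search path. This is a sketch only; the function name sys_path_with_pythonpath is an assumption made for the example, and ~/mypythonmodules is the example directory from the text.

```python
import json
import os
import subprocess
import sys


def sys_path_with_pythonpath(directory):
    """Return sys.path of a child interpreter started with PYTHONPATH set."""
    # Copy the current environment and add (or override) PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=directory)
    code = "import json, sys; print(json.dumps(sys.path))"
    output = subprocess.check_output([sys.executable, "-c", code], env=env)
    return json.loads(output.decode("utf-8"))


if __name__ == "__main__":
    directory = os.path.expanduser("~/mypythonmodules")
    for entry in sys_path_with_pythonpath(directory):
        print(entry)
```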

If you install modules in your home directory and forget to set PYTHONPATH you might experience errors like ImportError: No module named mymodule.
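
Before running a longer job it can help to test whether a module can be found at all. The sketch below uses importlib from the standard library (Python 3); the helper name is_importable and the module name mymodule are placeholders. Under Python 3 the corresponding error reads ModuleNotFoundError: No module named 'mymodule'.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the module can be located on the current search path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False


if __name__ == "__main__":
    for name in ("json", "mymodule"):
        state = "found" if is_importable(name) else "NOT found"
        print("%s: %s" % (name, state))
```

If a module you installed in your home directory is reported as not found, check PYTHONPATH first.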

Install via pip

The default way to run pip is pip install MODULENAME. This will probably fail, however, because without further options pip tries to install into system directories. With the switch --user, pip instead installs to ~/.local/lib/pythonX.X/site-packages, which is already part of the default module search path (see the sys.path output above). The directory can be changed with the option --install-option=--prefix=DIRECTORY or, on newer versions, just --prefix DIRECTORY.

$ pip install --prefix=~/python sphinxcontrib-bibtex
Downloading/unpacking sphinxcontrib-bibtex
  Downloading sphinxcontrib-bibtex-0.3.4.tar.gz (50Kb): 50Kb downloaded
  Running setup.py egg_info for package sphinxcontrib-bibtex

...skipped output ...

Requirement already satisfied (use --upgrade to upgrade): PyYAML >=3.01 in /usr/lib/python2.7/dist-packages (from pybtex>=0.17->sphinxcontrib-bibtex)

...skipped output ...

    Skipping installation of HOMEDIR/python/lib/python2.7/site-packages/sphinxcontrib/__init__.py (namespace package)

...skipped output ...

Successfully installed sphinxcontrib-bibtex latexcodec pybtex pybtex-docutils six oset
Cleaning up...

This example demonstrates a few things:

  • pip will start downloading the package and then check its dependencies
  • If pip recognizes that some dependencies are already installed (in the example PyYAML) it will skip their installation
  • pip will not install __init__.py files for so-called namespace packages. This will cause import sphinxcontrib.bibtex to fail, since sphinxcontrib does not have an __init__.py file and will therefore not be recognized as a package
  • ~/python/lib/python2.7/site-packages is the directory you will need to add to your PYTHONPATH

After installation you will need to do the following steps:

  • Set PYTHONPATH
  • Add an empty __init__.py for all namespace packages
  • Optional: If the modules also install scripts to ~/python/bin you might want to extend your PATH

$ export PYTHONPATH=~/python/lib/python2.7/site-packages
$ touch $PYTHONPATH/sphinxcontrib/__init__.py
$ # Optional: set PYTHONPATH in .bashrc
$ echo 'export PYTHONPATH=~/python/lib/python2.7/site-packages' >>~/.bashrc
$ # Optional: set PATH
$ export PATH=~/python/bin:$PATH
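
The post-installation steps can also be scripted. The sketch below assumes the lib/pythonX.Y/site-packages layout shown in the example; the function names are invented for this illustration. On Python 3.3 and later, implicit namespace packages usually make the extra __init__.py unnecessary, so the second helper mainly matters for Python 2.7.

```python
import sys
from pathlib import Path


def site_packages_under(prefix):
    """Return PREFIX/lib/pythonX.Y/site-packages for the running interpreter."""
    version = "python%d.%d" % sys.version_info[:2]
    return Path(prefix).expanduser() / "lib" / version / "site-packages"


def ensure_init_file(package_dir):
    """Create an empty __init__.py in package_dir unless one already exists."""
    init_file = Path(package_dir) / "__init__.py"
    init_file.touch(exist_ok=True)
    return init_file


if __name__ == "__main__":
    # ~/python is the prefix used in the pip example above.
    target = site_packages_under("~/python")
    print("export PYTHONPATH=%s" % target)
    namespace_package = target / "sphinxcontrib"
    if namespace_package.is_dir():
        print("created or kept %s" % ensure_init_file(namespace_package))
```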

The main purpose of Python virtual environments is to create isolated environments for Python projects. This means that each project can have its own dependencies, regardless of what dependencies every other project has.

On GSI Linux desktops the virtualenv command is available for Python versions 2 and 3. The following three packages should be installed on your machine:

  • python-virtualenv
  • python3-virtualenv
  • virtualenv

You can use the following commands to install a module (NAME) into a virtual environment.

  1. Create environment directory: mkdir /data.local1/NAME
  2. Create environment: virtualenv --system-site-packages /data.local1/NAME
  3. Activate environment: source /data.local1/NAME/bin/activate (for Bash or Zsh)
    The command has to be called every time you want to use the environment. It should have modified your shell prompt into something like (NAME)$.
  4. Install module: pip install --upgrade NAME
    This can take a while because pip has to download all the dependencies and compile everything.
  5. Leave environment: deactivate
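
For Python 3 a comparable environment can be created from Python itself with the venv module of the standard library. Note that this is a different tool from the virtualenv command used in the steps above, and the sketch rests on assumptions: the function names are illustrative, /data.local1/NAME is the placeholder path from the steps, and pip is left out of the environment (with_pip=False) to keep the example independent of additional packages.

```python
import sys
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment that can also see the system packages."""
    venv.create(str(env_dir), system_site_packages=True, with_pip=False)
    # venv writes its settings to pyvenv.cfg in the environment directory.
    return Path(env_dir) / "pyvenv.cfg"


def in_virtual_environment():
    """Return True if the running interpreter belongs to a venv environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    config = create_environment("/data.local1/NAME")
    print(config.read_text())
    print("inside an environment:", in_virtual_environment())
```

The environment still has to be activated in the shell as described in step 3.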

When it is not possible to install new versions of modules because the interpreter is too old, the only remaining possibility is to use a Python distribution like Anaconda. It provides an archive comprising the latest version of Python (both 3.x and 2.x), multiple modules and a package management system called Conda. We highly recommend using a reduced version of Anaconda called Miniconda, which includes just the Python interpreter and the Conda package manager.

Install Miniconda

The following steps can be followed to install Miniconda with Python 3.7 and create your virtual environments with it.

  1. Download the Miniconda installer for Linux from the official Miniconda download page.
  2. Start the installer in your terminal (bash Miniconda3-latest-Linux-x86_64.sh) and accept the license agreement.
  3. By default the installer will install Miniconda under $HOME/miniconda3, but you are free to choose another path, for example /data.local if available on your desktop.
    If possible, avoid installing Conda and related modules under your Lustre directory. Depending on the modules needed, the installation will consist of hundreds, sometimes thousands, of small files, and this will greatly hamper the performance of Lustre for all users.
  4. The installer will then proceed to install additional packages. It will then ask if you want to initialize your environment. If you choose to do so, the installer will add the following lines at the very end of your .bashrc file:
    # >>> conda initialize >>>
    # !! Contents within this block are managed by 'conda init' !!
    __conda_setup="$('/home/myuser/miniconda3/bin/conda' 'shell.bash' 'hook' 2>/dev/null)"
    if [ $? -eq 0 ]; then
      eval "$__conda_setup"
    else
      if [ -f "/home/myuser/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/home/myuser/miniconda3/etc/profile.d/conda.sh"
      else
        export PATH="/home/myuser/miniconda3/bin:$PATH"
      fi
    fi
    unset __conda_setup
    # <<< conda initialize <<<
  5. For the changes to take effect, it is better if you close your terminal window and open a new one. A simple test would be to run the Conda package manager on the command line in the new terminal session (the version may be different for you):
    :~$ conda --version
    conda 4.7.12
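
The same test can be run from a script, for example at the start of a job. This is a minimal sketch that assumes the conda executable is found via PATH once the initialization has taken effect; the function name conda_version is illustrative.

```python
import shutil
import subprocess


def conda_version():
    """Return the output of `conda --version`, or None if conda is not found."""
    conda = shutil.which("conda")
    if conda is None:
        return None
    result = subprocess.run(
        [conda, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = conda_version()
    print(version if version else "conda not found on PATH")
```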

Install Python modules

  • List all installed packages in current virtual environment: conda list
  • Install or uninstall modules: conda install MODULENAME, conda uninstall MODULENAME
  • Update and search for a specific module: conda update MODULENAME, conda search MODULENAME
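
For scripted checks, conda list can print its result as JSON (conda list --json). The following sketch turns such output into a name-to-version mapping. The sample data is shortened and purely illustrative (versions taken from the listing on this page), and the function name installed_versions is an assumption.

```python
import json


def installed_versions(conda_list_json):
    """Map package names to versions from `conda list --json` output."""
    return {pkg["name"]: pkg["version"] for pkg in json.loads(conda_list_json)}


# Shortened, illustrative sample of `conda list --json` output.
SAMPLE_JSON = """
[
  {"name": "pip", "version": "19.3.1", "channel": "pkgs/main"},
  {"name": "python", "version": "3.7.6", "channel": "pkgs/main"}
]
"""

if __name__ == "__main__":
    for name, version in sorted(installed_versions(SAMPLE_JSON).items()):
        print("%s %s" % (name, version))
```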

Use environments with Conda

By default, when Miniconda is installed, it automatically defines a Python environment called base, which you should also see added to your terminal prompt, unless you have customized the prompt in a different way. To avoid polluting this base environment, you should create a separate one with the following commands. Instead of mytest you are of course free to use whichever name you see fit. Again, the versions reported below may differ depending on the version of Conda you are using.

:~$ conda create --name mytest python=3.7
[...]
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/myuser/miniconda3/envs/mytest

  added / updated specs:
    - python=3.7

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.11.27 |                0         124 KB
    certifi-2019.11.28         |           py37_0         153 KB
    ld_impl_linux-64-2.33.1    |       h53a641e_7         568 KB
    openssl-1.1.1d             |       h7b6447c_3         2.5 MB
    pip-19.3.1                 |           py37_0         1.6 MB
    python-3.7.6               |       h0371630_2        44.9 MB
    setuptools-44.0.0          |           py37_0         520 KB
    sqlite-3.30.1              |       h7b6447c_0         1.1 MB
    wheel-0.33.6               |           py37_0          42 KB
    ------------------------------------------------------------
                                           Total:        51.4 MB
                                   
The following NEW packages will be INSTALLED:
                             
  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  ca-certificates    pkgs/main/linux-64::ca-certificates-2019.11.27-0
  certifi            pkgs/main/linux-64::certifi-2019.11.28-py37_0
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
  libedit            pkgs/main/linux-64::libedit-3.1.20181209-hc058e9b_0
  libffi             pkgs/main/linux-64::libffi-3.2.1-hd88cf55_4
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  ncurses            pkgs/main/linux-64::ncurses-6.1-he6710b0_1
  openssl            pkgs/main/linux-64::openssl-1.1.1d-h7b6447c_3
  pip                pkgs/main/linux-64::pip-19.3.1-py37_0
  python             pkgs/main/linux-64::python-3.7.6-h0371630_2
  readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
  setuptools         pkgs/main/linux-64::setuptools-44.0.0-py37_0
  sqlite             pkgs/main/linux-64::sqlite-3.30.1-h7b6447c_0
  tk                 pkgs/main/linux-64::tk-8.6.8-hbc83047_0
  wheel              pkgs/main/linux-64::wheel-0.33.6-py37_0
  xz                 pkgs/main/linux-64::xz-5.2.4-h14c3975_4
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3

Proceed ([y]/n)? y

Downloading and Extracting Packages
[...]

The new environment is now installed but it is not activated yet. The following commands are available:

  • Activate an environment: conda activate mytest
  • Deactivate it and return to the base environment: conda deactivate
  • Remove an environment and all its installed modules:
    conda remove --name mytest --all
  • Installing and removing modules works with the same Conda commands explained above but they will be confined to the environment.
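
A script can check which environment is active before it installs or imports anything. The sketch below relies on the environment variable CONDA_DEFAULT_ENV, which conda activate sets in the shell; the function name active_conda_environment is illustrative and mytest is the example environment from above.

```python
import os


def active_conda_environment(environ=None):
    """Return the name of the active Conda environment, or None."""
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    name = active_conda_environment()
    if name is None:
        print("no Conda environment is active")
    elif name == "base":
        print("base is active; consider running: conda activate mytest")
    else:
        print("active environment: %s" % name)
```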

Availability and support

