Chapter 9: Testing matrix with Tox #

Alfredo Deza

Tox is a project that comes from the same creators of the Pytest framework. Its first release was announced a few years ago by Holger Krekel. My first reaction was that I hoped the tool would create different Python installations. This was back when most Python software was developed for 2.5 and 2.6, while 2.7 was on the near horizon.

I didn’t understand why it was so useful and didn’t quite grasp what it was trying to solve. The language differences in the Python 2.x series weren’t drastic, and Python 3 was not yet a reality. And then, one day, I released a new version of a library I was maintaining on the Python Package Index (PyPI), and the library could not be installed. As soon as someone tried to install it, the tool raised an exception from its file, making it unable to complete the installation.

The file imported the library itself to determine the version. Importing the library before its installation meant that the dependencies for the library were not installed yet, which broke the tool. “How is this possible? I have good coverage! Why didn’t my tests catch this problem?” I thought to myself.

This is how the file was setting the version:

from setuptools import setup, find_packages

# Importing the package at install time is the bug: its dependencies
# are not installed yet, so this import can fail.
import remoto

version = remoto.__version__

setup(
    name = 'remoto',
    description = 'Execute remote commands or processes.',
    packages = find_packages(),
    author = 'Alfredo Deza',
    author_email = '',
    version = version,
    url = '',
    keywords = "remote, commands, unix, ssh, socket, execute, terminal",
)

The issue went undetected because the workflow for running the tests involved installing the library itself. The steps are similar to:

  1. Create a virtual environment with virtualenv and activate it
  2. Run python develop
  3. Run the tests which import the library components

The catastrophic failure went undetected because part of the development process involved installing the project at the beginning of development, and the tests don’t re-install the library on each run. Looking back, this could have been prevented with some continuous integration. Still, I didn’t have a server to run Jenkins, and this was back when online platforms offering continuous integration or continuous delivery weren’t that common.

This is part of the problem that the Tox project solved for me: by default, it creates a brand new virtual environment where the project gets installed, and then the tests run. What a great concept, which builds on two pillars of robustness: verifying that software can work in isolation (virtualenvs are exclusive), and that results are repeatable and verifiable.

With that feature alone, the tox project is tremendously useful. I don’t think I’ve ever developed a project without it, and I don’t release without running tox first. But the tool, which can also be thought of as a framework, offers much more than creating a virtual environment to install software and run tests.

If you are curious, to improve the robustness of the library, and get the version without importing the module, the file got updated to parse the version string with a regular expression:

import re

module_file = open("remoto/").read()
metadata = dict(re.findall(r"__([a-z]+)__\s*=\s*['\"]([^'\"]*)['\"]", module_file))

from setuptools import setup, find_packages

# The version now comes from the parsed metadata, not from an import
setup(
    name = 'remoto',
    description = 'Execute remote commands or processes.',
    packages = find_packages(),
    author = 'Alfredo Deza',
    author_email = '',
    version = metadata['version'],
    url = '',
    keywords = "remote, commands, unix, ssh, socket, execute, terminal",
)

The regular expression relies on the fact that the version format will not change (since I am in control of it) and looks for any variable in remoto/ that is wrapped in double underscores (such as __version__) and assigned a quoted string, creating a dictionary mapping. The setup() function uses this mapping to set the version:

version = metadata['version']

By not having an import statement, it avoids any requirements that exist today or may exist in the future of the library, preventing a problem when installing. And by testing its installation process with tox, the release workflow increases its robustness.
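To see what the regular expression captures, here is a minimal sketch that runs the same pattern against an in-memory string instead of a real file. The module contents are made up for illustration (only the 1.1.4 version number comes from this chapter), and module_source is a hypothetical name:

```python
import re

# Stand-in for the contents of the package's module file (hypothetical values)
module_source = """
__version__ = '1.1.4'
__author__ = "Alfredo Deza"
__all__ = ['run', 'check']
"""

# Same pattern as in the setup file: a dunder name assigned to a quoted string
pattern = r"__([a-z]+)__\s*=\s*['\"]([^'\"]*)['\"]"
metadata = dict(re.findall(pattern, module_source))

# __all__ is skipped because its value is a list, not a quoted string
print(metadata)
```

Only string assignments end up in the mapping, which is enough for the version and keeps the setup file free of imports from the package itself.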

Testing different Python versions #

After the initial release of tox, Python 3 started to become a critical version to support, and being able to test backwards compatibility in projects was crucial. Although some projects decided to have separate releases for Python 2 and 3, a lot of other projects went with a single code base to support both major versions at the same time. Supporting both soon became what all projects did, and there was no easier way to test every single version than to do it with tox.

The project consumes a configuration file that uses an INI-style format to define what it needs to do. In its simplest form, this is how it looks for a small Python project that wants to support Python versions 2.7, 3.5, and 3.6:

[tox]
envlist = py27, py35, py36

The [tox] section defines configurations and settings for Tox itself, and for the first example, there is no need to define anything else. On the command-line, if running in the same directory where the tox.ini file is defined, you can list what environments are possible:

$ tox -l

Defining the Python versions in envlist and then listing them in the terminal is fine, but doesn’t do much. A testenv section has to exist to define the steps to run validation:

[testenv]
deps =
commands = pytest -v remoto/tests

The [testenv] section configures the steps necessary for installing test dependencies via the deps variable and running the tests with the commands variable, whatever those tests may be. These dependencies and steps to run tests are repeated for each version defined in envlist. You might not see it yet, but this is effectively a test matrix, although it is a flat one at the moment: it tests one version of Python after the other until it completes.

Now that most of the Python ecosystem lives in Python 3 land, you might think testing multiple versions isn’t that important. Although the differences between, say, 3.5 and 3.6 aren’t that big, you would be surprised by what you can uncover by exercising the code under different versions. Tox makes it so easy to test that it is worthwhile to include other versions.

Recently, while working on production code, I came across a dictionary used to collect some metadata, which was then looped over to retrieve the stored values. The code relied on a little-known fact about Python dictionaries: the iteration order is not something every version guarantees. In Python 2, the order was arbitrary but stable from one run to the next, so it was common to assume it would always be the same. From Python 3.3 until 3.5, hash randomization is enabled by default, so the order of string keys can suddenly change between runs. To add to the confusion, the language then moved to order-keeping dictionaries from 3.6 onwards.

As a Python developer working with a large codebase, you might not be aware of every corner that might hit issues like this. So even if you do not think that testing against Python 2 versions is useful for your application, testing between Python 3 versions has value.

Take a close look at this code snippet:

data = {'alfredo': 1, 'noah': 5, 'pytest': 2, 'tox': 3}


Save the snippet to a file and create a shell script to run it with different versions. I happen to have different Python versions installed locally:


set -x


Running the bash script demonstrates the behavior with dictionaries: a stable but arbitrary order in 2.7, randomized ordering in 3.5, and insertion order from 3.6 onwards:

$ bash
+ python2.7
[('alfredo', 1), ('tox', 3), ('pytest', 2), ('noah', 5)]
+ python3.5
[('noah', 5), ('alfredo', 1), ('tox', 3), ('pytest', 2)]
+ python3.6
[('alfredo', 1), ('noah', 5), ('pytest', 2), ('tox', 3)]
+ python3.8
[('alfredo', 1), ('noah', 5), ('pytest', 2), ('tox', 3)]

Although dictionaries already keep insertion order in Python 3.6, the language only mandates this behavior as part of the specification starting with 3.7.
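One way to avoid depending on the interpreter at all is to make the ordering explicit. The following is a minimal sketch using the same dictionary; the variable names are illustrative and not taken from the production code mentioned earlier:

```python
data = {'alfredo': 1, 'noah': 5, 'pytest': 2, 'tox': 3}

# Iteration order is whatever the interpreter provides: insertion order on
# Python 3.7 and later, but not something older versions promise.
as_stored = list(data.items())

# Sorting makes the order explicit, so the result is the same everywhere
by_name = sorted(data.items())
by_value = sorted(data.items(), key=lambda item: item[1])

print(as_stored)
print(by_name)
print(by_value)
```

Running a snippet like this under every interpreter in the envlist is a cheap way to confirm that the code no longer cares which ordering rules apply.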

Expanding the testing matrix #

So far, the examples have shown a rather flat matrix with testing different Python versions against the same Python project. It isn’t an easy feat to attempt a multidimensional matrix while trying to keep a flat and easy to understand configuration file. The Tox project has found a reasonable middle ground with a way of defining variables that allows expanding a matrix at runtime.

Install the tox package and create a new configuration file. By default, this has to be named tox.ini so the tool can find it by naming convention. You can always rename it later.

This is the tox.ini for the remoto project. It tests a few different (and some dated) Python versions:

[tox]
envlist = py26, py27, py33, py36

[testenv]
deps =
commands = py.test -v remoto/tests

The testenv section explains what tox should install for testing purposes and what commands it needs to run. In this case, it is a single one: it just runs pytest.

Install the tox package and create a new configuration file that looks the same as the one for the remoto project. We need to make it multidimensional and support a few use cases. The idea is that I don’t want to support Python 2.6 or 2.7 any longer, so I need a specific release version and branch associated with that, and then continue releasing for newer Python versions.

The latest release is 1.1.4, so a new branch called 1.1 is created, and the master branch will now be bumped to a new major version, which I am calling 2.x. I want to test the branch for version 1 with Python 2 and the rest with Python 3. The configuration file needs some variables, which Tox calls factors, and curly brackets are used to create them. Modify the envlist so it looks like this:

envlist = py{27,35,36,37,38}-release_{1.1, 2.x}

Now run tox -l, which will list every combination possible for testing:

$ tox -l

Five different Python versions and two different remoto branches create 10 possible combinations, and these are all created automatically by tox. The initial idea here was to test specific Python versions with their own branch, and the newer (Python 3) versions with the 2.x series, not to test everything against everything. There is an essential aspect of these configurations that needs a thorough understanding to move forward: dashes separate variables, and curly brackets define multiple combinations of those variables.

The last line in the previous tox output showed this:

py38-release_2.x

There are two factors (variables) there: py38 and release_2.x. These can be used later to define a specific configuration that pertains to that environment alone. A common mishap is to forget that dashes separate variables, which may cause confusion later on. If I want to configure something that relates to release_2.x, but the lines were defined as py38-release-2.x then there would be three variables: py38, release, and 2.x.
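The expansion rules can be mimicked in a few lines of Python. This is only a sketch of the idea (curly brackets multiply, dashes separate factors), not the code tox uses internally; split_top_level and expand are hypothetical helper names:

```python
import itertools
import re


def split_top_level(envlist):
    """Split on commas that are not inside curly brackets."""
    parts, depth, current = [], 0, []
    for char in envlist:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def expand(envlist):
    """Expand every {a,b,c} group into all possible combinations."""
    names = []
    for part in split_top_level(envlist):
        # A capturing group makes re.split keep the bracket contents,
        # which land at the odd indexes of the resulting list
        tokens = re.split(r"\{([^{}]*)\}", part)
        choices = [
            [option.strip() for option in token.split(",")] if index % 2 else [token]
            for index, token in enumerate(tokens)
        ]
        names.extend("".join(combo) for combo in itertools.product(*choices))
    return names


everything = expand("py{27,35,36,37,38}-release_{1.1, 2.x}")
print(len(everything))
print(everything[-1].split("-"))
```

Splitting an environment name on dashes gives its factors, which is why an extra dash in a name silently introduces an extra factor.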

To improve the configuration and test Python 2 against the 1.1 branch, the envlist needs to change:

envlist = py27-release_1.1, py{35,36,37,38}-release_2.x

Run tox again, and the matrix should be reduced:

$ tox -l

Testing 2.7 with the 1.1 branch and then all the newer Python versions against the 2.x branch is exactly what I was looking for.

Understanding variables better #

After years of using Tox in many projects, I found myself confused sometimes about variables. In the previous section, the tox.ini file was improved to support a big matrix, and curly brackets helped here. These innovations didn’t come to the framework until later, and it took me a while to get used to them. Let’s reuse the same example to understand further what is going on:

envlist = py27-release_1.1, py{35,36,37,38}-release_2.x

That format allows an abbreviated way of describing each environment, and it is equal to the following (expanded) form:

envlist =
    py27-release_1.1, py35-release_2.x, py36-release_2.x,
    py37-release_2.x, py38-release_2.x

The abbreviated form is great when mixing at least two dimensions, and it truly shines when there is a third (or more!) variable in the mix.

Using factors #

As I’ve shown, factors are an essential way to describe the type of environment a test is going to run in. But so far nothing has demonstrated why or when they could be useful. They are incredibly handy when a particular action or configuration is specific to that factor. This is a widespread problem to solve when testing different environments in a matrix.

One of the common issues when trying to keep supporting older versions of Python in libraries is that other libraries stop supporting and maintaining them. In the case of Pytest, for example, support for Python 2.7 ended with the 4.6 series. This means that a project tested with Pytest on Python 2.7 must install a specific version and not just the latest.

If a project is testing in many different Python versions, factors can solve this. Once again, reusing the tox.ini in this chapter, we have the following environment list:

envlist = py27-release_1.1, py{35,36,37,38}-release_2.x

Remember that every dash separates a factor, and Pytest needs a specific version installed for the Python 2.7 version, which is associated with the py27 factor. In the [testenv] section, the dependency list needs to be updated to install the required Pytest version but only for the py27 factor:

deps =
  py27: pytest<4.7
  py35,py36,py37,py38: pytest==5.3.4

Now on py27 the Pytest version will always be less than 4.7. I do it this way so that if the Pytest team publishes newer releases in the 4.6 series, I can install them without having to be too specific. The Pytest project said that 4.6 is the last version supporting 2.7, but they often release minor bug fixes which are useful to consume.

Since the py27 environment defines a specific Pytest version, the other environments need to be updated to define what version to install. The configuration allows grouping them, separated by a comma, to define this.
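To make the selection rule concrete, here is a simplified model of how factor-conditional lines can be resolved for a given environment name. It is a sketch for illustration only: resolve_deps is a hypothetical helper, it handles just the comma-separated form used in this chapter, and it is not how tox implements the feature:

```python
def resolve_deps(env_name, dep_lines):
    """Return the dependencies that apply to one environment."""
    # Dashes separate factors: py27-release_1.1 -> {"py27", "release_1.1"}
    factors = set(env_name.split("-"))
    selected = []
    for line in dep_lines:
        condition, separator, requirement = line.partition(":")
        if not separator:
            # No condition, so the dependency applies everywhere
            selected.append(line.strip())
            continue
        wanted = {name.strip() for name in condition.split(",")}
        # A comma-separated condition matches if any of its factors is present
        if factors & wanted:
            selected.append(requirement.strip())
    return selected


deps = [
    "py27: pytest<4.7",
    "py35,py36,py37,py38: pytest==5.3.4",
]

print(resolve_deps("py27-release_1.1", deps))
print(resolve_deps("py36-release_2.x", deps))
```

Each environment ends up with exactly one Pytest requirement, which is the behavior the grouped configuration is after.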

Another way to set these different behaviors is by leaving the [testenv] section with the defaults for every environment, and add a separate one for the environments that need modification. The example would then change the testenv section to this:

[testenv]
deps =
  pytest==5.3.4

[testenv:py27-release_1.1]
deps =
  pytest<4.7

Linting and other validations #

There are two non-testing things I usually include in a tox.ini file: documentation and linting. These examples should open the door for many other execution environments that aren’t necessarily tied to Python or unit testing at all.

Although I have a preference for the Flake8 linter, these examples can be applied to any. I don’t want to lint the project under every Python version, so I add a new environment to the envlist called flake8:

envlist = py27-release_1.1, py{35,36,37,38}-release_2.x, flake8

Next, I add a new section to the tox.ini file which defines its own set of dependencies and commands to run:

[testenv:flake8]
deps=flake8
commands=flake8 --select=F,E9

This allows the tox run to selectively use the flake8 environment only for linting:

$ tox -e flake8
GLOB sdist-make:
flake8 inst-nodeps:
flake8 installed:
flake8 run-test-pre: PYTHONHASHSEED='1857083419'
flake8 runtests: commands[0] | flake8 --select=F,E9 remoto
___________________________________ summary ____________________________________
  flake8: commands succeeded
  congratulations :)

Similarly, the documentation can be built by expanding the configuration file to add another new section. The documentation needs to be built with a few more specifics because the executing directory needs to change, and some paths need to be specified. This is how I use Tox to build the documentation using Sphinx:

    sphinx-build -W -b html -d {envtmpdir}/doctrees .  {envtmpdir}/html

There are a few elements that are new that control the execution environment in this example. Just like the flake8 environment, the section follows the pattern of adding the new name after testenv:. Then the changedir directive is added, which points to what looks like a path (docs/source in this case). That means that before running any command, the framework changes its working directory to docs/source to run.

Specifying a relative path works well because wherever the tox.ini file lives is where the working directory will be, and docs/source happens to be the path relative to the tox.ini where Sphinx needs to run.

Next, the dependencies are installed, and in this case, it is just sphinx that needs to be available so that the documentation builds.

Finally, the one command that runs is sphinx-build, which attempts to build the documentation for the project, using a few flags to control its execution. The -W flag, for example, treats warnings as errors, so the command returns a non-zero exit status if there is any warning; this is crucial for continuous integration builds. The documentation is produced as HTML, and the {envtmpdir} variable is used, which points to a temporary directory inside the virtual environment where the project has been installed.
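Putting the pieces described above together, a documentation environment could look like the fragment embedded in the sketch below. The environment name docs is an assumption (any name after testenv: would work), and the standard library configparser is used here only to show the structure of the section, not because tox necessarily reads the file this way:

```python
import configparser
import textwrap

# Hypothetical tox.ini fragment; the "docs" environment name is an assumption
TOX_INI = textwrap.dedent("""\
    [testenv:docs]
    changedir = docs/source
    deps = sphinx
    commands =
        sphinx-build -W -b html -d {envtmpdir}/doctrees . {envtmpdir}/html
""")

# Interpolation is disabled so the {envtmpdir} placeholder is left untouched
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(TOX_INI)
docs = parser["testenv:docs"]

print(docs["changedir"])
print(docs["deps"])
print(docs["commands"].strip())
```

With a section like this in place, the environment would be selected the same way as the linting one, by passing its name to tox -e.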

Just like building documentation and linting have been shown in this section, you can extend Tox to run lots of other tests or validations. Linting and building documentation are barely scratching the surface. Just recently, I was involved in creating a Tox workflow for testing a multi-service application using containers. All possible with a few tweaks to an existing tox.ini.

I hope that you find using Tox as powerful and crucial to a project as I do, and manage to extend it beyond its seemingly Python-only approach.