If you enjoyed this book considering buying a copy

Chapter 2: Testing Conventions #

Alfredo Deza

There are specific unwritten rules for testing in Python that are followed by most projects and are supported by tooling. Most tooling (like test runners) allows you to configure them to adapt to whatever structure you want, but understanding the conventions makes it easier for the tools and project collaborators. The least amount of surprises, the better!

In this chapter, I go into details on file layouts, directories, and naming conventions. Test runners like pytest benefit from these conventions for automatic discovery, which is how tests are found and later executed.

The first time I found myself writing tests for a small script I created, I wasn’t entirely sure where to place them and if I needed any structure at all. Because I didn’t follow any conventions, tests wouldn’t run unless I explicitly pointed the tool to a file, and even then, some tests were skipped. It was a frustrating experience that didn’t help me get into the habit of writing more tests.

Directories #

Python projects can place the tests directory either inside of the project or at the top level. There are caveats in both, but I usually recommend placing the tests directory inside the project. This is how a typical package structure looks like:

.
├── setup.py
└── skeleton
    ├── __init__.py
    └── tests
        ├── __init__.py
        └── test_units.py

2 directories, 4 files

In this case, the tests directory is a sub-directory of the skeleton project.

{blurb, class: warning} Do not use the singular test directory name because it overrides Python’s test module. It is best practice to avoid overwriting built-in modules since it may create a problem that is hard to debug when it happens. I recommend using tests. {/blurb}

The skeleton example project doesn’t have any modules or sub-modules at all. If it did, it is nice to see the layout of the tests mimic that of the project. For example, if the module skeleton/api.py existed, it would be nice to find a tests/test_api.py file. This isn’t required, but it makes it easier to understand where things are and to what place in the code it correlates to. For sub-directories and other modules, a similar convention is followed. For example, if skeleton/utilities/system.py existed, the test file for system.py would exist in tests/utilities/test_system.py.

Having a convention and following an ordering removes the guesswork, and allows you and contributors to make progress, rather than running grep to find out where a particular module is tested or spend time trying to figure out where to place tests for a new module.

Another step in the tests directory layout is if different types of tests are added. Usually, small projects tend to have unit tests but if the project is something that may require functional tests (for example an API exposed over HTTP) then separating the tests within their sub-directories makes sense:

.
├── setup.py
└── skeleton
    ├── __init__.py
    └── tests
        ├── functional
        │   └── __init__.py
        ├── __init__.py
        └── unit
            └── __init__.py

4 directories, 5 files

In this case, it adds two directories, one is functional, and the other one is unit. Functional testing is usually more involved and requires lots of different extras to run, so it makes sense to separate them from the more straightforward (more granular) unit tests.

Using a tests directory is not required for collection, but it is good practice as it keeps same-purpose files together (test files in the tests directory!).

Tests outside of the package #

Having the tests directory outside of the package has the benefit of being able to test without having to install the package itself. If you aren’t familiar with Python packaging, this means that a package that does not have a setup.py or setup.cfg with packaging configuration can still be tested if the test runner executes from the top directory. This is possible because Python puts the current working directory in sys.path which in turn, allows importing modules at that level.

For example, this project doesn’t have a setup.py file, so you can’t install it with standard Python tooling:

.
├── tests
│   └── test_import.py
└── uninstallable
    └── __init__.py

2 directories, 2 files

The single import test that checks if the module imports without trouble looks like this:

def test_imports_uninstallable():
    import uninstallable

Running from the top-level directory, the test runner picks up the uninstallable module, but only if executed from that place and only if done by executing the test runner with Python:

python -m pytest -v
[...]
collected 1 item

tests/test_import.py::test_imports_uninstallable PASSED                 [100%]

============================== 1 passed in 0.01s ==============================

If the test runner executes from within the tests directory, it fails:

python -m pytest -v
[...]
collected 1 item

test_import.py::test_imports_uninstallable FAILED                       [100%]

================================== FAILURES ===================================
_________________________ test_imports_uninstallable __________________________

    def test_imports_uninstallable():
>       import uninstallable
E       ModuleNotFoundError: No module named 'uninstallable'

test_import.py:3: ModuleNotFoundError
============================== 1 failed in 0.03s ==============================

I tend to run tests from different places and dislike having to be tied to run tooling from a specific place, but if a package is not installable, this would have to be the way to work with.

Tests inside of the package #

My preference is to have the tests directory inside the package. This is not always possible, however, and in cases like in the previous section where a package is not installable with standard Python tooling, it forces you to move them to the top level.

Some developers prefer to distribute their tests as part of the package itself, so that makes a compelling argument for having them all inside the package. If there aren’t any extensive functional tests that require lots of different files (a typical case for highly functional tests), then it is OK to have the tests directory inside the package. As you will see in many sections of this book, the critical aspect here is to be consistent, follow conventions, and keep up the order and cleanliness.

Files #

By using tests as the directory name, you are one step closer to automatic detection and collection. The next step is following another convention for naming files. Files should be prefixed with test_ so that they can be collected. The prefix has to include the underscore; without it, a test runner will skip it.

If you are trying out these conventions in the terminal, you might be surprised to see nothing gets picked up yet. This is because it isn’t enough to have a tests directory with a test_example.py file in it. You also need actual tests! And those tests, depending on what they are, also have other conventions that must be followed.

It is also possible to use a less popular convention for files, which is the _test.py suffix. This is relatively uncommon to see in Python projects, but pytest (our recommended test runner) supports it. Other tools may have support for different conventions, but following these recommendations ensures that your tests work regardless of tooling.

Functions, Classes, and test methods #

When testing, it is possible to create tests that are functions, or test methods (which belong to a class). Test functions, as well as test methods, need to be prefixed with test. I prefer to include an underscore, which is still valid for test collection as it helps readability. This is an example of a test function that verifies using a float which gives expected results:

def test_passing_a_float():
    result = util.str_to_int('1.99')
    assert result == 1

The function is ensuring that the str_to_int function that lives in the util module of the package takes a string that contains a float, and it correctly returns an integer. The internals of that utility doesn’t matter here, but by using underscores, it makes it easier to read.

If writing tests in classes, these need to be prefixed with Test (note how it’s capitalized). Within the class, test methods need to follow the same convention as test functions: names need to be prefixed with test. If the previous example moves inside a class, it looks like this:

class TestFloat():

    def test_passing_a_float(self):
        result = util.str_to_int('1.99')
        assert result == 1

Not very different from just a plain function except for being inside a class that has a name prefixed with Test.

For simple modules or functions, it can be enough to use functions for testing. If you have never written tests, I encourage you to start with test functions (not classes!) and slowly transition to classes when needed. Being able to use functions for testing is one of the best arguments to use a tool like pytest to run tests. Python’s standard library uses the unittest module, which forces you to use classes through inheritance. Having to use (and understand) class inheritance to write tests, regardless of how simple they might be, is a problem; writing tests should be as easy as possible, and tooling should not get in the way.

Special test class methods #

Aside from test naming conventions, there are other special names that you should be aware of when using classes. These names should be considered reserved, and should only be used for their intended purpose. These are all the special methods:

setup : This method allows to provide attributes or anything else that is used by tests. By convention, it is called once before every test method in the class is executed. If lots of tests are using some sample data, it can be defined once here instead of having it referenced over and over in every single test method.
teardown : This method allows to perform any cleanup actions needed by tests. Just like the setup method, it gets called once, but after every test method in the class is executed. For example, if a test is always leaving behind artifacts like files that shouldn’t be present, this special method could remove them, so that the next test doesn’t have a polluted environment.
setupclass : Similar to setup, but instead of running before every test is executed it runs once before starting a test in a class.
teardownclass : This method runs once after all tests in the class have been executed.

If you are wondering why __init__ is not mentioned in this list, it is because you should not have one for test classes. Historically, unittest.TestCase (Python’s standard library for testing) didn’t have them, and it relied on setup and teardown class methods, and tools like pytest skips collection of test classes if it detects an __init__ method in them.

Utilities in classes #

If you are using a class to group together a few tests, and you find that there is some repetitive action you need, then you have a couple of options depending on what you want. For example, if you are dealing with a very long string that needs to change one or two items, it is probably fine to define it as part of the setup method. This test method is a good example:

    def test_all_osds_are_up_and_in(self, node, host):
        cmd = "sudo ceph --cluster={cluster} --connect-timeout 5 \
        --keyring /var/lib/ceph/bootstrap-osd/{cluster}.keyring \
        -n client.bootstrap-osd osd tree -f json".format(cluster="ceph")
        output = json.loads(host.check_output(cmd))
        assert node["num_osds"] == 1

The cmd variable is very long. If other tests are doing other assertions for differences in the command, then this sort of repetitive addition in the tests is not going to be good. As soon as more than one test is using the same long string, changing a couple of things in it, it is time to extract it into the setup method. Extracting it looks like this:


class TestFunctionalOSDS:

    def setup(self):
        self.cmd = "sudo ceph --cluster={cluster} --connect-timeout 5 \
        --keyring /var/lib/ceph/bootstrap-osd/{cluster}.keyring \
        -n client.bootstrap-osd osd tree -f json"

    def test_all_osds_are_up_and_in(self, node, host):
        cmd = self.cmd.format(cluster="ceph")
        output = json.loads(host.check_output(cmd))
        assert node["num_osds"] == 1

If what you are looking for is more behavioral, like something a function would do better, then extracting that into a utility method is the way to go. Utility methods are more straightforward than what they sound like. All that is needed is to create a method with the behavior needed and avoid prefixing it with test_. Now the method can be used everywhere, with optional parameters.

In a different test suite, I find that some tests keep comparing the result of an exception by calling __str__() which might be fine, except I think it could be improved. This is how that class looks like:


class TestInvalid(object):

    def test_include_the_path_in_str(self):
        error = exceptions.Invalid('key', ['path'])
        assert 'path' in error.__str__()

    def test_include_the_key_in_str(self):
        error = exceptions.Invalid('key', ['path'])
        assert 'key' in error.__str__()

Extract the repetitive parts, and make it look cleaner with a utility method (also called helper method):


class TestInvalid(object):

    def create_error(self, path_items):
        error = exceptions.Invalid('key', path_items)
        return error.__str__()

    def test_include_the_path_in_str(self):
        error = self.create_error(['path'])
        assert 'path' in error

    def test_include_the_key_in_str(self):
        error = self.create_error(['path'])
        assert 'key' in error

By introducing the create_error utility, the repetitive behavior of introspecting the exception’s magic __str__ method moves elsewhere, and tests become much more straightforward. This type of extraction and refactoring makes the test suite cleaner and easier to read. When different behavior is needed, it can either be added to the utility or, if too complicated, create a new class.

That is the key to lots of testing guidelines in this chapter: if it gets too complicated and unreadable, it is time to extract, refactor, and group tests differently. There is no need to pile up tests in the same class unless they are all testing a similar component, and they can share some utility or unique setup (or teardown) method.

Good naming patterns #

A while ago, I was working in a rather large engineering group. It was about thirty engineers working together on a content management system (CMS), probably the largest monolithic application I’ve ever been a part of. The test suite for the project was extensive, and because the team was large, it had different styles and conventions, which made it hard to grasp where to add new tests or find existing ones.

One particular problem with the test suite is that it had many tests that weren’t descriptive of what the test was about, and it had several (sometimes more than half a dozen!) assertions. Let’s assume that the application has some tests that validate responses from a remote API over HTTP. This is how a bad test would look like:


class TestAPI:

    def test_simple(self):
        response = requests.get(
            'https://example.com/api/',
             allow_redirects=True
        )
        json = response.json()
        assert response.code == 200
        assert json['message'] == "OK"
        assert json['error'] == None
        assert response.url == 'http://api.example.com'

The test is not very good because when it fails, the first thing you are going to see is that test_simple failed in the TestAPI class. What does that mean? If the class is testing many different types of requests like GET, POST, and HEAD then it is impossible to tell what this test is about. Grouping tests by its similarities allows better naming. In this case, we are dealing with a class, but this is applicable to file names and directory structure. Let’s improve this test:


class TestAPIGetRequests:

    def test_is_redirected(self):
        response = requests.get(
            'https://example.com/api/',
             allow_redirects=True
        )
        json = response.json()
        assert response.code == 200
        assert json['message'] == "OK"
        assert json['error'] == None
        assert response.url == 'http://api.example.com'

With the name change, a failure would now indicate that the TestAPIGetRequests class had a failure in the test_is_redirected test. It is now clear that a redirect fails on GET requests to the API. This is much better, and it is a core principle of writing good tests: when it fails, it should not take much effort to understand what expectation is failing. The test is still suffering from another common problem: it is asserting to many different things.

Asserting many different things in a single test is a problem because the first assertion that fails prevents other assertions from being executed. If the code before the assertions has any issue, then none of the other assertions gets touched. Finally, and probably worse for the developer that has to fix this failing test: once you fix the failing assertion, you might find the next assertion in the test fails, robbing the developer the assurance that the fixes are making progress. Let’s improve the test once more:


class TestAPIGetRequests:

    def test_is_redirected(self):
        response = requests.get(
            'https://example.com/api/',
             allow_redirects=True
        )
        assert response.url == 'http://api.example.com'

    def test_base_url_responds(self):
        response = requests.get(
            'https://example.com/api/',
             allow_redirects=False
        )
        assert response.code == 200

    def test_ok_json_message(self):
        response = requests.get(
            'https://api.example.com/',
             allow_redirects=False
        )
        json = response.json()
        assert json['message'] == "OK"
        assert json['error'] == None

At first glance, the three tests might look repetitive, and I even left two assertions in the last test instead of separating them further into another test.

Even though they look repetitive and, to some extent, they are, it provides excellent granularity and good naming convention that hints what the test is about. If the test_base_url_responds fails because the redirect is not being followed, then that is something that doesn’t have anything to do with checking JSON messages in the response. If you look closely, the last test is going directly to the API URL, avoiding the redirect path. If the tests weren’t separated like this as in the first example, it wouldn’t be evident if the failure is that the JSON is unexpected, invalid, or if the redirect doesn’t work.

In some cases, it is perfectly fine to group assertions like I did in the last test. In this case, the message should be "OK", and error should always be None. The test is still distinct enough that goes to the point: it is focusing only on the JSON response when everything is "OK".