Unit Testing
Last updated on 2025-09-28
Overview
Questions
- What is unit testing?
- Why is unit testing important?
- How do you write a unit test in Python?
Objectives
- Explain the concept of unit testing
- Demonstrate how to write and run unit tests in Python using pytest
Unit Testing
We have our two little test files, but you might imagine that it’s not particularly efficient to always write individual scripts to test our code. What if we had a lot of functions or classes? Or a lot of different ideas to test? What if our objects changed down the line? Our existing modules are going to be difficult to maintain, and, as you may have expected, there is already a solution for this problem.
What Makes For a Good Test?
A good test is one that is:

- Isolated: A good test should be able to run independently of other tests, and should not rely on external resources such as databases or web services.
- Repeatable: A good test should produce the same results every time it is run.
- Fast: A good test should run quickly, so that it can be run frequently during development.
- Clear: A good test should be easy to understand, so that other developers can easily see what is being tested and why. Not just in the output provided by the test: the variables, function names, and structure of the test should be clear and descriptive.
It can be easy to get carried away with testing everything and writing tests that cover every single case you can come up with. However, having a massive test suite that is cumbersome to update and takes a long time to run is not useful. A good rule of thumb is to focus on testing the most important parts of your code, and the parts that are most likely to break. This often means focusing on edge cases and error handling, rather than trying to test every possible input.
Generally, a good test module will test the basic functionality of an object or function, as well as a few edge cases.
What are some edge cases that you can think of for the hello function we wrote earlier? What about the Document class?
Some edge cases for the hello function could include:

- Passing in an empty string
- Passing in a very long string
- Passing in something that is not a string (e.g. a number or a list)

Some edge cases for the Document class could include:

- Passing in a file path that does not exist
- Passing in a file that is not a text file
- Passing in a file that is empty
- Passing in a word that does not exist in the document when testing get_word_occurrence
pytest

pytest is a testing framework for Python that helps you write simple and scalable test cases. It is widely used in the Python community, and has in-depth and comprehensive documentation. We won’t be getting into all of the different things you can do with pytest, but we will cover the basics here.
To start off with, we need to add pytest to our environment. However, unlike our previous packages, pytest is not required for our module to work; it is only used by us while we are writing code. Therefore it is a “development dependency”. We can still add this to our pyproject.toml file via uv, but we need to add a special flag to our command so that it goes in the correct place.
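With uv, that flag is --dev, which records the package as a development dependency:

BASH
uv add --dev pytest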
If you open up your pyproject.toml file, you should see that pytest has been added under a new section called “dependency groups” (your version number may be different):
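TOML
[dependency-groups]
dev = [
    "pytest>=8.4.2",
]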
Now we can start creating our tests.
Writing a pytest Test File
Part of pytest is the concept of “test discovery”. This means that pytest will automatically find any test files that follow a certain naming convention. By default, pytest will look for files that start with test_ or end with _test.py. Inside these files, pytest will look for functions that start with test_.
Now, our files already have the correct names, so we just need to change the contents. Let’s start with test_say_hello.py. Open it up and replace the contents with the following:
PYTHON
from textanalysis_tool.say_hello import hello


def test_hello():
    assert hello("My Name") == "Hello, My Name!"
In our previous test file, we had to add the path to our module each time. Now that we are using pytest, we can use a special file called conftest.py to add this path automatically. Create a file called conftest.py in the tests directory and add the following code to it:
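The exact contents depend on your project layout; a minimal sketch, assuming the src/ layout used in this lesson, might look like this:

PYTHON
# conftest.py: add the src/ directory to sys.path so the tests
# can import the textanalysis_tool package
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "src"))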
Now, we just need to run the tests. We can do this with the following command:
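BASH
uv run pytest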
Note that we are using uv run to run pytest; this ensures that pytest is run in the correct environment with all the dependencies we have installed.
You should see output similar to the following:
============================= test session starts ==============================
platform win32 -- Python 3.13.7, pytest-8.4.2, pluggy-1.6.0
rootdir: D:\Documents\Projects\textanalysis-tool
configfile: pyproject.toml
collected 1 item
tests\test_say_hello.py . [100%]
============================== 1 passed in 0.12s ===============================
Why didn’t it run the other test file? Because even though the file is named correctly, it doesn’t contain any functions that start with test_. Let’s fix that now.
Open up test_document.py and replace the contents with the following:
PYTHON
from textanalysis_tool.document import Document


def test_create_document():
    Document.CONTENT_PATTERN = r"(.*)"
    doc = Document(filepath="tests/example_file.txt")
    assert doc.filepath == "tests/example_file.txt"


def test_document_line_count():
    Document.CONTENT_PATTERN = r"(.*)"
    doc = Document(filepath="tests/example_file.txt")
    assert doc.get_line_count() == 2


def test_document_word_occurrence():
    Document.CONTENT_PATTERN = r"(.*)"
    doc = Document(filepath="tests/example_file.txt")
    assert doc.get_word_occurrence("test") == 2
Our example file doesn’t exactly look like a Project Gutenberg text file, so we need to change the CONTENT_PATTERN to match everything. This is a class-level variable, so we can change it on the class itself, rather than on an instance.
Let’s run our tests again. You should see output similar to the following:
============================= test session starts ==============================
platform win32 -- Python 3.13.7, pytest-8.4.2, pluggy-1.6.0
rootdir: D:\Documents\Projects\textanalysis-tool
configfile: pyproject.toml
collected 4 items
tests\test_document.py ... [ 75%]
tests\test_say_hello.py . [100%]
============================== 4 passed in 0.15s ===============================
You can see that all of the tests have passed. There is a small green dot for each test that was performed, and a summary at the end. Compare this to the test file we had before: we got rid of all of the if statements, and just use the assert statement to check whether the output is what we expect.
Testing Edge Cases / Exceptions
Let’s add an edge case to our tests. Open up test_say_hello.py and add a test case for an empty string:
PYTHON
from textanalysis_tool.say_hello import hello


def test_hello():
    assert hello("My Name") == "Hello, My Name!"


def test_hello_empty_string():
    assert hello("") == "Hello, !"
Run the tests again. We get passing tests, which is what we expect. But we are the ones in charge of the function; what if we decide that when the user doesn’t provide a name, we want to raise an exception? Let’s change the test to say that if the user provides an empty string, we want to raise a ValueError:
PYTHON
import pytest

from textanalysis_tool.say_hello import hello


def test_hello():
    assert hello("My Name") == "Hello, My Name!"


def test_hello_empty_string():
    with pytest.raises(ValueError):
        hello("")
Run the tests again. This time, we get a failing test, because the hello function DID NOT raise a ValueError. Let’s change the hello function to raise a ValueError if the name is an empty string:
PYTHON
def hello(name: str = "User"):
    if name == "":
        raise ValueError("Name cannot be empty")
    return f"Hello, {name}!"
Running the tests again, we can see that all the tests pass.
Fixtures
One of the great features of pytest is the ability to use fixtures. Fixtures are a way to provide data, state, or configuration to your tests. For example, we have a line in each of our tests that creates a new Document object. We can use a fixture to create this object once, and then use it in each of our tests. That way, if we need to change the way we create the object in the future, we only need to change it in one place.
Let’s create a fixture for our Document object. Open up test_document.py and add the following import at the top:
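PYTHON
import pytest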
Then, add the following code below the imports:
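PYTHON
@pytest.fixture
def doc():
    Document.CONTENT_PATTERN = r"(.*)"
    return Document(filepath="tests/example_file.txt")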
Now, we can use this fixture in our tests. Update the test functions to accept a parameter called doc, and remove the line that creates the Document object. The updated test file should look like this:
PYTHON
import pytest

from textanalysis_tool.document import Document


@pytest.fixture
def doc():
    Document.CONTENT_PATTERN = r"(.*)"
    return Document(filepath="tests/example_file.txt")


def test_create_document(doc):
    assert doc.filepath == "tests/example_file.txt"


def test_document_line_count(doc):
    assert doc.get_line_count() == 2


def test_document_word_occurrence(doc):
    assert doc.get_word_occurrence("test") == 2
Because our Document objects are validated by searching for a starting and an ending regex pattern, which our test files will not contain, we could either ensure that our test files do, or we can just temporarily alter the search pattern for the duration of the test. CONTENT_PATTERN is a class-level variable, so we need to modify it before the instance is created.
Let’s run our tests again. Nothing changed in the output, but our code is now cleaner and easier to maintain.
Monkey Patching
Another useful feature of pytest is monkey patching. Monkey patching is a way to modify or extend the behavior of a function or class during testing. This is useful when you want to test a function that depends on an external resource, such as a database, file system or web resource. Instead of actually accessing the external resource, you can use monkey patching to replace the function that accesses the resource with a mock function that returns a predefined value.
In our use case, we have a file called example_file.txt that we use to test our Document class. However, if we wanted to test the Document class with files that have different contents, we would need to create a whole array of different test files. Instead, we can use monkey patching to replace the open function, so that instead of actually opening a file, it returns a string that we define.
Let’s monkey patch the open function in our test_document.py file. First, we need to import the mock_open helper from unittest.mock (a Python built-in module); the monkeypatch fixture itself is provided by pytest and needs no import. Add the following import at the top of the file:
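PYTHON
from unittest.mock import mock_open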
Then, we can create a new fixture that monkey patches the open function. Add the following code below the doc fixture:
PYTHON
@pytest.fixture(autouse=True)
def mock_file(monkeypatch):
    mock = mock_open(read_data="This is a test document. It contains words.\nIt is only a test document.")
    monkeypatch.setattr("builtins.open", mock)
    return mock
The other difference you’ll notice is that we added the parameter autouse=True to the fixture. This means that, within this test file, this specific fixture will be automatically applied to all tests, without needing to explicitly include it as a parameter in each test function.
Go ahead and delete the example_file.txt file, and run the tests again. Your tests should still pass, even though the file doesn’t exist anymore. This is because we are using monkey patching to replace the open function with a mock function that returns the string we defined.
Edge Cases Again / Test Driven Development
Let’s go back and think about other things that might go wrong with our Document class. What if the user provides a file path that doesn’t exist? What if the user provides a file that is not a text file? Or a file that is empty of content? Rather than write these into our class object, we can first write tests that will check for the behavior we expect or want in these edge cases, see if they fail, and then update our class object to make the tests pass. This is called “Test Driven Development” (TDD), and is a common practice in software development.
Let’s add a test for a file that is empty. In this case, we would want the initialization of the object to fail with a ValueError. However, for this test we can’t use our fixtures from above, so we’ll have to code it into the test. Add the following to test_document.py:
PYTHON
def test_empty_file(monkeypatch):
    # Mock an empty file
    mock = mock_open(read_data="")
    monkeypatch.setattr("builtins.open", mock)
    with pytest.raises(ValueError):
        Document(filepath="empty_file.txt")
Because we are monkey patching the open function, we don’t actually need to have a file called empty_file.txt in our tests directory. The open function will be replaced with our mock function that returns an empty string. We are providing a file name here to be consistent with the Document class initialization, and the name acts as additional information for later developers, clarifying the intent of the test.
Run the tests again. It fails, as we expect. Now, let’s update the Document class to raise a ValueError if the file is empty. Open up document.py and update the get_content method to the following:
PYTHON
...
    def get_content(self, filepath: str) -> str:
        raw_text = self.read(filepath)
        if not raw_text:
            raise ValueError(f"File {filepath} contains no content.")
        match = re.search(self.CONTENT_PATTERN, raw_text, re.DOTALL)
        if match:
            return match.group(1).strip()
        raise ValueError(f"File {filepath} is not a valid Project Gutenberg Text file.")
...
Challenge 1: Write a simple test
Create a file called text_utilities.py in the src/textanalysis_tool directory. In this file, paste the following function:
PYTHON
def create_acronym(phrase: str) -> str:
    """Create an acronym from a phrase.

    Args:
        phrase (str): The phrase to create an acronym from.

    Returns:
        str: The acronym.
    """
    if not isinstance(phrase, str):
        raise TypeError("Phrase must be a string.")
    words = phrase.split()
    if len(words) == 0:
        raise ValueError("Phrase must contain at least one word.")
    articles = {"a", "an", "the", "and", "but", "or", "nor", "on", "at", "to", "by", "in"}
    acronym = ""
    for word in words:
        if word.lower() not in articles:
            acronym += word[0].upper()
    return acronym
Create the following test cases for this function:

- A test that checks if the acronym for “As Soon As Possible” is “ASAP” and that the acronym for “For Your Information” is “FYI”.
- A test that checks that the function raises a TypeError when the input is not a string.
- A test that checks that the function raises a ValueError when the input is an empty string.
Are there any other edge cases you can think of? Write a test to prove that your edge case is not handled by this function as it is currently written.
Remember that to use pytest, you need to create a file that starts with test_ and that the test functions need to start with test_ as well.
What happens if the phrase contains only articles? For example, “and the or by”?
In the tests directory, create a file called test_text_utilities.py:
PYTHON
import pytest

from textanalysis_tool.text_utilities import create_acronym


def test_create_acronym():
    assert create_acronym("As Soon As Possible") == "ASAP"
    assert create_acronym("For Your Information") == "FYI"


def test_create_acronym_invalid_type():
    with pytest.raises(TypeError):
        create_acronym(123)


def test_create_acronym_empty_string():
    with pytest.raises(ValueError):
        create_acronym("")


def test_create_acronym_no_valid_words():
    with pytest.raises(ValueError):
        create_acronym("and the or")
Run the tests with uv run pytest.
In the create_acronym function, we need to add a check after we finish iterating through the words to see if the acronym is empty. If it is, we can raise a ValueError:
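A sketch of the change, placed just before the final return (the exact wording of the error message is an assumption):

PYTHON
    # at the end of create_acronym, after the for loop over words
    if acronym == "":
        raise ValueError("Phrase must contain at least one word that is not an article.")
    return acronym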
Challenge 2: Additional Edge Case
Try adding a test for another edge case for our Document class, this time for a file that is not actually a text file, for example a binary file or an image file. Then, update the Document class to make the test pass.
You can mock a binary file by using the mock_open function from the unittest.mock module, and using the read_data parameter to provide binary data like b'\x00\x01\x02'.
In the Document class, we need to check whether the data read from the file is binary data. Because our mock returns the read_data exactly as we provided it, the data will come back as a bytes object, so we can check whether the raw text we read is an instance of type bytes. If it is, we can raise a ValueError.
You can create a test that simulates opening a binary file by using the mock_open function from the unittest.mock module. Here’s an example of how you might write such a test:
PYTHON
def test_binary_file(monkeypatch):
    # Mock a binary file
    mock = mock_open(read_data=b'\x00\x01\x02')
    monkeypatch.setattr("builtins.open", mock)
    with pytest.raises(ValueError):
        Document(filepath="binary_file.bin")
And then, in the Document class, you can check whether the data read from the file is binary data like this:
PYTHON
...
    def get_content(self, filepath: str) -> str:
        raw_text = self.read(filepath)
        if not raw_text:
            raise ValueError(f"File {filepath} contains no content.")
        if isinstance(raw_text, bytes):
            raise ValueError(f"File {filepath} is not a valid text file.")
        match = re.search(self.CONTENT_PATTERN, raw_text, re.DOTALL)
        if match:
            return match.group(1).strip()
        raise ValueError(f"File {filepath} is not a valid Project Gutenberg Text file.")
...
Key Points

- We can use pytest to write and run unit tests in Python.
- A good test is isolated, repeatable, fast, and clear.
- We can use fixtures to provide data or state to our tests.
- We can use monkey patching to modify the behavior of functions or classes during testing.
- Test Driven Development (TDD) is a practice where we write tests before writing the code to make the tests pass.