Unit Testing
Last updated on 2025-09-28
Overview
Questions
- What is unit testing?
- Why is unit testing important?
- How do you write a unit test in Python?
Objectives
- Explain the concept of unit testing
- Demonstrate how to write and run unit tests in Python using pytest
Unit Testing
We have our two little test files, but you might imagine that it’s not particularly efficient to always write individual scripts to test our code. What if we had a lot of functions or classes? Or a lot of different ideas to test? What if our objects changed down the line? Our existing modules are going to be difficult to maintain, and, as you may have expected, there is already a solution for this problem.
What Makes For a Good Test?
A good test is one that is:

- Isolated: A good test should be able to run independently of other tests, and should not rely on external resources such as databases or web services.
- Repeatable: A good test should produce the same results every time it is run.
- Fast: A good test should run quickly, so that it can be run frequently during development.
- Clear: A good test should be easy to understand, so that other developers can easily see what is being tested and why. Not just in the output provided by the test: the variables, function names, and structure of the test should be clear and descriptive.
It can be easy to get carried away with testing everything and writing tests that cover every single case you can come up with. However, having a massive test suite that is cumbersome to update and takes a long time to run is not useful. A good rule of thumb is to focus on testing the most important parts of your code, and the parts that are most likely to break. This often means focusing on edge cases and error handling, rather than trying to test every possible input.
Generally, a good test module will test the basic functionality of an object or function, as well as a few edge cases.
What are some edge cases that you can think of for the hello function we wrote earlier? What about the Document class?
Some edge cases for the hello function could include:

- Passing in an empty string
- Passing in a very long string
- Passing in something that is not a string (e.g. a number or a list)

Some edge cases for the Document class could include:

- Passing in a file path that does not exist
- Passing in a file that is not a text file
- Passing in a file that is empty
- Passing in a word that does not exist in the document when testing get_word_occurrence
pytest

pytest is a testing framework for Python that helps you write simple and scalable test cases. It is widely used in the Python community, and has in-depth and comprehensive documentation. We won’t be getting into all of the different things you can do with pytest, but we will cover the basics here.
To start off with, we need to add pytest to our environment. However, unlike our previous packages, pytest is not required for our module to work; it is only used by us while we are writing code. Therefore it is a “development dependency”. We can still add this to our pyproject.toml file via uv, but we need to add a special flag to our command so that it goes in the correct place.
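With uv, that flag is --dev, which records the package as a development dependency:

BASH
uv add --dev pytest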
If you open up your pyproject.toml file, you should see that pytest has been added under a new section called “dependency groups” (your version number may be different):
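TOML
[dependency-groups]
dev = [
    "pytest>=8.4.2",
]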
Now we can start creating our tests.
Writing a pytest Test File
Part of pytest is the concept of “test discovery”. This means that pytest will automatically find any test files that follow a certain naming convention. By default, pytest will look for files that start with test_ or end with _test.py. Inside these files, pytest will look for functions that start with test_.
Now, our files already have the correct names, so we just need to change the contents. Let’s start with test_say_hello.py. Open it up and replace the contents with the following:
PYTHON
from textanalysis_tool.say_hello import hello


def test_hello():
    assert hello("My Name") == "Hello, My Name!"
In our previous test file, we had to add the path to our module each time. Now that we are using pytest, we can use a special file called conftest.py to add this path automatically. Create a file called conftest.py in the tests directory and add the following code to it:
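The exact contents depend on your project layout; a minimal sketch, assuming the src/ layout used in this lesson, might look like this:

PYTHON
# conftest.py: add the src/ directory to sys.path so the tests
# can import the textanalysis_tool package
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "src"))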
Now, we just need to run the tests. We can do this with the following command:
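BASH
uv run pytest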
Note that we are using uv run to run pytest; this ensures that pytest is run in the correct environment with all the dependencies we have installed.
You should see output similar to the following:
============================= test session starts ==============================
platform win32 -- Python 3.13.7, pytest-8.4.2, pluggy-1.6.0
rootdir: D:\Documents\Projects\textanalysis-tool
configfile: pyproject.toml
collected 1 item
tests\test_say_hello.py . [100%]
============================== 1 passed in 0.12s ===============================
Why didn’t it run the other test file? Because even though the file is named correctly, it doesn’t contain any functions that start with test_. Let’s fix that now.
Open up test_document.py and replace the contents with the following:
PYTHON
from textanalysis_tool.document import Document


def test_create_document():
    Document.CONTENT_PATTERN = r"(.*)"
    doc = Document(filepath="tests/example_file.txt")
    assert doc.filepath == "tests/example_file.txt"


def test_document_line_count():
    Document.CONTENT_PATTERN = r"(.*)"
    doc = Document(filepath="tests/example_file.txt")
    assert doc.get_line_count() == 2


def test_document_word_occurrence():
    Document.CONTENT_PATTERN = r"(.*)"
    doc = Document(filepath="tests/example_file.txt")
    assert doc.get_word_occurrence("test") == 2
Our example file doesn’t exactly look like a Project Gutenberg text file, so we need to change the CONTENT_PATTERN to match everything. This is a class-level variable, so we can change it on the class itself, rather than on an instance.
Let’s run our tests again. You should see output similar to the following:
============================= test session starts ==============================
platform win32 -- Python 3.13.7, pytest-8.4.2, pluggy-1.6.0
rootdir: D:\Documents\Projects\textanalysis-tool
configfile: pyproject.toml
collected 4 items
tests\test_document.py ... [ 75%]
tests\test_say_hello.py . [100%]
============================== 4 passed in 0.15s ===============================
You can see that all of the tests have passed. There is a small green dot for each test that was performed, and a summary at the end. Compare this to the test file we had before: we got rid of all of the if statements, and just use the assert statement to check whether the output is what we expect.
Testing Edge Cases / Exceptions
Let’s add an edge case to our tests. Open up test_say_hello.py and add a test case for an empty string:
PYTHON
from textanalysis_tool.say_hello import hello


def test_hello():
    assert hello("My Name") == "Hello, My Name!"


def test_hello_empty_string():
    assert hello("") == "Hello, !"
Run the tests again. We get passing tests, which is what we expect. But we are the ones in charge of the function; what if we decide that when the user doesn’t provide a name, we want to raise an exception? Let’s change the test to say that if the user provides an empty string, we want to raise a ValueError:
PYTHON
import pytest

from textanalysis_tool.say_hello import hello


def test_hello():
    assert hello("My Name") == "Hello, My Name!"


def test_hello_empty_string():
    with pytest.raises(ValueError):
        hello("")
Run the tests again. This time, we get a failing test, because the hello function DID NOT raise a ValueError. Let’s change the hello function to raise a ValueError if the name is an empty string:
PYTHON
def hello(name: str = "User"):
    if name == "":
        raise ValueError("Name cannot be empty")
    return f"Hello, {name}!"
Running the tests again, we can see that all the tests pass.
Fixtures
One of the great features of pytest is the ability to use fixtures. Fixtures are a way to provide data, state, or configuration to your tests. For example, we have a line in each of our tests that creates a new Document object. We can use a fixture to create this object once, and then use it in each of our tests. That way, if we need to change the way we create the object in the future, we only need to change it in one place.
Let’s create a fixture for our Document object. Open up test_document.py and add the following import at the top:
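PYTHON
import pytest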
Then, add the following code below the imports:
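PYTHON
@pytest.fixture
def doc():
    Document.CONTENT_PATTERN = r"(.*)"
    return Document(filepath="tests/example_file.txt")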
Now, we can use this fixture in our tests. Update the test functions to accept a parameter called doc, and remove the line that creates the Document object. The updated test file should look like this:
PYTHON
import pytest

from textanalysis_tool.document import Document


@pytest.fixture
def doc():
    Document.CONTENT_PATTERN = r"(.*)"
    return Document(filepath="tests/example_file.txt")


def test_create_document(doc):
    assert doc.filepath == "tests/example_file.txt"


def test_document_line_count(doc):
    assert doc.get_line_count() == 2


def test_document_word_occurrence(doc):
    assert doc.get_word_occurrence("test") == 2
Because our Document objects are validated by searching for a starting and an ending regex pattern, which our test files will not contain, we could either ensure that our test files do, or we can just temporarily alter the search pattern for the duration of the test. CONTENT_PATTERN is a class-level variable, so we need to modify it before the instance is created.
Let’s run our tests again. Nothing changed in the output, but our code is now cleaner and easier to maintain.
Monkey Patching
Another useful feature of pytest is monkey patching. Monkey patching is a way to modify or extend the behavior of a function or class during testing. This is useful when you want to test a function that depends on an external resource, such as a database, file system or web resource. Instead of actually accessing the external resource, you can use monkey patching to replace the function that accesses the resource with a mock function that returns a predefined value.
In our use case, we have a file called example_file.txt that we use to test our Document class. However, if we wanted to test the Document class with files that have different contents, we would need to create a whole array of different test files. Instead, we can use monkey patching to replace the open function, so that instead of actually opening a file, it returns a string that we define.
Let’s monkey patch the open function in our test_document.py file. First, we need to import the mock_open helper from unittest.mock (a Python built-in module); the monkeypatch fixture itself is provided by pytest and needs no import. Add the following import at the top of the file:
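PYTHON
from unittest.mock import mock_open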
Then, we can create a new fixture that monkey patches the open function. Add the following code below the doc fixture:
PYTHON
@pytest.fixture(autouse=True)
def mock_file(monkeypatch):
    mock = mock_open(read_data="This is a test document. It contains words.\nIt is only a test document.")
    monkeypatch.setattr("builtins.open", mock)
    return mock
The other difference you’ll notice is that we added the parameter autouse=True to the fixture. This means that, within this test file, this specific fixture will be automatically applied to all tests, without needing to explicitly include it as a parameter in each test function.
Go ahead and delete the example_file.txt file, and run the tests again. Your tests should still pass, even though the file doesn’t exist anymore. This is because we are using monkey patching to replace the open function with a mock function that returns the string we defined.
Edge Cases Again / Test Driven Development
Let’s go back and think about other things that might go wrong with our Document class. What if the user provides a file path that doesn’t exist? What if the user provides a file that is not a text file? Or a file that is empty of content? Rather than write these into our class object, we can first write tests that will check for the behavior we expect or want in these edge cases, see if they fail, and then update our class object to make the tests pass. This is called “Test Driven Development” (TDD), and is a common practice in software development.
Let’s add a test for a file that is empty. In this case, we would want the initialization of the object to fail with a ValueError. However, for this test we can’t use our fixtures from above, so we’ll have to code it into the test. Add the following to test_document.py:
PYTHON
def test_empty_file(monkeypatch):
    # Mock an empty file
    mock = mock_open(read_data="")
    monkeypatch.setattr("builtins.open", mock)
    with pytest.raises(ValueError):
        Document(filepath="empty_file.txt")
Because we are monkey patching the open function, we don’t actually need to have a file called empty_file.txt in our tests directory. The open function will be replaced with our mock function that returns an empty string. We are providing a file name here to be consistent with the Document class initialization, and the name acts as additional information for later developers, clarifying the intent of the test.
Run the tests again. It fails, as we expect. Now, let’s update the Document class to raise a ValueError if the file is empty. Open up document.py and update the get_content method to the following:
PYTHON
...
    def get_content(self, filepath: str) -> str:
        raw_text = self.read(filepath)
        if not raw_text:
            raise ValueError(f"File {filepath} contains no content.")
        match = re.search(self.CONTENT_PATTERN, raw_text, re.DOTALL)
        if match:
            return match.group(1).strip()
        raise ValueError(f"File {filepath} is not a valid Project Gutenberg Text file.")
...
Challenge 1: Write a simple test
Create a file called text_utilities.py in the src/textanalysis_tool directory. In this file, paste the following function:
PYTHON
def create_acronym(phrase: str) -> str:
    """Create an acronym from a phrase.

    Args:
        phrase (str): The phrase to create an acronym from.

    Returns:
        str: The acronym.
    """
    if not isinstance(phrase, str):
        raise TypeError("Phrase must be a string.")
    words = phrase.split()
    if len(words) == 0:
        raise ValueError("Phrase must contain at least one word.")
    articles = {"a", "an", "the", "and", "but", "or", "nor", "on", "at", "to", "by", "in"}
    acronym = ""
    for word in words:
        if word.lower() not in articles:
            acronym += word[0].upper()
    return acronym
Create the following test cases for this function:

- A test that checks if the acronym for “As Soon As Possible” is “ASAP” and that the acronym for “For Your Information” is “FYI”.
- A test that checks that the function raises a TypeError when the input is not a string.
- A test that checks that the function raises a ValueError when the input is an empty string.
Are there any other edge cases you can think of? Write a test to prove that your edge case is not handled by this function as it is currently written.
Remember that to use pytest, you need to create a file that starts with test_ and that the test functions need to start with test_ as well.
What happens if the phrase contains only articles? For example, “and the or by”?
In the tests directory, create a file called test_text_utilities.py:
PYTHON
import pytest

from textanalysis_tool.text_utilities import create_acronym


def test_create_acronym():
    assert create_acronym("As Soon As Possible") == "ASAP"
    assert create_acronym("For Your Information") == "FYI"


def test_create_acronym_invalid_type():
    with pytest.raises(TypeError):
        create_acronym(123)


def test_create_acronym_empty_string():
    with pytest.raises(ValueError):
        create_acronym("")


def test_create_acronym_no_valid_words():
    with pytest.raises(ValueError):
        create_acronym("and the or")
Run the tests with uv run pytest.
In the create_acronym function, we need to add a check after we finish iterating through the words to see if the acronym is empty. If it is, we can raise a ValueError:
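A sketch of the change, placed just before the final return (the exact wording of the error message is an assumption):

PYTHON
    # at the end of create_acronym, after the for loop over words
    if acronym == "":
        raise ValueError("Phrase must contain at least one word that is not an article.")
    return acronym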
Challenge 2: Additional Edge Case
Try adding a test for another edge case for our Document class, this time for a file that is not actually a text file, for example a binary file or an image file. Then, update the Document class to make the test pass.
You can mock a binary file by using the mock_open function from the unittest.mock module, and using the read_data parameter to provide binary data like b'\x00\x01\x02'.
In the Document class, we need to check whether the data read from the file is binary data. Because our mock returns the read_data exactly as we provided it, the data will come back as a bytes object, so we can check whether the raw text we read is an instance of type bytes. If it is, we can raise a ValueError.
You can create a test that simulates opening a binary file by using the mock_open function from the unittest.mock module. Here’s an example of how you might write such a test:
PYTHON
def test_binary_file(monkeypatch):
    # Mock a binary file
    mock = mock_open(read_data=b'\x00\x01\x02')
    monkeypatch.setattr("builtins.open", mock)
    with pytest.raises(ValueError):
        Document(filepath="binary_file.bin")
And then, in the Document class, you can check whether the data read from the file is binary data like this:
PYTHON
...
    def get_content(self, filepath: str) -> str:
        raw_text = self.read(filepath)
        if not raw_text:
            raise ValueError(f"File {filepath} contains no content.")
        if isinstance(raw_text, bytes):
            raise ValueError(f"File {filepath} is not a valid text file.")
        match = re.search(self.CONTENT_PATTERN, raw_text, re.DOTALL)
        if match:
            return match.group(1).strip()
        raise ValueError(f"File {filepath} is not a valid Project Gutenberg Text file.")
...
Key Points

- We can use pytest to write and run unit tests in Python.
- A good test is isolated, repeatable, fast, and clear.
- We can use fixtures to provide data or state to our tests.
- We can use monkey patching to modify the behavior of functions or classes during testing.
- Test Driven Development (TDD) is a practice where we write tests before writing the code to make the tests pass.