Extending Classes with Inheritance

Last updated on 2025-09-28

Estimated time: 12 minutes

Overview

Questions

  • What if we want classes that are similar, but handle slightly different cases?
  • How can we avoid duplicating code in our classes?

Objectives

  • Explain the concept of inheritance in object-oriented programming
  • Demonstrate how to create a subclass that inherits from a parent class
  • Show how to override methods and properties in a subclass

Extending Classes with Inheritance


So far you may be wondering why classes are useful. After all, all we’ve really done is make a tiny module with some functions in it that are slightly more complicated than normal functions. One of the real powers of classes is the ability to limit code duplication through a concept called inheritance.

Inheritance

Inheritance is a way to create a new class that contains all of the same properties and methods as an existing class, but allows us to add additional new properties and methods, or to override existing methods. This allows us to create a new class that is a specialized version of an existing class, without having to rewrite a whole bunch of code.

Taking a look at our Car class from earlier, we might want to create new classes for cars with a specific type of engine, like a gas engine or an electric engine. Both kinds of cars will have the same basic properties and methods, but they will also have some additional properties and methods that are specific to the type of engine, or properties that are set by default, like our fuel property.

But since both types of cars are still cars, they will share a lot of the same properties and methods. Rather than repeating all of the code from the Car class in both our new classes, we can use inheritance to create our new classes based on the Car class:

In Python, this would look something like this:

PYTHON

class Car:
    def __init__(self, make: str, model: str, year: int, color: str = "grey", fuel: str = "gasoline"):
        self.make = make
        self.model = model
        self.year = year
        self.color = color
        self.fuel = fuel

    def honk(self) -> str:
        return "beep"

    def paint(self, new_color: str) -> None:
        self.color = new_color

    def noise(self, speed: int) -> str:
        if speed <= 10:
            return "putt putt"
        else:
            return "vrooom"

class CarGasEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="gasoline")

class CarElectricEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="electric")

    def noise(self, speed: int) -> str:
        return "hmmmmmm"

Figure: Class diagram showing the Car class as a parent class, with CarGasEngine and CarElectricEngine as child classes that inherit from Car.

Callout

Note that the noise method in the CarElectricEngine class is overridden to provide a different implementation than the one in the Car class. This is called method overriding, and it allows us to define a different behavior for a method in a subclass. When we call the noise method on an instance of CarElectricEngine, it will use the overridden method, rather than the one defined in the Car class.

However, in CarGasEngine we do not override the noise method, so it will use the one defined in the Car class.

More on overriding methods in a moment.

You can see that the CarGasEngine class is defined in a similar way to the Car class, but it inherits from the Car class by including it in parentheses after the class name. The __init__ method of the CarGasEngine class also has a call to super().__init__(). The super() function is a way to refer specifically to the parent class, in this case, the Car class. This allows us to call the __init__ method of the Car class, which sets up all of the properties that a Car has.
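
To see the inheritance in action, here is a short sketch that repeats a trimmed-down version of the classes above (the paint method is left out for brevity) and then creates one car of each kind. The specific makes, models, and years are made-up example values.

```python
class Car:
    def __init__(self, make: str, model: str, year: int, color: str = "grey", fuel: str = "gasoline"):
        self.make = make
        self.model = model
        self.year = year
        self.color = color
        self.fuel = fuel

    def honk(self) -> str:
        return "beep"

    def noise(self, speed: int) -> str:
        return "putt putt" if speed <= 10 else "vrooom"


class CarGasEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="gasoline")


class CarElectricEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="electric")

    def noise(self, speed: int) -> str:
        return "hmmmmmm"


# Example values only; any make, model, and year would do
gas_car = CarGasEngine(make="Honda", model="Civic", year=2015)
electric_car = CarElectricEngine(make="Nissan", model="Leaf", year=2020)

# honk() is defined only on Car, so both subclasses inherit it unchanged
print(gas_car.honk(), electric_car.honk())

# noise() is inherited by CarGasEngine but overridden by CarElectricEngine
print(gas_car.noise(speed=50))
print(electric_car.noise(speed=50))

# The fuel property was set by each subclass through super().__init__()
print(gas_car.fuel, electric_car.fuel)

# An instance of a subclass is also an instance of its parent class
print(isinstance(electric_car, Car))
```

Neither subclass defines honk, yet both can use it, while only the electric car changes what noise returns.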

Applying Inheritance to Our Document Class

For our Document class, we have a few different types of documents available from the Project Gutenberg website. We are currently using plain text files, but there are also HTML files that we can download. They will have the same information, but the data within will be structured in a slightly different way. We can use inheritance to create a pair of new classes: HTMLDocument and PlainTextDocument, that both inherit from the Document class. This will allow us to keep all of the common functionality in the Document class, but to add any additional functionality specific to each document type.

Most of what we’ve written so far is specific to reading and parsing data out of the plain text files, so almost all of the code from Document can be moved into PlainTextDocument. We’ll leave gutenberg_url, line_count, and get_word_occurrence in Document.

In addition, we’ll need an __init__ in our Document class. It saves the file path in the filepath property, then calls get_content and get_metadata, which each subclass provides, to set the content, title, author, and id properties. We might expand this in the future. We’ll also need a call to super().__init__() in our PlainTextDocument. At the moment, our classes look like this:

PYTHON

class Document:
    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}.txt"
        return None

    @property
    def line_count(self) -> int:
        return len(self.content.splitlines())

    def __init__(self, filepath: str):
        self.filepath = filepath
        self.content = self.get_content(filepath)

        self.metadata = self.get_metadata(filepath)
        self.title = self.metadata.get("title")
        self.author = self.metadata.get("author")
        self.id = self.metadata.get("id")

    def get_word_occurrence(self, word: str) -> int:
        return self.content.lower().count(word.lower())

PYTHON

import re

from textanalysis_tool.document import Document


class PlainTextDocument(Document):
    TITLE_PATTERN = r"^Title:\s*(.*?)\s*$"
    AUTHOR_PATTERN = r"^Author:\s*(.*?)\s*$"
    ID_PATTERN = r"^Release date:\s*.*?\[eBook #(\d+)\]"
    CONTENT_PATTERN = r"\*\*\* START OF THE PROJECT GUTENBERG EBOOK .*? \*\*\*(.*?)\*\*\* END OF THE PROJECT GUTENBERG EBOOK .*? \*\*\*"

    def __init__(self, filepath: str):
        super().__init__(filepath=filepath)

    def _extract_metadata_element(self, pattern: str, text: str) -> str | None:
        match = re.search(pattern, text, re.MULTILINE)
        return match.group(1).strip() if match else None

    def get_content(self, filepath: str) -> str:
        raw_text = self.read(filepath)

        match = re.search(self.CONTENT_PATTERN, raw_text, re.DOTALL)
        if match:
            return match.group(1).strip()
        raise ValueError(f"File {filepath} is not a valid Project Gutenberg Text file.")

    def get_metadata(self, filepath: str) -> dict:
        raw_text = self.read(filepath)

        title = self._extract_metadata_element(self.TITLE_PATTERN, raw_text)
        author = self._extract_metadata_element(self.AUTHOR_PATTERN, raw_text)
        extracted_id = self._extract_metadata_element(self.ID_PATTERN, raw_text)

        return {
            "title": title,
            "author": author,
            "id": int(extracted_id) if extracted_id else None,
        }

    def read(self, file_path: str) -> str:
        with open(file_path, "r", encoding="utf-8") as file:
            raw_text = file.read()

        if not raw_text:
            raise ValueError(f"File {self.filepath} contains no content.")

        if isinstance(raw_text, bytes):
            raise ValueError(f"File {self.filepath} is not a valid text file.")

        return raw_text

We’ll also have another class for reading HTML files. This will be similar to the PlainTextDocument class, but it will use the BeautifulSoup library to parse the HTML file and extract the content and metadata. Rather than type out the entire class now, you can either copy and paste the code below into a new file called src/textanalysis_tool/html_document.py, or you can download the file from the Workshop Resources.

Prerequisite

As we do not have BeautifulSoup in our environment yet, you will need to add it using uv:

uv add beautifulsoup4

This will install the package to your environment as well as add it to your pyproject.toml file.


import re

from bs4 import BeautifulSoup

from textanalysis_tool.document import Document


class HTMLDocument(Document):
    URL_PATTERN = "^https://www.gutenberg.org/files/([0-9]+)/.*"

    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}-h.zip"
        return None

    def __init__(self, filepath: str):
        super().__init__(filepath=filepath)

        extracted_id = re.search(self.URL_PATTERN, self.metadata.get("url", ""), re.DOTALL)
        # re.search returns None when nothing matches, so test the match object itself
        self.id = int(extracted_id.group(1)) if extracted_id else None

    def read(self, filepath) -> BeautifulSoup:
        with open(filepath, encoding="utf-8") as file_obj:
            parsed_file = BeautifulSoup(file_obj, "html.parser")

        # Check that the file is parsable as HTML
        if not parsed_file or not parsed_file.find("h1"):
            raise ValueError("The file could not be parsed as HTML.")

        return parsed_file

    def get_content(self, filepath: str) -> str:
        parsed_file = self.read(filepath)

        # Find the first h1 tag (The book title)
        title_h1 = parsed_file.find("h1")

        # Collect all the content after the first h1
        content = []
        for element in title_h1.find_next_siblings():
            text = element.get_text(strip=True)

            # Stop early if we hit this text, which indicates the end of the book
            if "END OF THE PROJECT GUTENBERG EBOOK" in text:
                break

            if text:
                content.append(text)

        return "\n\n".join(content)

    def get_metadata(self, filename) -> dict:
        parsed_file = self.read(filename)

        title = parsed_file.find("meta", {"name": "dc.title"})["content"]
        author = parsed_file.find("meta", {"name": "dc.creator"})["content"]
        url = parsed_file.find("meta", {"name": "dcterms.source"})["content"]
        extracted_id = re.search(self.URL_PATTERN, url, re.DOTALL)
        id = int(extracted_id.group(1)) if extracted_id else None

        return {"title": title, "author": author, "id": id, "url": url}

Overriding Methods

Notice that in the HTMLDocument class, we have overridden the gutenberg_url property to return the URL for the HTML version of the book. This is an example of how we can override methods and properties in a subclass to provide specialized behavior. When we create an instance of HTMLDocument, it will use the gutenberg_url property defined in the HTMLDocument class, rather than the one defined in the Document class.
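
The same idea can be shown in isolation. The sketch below uses two hypothetical classes, Report and HTMLReport, and a placeholder example.org URL (none of these are part of our package) so that it runs without any files. Only the property is overridden; everything else is inherited.

```python
class Report:
    """Hypothetical parent class with a read-only property."""

    def __init__(self, report_id: int):
        self.report_id = report_id

    @property
    def url(self) -> str:
        return f"https://example.org/reports/{self.report_id}.txt"


class HTMLReport(Report):
    """Hypothetical subclass that overrides only the url property."""

    @property
    def url(self) -> str:
        return f"https://example.org/reports/{self.report_id}.html"


plain = Report(report_id=7)
html = HTMLReport(report_id=7)

# Same property name, different result depending on the class of the instance
print(plain.url)
print(html.url)
```

HTMLReport has no __init__ of its own, so it reuses the one from Report, and report_id is available when the overridden property runs.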

Callout

When overriding methods, it’s important to ensure that the new method has the same signature as the method being overridden. This means that the new method should have the same name, number of parameters, and return type as the method being overridden.

Additionally, the __init__ is technically also an overridden method, since it is defined in the parent class. However, since we are calling the parent class’s __init__ method using super(), we are not completely replacing the behavior of the parent class’s __init__ method, but rather extending it. We can do the exact same thing with other methods if we want to add some functionality to an existing method, rather than completely replacing it.
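
As a minimal sketch of extending a method other than __init__, the example below uses a cut-down Car with only the honk method and a hypothetical Truck subclass (not part of the lesson code). The overriding method calls the parent version through super() and builds on its result.

```python
class Car:
    def honk(self) -> str:
        return "beep"


class Truck(Car):
    # Truck is a hypothetical subclass used only for this illustration
    def honk(self) -> str:
        # Reuse the parent's result instead of rewriting it, then add to it
        base = super().honk()
        return f"{base} {base}!"


print(Car().honk())
print(Truck().honk())
```

If Car.honk changes later, Truck.honk picks up the change automatically, because it never repeats the parent's logic.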

Testing our Inherited Classes

Now let’s try out our classes. We already have the pg2680.txt file in our scratch folder. Next, let’s download the HTML version of the same book from Project Gutenberg. You can download it from this link. (Note that the file is zipped, as it also contains images. We won’t be using the images, but you’ll need to unzip the file to get to the HTML file.) Once you have the HTML file, place it in the scratch folder alongside the pg2680.txt file.

You can either copy and paste the code below into a new file called demo_inheritance.py, or you can download the file from the Workshop Resources.

PYTHON

import sys

sys.path.insert(0, "src")

from textanalysis_tool.document import Document
from textanalysis_tool.plain_text_document import PlainTextDocument
from textanalysis_tool.html_document import HTMLDocument

# Test the PlainTextDocument class
plain_text_doc = PlainTextDocument(filepath="scratch/pg2680.txt")
print(f"Plain Text Document Title: {plain_text_doc.title}")
print(f"Plain Text Document Author: {plain_text_doc.author}")
print(f"Plain Text Document ID: {plain_text_doc.id}")
print(f"Plain Text Document Line Count: {plain_text_doc.line_count}")
print(f"Plain Text Document 'the' Occurrences: {plain_text_doc.get_word_occurrence('the')}")
print(f"Plain Text Document Gutenberg URL: {plain_text_doc.gutenberg_url}")
print(f"Type of Plain Text Document: {type(plain_text_doc)}")
print(f"Parent Class: {type(plain_text_doc).__bases__[0]}")

print("=" * 40)

# Test the HTMLDocument class
html_doc = HTMLDocument(filepath="scratch/pg2680-images.html")
print(f"HTML Document Title: {html_doc.title}")
print(f"HTML Document Author: {html_doc.author}")
print(f"HTML Document ID: {html_doc.id}")
print(f"HTML Document Line Count: {html_doc.line_count}")
print(f"HTML Document 'the' Occurrences: {html_doc.get_word_occurrence('the')}")
print(f"HTML Document Gutenberg URL: {html_doc.gutenberg_url}")
print(f"Type of HTML Document: {type(html_doc)}")
print(f"Parent Class: {type(html_doc).__bases__[0]}")

print("=" * 40)

# We can't use the Document class directly
doc = Document(filepath="scratch/pg2680.txt")

You should get some output that looks like this:

Plain Text Document Title: Meditations
Plain Text Document Author: Emperor of Rome Marcus Aurelius
Plain Text Document ID: 2680
Plain Text Document Line Count: 6845
Plain Text Document 'the' Occurrences: 5736
Plain Text Document Gutenberg URL: https://www.gutenberg.org/cache/epub/2680/pg2680.txt
Type of Plain Text Document: <class 'textanalysis_tool.plain_text_document.PlainTextDocument'>
Parent Class: <class 'textanalysis_tool.document.Document'>
========================================
HTML Document Title: Meditations
HTML Document Author: Marcus Aurelius, Emperor of Rome, 121-180
HTML Document ID: 2680
HTML Document Line Count: 5635
HTML Document 'the' Occurrences: 6161
HTML Document Gutenberg URL: https://www.gutenberg.org/cache/epub/2680/pg2680-h.zip
Type of HTML Document: <class 'textanalysis_tool.html_document.HTMLDocument'>
Parent Class: <class 'textanalysis_tool.document.Document'>
========================================
Traceback (most recent call last):
  File "E:\Projects\Python\scratch\textanalysis-tool\scratch\demo_inheritance.py", line 34, in <module>
    doc = Document(filepath="scratch/pg2680.txt")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\Python\scratch\textanalysis-tool\src\textanalysis_tool\document.py", line 14, in __init__
    self.content = self.get_content(filepath)
                   ^^^^^^^^^^^^^^^^
AttributeError: 'Document' object has no attribute 'get_content'

Note that the end of the script results in an error. Since the Document class no longer contains the get_content or get_metadata methods, it cannot be used directly. However, we don’t get an error until we try to call one of those methods.

Callout

This is a use case for something called an abstract base class, which is a class that is designed to be inherited from, but never instantiated directly. One way to handle this would be to add these methods to the Document class, but have them raise a NotImplementedError. This way, if someone tries to instantiate the Document class directly, they will get an error indicating that maybe this class is not meant to be used directly:

PYTHON

class Document:
    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}.txt"
        return None

    @property
    def line_count(self) -> int:
        return len(self.content.splitlines())

    def __init__(self, filepath: str):
        self.filepath = filepath
        self.content = self.get_content(filepath)

        # Keep the full metadata dict; HTMLDocument.__init__ reads self.metadata
        self.metadata = self.get_metadata(filepath)
        self.title = self.metadata.get("title")
        self.author = self.metadata.get("author")
        self.id = self.metadata.get("id")

    def get_word_occurrence(self, word: str) -> int:
        return self.content.lower().count(word.lower())

    def get_content(self, filepath: str) -> str:
        raise NotImplementedError("This method should be implemented by subclasses.")

    def get_metadata(self, filepath: str) -> dict[str, str | None]:
        raise NotImplementedError("This method should be implemented by subclasses.")
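
To see when the error appears with this approach, here is a minimal sketch using two hypothetical classes, Base and Upper, that stand in for Document and a subclass. As in Document, the parent's __init__ calls a method that subclasses are expected to provide.

```python
class Base:
    def __init__(self, source: str):
        # Like Document.__init__, call a method that subclasses must provide
        self.content = self.get_content(source)

    def get_content(self, source: str) -> str:
        raise NotImplementedError("This method should be implemented by subclasses.")


class Upper(Base):
    def get_content(self, source: str) -> str:
        return source.upper()


# The subclass works because it overrides get_content
print(Upper("hello").content)

# The parent class fails as soon as __init__ reaches get_content
message = None
try:
    Base("hello")
except NotImplementedError as error:
    message = str(error)
    print(f"NotImplementedError: {message}")
```

The error only appears because __init__ happens to call get_content. A parent class whose __init__ did not call the method could still be instantiated, and would fail later when the method is first used.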

Another way to handle this is to use the abc module from the standard library, which provides a way to define abstract base classes. This is a more formal way to define a class that is meant to be inherited from, but not instantiated directly:

PYTHON

from abc import ABC, abstractmethod


class Document(ABC):
    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}.txt"
        return None

    @property
    def line_count(self) -> int:
        return len(self.content.splitlines())

    def __init__(self, filepath: str):
        self.filepath = filepath
        self.content = self.get_content(filepath)

        self.metadata = self.get_metadata(filepath)
        self.title = self.metadata.get("title")
        self.author = self.metadata.get("author")
        self.id = self.metadata.get("id")

    def get_word_occurrence(self, word: str) -> int:
        return self.content.lower().count(word.lower())

    @abstractmethod
    def get_content(self, filepath: str) -> str:
        pass

    @abstractmethod
    def get_metadata(self, filepath: str) -> dict[str, str | None]:
        pass
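
A small sketch of how the abc version behaves, again with hypothetical stand-in classes (Base, Incomplete, and Complete) rather than our real Document. Python refuses to create an instance of any class that still has abstract methods, and raises a TypeError when you try.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def get_content(self, source: str) -> str:
        pass


class Incomplete(Base):
    # Does not override get_content, so this class is still abstract
    pass


class Complete(Base):
    def get_content(self, source: str) -> str:
        return source.strip()


# A subclass that implements every abstract method can be used normally
print(Complete().get_content("  hello  "))

# Classes with unimplemented abstract methods cannot be instantiated
rejected = []
for cls in (Base, Incomplete):
    try:
        cls()
    except TypeError:
        rejected.append(cls.__name__)

print(rejected)
```

Here the check happens when the object is created, before __init__ runs, and it also catches a subclass that forgot to implement one of the required methods.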


Unit Testing

One of the first effects of this is that our Document class is no longer directly testable, since it cannot be instantiated directly. However, we can still test the PlainTextDocument and HTMLDocument classes, which will also indirectly test the Document class. You can either copy the code below into two new files called tests/test_plain_text_document.py and tests/test_html_document.py, or you can download the files from the Workshop Resources (./workshop_resources.html). (Also make sure to delete the existing tests/test_document.py file, since it is no longer applicable.)

tests/test_plain_text_document.py

PYTHON

import pytest
from unittest.mock import mock_open

from textanalysis_tool.plain_text_document import PlainTextDocument

TEST_DATA = """
Title: Test Document

Author: Test Author

Release date: January 1, 2001 [eBook #1234]
                Most recently updated: February 2, 2002

*** START OF THE PROJECT GUTENBERG EBOOK TEST ***
This is a test document. It contains words.
It is only a test document.
*** END OF THE PROJECT GUTENBERG EBOOK TEST ***
"""


@pytest.fixture(autouse=True)
def mock_file(monkeypatch):
    mock = mock_open(read_data=TEST_DATA)
    monkeypatch.setattr("builtins.open", mock)
    return mock


@pytest.fixture
def doc():
    return PlainTextDocument(filepath="tests/example_file.txt")


def test_create_document(doc):
    assert doc.title == "Test Document"
    assert doc.author == "Test Author"
    assert isinstance(doc.id, int) and doc.id == 1234


def test_empty_file(monkeypatch):
    # Mock an empty file
    mock = mock_open(read_data="")
    monkeypatch.setattr("builtins.open", mock)

    with pytest.raises(ValueError):
        PlainTextDocument(filepath="empty_file.txt")


def test_binary_file(monkeypatch):
    # Mock a binary file
    mock = mock_open(read_data=b"\x00\x01\x02")
    monkeypatch.setattr("builtins.open", mock)

    with pytest.raises(ValueError):
        PlainTextDocument(filepath="binary_file.bin")


def test_document_line_count(doc):
    assert doc.line_count == 2


def test_document_word_occurrence(doc):
    assert doc.get_word_occurrence("test") == 2

tests/test_html_document.py

PYTHON

import pytest
from unittest.mock import mock_open

from textanalysis_tool.html_document import HTMLDocument

TEST_DATA = """
<head>
  <meta name="dc.title" content="Test Document">
  <meta name="dcterms.source" content="https://www.gutenberg.org/files/1234/1234-h/1234-h.htm">
  <meta name="dc.creator" content="Test Author">
</head>
<body>
  <h1>Test Document</h1>
  <p>
    This is a test document. It contains words.
    It is only a test document.
  </p>
</body>
"""


@pytest.fixture(autouse=True)
def mock_file(monkeypatch):
    mock = mock_open(read_data=TEST_DATA)
    monkeypatch.setattr("builtins.open", mock)
    return mock


@pytest.fixture
def doc():
    return HTMLDocument(filepath="tests/example_file.txt")


def test_create_document(doc):
    assert doc.title == "Test Document"
    assert doc.author == "Test Author"
    assert isinstance(doc.id, int) and doc.id == 1234


def test_empty_file(monkeypatch):
    # Mock an empty file
    mock = mock_open(read_data="")
    monkeypatch.setattr("builtins.open", mock)

    with pytest.raises(ValueError):
        HTMLDocument(filepath="empty_file.html")


def test_document_line_count(doc):
    assert doc.line_count == 2


def test_document_word_occurrence(doc):
    assert doc.get_word_occurrence("test") == 2
Challenge

Challenge 1: Predict the output

What will happen when we run the following code? Why?

PYTHON

class Animal:
    def __init__(self, name: str):
        print(f"Creating an animal named {name}")
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name}"

class Dog(Animal):
    def __init__(self, name: str):
        print(f"Creating a dog named {name}")
        super().__init__(name=name)

class Cat(Animal):
    def __init__(self, name: str):
        print(f"Creating a cat named {name}")


animals = [Dog(name="Chance"), Cat(name="Sassy"), Dog(name="Shadow")]

for animal in animals:
    print(animal.whoami())

We get some of the output we expect, but we also get an error:

Creating a dog named Chance
Creating an animal named Chance
Creating a cat named Sassy
Creating a dog named Shadow
Creating an animal named Shadow
I am a <class '__main__.Dog'> named Chance

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 22
     19 animals = [Dog(name="Chance"), Cat(name="Sassy"), Dog(name="Shadow")]
     21 for animal in animals:
---> 22     print(animal.whoami())

Cell In[4], line 7, in Animal.whoami(self)
      6 def whoami(self) -> str:
----> 7     return f"I am a {type(self)} named {self.name}"

AttributeError: 'Cat' object has no attribute 'name'

We failed to call the super().__init__() method in the Cat class, so the name property was never set. When we then try to access the instance property name in the whoami method, we get an AttributeError.
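
One possible fix, sketched below with only the Animal and Cat classes from the challenge, is to add the missing super().__init__() call so that Animal.__init__ gets a chance to set the name property.

```python
class Animal:
    def __init__(self, name: str):
        print(f"Creating an animal named {name}")
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name}"


class Cat(Animal):
    def __init__(self, name: str):
        print(f"Creating a cat named {name}")
        # This call was missing; it lets Animal.__init__ set self.name
        super().__init__(name=name)


cat = Cat(name="Sassy")
print(cat.whoami())
```

Another option would be to remove Cat.__init__ entirely, in which case Cat simply inherits the __init__ from Animal, though the "Creating a cat" message would then no longer be printed.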

Challenge

Challenge 2: Class Methods and Properties

We’ve mostly focused on instance properties and methods so far, but classes can also have what are called “class properties” and “class methods”. These are properties and methods that are associated with the class itself, rather than with an instance of the class.

Without running it, what do you think the following code will do? Will it run without error?

PYTHON

class Animal:
    PHYLUM = "Chordata"

    def __init__(self, name: str):
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name} in the phylum {self.PHYLUM}"

class Snail(Animal):
    def __init__(self, name: str):
        super().__init__(name=name)

animal1 = Snail(name="Gary")
Animal.PHYLUM = "Mollusca"
print(animal1.whoami())

animal2 = Snail(name="Slurms MacKenzie")
print(animal2.whoami())

creature3 = Snail(name="Turbo")
creature3.CLASS = "Gastropoda"
print(creature3.whoami(), "and is in class", creature3.CLASS)

The code runs without error. There are two things about this piece of code that are a bit tricky.

1. The PHYLUM property is a class property, so it is shared among all instances of the class. When we set Animal.PHYLUM = "Mollusca", we are modifying the class property for every instance, both existing and new. That is why animal1, which was created before the change, reports “Mollusca”, and why animal2.whoami() also shows “Mollusca” even though animal2 is a new instance of Snail.

2. We never defined a CLASS property in the Animal or Snail class, but we can still create a new property on an instance of a class at any time. (Generally, this is not a good idea, as it can cause confusion when you reference a property that doesn’t exist in any class definition, but it is technically possible.)
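
The difference between a class property and an instance property can be made visible with the built-in vars function, which shows what an instance itself stores. This is a minimal sketch using a cut-down Animal class; the "Arthropoda" value is only there to show the effect of assigning through an instance.

```python
class Animal:
    PHYLUM = "Chordata"

    def __init__(self, name: str):
        self.name = name


first = Animal(name="Gary")
second = Animal(name="Turbo")

# Neither instance stores PHYLUM itself; the lookup falls back to the class
print(vars(first))

# Changing the class property is visible through every instance
Animal.PHYLUM = "Mollusca"
print(first.PHYLUM, second.PHYLUM)

# Assigning through an instance creates a new instance property instead,
# which hides the class property for that one instance only
first.PHYLUM = "Arthropoda"
print(first.PHYLUM, second.PHYLUM, Animal.PHYLUM)
print(vars(first))
```

After the last assignment, only first carries its own PHYLUM; second and the Animal class itself are unaffected.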

Challenge

Challenge 3: Create a new subclass

The previous challenge is not quite correct, as canonically “Slurms MacKenzie” is not a snail, but a slug. Create a subclass of “Animal” called “Mollusk” that only sets the class property PHYLUM to “Mollusca”. Then create two subclasses of “Mollusk”: “Snail” and “Slug”.

You can implement any methods or properties you want in the “Snail” and “Slug” classes, but you may also just leave them empty like so:

PYTHON

class MyClass:
    pass

It is not necessary for the Snail and Slug classes to have their own __init__ methods, as they will inherit the __init__ method from the Animal class through the Mollusk class.

PYTHON

class Animal:
    PHYLUM = "Chordata"

    def __init__(self, name: str):
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name} in the phylum {self.PHYLUM}"

class Mollusk(Animal):
    PHYLUM = "Mollusca"

class Snail(Mollusk):
    pass

class Slug(Mollusk):
    pass
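
A quick way to check the solution is to create one instance of each new class and look at where Python searches for properties and methods. The sketch below repeats the classes above and uses the __mro__ attribute that every class has, which lists the classes searched in order.

```python
class Animal:
    PHYLUM = "Chordata"

    def __init__(self, name: str):
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name} in the phylum {self.PHYLUM}"


class Mollusk(Animal):
    PHYLUM = "Mollusca"


class Snail(Mollusk):
    pass


class Slug(Mollusk):
    pass


gary = Snail(name="Gary")
slurms = Slug(name="Slurms MacKenzie")

# Both use __init__ and whoami from Animal, and PHYLUM from Mollusk
print(gary.whoami())
print(slurms.whoami())

# The parent class keeps its own value
print(Animal.PHYLUM)

# The order in which Python looks up properties and methods for a Slug
print([cls.__name__ for cls in Slug.__mro__])
```

Because Mollusk defines its own PHYLUM, nothing on Animal had to be modified, unlike in the previous challenge.
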
Key Points
  • Inheritance allows us to create a new class that is a specialized version of an existing class
  • We can override methods and properties in a subclass to provide specialized behavior