Extending Classes with Inheritance
Last updated on 2025-09-28
Overview
Questions
- What if we want classes that are similar, but handle slightly different cases?
- How can we avoid duplicating code in our classes?
Objectives
- Explain the concept of inheritance in object-oriented programming
- Demonstrate how to create a subclass that inherits from a parent class
- Show how to override methods and properties in a subclass
Extending Classes with Inheritance
So far, you may be wondering why classes are useful. After all, all we’ve really done is, in essence, make a tiny module with some functions in it that are slightly more complicated than normal functions. One of the real powers of classes is the ability to limit code duplication through a concept called inheritance.
Inheritance
Inheritance is a way to create a new class that contains all of the same properties and methods as an existing class, but allows us to add additional new properties and methods, or to override existing methods. This allows us to create a new class that is a specialized version of an existing class, without having to rewrite a whole bunch of code.
Taking a look at our Car class from earlier, we might want to create a new class for a specific type of engine, like a Gas Engine or an Electric Engine. Both kinds of cars will have the same basic properties and methods, but they will also have some additional properties and methods that are specific to the type of engine, or properties that are set by default, like our fuel property.
But since both types of cars are still cars, they will share a lot of the same properties and methods. Rather than repeating all of the code from the Car class in both our new classes, we can use inheritance to create our new classes based on the Car class. In Python, this would look something like this:
PYTHON
class Car:
    def __init__(self, make: str, model: str, year: int, color: str = "grey", fuel: str = "gasoline"):
        self.make = make
        self.model = model
        self.year = year
        self.color = color
        self.fuel = fuel

    def honk(self) -> str:
        return "beep"

    def paint(self, new_color: str) -> None:
        self.color = new_color

    def noise(self, speed: int) -> str:
        if speed <= 10:
            return "putt putt"
        else:
            return "vrooom"


class CarGasEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="gasoline")


class CarElectricEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="electric")

    def noise(self, speed: int) -> str:
        return "hmmmmmm"

Note that the noise method in the CarElectricEngine class is overridden to provide a different implementation than the one in the Car class. This is called method overriding, and it allows us to define a different behavior for a method in a subclass. When we call the noise method on an instance of CarElectricEngine, it will use the overridden method, rather than the one defined in the Car class.
However, in CarGasEngine we do not override the noise method, so it will use the one defined in the Car class.
More on overriding methods in a moment.
You can see that the CarGasEngine class is defined in a similar way to the Car class, but it inherits from the Car class by including it in parentheses after the class name. The __init__ method of the CarGasEngine class also has a call to super().__init__(). The super() function is a way to refer specifically to the parent class, in this case, the Car class. This allows us to call the __init__ method of the Car class, which sets up all of the properties that a Car has.
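To see the behaviour described above in action, the sketch below repeats a trimmed-down version of the three classes (only the parts needed here) and compares the two subclasses. The variable names and the make and model values are made up for illustration.

```python
class Car:
    def __init__(self, make: str, model: str, year: int, color: str = "grey", fuel: str = "gasoline"):
        self.make = make
        self.model = model
        self.year = year
        self.color = color
        self.fuel = fuel

    def noise(self, speed: int) -> str:
        if speed <= 10:
            return "putt putt"
        return "vrooom"


class CarGasEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="gasoline")


class CarElectricEngine(Car):
    def __init__(self, make: str, model: str, year: int, color: str = "grey"):
        super().__init__(make=make, model=model, year=year, color=color, fuel="electric")

    def noise(self, speed: int) -> str:
        return "hmmmmmm"


# Hypothetical example cars
gas_car = CarGasEngine(make="Example", model="Roadster", year=1999)
electric_car = CarElectricEngine(make="Example", model="Spark", year=2024)

# CarGasEngine does not define noise(), so Car.noise() is used
print(gas_car.fuel, gas_car.noise(speed=5))            # gasoline putt putt
# CarElectricEngine overrides noise()
print(electric_car.fuel, electric_car.noise(speed=5))  # electric hmmmmmm
# Both subclasses are still cars
print(isinstance(gas_car, Car), isinstance(electric_car, Car))  # True True
```

Neither subclass repeats the attribute assignments; they are all handled by the single call to the parent __init__.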
Applying Inheritance to Our Document Class
For our Document class, we have a few different types of documents available from the Project Gutenberg website. We are currently using plain text files, but there are also HTML files that we can download. They will have the same information, but the data within will be structured in a slightly different way. We can use inheritance to create a pair of new classes, HTMLDocument and PlainTextDocument, that both inherit from the Document class. This will allow us to keep all of the common functionality in the Document class, but to add any additional functionality specific to each document type.
Most of what we’ve written so far is specific to reading and parsing data out of the plain text files, so almost all of the code from Document can be copied over to PlainTextDocument. In Document itself, we’ll leave only gutenberg_url, get_line_count, and get_word_occurrence.
In addition, we’ll need an __init__ in our Document class. At the moment, all it does is save the file path in the filepath property, but we might expand this in the future. We’ll also need a call to super().__init__() in our PlainTextDocument.
At the moment, our classes look like this:
PYTHON
class Document:
    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}.txt"
        return None

    def __init__(self, filepath: str):
        self.filepath = filepath

    def get_line_count(self) -> int:
        return len(self._content.splitlines())

    def get_word_occurrence(self, word: str) -> int:
        return self._content.lower().count(word.lower())
PYTHON
import re

from textanalysis_tool.document import Document


class PlainTextDocument(Document):
    TITLE_PATTERN = r"^Title:\s*(.*?)\s*$"
    AUTHOR_PATTERN = r"^Author:\s*(.*?)\s*$"
    ID_PATTERN = r"^Release date:\s*.*?\[eBook #(\d+)\]"
    CONTENT_PATTERN = r"\*\*\* START OF THE PROJECT GUTENBERG EBOOK .*? \*\*\*(.*?)\*\*\* END OF THE PROJECT GUTENBERG EBOOK .*? \*\*\*"

    def __init__(self, filepath: str):
        super().__init__(filepath=filepath)

    def _extract_metadata_element(self, pattern: str, text: str) -> str | None:
        match = re.search(pattern, text, re.MULTILINE)
        return match.group(1).strip() if match else None

    def get_content(self, filepath: str) -> str:
        raw_text = self.read(filepath)
        match = re.search(self.CONTENT_PATTERN, raw_text, re.DOTALL)
        if match:
            return match.group(1).strip()
        raise ValueError(f"File {filepath} is not a valid Project Gutenberg Text file.")

    def get_metadata(self, filepath: str) -> dict:
        raw_text = self.read(filepath)
        title = self._extract_metadata_element(self.TITLE_PATTERN, raw_text)
        author = self._extract_metadata_element(self.AUTHOR_PATTERN, raw_text)
        extracted_id = self._extract_metadata_element(self.ID_PATTERN, raw_text)
        return {
            "title": title,
            "author": author,
            "id": int(extracted_id) if extracted_id else None,
        }

    def read(self, file_path: str) -> str:
        # Returns the full text of the file, so the return type is str
        with open(file_path, "r", encoding="utf-8") as file:
            raw_text = file.read()
        if not raw_text:
            raise ValueError(f"File {self.filepath} contains no content.")
        if isinstance(raw_text, bytes):
            raise ValueError(f"File {self.filepath} is not a valid text file.")
        return raw_text
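To see what the metadata patterns do in isolation, here is a small sketch that applies two of them to a made-up header. The sample text is an assumption for illustration and not a real Project Gutenberg file; the two pattern strings are copied from the class above.

```python
import re

TITLE_PATTERN = r"^Title:\s*(.*?)\s*$"
ID_PATTERN = r"^Release date:\s*.*?\[eBook #(\d+)\]"

# Made-up header text, shaped like the start of a Gutenberg plain text file
sample_header = (
    "Title: Test Document\n"
    "Author: Test Author\n"
    "Release date: January 1, 2001 [eBook #1234]\n"
)

# re.MULTILINE lets ^ and $ match at the start and end of every line
title_match = re.search(TITLE_PATTERN, sample_header, re.MULTILINE)
id_match = re.search(ID_PATTERN, sample_header, re.MULTILINE)

title = title_match.group(1) if title_match else None
book_id = int(id_match.group(1)) if id_match else None
print(title, book_id)  # Test Document 1234

# Without the flag, ^ only matches at the very start of the text,
# so a pattern anchored to a later line finds nothing
without_flag = re.search(ID_PATTERN, sample_header)
print(without_flag)  # None
```

This is why _extract_metadata_element passes re.MULTILINE: the metadata fields each sit on their own line somewhere inside a much longer file.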
We’ll also have another class for reading HTML files. This will be similar to the `PlainTextDocument` class, but it will use the `BeautifulSoup` library to parse the HTML file and extract the content and metadata. Rather than type out the entire class now, you can either copy and paste the code below into a new file called `src/textanalysis_tool/html_document.py`, or you can download the file from the Workshop Resources.
As we do not have BeautifulSoup in our environment yet, you will need to add it using uv:
uv add beautifulsoup4
This will install the package to your environment as well as add it to your pyproject.toml file.
PYTHON
import re

from bs4 import BeautifulSoup

from textanalysis_tool.document import Document


class HTMLDocument(Document):
    URL_PATTERN = "^https://www.gutenberg.org/files/([0-9]+)/.*"

    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}-h.zip"
        return None

    def __init__(self, filepath: str):
        super().__init__(filepath=filepath)
        extracted_id = re.search(self.URL_PATTERN, self.metadata.get("url", ""), re.DOTALL)
        # re.search returns None when nothing matches, so check the match itself
        self.id = int(extracted_id.group(1)) if extracted_id else None

    def read(self, filepath) -> BeautifulSoup:
        with open(filepath, encoding="utf-8") as file_obj:
            parsed_file = BeautifulSoup(file_obj, "html.parser")
        # Check that the file is parsable as HTML
        if not parsed_file or not parsed_file.find("h1"):
            raise ValueError("The file could not be parsed as HTML.")
        return parsed_file

    def get_content(self, filepath: str) -> str:
        parsed_file = self.read(filepath)
        # Find the first h1 tag (the book title)
        title_h1 = parsed_file.find("h1")
        # Collect all the content after the first h1
        content = []
        for element in title_h1.find_next_siblings():
            text = element.get_text(strip=True)
            # Stop early if we hit this text, which indicates the end of the book
            if "END OF THE PROJECT GUTENBERG EBOOK" in text:
                break
            if text:
                content.append(text)
        return "\n\n".join(content)

    def get_metadata(self, filepath: str) -> dict:
        parsed_file = self.read(filepath)
        title = parsed_file.find("meta", {"name": "dc.title"})["content"]
        author = parsed_file.find("meta", {"name": "dc.creator"})["content"]
        url = parsed_file.find("meta", {"name": "dcterms.source"})["content"]
        extracted_id = re.search(self.URL_PATTERN, url, re.DOTALL)
        id = int(extracted_id.group(1)) if extracted_id else None
        return {"title": title, "author": author, "id": id, "url": url}
Overriding Methods
Notice that in the HTMLDocument class, we have overridden the gutenberg_url property to return the URL for the HTML version of the book. This is an example of how we can override methods and properties in a subclass to provide specialized behavior. When we create an instance of HTMLDocument, it will use the gutenberg_url property defined in the HTMLDocument class, rather than the one defined in the Document class.
When overriding methods, it’s important to ensure that the new method has the same signature as the method being overridden. This means that the new method should have the same name, number of parameters, and return type as the method being overridden.
Additionally, __init__ is technically also an overridden method, since it is defined in the parent class. However, since we are calling the parent class’s __init__ method using super(), we are not completely replacing the behavior of the parent class’s __init__ method, but rather extending it. We can do the exact same thing with other methods if we want to add some functionality to an existing method, rather than completely replacing it.
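As a minimal sketch of extending a method other than __init__, the example below uses a cut-down Car with only a color and a paint method. The CarWithPaintLog subclass and its previous_colors attribute are hypothetical names invented for this illustration.

```python
class Car:
    def __init__(self, color: str = "grey"):
        self.color = color

    def paint(self, new_color: str) -> None:
        self.color = new_color


class CarWithPaintLog(Car):  # hypothetical subclass, for illustration only
    def __init__(self, color: str = "grey"):
        super().__init__(color=color)
        self.previous_colors = []

    def paint(self, new_color: str) -> None:
        # Extra behaviour added by the subclass: remember the old color
        self.previous_colors.append(self.color)
        # Then reuse the parent's implementation instead of rewriting it
        super().paint(new_color)


car = CarWithPaintLog()
car.paint("red")
car.paint("blue")
print(car.color)            # blue
print(car.previous_colors)  # ['grey', 'red']
```

The overriding paint keeps the same name and parameters as the parent's, so any code that already calls paint on a Car would work unchanged with the subclass.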
Testing our Inherited Classes
Now let’s try out our classes. We already have the pg2680.txt file in our `scratch` folder; now let’s download the HTML version of the same book from Project Gutenberg. You can download it from this link. (Note that the file is zipped, as it also contains images. We won’t be using the images, but you’ll need to unzip the file to get to the HTML file.) Once you have the HTML file, place it in the `scratch` folder alongside the `pg2680.txt` file.
You can either copy and paste the code below into a new file called demo_inheritance.py, or you can download the file from the Workshop Resources.
PYTHON
import sys
sys.path.insert(0, "src")
from textanalysis_tool.document import Document
from textanalysis_tool.plain_text_document import PlainTextDocument
from textanalysis_tool.html_document import HTMLDocument
# Test the PlainTextDocument class
plain_text_doc = PlainTextDocument(filepath="scratch/pg2680.txt")
print(f"Plain Text Document Title: {plain_text_doc.title}")
print(f"Plain Text Document Author: {plain_text_doc.author}")
print(f"Plain Text Document ID: {plain_text_doc.id}")
print(f"Plain Text Document Line Count: {plain_text_doc.line_count}")
print(f"Plain Text Document 'the' Occurrences: {plain_text_doc.get_word_occurrence('the')}")
print(f"Plain Text Document Gutenberg URL: {plain_text_doc.gutenberg_url}")
print(f"Type of Plain Text Document: {type(plain_text_doc)}")
print(f"Parent Class: {type(plain_text_doc).__bases__[0]}")
print("=" * 40)
# Test the HTMLDocument class
html_doc = HTMLDocument(filepath="scratch/pg2680-images.html")
print(f"HTML Document Title: {html_doc.title}")
print(f"HTML Document Author: {html_doc.author}")
print(f"HTML Document ID: {html_doc.id}")
print(f"HTML Document Line Count: {html_doc.line_count}")
print(f"HTML Document 'the' Occurrences: {html_doc.get_word_occurrence('the')}")
print(f"HTML Document Gutenberg URL: {html_doc.gutenberg_url}")
print(f"Type of HTML Document: {type(html_doc)}")
print(f"Parent Class: {type(html_doc).__bases__[0]}")
print("=" * 40)
# We can't use the Document class directly
doc = Document(filepath="scratch/pg2680.txt")
You should get some output that looks like this:
Plain Text Document Title: Meditations
Plain Text Document Author: Emperor of Rome Marcus Aurelius
Plain Text Document ID: 2680
Plain Text Document Line Count: 6845
Plain Text Document 'the' Occurrences: 5736
Plain Text Document Gutenberg URL: https://www.gutenberg.org/cache/epub/2680/pg2680.txt
Type of Plain Text Document: <class 'textanalysis_tool.plain_text_document.PlainTextDocument'>
Parent Class: <class 'textanalysis_tool.document.Document'>
========================================
HTML Document Title: Meditations
HTML Document Author: Marcus Aurelius, Emperor of Rome, 121-180
HTML Document ID: 2680
HTML Document Line Count: 5635
HTML Document 'the' Occurrences: 6161
HTML Document Gutenberg URL: https://www.gutenberg.org/cache/epub/2680/pg2680-h.zip
Type of HTML Document: <class 'textanalysis_tool.html_document.HTMLDocument'>
Parent Class: <class 'textanalysis_tool.document.Document'>
========================================
Traceback (most recent call last):
  File "E:\Projects\Python\scratch\textanalysis-tool\scratch\demo_inheritance.py", line 34, in <module>
    doc = Document(filepath="scratch/pg2680.txt")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Projects\Python\scratch\textanalysis-tool\src\textanalysis_tool\document.py", line 14, in __init__
    self.content = self.get_content(filepath)
                   ^^^^^^^^^^^^^^^^
AttributeError: 'Document' object has no attribute 'get_content'
Note that the end of the script results in an error: since the Document class no longer contains the get_content or get_metadata methods, it cannot be used directly. However, we don’t get an error until we try to call one of those methods.
This is a use case for something called an abstract base class, which is a class that is designed to be inherited from, but never instantiated directly. One way to handle this would be to add these methods to the Document class, but have them raise a NotImplementedError. This way, if someone tries to instantiate the Document class directly, they will get an error indicating that maybe this class is not meant to be used directly:
PYTHON
class Document:
    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}.txt"
        return None

    @property
    def line_count(self) -> int:
        return len(self.content.splitlines())

    def __init__(self, filepath: str):
        self.filepath = filepath
        self.content = self.get_content(filepath)
        # Stored on the instance because HTMLDocument reads self.metadata
        self.metadata = self.get_metadata(filepath)
        self.title = self.metadata.get("title")
        self.author = self.metadata.get("author")
        self.id = self.metadata.get("id")

    def get_word_occurrence(self, word: str) -> int:
        return self.content.lower().count(word.lower())

    def get_content(self, filepath: str) -> str:
        raise NotImplementedError("This method should be implemented by subclasses.")

    def get_metadata(self, filepath: str) -> dict[str, str | None]:
        raise NotImplementedError("This method should be implemented by subclasses.")
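The effect of this approach can be sketched with a cut-down Document that only has get_content. InMemoryDocument is a hypothetical subclass invented for this illustration; it is not part of the workshop code.

```python
class Document:
    def __init__(self, filepath: str):
        self.filepath = filepath
        self.content = self.get_content(filepath)

    def get_content(self, filepath: str) -> str:
        raise NotImplementedError("This method should be implemented by subclasses.")


class InMemoryDocument(Document):  # hypothetical subclass, for illustration only
    def get_content(self, filepath: str) -> str:
        return "some text"


message = None
try:
    # __init__ calls get_content, which the base class refuses to provide
    Document(filepath="scratch/pg2680.txt")
except NotImplementedError as error:
    message = str(error)
    print(f"Could not create Document: {message}")

# A subclass that implements the method works as usual
doc = InMemoryDocument(filepath="unused.txt")
print(doc.content)  # some text
```

Here the error appears at construction time only because __init__ happens to call get_content; a base class whose __init__ did not call it would fail later, when the method is first used.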
Another way to handle this is to use the abc module from the standard library, which provides a way to define abstract base classes. This is a more formal way to define a class that is meant to be inherited from, but not instantiated directly:
PYTHON
from abc import ABC, abstractmethod


class Document(ABC):
    @property
    def gutenberg_url(self) -> str | None:
        if self.id:
            return f"https://www.gutenberg.org/cache/epub/{self.id}/pg{self.id}.txt"
        return None

    @property
    def line_count(self) -> int:
        return len(self.content.splitlines())

    def __init__(self, filepath: str):
        self.filepath = filepath
        self.content = self.get_content(filepath)
        self.metadata = self.get_metadata(filepath)
        self.title = self.metadata.get("title")
        self.author = self.metadata.get("author")
        self.id = self.metadata.get("id")

    def get_word_occurrence(self, word: str) -> int:
        return self.content.lower().count(word.lower())

    @abstractmethod
    def get_content(self, filepath: str) -> str:
        pass

    @abstractmethod
    def get_metadata(self, filepath: str) -> dict[str, str | None]:
        pass
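A short sketch of what the abc version changes in practice, again using a cut-down Document with a single abstract method. InMemoryDocument is a hypothetical subclass used only for this illustration.

```python
from abc import ABC, abstractmethod


class Document(ABC):
    def __init__(self, filepath: str):
        self.filepath = filepath
        self.content = self.get_content(filepath)

    @abstractmethod
    def get_content(self, filepath: str) -> str:
        pass


class InMemoryDocument(Document):  # hypothetical subclass, for illustration only
    def get_content(self, filepath: str) -> str:
        return "some text"


abc_error = None
try:
    # Python refuses to create the object at all, before __init__ runs
    Document(filepath="scratch/pg2680.txt")
except TypeError as error:
    abc_error = error
    print(f"Could not create Document: {error}")

# A subclass that implements every abstract method can be instantiated
doc = InMemoryDocument(filepath="unused.txt")
print(doc.content)  # some text
```

The exact wording of the TypeError message differs between Python versions, so it is not reproduced here. The practical difference from the NotImplementedError approach is that the failure no longer depends on which methods __init__ happens to call.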
Unit Testing
One of the first effects of this is that our `Document` class is no longer directly testable, since
it cannot be instantiated directly. However, we can still test the `PlainTextDocument` and
`HTMLDocument` classes, which will also indirectly test the `Document` class. You can either copy
the code below into two new files called `tests/test_plain_text_document.py` and
`tests/test_html_document.py`, or you can download the files from the [Workshop Resources](./workshop_resources.html).
(Also make sure to delete the existing `tests/test_document.py` file, since it is no longer
applicable.)
tests/test_plain_text_document.py
PYTHON
import pytest
from unittest.mock import mock_open

from textanalysis_tool.plain_text_document import PlainTextDocument

TEST_DATA = """
Title: Test Document
Author: Test Author
Release date: January 1, 2001 [eBook #1234]
Most recently updated: February 2, 2002
*** START OF THE PROJECT GUTENBERG EBOOK TEST ***
This is a test document. It contains words.
It is only a test document.
*** END OF THE PROJECT GUTENBERG EBOOK TEST ***
"""


@pytest.fixture(autouse=True)
def mock_file(monkeypatch):
    mock = mock_open(read_data=TEST_DATA)
    monkeypatch.setattr("builtins.open", mock)
    return mock


@pytest.fixture
def doc():
    return PlainTextDocument(filepath="tests/example_file.txt")


def test_create_document(doc):
    assert doc.title == "Test Document"
    assert doc.author == "Test Author"
    assert isinstance(doc.id, int) and doc.id == 1234


def test_empty_file(monkeypatch):
    # Mock an empty file
    mock = mock_open(read_data="")
    monkeypatch.setattr("builtins.open", mock)
    with pytest.raises(ValueError):
        PlainTextDocument(filepath="empty_file.txt")


def test_binary_file(monkeypatch):
    # Mock a binary file
    mock = mock_open(read_data=b"\x00\x01\x02")
    monkeypatch.setattr("builtins.open", mock)
    with pytest.raises(ValueError):
        PlainTextDocument(filepath="binary_file.bin")


def test_document_line_count(doc):
    assert doc.line_count == 2


def test_document_word_occurrence(doc):
    assert doc.get_word_occurrence("test") == 2
tests/test_html_document.py
PYTHON
import pytest
from unittest.mock import mock_open

from textanalysis_tool.html_document import HTMLDocument

TEST_DATA = """
<head>
<meta name="dc.title" content="Test Document">
<meta name="dcterms.source" content="https://www.gutenberg.org/files/1234/1234-h/1234-h.htm">
<meta name="dc.creator" content="Test Author">
</head>
<body>
<h1>Test Document</h1>
<p>
This is a test document. It contains words.
It is only a test document.
</p>
</body>
"""


@pytest.fixture(autouse=True)
def mock_file(monkeypatch):
    mock = mock_open(read_data=TEST_DATA)
    monkeypatch.setattr("builtins.open", mock)
    return mock


@pytest.fixture
def doc():
    return HTMLDocument(filepath="tests/example_file.txt")


def test_create_document(doc):
    assert doc.title == "Test Document"
    assert doc.author == "Test Author"
    assert isinstance(doc.id, int) and doc.id == 1234


def test_empty_file(monkeypatch):
    # Mock an empty file
    mock = mock_open(read_data="")
    monkeypatch.setattr("builtins.open", mock)
    with pytest.raises(ValueError):
        HTMLDocument(filepath="empty_file.html")


def test_document_line_count(doc):
    assert doc.line_count == 2


def test_document_word_occurrence(doc):
    assert doc.get_word_occurrence("test") == 2
Challenge 1: Predict the output
What will happen when we run the following code? Why?
PYTHON
class Animal:
    def __init__(self, name: str):
        print(f"Creating an animal named {name}")
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name}"


class Dog(Animal):
    def __init__(self, name: str):
        print(f"Creating a dog named {name}")
        super().__init__(name=name)


class Cat(Animal):
    def __init__(self, name: str):
        print(f"Creating a cat named {name}")


animals = [Dog(name="Chance"), Cat(name="Sassy"), Dog(name="Shadow")]
for animal in animals:
    print(animal.whoami())
We get some of the output we expect, but we also get an error:
Creating a dog named Chance
Creating an animal named Chance
Creating a cat named Sassy
Creating a dog named Shadow
Creating an animal named Shadow
I am a <class '__main__.Dog'> named Chance
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 22
19 animals = [Dog(name="Chance"), Cat(name="Sassy"), Dog(name="Shadow")]
21 for animal in animals:
---> 22 print(animal.whoami())
Cell In[4], line 7, in Animal.whoami(self)
6 def whoami(self) -> str:
----> 7 return f"I am a {type(self)} named {self.name}"
AttributeError: 'Cat' object has no attribute 'name'
We failed to call the super().__init__() method in the Cat class, so the name property was never set. When we then try to access the instance property name in the whoami method, we get an AttributeError.
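One possible fix, sketched below, is to add the missing call so that Cat extends the parent __init__ in the same way Dog does. The classes are repeated so the example runs on its own.

```python
class Animal:
    def __init__(self, name: str):
        print(f"Creating an animal named {name}")
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name}"


class Cat(Animal):
    def __init__(self, name: str):
        print(f"Creating a cat named {name}")
        # The call that was missing: let Animal set self.name
        super().__init__(name=name)


cat = Cat(name="Sassy")
print(cat.whoami())
```

With the call in place, creating a Cat prints both the cat message and the animal message, and whoami no longer raises an AttributeError.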
Challenge 2: Class Methods and Properties
We’ve mostly focused on instance properties and methods so far, but classes can also have what are called “class properties” and “class methods”. These are properties and methods that are associated with the class itself, rather than with an instance of the class.
Without running it, what do you think the following code will do? Will it run without error?
PYTHON
class Animal:
    PHYLUM = "Chordata"

    def __init__(self, name: str):
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self)} named {self.name} in the phylum {self.PHYLUM}"


class Snail(Animal):
    def __init__(self, name: str):
        super().__init__(name=name)


animal1 = Snail(name="Gary")
Animal.PHYLUM = "Mollusca"
print(animal1.whoami())
animal2 = Snail(name="Slurms MacKenzie")
print(animal2.whoami())
creature3 = Snail(name="Turbo")
creature3.CLASS = "Gastropoda"
print(creature3.whoami(), "and is in class", creature3.CLASS)
The code runs without error, but there are two things about it that are a bit tricky.
1. The PHYLUM property is a class property, so it is shared among all instances of the class. When we set Animal.PHYLUM = "Mollusca", we are modifying the class property for every instance, both existing ones and those created afterwards. That is why animal1.whoami(), which is called after the change, reports “Mollusca”, and why animal2.whoami() also shows “Mollusca”, even though animal2 is a newly created instance of Snail.
2. We never defined a CLASS property in the Animal or Snail class, but we can still create a new property on an instance of a class at any time. (Generally, this is not a good idea, as it can cause confusion when you reference a property that doesn’t exist in any class definition, but it is technically possible.)
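Both points can be checked directly with a few lines. This is a reduced sketch of the challenge code: the whoami method is left out and Snail is given an empty body, which is an assumption made to keep the example short.

```python
class Animal:
    PHYLUM = "Chordata"

    def __init__(self, name: str):
        self.name = name


class Snail(Animal):
    pass


gary = Snail(name="Gary")
turbo = Snail(name="Turbo")

# Changing the class property is visible through every instance,
# including ones that were created before the change
Animal.PHYLUM = "Mollusca"
print(gary.PHYLUM, turbo.PHYLUM)  # Mollusca Mollusca

# Assigning through one instance only creates the property on that instance
turbo.CLASS = "Gastropoda"
print(hasattr(turbo, "CLASS"))  # True
print(hasattr(gary, "CLASS"))   # False
print(hasattr(Snail, "CLASS"))  # False
```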
Challenge 3: Create a new subclass
The previous challenge is not quite correct, as canonnically “Slurms
MacKenzie” is not a snail, but a slug. Create a subclass of ‘Animal’
called “Mollusk” that inherits from “Animal”, but only sets the class
property PHYLUM
to “Mollusca”. Then create two subclasses
of “Mollusk”: “Snail” and “Slug”.
You can implement any methods or properties you want in the “Snail” and “Slug” classes, but you may also just leave them empty like so:
It is not necessary for the Snail
and Slug
classes to have their own __init__
methods, as they will
inherit the __init__
method from the Animal
class through the Mollusk
class.
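One possible version of the class hierarchy described in this challenge is sketched below. It is not the only valid answer. As a small change from the earlier challenge, whoami uses type(self).__name__ so that the printed class name is easier to read.

```python
class Animal:
    PHYLUM = "Chordata"

    def __init__(self, name: str):
        self.name = name

    def whoami(self) -> str:
        return f"I am a {type(self).__name__} named {self.name} in the phylum {self.PHYLUM}"


class Mollusk(Animal):
    # Only the class property changes; everything else is inherited
    PHYLUM = "Mollusca"


class Snail(Mollusk):
    pass


class Slug(Mollusk):
    pass


gary = Snail(name="Gary")
slurms = Slug(name="Slurms MacKenzie")
print(gary.whoami())    # I am a Snail named Gary in the phylum Mollusca
print(slurms.whoami())  # I am a Slug named Slurms MacKenzie in the phylum Mollusca
print(Animal.PHYLUM)    # Chordata
```

Because Mollusk defines its own PHYLUM, the value on Animal is left untouched, unlike the assignment to Animal.PHYLUM in the previous challenge.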
- Inheritance allows us to create a new class that is a specialized version of an existing class
- We can override methods and properties in a subclass to provide specialized behavior