Robust Python
Pat Viafore
July 2021
About Me
● Started my career working on lightning detection systems
● Worked on telecommunication systems for 9+ years
● Now working for Canonical, building Ubuntu images for the cloud
What do these have in common?
● Wrote code in 2007 -- was operational for at least 10 years
● Wrote code in 2009 -- still operational; worked with code from 2001 (20 years old)
● Working on a codebase that started in 2007 (14 years old)
I've worked on code that lives a long time
Code will live a long time
Code will outlive your time working on it
Code will become legacy
Legacy Code
A codebase where you do not have direct communication
with the original authors
Why does this matter?
YOU: tasks completed in a month
Fast-forward a few years...
Future Maintainer: tasks completed in a month?
You have a duty to deliver value in a timely
manner
You have a duty to make it easy for future
collaborators to deliver value in a timely
manner
Make it easy for your future maintainers
Do you want future maintainers to thank you for your foresight...
...or curse your name?
Robustness
A robust codebase is resilient and error-free in spite of constant
change
Your code will change
How do you make it easy for someone
working in your codebase?
What about someone you'll never meet?
The goal is to speed up future developers
They will not know your code as well as you
do
Detect errors through tooling, make your code
understandable, and make it hard to introduce
faults
Communicating
Intent
Ask yourself: why do you make the decisions you make in code?
Why?
● For Loop vs. While Loop
● Lists vs. Sets
● Dataclasses vs. Classes
● Dictionaries vs. Classes
text = "This is some generic text"
index = 0
while index < len(text):
    print(text[index])
    index += 1

for character in text:
    print(character)
The abstractions you choose communicate to
the future
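The same reasoning applies to the "Lists vs. Sets" choice above. The sketch below is only an illustration; the names `toppings_in_order`, `allergens`, and `contains_allergen` are assumptions, not from the talk.

```python
# A list tells the reader that order matters and duplicates are allowed.
toppings_in_order = ["Tomato Sauce", "Mozzarella", "Pepperoni", "Mozzarella"]

# A set tells the reader that only membership matters:
# no ordering is promised and duplicates collapse.
allergens = {"Dairy", "Gluten", "Dairy"}


def contains_allergen(allergen: str) -> bool:
    """Membership checks are the natural operation on a set."""
    return allergen in allergens


print(len(toppings_in_order))
print(len(allergens))
print(contains_allergen("Gluten"))
```

A future maintainer who sees the set can assume nobody depends on ordering; one who sees the list cannot.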
Robust Python
● Typechecking
● User Defined Types
● Extensibility
● Building a Safety Net
User-Defined Types in Robust Python
● Enumerations
● Dataclasses
● Classes
● Subtyping
● API Design
● Protocols
● Runtime Validation with Pydantic
User-defined types are a way for you to define
your own vocabulary
The abstractions you choose communicate to
the future
# Before: a bare tuple; the reader must know what index 2 holds
def print_receipt(
        order: Order,
        restaurant: tuple[str, int, str]):
    total = (order.subtotal *
             (1 + tax[restaurant[2]]))

# After: a user-defined type names the field
def print_receipt(
        order: Order,
        restaurant: Restaurant):
    total = (order.subtotal *
             (1 + tax[restaurant.city]))
    print(Receipt(restaurant.name, ...))
User-defined types unify mental models
User-defined types make it easier to reason
about a codebase
User-defined types make it harder to make
errors
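A runnable sketch of the receipt example may make the point concrete. Everything here beyond `order.subtotal`, `restaurant.name`, and `restaurant.city` is an assumption made for illustration: the `zip_code` field, the `TAX_RATES` table and its rate, and the `compute_total` helper do not appear in the talk.

```python
from dataclasses import dataclass

# Hypothetical tax table keyed by city; the rate is made up for illustration.
TAX_RATES = {"Huntsville": 0.09}


@dataclass
class Order:
    subtotal: float


@dataclass
class Restaurant:
    name: str
    zip_code: int  # assumed meaning of the tuple's middle element
    city: str


def compute_total(order: Order, restaurant: Restaurant) -> float:
    # restaurant.city states what is being looked up; restaurant[2] did not.
    return order.subtotal * (1 + TAX_RATES[restaurant.city])


total = compute_total(Order(subtotal=100.0),
                      Restaurant("Pat's Place", 35801, "Huntsville"))
print(round(total, 2))
```

With the named type, a reader no longer has to hunt for where the tuple was built to learn what each position means.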
I want to focus on the "why" we use user-
defined types, not "how" to create them
Enumerations
MOTHER_SAUCES = ("Béchamel",
                 "Velouté",
                 "Espagnole",
                 "Tomato",
                 "Hollandaise")

def create_daughter_sauce(
        mother_sauce: str,
        extra_ingredients: list[str]):
    ...

# What was MOTHER_SAUCES[0] again?
create_daughter_sauce(MOTHER_SAUCES[0],
                      ["Onion"])

# Wrong: "BBQ Sauce" is not a mother sauce, but it is a valid str
create_daughter_sauce("BBQ Sauce",
                      ["Onion"])

# Missing the accent: does not match "Béchamel"
create_daughter_sauce("Bechamel",
                      ["Onion"])

create_daughter_sauce("Béchamel",
                      ["Onion"])
Strings can be anything
Restrict your choices with enumerations
from enum import Enum

class MotherSauce(Enum):
    BÉCHAMEL = "Béchamel"
    VELOUTÉ = "Velouté"
    ESPAGNOLE = "Espagnole"
    TOMATO = "Tomato"
    HOLLANDAISE = "Hollandaise"

MotherSauce.BÉCHAMEL
# Before: any string is accepted
def create_daughter_sauce(
        mother_sauce: str,
        extra_ingredients: list[str]):
    ...

create_daughter_sauce(MOTHER_SAUCES[0],
                      ["Onion"])

# After: only a MotherSauce is accepted
def create_daughter_sauce(
        mother_sauce: MotherSauce,
        extra_ingredients: list[str]):
    ...

create_daughter_sauce(MotherSauce.BÉCHAMEL,
                      ["Onion"])
Catch mistakes with static analysis
Use enumerations to prevent collaborators
from using incorrect values
Use enumerations to simplify choices
Prevent bugs
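The enumeration above can be sketched end to end as follows. The function body and the `parse_mother_sauce` helper are assumptions added for illustration; the talk elides the body. A type checker such as mypy would flag a call that passes a plain `str` where `MotherSauce` is annotated, and at runtime the enum itself rejects unknown values.

```python
from enum import Enum


class MotherSauce(Enum):
    BÉCHAMEL = "Béchamel"
    VELOUTÉ = "Velouté"
    ESPAGNOLE = "Espagnole"
    TOMATO = "Tomato"
    HOLLANDAISE = "Hollandaise"


def create_daughter_sauce(mother_sauce: MotherSauce,
                          extra_ingredients: list[str]) -> str:
    # Hypothetical body: just describe the sauce being made.
    return f"{mother_sauce.value} + {', '.join(extra_ingredients)}"


def parse_mother_sauce(raw: str) -> MotherSauce:
    # Lookup by value raises ValueError for anything outside the enum,
    # so untrusted strings are checked once, at the boundary.
    return MotherSauce(raw)


description = create_daughter_sauce(MotherSauce.BÉCHAMEL, ["Onion"])

try:
    parse_mother_sauce("BBQ Sauce")
    rejected = False
except ValueError:
    rejected = True
```

The set of legal values now lives in one place, and adding a sixth sauce is a one-line change that every caller sees.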
What about composite data?
Data classes
Data classes represent a relationship between
data
Online Recipe
● Recipe Name
● Author's Name
● Author's Life Story
● # of servings
● Ingredient List
● Recipe
from dataclasses import dataclass

@dataclass
class OnlineRecipe:
    name: str
    author_name: str
    author_life_story: str
    number_of_servings: int
    ingredients: list[Ingredient]
    recipe: str

recipe = OnlineRecipe(
    "Pasta With Sausage",
    "Pat Viafore",
    "When I was 15, I remember ......",
    6,
    ["Rigatoni", ..., "Basil", "Sausage"],
    "First, brown the sausage ...."
)

recipe.name
>>> "Pasta With Sausage"
recipe.number_of_servings
>>> 6
Data classes represent heterogeneous
data
Heterogeneous data
● Heterogeneous data is data that may be multiple different
types (such as str, int, list[Ingredient], etc.)
● Typically not iterated over -- you access a single field at a
time
# DO NOT DO THIS
recipe = {
    "name": "Pasta With Sausage",
    "author": "Pat Viafore",
    "story": "When I was 15, I remember ....",
    "number_of_servings": 6,
    "ingredients": ["Rigatoni", ..., "Basil"],
    "recipe": "First, brown the sausage ....",
}

# is life story the right key name?
do_something(recipe["life_story"])

# What type is recipe?
def do_something_else(recipe: dict):
    ...  # snip
Any time a developer has to trawl through the
codebase to answer a question about data, it
wastes time and increases frustration
This will create mistakes and incorrect
assumptions, leading to bugs
recipe: OnlineRecipe = create_recipe()

# type checker will catch problems:
# OnlineRecipe has no life_story field (it is author_life_story)
do_something(recipe.life_story)

def do_something_else(recipe: OnlineRecipe):
    ...  # snip
Use data classes to group data together and
reduce errors when accessing it
You communicate intent and prevent future
developers from making errors
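A small runnable sketch of the data class in use. The `ingredients` field is simplified to `list[str]` here because `Ingredient` is not defined on the slides, and `servings_for_party` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class OnlineRecipe:
    name: str
    author_name: str
    author_life_story: str
    number_of_servings: int
    ingredients: list[str]  # simplified: the talk uses list[Ingredient]
    recipe: str


recipe = OnlineRecipe(
    "Pasta With Sausage",
    "Pat Viafore",
    "When I was 15, I remember ......",
    6,
    ["Rigatoni", "Basil", "Sausage"],
    "First, brown the sausage ....",
)


def servings_for_party(recipe: OnlineRecipe, batches: int) -> int:
    # The field name and its int type are visible to readers and to tooling.
    return recipe.number_of_servings * batches


try:
    recipe.life_story  # wrong name: the field is author_life_story
    typo_caught = False
except AttributeError:
    typo_caught = True
```

The misspelled attribute fails at runtime, and a type checker can report it before the code ever runs; with a plain `dict` annotation the checker has nothing to compare the key against.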
Data classes aren't appropriate for all
heterogeneous data
Invariants
● Fundamental truths throughout your codebase
● Developers will depend on these truths and build
assumptions on them
● These are not universal truths in every possible system,
just your system
Invariants
● Sauce will never be put on top of other toppings
(cheese is a topping in this scenario).
● Toppings may go above or below cheese.
● Pizza will have at most only one sauce.
● Dough radius can be only whole numbers.
● The radius of dough may be only between 15 and 30 cm
from dataclasses import dataclass

@dataclass
class Pizza:
    radius_in_cm: int
    toppings: list[str]

pizza = Pizza(15, ["Tomato Sauce",
                   "Mozzarella",
                   "Pepperoni"])

# THIS IS BAD!
pizza.radius_in_cm = 1000
pizza.toppings.append("Alfredo Sauce")
Classes
class Pizza:
    def __init__(self, radius_in_cm: int,
                 toppings: list[str]):
        # Invariant checking (is_sauce() is a helper not shown here)
        assert 15 <= radius_in_cm <= 30
        sauces = [t for t in toppings
                  if is_sauce(t)]
        assert len(sauces) <= 1

        # "Private" members
        self.__radius_in_cm = radius_in_cm
        sauce = sauces[:1]
        self.__toppings = (sauce +
            [t for t in toppings if not is_sauce(t)])

# Now an exception
pizza = Pizza(1000, ["Tomato Sauce",
                     "Mozzarella",
                     "Pepperoni"])

pizza = Pizza(15, ["Tomato Sauce",
                   "Mozzarella",
                   "Pepperoni"])

# Linters will catch these errors
# This assignment does not raise: outside the class it silently creates a
# new attribute and leaves the name-mangled _Pizza__radius_in_cm untouched
pizza.__radius_in_cm = 1000
# Also a runtime error (AttributeError: no attribute '__toppings')
pizza.__toppings.append("Alfredo Sauce")
Classes create invariants that developers
cannot easily modify
Classes must always preserve these
invariants
class Pizza:
    # ... snip ...
    def add_topping(self, topping: str):
        if is_sauce(topping) and self.has_sauce():
            raise TooManySaucesError()
        if is_sauce(topping):
            self.__toppings.insert(0, topping)
        else:
            self.__toppings.append(topping)
Give future collaborators solid classes to
reason about
Classes allow you to group inter-related data
and preserve invariants across their lifetime
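Putting the pieces together, a self-contained sketch of an invariant-preserving class might look like this. The `is_sauce` rule, the `has_sauce` body, the `toppings` property, and the use of explicit `raise` instead of `assert` are assumptions for illustration (asserts are skipped when Python runs with `-O`); the talk shows none of them in full.

```python
class TooManySaucesError(Exception):
    """Raised when a second sauce would be added to a pizza."""


def is_sauce(topping: str) -> bool:
    # Hypothetical rule: the talk uses is_sauce() without showing it.
    return topping.endswith("Sauce")


class Pizza:
    def __init__(self, radius_in_cm: int, toppings: list[str]):
        # Check invariants once, at construction time.
        if not 15 <= radius_in_cm <= 30:
            raise ValueError("radius must be between 15 and 30 cm")
        sauces = [t for t in toppings if is_sauce(t)]
        if len(sauces) > 1:
            raise TooManySaucesError()
        self.__radius_in_cm = radius_in_cm
        # Sauce goes first, so it is never on top of other toppings.
        self.__toppings = sauces + [t for t in toppings if not is_sauce(t)]

    def has_sauce(self) -> bool:
        return any(is_sauce(t) for t in self.__toppings)

    def add_topping(self, topping: str) -> None:
        # Every mutating method must re-establish the invariants.
        if is_sauce(topping) and self.has_sauce():
            raise TooManySaucesError()
        if is_sauce(topping):
            self.__toppings.insert(0, topping)
        else:
            self.__toppings.append(topping)

    @property
    def toppings(self) -> tuple[str, ...]:
        # Return a tuple so callers cannot mutate the internal list.
        return tuple(self.__toppings)


pizza = Pizza(15, ["Mozzarella", "Pepperoni"])
pizza.add_topping("Tomato Sauce")
print(pizza.toppings)
```

Callers can read the toppings but can only change them through `add_topping`, which is the one place the sauce rules need to be understood.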
Creating User-Defined Types
Sub-typing
Sub-typing is a relationship between
types
A sub-type has all the same behaviors as a
super-type (it may also customize some
behaviors)
Inheritance
class Rectangle:
    # Single leading underscore: double-underscore names are mangled per
    # class, so Square could not reach Rectangle's __height and __width
    def __init__(self, height: int, width: int):
        self._height = height
        self._width = width

    def set_width(self, width: int):
        self._width = width

    def set_height(self, height: int):
        self._height = height

    # ... snip getters ...

class Square(Rectangle):
    def __init__(self, side_length: int):
        super().__init__(side_length, side_length)

    def set_height(self, side_length: int):
        self._height = side_length
        self._width = side_length

    def set_width(self, side_length: int):
        self._height = side_length
        self._width = side_length
What I've just shown you has a very subtle
error.
FOOD_TRUCK_AREA_SIZES = [
    Rectangle(1, 20),
    Rectangle(5, 5),
    Rectangle(20, 30)
]
Is a square a rectangle?
Yes, a square is a rectangle (geometrically
speaking)
Is a square substitutable for a rectangle?
# Rectangle(5, 5) swapped for Square(5)
FOOD_TRUCK_AREA_SIZES = [
    Rectangle(1, 20),
    Square(5),
    Rectangle(20, 30)
]
What can go wrong?
def double_food_truck_area_widths():
    for ft_shape in FOOD_TRUCK_AREA_SIZES:
        old_size = ft_shape.get_width()
        old_height = ft_shape.get_height()
        # What happens when this is a square?
        ft_shape.set_width(old_size * 2)
        # Is this a reasonable assert?
        assert ft_shape.get_height() == old_height
Developers will write code based on the
constraints of the superclass
Do not let subclasses violate those constraints
Someone changing a super-class should not
need to know about all possible sub-classes
Liskov Substitution Principle
Substitutability
● Do not strengthen pre-conditions
● Do not weaken post-conditions
● Do not raise new types of exceptions
○ Looking at you, NotImplementedError
● Overridden functions almost always should call super()
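The weakened post-condition in the Square example can be demonstrated directly. This is a sketch: the getters are filled in as assumptions (the slides snip them), and `doubling_width_keeps_height` is a hypothetical helper standing in for the food truck loop.

```python
class Rectangle:
    def __init__(self, height: int, width: int):
        self._height = height
        self._width = width

    def get_width(self) -> int:
        return self._width

    def get_height(self) -> int:
        return self._height

    def set_width(self, width: int) -> None:
        # Post-condition callers rely on: height is unchanged.
        self._width = width

    def set_height(self, height: int) -> None:
        self._height = height


class Square(Rectangle):
    def __init__(self, side_length: int):
        super().__init__(side_length, side_length)

    def set_width(self, side_length: int) -> None:
        # Weakens the post-condition: height changes too.
        self._height = side_length
        self._width = side_length

    def set_height(self, side_length: int) -> None:
        self._height = side_length
        self._width = side_length


def doubling_width_keeps_height(shape: Rectangle) -> bool:
    # Code written against Rectangle's contract only.
    old_height = shape.get_height()
    shape.set_width(shape.get_width() * 2)
    return shape.get_height() == old_height


print(doubling_width_keeps_height(Rectangle(5, 5)))
print(doubling_width_keeps_height(Square(5)))
```

The helper never mentions `Square`, yet it breaks when handed one; that is the sense in which a square is a rectangle but cannot substitute for one here.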
Inheritance
Is-A
Can-Substitute-For-A
Types of sub-typing
● Inheritance
● Duck Typing
● Protocols
● Plug-ins
● etc.
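As one illustration of the options listed above, a protocol lets unrelated classes be used interchangeably without inheritance. This is a minimal sketch using `typing.Protocol` from the standard library; `HasArea`, `area`, and `total_area` are hypothetical names, not from the talk.

```python
from typing import Protocol


class HasArea(Protocol):
    # Structural type: anything with a matching area() method qualifies.
    def area(self) -> int: ...


class Rectangle:
    def __init__(self, height: int, width: int):
        self.height = height
        self.width = width

    def area(self) -> int:
        return self.height * self.width


class Square:
    # Not a subclass of Rectangle: it promises only what HasArea asks for,
    # so it inherits no set_width/set_height contract it cannot honor.
    def __init__(self, side_length: int):
        self.side_length = side_length

    def area(self) -> int:
        return self.side_length ** 2


def total_area(shapes: list[HasArea]) -> int:
    return sum(shape.area() for shape in shapes)


result = total_area([Rectangle(1, 20), Square(5), Rectangle(20, 30)])
print(result)
```

Because the promise is narrower, there are fewer ways for a future change to violate it.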
How you subtype will influence how easy it is
to make errors as code changes
The choices you make in code may reduce
future errors
The choices you make in code communicate
to the future
The abstractions you make in code
communicate to the future
Software Development is both Archaeology
and Time Travel
Tips for writing Robust Python
● Don't rely on developers' memory or their ability to find all
usages in a codebase
● Communicate intent through deliberate decisions in your
codebase
● Make it hard for developers to do the wrong thing
● Make them succeed by default
● Use tooling to catch errors when they do happen
You have a duty to deliver value in a timely
manner
You have a duty to make it easy for future
collaborators to deliver value in a timely
manner
After all, one day you'll step into a project that
is legacy
How do you want that codebase to look?
Robust Python
https://learning.oreilly.com/get-learning/?code=ROBUSTP21
Contact
Twitter: @PatViaforever
Blog: https://patviafore.com
Contracting/Consulting through Kudzera, LLC
https://kudzera.com
E-mail: pat@kudzera.com
