Days?
Days?
Months?
Days?
Months?
Years?
What about Decades?
Code will live a long time
Code will outlive your time working on it
Code will become legacy
Legacy Code
A codebase where you do not have direct communication with
the original authors
Why does this matter?
YOU
YOU
Tasks completed in
a month
Fast-forward a few years........
?
Future
Collaborator
Tasks completed in
a month
Future
Collaborator
Tasks completed in
a month
Future
Collaborator
Tasks completed in
a month
You have a duty to deliver value in a timely
manner
You have a duty to make it easy for future
collaborators to deliver value in a timely manner
Make it easy for your future collaborator
Do you want future collaborators to thank you
for your foresight ........
.......or curse your name?
Robust Python
Building a Vocabulary with User-Defined Types
Patrick Viafore
HSV.py Lunchtime Talk
4/18/2023
A Robust Codebase
A robust codebase is full of code that is clean and
maintainable, and is resilient and error-free in spite of
constant change
How do you build your codebase without
knowing what's in the future?
Legacy Code
A codebase where you do not have direct communication with
the original authors
Your codebase is your most effective tool of
communication and collaboration with other
developers
Software Engineering is simultaneously
archaeology and time-travel
Let your codebase be your Rosetta Stone
Communicating
Intent
Reduce Errors
Improve Future
Development
https://learning.oreilly.com/get-learning/?code=ROBUSTP21
Typechecking User Defined Types
Extensibility Building a Safety Net
Typechecking User Defined Types
Extensibility Building a Safety Net
User-defined types are a way for you to define
your own vocabulary
The abstractions you choose communicate to
the future
def print_receipt(
order: Order,
restaurant: tuple[str, int, str]):
total = (order.subtotal *
(1 + tax[restaurant[2]]))
print(Receipt(restaurant[0], total))
def print_receipt(
order: Order,
restaurant: tuple[str, int, str]):
total = (order.subtotal *
(1 + tax[restaurant[2]]))
print(Receipt(restaurant[0], total))
def print_receipt(
order: Order,
restaurant: Restaurant):
total = (order.subtotal *
(1 + tax[restaurant.city]))
print(Receipt(restaurant.name, total))
User-defined types unify mental models
User-defined types make it easier to reason
about a codebase
User-defined types make it harder to make
errors
I want to focus on the "why" we use user-defined
types, not "how" to create them
Enumerations
MOTHER_SAUCES = ("Béchamel",
"Velouté",
"Espagnole",
"Tomato",
"Hollandaise")
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce(MOTHER_SAUCE[0],
["Onion"])
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
# What was MOTHER_SAUCE[0] again?
create_daughter_sauce(MOTHER_SAUCE[0],
["Onion"])
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce(MOTHER_SAUCE[0],
["Onion"])
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce("Bechamel",
["Onion"])
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce("BBQ Sauce",
["Onion"])
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce("BBQ Sauce",
["Onion"])
Wrong
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce("Bechamel",
["Onion"])
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce("Béchamel",
["Onion"])
Strings can be anything
Restrict your choices with enumerations
from enum import Enum
class MotherSauce(Enum):
BÉCHAMEL = "Béchamel"
VELOUTÉ = "Velouté"
ESPAGNOLE = "Espagnole"
TOMATO = "Tomato"
HOLLANDAISE = "Hollandaise"
MotherSauce.BÉCHAMEL
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce(MOTHER_SAUCE[0],
["Onion"])
def create_daughter_sauce(
mother_sauce: str,
extra_ingredients: list[str]):
# ...
create_daughter_sauce(MOTHER_SAUCE[0],
["Onion"])
def create_daughter_sauce(
mother_sauce: MotherSauce,
extra_ingredients: list[str]):
# ...
create_daughter_sauce(MotherSauce.BÉCHAMEL,
["Onion"])
Catch mistakes with static analysis
Use enumerations to prevent collaborators from
using incorrect values
Use enumerations to simplify choices
Prevent bugs
What about composite data?
Data Classes
Author's Name
Recipe
Ingredient List
Author's Life
Story
# of Servings
Recipe Name
Online Recipe
Author's Name
Recipe
Ingredient List
Author's Life
Story
# of Servings
Recipe Name
Data classes represent a relationship between
data
@dataclass
class OnlineRecipe:
name: str
author_name: str
author_life_story: str
number_of_servings: int
ingredients: list[Ingredient]
recipe: str
recipe = OnlineRecipe(
"Pasta With Sausage",
"Pat Viafore",
"When I was 15, I remember ......",
6,
["Rigatoni", ..., "Basil", "Sausage"],
"First, brown the sausage ...."
)
recipe.name
>>> "Pasta With Sausage"
recipe.number_of_servings
>>> 6
Data classes represent heterogeneous data
Heterogeneous data
● Heterogeneous data is data that may be multiple different
types (such as str, int, list[Ingredient], etc.)
● Typically not iterated over -- you access a single field at a time
# DO NOT DO THIS
recipe = {
"name": "Pasta With Sausage",
"author": "Pat Viafore",
"story": "When I was 15, I remember ....",
"number_of_servings": 6,
"ingredients": ["Rigatoni", ..., "Basil"],
"recipe": "First, brown the sausage ...."
}
# is life story the right key name?
put_on_top_of_page(recipe["life_story"])
# What type is recipe?
def double_recipe(recipe: dict):
# .... snip ....
Any time a developer has to trawl through the
codebase to answer a question about data, it
wastes time and increases frustration
This will create mistakes and incorrect
assumptions, leading to bugs
recipe: OnlineRecipe = create_recipe()
# type checker will catch problems
put_on_top_of_page(recipe.life_story)
def double_recipe(recipe: OnlineRecipe):
# .... snip ....
Use data classes to group data together and
reduce errors when accessing
You communicate intent and prevent future
developers from making errors
Data classes aren't appropriate for all
heterogeneous data
Invariants
Invariants
● Fundamental truths throughout your codebase
● Developers will depend on these truths and build
assumptions on them
● These are not universal truths in every possible system, just
your system
Invariants
● Sauce will never be put on top of other toppings
(cheese is a topping in this scenario).
● Toppings may go above or below cheese.
● Pizza will have at most only one sauce.
● Dough radius can be only whole numbers.
● The radius of dough may be only between 15 and 30 cm
@dataclass
class Pizza:
radius_in_cm: int
toppings: list[str]
pizza = Pizza(15, ["Tomato Sauce",
"Mozzarella",
"Pepperoni"])
# THIS IS BAD!
pizza.radius_in_cm = 1000
pizza.toppings.append("Alfredo Sauce")
Classes
@dataclass
class Pizza:
radius_in_cm: int
toppings: list[str]
class Pizza:
def __init__(self, radius_in_cm: int,
toppings: list[str])
assert 15 <= radius_in_cm <= 30
sauces = [t for t in toppings
if is_sauce(t)]
assert len(sauces) <= 1
self.__radius_in_cm = radius_in_cm
sauce = sauces[:1]
self.__toppings = sauce + 
[t for t in toppings if not is_sauce(t)]
class Pizza:
def __init__(self, radius_in_cm: int,
toppings: list[str])
assert 15 <= radius_in_cm <= 30
sauces = [t for t in toppings
if is_sauce(t)]
assert len(sauces) <= 1
self.__radius_in_cm = radius_in_cm
sauce = sauces[:1]
self.__toppings = sauce + 
[t for t in toppings if not is_sauce(t)]
INVARIANT
CHECKING
# Now an exception
pizza = Pizza(1000, ["Tomato Sauce",
"Mozzarella",
"Pepperoni"])
class Pizza:
def __init__(self, radius_in_cm: int,
toppings: list[str])
assert 15 <= radius_in_cm <= 30
sauces = [t for t in toppings
if is_sauce(t)]
assert len(sauces) <= 1
self.__radius_in_cm = radius_in_cm
sauce = sauces[:1]
self.__toppings = sauce + 
[t for t in toppings if not is_sauce(t)]
class Pizza:
def __init__(self, radius_in_cm: int,
toppings: list[str])
assert 15 <= radius_in_cm <= 30
sauces = [t for t in toppings
if is_sauce(t)]
assert len(sauces) <= 1
self.__radius_in_cm = radius_in_cm
sauce = sauces[:1]
self.__toppings = sauce + 
[t for t in toppings if not is_sauce(t)]
"Private"
Members
# Linters will catch this error
# Also a runtime error
pizza.__radius_in_cm = 1000
pizza.__toppings.append("Alfredo Sauce")
Classes create invariants that developers cannot
easily modify
Classes must always preserve these invariants
class Pizza:
# ... snip ...
def add_topping(self, topping: str):
if is_sauce(topping) and self.has_sauce():
raise TooManySaucesError()
if is_sauce(topping):
self.__toppings.insert(0, topping)
else:
self.__toppings.append(topping)
Give future collaborators solid classes to reason
upon
Classes allow you to group inter-related data
and preserve invariants across their lifetime
Creating User-Defined Types
The abstractions you choose communicate to
the future
API
Methods
@dataclass
class OnlineRecipe:
name: str
author_name: str
author_life_story: str
number_of_servings: int
ingredients: list[Ingredient]
recipe: str
@dataclass
class OnlineRecipe:
...
def condense(self):
# ...
def scale_to(self, servings: int)
# ...
The Paradox of Code Interfaces
You have one chance to get your interface
right....
.....but you won't know it's right until it's used.
What happens if you don't get it right
When it all goes wrong
● Duplicated functionality
● Broken mental models
● Reduced testing
Test-Driven Development
Test-Driven Development
Test-Driven Development
README-Driven Development
Usability Testing
Natural Design
Magic Methods
@dataclass
class Ingredient:
name: str
units: ImperialMeasure # cup, tbsp, or tsp
quantity: int
class Recipe:
...
def add_ingredient(self, ing: Ingredient):
if self.contains(ing):
???
else:
self.ingredients.append(ingredients)
class Recipe:
...
def add_ingredient(self, ing: Ingredient):
matched = self._find_ingredient(ing)
if matched:
matched += ing
else:
self.ingredients.append(ingredients)
class Ingredient:
def __add__(self, rhs: Ingredient):
assert self.name == rhs.name
converted = rhs.convert(self.units)
return Ingredient(self.name,
self.units,
(self.quantity +
converted.quantity))
Context Managers
def reserve_table(table: Table):
assert not table.is_reserved()
table.hold()
wait_for_user_confirmation()
if is_confirmed():
table.reserve()
table.remove_hold()
def reserve_table(table: Table):
assert not table.is_reserved()
table.hold()
wait_for_user_confirmation()
if is_confirmed():
table.reserve()
table.remove_hold()
def reserve_table(table: Table):
assert not table.is_reserved()
with table.hold():
wait_for_user_confirmation()
if is_confirmed():
table.reserve()
from contextlib import contextmanager
class Table:
...
@contextmanager
def hold(self):
...
@contextmanager
def hold(self):
# .. hold logic ...
try:
yield
finally:
table.remove_hold()
Remove friction in your code
Sub-typing
Sub-typing is a relationship between types
A sub-type has all the same behaviors as a
super-type (it may also customize some
behaviors)
Inheritance
class Rectangle:
def __init__(self, height: int, width: int):
self.__height = height
self.__width = width
def set_width(self, width: int):
self.__width = width
def set_height(self, height: int):
self.__height = height
# ... snip getters ...
class Square(Rectangle):
def __init__(self, side_length: int):
self.set_height(side_length)
def set_height(self, side_length: int):
self.__height = side_length
self.__width = side_length
def set_width(self, side_length: int):
self.__height = side_length
self.__width = side_length
What I've just shown you has a very subtle error.
FOOD_TRUCK_AREA_SIZES = [
Rectangle(1, 20),
Rectangle(5, 5),
Rectangle(20, 30)
]
Is a square a rectangle?
Yes, a square is a rectangle (geometrically
speaking)
Is a square substitutable for a rectangle?
FOOD_TRUCK_AREA_SIZES = [
Rectangle(1, 20),
Rectangle(5, 5),
Rectangle(20, 30)
]
FOOD_TRUCK_AREA_SIZES = [
Rectangle(1, 20),
Square(5),
Rectangle(20, 30)
]
What can go wrong?
def double_food_truck_area_widths():
for ft_shape in FOOD_TRUCK_AREA_SIZES:
old_size = ft_shape.get_width()
ft_shape.set_width(old_size * 2)
def double_food_truck_area_widths():
for ft_shape in FOOD_TRUCK_AREA_SIZES:
old_size = ft_shape.get_width()
# What happens when this is a square?
ft_shape.set_width(old_size * 2)
def double_food_truck_area_widths():
for ft_shape in FOOD_TRUCK_AREA_SIZES:
old_size = ft_shape.get_width()
old_height = ft_shape.get_height()
ft_shape.set_width(old_size * 2)
# Is this a reasonable assert?
assert ft_shape.get_height() == old_height
Developers will write code based on the
constraints of the superclass
Do not let subclasses violate those constraints
Someone changing a super-class should not
need to know about all possible sub-classes
Liskov Substitution Principle
Substitutability
● Do not strengthen pre-conditions
● Do not weaken post-conditions
● Do not raise new types of exceptions
○ Looking at you, NotImplementedError
● Overridden functions almost always should call super()
Inheritance
Is-A
Is-A
Can-Substitute-For-A
Types of sub-typing
● Inheritance
● Duck Typing
● Protocols
● Plug-ins
● etc.
How you subtype will influence how easy it is to
make errors as code changes
Why did I give this talk?
Who Am I?
Principal Software Engineer
Cloud Software Group
Owner of Kudzera, LLC
Author of Robust Python
Organizer of HSV.py
Contact Me
@PatViaforever
pat@kudzera.com
User-Defined Types.pdf

User-Defined Types.pdf