Add missing type annotations to a large Python codebase is not easy. The major challenges include: limited developer time, tons of missing types, code ownership, and active development. We solved the problem by building an automated refactoring pipeline that run CircleCI jobs to create incremental Github pull requests to backfill missing types using heuristic rules and MonkeyType. The refactor apps use LibCST to modify Python syntax tree. Changes are split into small reviewable pull requests and assigned to code owners to review. So far, the work has added type annotations to more than 45,000 Python functions and saved tons of engineering efforts.
5. A Large Python Codebase
Python code
1.8 million lines
27,000 files
120,000 functions
~200 active developers
Lots of TypeError,
AttributeError, ValueError
6. Type Annotation and Mypy
Mypy: Argument 1 to "add" has incompatible type "str"; expected "int"
7. Automated Refactoring
Automated code changes for fixing large scale tech
debt (Code Formatting, Type Annotation, Dead Code
Cleanup)
LibCST Features:
● Concrete Syntax Tree
● Transformer and Matcher API
● Metadata with static analysis
Recommended tool: LibCST
A library for modifying Python code easily.
16. MonkeyType: add missing types based on runtime data
1. Collect types by running Python program.
2. Aggregate collected types and apply to the code using LibCST.
Run test cases and apply types:
17. Make it more fun!
Automated weekly updates and leaderboards!