Type Annotations in Python: Whats, Whys and Wows!


  1. Type Annotations in Python: Whats, Whys & Wows. Andreas Dewes (@japh44), EuroPython 2017, Rimini
  2. Outline: Explain why type annotations are interesting and how they came to be. Show you how you can use them in your own code base. Analyze how people actually use them and show you what else you can do with them.
  3. Motivation: When We Discover Bugs. Stages by position in the software life cycle: Formal Proof of Correctness; Design Review; Code Reviews; Static Analysis; Compiler Errors via Type Checker (enter: type hinting); Unit / Integration / … Testing; We Don't, Our Customer Does.
  4. The Python Way: Gradual Typing. Diagram labels: unannotated code, annotated code, external code, our code, external type information.
  5. History of Type Hints in Python (https://www.python.org/dev/peps/****). 2006, Python 3.0: PEP 3107 – Function Annotations. 2014, Python 3.5: PEP 482 – Literature Overview, PEP 483 – Theory of Type Hints, PEP 484 – Type Hints. 2016, Python 3.6: PEP 526 – Variable Annotations. 2017: PEP 544 – Protocols: Structural Subtyping. Legend: Draft, Informal, Accepted / Final.
  6. Annotation Syntax in Python: return type annotation, argument annotation, variable annotation (Python 3.6 only).
  7. Architecture of Type Hinting in Python. Python interpreter: just stores type annotations in a special __annotations__ variable. No runtime effects otherwise! The "typing" module: allows us to specify the types that we want to annotate our code with (required for non-standard types and advanced use cases). External tools (e.g. mypy): use annotations to perform type checking (or other functionality).
  8. Getting Started With Type Hints: MyPy • Originally written by Jukka Lehtosalo, now also strongly pushed by Dropbox and Guido van Rossum • Easy to install & use: > pip install mypy > mypy [file to check] … http://mypy-lang.org/
  9. Example Code Base: Flor
  10. Our initial code • Small but functional codebase: less than 200 lines of code • No external dependencies: good as an example • Not many "exotic" types: easy to annotate • Compatible with Python 2+3: good for testing different approaches
  11. A Test Script
  12. Adding Type Hints • Go through the code function by function, adding hints to all arguments and return types • Possibly add hints to ambiguous variable initializations (if needed) • Import and use necessary types from the typing module • Try to make mypy happy
  13. I broke mypy! Quick fix: bytes → _bytes
  14. Running MyPy: It (Finally) Works! Argument 1 to "BloomFilter" has incompatible type "float"; expected "int" (…) Argument 1 to "add" of "BloomFilter" has incompatible type "str"; expected "bytes". Unsupported operand types for + (List[int] and "str")
  15. But … now we lost Python 2 compatibility
  16. Second Approach: Type Comments • Instead of writing hints as code, we add them as comments • A special syntax tells mypy to treat them as type hints • We still need to import the "typing" module (there is a workaround though)
  17. Nice! But what about code we can't change?
  18. Third Approach: Stub Files (.pyi). Mypy will look for stub files in several places (search path, current directory, typeshed, …). If it finds a .pyi file and a .py file, it will only load the .pyi!
  19. Building a Stub File • As before, start with the code • Remove all actual values and function bodies, just leaving the signatures • Use ellipsis (…) to indicate missing parts • Add type hints • Think "header files" for Python
  20. Comparing Our Approaches (commented vs. inline): https://travis-ci.org/DCSO/flor https://stackoverflow.com/questions/43516780/adding-type-information-without-dependency-on-typing-module
  21. Pros & Cons. Inline: + The "canonical" way of using type hints; + Easy to read; + Code and hints are kept in one place; - Only compatible with Python ≥ 3.3 (or ≥ 3.6 if using variable annotations). Type Comments: + Keeps code compatible with Python ≥ 2.7; + Allows variable annotations regardless of Python version; - Ugly (?); - Still requires importing the "typing" module (but there is a workaround). Stub Files: + Does not modify original source at all; + Allows use of latest features regardless of Python version; - Duplicates maintenance effort; - Does not (yet) allow checking of the actual code against the stubs.
  22. There's (Much) More To Type Hints! • Generics • Type variables • Classes • Generators • …
  23. Are People Actually Using This? Let's Check! We download code from the top 1000 Python repositories on GitHub and check it for any kind of type hinting: inline annotations, type comments and stubs. https://github.com/adewes/type-annotations-in-the-wild
  24. 103 projects with at least 1 type annotation; 53 projects with at least 10; 24 projects with at least 100. Chart legend: inline, comments, pyi files.
  25. Top Python Repositories with Type Hints
  26. Function & Variable Annotations: Not Only For Type Hinting. > foo.__annotations__ {'x': int, 'return': bool}
  27. Example: Contracts in Python (not saying this is a good idea, just that it works…) https://gist.github.com/adewes/b87c8adc95e768ebf6366130ad6d85a7
  28. Summary: Type hinting works & makes your code more robust (for you and for others). You can use it now already (regardless of whether you use Python 2 or 3). Annotations can do more than type hinting (but think about it twice).
  29. Thanks! Slides: https://slideshare.com/japh44 Me: @japh44 andreas@7scientists.com License: CC BY-NC 3.0: https://creativecommons.org/licenses/by-nc/3.0/en/
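
Slides 26 and 27 point out that annotations are ordinary data stored in `__annotations__` and can be used for more than type hints. The following is a minimal sketch of the contract idea under stated assumptions: the `enforce` decorator is a hypothetical name, it is not taken from the linked gist, and it only understands annotations that are plain classes (no `typing` generics).

```python
import functools
import inspect


def enforce(func):
    """Hypothetical decorator: check plain-class annotations at call time."""
    signature = inspect.signature(func)
    hints = func.__annotations__  # the dict the interpreter stores for us

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; anything else is ignored.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError("%s must be %s" % (name, expected.__name__))
        result = func(*args, **kwargs)
        expected = hints.get("return")
        if isinstance(expected, type) and not isinstance(result, expected):
            raise TypeError("return value must be %s" % expected.__name__)
        return result

    return wrapper


@enforce
def foo(x: int) -> bool:
    return x > 0


print(foo(3))  # True
```

Calling `foo("3")` would raise a `TypeError` here, which is exactly the kind of runtime effect that plain type hints do not have. As the slide says, this shows that it works, not that it is a good idea.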

Editor's Notes

  • You might have heard of type annotations or already used them.
    Type annotations are actually an old feature, but still not widely used.
    There's a lot of activity in the field now, so it's interesting to have a look at what we can do with them.
    Personally, I have worked in static analysis for the last three years, so I'm really interested in anything related to code quality. This talk was an opportunity for me to have a closer look at the type annotation and type hinting ecosystem in Python and see what has changed in recent years (spoiler alert: a lot).
  • First, I'm going to explain to you why we need type annotations at all and how they actually made their way into Python.
    Then, I'll try to show you how you can use them in your own code base. For that we're going to integrate type hinting into an existing project and look at different strategies for doing so.
    Then, I'll try to find type hints in the wild to see if (and how) people are already using them in large Python projects.
    Finally, I'll quickly show you what else you can do with annotations if you're not interested in type hints.
  • Now your first question is probably: Why do we need type annotations at all?!? After all, our code is running fine without them!
    To answer that, let's think a bit about when we find bugs in our code. The arrow tries to show where we are in the software development life cycle (obviously this is only one way of seeing software development).
    So what's the first opportunity for us to catch bugs? Well, when we design or write the code of course: In a perfect world, we could just anticipate all use cases and boundary conditions of the software that we develop, and then design it such that it fits. Problem solved!
    Of course that's not what happens: Though there are some tools available that can help us to formally prove that our software is correct, the use cases for these tools do not include most everyday software development tasks.
    The next stage at which we might catch bugs is when the code is already written: We can have other people read it and try to find bugs in it, which is actually quite effective, or we can have automated tools that go through the code and try to find problems in there. For Python there are several such tools available (I wrote one myself), so if you don't use them yet, I recommend that you do.
    If we weren't using Python but another language like C++ or Rust, we would have a compiler that would look at our program and tell us if something doesn't fit together. Most compilers do that by looking at the information about types which we put in the code. Unfortunately, the Python interpreter doesn't know much about types and is very lenient when interpreting code (you probably noticed this when changing code and e.g. forgetting an import statement).
    So if the compiler or interpreter doesn't find the bugs, what can we do? Well, we can have our code do some exercises for us in the form of unit testing, where we try to make sure that it behaves as intended and that at least every line of code runs. Again, this is common best practice and if you're not doing this you should start.
    Finally of course, if we don't find our bugs, the customer might: I call that the "banana strategy" because it's similar to shipping bananas: When they're sent to the customer they're actually still green and unripe, and they only ripen at the customer site as they age. I'm not saying this is a bad thing necessarily, but it usually isn't what we want.
    Turns out the Python community has a similar view on this, so there was an effort to implement something for Python that is similar to a type-checking compiler, and that's what type hinting is about.
  • So how can we implement type hinting in Python? Well, we could of course create a new version of Python that is fully typed.
    While this is doable, it would probably lead to a fork of the language: there are probably billions of lines of untyped Python code, and demanding that people rewrite their code with types does not seem like a good strategy, given how long it already took to convince most people to make some minor adaptations to their code to support Python 3.
    So the Python community took a different approach in the form of "gradual typing": Instead of forcing people to annotate the whole code, let them just annotate the parts where they think it makes sense.
    This leads to another problem of course, because now you have both typed and untyped code mixed in your own projects and in external code that you use.
    So what you also need is a way to add types to external code, which is another ingredient in the system that we'll discuss later.
  • So this all sounds nice, but how did we actually get there? Let's have a quick look at the history of type hinting and annotations in Python!
    Well, the first thing that we need in order to do type checking is of course: Types in our code! Python didn't have a way to add this information to the code, which is why the syntax had to be changed. This was done via PEP 3107.
    If you want to learn more about that, I recommend reading the PEPs (Python Enhancement Proposals), which are the official documents through which new things get added to Python. You can view a PEP by putting its number shown above into this URL.
    You can see this PEP is quite old and was written so that the changes could be included in Python 3. As there wasn't enough time (or the will) to actually work out a type system at this time, the annotations were added without any further information about what to do with them.
    Now fast forward to 2014: After almost 10 years of type annotations, the community finally decided to write a formal spec for the annotations to decide what to actually do with them. This is PEP 484, which didn't introduce any new syntax this time (apart from one small thing), but which instead formalized the use of type hinting via a new module. There are two very interesting PEPs, 482 and 483, which explain the ideas behind this, BTW.
    So you could think that was it, but no! Actually type hinting is still evolving: In 2016 there was PEP 526, which again introduced a new syntax element, this time into Python 3.6. This was necessary because it turned out the initial idea for type annotations wasn't sufficient for all use cases.
    And finally there's PEP 544, which introduces a second approach to typing called "structural subtyping", which I'm pretty excited about but which we won't talk about today.
  • Ok, you say "please show me the syntax already!", so here it is in all its glory.
    What are we seeing here? Three types of annotations:
    Argument annotations, which annotate what goes into a function.
    Return type annotations, which annotate what comes out of a function.
    Variable annotations (the Python 3.6 addition), which annotate variables created inside a module, class or function body.
    Now you can like this syntax or not (personally I like it), but you have to admit it's pretty unobtrusive.
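
The slide's code is not part of this transcript, so here is a minimal sketch of the three annotation kinds described above. The function and variable names are made up for illustration, and the variable annotations need Python 3.6 or later.

```python
from typing import List

# Variable annotation at module level (PEP 526 syntax).
default_greeting: str = "Hello"


def greet(names: List[str], punctuation: str = "!") -> str:
    # "names" and "punctuation" carry argument annotations;
    # "-> str" is the return type annotation.
    message: str = default_greeting + ", " + " and ".join(names)
    return message + punctuation


print(greet(["Ada", "Guido"]))  # Hello, Ada and Guido!
```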
  • But let's take a step back: How do these type annotations actually play together and produce something useful?
    For instance: What happens if Python runs a program with type annotations? Answer: Nothing special! Python just takes the annotations and puts them in a magic variable called __annotations__, but otherwise ignores them when running your code. This means adding type annotations to your code doesn't change the runtime behavior in any way.
    Second: What is this strange module that we import? Well, to express type information we need a system, and as Python didn't have one the community created one in the form of the "typing" module. The module lets us specify simple and complex type information in our program, and is quite powerful. So whenever you see this import at the top of a module you can be pretty sure that the code uses type hinting.
    Okay, but who does the actual type checking work, if the Python interpreter doesn't do it? The answer: External tools! There are several of them available, the most popular probably being mypy, but we also have IDEs like PyCharm that can use the type hinting information.
    The choice to not have the interpreter do the type checking might seem odd, but it's a sensible strategy in my opinion.
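
A small sketch of the division of labour described above: the interpreter only records the annotations, and nothing checks them at run time. The function name is an assumption chosen for illustration.

```python
def is_positive(x: int) -> bool:
    return x > 0


# The interpreter stores the hints on the function object...
print(is_positive.__annotations__)

# ...but does not enforce them: passing a float runs without complaint.
# Only an external tool such as mypy would flag this call.
print(is_positive(2.5))  # True
```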
  • So for this talk we're going to have a look at mypy.
    It was originally developed by Jukka Lehtosalo with ideas from his PhD thesis.
    As far as I know, Jukka now works at Dropbox, which heavily uses type hints in its code base and pushes mypy development under the lead of Guido van Rossum, which probably shows you that they are really serious about this.
    As you can see from their website, which I copied here, the tool is still in beta and under heavy development.
    It's pretty easy to install and use, as we're going to see.
  • As an example, we're going to add type hints to a small open-source Python project I developed for the German Cyber Security Organization (DCSO), called Flor.
    Flor is the German translation of Bloom, and as you might guess it implements a so-called Bloom filter in Python.
    A Bloom filter is a data structure to which you can feed individual string or bytes values and have the filter remember them. You can then later ask the filter "Have you seen this value?" and it will answer "yes" or "no". There's a small chance it will misremember and tell you "yes" even if it didn't see the value before, but if it really saw it, it will never tell you "no".
    We use this e.g. for threat intelligence, where we have lists of millions of domains that distribute or control malware. We can put all these domains in a list, put it on a computer and have the Python script check if a given domain that the host wants to connect to is in the list, which is quite handy.
  • Why did I choose that code base? Well, on one hand it's really small, with less than 200 lines of code, but it actually does something useful, which is always more interesting than having a toy project to check this.
    It doesn't have any external dependencies, which makes it easy to annotate. Again, I don't want to give you the impression that all code is that easy to annotate (it isn't), but if we want to make progress we need something simple.
    There are also not many exotic types in there, which means I don't have to explain the intricacies of the typing module to annotate the code.
    And finally: The code is compatible with Python 2 and Python 3, which makes it an ideal candidate to show you different strategies for annotating the code while maintaining that compatibility.
    The code itself is quite simple. It contains only one class, the Bloom filter, which we initialize with two values N and p: N is the maximum number of elements that "fit" into the filter, and p is the probability of the filter misremembering a value it hasn't seen when it's full (not so different from humans).
    There are then some functions to check if a given value is in the filter, and to load the filter from a file. There's some more stuff that I don't show here, but I think you get the idea.
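
To make the data structure concrete, here is a minimal annotated Bloom filter sketch. It is not Flor's actual code: the sizing formulas are the textbook ones, the SHA-256 based hashing and all method names are assumptions made for illustration, and unlike Flor this sketch targets Python 3.6+ only.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    def __init__(self, n: int, p: float) -> None:
        self.n = n  # maximum number of elements
        self.p = p  # false-positive probability when the filter is full
        # Textbook sizing: m bits and k hash functions for n items at rate p.
        self.m: int = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.k: int = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, value: bytes) -> List[int]:
        # Derive k bit positions by salting one hash with the index i.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + value).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, value: bytes) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, value: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


blocklist = BloomFilter(1000, 0.01)
blocklist.add(b"malware.example")
print(b"malware.example" in blocklist)  # True: added values are never missed
```

A value that was never added will usually be reported as absent, but a "yes" can occasionally be a false positive, which is the trade-off the notes describe.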
  • Okay, to test the type hints that we'll introduce, we need a test script.
    The script has several errors that you can see above.
    It illustrates how type hints can help to make external code more robust. What it doesn't show is how type hints help to make the code itself more robust against errors, so I just want to stress again that this is only one of the main motivations for using type hints, albeit an important one: making the code harder for your users to use in the wrong way.
  • So let's annotate the code! To do that, we go through it…
    Then, the only thing we need to do is run mypy with our test script:
    You can see I pass in this environment variable, which tells mypy where to look for dependencies (it has its own search path), and which Python version to use.
    So when I did that I thought "hah, this is easy Andreas, you got this!" and then ran the tool expecting to see the errors in the script pop up.
    What I saw instead was this, which was quite puzzling and really made me doubt whether I actually understand type hints.
    After some head scratching and debugging I found the answer!
  • I broke mypy!
    Turns out that it doesn't like it if you have instance variables that are named like builtin types.
    This is a nice example of "your debugging is only as good as your most stupid user", and it turns out I lowered that bar a bit for mypy :D
    Anyway, the guys there are amazing: I sent in the bug report on Monday and had a reply within 30 minutes (from GvR as well), and the bug was already fixed yesterday morning.
    So as you can see, while there are still some edge cases and problems in the code, they are working really hard to fix them.
    The quick fix here was to just change bytes to _bytes for now.
  • Okay, running this modified version finally worked! Hurray!
    What do we get as a result?
    As expected, mypy finds all three problems that we have put into our code, which would be hard (but not impossible) to spot for a static analyzer.
    This might seem simple, but think about it this way: By just adding a couple of annotations to our code, we made it much harder for our users (and ourselves) to use the code in the wrong way, which is really powerful.
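
The test script itself is not in this transcript, so the following sketch reconstructs the three kinds of mistakes named in the messages on slide 14. The `BloomFilter` class here is a hypothetical stand-in (it just stores values in a list), and `misuse` is deliberately never called, so the file still runs. A type checker such as mypy would be expected to flag the three commented lines.

```python
from typing import List


class BloomFilter:
    """Hypothetical stand-in with the same annotated interface."""

    def __init__(self, n: int, p: float) -> None:
        self.n = n
        self.p = p
        self.seen: List[bytes] = []

    def add(self, value: bytes) -> None:
        self.seen.append(value)


def misuse() -> None:
    # Never called at run time; it exists only for the type checker.
    bf = BloomFilter(1e6, 0.01)          # float passed where int is expected
    bf.add("example.com")                # str passed where bytes is expected
    fingerprints: List[int] = [1, 2, 3]
    print(fingerprints + "4")            # List[int] + str is unsupported


def correct_use() -> int:
    bf = BloomFilter(1000000, 0.01)
    bf.add(b"example.com")
    return len(bf.seen)


print(correct_use())  # 1
```

Note that `misuse` carries a return annotation on purpose: by default mypy does not check the bodies of functions that have no annotations at all.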
  • So now we could go home and push this to master, but if we did that it wouldn't take long for our customer complaints to come in, because some of them actually still need to use Python 2 (cough cough Red Hat Enterprise Linux anyone?).
    Alright, that's a bummer. But can we do something about that to make type hints work even in Python 2? Yes we can!
  • Mypy introduced a second way to add type hints: Type comments
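
A minimal sketch of the comment syntax, assuming a made-up helper function. The hints live in `# type:` comments, so the file contains no annotation syntax that Python 2 would reject, while the `typing` import still provides the names used in the comments.

```python
from typing import List  # names used inside the type comments


def chunk_lengths(values, limit):
    # type: (List[bytes], int) -> List[int]
    lengths = []  # type: List[int]
    for value in values:
        lengths.append(min(len(value), limit))
    return lengths


print(chunk_lengths([b"abc", b"abcdefgh"], 4))  # [3, 4]
```

At run time the comments are ignored entirely; only the type checker reads them.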
  • Alright, but now we have all our bases covered, haven't we?
    Actually no. What about code we can't change, like external libraries or things that are beyond our sphere of influence? Are we lost?
    Again: no!
  • There's a third way to introduce type hints, called stubs:
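
A sketch of what such a stub could look like, following the recipe on slide 19: signatures only, with an ellipsis in place of every body. The file name `flor.pyi`, the attribute names and the method names are assumptions for illustration, not Flor's real interface.

```python
# flor.pyi (hypothetical file name and contents)
from typing import BinaryIO


class BloomFilter:
    n: int
    p: float

    def __init__(self, n: int, p: float) -> None: ...
    def add(self, value: bytes) -> None: ...
    def __contains__(self, value: bytes) -> bool: ...
    def read(self, input_file: BinaryIO) -> None: ...
```

Because the stub is only read by the type checker, it can use annotation syntax even when the real module has to stay compatible with Python 2.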
  • So let's compare our approaches.
