A powerful unused feature of Python: function annotations.

Something I’ve always missed when using Python (and dynamically typed languages in general) is good tooling support. C# and Java have powerful IDEs that can improve your productivity significantly. Some people say that IDEs are a language smell. I disagree: IDEs are truly valuable tools, and the “nice language or IDE” choice is a false dilemma.

The problem with dynamically typed languages is that it’s impossible for the IDE to infer things about some parts of your code. For example, if you start typing this:

def myfunction(a, b):
    ...

It’s impossible for the editor to give you any hint about a or b.

I’ve been playing with Dart and TypeScript recently. These are languages that compile to JavaScript, and both try to improve tooling support. They’re interesting because, despite being dynamically typed languages, both implement optional type annotations. These serve no purpose other than helping editors and IDEs. Let me show you a simple example of how useful this can be. Consider the following JavaScript code:

function findTitle(title) {
	var titleElement = document.getElementById('title-' + title);
	return title;
}

var t = findTitle('mytitle');
t.innerHTML = 'New title';

The code has a small error that is not very easy to notice. Now let’s see the TypeScript Web Editor with the same code adding a single type annotation to findTitle:

[Image: the TypeScript web editor showing the code above with a type annotation on findTitle]

TypeScript found an error. Knowing that title is a string, it infers that findTitle returns a string too. Therefore t is a string, and strings don’t have an innerHTML property.

Early error detection is one advantage of good tooling support. Another is accurate code completion. With good code completion you don’t have to browse huge API docs looking for what you need. You can discover the API while you type and use automatic refactoring tools without worrying about breaking code.

[Image: typescript-small]

Anders Hejlsberg’s introduction video to TypeScript contains more interesting details about how annotations are really useful.

While playing with TypeScript I couldn’t stop thinking how cool it would be to have something like that in Python. Then I realized that Python had syntax for annotations years before TypeScript or Dart were even planned. PEP 3107 introduced function annotations in Python. Here is a small example:

def greet(name: str, age: int) -> str:
    return 'Hello {0}, you are {1} years old'.format(name, age)

Here I annotated the greet function with the types of each argument and of the return value. Python annotations are completely optional, and if you don’t do anything with them, they’re simply ignored. However, with a little magic, it’s possible to tell Python to check types at run-time:

>>> @typechecked
... def greet(name: str, age: int) -> str:
...     return 'Hello {0}, you are {1} years old'.format(name, age)
...
>>> greet(1, 28)
Traceback (most recent call last):
    ...
TypeError: Incorrect type for "name"
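To make the “magic” less mysterious, here is a minimal sketch of how such a decorator could be written with the standard inspect module. It is only an illustration under simple assumptions (annotations that are plain classes, no handling of *args or **kwargs), not the actual implementation of the typechecked decorator used above:

```python
import functools
import inspect


def _matches(value, expected):
    # Unannotated slots and annotations that are not classes are not checked.
    if expected is inspect.Signature.empty or not isinstance(expected, type):
        return True
    return isinstance(value, expected)


def typechecked(func):
    """Check annotated arguments and the return value with isinstance()."""
    signature = inspect.signature(func)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            parameter = signature.parameters[name]
            if parameter.kind in variadic:
                continue  # *args and **kwargs are skipped in this sketch
            if not _matches(value, parameter.annotation):
                raise TypeError('Incorrect type for "{0}"'.format(name))
        result = func(*args, **kwargs)
        if not _matches(result, signature.return_annotation):
            raise TypeError('Incorrect return type')
        return result

    return wrapper


@typechecked
def greet(name: str, age: int) -> str:
    return 'Hello {0}, you are {1} years old'.format(name, age)
```

Calling greet(1, 28) with this sketch raises a TypeError for "name" before the function body runs, while greet('Ann', 28) passes both the argument checks and the return check.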

Run-time type checking on its own is not very useful, because errors still only show up when the code runs. A static analyzer, however, could use the same information to report errors as you type. IDEs and code completion libraries such as Jedi could also use it to provide completion hints, just like TypeScript does.
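As a rough illustration of the static side, the sketch below reads annotations straight from source text with the standard ast module, without importing or running the code. The collect_annotations helper and the SOURCE string are made up for this example, and ast.unparse requires Python 3.9 or later. A real analyzer or completion library would do far more than this:

```python
import ast

SOURCE = '''
def greet(name: str, age: int) -> str:
    return 'Hello {0}, you are {1} years old'.format(name, age)

def myfunction(a, b):
    ...
'''


def collect_annotations(source):
    """Map each function name to its annotations, without running the code."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            hints = {}
            # Only ordinary positional parameters are inspected in this sketch.
            for arg in node.args.args:
                if arg.annotation is not None:
                    hints[arg.arg] = ast.unparse(arg.annotation)
            if node.returns is not None:
                hints['return'] = ast.unparse(node.returns)
            found[node.name] = hints
    return found


print(collect_annotations(SOURCE))
```

For greet the tool learns the type of every parameter, while for myfunction it learns nothing, which is exactly the situation described at the beginning of the post.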

Some people might say that this makes the language too verbose. People using dynamic languages often want concise code. But in practice, if you take a look at any medium to large Python project or library, chances are that you’ll find something like this:

def attach_volume(self, volume_id, instance_id, device):
    """
    Attach an EBS volume to an EC2 instance.

    :type volume_id: str
    :param volume_id: The ID of the EBS volume to be attached.

    :type instance_id: str
    :param instance_id: The ID of the EC2 instance to which it will
                        be attached.

    :type device: str
    :param device: The device on the instance through which the
                   volume will be exposted (e.g. /dev/sdh)

    :rtype: bool
    :return: True if successful
    """
    params = {'InstanceId': instance_id,
              'VolumeId': volume_id,
              'Device': device}
    return self.get_status('AttachVolume', params, verb='POST')

I took this code from the boto library, which annotates functions using docstrings and Sphinx. It’s a very common way of annotating public APIs. However, this method has some drawbacks. First, it’s really verbose and you repeat yourself a lot when writing code like this. Second, it’s harder to parse because there are several docstring formats (Sphinx, epydoc, pydoctor), so editors don’t offer code completion or early error checking. Third, it’s very easy to make mistakes that leave the docstrings out of sync with the code. In this particular example, if you ever run this function, you’ll notice that it returns a string, not a bool as the annotation suggests.

Google Closure uses a similar docstring approach for type annotations in JavaScript.

So, if people are already writing verbose docstrings to annotate functions, why not just use real function annotations? They’re completely optional and you don’t have to use them for non-public APIs or small scripts. They’re more concise, easier to process and easier to verify. Function annotations are only available in Python 3, you might say. However, there are some approaches to emulate them in Python 2.x using decorators, and even that is still way better than docstrings.
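One possible shape for the decorator approach is sketched below. The annotate helper is hypothetical (it is not taken from any particular library), and the function body is a stand-in for the boto method shown earlier. The decorator simply stores the same dictionary that Python 3 would build from the annotation syntax, so tools can read it from __annotations__ either way:

```python
def annotate(returns=None, **types):
    """Attach annotations without Python 3 syntax (hypothetical helper).

    Limitation of this sketch: a parameter named "returns" cannot be annotated.
    """
    def decorator(func):
        annotations = dict(types)
        if returns is not None:
            annotations['return'] = returns
        func.__annotations__ = annotations
        return func
    return decorator


@annotate(volume_id=str, instance_id=str, device=str, returns=bool)
def attach_volume(volume_id, instance_id, device):
    """Attach an EBS volume to an EC2 instance."""
    return True  # stand-in body, the real call is omitted in this sketch


print(attach_volume.__annotations__)
```

The type information now lives in one place next to the signature, and the docstring only needs to describe what the parameters mean.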

An interesting thing about Python annotations is that they don’t have to be types. In fact, you can use any Python expression as a function annotation. This opens up possibilities for a lot of interesting applications: type checking, automatic documentation, language bridges, object mappers, adaptation, design by contract, etc.
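A small sketch of what “any Python expression” means in practice. The names is_positive and scale are invented for this example. Python evaluates each annotation expression and stores the result in the function’s __annotations__ dictionary, and that is all it does with it:

```python
def is_positive(value):
    """A predicate used as an annotation (hypothetical example)."""
    return value > 0


def scale(values: [float],
          factor: is_positive,
          digits: {'min': 0, 'max': 10} = 2) -> list:
    return [round(value * factor, digits) for value in values]


# The annotations are ordinary objects: a list, a function and a dict.
annotations = scale.__annotations__
print(annotations['values'])
print(annotations['factor'](3))
print(annotations['digits'])

# Nothing is enforced unless some tool or decorator decides to use them.
print(scale([1, 2], -3))
```

The last call runs without complaint even though the factor is negative, which shows that giving annotations a meaning is entirely up to the tools that read them.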

The typelanguage library defines a whole language for communicating types. This language can be used with just string annotations. For example:

def get_keys(a_dict: '{str: int}') -> '[str]':
    ...
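To see what a library gets to work with in this case, the sketch below only inspects the function. It does not use the typelanguage API at all. The body of get_keys is filled in here just to make the example runnable:

```python
def get_keys(a_dict: '{str: int}') -> '[str]':
    return list(a_dict)


# The annotations are stored as plain strings; Python never interprets them.
for name, annotation in get_keys.__annotations__.items():
    print(name, repr(annotation))
```

Any meaning attached to '{str: int}' comes from whatever tool parses the string afterwards.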

The downside of this flexibility is that it causes some confusion in the community about how annotations should be used. A recent discussion on the python-ideas mailing list brought this problem to light.

Personally, I would love to see this feature used more in the Python community. It has a lot of potential. I started a small library to work with type annotations. It implements the typechecked decorator described above, and some other useful things like structural interfaces, unions and logic predicates that can be used as function annotations. It’s still very immature, but I would like to improve it in the future by adding function overloading and other features. A detailed description of the library probably deserves a post of its own. I would also love to hack Jedi to add some basic support for auto-completion based on annotations.


10 comments on “A powerful unused feature of Python: function annotations.”

  1. >The problem with dynamically typed languages is that it’s impossible for the IDE to infer things about some parts of your code.

    I assume you mean to imply “without running the code while you are typing it” ?

    Yes, because it’s mathematically impossible to do so in general (and to find out whether a given case is one of the impossible ones without running the program).

    >It’s impossible for the editor to give you any hint about a or b.

    Also, the language is designed in a way that it shouldn’t matter much what a and b are.

    >The code has a small error that is not very easy to notice.

    In Python, the unit test would find it in about 2ms. I’m aware that people don’t do tests much in Javascript…

    >You can discover the API while you type

    Since you can (and I do) write the Python program while it is running (for example in ipython, bpython, Idle etc) and save it to file when done, you are also able to discover the API while you type. dir(xxx), help(xxx).

    >and use automatic re-factor tools without worrying about breaking code.

    Tests…

    >Run-time type checking is not very useful though.

    It is useful to me, at least. Done automatically by Python due to its strong type system.

    >Some people might say that this makes the language too verbose.

    If non-optional, it limits the expressivity of the language, which is much much worse. Otherwise, it’s just verbose. But as long as they are optional and only used when needed, more power to you. Looks unpythonic to me, though.

    >if you take a look at any medium to large Python project or library, chances are that you’ll find something like this:

    While I agree that it is often like that, they rather should be doctests. Then they cannot get out of sync *and* they document how you should *use* it – rather than some tiny detail on what kind of thing the first parameter is.

    >So, if people are already writing verbose docstrings to annotate functions, why not just use real function annotations?

    How would writing a test as an annotation look?

    >Personally, I would love to see this feature more used in the Python community.

    Personally, I’d love to see people use tests for their intended purposes which is 1) making sure the system works as planned and 2) documenting what “as planned” is. The kind of pigeonhole micromanagement a type system usually does is so counterproductive…

    > logic predicates that can be used as function annotations.

    That’s actually nice.

    > by adding function overloading

    Why would you do that? Just add a new function with another name. Why the ambiguity?

    Please note that I wrote the above mostly to show another viewpoint. While I don’t see much point, I do like your work, especially on preconditions (judging from the documentation).

    Also, I did clone the git repo, I get:

    In Python 3.2.3

    python3 typeannotations.py
    Traceback (most recent call last):
    File “typeannotations.py”, line 34, in
    EMPTY_ANNOTATION = inspect.Signature.empty
    AttributeError: ‘module’ object has no attribute ‘Signature’

    In Python 2.7.3

    python2 typeannotations.py
    File “typeannotations.py”, line 1
    SyntaxError: Non-ASCII character ‘\xc3′ in file typeannotations.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

    After adding
    # -*- coding: utf-8 -*-
    to typeannotations.py:

    python2 typeannotations.py
    File “typeannotations.py”, line 113
    class AnyType(metaclass=AnyTypeMeta):
    ^
    SyntaxError: invalid syntax

    • ceronman says:

      Thanks for the comments.

      >> It’s impossible for the editor to give you any hint about a or b.
      >Also, the language is designed in a way that it shouldn’t matter much what a and b are.

      Well the language is also designed to support annotations. In fact the idea of using type annotations in Python has been in Guido’s head for years, look:

      http://www.artima.com/weblogs/viewpost.jsp?thread=85551
      http://www.artima.com/weblogs/viewpost.jsp?thread=86641

      Type annotations are optional and are meant to give your tools a little help. No more.

      > In Python, the unit test would find it in about 2ms. I’m aware that people don’t do tests much in Javascript…

      I agree. I love unit tests. I use them all the time. Using annotations is not an excuse for not writing proper tests. Unit tests are good software practice. But a well-known fact in the software community is that the earlier you fail, the cheaper it is. Annotations let you find errors as soon as you type, and it’s much cheaper to fix them right there than to run a test (writing it first if you don’t use TDD) and then debug to find the error. I’ve been there.

      > Since you can write the Python program while it is running (for example in ipython, bpython etc), you are also able to discover the API while you type. dir(xxx), help(xxx).

      Yes, I use bpython all the time. Love it. But for some kinds of projects, especially big ones, it’s just not practical to develop in bpython and run the code while you type. Also, even interactive REPLs can benefit from some annotations. In fact, the IPython developers were pushing the community to define some conventions for using annotations, because they find them a cool tool. Look:

      http://mail.python.org/pipermail/python-ideas/2012-December/018088.html

      > While I agree that it is like that, they rather should be doctests. Then they cannot get out of sync and they document how you should use it – rather than some tiny detail on what kind of thing the first parameter is.

      Love doctests too. I prefer to mix them with regular unit tests. And they’re not incompatible with annotations.

      >>logic predicates that can be used as function annotations.
      >That’s actually nice.

      Check some examples here: https://github.com/ceronman/typeannotations

      >> by adding function overloading
      > Why would you do that? Just call the function another name.

      Sometimes it’s nice to keep the same name. Take, for example, the standard isinstance function. It takes a type or a tuple as its second argument. It would be awkward to have isinstance_type and isinstance_tuple, don’t you think?

      There is a PEP for overloading already: http://www.python.org/dev/peps/pep-3124/

      >Please note that I wrote the above mostly to show another viewpoint. While I don’t see much point, I do like your work, especially on preconditions.

      I know :) I sometimes like the other viewpoint too. Thanks for the comments again.

      Btw, the code is only Python 3.3, I’ll backport some things later.

    • burntsushi says:

      > The kind of pigeonhole micromanagement a type system usually does is so counterproductive

      You could have saved everyone a lot of time reading your comment and said, “I hate static types. Tests are much better.”

  2. Jelle says:

    Type annotations certainly are interesting for the purpose of wrapping too. Basically, Cython is vanilla Python sprinkled with types. I’m sure the Cython devs will take advantage of type annotations in really creative ways…

  3. snowfall says:

    I just discovered that feature of Python thanks to your post. Admittedly, I am not a regular Python user. I used it a few times (notably to play with Django a few years ago, which I found more interesting than RoR for that matter, but anyway), and while I found it a nice language to use, I wasn’t so impressed by it. Nowadays, people are all over functional and strongly typed languages, and seeing that Python has limited support for type checking is a nice surprise.

    Regarding tests: using unit tests to ensure that code is indeed manipulating values of the right type seems a bit… tedious. Why write code to do that when you can have it for free with the language? This lets you concentrate on functional tests, which are the true reason for using tests.

  4. Hi,

    thanks for the links. I’ve read them and I see now that Guido probably also had something like that in mind.

    >But also, a well known fact in the software community is that the earlier you fail the cheaper it is. Annotations let you find errors as soon as you type

    Some of the easy errors, at least…

    >and it’s much cheaper to fix it right there than running a test (writing it first if you don’t use TDD) then debug to find what was the error. I’ve been there.

    Me too. I test the function right after I defined it nowadays. Might as well use a random number generator to generate it otherwise since my mistakes are much worse than simple “type errors” – but if you want to check for these kinds of things too, sure.

    >Yes I use bpython all the time. Love it. But for some kinds of projects, specially big ones, it’s just not practical to develop in bpython and running while you type.

    It’s not? Why not?

    Having used LISP and Ruby REPL and connected editor for editing really big projects, I can tell you that one can write, debug and maintain projects entirely in the REPL (and connected editor) while they are indeed running. Maybe it’s different in Python, I don’t know (not being able to serialize out Python objects – including the function bodies – to a file sure makes it difficult)…

    >Also, even interactive REPLs can benefit from some annotations.

    The question is whether anyone would use them there, given they’d use up vertical screen area and scroll stuff you probably need off-screen. Note that the ipython people also used inline-like “annotations” in their examples because of that.

    >Take, for example, the standard isinstance function. It takes a type or a tuple as its second argument. It would be awkward to have isinstance_type and isinstance_tuple, don’t you think?

    Hmm… That’s one of the strangest standard functions I know. You’re supposed to either pass a class or a tuple of classes O_o

    No way I’d ever put that in the same function, no. Might be my mathematics background, but that’s just… wrong in so many ways:
    1) a tuple is not a set, although here a set is meant – since the order is not fixed (neither is the dimension fixed, for what it’s worth).
    2) a class is not a tuple and a specific tuple is not a class.
    3) while it might make sense to have a function work both on tuples and numbers (if you are really careful), having it work on both sets and numbers makes no sense without an aggregate function to decide what to do in order to reduce (hardcoding “any” is a bad idea since you will want others).
    4) you can just use isinstance(, ) to do everything the other form can do, so why have the other form? any(isinstance(1, x) for x in {int, str}) – also makes it clearer what it does.

    If I absolutely had to have another function, I’d indeed rather have isinstance and isinstanceany instead of messing up the body of isinstance with a case analysis (or worse, a hidden case analysis) and recursion.

    >There is a PEP for overloading already: http://www.python.org/dev/peps/pep-3124/

    Yes, these are good examples and already work with the current setup: just pass the object to the single function and let it handle it. The function flatten in the example which flattens iterables but not strings (although strings *are* iterable) is something I wouldn’t like to write (I’d use Symbols instead of strings – which are not iterable and faster to compare – while they can still print. Python doesn’t really have Symbols, so I’d first write a Symbol class like in almost every single Python project that does any parsing I ever wrote O_o – is the distinction so unusual nowadays?)

    It also brings up LISP-style before and after methods which I think are a mistake even there – if you have classes, just make the class call a setup and teardown method on its own – much more transparent.

    So to summarize, while I agree that some of the problems are there, I sometimes disagree about the cause and the fix – for reasons of simplicity and orthogonality, mostly :-)

    >Btw, the code is only Python 3.3, I’ll backport some things later.

    Ah ok. Thanks.


  6. I have a duck-type-checking library – Obiwan – which I mostly use for validating incoming JSON: http://williamedwardscoder.tumblr.com/post/33185451698/obiwan-typescript-for-python

    If you chose to, you can enforce run-time type-checking using annotations (Obiwan shows one way to do that) which is ideal during development.

  7. surficle says:

    Oh, great! I like the GIF file concept in the article.
    It looks good and is easy to understand for a beginner.

