Os Vanrossum

Python 3000
Phoenix, Amsterdam, Vilnius, The Dalles, Portland
(Google, CWI, EuroPython, Google, OSCON)
Guido van Rossum
[email_address] [email_address]

What Is Python 3000?
- The next major Python release
  - to be released as Python 3.0
- The first one in a long time to be incompatible
  - but not a completely different language
- Concept first formed around 2000
  - Py3k nickname was a play on Windows 2000
- Goal: to correct my early design mistakes
  - those that would require incompatibility to fix
  - reduce cognitive load for first-time learners
  - fix “regrets” and “warts”

Recent History
- Work started for real in early 2006
- PEP submission deadline was April 2007
  - 41 PEPs submitted (including meta-PEPs)
  - 12 open, 10 accepted, 10 final, 8 rejected
- Several branches created in Subversion
  - p3yk (sic): main Py3k branch
  - py3k-struni: string unification branch
  - and some private branches
  - regular merges: trunk (2.6) -> p3yk -> py3k-struni

Tentative Release Schedule
- Interleaving 3.0 and 2.6 releases
  - 3.0 alpha 1: August 2007
  - 2.6 alpha 1: December 2007
  - 2.6 final: June 2008
  - 3.0 final: August 2008
- Standard library reorganization
  - work to start in earnest after 3.0a1 release

Compatibility
- Python 3.0:
  - will break backwards compatibility
  - not even aiming for a usable common subset
- Python 2.6:
  - will be fully backwards compatible with Python 2.5
  - will provide forward compatibility features:
    - “Py3k warnings mode” detects runtime problems
    - many Py3k features backported
    - may need “from __future__ import <feature>”

2to3: Source Conversion Tool
- In subversion: sandbox/trunk/2to3
- Context-free source-to-source translation
- Can do things like
  - `x` -> repr(x)
  - x <> y -> x != y
  - apply(f, args, kwds) -> f(*args, **kwds)
  - d.iterkeys() -> iter(d.keys())
  - d.keys() -> list(d.keys())
  - xrange() -> range()
  - range() -> list(range())
  - except E, err: -> except E as err:
- Can’t do dataflow analysis or type inference

How to Support 2.6 and 3.0
0. Start with excellent unit tests & full coverage
1. Port project to 2.6
2. Test with Py3k warnings mode turned on
3. Fix all warnings
4. Use 2to3 to convert to 3.0 syntax
   - no hand-editing of output allowed!!
5. Test converted source code under 3.0
6. To fix problems, edit 2.6 source and go to (2)
7. Release separate 2.6 and 3.0 tarballs

Do’s and Don’ts
- Do start using “modern” features now:
  - e.g. new-style classes, sorted(), xrange(), int//int, relative import, new exception hierarchy
  - segregate Unicode processing
- Don’t try to write source-level compatible code
  - intersection of 2.6 and 3.0 is large but incomplete
- Don’t go in one step from 2.5 (or older) to 3.0
  - always plan to transition via 2.6
  - users can easily upgrade to 2.6

What’s New in Python 3.0
Status of Individual Features

Unicode Source Code
- Default source encoding is UTF-8
  - was (7-bit) ASCII; this is fully forward compatible
- Unicode letters allowed for identifiers
  - Open issues:
    - normalization
    - which alphabets are supported
    - support for right-to-left scripts
- Standard library remains ASCII only!
  - except for author names and a few unit tests

Unicode Strings
- Java-like model:
  - strings (the str type) are always Unicode
  - separate bytes type
  - must explicitly specify encoding to go between these
- Implementation is ~same as 2.x unicode
- Dropping u"…" prefix to string literals
- Codecs changes:
  - .encode() always goes from str -> bytes
  - .decode() always goes from bytes -> str
  - base64, rot13, bz2 “codecs” dropped

Bytes Type
- Mutable sequence of small ints (0…255)
  - b[0] is an int; b[:1] is a new bytes object
- Implemented efficiently as unsigned char[]
- Has some list-like methods, e.g. .extend()
- Has some string-like methods, e.g. .find()
  - But none that depend on locale
- bytes literals: b"ascii or \xDD or \012"
- bytes and str don’t mix: must use .encode() or .decode()
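The str/bytes split described on the last two slides can be sketched as follows. This is a minimal illustration run against released Python 3, which differs from the 2007 prototype in one respect: bytes ended up immutable, and the mutable variant is called bytearray. The sample text is an arbitrary assumption.

```python
# Sketch against released Python 3, not the 2007 prototype.
text = "naïve café"  # arbitrary sample text

# str -> bytes always needs an explicit encode step.
data = text.encode("utf-8")

# bytes -> str always needs an explicit decode step.
round_trip = data.decode("utf-8")

# Indexing bytes yields an int; slicing yields a new bytes object.
first = data[0]
head = data[:1]

# In released Python 3 the mutable, list-like type is bytearray.
mutable = bytearray(b"abc")
mutable.extend(b"de")

# bytes and str don't mix: concatenating them raises TypeError.
try:
    "abc" + b"def"
except TypeError:
    mixing_failed = True
else:
    mixing_failed = False
```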

New I/O Library
- Stackable components (inspired by Java, Perl)
  - Lowest level: unbuffered bytes I/O
    - platform-specific; doesn't use C stdio
  - Middle layer: buffering
  - Top layer: unicode encoding/decoding
    - encoding explicitly specified or system default
    - optionally handles CRLF/LF mapping
- Compatible API
  - open(filename) returns a buffered text file
    - read() and readline() return strings
  - open(filename, "rb") returns a buffered binary file
    - read() returns bytes; readline() too (!?)
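A rough sketch of the three-layer stack, using the io module as it shipped in released Python 3. An in-memory io.BytesIO stands in for the platform-specific raw file here (an assumption made so the example needs no real file); the layering above it is the same idea.

```python
import io

# Bottom layer: a bytes source. BytesIO is an in-memory stand-in for a raw file.
raw = io.BytesIO("héllo\r\nworld\r\n".encode("utf-8"))

# Middle layer: buffering.
buffered = io.BufferedReader(raw)

# Top layer: decoding, with an explicit encoding.
# newline=None asks for CRLF/LF mapping on input.
text = io.TextIOWrapper(buffered, encoding="utf-8", newline=None)

lines = text.readlines()
```

Reading through the top layer yields str objects with line endings normalized to "\n"; reading from `buffered` directly would yield bytes.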

Print is a Function
- print x, y -> print(x, y)
- print x, -> print(x, end=" ")
- print >>f, x -> print(x, file=f)
- Automatic translation is 98% correct
- Fails for cases involving softspace cleverness:
  - print "x\n", "y" doesn't insert a space before y
  - print("x\n", "y") does
  - ditto for print "x\t", "y"
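The translated forms can be tried out like this. The sketch captures output in an io.StringIO buffer (an arbitrary choice for illustration) so the exact characters written are visible, including the extra space in the softspace case.

```python
import io

buf = io.StringIO()

print("x", "y", file=buf)      # writes "x y\n"
print("x", end=" ", file=buf)  # writes "x " with no newline
print("x\n", "y", file=buf)    # writes "x\n y\n": a space is inserted before y

output = buf.getvalue()
```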

String Formatting
- Examples (see PEP 3101 for more):
  - "See {0}, {1} and {foo}".format("A", "B", foo="C")
    - "See A, B and C"
  - "my name is {0} :-{{}}".format("Fred")
    - "my name is Fred :-{}"
  - "File name {0.name}".format(open("foo.txt"))
    - "File name foo.txt"
  - "Name is {0[name]}".format({"name": "Fred"})
    - "Name is Fred"
  - "Shoe size {0:8}".format(42)
    - "Shoe size       42"
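The formatting examples can be checked with a short script. To avoid depending on a real file, this sketch uses a types.SimpleNamespace object with a name attribute as a hypothetical stand-in for the open file in the attribute-lookup case.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a file object; only its .name attribute is used.
fake_file = SimpleNamespace(name="foo.txt")

results = [
    "See {0}, {1} and {foo}".format("A", "B", foo="C"),  # positional and keyword fields
    "my name is {0} :-{{}}".format("Fred"),              # doubled braces are literal braces
    "File name {0.name}".format(fake_file),              # attribute lookup
    "Name is {0[name]}".format({"name": "Fred"}),        # item lookup
    "Shoe size {0:8}".format(42),                        # width 8; ints are right-aligned
]
```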

Classic Classes are Dead
- In 2.2 … 2.9:
    class C: # classic class (0.1 … 2.1)
    class C(object): # new-style class (old now :-)
- In 3.0:
  - both are new-style classes (just say "classes")
- Differences are subtle, few of you will notice
  - new-style classes don’t support “magic” methods (e.g. __hash__) stored in instance __dict__

Class Decorators
    @some.decorator
    class C: …
- Same semantics as function decorators:
    class C: …
    C = some.decorator(C)
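A small sketch of the equivalence. The decorator name `register` and the `registry` list are made up for illustration; the point is only that the decorator syntax and the explicit rebinding do the same thing.

```python
registry = []  # hypothetical registry, for illustration only

def register(cls):
    """Hypothetical class decorator: record the class name, return the class unchanged."""
    registry.append(cls.__name__)
    return cls

# Decorator syntax.
@register
class C:
    pass

# Equivalent explicit form.
class D:
    pass
D = register(D)
```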

Signature Annotations
- NOT type declarations!
- “Meaning” is up to you
- Example:
    def foo(x: "whatever", y: range(10)) -> 42: …
- Argument syntax is (roughly):
    NAME [':' expr] ['=' expr]
- Both expressions are evaluated at 'def' time
- foo.func_annotations is:
    {'x': "whatever", 'y': range(0, 10), "return": 42}
- NO use is made of annotations by the language
  - library may use them, e.g. generic functions
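A minimal sketch of annotations in use. Note one difference from the slide: released Python 3 exposes the dictionary as `__annotations__` rather than `func_annotations`. The annotation expressions are arbitrary values chosen to show that they are ordinary expressions, not types.

```python
# Annotations are arbitrary expressions; their "meaning" is up to the reader.
def foo(x: "whatever", y: len("abc") = 0) -> 42:
    return x, y

# Released Python 3 spells the attribute __annotations__.
annotations = foo.__annotations__

# The language itself ignores them: any argument values are accepted.
result = foo(1.5, None)
```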

New Metaclass Syntax
- class C(B1, B2, metaclass=MC): …
- Other keywords passed to MC.__new__
- __metaclass__ dropped (in module too)
- Class heading has full function call syntax:
    bases = (B1, B2)
    class C(*bases): …
- Metaclass can provide __prepare__()
  - returns the namespace dict for class body execution
  - use case: ordered dict to define e.g. db schema
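The ordered-namespace use case might look roughly like this in released Python 3. The names `OrderedMeta`, `Record`, `field_order`, `options` and the `table` keyword are all hypothetical, invented for this sketch.

```python
from collections import OrderedDict

class OrderedMeta(type):
    """Hypothetical metaclass that remembers the order of class attributes."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The class body executes in the mapping returned here.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep only the user-defined names, in definition order.
        cls.field_order = [k for k in namespace if not k.startswith("__")]
        # Extra keywords from the class heading arrive here.
        cls.options = kwds
        return cls

    def __init__(cls, name, bases, namespace, **kwds):
        # Drop the extra keywords before delegating to type.__init__.
        super().__init__(name, bases, namespace)

class Record(metaclass=OrderedMeta, table="people"):
    name = str
    age = int
```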

issubclass(), isinstance()
- Overloadable on the second argument (a class)
  - isinstance(x, C) tries C.__instancecheck__(x)
  - issubclass(D, C) tries C.__subclasscheck__(D)
- If not overloaded, behavior is unchanged
- Used for “virtual inheritance” from ABCs
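A sketch of overloading both checks. In released Python 3 the hooks are defined on the metaclass of the class given as the second argument, so this example uses a hypothetical `DuckMeta`; `Duck` and `Robot` are likewise invented names.

```python
class DuckMeta(type):
    """Hypothetical metaclass: anything with a quack attribute counts as a Duck."""

    def __instancecheck__(cls, obj):
        return hasattr(obj, "quack")

    def __subclasscheck__(cls, sub):
        return hasattr(sub, "quack")

class Duck(metaclass=DuckMeta):
    pass

class Robot:  # does not inherit from Duck
    def quack(self):
        return "beep"

robot_is_duck = isinstance(Robot(), Duck)
robot_class_is_duck = issubclass(Robot, Duck)
int_is_duck = isinstance(3, Duck)
```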

Abstract Base Classes: abc.py
- Voluntary base classes for standard APIs
  - e.g. Iterable, MutableMapping, Real, RawIOBase
- Usable as a mix-in class (like DictMixin)
  - provides abstract methods you must override
- @abstractmethod decorates an abstract method
  - class with abstract methods left can’t be instantiated
  - requires metaclass=ABCMeta
  - from abc import ABCMeta, abstractmethod
- Alternatively, can register virtual subclasses
  - after A.register(C), issubclass(C, A) is true
  - however, C isn’t modified
  - IOW C must already implement A’s abstract methods
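Both ways of using an ABC, real inheritance and registration, can be sketched as below. `Shape`, `Square` and `Circle` are hypothetical classes used only for illustration.

```python
from abc import ABCMeta, abstractmethod

class Shape(metaclass=ABCMeta):
    @abstractmethod
    def area(self):
        """Subclasses must override this."""

class Square(Shape):  # real subclass: overrides the abstract method
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

# A class with abstract methods left can't be instantiated.
try:
    Shape()
except TypeError:
    abstract_blocked = True
else:
    abstract_blocked = False

class Circle:  # unrelated class that already implements area()
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2

# Virtual subclass: issubclass() becomes true, but Circle itself is not modified.
Shape.register(Circle)
```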

Standard ABCs
- “One-trick ponies” (collections.py):
  - Hashable, Iterable, Iterator, Sized, Container, Callable
  - these check for presence of magic method
  - e.g. isinstance(x, Callable) ~ what used to be callable(x)
- Containers (collections.py):
  - Set, Mapping, Sequence; Mutable<ditto>
- I/O classes (io.py):
  - IOBase, RawIOBase, BufferedIOBase, TextIOBase
- Numbers (numbers.py?):
  - Number, Complex, Real, Rational, Integer

Exception Reform
- "raise E(arg)" replaces "raise E, arg"
- "except E as v:" replaces "except E, v:"
  - v is deleted at end of except block!!!
- All exceptions must derive from BaseException
  - better still, Exception; StandardError removed
- New standard exception attributes:
  - __traceback__: instead of sys.exc_info()[2]
  - __cause__: set by raise E from v
  - __context__: set when raising in except/finally block
- Exceptions aren’t sequences; use v.args
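A short sketch exercising the new syntax and attributes in released Python 3. `ConfigError` and `load` are hypothetical names invented for the example.

```python
class ConfigError(Exception):  # hypothetical application exception
    pass

def load(text):
    try:
        return int(text)
    except ValueError as err:
        # "raise E from v" sets __cause__ on the new exception.
        raise ConfigError("bad setting: %r" % text) from err

try:
    load("abc")
except ConfigError as exc:
    cause_type = type(exc.__cause__)
    context_is_cause = exc.__context__ is exc.__cause__
    has_traceback = exc.__traceback__ is not None
    args = exc.args  # exceptions aren't sequences; use .args

# The name bound by "as" is deleted when the except block ends.
try:
    exc
except NameError:
    name_deleted = True
else:
    name_deleted = False
```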

Int/Long Unification
- A single built-in integer type
- Name is int
- Behaves like old long
- ‘L’ suffix is dropped
- C API is mostly compatible
- Performance may still need some boosting

Int Division Returns a Float
- 1/2 == 0.5
- Always!
- Same effect in 2.x with from __future__ import division
- Use // for int division
- Use -Q option to Python 2.x to find old usage
- Has been supported since 2.2!
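A quick illustration of the two operators in Python 3. The operands are arbitrary; the negative case is included because // floors rather than truncates.

```python
true_div = 1 / 2            # always a float
exact_div = 4 / 2           # still a float, even when the result is whole
floor_div = 7 // 2          # int division
negative_floor = -7 // 2    # floors toward negative infinity
```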

Octal and Binary Literals
- 0o777 instead of 0777
  - Avoid accidental mistakes in data entry
  - Avoid confusing younger generation :-)
- 0b1010 is a binary number
- bin(10) returns '0b1010'
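The new literal forms and their string counterparts, as a brief sketch in released Python 3:

```python
octal_value = 0o777     # the old spelling 0777 is a syntax error in Python 3
binary_value = 0b1010

binary_text = bin(10)
octal_text = oct(511)

# int() with an explicit base parses the digits back.
parsed = int("1010", 2)
```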

Itera{tors,bles}, not Lists
- range() behaves like old xrange()
- zip(), map(), filter() return iterators
- dict.keys(), .items(), .values() return dict views

Dictionary Views
- Inspired by Java Collections Framework
- Remove .iterkeys(), .iteritems(), .itervalues()
- Change .keys(), .items(), .values()
- These return a dict view
  - Not an iterator
  - A lightweight object that can be iterated repeatedly
  - .keys(), .items() have set semantics
  - .values() has "collection" semantics
    - supports iter(), len(), and not much else
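A small sketch of view behavior in released Python 3, using an arbitrary sample dict: the view can be iterated more than once, reflects later changes to the dict, and supports set operations for keys.

```python
d = {"a": 1, "b": 2}  # arbitrary sample data

keys = d.keys()

# Not an iterator: it can be iterated repeatedly.
first_pass = sorted(keys)
second_pass = sorted(keys)

# The view is live: it reflects later changes to the dict.
d["c"] = 3
after_insert = sorted(keys)

# .keys() has set semantics.
common = keys & {"b", "c", "z"}

# .values() supports iter() and len(), and not much else.
value_count = len(d.values())
value_total = sum(d.values())
```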

Default Comparison Changed
- Default ==, != compare object identity
  - unchanged from 2.x
  - many types override this
- New: default <, <=, >, >= raise TypeError
- Example: [1, 2, ""].sort() raises TypeError
- Rationale: 2.x default ordering is bogus
  - depends on type names
  - depends on addresses
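The changed defaults can be observed with a class that defines no comparison methods. `Point` is a hypothetical class used only for this sketch.

```python
class Point:  # hypothetical class with no comparison methods
    def __init__(self, x):
        self.x = x

p, q = Point(1), Point(1)

# Default == compares identity.
same_object_equal = (p == p)
equal_fields_equal = (p == q)

# Default ordering raises TypeError.
try:
    p < q
except TypeError:
    ordering_failed = True
else:
    ordering_failed = False

# Mixed-type sorting fails for the same reason.
try:
    [1, 2, ""].sort()
except TypeError:
    mixed_sort_failed = True
else:
    mixed_sort_failed = False
```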

Nonlocal Statement
- Paul Graham’s challenge* finally met (sort of):
    def new_accumulator(n):
        def accumulator(i):
            nonlocal n
            n += i
            return n
        return accumulator
* Revenge of the Nerds, “Appendix: Power”

New super() Call
- PEP 3135 (was 367)
  - note: PEP is behind; actual design chosen is different
- Instead of super(ThisClass, self), use super()
  - old style calls still supported
- When called without args, digs out of frame:
  - __class__ cell, the class defining the method
    - based on static, textual inclusion
    - cell is filled in after metaclass created the class
    - but before class decorators run
  - first argument (self; or cls for class methods)
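A sketch comparing the argument-free call with the old explicit form, in both an instance method and a class method. `Base` and `Child` are hypothetical classes.

```python
class Base:
    def greet(self):
        return "Base"

    @classmethod
    def create(cls):
        return cls()

class Child(Base):
    def greet(self):
        # New style: the class and self are found automatically.
        return "Child > " + super().greet()

    def greet_old(self):
        # Old style call, still supported.
        return "Child > " + super(Child, self).greet()

    @classmethod
    def create(cls):
        # In a class method, the first argument used is cls.
        obj = super().create()
        obj.made_by = cls.__name__
        return obj

child = Child.create()
```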

Set Literals
- {1, 2, 3} is the same as set([1, 2, 3])
- No empty set literal; use set()
- No frozenset literal; use frozenset({…})
- Set comprehensions:
  - {f(x) for x in S if P(x)}
  - same as set(f(x) for x in S if P(x))
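A brief sketch of the literal and comprehension forms. The squaring function and the even-number predicate are arbitrary stand-ins for f and P.

```python
literal = {1, 2, 3}
empty = set()        # {} is still an empty dict, not an empty set
braces = {}
frozen = frozenset({1, 2})

S = range(10)
# Set comprehension: f(x) is x * x, P(x) is "x is even" (arbitrary choices).
squares = {x * x for x in S if x % 2 == 0}
same_thing = set(x * x for x in S if x % 2 == 0)
```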

And More – Much More!
- it.next() -> it.__next__()
- f.func_code -> f.__code__
- reduce() is dead (I can’t read code using it)
- <> is dead; use !=
- `…` is dead; use repr(…)
- raw_input() -> input()
- Read PEP 3100 for many more
- PS. lambda lives!

C API Changes
- Too early to tell what will happen exactly
  - Still working on this…
- Will have to recompile at the very least
- Biggest problem expected: Unicode, bytes
- For now, these simple rules:
  - Adding APIs is okay (of course)
  - Deleting APIs is okay
  - Changing APIs incompatibly is NOT OKAY

Questions
PS: Read my blog at artima.com

Why reduce() is an Attractive Nuisance
- ~90% of reduce() calls found in practice can be rewritten using sum()
- half the rest are concatenating sequences, i.e. O(N**2) running time
- First example found by Google Code Search (google.com/codesearch):
    quotechar = reduce(lambda a, b: (quotes[a] > quotes[b]) and a or b, quotes.keys())
- I find the following rewrite much more readable:
    quotechar = quotes.keys()[0]
    for a in quotes.keys():
        if quotes[a] > quotes[quotechar]:
            quotechar = a
- Another:
    reduce(lambda a, b: a+'|'+b, value)
- Rewrite as:
    '|'.join(value)
- Another:
    reduce(lambda x, y: x or y.ambiguous and True, parents, False)
- Rewrite as:
    any(y.ambiguous for y in parents)
- All-time worst, from Python Cookbook (unreadable and O(N**2) running time):
    def wrap(text, width):
        return reduce(lambda line, word, width=width: '%s%s%s' %
                      (line,
                       ' \n'[(len(line) - line.rfind('\n') - 1
                              + len(word.split('\n', 1)[0]) >= width)],
                       word),
                      text.split(' '))
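The first two rewrites can be checked side by side in released Python 3, where reduce() survives as functools.reduce and dict.keys() is no longer indexable. The sample `quotes` and `value` data are made up for this sketch, and the max() variant is an extra alternative not taken from the slide.

```python
from functools import reduce  # reduce() is no longer a builtin in Python 3

quotes = {"'": 2, '"': 5, "`": 1}  # hypothetical sample data

# Original reduce() form.
via_reduce = reduce(lambda a, b: (quotes[a] > quotes[b]) and a or b, quotes.keys())

# Loop rewrite, adapted for Python 3 (quotes.keys()[0] no longer works).
via_loop = next(iter(quotes))
for a in quotes:
    if quotes[a] > quotes[via_loop]:
        via_loop = a

# Another possible spelling (not from the slide); tie-breaking may differ.
via_max = max(quotes, key=quotes.get)

value = ["a", "b", "c"]  # hypothetical sample data
joined_reduce = reduce(lambda a, b: a + "|" + b, value)
joined = "|".join(value)
```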
