Python 3000 Phoenix, Amsterdam, Vilnius, The Dalles, Portland (Google, CWI, EuroPython, Google, OSCON) Guido van Rossum [email_address] [email_address]
What Is Python 3000?
  The next major Python release
    to be released as Python 3.0
  The first one in a long time to be incompatible
    but not a completely different language
  Concept first formed around 2000
    Py3k nickname was a play on Windows 2000
  Goal: to correct my early design mistakes
    those that would require incompatibility to fix
    reduce cognitive load for first-time learners
    fix “regrets” and “warts”
Recent History
  Work started for real in early 2006
  PEP submission deadline was April 2007
    41 PEPs submitted (including meta-PEPs)
    12 open, 10 accepted, 10 final, 8 rejected
  Several branches created in Subversion
    p3yk (sic): main Py3k branch
    py3k-struni: string unification branch
    and some private branches
    regular merges: trunk (2.6) -> p3yk -> py3k-struni
Tentative Release Schedule
  Interleaving 3.0 and 2.6 releases
    3.0 alpha 1: August 2007
    2.6 alpha 1: December 2007
    2.6 final: June 2008
    3.0 final: August 2008
  Standard library reorganization
    work to start in earnest after 3.0a1 release
Compatibility
  Python 3.0:
    will break backwards compatibility
    not even aiming for a usable common subset
  Python 2.6:
    will be fully backwards compatible with Python 2.5
    will provide forward compatibility features:
      “Py3k warnings mode” detects runtime problems
      many Py3k features backported
        may need “from __future__ import <feature>”
2to3: Source Conversion Tool
  In Subversion: sandbox/trunk/2to3
  Context-free source-to-source translation
  Can do things like:
    `x` -> repr(x)
    x <> y -> x != y
    apply(f, args, kwds) -> f(*args, **kwds)
    d.iterkeys() -> iter(d.keys())
    d.keys() -> list(d.keys())
    xrange() -> range()
    range() -> list(range())
    except E, err: -> except E as err:
  Can’t do dataflow analysis or type inference
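
As a rough illustration of what the right-hand sides of those translations look like once they run under Python 3, here is a minimal sketch. The function `f` and the sample data are hypothetical and exist only to exercise the converted idioms; this is not output produced by 2to3 itself.

```python
def f(a, b, sep="-"):
    """Hypothetical helper used only to demonstrate argument unpacking."""
    return "{0}{1}{2}".format(a, sep, b)

args = ("x", "y")
kwds = {"sep": "+"}
d = {"one": 1, "two": 2}

# apply(f, args, kwds)  ->  f(*args, **kwds)
applied = f(*args, **kwds)

# d.keys()  ->  list(d.keys()): a real list that can be sorted or indexed
key_list = sorted(list(d.keys()))

# range(3)  ->  list(range(3)); a bare range() now behaves like xrange()
numbers = list(range(3))

# except E, err:  ->  except E as err:
try:
    d["three"]
except KeyError as err:
    missing = err.args[0]
```

The `list(...)` wrappers are deliberately conservative: without type inference the tool cannot know whether the caller needed a real list.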
How to Support 2.6 and 3.0
  0. Start with excellent unit tests & full coverage
  1. Port project to 2.6
  2. Test with Py3k warnings mode turned on
  3. Fix all warnings
  4. Use 2to3 to convert to 3.0 syntax
       no hand-editing of output allowed!!
  5. Test converted source code under 3.0
  6. To fix problems, edit 2.6 source and go to (2)
  7. Release separate 2.6 and 3.0 tarballs
Do’s and Don’ts
  Do start using “modern” features now: e.g.
    new-style classes, sorted(), xrange(), int//int, relative import, new exception hierarchy
    segregate Unicode processing
  Don’t try to write source-level compatible code
    intersection of 2.6 and 3.0 is large but incomplete
  Don’t go in one step from 2.5 (or older) to 3.0
    always plan to transition via 2.6
    users can easily upgrade to 2.6
What’s New in Python 3.0 Status of Individual Features
Unicode Source Code
  Default source encoding is UTF-8
    was (7-bit) ASCII; this is fully forward compatible
  Unicode letters allowed for identifiers
    Open issues:
      normalization
      which alphabets are supported
      support for right-to-left scripts
  Standard library remains ASCII only!
    except for author names and a few unit tests
Unicode Strings
  Java-like model:
    strings (the str type) are always Unicode
    separate bytes type
    must explicitly specify encoding to go between these
  Implementation is ~same as 2.x unicode
  Dropping u"…" prefix to string literals
  Codecs changes:
    .encode() always goes from str -> bytes
    .decode() always goes from bytes -> str
    base64, rot13, bz2 “codecs” dropped
Bytes Type
  Mutable sequence of small ints (0…255)
    b[0] is an int; b[:1] is a new bytes object
  Implemented efficiently as unsigned char[]
  Has some list-like methods, e.g. .extend()
  Has some string-like methods, e.g. .find()
    But none that depend on locale
  bytes literals: b"ascii or \xDD or \012"
  bytes and str don’t mix: must use .encode() or .decode()
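
The str/bytes split can be sketched as follows. One caveat: in Python 3 as eventually released, `bytes` is immutable and the mutable variant described on this slide is spelled `bytearray`, so the list-like part of the sketch uses `bytearray`. The sample string is arbitrary.

```python
text = "caf\u00e9"                    # str: always Unicode
data = text.encode("utf-8")           # str -> bytes, encoding given explicitly
round_trip = data.decode("utf-8")     # bytes -> str

first = data[0]                       # indexing yields an int
head = data[:1]                       # slicing yields a new bytes object

# bytes and str do not mix: concatenating them raises TypeError
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The mutable flavour (bytearray in released Python 3) has list-like
# and string-like methods
buf = bytearray(b"ab")
buf.extend(b"cd")
position = buf.find(b"cd")
```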
New I/O Library
  Stackable components (inspired by Java, Perl)
    Lowest level: unbuffered bytes I/O
      platform-specific; doesn't use C stdio
    Middle layer: buffering
    Top layer: unicode encoding/decoding
      encoding explicitly specified or system default
      optionally handles CRLF/LF mapping
  Compatible API
    open(filename) returns a buffered text file
      read() and readline() return strings
    open(filename, "rb") returns a buffered binary file
      read() returns bytes; readline() too (!?)
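
The layering can be tried out by stacking the pieces by hand with the `io` module from released Python 3. In this sketch an in-memory `io.BytesIO` stands in for the platform-specific raw layer (an assumption made so that no file is needed), and the sample data is made up.

```python
import io

# Bottom: a byte source (in-memory stand-in for the raw file layer)
raw = io.BytesIO(b"first line\r\nsecond line\r\n")

# Middle: buffering
buffered = io.BufferedReader(raw)

# Top: decoding to str, with CRLF/LF mapping enabled by newline=None
text = io.TextIOWrapper(buffered, encoding="utf-8", newline=None)

lines = text.readlines()   # a list of str with "\n" line endings
```

Calling `open(filename)` builds an equivalent stack in one step.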
Print is a Function
  print x, y -> print(x, y)
  print x, -> print(x, end=" ")
  print >>f, x -> print(x, file=f)
  Automatic translation is 98% correct
  Fails for cases involving softspace cleverness:
    print "x\n", "y" doesn't insert a space before y
    print("x\n", "y") does
    ditto for print "x\t", "y"
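
A short sketch of the three translated forms, writing to an in-memory `io.StringIO` (chosen here only so the output can be inspected) instead of a real file:

```python
import io

out = io.StringIO()

print("x", "y", file=out)        # was: print >>out, "x", "y"
print("x", end=" ", file=out)    # was: print >>out, "x",
print("z", file=out)
print("x\n", "y", file=out)      # the separator space is always inserted

result = out.getvalue()
```

The last call shows the softspace difference: the function form emits a space after the newline, so `y` ends up indented by one character.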
String Formatting
  Examples (see PEP 3101 for more):
    "See {0}, {1} and {foo}".format("A", "B", foo="C")
      "See A, B and C"
    "my name is {0} :-{{}}".format("Fred")
      "my name is Fred :-{}"
    "File name {0.name}".format(open("foo.txt"))
      "File name foo.txt"
    "Name is {0[name]}".format({"name": "Fred"})
      "Name is Fred"
    "Shoe size {0:8}".format(42)
      "Shoe size       42"
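
The same examples can be run as a sketch. To avoid touching the file system, a `types.SimpleNamespace` with a `name` attribute stands in for the open file; that stand-in is an assumption of this example, not part of PEP 3101.

```python
from types import SimpleNamespace

# Hypothetical stand-in for open("foo.txt"): only the .name attribute matters
fake_file = SimpleNamespace(name="foo.txt")

results = [
    "See {0}, {1} and {foo}".format("A", "B", foo="C"),  # positional + keyword
    "my name is {0} :-{{}}".format("Fred"),              # {{ }} escape braces
    "File name {0.name}".format(fake_file),              # attribute lookup
    "Name is {0[name]}".format({"name": "Fred"}),        # item lookup
    "Shoe size {0:8}".format(42),                        # width 8, right-aligned
]
```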
Classic Classes are Dead
  In 2.2 … 2.9:
    class C: # classic class (0.1 … 2.1)
    class C(object): # new-style class (old now :-)
  In 3.0:
    both are new-style classes (just say "classes")
  Differences are subtle, few of you will notice
    new-style classes don’t support “magic” methods (e.g. __hash__) stored in instance __dict__
Class Decorators
  @some.decorator
  class C:
      …
  Same semantics as function decorators:
  class C:
      …
  C = some.decorator(C)
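
A minimal sketch of the equivalence, using a hypothetical `register` decorator that records classes in a dictionary:

```python
registry = {}

def register(cls):
    """Hypothetical class decorator: record the class, return it unchanged."""
    registry[cls.__name__] = cls
    return cls

@register
class Plugin:
    pass

# The spelled-out form the decorator syntax is shorthand for
class Other:
    pass
Other = register(Other)
```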
Signature Annotations
  NOT type declarations!
  “Meaning” is up to you
  Example:
    def foo(x: "whatever", y: range(10)) -> 42: …
  Argument syntax is (roughly):
    NAME [':' expr] ['=' expr]
  Both expressions are evaluated at 'def' time
  foo.func_annotations (__annotations__ in released 3.0) is:
    {'x': "whatever", 'y': range(10), "return": 42}
  NO use is made of annotations by the language
    library may use them, e.g. generic functions
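
A small sketch of how annotations are stored. The annotation expressions here are arbitrary values chosen to show that any expression is allowed; the mapping is read through `__annotations__`, the name used in released Python 3.

```python
def foo(x: "whatever", y: len("abc") = 0) -> 42 * 2:
    # The annotations play no role in how the function runs
    return x, y

annotations = foo.__annotations__
called = foo(1.5)   # argument types are not checked against annotations
```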
New Metaclass Syntax
  class C(B1, B2, metaclass=MC):
      …
  Other keywords passed to MC.__new__
  __metaclass__ dropped (in module too)
  Class heading has full function call syntax:
    bases = (B1, B2)
    class C(*bases):
        …
  Metaclass can provide __prepare__()
    returns the namespace dict for class body execution
    use case: ordered dict to define e.g. db schema
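
The db-schema use case might look roughly like this. `SchemaMeta`, `User`, the `table` keyword and the `columns` attribute are all hypothetical names invented for the sketch.

```python
from collections import OrderedDict

class SchemaMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Namespace used while the class body executes; remembers order
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep user-defined names in definition order, skip dunder entries
        cls.columns = [k for k in namespace if not k.startswith("__")]
        # Extra class-heading keywords arrive here
        cls.table = kwds.get("table", name.lower())
        return cls

    def __init__(cls, name, bases, namespace, **kwds):
        super().__init__(name, bases, namespace)

class User(metaclass=SchemaMeta, table="users"):
    id = int
    name = str
    email = str
```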
issubclass(), isinstance()
  Overloadable on the second argument (a class)
    isinstance(x, C) tries C.__instancecheck__(x)
    issubclass(D, C) tries C.__subclasscheck__(D)
  If not overloaded, behavior is unchanged
  Used for “virtual inheritance” from ABCs
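
A sketch of the overloading hooks. Because the methods are looked up on the type of the second argument, they are defined on a metaclass. `DuckMeta`, `Duck` and `Robot` are hypothetical names, and the "has a callable quack" rule is just an example policy.

```python
class DuckMeta(type):
    def __instancecheck__(cls, obj):
        # isinstance(obj, Duck) lands here
        return callable(getattr(obj, "quack", None))

    def __subclasscheck__(cls, sub):
        # issubclass(sub, Duck) lands here
        return callable(getattr(sub, "quack", None))

class Duck(metaclass=DuckMeta):
    pass

class Robot:                 # does not inherit from Duck
    def quack(self):
        return "beep"

robot_is_duck = isinstance(Robot(), Duck)
robot_class_is_duck = issubclass(Robot, Duck)
int_is_duck = isinstance(3, Duck)
```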
Abstract Base Classes: abc.py
  Voluntary base classes for standard APIs
    e.g. Iterable, MutableMapping, Real, RawIOBase
  Usable as a mix-in class (like DictMixin)
    provides abstract methods you must override
    @abstractmethod decorates an abstract method
    class with abstract methods left can’t be instantiated
    requires metaclass=ABCMeta
    from abc import ABCMeta, abstractmethod
  Alternatively, can register virtual subclasses
    after A.register(C), issubclass(C, A) is true
    however, C isn’t modified
    IOW C must already implement A’s abstract methods
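
Both routes, real inheritance and virtual registration, in one sketch. `Shape`, `Square` and `Circle` are hypothetical classes.

```python
from abc import ABCMeta, abstractmethod

class Shape(metaclass=ABCMeta):
    @abstractmethod
    def area(self):
        ...

    def describe(self):              # mix-in method built on the abstract one
        return "area={0}".format(self.area())

class Square(Shape):                 # real subclass: must override area()
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

class Circle:                        # unrelated class with its own area()
    def area(self):
        return 3

Shape.register(Circle)               # virtual subclass; Circle is not modified

try:
    Shape()                          # abstract methods left: cannot instantiate
    instantiable = True
except TypeError:
    instantiable = False
```

Note that `Circle` passes `issubclass` but does not gain `describe()`, which is the "C isn't modified" point.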
Standard ABCs
  “One-trick ponies” (collections.py):
    Hashable, Iterable, Iterator, Sized, Container, Callable
    these check for presence of magic method
    e.g. isinstance(x, Callable) ~ what used to be callable(x)
  Containers (collections.py):
    Set, Mapping, Sequence; Mutable<ditto>
  I/O classes (io.py):
    IOBase, RawIOBase, BufferedIOBase, TextIOBase
  Numbers (numbers.py?):
    Number, Complex, Real, Rational, Integer
Exception Reform
  "raise E(arg)" replaces "raise E, arg"
  "except E as v:" replaces "except E, v:"
    v is deleted at end of except block!!!
  All exceptions must derive from BaseException
    better still, Exception; StandardError removed
  New standard exception attributes:
    __traceback__: instead of sys.exc_info()[2]
    __cause__: set by raise E from v
    __context__: set when raising in except/finally block
  Exceptions aren’t sequences; use v.args
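
A sketch that exercises the new attributes and the deletion of the `as` name. The `load` function and its error message are hypothetical.

```python
def load(config, key):
    try:
        return config[key]
    except KeyError as v:
        # Explicit chaining: sets __cause__ (and __context__) on the new error
        raise ValueError("missing setting: " + key) from v

try:
    load({}, "port")
except ValueError as err:
    caught = err                  # keep a reference under another name
    cause = err.__cause__
    context = err.__context__
    tb = err.__traceback__

# The name bound by "as" is gone once the except block ends
try:
    err
    name_survives = True
except NameError:
    name_survives = False
```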
Int/Long Unification
  A single built-in integer type
  Name is int
  Behaves like old long
  ‘L’ suffix is dropped
  C API is mostly compatible
  Performance may still need some boosting
Int Division Returns a Float
  1/2 == 0.5
  Always!
  Same effect in 2.x with
    from __future__ import division
  Use // for int division
  Use -Q option to Python 2.x to find old usage
    Has been supported since 2.2!
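
The two operators side by side, as a quick sketch with arbitrary operands:

```python
half = 1 / 2          # true division: always a float
exact = 6 / 3         # still a float, even when the result is whole

floor_pos = 7 // 2    # floor division keeps ints
floor_neg = -7 // 2   # rounds toward negative infinity, not toward zero
```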
Octal and Binary Literals
  0o777 instead of 0777
    Avoid accidental mistakes in data entry
    Avoid confusing younger generation :-)
  0b1010 is a binary number
    bin(10) returns '0b1010'
Itera{tors,bles}, not Lists
  range() behaves like old xrange()
  zip(), map(), filter() return iterators
  dict.keys(), .items(), .values() return views (not lists)
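
The practical difference between an iterator and a re-iterable object shows up on the second pass, as this sketch with made-up data illustrates:

```python
r = range(3)                  # lazy, but can be iterated repeatedly
pairs = zip("ab", [1, 2])     # an iterator: single pass only

first_pass = list(pairs)
second_pass = list(pairs)     # already exhausted

range_twice = (list(r), list(r))
```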
Dictionary Views
  Inspired by Java Collections Framework
  Remove .iterkeys(), .iteritems(), .itervalues()
  Change .keys(), .items(), .values()
  These return a dict view
    Not an iterator
    A lightweight object that can be iterated repeatedly
    .keys(), .items() have set semantics
    .values() has "collection" semantics
      supports iter(), len(), and not much else
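
A short sketch of view behaviour on an arbitrary dictionary:

```python
d = {"a": 1, "b": 2}
keys = d.keys()
values = d.values()

d["c"] = 3                          # views track later changes to the dict

common = keys & {"a", "c", "z"}     # set semantics on the keys view
keys_twice = (sorted(keys), sorted(keys))   # can be iterated repeatedly
value_count = len(values)
```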
Default Comparison Changed
  Default ==, != compare object identity
    unchanged from 2.x
    many types override this
  New: default <, <=, >, >= raise TypeError
  Example: [1, 2, ""].sort() raises TypeError
  Rationale: 2.x default ordering is bogus
    depends on type names
    depends on addresses
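
A sketch of both defaults, using a hypothetical `Point` class that defines no comparison methods:

```python
class Point:
    def __init__(self, x):
        self.x = x

p, q = Point(1), Point(1)

same_object = (p == p)       # identity comparison
equal_fields = (p == q)      # distinct objects, so not equal by default

try:
    [1, 2, ""].sort()        # mixed int/str ordering
    sortable = True
except TypeError:
    sortable = False

try:
    p < q                    # no default ordering for arbitrary objects
    orderable = True
except TypeError:
    orderable = False
```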
Nonlocal Statement
  Paul Graham’s challenge * finally met (sort of):
    def new_accumulator(n):
        def accumulator(i):
            nonlocal n
            n += i
            return n
        return accumulator
  * Revenge of the Nerds, “Appendix: Power”
New super() Call
  PEP 3135 (was 367)
    note: PEP is behind; actual design chosen is different
  Instead of super(ThisClass, self), use super()
    old style calls still supported
  When called without args, digs out of frame:
    __class__ cell, the class defining the method
      based on static, textual inclusion
      cell is filled in after metaclass created the class
      but before class decorators run
    first argument (self; or cls for class methods)
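
A sketch comparing the zero-argument call with the old explicit form, in both an instance method and a class method. `Base` and `Child` are hypothetical.

```python
class Base:
    def greet(self):
        return "base"

    @classmethod
    def make(cls):
        return cls()

class Child(Base):
    def greet(self):
        return "child+" + super().greet()              # new style

    def greet_old(self):
        return "child+" + super(Child, self).greet()   # old style still works

    @classmethod
    def make(cls):
        obj = super().make()       # first argument here is cls, not self
        obj.made_by = cls.__name__
        return obj

made = Child.make()
```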
Set Literals
  {1, 2, 3} is the same as set([1, 2, 3])
  No empty set literal; use set()
  No frozenset literal; use frozenset({…})
  Set comprehensions:
    {f(x) for x in S if P(x)}
    same as set(f(x) for x in S if P(x))
And More – Much More!
  it.next() -> it.__next__()
  f.func_code -> f.__code__
  reduce() is dead (I can’t read code using it)
  <> is dead; use !=
  `…` is dead; use repr(…)
  raw_input() -> input()
  Read PEP 3100 for many more
  PS. lambda lives!
C API Changes
  Too early to tell what will happen exactly
    Still working on this…
  Will have to recompile at the very least
  Biggest problem expected: Unicode, bytes
  For now, these simple rules:
    Adding APIs is okay (of course)
    Deleting APIs is okay
    Changing APIs incompatibly is NOT OKAY
Questions PS: Read my blog at artima.com
Why reduce() is an Attractive Nuisance
  ~90% of reduce() calls found in practice can be rewritten using sum()
    half the rest are concatenating sequences, i.e. O(N**2) running time
  First example found by Google Code Search (google.com/codesearch):
    quotechar = reduce(lambda a, b: (quotes[a] > quotes[b]) and a or b, quotes.keys())
  I find the following rewrite much more readable:
    quotechar = quotes.keys()[0]
    for a in quotes.keys():
        if quotes[a] > quotes[quotechar]:
            quotechar = a
  Another:
    reduce(lambda a, b: a+'|'+b, value)
  Rewrite as:
    '|'.join(value)
  Another:
    reduce(lambda x, y: x or y.ambiguous and True, parents, False)
  Rewrite as:
    any(y.ambiguous for y in parents)
  All-time worst, from Python Cookbook (unreadable and O(N**2) running time):
    def wrap(text, width):
        return reduce(lambda line, word, width=width:
                      '%s%s%s' % (line,
                                  ' \n'[(len(line) - line.rfind('\n') - 1
                                         + len(word.split('\n', 1)[0]) >= width)],
                                  word),
                      text.split(' '))
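
For readers trying these under Python 3, where `reduce()` is available from `functools` rather than as a builtin, here is a sketch of the first two examples next to loop-free alternatives. The sample `quotes` and `value` data are made up, and `max(..., key=...)` is an alternative chosen for this sketch, not the rewrite shown on the slide.

```python
from functools import reduce

quotes = {"'": 3, '"': 7, "`": 1}    # hypothetical quote-character counts

# reduce() version: pick the key with the largest count
via_reduce = reduce(lambda a, b: a if quotes[a] > quotes[b] else b,
                    quotes.keys())

# Same intent stated directly
via_max = max(quotes, key=quotes.get)

value = ["a", "b", "c"]
joined_reduce = reduce(lambda a, b: a + "|" + b, value)   # quadratic copying
joined = "|".join(value)                                  # linear
```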
