The Vanishing Pattern
from
iterators to
generators in
Python Luciano Ramalho
ramalho@turing.com.br
@ramalhoorg
@ramalhoorg
Demo: laziness
in the Django Shell
2
>>> from django.db import connection
>>> q = connection.queries
>>> q
[]
>>> from municipios.models import *
>>> res = Mun...
@ramalhoorg
QuerySet is a lazy iterable
7
@ramalhoorg
QuerySet is a lazy iterable
technical term
8
@ramalhoorg
Lazy
• Avoids unnecessary work, by postponing it as long as possible
• The opposite of eager
9
In Computer Sci...
@ramalhoorg
Now, back to basics...
10
@ramalhoorg
Iteration: C and Python
#include <stdio.h>
int main(int argc, char *argv[]) {
int i;
for(i = 0; i < argc; i++)...
@ramalhoorg
Iteration: Java (classic)
class Arguments {
public static void main(String[] args) {
for (int i=0; i < args.le...
@ramalhoorg
Iteration: Java ≥1.5
$ java Arguments2 alfa bravo charlie
alfa
bravo
charlie
• Enhanced for (a.k.a. foreach)
s...
@ramalhoorg
Iteration: Java ≥1.5
• Enhanced for (a.k.a. foreach)
class Arguments2 {
public static void main(String[] args)...
@ramalhoorg
You can iterate over many
Python objects
• strings
• files
• XML: ElementTree nodes
• not limited to built-in t...
@ramalhoorg
So, what is an iterable?
• Informal, recursive definition:
• iterable: fit to be iterated
• just as:
edible: fit ...
@ramalhoorg
The for loop statement is
not the only construct that
handles iterables...
17
List comprehension
● List comprehension
● Example: use all the elements:
– L2 = [n*10 for n in L]
List...
@ramalhoorg
Set comprehension
• An expression that builds a set from any iterable
>>> s = 'abracadabra'
>>> set(s)
{'b', '...
@ramalhoorg
Dict comprehensions
• An expression that builds a dict from any iterable
>>> s = 'abracadabra'
>>> {c:ord(c) f...
@ramalhoorg
Syntactic support for iterables
• Tuple unpacking,
parallel assignment
>>> a, b, c = 'XYZ'
>>> a
'X'
>>> b
'Y'...
@ramalhoorg
Syntactic support for iterables (2)
• Function calls: exploding arguments with *
>>> import math
>>> def hypot...
@ramalhoorg
Built-in iterable types
• basestring
• str
• unicode
• dict
• file
• frozenset
• list
• set
• tuple
• xrange
23
@ramalhoorg
Built-in functions that take iterable
arguments
• all
• any
• filter
• iter
• len
• map
• max
• min
• reduce
• ...
@ramalhoorg
Classic iterables in Python
25
@ramalhoorg
Iterator is...
• a classic design pattern
Design Patterns
Gamma, Helm, Johnson & Vlissides
Addison-Wesley,
ISBN...
@ramalhoorg
Head First Design
Patterns Poster
O'Reilly,
ISBN 0-596-10214-3
27
“The Iterator Pattern
provides a way to acce...
An iterable Train class
>>> train = Train(4)
>>> for car in train:
... print(car)
car #1
car #2
car #3
car #4
>>>
@ramalhoorg
class Train(object):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __iter...
@ramalhoorg
Iterable ABC
• collections.Iterable abstract base class
• A concrete subclass of Iterable must
implement .__it...
@ramalhoorg
Iterator ABC
• Iterator provides
.next
or
.__next__
• .__next__ returns the next item
• You don’t usually call...
@ramalhoorg
for car in train:
• calls iter(train) to
obtain a TrainIterator
• makes repeated calls to
next(aTrainIterator)...
@ramalhoorg34 Richard Bartz/Wikipedia
@ramalhoorg
Iterable duck-like
creatures
35
@ramalhoorg
Design patterns in dynamic
languages
• Dynamic languages: Lisp, Smalltalk, Python,
Ruby, PHP, JavaScript...
• ...
Peter Norvig:
“Design Patterns in Dynamic Languages”
@ramalhoorg
Dynamic types
• No need to declare types or interfaces
• It does not matter what an object claims to be, only ...
@ramalhoorg
Duck typing
39
“In other words, don't
check whether it is-a duck:
check whether it quacks-
like-a duck, walks-...
@ramalhoorg
A Python iterable is...
• An object from which the iter function can produce an iterator
• The iter(x) call:
•...
@ramalhoorg
Train: a sequence of cars
train = Train(4)
41
train[0] train[1] train[2] train[3]
Train: a sequence of cars
>>> train = Train(4)
>>> len(train)
4
>>> train[0]
'car #1'
>>> train[3]
'car #4'
>>> train[-1]
...
Train: a sequence of cars
class Train(object):
def __init__(self, cars):
self.cars = cars
def __getitem__(self, key):
inde...
@ramalhoorg
The sequence protocol at work
>>> t = Train(4)
>>> len(t)
4
>>> t[0]
'car #1'
>>> t[3]
'car #4'
>>> t[-1]
'car...
@ramalhoorg
Protocol
• protocol: a synonym for interface used in dynamic
languages like Smalltalk, Python, Ruby, Lisp...
•...
class Train(object):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __getitem__(self, ...
import collections
class Train(collections.Sequence):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return...
@ramalhoorg
Sequence ABC
• collections.Sequence abstract base class
>>> train = Train(4)
>>> 'car #2' in train
True
>>> 'c...
@ramalhoorg51 U.S. NRC/Wikipedia
@ramalhoorg
Generators
52
@ramalhoorg
Iteration in C (example 2)
#include <stdio.h>
int main(int argc, char *argv[]) {
int i;
for(i = 0; i < argc; i...
@ramalhoorg
Iteration in Python (ex. 2)
import sys
for i in range(len(sys.argv)):
print i, ':', sys.argv[i]
$ python args2...
@ramalhoorg
Iteration in Python (ex. 2)
import sys
for i, arg in enumerate(sys.argv):
print i, ':', arg
$ python args2.py ...
@ramalhoorg
import sys
for i, arg in enumerate(sys.argv):
print i, ':', arg
Iteration in Python (ex. 2)
$ python args2.py ...
@ramalhoorg
What
enumerate does >>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>>
enumerate builds an
enum...
@ramalhoorg
What
enumerate does
this builds
a generator
and that is iterable
>>> e = enumerate('Turing')
>>> e
<enumerate...
@ramalhoorg
What
enumerate does
this builds
a generator
the enumerate object
produces an
(index, item) tuple
for each nex...
@ramalhoorg
Iterator x generator
• By definition (in GoF) an iterator retrieves successive items
from an existing collectio...
Faraday disc
(Wikipedia)
@ramalhoorg
Very simple
generators
62
@ramalhoorg
Generator
function
• Any function that has the
yield keyword in its body
is a generator function
63
>>> def ge...
@ramalhoorg
• When invoked, a
generator function
returns a
generator object
Generator
function
64
>>> def gen_123():
... y...
@ramalhoorg
Generator
function
>>> def gen_123():
... yield 1
... yield 2
... yield 3
...
>>> g = gen_123()
>>> g
<generat...
@ramalhoorg
Generator
behavior
• Note how the output of
the generator function is
interleaved with the
output of the calli...
@ramalhoorg
Generator
behavior
• The body is executed only
when next is called, and it
runs only up to the following
yield...
@ramalhoorg
Generator
behavior
• When the body of the function
returns, the generator object
raises StopIteration
• The fo...
for car in train:
• calls iter(train) to
obtain a generator
• makes repeated calls to next(generator) until
the function r...
Classic iterator
x generator
class Train(object):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return sel...
class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
for i in range(self.cars):
yield 'car ...
Generator expression (genexp)
>>> g = (c for c in 'ABC')
>>> g
<generator object <genexpr> at 0x10045a410>
>>> for l in g:...
@ramalhoorg
• When evaluated,
returns a generator object
>>> g = (n for n in [1, 2, 3])
>>> g
<generator object <genexpr> ...
for car in train:
• calls iter(train) to
obtain a generator
• makes repeated calls to next(generator) until
the function r...
class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
return ('car #%s' % (i+1) for i in ran...
@ramalhoorg
Built-in functions that return
iterables, iterators or generators
• dict
• enumerate
• frozenset
• list
• reve...
@ramalhoorg
• boundless generators
• count(), cycle(), repeat()
• generators which combine several iterables:
• chain(), t...
@ramalhoorg
Generators in Python 3
• Several functions and methods of the standard library that used to
return lists, now ...
@ramalhoorg
A practical example using
generator functions
• Generator functions to decouple reading and writing logic in a...
@ramalhoorg
Main loop writes
JSON file
@ramalhoorg
Another loop reads
the input records
@ramalhoorg
One implementation:
same loop reads/writes
@ramalhoorg
But what if we need to read
another format?
@ramalhoorg
Functions in the
script
• iterMstRecords*
• iterIsoRecords*
• writeJsonArray
• main	

* generator functions
@ramalhoorg
main:
read command
line arguments
main: determine
input format
selected generator function is passed
as an argument
input generator function is selected
bas...
@ramalhoorg
writeJsonArray:
write JSON records
89
writeJsonArray:
iterates over one of the input
generator functions
selected generator function received as
an argument...
...
@ramalhoorg
iterIsoRecords:
read records
from ISO-2709
format file
generator function!
91
@ramalhoorg
iterIsoRecords
yields one record,
structured as a dict
creates a new dict in each
iteration
92
@ramalhoorg
iterMstRecords:
read records
from ISIS
.MST file
generator function!
iterIsoRecords
iterMstRecords
yields one record,
structured as a dict
creates a new dict in each
iteration
Generators at work
@ramalhoorg
We did not cover
• other generator methods:
• gen.close(): causes a GeneratorExit exception to be raised
withi...
@ramalhoorg
We did not cover
• generator delegation with yield from
• sending data into a generator function with the
gen....
@ramalhoorg
How to learn generators
• Forget about .send() and coroutines: that is a completely different
subject. Look in...
Q & A
Luciano Ramalho
luciano@ramalho.org
@ramalhoorg
https://github.com/ramalho/isis2json
The Vanishing Pattern: from iterators to generators in Python

The core of the talk is refactoring a simple iterable class from the classic Iterator design pattern (as implemented in the GoF book) to compatible but less verbose implementations using generators. This provides a meaningful context to understand the value of generators. Along the way the behavior of the iter function, the Sequence protocol and the Iterable interface are presented. The motivating examples of this talk are database applications.

Published in: Technology, Education

Transcript of "The Vanishing Pattern: from iterators to generators in Python"

  1. 1. The Vanishing Pattern: from iterators to generators in Python Luciano Ramalho ramalho@turing.com.br @ramalhoorg
  2. 2. @ramalhoorg Demo: laziness in the Django Shell 2
  6. 6. >>> from django.db import connection >>> q = connection.queries >>> q [] >>> from municipios.models import * >>> res = Municipio.objects.all()[:5] >>> q [] >>> for m in res: print m.uf, m.nome ... GO Abadia de Goiás MG Abadia dos Dourados GO Abadiânia MG Abaeté PA Abaetetuba >>> q [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}] this expression makes a Django QuerySet QuerySets are “lazy”: no database access so far the query is made only when we iterate over the results
  7. 7. @ramalhoorg QuerySet is a lazy iterable 7
  8. 8. @ramalhoorg QuerySet is a lazy iterable technical term 8
  9. 9. @ramalhoorg Lazy • Avoids unnecessary work, by postponing it as long as possible • The opposite of eager 9 In Computer Science, being “lazy” is often a good thing!
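The contrast between lazy and eager can be made concrete with a small sketch. It is written for Python 3, and the names `squares_eager`, `squares_lazy` and the `log` list are hypothetical, introduced only to record when work actually happens.

```python
def squares_eager(numbers, log):
    """Eager: does all the work up front and returns a list."""
    result = []
    for n in numbers:
        log.append('computing %s' % n)
        result.append(n * n)
    return result


def squares_lazy(numbers, log):
    """Lazy: a generator function; each square is computed on demand."""
    for n in numbers:
        log.append('computing %s' % n)
        yield n * n


eager_log = []
eager = squares_eager([1, 2, 3], eager_log)  # all three items computed here

lazy_log = []
lazy = squares_lazy([1, 2, 3], lazy_log)     # nothing computed yet
log_before_next = list(lazy_log)             # snapshot: still empty
first = next(lazy)                           # computes only the first item
```

After this runs, `eager_log` holds three entries while `lazy_log` holds one: the lazy version postponed the remaining work until someone asks for it.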
  10. 10. @ramalhoorg Now, back to basics... 10
  11. 11. @ramalhoorg Iteration: C and Python #include <stdio.h> int main(int argc, char *argv[]) { int i; for(i = 0; i < argc; i++) printf("%s\n", argv[i]); return 0; } import sys for arg in sys.argv: print arg
  12. 12. @ramalhoorg Iteration: Java (classic) class Arguments { public static void main(String[] args) { for (int i=0; i < args.length; i++) System.out.println(args[i]); } } $ java Arguments alfa bravo charlie alfa bravo charlie
  13. 13. @ramalhoorg Iteration: Java ≥1.5 $ java Arguments2 alfa bravo charlie alfa bravo charlie • Enhanced for (a.k.a. foreach) since 2004 class Arguments2 { public static void main(String[] args) { for (String arg : args) System.out.println(arg); } }
  14. 14. @ramalhoorg Iteration: Java ≥1.5 • Enhanced for (a.k.a. foreach) class Arguments2 { public static void main(String[] args) { for (String arg : args) System.out.println(arg); } } since 2004 import sys for arg in sys.argv: print arg since 1991
  15. 15. @ramalhoorg You can iterate over many Python objects • strings • files • XML: ElementTree nodes • not limited to built-in types: • Django QuerySet • etc. 15
  16. 16. @ramalhoorg So, what is an iterable? • Informal, recursive definition: • iterable: fit to be iterated • just as: edible: fit to be eaten 16
  17. 17. @ramalhoorg The for loop statement is not the only construct that handles iterables... 17
  18. 18. List comprehension ● Example: use all the elements: – L2 = [n*10 for n in L] List comprehension • An expression that builds a list from any iterable >>> s = 'abracadabra' >>> l = [ord(c) for c in s] >>> l [97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97] input: any iterable object output: a list (always)
  19. 19. @ramalhoorg Set comprehension • An expression that builds a set from any iterable >>> s = 'abracadabra' >>> set(s) {'b', 'r', 'a', 'd', 'c'} >>> {ord(c) for c in s} {97, 98, 99, 100, 114} 19
  20. 20. @ramalhoorg Dict comprehensions • An expression that builds a dict from any iterable >>> s = 'abracadabra' >>> {c:ord(c) for c in s} {'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100} 20
  21. 21. @ramalhoorg Syntactic support for iterables • Tuple unpacking, parallel assignment >>> a, b, c = 'XYZ' >>> a 'X' >>> b 'Y' >>> c 'Z' 21 >>> l = [(c, ord(c)) for c in 'XYZ'] >>> l [('X', 88), ('Y', 89), ('Z', 90)] >>> for char, code in l: ... print char, '->', code ... X -> 88 Y -> 89 Z -> 90
  22. 22. @ramalhoorg Syntactic support for iterables (2) • Function calls: exploding arguments with * >>> import math >>> def hypotenuse(a, b): ... return math.sqrt(a*a + b*b) ... >>> hypotenuse(3, 4) 5.0 >>> sides = (3, 4) >>> hypotenuse(sides) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: hypotenuse() takes exactly 2 arguments (1 given) >>> hypotenuse(*sides) 5.0 22
  23. 23. @ramalhoorg Built-in iterable types • basestring • str • unicode • dict • file • frozenset • list • set • tuple • xrange 23
  24. 24. @ramalhoorg Built-in functions that take iterable arguments • all • any • filter • iter • len • map • max • min • reduce • sorted • sum • zip unrelated to compression
  25. 25. @ramalhoorg Classic iterables in Python 25
  26. 26. @ramalhoorg Iterator is... • a classic design pattern Design Patterns Gamma, Helm, Johnson & Vlissides Addison-Wesley, ISBN 0-201-63361-2 26
  27. 27. @ramalhoorg Head First Design Patterns Poster O'Reilly, ISBN 0-596-10214-3 27
  28. 28. @ramalhoorg Head First Design Patterns Poster O'Reilly, ISBN 0-596-10214-3 28 “The Iterator Pattern provides a way to access the elements of an aggregate object sequentially without exposing the underlying representation.”
  29. 29. An iterable Train class >>> train = Train(4) >>> for car in train: ... print(car) car #1 car #2 car #3 car #4 >>>
  30. 30. @ramalhoorg class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __iter__(self): return TrainIterator(self) class TrainIterator(object): def __init__(self, train): self.train = train self.current = 0 def __next__(self): # Python 3 if self.current < len(self.train): self.current += 1 return 'car #%s' % (self.current) else: raise StopIteration() An iterable Train with iterator iterable iterator
  31. 31. @ramalhoorg Iterable ABC • collections.Iterable abstract base class • A concrete subclass of Iterable must implement .__iter__ • .__iter__ returns an Iterator • You don’t usually call .__iter__ directly • when needed, call iter(x) 31
  32. 32. @ramalhoorg Iterator ABC • Iterator provides .next or .__next__ • .__next__ returns the next item • You don’t usually call .__next__ directly • when needed, call next(x) Python 3 Python 2 Python ≥ 2.6 32
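A minimal sketch of how the two ABCs relate, assuming Python 3, where they live in `collections.abc` (the `collections.Iterable` alias was removed in Python 3.10). The `Countdown` and `CountdownIterator` classes are hypothetical examples, not part of the talk.

```python
from collections.abc import Iterable, Iterator


class Countdown:
    """Iterable: __iter__ returns a fresh iterator on every call."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: __next__ returns the next item or raises StopIteration."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # iterators are themselves iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration()
        self.current -= 1
        return self.current + 1

    next = __next__  # Python 2 spelling of the same method


countdown = Countdown(3)
it = iter(countdown)                      # preferred over countdown.__iter__()
items = [next(it), next(it), next(it)]    # preferred over it.__next__()
```

Neither class inherits from an ABC, yet `isinstance(countdown, Iterable)` and `isinstance(it, Iterator)` are both true, because these ABCs recognize any class that defines the required methods.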
  33. 33. @ramalhoorg for car in train: • calls iter(train) to obtain a TrainIterator • makes repeated calls to next(aTrainIterator) until it raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __iter__(self): return TrainIterator(self) class TrainIterator(object): def __init__(self, train): self.train = train self.current = 0 def __next__(self): # Python 3 if self.current < len(self.train): self.current += 1 return 'car #%s' % (self.current) else: raise StopIteration() Train with iterator 1 1 2 >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3 2
  34. 34. @ramalhoorg34 Richard Bartz/Wikipedia
  35. 35. @ramalhoorg Iterable duck-like creatures 35
  36. 36. @ramalhoorg Design patterns in dynamic languages • Dynamic languages: Lisp, Smalltalk, Python, Ruby, PHP, JavaScript... • Many features not found in C++, where most of the original 23 Design Patterns were identified • Java is more dynamic than C++, but much more static than Lisp, Python etc. 36 Gamma, Helm, Johnson, Vlissides a.k.a. the Gang of Four (GoF)
  37. 37. Peter Norvig: “Design Patterns in Dynamic Languages”
  38. 38. @ramalhoorg Dynamic types • No need to declare types or interfaces • It does not matter what an object claims to be, only what it is capable of doing 38
  39. 39. @ramalhoorg Duck typing 39 “In other words, don't check whether it is-a duck: check whether it quacks- like-a duck, walks-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.” Alex Martelli comp.lang.python (2000)
  40. 40. @ramalhoorg A Python iterable is... • An object from which the iter function can produce an iterator • The iter(x) call: • invokes x.__iter__() to obtain an iterator • but, if x has no __iter__: • iter makes an iterator which tries to fetch items from x by doing x[0], x[1], x[2]... sequence protocol Iterable interface 40
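The fallback described on this slide can be checked directly. The sketch below assumes Python 3; `Letters` is a hypothetical class that deliberately has no `__iter__`, only `__getitem__` accepting integer indexes starting at 0.

```python
class Letters:
    """No __iter__ here: only __getitem__ with integer indexes from 0."""

    def __init__(self, text):
        self.text = text

    def __getitem__(self, index):
        return self.text[index]  # raises IndexError past the end


letters = Letters('abc')
has_iter = hasattr(letters, '__iter__')  # False: no Iterable interface
collected = list(iter(letters))          # iter() falls back to letters[0], letters[1], ...

# An object with neither __iter__ nor __getitem__ is not iterable at all.
try:
    iter(42)
    int_is_iterable = True
except TypeError:
    int_is_iterable = False
```

The iterator built by `iter` stops when `__getitem__` raises `IndexError`, which is why the sequence protocol alone is enough for a `for` loop.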
  41. 41. @ramalhoorg Train: a sequence of cars train = Train(4) 41 train[0] train[1] train[2] train[3]
  42. 42. Train: a sequence of cars >>> train = Train(4) >>> len(train) 4 >>> train[0] 'car #1' >>> train[3] 'car #4' >>> train[-1] 'car #4' >>> train[4] Traceback (most recent call last): ... IndexError: no car at 4 >>> for car in train: ... print(car) car #1 car #2 car #3 car #4
  43. 43. Train: a sequence of cars class Train(object): def __init__(self, cars): self.cars = cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) if __getitem__ exists, iteration “just works”
  44. 44. @ramalhoorg The sequence protocol at work >>> t = Train(4) >>> len(t) 4 >>> t[0] 'car #1' >>> t[3] 'car #4' >>> t[-1] 'car #4' >>> for car in t: ... print(car) car #1 car #2 car #3 car #4 __len__ __getitem__ __getitem__
  45. 45. @ramalhoorg Protocol • protocol: a synonym for interface used in dynamic languages like Smalltalk, Python, Ruby, Lisp... • not declared, and not enforced by static checks 45
  46. 46. class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence protocol __len__ and __getitem__ implement the immutable sequence protocol
  47. 47. import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence ABC • collections.Sequence abstract base class abstract methods Python ≥ 2.6
  48. 48. import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence ABC • collections.Sequence abstract base class implement these 2
  49. 49. import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence ABC • collections.Sequence abstract base class inherit these 5
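The five inherited methods are not named on the slide; in the standard library they are `__contains__`, `__iter__`, `__reversed__`, `index` and `count`. A short sketch, assuming Python 3 (where the ABC is `collections.abc.Sequence`), shows that `Train` gets them without defining them:

```python
from collections.abc import Sequence


class Train(Sequence):
    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):  # index 2 -> car #3
            return 'car #%s' % (index + 1)
        raise IndexError('no car at %s' % key)


train = Train(3)
inherited = ['__contains__', '__iter__', '__reversed__', 'index', 'count']

# None of the five is defined in Train itself; all come from Sequence.
defined_in_train = [name for name in inherited if name in vars(Train)]

has_car_2 = 'car #2' in train            # __contains__
backwards = list(reversed(train))        # __reversed__
position = train.index('car #3')         # index
occurrences = train.count('car #1')      # count
```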
  50. 50. @ramalhoorg Sequence ABC • collections.Sequence abstract base class >>> train = Train(4) >>> 'car #2' in train True >>> 'car #7' in train False >>> for car in reversed(train): ... print(car) car #4 car #3 car #2 car #1 >>> train.index('car #3') 2 50
  51. 51. @ramalhoorg51 U.S. NRC/Wikipedia
  52. 52. @ramalhoorg Generators 52
  53. 53. @ramalhoorg Iteration in C (example 2) #include <stdio.h> int main(int argc, char *argv[]) { int i; for(i = 0; i < argc; i++) printf("%d : %s\n", i, argv[i]); return 0; } $ ./args2 alfa bravo charlie 0 : ./args2 1 : alfa 2 : bravo 3 : charlie
  54. 54. @ramalhoorg Iteration in Python (ex. 2) import sys for i in range(len(sys.argv)): print i, ':', sys.argv[i] $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie 54 not Pythonic
  55. 55. @ramalhoorg Iteration in Python (ex. 2) import sys for i, arg in enumerate(sys.argv): print i, ':', arg $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie 55 Pythonic!
  56. 56. @ramalhoorg import sys for i, arg in enumerate(sys.argv): print i, ':', arg Iteration in Python (ex. 2) $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie this returns a lazy iterable object that object yields tuples (index, item) on demand, at each iteration 56
  57. 57. @ramalhoorg What enumerate does >>> e = enumerate('Turing') >>> e <enumerate object at 0x...> >>> enumerate builds an enumerate object 57
  58. 58. @ramalhoorg What enumerate does this builds a generator and that is iterable >>> e = enumerate('Turing') >>> e <enumerate object at 0x...> >>> for item in e: ... print item ... (0, 'T') (1, 'u') (2, 'r') (3, 'i') (4, 'n') (5, 'g') >>> 58 enumerate builds an enumerate object
  59. 59. @ramalhoorg What enumerate does this builds a generator the enumerate object produces an (index, item) tuple for each next(e) call >>> e = enumerate('Turing') >>> e <enumerate object at 0x...> >>> next(e) (0, 'T') >>> next(e) (1, 'u') >>> next(e) (2, 'r') >>> next(e) (3, 'i') >>> next(e) (4, 'n') >>> next(e) (5, 'g') >>> next(e) Traceback (most recent...): ... StopIteration • The enumerate object is an example of a generator
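To see why the enumerate object behaves like a generator, here is a rough equivalent written as a generator function. This is only a sketch: `my_enumerate` is a hypothetical name, and the real built-in is its own type implemented in C rather than a generator object, although it follows the same iterator interface.

```python
def my_enumerate(iterable, start=0):
    """Yield (index, item) pairs on demand, like the built-in enumerate."""
    index = start
    for item in iterable:
        yield index, item
        index += 1


e = my_enumerate('Turing')
first = next(e)    # (0, 'T')
second = next(e)   # (1, 'u')
rest = list(e)     # the remaining pairs, then the generator is exhausted
```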
  60. 60. @ramalhoorg Iterator x generator • By definition (in GoF) an iterator retrieves successive items from an existing collection • A generator implements the iterator interface (next) but produces items not necessarily in a collection • a generator may iterate over a collection, but return the items decorated in some way, skip some items... • it may also produce items independently of any existing data source (e.g. a Fibonacci sequence generator) 60
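The Fibonacci case mentioned above might look like the following sketch (Python 3). No collection exists anywhere: each item is produced when requested, and the generator never ends on its own, so `itertools.islice` is used to take a finite prefix.

```python
from itertools import islice


def fibonacci():
    """Boundless generator: produces items with no underlying collection."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


first_ten = list(islice(fibonacci(), 10))
```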
  61. 61. Faraday disc (Wikipedia)
  62. 62. @ramalhoorg Very simple generators 62
  63. 63. @ramalhoorg Generator function • Any function that has the yield keyword in its body is a generator function 63 >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> for i in gen_123(): print(i) 1 2 3 >>> the keyword gen was considered for defining generator functions, but def prevailed
  64. 64. @ramalhoorg • When invoked, a generator function returns a generator object Generator function 64 >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> for i in gen_123(): print(i) 1 2 3 >>> g = gen_123() >>> g <generator object gen_123 at ...>
  65. 65. @ramalhoorg Generator function >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> g = gen_123() >>> g <generator object gen_123 at ...> >>> next(g) 1 >>> next(g) 2 >>> next(g) 3 >>> next(g) Traceback (most recent call last): ... StopIteration • Generator objects implement the Iterator interface 65
  66. 66. @ramalhoorg Generator behavior • Note how the output of the generator function is interleaved with the output of the calling code 66 >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> for c in gen_AB(): ... print('--->', c) ... START ---> A CONTINUE ---> B END. >>>
  67. 67. @ramalhoorg Generator behavior • The body is executed only when next is called, and it runs only up to the following yield >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> g = gen_AB() >>> next(g) START 'A' >>>
  68. 68. @ramalhoorg Generator behavior • When the body of the function returns, the generator object raises StopIteration • The for statement catches that for you 68 >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> g = gen_AB() >>> next(g) START 'A' >>> next(g) CONTINUE 'B' >>> next(g) END. Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
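What the `for` statement does on your behalf can be spelled out by hand. This is an approximation of the loop's behavior in Python 3, not the interpreter's actual implementation; `gen_AB` is a quieter version of the slide's function, without the `print` calls.

```python
def gen_AB():
    yield 'A'
    yield 'B'


seen = []

# Roughly equivalent to: for item in gen_AB(): seen.append(item)
iterator = iter(gen_AB())
while True:
    try:
        item = next(iterator)
    except StopIteration:  # raised when the generator body returns
        break
    seen.append(item)
```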
  69. 69. for car in train: • calls iter(train) to obtain a generator • makes repeated calls to next(generator) until the function returns, which raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): # index 2 is car #3 yield 'car #%s' % (i+1) Train with generator function 1 1 2 >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3 2
  70. 70. Classic iterator x generator class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __iter__(self): return TrainIterator(self) class TrainIterator(object): def __init__(self, train): self.train = train self.current = 0 def __next__(self): # Python 3 if self.current < len(self.train): self.current += 1 return 'car #%s' % (self.current) else: raise StopIteration() class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1) 2 classes, 12 lines of code 1 class, 3 lines of code
  71. 71. class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1) The pattern just vanished
  72. 72. class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1) “When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough -- often that I'm generating by hand the expansions of some macro that I need to write.” Paul Graham Revenge of the nerds (2002)
  73. 73. Generator expression (genexp) >>> g = (c for c in 'ABC') >>> g <generator object <genexpr> at 0x10045a410> >>> for l in g: ... print(l) ... A B C >>>
  74. 74. @ramalhoorg • When evaluated, returns a generator object >>> g = (n for n in [1, 2, 3]) >>> g <generator object <genexpr> at 0x...> >>> next(g) 1 >>> next(g) 2 >>> next(g) 3 >>> next(g) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration Generator expression (genexp)
  75. 75. Train with generator function class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): # index 2 is car #3 yield 'car #%s' % (i+1) >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3 The statement for car in train: • calls iter(train) to obtain a generator • makes repeated calls to next(generator) until the generator function returns, which raises StopIteration
  76. 76. Train with generator expression class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): return ('car #%s' % (i+1) for i in range(self.cars)) >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3 The statement for car in train: • calls iter(train) to obtain a generator • makes repeated calls to next(generator) until the generator expression is exhausted, which raises StopIteration
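The two steps the for statement performs can be written out by hand. This is a rough sketch of what the interpreter does, not its actual implementation; the `seen` list is just there to collect the results.

```python
class Train:
    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        for i in range(self.cars):
            yield 'car #%s' % (i + 1)


train = Train(3)
seen = []

# Step 1: iter(train) calls train.__iter__() and returns a generator.
generator = iter(train)

# Step 2: call next(generator) until StopIteration is raised.
while True:
    try:
        car = next(generator)
    except StopIteration:
        break  # the for statement swallows this exception silently
    seen.append(car)

print(seen)
```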
  77. 77. Generator function x genexp. Generator function: class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1) Genexp: class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): return ('car #%s' % (i+1) for i in range(self.cars))
  78. 78. Built-in functions that return iterables, iterators or generators • dict • enumerate • frozenset • list • reversed • set • tuple
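A short sketch of how the built-ins on this slide differ: `enumerate` and `reversed` return lazy iterators, while `dict`, `list`, `set`, `frozenset` and `tuple` build a new container from any iterable. The variable names are only for illustration.

```python
letters = 'ABC'

# enumerate() and reversed() return iterators: lazy and single-use.
pairs = enumerate(letters, start=1)
backwards = reversed(letters)
pairs_is_iterator = iter(pairs) is pairs  # an iterator returns itself

# The container constructors consume any iterable, including iterators.
as_dict = dict(pairs)
as_list = list(backwards)
leftover = list(pairs)  # pairs was already consumed by dict()

as_set = set('abracadabra')
as_frozenset = frozenset(as_set)
as_tuple = tuple(n * 10 for n in range(3))

print(as_dict, as_list, leftover)
print(sorted(as_set), as_tuple)
```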
  79. 79. The itertools module • boundless generators: count(), cycle(), repeat() • generators which combine several iterables: chain(), tee(), izip(), imap(), product(), compress()... • generators which select or group items: compress(), dropwhile(), groupby(), ifilter(), islice()... • generators producing combinations of items: product(), permutations(), combinations()... • izip(), imap() and ifilter() are the Python 2 names; in Python 3 the built-in zip(), map() and filter() are already lazy Don’t reinvent the wheel, use itertools! It was not reinvented either: it was inspired by constructs from Haskell (and, per the module documentation, APL and SML). Great for MapReduce.
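One function from each group above, as a small sketch using the Python 3 names. The sample data (`words` and the other literals) is made up for the example.

```python
import itertools

# Boundless generator, tamed with islice(): first five even numbers.
first_evens = list(itertools.islice(itertools.count(0, 2), 5))

# Combining several iterables of different types.
joined = list(itertools.chain('AB', [1, 2]))

# Selecting: skip items while the predicate holds, then yield the rest.
tail = list(itertools.dropwhile(lambda n: n < 3, [1, 2, 3, 1]))

# Grouping consecutive items by a key (input must already be sorted by it).
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
grouped = {
    initial: list(group)
    for initial, group in itertools.groupby(words, key=lambda w: w[0])
}

# Producing combinations of items.
pairs = list(itertools.combinations('ABC', 2))

print(first_evens, joined, tail)
print(grouped)
print(pairs)
```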
  80. 80. Generators in Python 3 • Several functions and methods of the standard library that used to return lists now return generators and other lazy iterables in Python 3 • dict.keys(), dict.items(), dict.values()... • range(...): like xrange in Python 2.x (more than a generator) • If you really need a list, just pass the generator to the list constructor, e.g. list(range(10))
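A brief sketch of what "more than a generator" means for `range`, and of how the dict methods behave in Python 3. The `stock` dict and the other names are illustrative only.

```python
# range is lazy, but it is also a sequence: it has a length, supports
# indexing and membership tests, and can be iterated more than once.
numbers = range(10**9)          # no billion-item list is built
size = len(numbers)
item = numbers[123456]
found = 999999999 in numbers
is_own_iterator = iter(numbers) is numbers  # False: range is not an iterator

# dict.keys() returns a view that reflects later changes to the dict.
stock = {'apples': 3}
names = stock.keys()
stock['pears'] = 5
current_names = list(names)

# When a real list is needed, pass the lazy object to list().
materialized = list(range(10))

print(size, item, found, is_own_iterator)
print(current_names, materialized)
```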
  81. 81. A practical example using generator functions • Generator functions to decouple reading and writing logic in a database conversion tool designed to handle large datasets https://github.com/ramalho/isis2json
  82. 82. Main loop writes JSON file
  83. 83. Another loop reads the input records
  84. 84. One implementation: same loop reads/writes
  85. 85. But what if we need to read another format?
  86. 86. Functions in the script • iterMstRecords* • iterIsoRecords* • writeJsonArray • main * generator functions
  87. 87. main: read command line arguments
  88. 88. main: determine input format • input generator function is selected based on the input file extension • selected generator function is passed as an argument
  89. 89. writeJsonArray: write JSON records
  90. 90. writeJsonArray: iterates over one of the input generator functions • selected generator function received as an argument... • ...and called to produce input generator
  91. 91. iterIsoRecords: read records from ISO-2709 format file • generator function!
  92. 92. iterIsoRecords • yields one record, structured as a dict • creates a new dict in each iteration
  93. 93. iterMstRecords: read records from ISIS .MST file • generator function!
  94. 94. iterMstRecords • yields one record, structured as a dict • creates a new dict in each iteration
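The design walked through in the slides above (reader generator functions chosen by file extension, then passed to a single writer) can be sketched as follows. This is not the isis2json code: the two toy input formats, the `.kv` and `.tab` extensions, and the names `iter_kv_records`, `iter_tab_records`, `write_json_array`, `READERS` and `convert` are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import io
import json
import os


def iter_kv_records(text):
    """Hypothetical reader for lines such as 'id=1;title=A'."""
    for line in text.splitlines():
        record = {}  # a new dict in each iteration
        for field in line.split(';'):
            key, value = field.split('=', 1)
            record[key] = value
        yield record


def iter_tab_records(text):
    """Hypothetical reader for lines such as 'id<TAB>title'."""
    for line in text.splitlines():
        ident, title = line.split('\t')
        yield {'id': ident, 'title': title}


def write_json_array(iter_records, source, output):
    """Writer: knows nothing about input formats, only about dicts."""
    output.write('[')
    # The generator function is received as an argument and called here.
    for i, record in enumerate(iter_records(source)):
        if i > 0:
            output.write(',')
        output.write(json.dumps(record, sort_keys=True))
    output.write(']')


# The reader is selected based on the input file extension.
READERS = {'.kv': iter_kv_records, '.tab': iter_tab_records}


def convert(filename, source):
    extension = os.path.splitext(filename)[1]
    output = io.StringIO()
    write_json_array(READERS[extension], source, output)
    return output.getvalue()


print(convert('sample.kv', 'id=1;title=A\nid=2;title=B'))
print(convert('sample.tab', '1\tA\n2\tB'))
```

Supporting another input format in this sketch means writing one more generator function and adding it to `READERS`; the writer stays untouched, and only one record is held in memory at a time.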
  95. 95. Generators at work
  96. 96. Generators at work
  97. 97. Generators at work
  98. 98. We did not cover • other generator methods: • gen.close(): causes a GeneratorExit exception to be raised within the generator body, at the point where it is paused • gen.throw(e): causes any exception e to be raised within the generator body, at the point where it is paused Mostly useful for long-running processes. Often not needed in batch processing scripts.
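For readers who want a quick look at these two methods anyway, here is a minimal sketch. The `worker` generator and the `events` list are invented for the example.

```python
events = []  # records what happens inside the generator body


def worker():
    """Hypothetical long-running generator that reacts to exceptions."""
    try:
        while True:
            try:
                yield 'working'
            except ValueError as exc:
                # Raised at the paused yield by gen.throw(ValueError(...)).
                events.append('handled %s' % exc)
    except GeneratorExit:
        # Raised at the paused yield by gen.close().
        events.append('closing')
        raise


gen = worker()
first = next(gen)                                  # run up to the first yield
after_throw = gen.throw(ValueError('bad input'))   # handled; yields again
gen.close()                                        # generator cleans up

try:
    next(gen)
    finished = False
except StopIteration:
    finished = True  # a closed generator yields nothing more

print(first, after_throw, events, finished)
```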
  99. 99. We did not cover • generator delegation with yield from (Python ≥ 3.3) • sending data into a generator function with the gen.send(x) method (instead of next(gen)), and using yield as an expression to get the data sent • using generator functions as coroutines (not useful in the context of iteration) “Coroutines are not related to iteration” David Beazley
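Just to show what the syntax looks like, here is a minimal sketch of `yield from` and `.send()`. The functions `inner`, `outer` and `averager` are illustrative names, not taken from the talk.

```python
def inner():
    yield 'a'
    yield 'b'
    return 'inner done'  # becomes the value of the yield from expression


def outer():
    # Delegation: items from inner() pass straight through to the caller.
    result = yield from inner()
    yield result


delegated = list(outer())


def averager():
    """Coroutine-style generator: receives values through yield."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # yield used as an expression
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)                # prime it: advance to the first yield
first_average = avg.send(10)
second_average = avg.send(20)
avg.close()

print(delegated, first_average, second_average)
```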
  100. 100. How to learn generators • Forget about .send() and coroutines: that is a completely different subject. Look into that only after mastering and becoming really comfortable using generators for iteration. • Study and use the itertools module • Don’t worry about .close() and .throw() initially. You can be productive with generators without using these methods. • yield from is only available in Python 3.3 and later, and only relevant if you need to use .close() and .throw()
  101. 101. Q & A Luciano Ramalho luciano@ramalho.org @ramalhoorg https://github.com/ramalho/isis2json