The Vanishing Pattern
from iterators to generators in Python
Luciano Ramalho
ramalho@turing.com.br
@ramalhoorg
Demo: laziness
in the Django Shell
>>> from django.db import connection
>>> q = connection.queries
>>> q
[]
>>> from municipios.models import *
>>> res = Municipio.objects.all()[:5]
>>> q
[]
>>> for m in res: print m.uf, m.nome
...
GO Abadia de Goiás
MG Abadia dos Dourados
GO Abadiânia
MG Abaeté
PA Abaetetuba
>>> q
[{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id",
"municipios_municipio"."uf", "municipios_municipio"."nome",
"municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id",
"municipios_municipio"."capital", "municipios_municipio"."latitude",
"municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM
"municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]
• Municipio.objects.all()[:5]: this expression makes a Django QuerySet
• QuerySets are “lazy”: no database access so far (q is still [])
• the query is made only when we iterate over the results
QuerySet is a lazy iterable
(“lazy” is a technical term)
Lazy
• Avoids unnecessary work, by postponing it as long as possible
• The opposite of eager
In Computer Science, being
“lazy” is often a good thing!
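A minimal sketch of lazy versus eager evaluation, written for Python 3. The `double` function and the `calls` list are hypothetical names used only for illustration; the generator expression syntax is covered later in the talk.

```python
calls = []

def double(n):
    calls.append(n)   # record each time real work happens
    return n * 2

# Eager: a list comprehension does all the work immediately.
eager = [double(n) for n in range(5)]
eager_calls = len(calls)

calls.clear()

# Lazy: a generator expression postpones the work until items are requested.
lazy = (double(n) for n in range(5))
lazy_calls_before = len(calls)   # nothing has been computed yet
first = next(lazy)               # computes just one item
lazy_calls_after = len(calls)

print(eager_calls, lazy_calls_before, lazy_calls_after)  # 5 0 1
```

If only the first few items are ever needed, the lazy version never pays for the rest.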
Now, back to basics...
Iteration: C and Python
#include <stdio.h>
int main(int argc, char *argv[]) {
    int i;
    for(i = 0; i < argc; i++)
        printf("%s\n", argv[i]);
    return 0;
}

import sys
for arg in sys.argv:
    print arg
Iteration: Java (classic)
class Arguments {
    public static void main(String[] args) {
        for (int i=0; i < args.length; i++)
            System.out.println(args[i]);
    }
}
$ java Arguments alfa bravo charlie
alfa
bravo
charlie
Iteration: Java ≥1.5
• Enhanced for (a.k.a. foreach), since 2004
class Arguments2 {
    public static void main(String[] args) {
        for (String arg : args)
            System.out.println(arg);
    }
}
$ java Arguments2 alfa bravo charlie
alfa
bravo
charlie
• Python: since 1991
import sys
for arg in sys.argv:
    print arg
You can iterate over many
Python objects
• strings
• files
• XML: ElementTree nodes
• not limited to built-in types:
• Django QuerySet
• etc.
So, what is an iterable?
• Informal, recursive definition:
• iterable: fit to be iterated
• just as:
edible: fit to be eaten
The for loop statement is
not the only construct that
handles iterables...
List comprehension
• Example: use all the elements:
L2 = [n*10 for n in L]
• An expression that builds a list from any iterable
>>> s = 'abracadabra'
>>> l = [ord(c) for c in s]
>>> l
[97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97]
input: any iterable object
output: a list (always)
Set comprehension
• An expression that builds a set from any iterable
>>> s = 'abracadabra'
>>> set(s)
{'b', 'r', 'a', 'd', 'c'}
>>> {ord(c) for c in s}
{97, 98, 99, 100, 114}
Dict comprehensions
• An expression that builds a dict from any iterable
>>> s = 'abracadabra'
>>> {c:ord(c) for c in s}
{'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100}
Syntactic support for iterables
• Tuple unpacking,
parallel assignment
>>> a, b, c = 'XYZ'
>>> a
'X'
>>> b
'Y'
>>> c
'Z'
>>> l = [(c, ord(c)) for c in 'XYZ']
>>> l
[('X', 88), ('Y', 89), ('Z', 90)]
>>> for char, code in l:
...     print char, '->', code
...
X -> 88
Y -> 89
Z -> 90
Syntactic support for iterables (2)
• Function calls: exploding arguments with *
>>> import math
>>> def hypotenuse(a, b):
...     return math.sqrt(a*a + b*b)
...
>>> hypotenuse(3, 4)
5.0
>>> sides = (3, 4)
>>> hypotenuse(sides)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: hypotenuse() takes exactly 2 arguments (1 given)
>>> hypotenuse(*sides)
5.0
Built-in iterable types
• basestring
• str
• unicode
• dict
• file
• frozenset
• list
• set
• tuple
• xrange
Built-in functions that take iterable
arguments
• all
• any
• filter
• iter
• len
• map
• max
• min
• reduce
• sorted
• sum
• zip (unrelated to compression)
Classic iterables in Python
Iterator is...
• a classic design pattern
Design Patterns
Gamma, Helm, Johnson & Vlissides
Addison-Wesley,
ISBN 0-201-63361-2
Head First Design
Patterns Poster
O'Reilly,
ISBN 0-596-10214-3
“The Iterator Pattern
provides a way to access
the elements of an
aggregate object
sequentially without
exposing the underlying
representation.”
An iterable Train class
>>> train = Train(4)
>>> for car in train:
...     print(car)
car #1
car #2
car #3
car #4
An iterable Train with iterator
class Train(object):  # the iterable
    def __init__(self, cars):
        self.cars = cars
    def __len__(self):
        return self.cars
    def __iter__(self):
        return TrainIterator(self)
class TrainIterator(object):  # the iterator
    def __init__(self, train):
        self.train = train
        self.current = 0
    def __next__(self): # Python 3
        if self.current < len(self.train):
            self.current += 1
            return 'car #%s' % (self.current)
        else:
            raise StopIteration()
Iterable ABC
• collections.Iterable abstract base class
• A concrete subclass of Iterable must
implement .__iter__
• .__iter__ returns an Iterator
• You don’t usually call .__iter__ directly
• when needed, call iter(x)
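As a rough illustration of the points above, here is a small Python 3 sketch. The `Countdown` class is hypothetical and not part of the talk, and the ABCs are imported from `collections.abc`, which is where they live in Python 3.3 and later.

```python
from collections.abc import Iterable, Iterator

class Countdown(Iterable):
    """Hypothetical iterable: counts down from start to 1."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Returning a built-in iterator keeps the sketch short.
        return iter(range(self.start, 0, -1))

c = Countdown(3)
it = iter(c)   # preferred over calling c.__iter__() directly
print(isinstance(c, Iterable), isinstance(it, Iterator))  # True True
print(list(it))  # [3, 2, 1]
```

Leaving out `__iter__` in the subclass would make `Countdown(3)` fail with a TypeError, because the abstract method would remain unimplemented.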
Iterator ABC
• Iterator provides .next (Python 2) or .__next__ (Python 3)
• .__next__ returns the next item
• You don’t usually call .__next__ directly
• when needed, call next(x) (Python ≥ 2.6)
for car in train:
• calls iter(train) to
obtain a TrainIterator
• makes repeated calls to
next(aTrainIterator)
until it raises StopIteration
>>> train = Train(3)
>>> for car in train:
...     print(car)
car #1
car #2
car #3
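The two steps performed by the for statement can be written out by hand. This Python 3 sketch repeats the Train and TrainIterator classes so that it runs on its own; `loop_by_hand` is a hypothetical helper that only approximates what the interpreter does.

```python
class Train(object):
    def __init__(self, cars):
        self.cars = cars
    def __len__(self):
        return self.cars
    def __iter__(self):
        return TrainIterator(self)

class TrainIterator(object):
    def __init__(self, train):
        self.train = train
        self.current = 0
    def __next__(self):
        if self.current < len(self.train):
            self.current += 1
            return 'car #%s' % self.current
        raise StopIteration()

def loop_by_hand(iterable):
    """Collect the items the way a for loop would visit them."""
    result = []
    iterator = iter(iterable)        # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)    # step 2: ask for the next item
        except StopIteration:        # the iterator is exhausted
            break
        result.append(item)
    return result

print(loop_by_hand(Train(3)))  # ['car #1', 'car #2', 'car #3']
```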
Iterable duck-like
creatures
Design patterns in dynamic
languages
• Dynamic languages: Lisp, Smalltalk, Python,
Ruby, PHP, JavaScript...
• Many features not found in C++, where most
of the original 23 Design Patterns were
identified
• Java is more dynamic than C++, but much
more static than Lisp, Python etc.
Gamma, Helm, Johnson, Vlissides
a.k.a. the Gang of Four (GoF)
Peter Norvig:
“Design Patterns in Dynamic Languages”
Dynamic types
• No need to declare types or interfaces
• It does not matter what an object claims to be, only what it is
capable of doing
Duck typing
“In other words, don't
check whether it is-a duck:
check whether it quacks-
like-a duck, walks-like-a
duck, etc, etc, depending
on exactly what subset of
duck-like behaviour you
need to play your
language-games with.”
Alex Martelli
comp.lang.python (2000)
A Python iterable is...
• An object from which the iter function can produce an iterator
• The iter(x) call:
• invokes x.__iter__() to obtain an iterator (Iterable interface)
• but, if x has no __iter__:
• iter makes an iterator which tries to fetch items from x by doing
x[0], x[1], x[2]... (sequence protocol)
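The fallback to the sequence protocol can be observed with a class that defines nothing but `__getitem__`. This is a Python 3 sketch; the `Letters` class is hypothetical and exists only for the demonstration.

```python
from collections.abc import Iterable

class Letters(object):
    """Hypothetical class with __getitem__ only: no __iter__, no __len__."""
    def __getitem__(self, index):
        return 'abc'[index]   # raises IndexError past the end

letters = Letters()
print(isinstance(letters, Iterable))  # False: there is no __iter__ method
print(list(iter(letters)))            # ['a', 'b', 'c']
```

The isinstance check fails even though iteration works, which is one reason to call iter(x) rather than test for the Iterable ABC when all you need is to loop.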
Train: a sequence of cars
train = Train(4)
train[0] train[1] train[2] train[3]
>>> train = Train(4)
>>> len(train)
4
>>> train[0]
'car #1'
>>> train[3]
'car #4'
>>> train[-1]
'car #4'
>>> train[4]
Traceback (most recent call last):
...
IndexError: no car at 4
>>> for car in train:
...     print(car)
car #1
car #2
car #3
car #4
Train: a sequence of cars
class Train(object):
    def __init__(self, cars):
        self.cars = cars
    def __len__(self):
        return self.cars
    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self): # index 2 -> car #3
            return 'car #%s' % (index + 1)
        else:
            raise IndexError('no car at %s' % key)
if __getitem__ exists,
iteration “just works”
The sequence protocol at work
>>> t = Train(4)
>>> len(t)
4
>>> t[0]
'car #1'
>>> t[3]
'car #4'
>>> t[-1]
'car #4'
>>> for car in t:
...     print(car)
car #1
car #2
car #3
car #4
• len(t) calls __len__
• t[0], t[3] and t[-1] call __getitem__
• the for loop also relies on __getitem__
Protocol
• protocol: a synonym for interface used in dynamic
languages like Smalltalk, Python, Ruby, Lisp...
• not declared, and not enforced by static checks
class Train(object):
    def __init__(self, cars):
        self.cars = cars
    def __len__(self):
        return self.cars
    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self): # index 2 -> car #3
            return 'car #%s' % (index + 1)
        else:
            raise IndexError('no car at %s' % key)
Sequence protocol
__len__ and __getitem__
implement the immutable
sequence protocol
import collections
# Python 3.3+: use collections.abc.Sequence (the collections.Sequence
# alias was removed in Python 3.10)
class Train(collections.Sequence):
    def __init__(self, cars):
        self.cars = cars
    def __len__(self):
        return self.cars
    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self): # index 2 -> car #3
            return 'car #%s' % (index + 1)
        else:
            raise IndexError('no car at %s' % key)
Sequence ABC
• collections.Sequence abstract base class (Python ≥ 2.6)
• implement these 2 abstract methods: __len__ and __getitem__
• inherit these 5: __contains__, __iter__, __reversed__, index and count
>>> train = Train(4)
>>> 'car #2' in train
True
>>> 'car #7' in train
False
>>> for car in reversed(train):
...     print(car)
car #4
car #3
car #2
car #1
>>> train.index('car #3')
2
Generators
Iteration in C (example 2)
#include <stdio.h>
int main(int argc, char *argv[]) {
    int i;
    for(i = 0; i < argc; i++)
        printf("%d : %s\n", i, argv[i]);
    return 0;
}
$ ./args2 alfa bravo charlie
0 : ./args2
1 : alfa
2 : bravo
3 : charlie
Iteration in Python (ex. 2)
import sys
for i in range(len(sys.argv)):
    print i, ':', sys.argv[i]
$ python args2.py alfa bravo charlie
0 : args2.py
1 : alfa
2 : bravo
3 : charlie
not Pythonic
Iteration in Python (ex. 2)
import sys
for i, arg in enumerate(sys.argv):
    print i, ':', arg
$ python args2.py alfa bravo charlie
0 : args2.py
1 : alfa
2 : bravo
3 : charlie
Pythonic!
• enumerate(sys.argv) returns a lazy iterable object
• that object yields (index, item) tuples on demand, at each iteration
What enumerate does
• enumerate builds an enumerate object (this builds a generator)
• and that is iterable
>>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>> for item in e:
...     print item
...
(0, 'T')
(1, 'u')
(2, 'r')
(3, 'i')
(4, 'n')
(5, 'g')
• the enumerate object produces an (index, item) tuple for each next(e) call
>>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>> next(e)
(0, 'T')
>>> next(e)
(1, 'u')
>>> next(e)
(2, 'r')
>>> next(e)
(3, 'i')
>>> next(e)
(4, 'n')
>>> next(e)
(5, 'g')
>>> next(e)
Traceback (most recent...):
...
StopIteration
• The enumerate object is an
example of a generator
Iterator vs. generator
• By definition (in GoF) an iterator retrieves successive items
from an existing collection
• A generator implements the iterator interface (next) but
produces items not necessarily in a collection
• a generator may iterate over a collection, but return the items
decorated in some way, skip some items...
• it may also produce items independently of any existing data
source (e.g. Fibonacci sequence generator)
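Both kinds of generator mentioned above can be sketched in a few lines of Python 3. The names `fibonacci` and `evens` are illustrative only; generator functions and the yield keyword are introduced in the slides that follow.

```python
from itertools import islice

def fibonacci():
    """Boundless generator: produces items without any underlying collection."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def evens(iterable):
    """Generator that walks an existing iterable but skips the odd items."""
    for item in iterable:
        if item % 2 == 0:
            yield item

# islice takes a finite slice of the endless sequence.
print(list(islice(fibonacci(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(list(evens([1, 2, 3, 4])))      # [2, 4]
```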
Very simple
generators
Generator
function
• Any function that has the
yield keyword in its body
is a generator function
>>> def gen_123():
...     yield 1
...     yield 2
...     yield 3
...
>>> for i in gen_123(): print(i)
1
2
3
>>>
the keyword gen was considered
for defining generator functions,
but def prevailed
• When invoked, a generator function returns a generator object
>>> g = gen_123()
>>> g
<generator object gen_123 at ...>
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
Traceback (most recent call last):
...
StopIteration
• Generator objects implement the
Iterator interface
Generator
behavior
• Note how the output of
the generator function is
interleaved with the
output of the calling code
>>> def gen_AB():
...     print('START')
...     yield 'A'
...     print('CONTINUE')
...     yield 'B'
...     print('END.')
...
>>> for c in gen_AB():
...     print('--->', c)
...
START
---> A
CONTINUE
---> B
END.
>>>
• The body is executed only when next is called, and it runs only up to the following yield
>>> g = gen_AB()
>>> next(g)
START
'A'
>>>
• When the body of the function returns, the generator object raises StopIteration
• The for statement catches that for you
>>> next(g)
CONTINUE
'B'
>>> next(g)
END.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Train with generator function
class Train(object):
    def __init__(self, cars):
        self.cars = cars
    def __iter__(self):
        for i in range(self.cars):
            # index 2 is car #3
            yield 'car #%s' % (i+1)
for car in train:
• calls iter(train) to obtain a generator
• makes repeated calls to next(generator) until the function returns, which raises StopIteration
>>> train = Train(3)
>>> for car in train:
...     print(car)
car #1
car #2
car #3
Classic iterator vs. generator
• classic iterator (Train plus TrainIterator): 2 classes, 12 lines of code
• generator function (Train with yield in __iter__): 1 class, 3 lines of code
The pattern just vanished
“When I see patterns in my
programs, I consider it a sign of
trouble. The shape of a program
should reflect only the problem it
needs to solve. Any other regularity
in the code is a sign, to me at least,
that I'm using abstractions that
aren't powerful enough -- often that
I'm generating by hand the
expansions of some macro that I
need to write.”
Paul Graham
Revenge of the Nerds (2002)
Generator expression (genexp)
>>> g = (c for c in 'ABC')
>>> g
<generator object <genexpr> at 0x10045a410>
>>> for l in g:
...     print(l)
...
A
B
C
>>>
• When evaluated,
returns a generator object
>>> g = (n for n in [1, 2, 3])
>>> g
<generator object <genexpr> at 0x...>
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Train with generator expression
class Train(object):
    def __init__(self, cars):
        self.cars = cars
    def __iter__(self):
        return ('car #%s' % (i+1)
                for i in range(self.cars))
for car in train:
• calls iter(train) to obtain a generator
• makes repeated calls to next(generator) until the generator expression is exhausted, which raises StopIteration
>>> train = Train(3)
>>> for car in train:
...     print(car)
car #1
car #2
car #3
Generator function vs. genexp
class Train(object):  # with a generator function
    def __init__(self, cars):
        self.cars = cars
    def __iter__(self):
        for i in range(self.cars):
            yield 'car #%s' % (i+1)
class Train(object):  # with a generator expression
    def __init__(self, cars):
        self.cars = cars
    def __iter__(self):
        return ('car #%s' % (i+1) for i in range(self.cars))
Built-in functions that return
iterables, iterators or generators
• dict
• enumerate
• frozenset
• list
• reversed
• set
• tuple
The itertools module
Don’t reinvent the wheel, use itertools!
• boundless generators
• count(), cycle(), repeat()
• generators which combine several iterables:
• chain(), tee(), izip(), imap(), product(), compress()...
• generators which select or group items:
• compress(), dropwhile(), groupby(), ifilter(), islice()...
• generators producing combinations of items:
• product(), permutations(), combinations()...
this was not reinvented: ported from Haskell
great for MapReduce
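A short tour of one function from each group, as a Python 3 sketch. Note that izip(), imap() and ifilter() exist only in Python 2; in Python 3 the built-in zip, map and filter are already lazy. The sample data is made up for the example.

```python
import itertools

# Boundless generator, cut to size with islice.
steps = list(itertools.islice(itertools.count(10, 5), 3))

# Combine several iterables into one stream.
joined = list(itertools.chain('AB', [1, 2]))

# Group consecutive items that share a key (here: the first letter).
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = {key: list(group)
          for key, group in itertools.groupby(words, key=lambda w: w[0])}

# All 2-item combinations.
pairs = list(itertools.combinations('ABC', 2))

print(steps)   # [10, 15, 20]
print(joined)  # ['A', 'B', 1, 2]
print(pairs)   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

groupby only groups adjacent items, so the input usually needs to be sorted by the same key first.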
Generators in Python 3
• Several functions and methods of the standard library that used to
return lists, now return generators and other lazy iterables in Python 3
• dict.keys(), dict.items(), dict.values()...
• range(...)
• like xrange in Python 2.x (more than a generator)
• If you really need a list, just pass the generator to the list constructor.
E.g.: list(range(10))
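A small Python 3 sketch of these lazy objects. It assumes Python 3.7 or later for the dict ordering shown in the comment.

```python
d = {'a': 1, 'b': 2}
keys = d.keys()        # a view object, not a list
d['c'] = 3             # the view reflects later changes to the dict
print(list(keys))      # ['a', 'b', 'c']

r = range(10 ** 12)    # no trillion-item list is built
print(len(r), r[-1], 5 in r)  # 1000000000000 999999999999 True

print(list(range(5)))  # [0, 1, 2, 3, 4]
```

The len, indexing and membership tests show why range is "more than a generator": it behaves like a sequence without storing its items.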
A practical example using
generator functions
• Generator functions to decouple reading and writing logic in a database
conversion tool designed to handle large datasets
https://github.com/ramalho/isis2json
Main loop writes JSON file
Another loop reads the input records
One implementation: same loop reads/writes
But what if we need to read another format?
Functions in the script
• iterMstRecords*
• iterIsoRecords*
• writeJsonArray
• main
* generator functions
main: read command line arguments
main: determine input format
• input generator function is selected based on the input file extension
• selected generator function is passed as an argument
writeJsonArray: write JSON records
writeJsonArray: iterates over one of the input generator functions
• selected generator function received as an argument...
• and called to produce input generator
iterIsoRecords: read records from ISO-2709 format file
• generator function!
• creates a new dict in each iteration
• yields one record, structured as a dict
iterMstRecords: read records from ISIS .MST file
• generator function!
• creates a new dict in each iteration
• yields one record, structured as a dict
Generators at work
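The structure described above can be sketched as follows in Python 3. This is not the isis2json code: the reader functions, the CSV and tab-separated formats, and all names here are hypothetical stand-ins chosen to keep the example self-contained.

```python
import io
import json

def iter_csv_records(source):
    """Hypothetical reader: yields one dict per 'id,title' line."""
    for line in source:
        rec_id, title = line.rstrip('\n').split(',', 1)
        yield {'id': rec_id, 'title': title}   # a new dict in each iteration

def iter_tab_records(source):
    """Hypothetical reader for tab-separated input."""
    for line in source:
        rec_id, title = line.rstrip('\n').split('\t', 1)
        yield {'id': rec_id, 'title': title}

def write_json_array(iter_records, source, output):
    """Writer: receives a generator function and calls it to get the records."""
    output.write('[')
    for i, record in enumerate(iter_records(source)):
        if i > 0:
            output.write(',\n')
        output.write(json.dumps(record))
    output.write(']\n')

def main(filename, source, output):
    # Select the input generator function based on the file extension.
    if filename.endswith('.tsv'):
        iter_records = iter_tab_records
    else:
        iter_records = iter_csv_records
    write_json_array(iter_records, source, output)

out = io.StringIO()
main('books.csv', io.StringIO('1,Dune\n2,Emma\n'), out)
print(out.getvalue())
```

Because the writer only ever holds one record at a time, supporting a new input format means writing one more generator function, with no change to the output loop.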
We did not cover
• other generator methods:
• gen.close(): causes a GeneratorExit exception to be raised
within the generator body, at the point where it is paused
• gen.throw(e): causes any exception e to be raised within the
generator body, at the point where it is paused
Mostly useful for long-running processes.
Often not needed in batch processing scripts.
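For readers who want a first look anyway, here is a minimal Python 3 sketch of both methods. The `reader` generator and its log list are hypothetical.

```python
def reader(log):
    """Hypothetical generator that records how it was shut down."""
    try:
        while True:
            yield 'record'
    except GeneratorExit:
        log.append('closed')    # raised at the paused yield by gen.close()
        raise
    finally:
        log.append('cleanup')   # runs on close() and on any other exception

log = []
gen = reader(log)
next(gen)     # advance to the first yield
gen.close()   # GeneratorExit is raised inside the generator body
print(log)    # ['closed', 'cleanup']

log2 = []
gen2 = reader(log2)
next(gen2)
try:
    gen2.throw(ValueError('bad input'))   # raised at the paused yield
except ValueError as exc:
    caught = str(exc)                     # not handled inside, so it propagates
print(caught, log2)                       # bad input ['cleanup']
```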
We did not cover
• generator delegation with yield from (Python ≥ 3.3)
• sending data into a generator function with the gen.send(x) method (instead of next(gen)), and using yield as an expression to get the data sent
• using generator functions as coroutines (not useful in the context of iteration)
“Coroutines are not related to iteration”
David Beazley
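For orientation only, a brief Python 3 sketch of what these two features look like. The function names are made up for the example.

```python
def inner():
    yield 1
    yield 2

def outer():
    yield 0
    yield from inner()   # delegate to another generator (Python >= 3.3)
    yield 3

print(list(outer()))     # [0, 1, 2, 3]

def running_total():
    """Coroutine-style generator: receives numbers through send()."""
    total = 0
    while True:
        value = yield total   # yield used as an expression gets the data sent
        total += value

acc = running_total()
next(acc)             # prime the generator: run to the first yield
print(acc.send(10))   # 10
print(acc.send(5))    # 15
```

Notice that running_total is driven by send() rather than by a for loop, which is the sense in which coroutines are a separate topic from iteration.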
How to learn generators
• Forget about .send() and coroutines: that is a completely different
subject. Look into that only after mastering and becoming really
comfortable using generators for iteration.
• Study and use the itertools module
• Don’t worry about .close() and .throw() initially. You can be
productive with generators without using these methods.
• yield from is only available in Python 3.3 and later, and only relevant if you
need to use .close() and .throw()
Q & A
Luciano Ramalho
luciano@ramalho.org
@ramalhoorg
https://github.com/ramalho/isis2json

The Vanishing Pattern: from iterators to generators in Python

  • 1.
    TheVanishing Pattern from iterators to generatorsin Python Luciano Ramalho ramalho@turing.com.br @ramalhoorg
  • 2.
  • 3.
    >>> from django.dbimport connection >>> q = connection.queries >>> q [] >>> from municipios.models import * >>> res = Municipio.objects.all()[:5] >>> q [] >>> for m in res: print m.uf, m.nome ... GO Abadia de Goiás MG Abadia dos Dourados GO Abadiânia MG Abaeté PA Abaetetuba >>> q [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]
  • 4.
    >>> from django.dbimport connection >>> q = connection.queries >>> q [] >>> from municipios.models import * >>> res = Municipio.objects.all()[:5] >>> q [] >>> for m in res: print m.uf, m.nome ... GO Abadia de Goiás MG Abadia dos Dourados GO Abadiânia MG Abaeté PA Abaetetuba >>> q [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}] this expression makes a Django QuerySet
  • 5.
    >>> from django.dbimport connection >>> q = connection.queries >>> q [] >>> from municipios.models import * >>> res = Municipio.objects.all()[:5] >>> q [] >>> for m in res: print m.uf, m.nome ... GO Abadia de Goiás MG Abadia dos Dourados GO Abadiânia MG Abaeté PA Abaetetuba >>> q [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}] this expression makes a Django QuerySet QuerySets are “lazy”: no database access so far
  • 6.
    >>> from django.dbimport connection >>> q = connection.queries >>> q [] >>> from municipios.models import * >>> res = Municipio.objects.all()[:5] >>> q [] >>> for m in res: print m.uf, m.nome ... GO Abadia de Goiás MG Abadia dos Dourados GO Abadiânia MG Abaeté PA Abaetetuba >>> q [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}] this expression makes a Django QuerySet QuerySets are “lazy”: no database access so far the query is made only when we iterate over the results
  • 7.
  • 8.
    @ramalhoorg QuerySet is alazy iterable technical term 8
  • 9.
    @ramalhoorg Lazy • Avoids unnecessarywork, by postponing it as long as possible • The opposite of eager 9 In Computer Science, being “lazy” is often a good thing!
  • 10.
  • 11.
    @ramalhoorg Iteration: C andPython #include <stdio.h> int main(int argc, char *argv[]) { int i; for(i = 0; i < argc; i++) printf("%sn", argv[i]); return 0; } import sys for arg in sys.argv: print arg
  • 12.
    @ramalhoorg Iteration: Java (classic) classArguments { public static void main(String[] args) { for (int i=0; i < args.length; i++) System.out.println(args[i]); } } $ java Arguments alfa bravo charlie alfa bravo charlie
  • 13.
    @ramalhoorg Iteration: Java ≥1.5 $java Arguments2 alfa bravo charlie alfa bravo charlie • Enhanced for (a.k.a. foreach) since 2004 class Arguments2 { public static void main(String[] args) { for (String arg : args) System.out.println(arg); } }
  • 14.
    @ramalhoorg Iteration: Java ≥1.5 •Enhanced for (a.k.a. foreach) class Arguments2 { public static void main(String[] args) { for (String arg : args) System.out.println(arg); } } since 2004 import sys for arg in sys.argv: print arg since 1991
  • 15.
    @ramalhoorg You can iterateover many Python objects • strings • files • XML: ElementTree nodes • not limited to built-in types: • Django QuerySet • etc. 15
  • 16.
    @ramalhoorg So, what isan iterable? • Informal, recursive definition: • iterable: fit to be iterated • just as: edible: fit to be eaten 16
  • 17.
    @ramalhoorg The for loopstatement is not the only construct that handles iterables... 17
  • 18.
    List comprehension ● Compreensãode lista ou abrangência ● Exemplo: usar todos os elementos: – L2 = [n*10 for n in L] List comprehension • An expression that builds a list from any iterable >>> s = 'abracadabra' >>> l = [ord(c) for c in s] >>> l [97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97] input: any iterable object output: a list (always)
  • 19.
    @ramalhoorg Set comprehension • Anexpression that builds a set from any iterable >>> s = 'abracadabra' >>> set(s) {'b', 'r', 'a', 'd', 'c'} >>> {ord(c) for c in s} {97, 98, 99, 100, 114} 19
  • 20.
    @ramalhoorg Dict comprehensions • Anexpression that builds a dict from any iterable >>> s = 'abracadabra' >>> {c:ord(c) for c in s} {'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100} 20
  • 21.
    @ramalhoorg Syntactic support foriterables • Tuple unpacking, parallel assignment >>> a, b, c = 'XYZ' >>> a 'X' >>> b 'Y' >>> c 'Z' 21 >>> l = [(c, ord(c)) for c in 'XYZ'] >>> l [('X', 88), ('Y', 89), ('Z', 90)] >>> for char, code in l: ... print char, '->', code ... X -> 88 Y -> 89 Z -> 90
  • 22.
    @ramalhoorg Syntactic support foriterables (2) • Function calls: exploding arguments with * >>> import math >>> def hypotenuse(a, b): ... return math.sqrt(a*a + b*b) ... >>> hypotenuse(3, 4) 5.0 >>> sides = (3, 4) >>> hypotenuse(sides) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: hypotenuse() takes exactly 2 arguments (1 given) >>> hypotenuse(*sides) 5.0 22
  • 23.
    @ramalhoorg Built-in iterable types •basestring • str • unicode • dict • file • frozenset • list • set • tuple • xrange 23
  • 24.
    @ramalhoorg Built-in functions thattake iterable arguments • all • any • filter • iter • len • map • max • min • reduce • sorted • sum • zip unrelated to compression
  • 25.
  • 26.
    @ramalhoorg Iterator is... • aclassic design pattern Design Patterns Gamma, Helm, Johnson &Vlissides Addison-Wesley, ISBN 0-201-63361-2 26
  • 27.
    @ramalhoorg Head First Design PatternsPoster O'Reilly, ISBN 0-596-10214-3 27
  • 28.
    @ramalhoorg Head First Design PatternsPoster O'Reilly, ISBN 0-596-10214-3 28 “The Iterator Pattern provides a way to access the elements of an aggregate object sequentially without exposing the underlying representation.”
  • 29.
    An iterable Trainclass >>> train = Train(4) >>> for car in train: ... print(car) car #1 car #2 car #3 car #4 >>>
  • 30.
    @ramalhoorg class Train(object): def __init__(self,cars): self.cars = cars def __len__(self): return self.cars def __iter__(self): return TrainIterator(self) class TrainIterator(object): def __init__(self, train): self.train = train self.current = 0 def __next__(self): # Python 3 if self.current < len(self.train): self.current += 1 return 'car #%s' % (self.current) else: raise StopIteration() An iterable Train with iterator iterable iterator
  • 31.
    Iterable ABC • collections.Iterable abstract base class • A concrete subclass of Iterable must implement .__iter__ • .__iter__ returns an Iterator • You don’t usually call .__iter__ directly • when needed, call iter(x)
  • 32.
    Iterator ABC • Iterator provides .next (Python 2) or .__next__ (Python 3) • .__next__ returns the next item • You don’t usually call .__next__ directly • when needed, call next(x) (Python ≥ 2.6)
  • 33.
    Train with iterator • for car in train: • (1) calls iter(train) to obtain a TrainIterator • (2) makes repeated calls to next(aTrainIterator) until it raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __iter__(self): return TrainIterator(self) class TrainIterator(object): def __init__(self, train): self.train = train self.current = 0 def __next__(self): # Python 3 if self.current < len(self.train): self.current += 1 return 'car #%s' % (self.current) else: raise StopIteration() >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3
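The two steps the for statement performs can be written out by hand. This is only a sketch of the mechanism; `manual_for` is a hypothetical helper name, and real code should simply use `for`.

```python
def manual_for(iterable):
    """Collect items the way a for loop fetches them."""
    items = []
    iterator = iter(iterable)      # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)  # step 2: fetch the next item
        except StopIteration:
            break                  # the for statement hides this
        items.append(item)
    return items


print(manual_for('XYZ'))
```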
  • 34.
  • 35.
  • 36.
    Design patterns in dynamic languages • Dynamic languages: Lisp, Smalltalk, Python, Ruby, PHP, JavaScript... • Many features not found in C++, where most of the original 23 Design Patterns were identified • Java is more dynamic than C++, but much more static than Lisp, Python etc. • Gamma, Helm, Johnson, Vlissides: a.k.a. the Gang of Four (GoF)
  • 37.
    Peter Norvig: “Design Patterns in Dynamic Languages”
  • 38.
    Dynamic types • No need to declare types or interfaces • It does not matter what an object claims to be, only what it is capable of doing
  • 39.
    Duck typing • “In other words, don't check whether it is-a duck: check whether it quacks-like-a duck, walks-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.” Alex Martelli, comp.lang.python (2000)
  • 40.
    A Python iterable is... • An object from which the iter function can produce an iterator • The iter(x) call: • invokes x.__iter__() to obtain an iterator (Iterable interface) • but, if x has no __iter__: • iter makes an iterator which tries to fetch items from x by doing x[0], x[1], x[2]... (sequence protocol)
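The fallback described above can be observed directly. In this sketch `Countdown` is a hypothetical class with no `__iter__` at all; `iter` still builds an iterator for it by calling `__getitem__` with 0, 1, 2... until `IndexError` is raised.

```python
class Countdown(object):
    """Hypothetical class: implements only __getitem__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if 0 <= index < self.start:
            return self.start - index
        raise IndexError(index)  # signals the end of the sequence


c = Countdown(3)
print(hasattr(c, '__iter__'))  # no Iterable interface here
print(list(iter(c)))           # yet iter(c) works
```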
  • 41.
    Train: a sequence of cars train = Train(4) train[0] train[1] train[2] train[3]
  • 42.
    Train: a sequence of cars >>> train = Train(4) >>> len(train) 4 >>> train[0] 'car #1' >>> train[3] 'car #4' >>> train[-1] 'car #4' >>> train[4] Traceback (most recent call last): ... IndexError: no car at 4 >>> for car in train: ... print(car) car #1 car #2 car #3 car #4
  • 43.
    Train: a sequence of cars class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) • if __getitem__ exists, iteration “just works”
  • 44.
    The sequence protocol at work >>> t = Train(4) >>> len(t) 4 >>> t[0] 'car #1' >>> t[3] 'car #4' >>> t[-1] 'car #4' >>> for car in t: ... print(car) car #1 car #2 car #3 car #4 • len(t) calls __len__; t[i] and the for loop call __getitem__
  • 45.
    Protocol • protocol: a synonym for interface used in dynamic languages like Smalltalk, Python, Ruby, Lisp... • not declared, and not enforced by static checks
  • 46.
    Sequence protocol • __len__ and __getitem__ implement the immutable sequence protocol class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key)
  • 47.
    Sequence ABC • collections.Sequence abstract base class (Python ≥ 2.6; named collections.abc.Sequence in Python ≥ 3.3) • __len__ and __getitem__ are the abstract methods import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key)
  • 48.
    Sequence ABC • collections.Sequence abstract base class • implement these 2: __len__ and __getitem__ import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key)
  • 49.
    Sequence ABC • collections.Sequence abstract base class • inherit these 5: __contains__, __iter__, __reversed__, index and count import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key)
  • 50.
    Sequence ABC • collections.Sequence abstract base class >>> train = Train(4) >>> 'car #2' in train True >>> 'car #7' in train False >>> for car in reversed(train): ... print(car) car #4 car #3 car #2 car #1 >>> train.index('car #3') 2
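On current Python 3 the ABC lives in `collections.abc` (the `collections.Sequence` alias was removed in Python 3.10). A sketch of the same Train written against that module, exercising the inherited methods:

```python
from collections import abc


class Train(abc.Sequence):
    def __init__(self, cars):
        self.cars = cars

    # The two abstract methods we must implement:
    def __len__(self):
        return self.cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):  # index 2 -> car #3
            return 'car #%s' % (index + 1)
        raise IndexError('no car at %s' % key)


train = Train(4)
# Inherited from Sequence: __contains__, __reversed__, index, count
print('car #2' in train)
print(list(reversed(train)))
print(train.index('car #3'), train.count('car #1'))
```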
  • 51.
  • 52.
  • 53.
    Iteration in C (example 2) #include <stdio.h> int main(int argc, char *argv[]) { int i; for(i = 0; i < argc; i++) printf("%d : %s\n", i, argv[i]); return 0; } $ ./args2 alfa bravo charlie 0 : ./args2 1 : alfa 2 : bravo 3 : charlie
  • 54.
    Iteration in Python (ex. 2) import sys for i in range(len(sys.argv)): print i, ':', sys.argv[i] $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie • not Pythonic
  • 55.
    Iteration in Python (ex. 2) import sys for i, arg in enumerate(sys.argv): print i, ':', arg $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie • Pythonic!
  • 56.
    Iteration in Python (ex. 2) import sys for i, arg in enumerate(sys.argv): print i, ':', arg $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie • enumerate(sys.argv) returns a lazy iterable object • that object yields tuples (index, item) on demand, at each iteration
  • 57.
    What enumerate does • enumerate builds an enumerate object >>> e = enumerate('Turing') >>> e <enumerate object at 0x...> >>>
  • 58.
    What enumerate does • enumerate builds an enumerate object • and that is iterable >>> e = enumerate('Turing') >>> e <enumerate object at 0x...> >>> for item in e: ... print item ... (0, 'T') (1, 'u') (2, 'r') (3, 'i') (4, 'n') (5, 'g') >>>
  • 59.
    What enumerate does • the enumerate object produces an (index, item) tuple for each next(e) call >>> e = enumerate('Turing') >>> e <enumerate object at 0x...> >>> next(e) (0, 'T') >>> next(e) (1, 'u') >>> next(e) (2, 'r') >>> next(e) (3, 'i') >>> next(e) (4, 'n') >>> next(e) (5, 'g') >>> next(e) Traceback (most recent...): ... StopIteration • The enumerate object is an example of a generator
  • 60.
    Iterator vs. generator • By definition (in GoF) an iterator retrieves successive items from an existing collection • A generator implements the iterator interface (next) but produces items not necessarily in a collection • a generator may iterate over a collection, but return the items decorated in some way, skip some items... • it may also produce items independently of any existing data source (e.g. a Fibonacci sequence generator)
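The Fibonacci case mentioned in the last bullet can be sketched as follows. No collection exists anywhere: each item is computed when requested, and `itertools.islice` is used here only to stop the otherwise endless series.

```python
from itertools import islice


def fibonacci():
    """Boundless generator: items come from arithmetic, not a collection."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```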
  • 61.
  • 62.
  • 63.
    Generator function • Any function that has the yield keyword in its body is a generator function >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> for i in gen_123(): print(i) 1 2 3 >>> • the keyword gen was considered for defining generator functions, but def prevailed
  • 64.
    Generator function • When invoked, a generator function returns a generator object >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> for i in gen_123(): print(i) 1 2 3 >>> g = gen_123() >>> g <generator object gen_123 at ...>
  • 65.
    Generator function • Generator objects implement the Iterator interface >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> g = gen_123() >>> g <generator object gen_123 at ...> >>> next(g) 1 >>> next(g) 2 >>> next(g) 3 >>> next(g) Traceback (most recent call last): ... StopIteration
  • 66.
    Generator behavior • Note how the output of the generator function is interleaved with the output of the calling code >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> for c in gen_AB(): ... print('--->', c) ... START ---> A CONTINUE ---> B END. >>>
  • 67.
    Generator behavior • The body is executed only when next is called, and it runs only up to the following yield >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> g = gen_AB() >>> next(g) START 'A' >>>
  • 68.
    Generator behavior • When the body of the function returns, the generator object raises StopIteration • The for statement catches that for you >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> g = gen_AB() >>> next(g) START 'A' >>> next(g) CONTINUE 'B' >>> next(g) END. Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
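The same behavior can be checked without reading console output. In this sketch the print calls of gen_AB are replaced by appends to a list named `events` (an assumption made here so that the order of execution can be inspected afterwards).

```python
events = []


def gen_AB():
    events.append('START')
    yield 'A'
    events.append('CONTINUE')
    yield 'B'
    events.append('END.')


g = gen_AB()
created = list(events)      # calling gen_AB() ran none of the body

first = next(g)             # runs the body up to the first yield
after_first = list(events)

rest = list(g)              # list() consumes the rest and absorbs StopIteration
print(created, first, after_first, rest, events)
```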
  • 69.
    Train with generator function • for car in train: • (1) calls iter(train) to obtain a generator • (2) makes repeated calls to next(generator) until the function returns, which raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): # index 2 is car #3 yield 'car #%s' % (i+1) >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3
  • 70.
    Classic iterator vs. generator • Classic iterator (2 classes, 12 lines of code): class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __iter__(self): return TrainIterator(self) class TrainIterator(object): def __init__(self, train): self.train = train self.current = 0 def __next__(self): # Python 3 if self.current < len(self.train): self.current += 1 return 'car #%s' % (self.current) else: raise StopIteration() • Generator (1 class, 3 lines of code): class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1)
  • 71.
    The pattern just vanished class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1)
  • 72.
    class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1) “When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough -- often that I'm generating by hand the expansions of some macro that I need to write.” Paul Graham, Revenge of the Nerds (2002)
  • 73.
    Generator expression (genexp) >>> g = (c for c in 'ABC') >>> g <generator object <genexpr> at 0x10045a410> >>> for l in g: ... print(l) ... A B C >>>
  • 74.
    Generator expression (genexp) • When evaluated, returns a generator object >>> g = (n for n in [1, 2, 3]) >>> g <generator object <genexpr> at 0x...> >>> next(g) 1 >>> next(g) 2 >>> next(g) 3 >>> next(g) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
  • 75.
    Train with generator function • for car in train: • (1) calls iter(train) to obtain a generator • (2) makes repeated calls to next(generator) until the function returns, which raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): # index 2 is car #3 yield 'car #%s' % (i+1) >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3
  • 76.
    Train with generator expression • for car in train: • (1) calls iter(train) to obtain a generator • (2) makes repeated calls to next(generator) until the function returns, which raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): return ('car #%s' % (i+1) for i in range(self.cars)) >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3
  • 77.
    Generator function vs. genexp • With a genexp: class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): return ('car #%s' % (i+1) for i in range(self.cars)) • With a generator function: class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1)
  • 78.
    Built-in functions that return iterables, iterators or generators • dict • enumerate • frozenset • list • reversed • set • tuple
  • 79.
    The itertools module • boundless generators: • count(), cycle(), repeat() • generators which combine several iterables: • chain(), tee(), izip(), imap(), product(), compress()... • generators which select or group items: • compress(), dropwhile(), groupby(), ifilter(), islice()... • generators producing combinations of items: • product(), permutations(), combinations()... • Don’t reinvent the wheel, use itertools! • this was not reinvented: ported from Haskell • great for MapReduce
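One function from each group on the slide, as a Python 3 sketch. The slide lists the Python 2 names; in Python 3 `izip`, `imap` and `ifilter` are gone because the built-in `zip`, `map` and `filter` are already lazy. The sample data is made up for illustration.

```python
import itertools

# Boundless generator, cut short with islice.
evens = list(itertools.islice(itertools.count(0, 2), 5))

# Combining several iterables into one stream.
chained = list(itertools.chain('AB', [1, 2]))

# Grouping items: groupby expects input already sorted by the key.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = {key: list(group)
          for key, group in itertools.groupby(words, key=lambda w: w[0])}

# Producing combinations of items.
pairs = list(itertools.combinations('ABC', 2))

print(evens, chained, groups, pairs)
```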
  • 80.
    Generators in Python 3 • Several functions and methods of the standard library that used to return lists, now return generators and other lazy iterables in Python 3 • dict.keys(), dict.items(), dict.values()... • range(...) • like xrange in Python 2.x (more than a generator) • If you really need a list, just pass the generator to the list constructor. E.g.: list(range(10))
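A short Python 3 sketch of what "more than a generator" can mean in practice: a range is lazy but still supports `len`, indexing and membership tests, and a dict view stays connected to its dict.

```python
r = range(10)
# Lazy, yet sized, indexable and searchable.
info = (len(r), r[3], 5 in r)

d = {'a': 1}
keys = d.keys()     # a view object, not a list
d['b'] = 2          # the view reflects later changes to the dict
seen = sorted(keys)

# When a real list is needed, build it explicitly.
as_list = list(range(10))

print(info, seen, as_list)
```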
  • 81.
    A practical example using generator functions • Generator functions to decouple reading and writing logic in a database conversion tool designed to handle large datasets https://github.com/ramalho/isis2json
  • 82.
  • 83.
  • 84.
  • 85.
    But what if we need to read another format?
  • 86.
    Functions in the script • iterMstRecords* • iterIsoRecords* • writeJsonArray • main (* generator functions)
  • 87.
  • 88.
    main: determine input format • input generator function is selected based on the input file extension • selected generator function is passed as an argument
  • 89.
  • 90.
    writeJsonArray: iterates over one of the input generator functions • selected generator function received as an argument... • and called to produce input generator
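The structure described on these slides can be sketched as below. This is not the isis2json source: the function names, the two toy input formats and the file extensions are all assumptions made for illustration. What it keeps from the slides is the shape: main picks a generator function by file extension and passes it as an argument, and the writer calls it and iterates over the result one record at a time.

```python
import io
import json


def iter_pair_records(lines):
    """Hypothetical reader: one 'key=value;key=value' record per line."""
    for line in lines:
        record = {}  # a new dict in each iteration
        for pair in line.strip().split(';'):
            key, value = pair.split('=')
            record[key] = value
        yield record


def iter_json_lines(lines):
    """Hypothetical reader: one JSON object per line."""
    for line in lines:
        yield json.loads(line)


def write_json_array(input_gen_func, lines, output):
    # The writer knows nothing about input formats: it only iterates.
    output.write('[')
    for i, record in enumerate(input_gen_func(lines)):
        if i > 0:
            output.write(',\n')
        output.write(json.dumps(record, sort_keys=True))
    output.write(']\n')


def main(filename, lines, output):
    # Select the input generator function from the file extension.
    if filename.endswith('.jsonl'):
        input_gen_func = iter_json_lines
    else:
        input_gen_func = iter_pair_records
    write_json_array(input_gen_func, lines, output)


out = io.StringIO()
main('data.txt', ['id=1;name=Ana', 'id=2;name=Bob'], out)
print(out.getvalue())
```

Supporting another input format then means writing one more generator function and one more branch in main; the writer stays untouched.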
  • 91.
  • 92.
    iterIsoRecords • creates a new dict in each iteration • yields one record, structured as a dict
  • 93.
  • 94.
    iterIsoRecords / iterMstRecords • creates a new dict in each iteration • yields one record, structured as a dict
  • 95.
  • 96.
  • 97.
  • 98.
    We did not cover • other generator methods: • gen.close(): causes a GeneratorExit exception to be raised within the generator body, at the point where it is paused • gen.throw(e): causes any exception e to be raised within the generator body, at the point where it is paused • Mostly useful for long-running processes. Often not needed in batch processing scripts.
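For reference, a minimal sketch of both methods. The generator functions `reader` and `tolerant` and the `log` list are hypothetical names used only to make the effects observable.

```python
log = []


def reader():
    """Hypothetical generator that releases a resource when closed."""
    try:
        while True:
            yield 'record'
    finally:
        # Runs when GeneratorExit is raised at the paused yield.
        log.append('closed')


def tolerant():
    """Hypothetical generator that survives a ValueError thrown into it."""
    while True:
        try:
            yield 'ok'
        except ValueError:
            log.append('got ValueError')


g = reader()
first = next(g)
g.close()                          # GeneratorExit raised inside reader

t = tolerant()
next(t)
resumed = t.throw(ValueError('bad'))  # handled inside; next yield is returned
t.close()

print(first, resumed, log)
```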
  • 99.
    We did not cover • generator delegation with yield from (Python ≥ 3.3) • sending data into a generator function with the gen.send(x) method (instead of next(gen)), and using yield as an expression to get the data sent • using generator functions as coroutines: not useful in the context of iteration • “Coroutines are not related to iteration” (David Beazley)
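Two tiny sketches for orientation, both Python ≥ 3.3. The first shows `yield from` delegating to another generator; the second shows `.send()` with `yield` used as an expression. All function names here are hypothetical, and the second example is a coroutine-style accumulator, not an iteration tool.

```python
def cars(n):
    for i in range(n):
        yield 'car #%s' % (i + 1)


def train_with_engine(n):
    yield 'engine'
    yield from cars(n)  # delegate to the inner generator


def averager():
    """Coroutine-style generator: receives values through send()."""
    total, count, average = 0.0, 0, None
    while True:
        value = yield average  # yield as an expression
        total += value
        count += 1
        average = total / count


whole_train = list(train_with_engine(2))

avg = averager()
primed = next(avg)     # advance to the first yield before sending
a1 = avg.send(10)
a2 = avg.send(20)
avg.close()

print(whole_train, primed, a1, a2)
```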
  • 100.
    How to learn generators • Forget about .send() and coroutines: that is a completely different subject. Look into that only after mastering and becoming really comfortable using generators for iteration. • Study and use the itertools module • Don’t worry about .close() and .throw() initially. You can be productive with generators without using these methods. • yield from is only available in Python ≥ 3.3, and only relevant if you need to use .close() and .throw()
  • 101.
    Q & A Luciano Ramalho luciano@ramalho.org @ramalhoorg https://github.com/ramalho/isis2json