The Vanishing Pattern: from iterators to generators in Python

The core of the talk is refactoring a simple iterable class from the classic Iterator design pattern (as implemented in the GoF book) to compatible but less verbose implementations using generators. This provides a meaningful context to understand the value of generators. Along the way the behavior of the iter function, the Sequence protocol and the Iterable interface are presented. The motivating examples of this talk are database applications.

Transcript

  • 1. The Vanishing Pattern: from iterators to generators in Python. Luciano Ramalho, ramalho@turing.com.br, @ramalhoorg
  • 2. @ramalhoorg Demo: laziness in the Django Shell 2
  • 6.
        >>> from django.db import connection
        >>> q = connection.queries
        >>> q
        []
        >>> from municipios.models import *
        >>> res = Municipio.objects.all()[:5]
        >>> q
        []
        >>> for m in res: print m.uf, m.nome
        ...
        GO Abadia de Goiás
        MG Abadia dos Dourados
        GO Abadiânia
        MG Abaeté
        PA Abaetetuba
        >>> q
        [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]
        • this expression makes a Django QuerySet
        • QuerySets are “lazy”: no database access so far
        • the query is made only when we iterate over the results
  • 7. @ramalhoorg QuerySet is a lazy iterable 7
  • 8. @ramalhoorg QuerySet is a lazy iterable technical term 8
  • 9. @ramalhoorg Lazy • Avoids unnecessary work, by postponing it as long as possible • The opposite of eager • In Computer Science, being “lazy” is often a good thing!
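The laziness shown in the Django demo can be imitated without Django. The following is a minimal sketch, assuming Python 3; `LazyQuery` and its log are hypothetical stand-ins invented for illustration and are not Django's QuerySet API.

```python
class LazyQuery:
    """Hypothetical stand-in for a lazy query object (not Django's QuerySet)."""

    def __init__(self, rows, log):
        self._rows = rows      # pretend this list is a database table
        self._log = log        # records each simulated database access
        self._limit = None

    def __getitem__(self, s):
        # Slicing only remembers the limit; no "query" runs here.
        clone = LazyQuery(self._rows, self._log)
        clone._limit = s.stop
        return clone

    def __iter__(self):
        # The simulated query runs only when iteration actually starts.
        self._log.append('SELECT ... LIMIT %s' % self._limit)
        for row in self._rows[:self._limit]:
            yield row


log = []
res = LazyQuery(['Abadia de Goiás', 'Abadiânia', 'Abaeté'], log)[:2]
print(log)            # still empty: building and slicing did no work
for name in res:
    print(name)
print(log)            # one simulated query, issued by the for loop
```

Iterating over `res` a second time would log a second simulated query, because this sketch does not cache results.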
  • 10. @ramalhoorg Now, back to basics... 10
  • 11. @ramalhoorg Iteration: C and Python
        C:
        #include <stdio.h>
        int main(int argc, char *argv[]) {
            int i;
            for(i = 0; i < argc; i++)
                printf("%s\n", argv[i]);
            return 0;
        }
        Python:
        import sys
        for arg in sys.argv:
            print arg
  • 12. @ramalhoorg Iteration: Java (classic) class Arguments { public static void main(String[] args) { for (int i=0; i < args.length; i++) System.out.println(args[i]); } } $ java Arguments alfa bravo charlie alfa bravo charlie
  • 13. @ramalhoorg Iteration: Java ≥1.5 $ java Arguments2 alfa bravo charlie alfa bravo charlie • Enhanced for (a.k.a. foreach) since 2004 class Arguments2 { public static void main(String[] args) { for (String arg : args) System.out.println(arg); } }
  • 14. @ramalhoorg Iteration: Java ≥1.5 • Enhanced for (a.k.a. foreach) class Arguments2 { public static void main(String[] args) { for (String arg : args) System.out.println(arg); } } since 2004 import sys for arg in sys.argv: print arg since 1991
  • 15. @ramalhoorg You can iterate over many Python objects • strings • files • XML: ElementTree nodes • not limited to built-in types: • Django QuerySet • etc. 15
  • 16. @ramalhoorg So, what is an iterable? • Informal, recursive definition: • iterable: fit to be iterated • just as: edible: fit to be eaten 16
  • 17. @ramalhoorg The for loop statement is not the only construct that handles iterables... 17
  • 18. List comprehension • An expression that builds a list from any iterable • Example using every element: L2 = [n*10 for n in L]
        >>> s = 'abracadabra'
        >>> l = [ord(c) for c in s]
        >>> l
        [97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97]
        • input: any iterable object
        • output: a list (always)
  • 19. @ramalhoorg Set comprehension • An expression that builds a set from any iterable >>> s = 'abracadabra' >>> set(s) {'b', 'r', 'a', 'd', 'c'} >>> {ord(c) for c in s} {97, 98, 99, 100, 114} 19
  • 20. @ramalhoorg Dict comprehensions • An expression that builds a dict from any iterable >>> s = 'abracadabra' >>> {c:ord(c) for c in s} {'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100} 20
  • 21. @ramalhoorg Syntactic support for iterables • Tuple unpacking, parallel assignment >>> a, b, c = 'XYZ' >>> a 'X' >>> b 'Y' >>> c 'Z' 21 >>> l = [(c, ord(c)) for c in 'XYZ'] >>> l [('X', 88), ('Y', 89), ('Z', 90)] >>> for char, code in l: ... print char, '->', code ... X -> 88 Y -> 89 Z -> 90
  • 22. @ramalhoorg Syntactic support for iterables (2) • Function calls: exploding arguments with * >>> import math >>> def hypotenuse(a, b): ... return math.sqrt(a*a + b*b) ... >>> hypotenuse(3, 4) 5.0 >>> sides = (3, 4) >>> hypotenuse(sides) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: hypotenuse() takes exactly 2 arguments (1 given) >>> hypotenuse(*sides) 5.0 22
  • 23. @ramalhoorg Built-in iterable types • basestring • str • unicode • dict • file • frozenset • list • set • tuple • xrange 23
  • 24. @ramalhoorg Built-in functions that take iterable arguments • all • any • filter • iter • len • map • max • min • reduce • sorted • sum • zip (unrelated to compression)
  • 25. @ramalhoorg Classic iterables in Python 25
  • 26. @ramalhoorg Iterator is... • a classic design pattern • Design Patterns, Gamma, Helm, Johnson & Vlissides, Addison-Wesley, ISBN 0-201-63361-2
  • 27. @ramalhoorg Head First Design Patterns Poster O'Reilly, ISBN 0-596-10214-3 27
  • 28. @ramalhoorg Head First Design Patterns Poster O'Reilly, ISBN 0-596-10214-3 28 “The Iterator Pattern provides a way to access the elements of an aggregate object sequentially without exposing the underlying representation.”
  • 29. An iterable Train class >>> train = Train(4) >>> for car in train: ... print(car) car #1 car #2 car #3 car #4 >>>
  • 30. @ramalhoorg An iterable Train with iterator
        class Train(object):                    # the iterable
            def __init__(self, cars):
                self.cars = cars
            def __len__(self):
                return self.cars
            def __iter__(self):
                return TrainIterator(self)

        class TrainIterator(object):            # the iterator
            def __init__(self, train):
                self.train = train
                self.current = 0
            def __next__(self):  # Python 3
                if self.current < len(self.train):
                    self.current += 1
                    return 'car #%s' % (self.current)
                else:
                    raise StopIteration()
  • 31. @ramalhoorg Iterable ABC • collections.Iterable abstract base class • A concrete subclass of Iterable must implement .__iter__ • .__iter__ returns an Iterator • You don’t usually call .__iter__ directly • when needed, call iter(x) 31
  • 32. @ramalhoorg Iterator ABC • Iterator provides .next (Python 2) or .__next__ (Python 3) • .__next__ returns the next item • You don’t usually call .__next__ directly • when needed, call next(x) (Python ≥ 2.6)
  • 33. @ramalhoorg for car in train: • calls iter(train) to obtain a TrainIterator • makes repeated calls to next(aTrainIterator) until it raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __iter__(self): return TrainIterator(self) class TrainIterator(object): def __init__(self, train): self.train = train self.current = 0 def __next__(self): # Python 3 if self.current < len(self.train): self.current += 1 return 'car #%s' % (self.current) else: raise StopIteration() Train with iterator 1 1 2 >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3 2
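The two steps the for statement performs can be written out by hand. This is a sketch, assuming Python 3; `manual_for` is a hypothetical helper name, and the `__iter__` method on `TrainIterator` is an addition not shown on the slide (iterators conventionally return themselves from `__iter__`).

```python
class Train:
    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __iter__(self):
        return TrainIterator(self)


class TrainIterator:
    def __init__(self, train):
        self.train = train
        self.current = 0

    def __next__(self):
        if self.current < len(self.train):
            self.current += 1
            return 'car #%s' % self.current
        raise StopIteration()

    def __iter__(self):
        # Not on the slide: lets the iterator itself be used in a for loop.
        return self


def manual_for(iterable):
    """Collect items the way a for loop does: iter() once, next() until StopIteration."""
    result = []
    iterator = iter(iterable)          # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)      # step 2: repeated next() calls
        except StopIteration:
            break                      # the for statement catches this for you
        result.append(item)
    return result


cars = manual_for(Train(3))
print(cars)
```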
  • 34. @ramalhoorg [image credit: Richard Bartz/Wikipedia]
  • 35. @ramalhoorg Iterable duck-like creatures 35
  • 36. @ramalhoorg Design patterns in dynamic languages • Dynamic languages: Lisp, Smalltalk, Python, Ruby, PHP, JavaScript... • Many features not found in C++, where most of the original 23 Design Patterns were identified • Java is more dynamic than C++, but much more static than Lisp, Python etc. 36 Gamma, Helm, Johnson, Vlissides a.k.a. the Gang of Four (GoF)
  • 37. Peter Norvig: “Design Patterns in Dynamic Languages”
  • 38. @ramalhoorg Dynamic types • No need to declare types or interfaces • It does not matter what an object claims to be, only what it is capable of doing
  • 39. @ramalhoorg Duck typing 39 “In other words, don't check whether it is-a duck: check whether it quacks- like-a duck, walks-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.” Alex Martelli comp.lang.python (2000)
  • 40. @ramalhoorg A Python iterable is... • An object from which the iter function can produce an iterator • The iter(x) call: • invokes x.__iter__() to obtain an iterator (the Iterable interface) • but, if x has no __iter__: • iter makes an iterator which tries to fetch items from x by doing x[0], x[1], x[2]... (the sequence protocol)
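The fallback described above can be observed directly. In this sketch (Python 3 assumed) the class defines only `__getitem__`, yet `iter` still produces a working iterator; the variable names are illustrative.

```python
class Train:
    def __init__(self, cars):
        self.cars = cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < self.cars:
            return 'car #%s' % (index + 1)
        # IndexError is what tells the fallback iterator to stop.
        raise IndexError('no car at %s' % key)


train = Train(3)
has_iter = hasattr(train, '__iter__')   # the class defines no __iter__
it = iter(train)                        # iter() falls back to train[0], train[1], ...
items = list(it)
print(has_iter, items)
```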
  • 41. @ramalhoorg Train: a sequence of cars train = Train(4) 41 train[0] train[1] train[2] train[3]
  • 42. Train: a sequence of cars >>> train = Train(4) >>> len(train) 4 >>> train[0] 'car #1' >>> train[3] 'car #4' >>> train[-1] 'car #4' >>> train[4] Traceback (most recent call last): ... IndexError: no car at 4 >>> for car in train: ... print(car) car #1 car #2 car #3 car #4
  • 43. Train: a sequence of cars • if __getitem__ exists, iteration “just works”
        class Train(object):
            def __init__(self, cars):
                self.cars = cars
            def __getitem__(self, key):
                index = key if key >= 0 else self.cars + key
                # compare against self.cars: this version defines no __len__
                if 0 <= index < self.cars:  # index 2 -> car #3
                    return 'car #%s' % (index + 1)
                else:
                    raise IndexError('no car at %s' % key)
  • 44. @ramalhoorg The sequence protocol at work >>> t = Train(4) >>> len(t) 4 >>> t[0] 'car #1' >>> t[3] 'car #4' >>> t[-1] 'car #4' >>> for car in t: ... print(car) car #1 car #2 car #3 car #4 __len__ __getitem__ __getitem__
  • 45. @ramalhoorg Protocol • protocol: a synonym for interface used in dynamic languages like Smalltalk, Python, Ruby, Lisp... • not declared, and not enforced by static checks 45
  • 46. class Train(object): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence protocol __len__ and __getitem__ implement the immutable sequence protocol
  • 47. import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence ABC • collections.Sequence abstract base class abstract methods Python ≥ 2.6
  • 48. import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence ABC • collections.Sequence abstract base class implement these 2
  • 49. import collections class Train(collections.Sequence): def __init__(self, cars): self.cars = cars def __len__(self): return self.cars def __getitem__(self, key): index = key if key >= 0 else self.cars + key if 0 <= index < len(self): # index 2 -> car #3 return 'car #%s' % (index + 1) else: raise IndexError('no car at %s' % key) Sequence ABC • collections.Sequence abstract base class inherit these 5
  • 50. @ramalhoorg Sequence ABC • collections.Sequence abstract base class >>> train = Train(4) >>> 'car #2' in train True >>> 'car #7' in train False >>> for car in reversed(train): ... print(car) car #4 car #3 car #2 car #1 >>> train.index('car #3') 2 50
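In current Python 3 the ABC lives in `collections.abc` (the `collections.Sequence` alias used on the slides was removed in Python 3.10). A sketch of the same class under that assumption, exercising the five inherited methods:

```python
from collections.abc import Sequence


class Train(Sequence):
    def __init__(self, cars):
        self.cars = cars

    # The two abstract methods a concrete Sequence must implement:
    def __len__(self):
        return self.cars

    def __getitem__(self, key):
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):
            return 'car #%s' % (index + 1)
        raise IndexError('no car at %s' % key)


train = Train(4)

# Inherited mixin methods: __contains__, __iter__, __reversed__, index, count
contains = 'car #2' in train
forward = list(train)
backward = list(reversed(train))
position = train.index('car #3')
occurrences = train.count('car #1')
print(contains, forward, backward, position, occurrences)
```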
  • 51. @ramalhoorg [image credit: U.S. NRC/Wikipedia]
  • 52. @ramalhoorg Generators 52
  • 53. @ramalhoorg Iteration in C (example 2)
        #include <stdio.h>
        int main(int argc, char *argv[]) {
            int i;
            for(i = 0; i < argc; i++)
                printf("%d : %s\n", i, argv[i]);
            return 0;
        }
        $ ./args2 alfa bravo charlie
        0 : ./args2
        1 : alfa
        2 : bravo
        3 : charlie
  • 54. @ramalhoorg Iteration in Python (ex. 2) import sys for i in range(len(sys.argv)): print i, ':', sys.argv[i] $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie 54 not Pythonic
  • 55. @ramalhoorg Iteration in Python (ex. 2) import sys for i, arg in enumerate(sys.argv): print i, ':', arg $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie 55 Pythonic!
  • 56. @ramalhoorg import sys for i, arg in enumerate(sys.argv): print i, ':', arg Iteration in Python (ex. 2) $ python args2.py alfa bravo charlie 0 : args2.py 1 : alfa 2 : bravo 3 : charlie this returns a lazy iterable object that object yields tuples (index, item) on demand, at each iteration 56
  • 57. @ramalhoorg What enumerate does >>> e = enumerate('Turing') >>> e <enumerate object at 0x...> >>> enumerate builds an enumerate object 57
  • 58. @ramalhoorg What enumerate does • enumerate builds an enumerate object (this builds a generator), and that is iterable
        >>> e = enumerate('Turing')
        >>> e
        <enumerate object at 0x...>
        >>> for item in e:
        ...     print item
        ...
        (0, 'T')
        (1, 'u')
        (2, 'r')
        (3, 'i')
        (4, 'n')
        (5, 'g')
        >>>
  • 59. @ramalhoorg What enumerate does • the enumerate object produces an (index, item) tuple for each next(e) call • The enumerate object is an example of a generator
        >>> e = enumerate('Turing')
        >>> e
        <enumerate object at 0x...>
        >>> next(e)
        (0, 'T')
        >>> next(e)
        (1, 'u')
        >>> next(e)
        (2, 'r')
        >>> next(e)
        (3, 'i')
        >>> next(e)
        (4, 'n')
        >>> next(e)
        (5, 'g')
        >>> next(e)
        Traceback (most recent...):
          ...
        StopIteration
  • 60. @ramalhoorg Iterator x generator • By definition (in GoF) an iterator retrieves successive items from an existing collection • A generator implements the iterator interface (next) but produces items not necessarily in a collection • a generator may iterate over a collection, but return the items decorated in some way, skip some items... • it may also produce items independently of any existing data source (e.g. a Fibonacci sequence generator)
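Both kinds of generator mentioned above can be sketched in a few lines. The function names here are illustrative, not from the talk; Python 3 is assumed.

```python
from itertools import islice


def fibonacci():
    """Produces items with no underlying collection; never ends by itself."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def even_positions_upper(words):
    """Iterates over a collection but skips some items and decorates the rest."""
    for i, word in enumerate(words):
        if i % 2 == 0:
            yield word.upper()


# islice takes a finite prefix of the boundless generator.
first_ten = list(islice(fibonacci(), 10))
decorated = list(even_positions_upper(['alfa', 'bravo', 'charlie']))
print(first_ten)    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(decorated)    # ['ALFA', 'CHARLIE']
```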
  • 61. Faraday disc (Wikipedia)
  • 62. @ramalhoorg Very simple generators 62
  • 63. @ramalhoorg Generator function • Any function that has the yield keyword in its body is a generator function 63 >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> for i in gen_123(): print(i) 1 2 3 >>> the keyword gen was considered for defining generator functions, but def prevailed
  • 64. @ramalhoorg • When invoked, a generator function returns a generator object Generator function 64 >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> for i in gen_123(): print(i) 1 2 3 >>> g = gen_123() >>> g <generator object gen_123 at ...>
  • 65. @ramalhoorg Generator function >>> def gen_123(): ... yield 1 ... yield 2 ... yield 3 ... >>> g = gen_123() >>> g <generator object gen_123 at ...> >>> next(g) 1 >>> next(g) 2 >>> next(g) 3 >>> next(g) Traceback (most recent call last): ... StopIteration • Generator objects implement the Iterator interface 65
  • 66. @ramalhoorg Generator behavior • Note how the output of the generator function is interleaved with the output of the calling code 66 >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> for c in gen_AB(): ... print('--->', c) ... START ---> A CONTINUE ---> B END. >>>
  • 67. @ramalhoorg Generator behavior • The body is executed only when next is called, and it runs only up to the following yield >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> g = gen_AB() >>> next(g) START 'A' >>>
  • 68. @ramalhoorg Generator behavior • When the body of the function returns, the generator object throws StopIteration • The for statement catches that for you 68 >>> def gen_AB(): ... print('START') ... yield 'A' ... print('CONTINUE') ... yield 'B' ... print('END.') ... >>> g = gen_AB() >>> next(g) START 'A' >>> next(g) CONTINUE 'B' >>> next(g) END. Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
  • 69. for car in train: • calls iter(train) to obtain a generator • makes repeated calls to next(generator) until the function returns, which raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): # index 2 is car #3 yield 'car #%s' % (i+1) Train with generator function 1 1 2 >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3 2
  • 70. Classic iterator x generator
        Classic iterator: 2 classes, 12 lines of code
        class Train(object):
            def __init__(self, cars):
                self.cars = cars
            def __len__(self):
                return self.cars
            def __iter__(self):
                return TrainIterator(self)

        class TrainIterator(object):
            def __init__(self, train):
                self.train = train
                self.current = 0
            def __next__(self):  # Python 3
                if self.current < len(self.train):
                    self.current += 1
                    return 'car #%s' % (self.current)
                else:
                    raise StopIteration()

        Generator: 1 class, 3 lines of code
        class Train(object):
            def __init__(self, cars):
                self.cars = cars
            def __iter__(self):
                for i in range(self.cars):
                    yield 'car #%s' % (i+1)
  • 71. class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1) The pattern just vanished
  • 72. class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): yield 'car #%s' % (i+1) “When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough -- often that I'm generating by hand the expansions of some macro that I need to write.” Paul Graham Revenge of the nerds (2002)
  • 73. Generator expression (genexp) >>> g = (c for c in 'ABC') >>> g <generator object <genexpr> at 0x10045a410> >>> for l in g: ... print(l) ... A B C >>>
  • 74. @ramalhoorg • When evaluated, returns a generator object >>> g = (n for n in [1, 2, 3]) >>> g <generator object <genexpr> at 0x...> >>> next(g) 1 >>> next(g) 2 >>> next(g) 3 >>> next(g) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration Generator expression (genexp)
  • 75. for car in train: • calls iter(train) to obtain a generator • makes repeated calls to next(generator) until the function returns, which raises StopIteration class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): for i in range(self.cars): # index 2 is car #3 yield 'car #%s' % (i+1) Train with generator function 1 1 2 >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3 2
  • 76. for car in train: • calls iter(train) to obtain a generator • makes repeated calls to next(generator) until the function returns, which raises StopIteration 1 2 class Train(object): def __init__(self, cars): self.cars = cars def __iter__(self): return ('car #%s' % (i+1) for i in range(self.cars)) Train with generator expression >>> train = Train(3) >>> for car in train: ... print(car) car #1 car #2 car #3
  • 77. Generator function x genexp
        With generator expression:
        class Train(object):
            def __init__(self, cars):
                self.cars = cars
            def __iter__(self):
                return ('car #%s' % (i+1) for i in range(self.cars))

        With generator function:
        class Train(object):
            def __init__(self, cars):
                self.cars = cars
            def __iter__(self):
                for i in range(self.cars):
                    yield 'car #%s' % (i+1)
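A quick check that the two variants behave the same, sketched for Python 3. The class names `TrainGenFunc` and `TrainGenExp` are renamed here only so both can coexist in one script.

```python
class TrainGenFunc:
    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        # Generator function: calling __iter__ returns a new generator object.
        for i in range(self.cars):
            yield 'car #%s' % (i + 1)


class TrainGenExp:
    def __init__(self, cars):
        self.cars = cars

    def __iter__(self):
        # Generator expression: also evaluates to a new generator object.
        return ('car #%s' % (i + 1) for i in range(self.cars))


a, b = TrainGenFunc(3), TrainGenExp(3)
same_items = list(a) == list(b)

# Each iter() call builds a fresh generator, so the train can be iterated again.
second_pass = list(a)
print(same_items, second_pass)
```

The train itself stays reusable because every call to `__iter__` starts a new generator; only the individual generator objects are exhausted after one pass.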
  • 78. @ramalhoorg Built-in functions that return iterables, iterators or generators • dict • enumerate • frozenset • list • reversed • set • tuple 78
  • 79. @ramalhoorg The itertools module • boundless generators • count(), cycle(), repeat() • generators which combine several iterables: • chain(), tee(), izip(), imap(), product(), compress()... • generators which select or group items: • compress(), dropwhile(), groupby(), ifilter(), islice()... • generators producing combinations of items: • product(), permutations(), combinations()... • Don’t reinvent the wheel, use itertools! • this was not reinvented: ported from Haskell • great for MapReduce
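A few of these functions in use, sketched for Python 3. Note that `izip`, `imap` and `ifilter` are Python 2 names; in Python 3 the built-in `zip`, `map` and `filter` are already lazy.

```python
from itertools import chain, combinations, count, cycle, groupby, islice

# Boundless generators must be cut off, for example with islice.
evens = list(islice(count(0, 2), 5))
colors = list(islice(cycle(['red', 'green']), 5))

# Combining several iterables into one stream.
chained = list(chain('AB', [1, 2]))

# Grouping consecutive equal items.
runs = [(key, len(list(group))) for key, group in groupby('aaabbc')]

# Combinations of items.
pairs = list(combinations('ABC', 2))

print(evens)    # [0, 2, 4, 6, 8]
print(colors)   # ['red', 'green', 'red', 'green', 'red']
print(chained)  # ['A', 'B', 1, 2]
print(runs)     # [('a', 3), ('b', 2), ('c', 1)]
print(pairs)    # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```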
  • 80. @ramalhoorg Generators in Python 3 • Several functions and methods of the standard library that used to return lists now return generators and other lazy iterables in Python 3 • dict.keys(), dict.items(), dict.values()... • range(...) • like xrange in Python 2.x (more than a generator) • If you really need a list, just pass the generator to the list constructor, e.g. list(range(10))
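A short Python 3 sketch of what "lazy" and "more than a generator" mean here: `range` supports `len`, indexing and membership tests without building a list, dict views track later changes to the dict, and `map` is consumed one item at a time.

```python
# range builds no list, yet behaves like a sequence.
big = range(10**9)
size = len(big)
last = big[-1]
found = 123456 in big

# dict views are dynamic: they reflect later changes to the dict.
d = {'a': 1}
keys = d.keys()
d['b'] = 2
keys_after = list(keys)

# map returns a lazy iterator that is consumed as items are requested.
m = map(str.upper, 'abc')
first = next(m)
rest = list(m)

# When a real list is needed, pass the lazy object to list().
as_list = list(range(10))
print(size, last, found, keys_after, first, rest, as_list)
```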
  • 81. @ramalhoorg A practical example using generator functions • Generator functions to decouple reading and writing logic in a database conversion tool designed to handle large datasets https://github.com/ramalho/isis2json 81
  • 82. @ramalhoorg Main loop writes JSON file
  • 83. @ramalhoorg Another loop reads the input records
  • 84. @ramalhoorg One implementation: same loop reads/writes
  • 85. @ramalhoorg But what if we need to read another format?
  • 86. @ramalhoorg Functions in the script • iterMstRecords* • iterIsoRecords* • writeJsonArray • main * generator functions
  • 87. @ramalhoorg main: read command line arguments
  • 88. main: determine input format selected generator function is passed as an argument input generator function is selected based on the input file extension
  • 89. @ramalhoorg writeJsonArray: write JSON records 89
  • 90. writeJsonArray: iterates over one of the input generator functions selected generator function received as an argument... and called to produce input generator
  • 91. @ramalhoorg iterIsoRecords: read records from ISO-2709 format file generator function! 91
  • 92. @ramalhoorg iterIsoRecords yields one record, structured as a dict creates a new dict in each iteration 92
  • 93. @ramalhoorg iterMstRecords: read records from ISIS .MST file generator function!
  • 94. iterMstRecords yields one record, structured as a dict • creates a new dict in each iteration
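The structure described in slides 86 to 94 can be sketched as follows. This is not the isis2json source: the two input formats, the function names `iter_tab_records`, `iter_pipe_records`, `choose_reader` and `write_json_array`, and the sample data are all hypothetical, chosen only to show a writer loop that receives a generator function as an argument. Python 3 is assumed.

```python
import io
import json


def iter_tab_records(lines):
    # Hypothetical format A: "name<TAB>uf" on each line.
    for line in lines:
        name, uf = line.rstrip('\n').split('\t')
        yield {'name': name, 'uf': uf}      # a new dict in each iteration


def iter_pipe_records(lines):
    # Hypothetical format B: "uf|name" on each line.
    for line in lines:
        uf, name = line.rstrip('\n').split('|')
        yield {'name': name, 'uf': uf}


def choose_reader(filename):
    # The input generator function is selected from the file extension.
    return iter_pipe_records if filename.endswith('.pipe') else iter_tab_records


def write_json_array(iter_records, lines, output):
    # The selected generator function arrives as an argument and is called here;
    # the writer never needs to know which input format is being read.
    output.write('[')
    for i, record in enumerate(iter_records(lines)):
        if i > 0:
            output.write(',\n')
        output.write(json.dumps(record, sort_keys=True))
    output.write(']\n')


out = io.StringIO()
write_json_array(choose_reader('towns.pipe'),
                 ['GO|Abadiânia\n', 'MG|Abaeté\n'], out)
result = json.loads(out.getvalue())
print(result)
```

Because records are produced and written one at a time, memory use does not grow with the size of the input, which is the point of using generators for large datasets. Supporting another format means adding one more reader function.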
  • 95. Generators at work
  • 96. Generators at work
  • 97. Generators at work
  • 98. @ramalhoorg We did not cover • other generator methods: • gen.close(): causes a GeneratorExit exception to be raised within the generator body, at the point where it is paused • gen.throw(e): causes any exception e to be raised within the generator body, at the point where it is paused • Mostly useful for long-running processes. Often not needed in batch processing scripts.
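For readers who want to see the two methods once, here is a small sketch (Python 3 assumed; the `reader` generator and the `events` list are made up for illustration). The exception is raised at the paused yield in both cases.

```python
events = []


def reader():
    try:
        n = 0
        while True:
            n += 1
            try:
                yield n
            except ValueError:
                # gen.throw(ValueError(...)) lands here, at the paused yield.
                events.append('skipped after %s' % n)
    finally:
        # gen.close() raises GeneratorExit at the paused yield; finally still runs.
        events.append('cleanup')


g = reader()
first = next(g)                              # runs up to the first yield
second = g.throw(ValueError('bad record'))   # handled inside; resumes to next yield
g.close()                                    # triggers the finally block
print(first, second, events)
```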
  • 99. @ramalhoorg We did not cover • generator delegation with yield from • sending data into a generator function with the gen.send(x) method (instead of next(gen)), and using yield as an expression to get the data sent • using generator functions as coroutines not useful in the context of iteration Python ≥ 3.3 “Coroutines are not related to iteration” David Beazley 99
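As a pointer only, since the talk leaves these topics out, this sketch shows what the two features look like in Python ≥ 3.3. All function names are illustrative.

```python
def cars(n):
    for i in range(n):
        yield 'car #%s' % (i + 1)


def train_with_engine(n):
    yield 'engine'
    yield from cars(n)        # delegation: items from cars() pass straight through


def running_total():
    total = 0
    while True:
        value = yield total   # yield as an expression receives what send() passes
        total += value


delegated = list(train_with_engine(2))

acc = running_total()
start = next(acc)             # prime the generator: run to the first yield
after_ten = acc.send(10)
after_five = acc.send(5)
print(delegated, start, after_ten, after_five)
```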
  • 100. @ramalhoorg How to learn generators • Forget about .send() and coroutines: that is a completely different subject. Look into that only after mastering and becoming really comfortable using generators for iteration. • Study and use the itertools module • Don’t worry about .close() and .throw() initially. You can be productive with generators without using these methods. • yield from is only available in Python 3.3 or later, and only relevant if you need to use .close() and .throw()
  • 101. Q & A Luciano Ramalho luciano@ramalho.org @ramalhoorg https://github.com/ramalho/isis2json