More topics in Advanced Python 
© 2014 Zaar Hai 
● Generators 
● Async programming 
Appetizer – Slots vs Dictionaries 
(Almost) every Python object has a built-in __dict__ dictionary
This can waste memory when there are many objects with only
a few attributes
class A(object):
    pass
class B(object):
    __slots__ = ["a", "b"]
>>> A().c = 1
>>> B().c = 1
Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
AttributeError: 'B' object has no attribute 'c' 
Slots come to save memory (and CPU) 
But do they really? 
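A minimal sketch of the difference, assuming CPython 3 (the class names WithDict and WithSlots are illustrative, not from the slides). It shows that a slotted instance has no __dict__, rejects unknown attributes, and has a smaller rough per-instance footprint:

```python
import sys

class WithDict:
    def __init__(self):
        self.a = 1
        self.b = 2

class WithSlots:
    __slots__ = ("a", "b")

    def __init__(self):
        self.a = 1
        self.b = 2

d = WithDict()
s = WithSlots()

has_dict = hasattr(d, "__dict__")         # regular instances carry a dict
slots_has_dict = hasattr(s, "__dict__")   # slotted instances do not

slot_error = None
try:
    s.c = 3                               # "c" is not listed in __slots__
except AttributeError as exc:
    slot_error = exc

# Rough per-instance footprint: the object plus its attribute dict, if any.
# sys.getsizeof is shallow, so this is only an approximation.
dict_size = sys.getsizeof(d) + sys.getsizeof(d.__dict__)
slots_size = sys.getsizeof(s)
print(dict_size, slots_size)
```

The exact byte counts vary between interpreter versions, which is one more reason to measure on the interpreter you actually deploy.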
Slots vs Dictionaries - competitors 
import resource

class A(object):
    # __slots__ = ["a", "b", "c"]
    def __init__(self):
        self.a = "foot"
        self.b = 2
        self.c = True

l = []
for i in xrange(50000000):
    l.append(A())

print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
Slots vs Dictionaries – memory
[Chart: memory in megabytes against number of objects (1,000 to 1,000,000). Series: Py 2.7, Py 3.4 and PyPy, each with slots and with dict]
Slots vs Dictionaries – MEMORY
[Chart: memory in megabytes against number of objects (1,000 to 50,000,000). Series: Py 2.7, Py 3.4 and PyPy, each with slots and with dict]
Slots vs Dictionaries – cpu
[Chart: time in seconds against number of objects (1,000 to 1,000,000). Series: Py 2.7, Py 3.4 and PyPy, each with slots and with dict]
Slots vs Dictionaries – CPU
[Chart: time in seconds against number of objects (1,000 to 50,000,000). Series: Py 2.7, Py 3.4 and PyPy, each with slots and with dict]
Slots vs Dictionaries - conclusions 
Slots vs dicts – and the winner is... PyPy 
Seriously – forget the slots, and just move to PyPy if 
performance becomes an issue. As a bonus you get 
performance improvements in other areas 
Most important – run your micro benchmarks before jumping 
into new stuff 
The magic yield statement 
A function becomes a generator if it contains a yield statement
def gen():
    yield 1
    yield 2
When invoked, “nothing” happens, i.e. the function code does
not run yet
>>> g = gen()
>>> g
<generator object gen at 0x7f423b1b3f00>
The next() method runs the function until the next yield statement and
returns the yielded value
>>> g.next()
1
>>> g.next()
2
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
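To make the "nothing happens" point visible, here is a small sketch (written for Python 3, where next(g) replaces g.next(); the log list is an illustrative addition). Each step records when the function body actually runs:

```python
log = []

def gen():
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    log.append("finished")

g = gen()
log_after_call = list(log)    # still empty: calling gen() runs no body code

first = next(g)               # runs up to the first yield
log_after_first = list(log)

second = next(g)              # resumes right after the first yield

try:
    next(g)                   # runs to the end of the function
    exhausted = False
except StopIteration:
    exhausted = True
```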
Generator exceptions 
StopIteration is raised when the generator is exhausted
The for statement catches StopIteration automagically
>>> for i in gen():
...     print i
...
1
2
If the generator function raises an exception, the generator stops
def gen2():
    yield 1
    raise ValueError
    yield 2
>>> g = gen2()
>>> g.next()
1
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in gen2
ValueError
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Stopping generator prematurely 
def producer():
    conn = db.connection()
    for row in conn.execute("SELECT * FROM t LIMIT 1000"):
        yield row
    conn.close()
def consumer():
    rows = producer()
    print "First row %s" % next(rows)
In the above example the consumer abandons the generator after the first row, so the connection will never be closed. Fix:
def producer():
    conn = db.connection()
    try:
        for row in conn.execute("SELECT * FROM t LIMIT 1000"):
            yield row
    finally:
        conn.close()  # Runs when GeneratorExit is raised at the yield
def consumer():
    rows = producer()
    print "First row %s" % next(rows)
    rows.close()  # Will raise GeneratorExit in producer code
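The effect of close() can be checked without a database. In the sketch below, FakeConnection is a hypothetical stand-in for a real driver's connection object, and the connection is passed in as an argument only to keep the example self-contained:

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.closed = False

    def execute(self, query):
        return iter(range(1000))   # pretend these are rows

    def close(self):
        self.closed = True

def producer(conn):
    try:
        for row in conn.execute("SELECT * FROM t LIMIT 1000"):
            yield row
    finally:
        conn.close()               # runs on exhaustion and on GeneratorExit

conn = FakeConnection()
rows = producer(conn)
first_row = next(rows)
closed_before = conn.closed        # generator is suspended, connection still open
rows.close()                       # raises GeneratorExit at the yield
closed_after = conn.closed
```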
Syntactic sugar 
Most of us use generators without even knowing about them 
>>> [i for i in [1,2,3]] 
[1, 2, 3] 
However, the expression inside […] above is, in effect, a generator expression
>>> ( i for i in [1,2,3] )
<generator object <genexpr> at 0x7f423b1b3f00>
The list constructor accepts any iterable, including a generator,
and iterates through it to build the list
More goodies: 
>>> [i for i in range(6, 100) if i % 6 == i % 7 ] 
[42, 43, 44, 45, 46, 47, 84, 85, 86, 87, 88, 89] 
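A short sketch of how a generator expression behaves when handed to a consumer such as list() or sum() (Python 3; the variable names are illustrative):

```python
squares = (i * i for i in range(5))   # generator expression: nothing is computed yet
as_list = list(squares)               # list() iterates it and collects the values
leftover = list(squares)              # a generator can only be consumed once

# Generators can be fed straight into any consumer, with no intermediate list
total = sum(i for i in range(6, 100) if i % 6 == i % 7)
```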
Generators produce stuff on demand 
Writing a Fibonacci series generator is a piece of cake:
def fibogen():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b
No recursion 
O(1) memory 
Generates as much as you want to consume 
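Since the generator never ends, the consumer decides how much to take. One way to do that, sketched here with itertools.islice from the standard library:

```python
from itertools import islice

def fibogen():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

# Take only the first ten values; the generator stays suspended afterwards
first_ten = list(islice(fibogen(), 10))
print(first_ten)
```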
Returning value from a generator 
Before Python 3.3, a generator could not return a value (only a bare return was allowed)
Since 3.3 you can:
def gen():
    yield 1
    yield 2
    return 3
>>> g = gen()
>>> next(g)
1
>>> next(g)
2
>>> try:
...     next(g)
... except StopIteration as e:
...     print(e.value)
...
3
In earlier versions:
class Return(Exception):
    def __init__(self, value):
        self.value = value
Then raise it from the generator and catch it outside
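A possible shape for the "raise it and catch it outside" workaround, written so that it also runs on Python 3. The drain() helper is a hypothetical name introduced for this sketch:

```python
class Return(Exception):
    def __init__(self, value):
        Exception.__init__(self, value)
        self.value = value

def gen():
    yield 1
    yield 2
    raise Return(3)        # stands in for "return 3"

def drain(g):
    """Collect yielded items and the value carried by Return, if any."""
    items = []
    try:
        for item in g:
            items.append(item)
    except Return as r:
        return items, r.value
    return items, None

items, returned = drain(gen())
```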
Consumer generator 
You can send stuff back to a generator
def db_stream():
    conn = db.connection()
    while True:
        try:
            row = yield
            conn.execute("INSERT INTO t VALUES(%s)", row)
        except ConnCommit:
            conn.commit()
        except ConnRollBack:
            conn.rollback()
        except GeneratorExit:
            conn.close()
            return
>>> g = db_stream()
>>> g.next()  # Advance to the first yield before sending
>>> g.send([1])
>>> g.throw(ConnCommit)
>>> g.close()
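The same send / throw / close protocol can be tried without a database. In this sketch the Commit exception and the collector() generator are illustrative names, and a plain list plays the role of the table:

```python
class Commit(Exception):
    """Illustrative signal asking the consumer to flush its batch."""

def collector(store):
    batch = []
    while True:
        try:
            item = yield len(batch)   # hand back the current batch size
            batch.append(item)
        except Commit:
            store.extend(batch)       # "commit" the batch
            batch = []

store = []
g = collector(store)
next(g)                          # prime: run to the first yield
size_after_a = g.send("a")       # resumes the generator with item = "a"
size_after_b = g.send("b")
size_after_commit = g.throw(Commit())   # raised inside the generator at the yield
g.send("c")                      # never committed
g.close()                        # GeneratorExit ends the loop; "c" is dropped
```

Skipping the priming next(g) would make the first send() fail with TypeError, because a value cannot be sent to a generator that has not reached a yield yet.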
Async programming approach 
Async in a nutshell
Technion CS “Introduction to Operating Systems”, HW 2
import socket, select, time
from collections import defaultdict, deque
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("", 1234))
sock.listen(5)  # Required before accept(); the backlog value is arbitrary
rqueue = set([sock])
wqueue = set()
pending = defaultdict(deque)
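Async in a nutshell – event loop
Technion CS “Introduction to Operating Systems”, HW 2
while True:
    rq, wq, _ = select.select(rqueue, wqueue, [])
    for s in rq:
        if s == sock:
            new_sock, _ = sock.accept()
            rqueue.add(new_sock)
            continue
        data = s.recv(1024)
        if not data:
            # Peer closed the connection
            rqueue.remove(s)
            wqueue.discard(s)
            pending.pop(s, None)
            s.close()
            continue
        # Assumed behavior: echo the data back to the sender
        pending[s].append(data)
        wqueue.add(s)
    for s in wq:
        if not pending[s]:
            wqueue.discard(s)
            continue
        data = pending[s].popleft()
        sent = s.send(data)
        if sent != len(data):
            data = data[sent:]
            pending[s].appendleft(data)  # Retry the unsent tail later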
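The read / queue / write cycle can be exercised in a single process. The sketch below is an assumption-laden miniature, not the homework solution: it uses socket.socketpair() instead of a listening socket, Python 3 bytes, and a one second select timeout as a safety net:

```python
import select
import socket
from collections import deque

server, client = socket.socketpair()   # two connected sockets in one process
pending = deque()                      # data the "server" end still has to send back
message = b"ping"
client.sendall(message)
reply = b""

while len(reply) < len(message):
    # Only ask for writability when there is something to write
    wlist = [server] if pending else []
    rq, wq, _ = select.select([server, client], wlist, [], 1.0)
    if not rq and not wq:
        break                          # timed out; not expected here
    for s in rq:
        data = s.recv(1024)
        if s is server:
            pending.append(data)       # queue the echo
        else:
            reply += data              # the client got its echo back
    for s in wq:
        data = pending.popleft()
        sent = s.send(data)
        if sent != len(data):
            pending.appendleft(data[sent:])   # keep the unsent tail

server.close()
client.close()
```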
Why bother with async? 
Less memory resources
Stack memory is allocated for each spawned thread: 2 MB on
x86 Linux
For a server to handle 10k connections – 20 GB of memory
required just for starters!
Less CPU resources
Context switching between 10k threads is expensive
Async moves the switching logic from the OS / interpreter level to the
application level – which is typically more efficient
C10k problem 
The art of managing a large number of connections
Why is that a problem? - long polling / websockets
With modern live web applications, each client / browser
holds an open connection to the server
Gmail has 425 million active users
I.e. Gmail servers have to handle ~400 million active
connections at any given time
Concurrency vs Parallelism 
Concurrency
Dealing with several tasks simultaneously
But with one task at a time
All Intel processors up to Pentium were concurrent
Parallelism
Dealing with several tasks simultaneously
But with several tasks at any given time
All Intel processors since Pentium can execute more than
one instruction per clock cycle
(C)Python is always concurrent
Either with threads or with the async approach
Thread abuse 
Naive approach – spawn a thread for every tiny task:
Resource waste
Burden on OS / Interpreter
Good single-thread code can saturate a single core
Usually you don't need more than 1 thread / process per CPU
In the web world
Your application needs to scale beyond a single machine
I.e. you'll have to run in multiple isolated processes anyway
Explicit vs Implicit context switching 
Implicit context switching
OS / Interpreter decides when to switch
Coder needs to assume he can lose control at any time
Synchronization required – mutexes, etc
Explicit context switching
Coder decides when to give up execution control
No synchronization primitives required!
Explicit vs Implicit context switching 
Threads:
def transfer(acc_f, acc_t, sum):
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
Explicit async:
def transfer(acc_f, acc_t, sum):
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
        yield acc_f.commit_balance()
        yield acc_t.deposit(sum)
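With implicit switching, the check and the two updates have to be protected, because another thread may run between them. A minimal sketch of the threaded variant with a lock (the Account class and the single global lock are simplifying assumptions, not part of the slides):

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance

lock = threading.Lock()   # one global lock keeps the sketch simple

def transfer(acc_f, acc_t, amount):
    # Without the lock, a switch between the check and the updates
    # could let two threads both pass the balance check.
    with lock:
        if acc_f.balance > amount:
            acc_f.balance -= amount
            acc_t.balance += amount

a, b = Account(1000), Account(0)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In the explicit async version no lock is needed for the in-memory updates, since control is only given up at the yield statements.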
Practical approach 
Traditionally, the async approach was implemented through callbacks
In JavaScript it can get as nasty as this:
button.on("click", function() {
    JQuery.ajax("http://...", {
        success: function(data) {
            // do something
        }
    });
});
Thankfully, Python's support for anonymous functions is not
that good
Back to fun – Async frameworks in Python
Tulip (asyncio) – part of the Python standard library since 3.4
Gevent (for Python < 3)
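For a taste of the standard library option, here is a small asyncio sketch. It uses the async / await syntax and asyncio.run() from later Python versions (3.7+), which postdate this deck; the fetch() coroutine and its delays are made up for illustration:

```python
import asyncio

async def fetch(name, delay, log):
    log.append("start " + name)
    await asyncio.sleep(delay)      # explicit switch point: other tasks run here
    log.append("done " + name)
    return name.upper()

async def main():
    log = []
    # Both coroutines run concurrently in one thread
    results = await asyncio.gather(
        fetch("a", 0.05, log),
        fetch("b", 0.01, log),
    )
    return results, log

results, log = asyncio.run(main())
```

The results come back in call order, while the log shows that "b" finished first because its wait was shorter.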
Tornado Hello World 
import tornado.ioloop
import tornado.web
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")
application = tornado.web.Application([
    (r"/", MainHandler),
])
if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
So far everything is synchronous
Tornado + database = async magic 
from tornado.gen import coroutine
from momoko.connections import Pool
db = Pool(host=...)
class MainHandler(tornado.web.RequestHandler):
    @coroutine
    def get(self):
        cursor = yield db.execute("SELECT * FROM greetings")
        for row in cursor.fetchall():
            self.write(str(row))  # Loop body assumed: write each row to the response
Demystifying the magic 
Future – proxy to an object that will be available later 
AKA “promise” in JavaScript, “deferred” in Twisted 
Traditional thread-related usage: 
future = r.invoke("model_get") 
res = future.get_result() 
future = Future() 
r = _invoke(...) 
return future 
Futures in async 
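@coroutine
def get(self):
    rows = yield db.execute(...)

def coroutine(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        future = next(gen)  # Run until the first yield, which produces a future
        Runner(gen, future)
    return wrapper

from tornado.ioloop import IOLoop

class Runner(object):
    def __init__(self, gen, future):
        self.ioloop = IOLoop.instance()
        self.gen = gen
        self.future = future
        self.handle_yield()

    def run(self):
        value = self.future.result()
        next_future = self.gen.send(value)
        # check StopIteration
        self.future = next_future
        self.handle_yield()

    def handle_yield(self):
        if self.future.done():
            self.run()
        else:
            # Resume the generator once the future is resolved
            self.ioloop.add_future(self.future, lambda f: self.run())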
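The mechanism can be tried without Tornado. The following is a toy sketch under stated assumptions: Future here is a hand-written minimal class (not Tornado's), the decorator resumes the generator from a done-callback instead of an IOLoop, and pending_query stands in for whatever db.execute(...) would return. It relies on Python 3.3+ generator return values:

```python
class Future:
    """Toy future: holds a result and the callbacks to run once it is set."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def done(self):
        return self._done

    def result(self):
        return self._result

    def add_done_callback(self, callback):
        if self._done:
            callback(self)
        else:
            self._callbacks.append(callback)

    def set_result(self, value):
        self._done = True
        self._result = value
        for callback in self._callbacks:
            callback(self)

def coroutine(func):
    def wrapper(*args, **kwargs):
        outer = Future()                 # resolved with the generator's return value
        gen = func(*args, **kwargs)

        def step(value=None):
            try:
                yielded = gen.send(value)    # run until the next yield
            except StopIteration as stop:
                outer.set_result(stop.value)
                return
            # Resume the generator when the yielded future is resolved
            yielded.add_done_callback(lambda f: step(f.result()))

        step()
        return outer
    return wrapper

pending_query = Future()   # stand-in for the future returned by db.execute(...)

@coroutine
def handler():
    rows = yield pending_query
    return [row.upper() for row in rows]

response = handler()
done_before = response.done()            # the handler is parked at its yield
pending_query.set_result(["hello"])      # the "event loop" delivers the data
done_after = response.done()
```

The handler reads like sequential code, yet it never blocks: it is suspended at the yield and resumed by whoever resolves the future.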
Now the magical db.execute(...) 
class Connection(object):
    def __init__(self, host=...):
        self.sock = …
    def execute(self, query):
        self.future = Future()
        self.query = query
        self.ioloop.add_handler(self.sock, self.handle_write, IOLoop.WRITE)
        return self.future
    def handle_write(self):
        self.sock.send(self.query)
        self.ioloop.add_handler(self.sock, self.handle_read, IOLoop.READ)
    def handle_read(self):
        rows = …  # Read and parse the response from self.sock
        self.future.set_result(rows)  # Wakes up the coroutine waiting on the future
Writing async-ready libraries 
You have a library that uses, let's say, sockets
You want to make it async compatible
Two options:
Either choose which ioloop implementation you use
(Tornado IOLoop, Python 3.4 Tulip, etc). But it's a hard
choice, limiting your users
Or implement the library in a pollable way. This way it can be
plugged into any ioloop.
(dumb) Pollable example: psycopg2 async mode 
The following example is dumb, because it uses async in a
sync way. But it demonstrates the principle
import select
import psycopg2
from psycopg2.extensions import POLL_OK, POLL_READ, POLL_WRITE
def wait(conn):
    while 1:
        state = conn.poll()
        if state == POLL_OK:
            break
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [])
        elif state == POLL_READ:
            select.select([conn.fileno()], [], [])
        else:
            raise psycopg2.OperationalError("...")
>>> aconn = psycopg2.connect(database='test', async=1)
>>> wait(aconn)
>>> acurs = aconn.cursor()
>>> acurs.execute("SELECT pg_sleep(5); SELECT 42;")
>>> wait(acurs.connection)
>>> acurs.fetchone()[0]
42
Pollable example – the goal 
class POLL_BASE(object): pass 
class POLL_OK(POLL_BASE): pass 
class POLL_READ(POLL_BASE): pass 
class POLL_WRITE(POLL_BASE): pass 
class Connection(object):
    …  # Implementation on the following slides
conn = Connection(host, port, …)
wait(conn) # poll, poll, poll
print "Received: %s" % conn.buff
Pollable example - implementation 
class POLL_BASE(object): pass
class POLL_OK(POLL_BASE): pass
class POLL_READ(POLL_BASE): pass
class POLL_WRITE(POLL_BASE): pass
class Connection(object):
    def __init__(self, …):
        self.async_queue = deque()
    def _read(self, total):
        buff = []
        left = total
        while left:
            yield POLL_READ
            data = self.sock.recv(left)
            buff.append(data)
            left -= len(data)
        raise Return("".join(buff))
    def _read_to_buff(self, total):
        self.buff = yield self._read(total)
    def read(self, total):
        self.async_queue.append(self._read_to_buff(total))
Pollable example – implementation cont 
    def poll(self, value=None):
        try:
            if value:
                value = self.async_queue[0].send(value)
            else:
                # Because we can't send non-None values to not started gens
                value = next(self.async_queue[0])
        except (Return, StopIteration) as err:
            value = getattr(err, "value", None)
            self.async_queue.popleft()
            if not len(self.async_queue):
                return POLL_OK  # All generators are done - operation finished
        if value in (POLL_READ, POLL_WRITE):
            return value  # Need to wait for socket
        if isinstance(value, types.GeneratorType):
            self.async_queue.appendleft(value)
            return self.poll()  # Continue "pulling" next generator
        # Pass return value to previous (caller) generator
        return self.poll(value)
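A condensed, runnable variant of the pollable connection, offered as a sketch rather than the original implementation. Assumptions: Python 3.3+ (so _read can use return instead of raising Return), plain string constants instead of the POLL_* classes, an iterative poll() instead of the recursive one, and socket.socketpair() in place of a real server:

```python
import select
import socket
import types
from collections import deque

POLL_OK, POLL_READ, POLL_WRITE = "ok", "read", "write"

class Connection:
    def __init__(self, sock):
        self.sock = sock
        self.buff = None
        self.async_queue = deque()   # stack of generators; the active one is first

    def fileno(self):
        return self.sock.fileno()

    def _read(self, total):
        chunks = []
        left = total
        while left:
            yield POLL_READ                  # ask the caller to wait for readability
            data = self.sock.recv(left)
            if not data:
                break                        # peer closed early
            chunks.append(data)
            left -= len(data)
        return b"".join(chunks)

    def _read_to_buff(self, total):
        self.buff = yield self._read(total)  # poll() drives the sub-generator

    def read(self, total):
        self.async_queue.append(self._read_to_buff(total))

    def poll(self):
        value = None
        while self.async_queue:
            try:
                value = self.async_queue[0].send(value)
            except StopIteration as stop:
                self.async_queue.popleft()
                value = stop.value           # hand the result to the caller generator
                continue
            if value in (POLL_READ, POLL_WRITE):
                return value                 # need to wait for the socket
            if isinstance(value, types.GeneratorType):
                self.async_queue.appendleft(value)   # descend into the sub-generator
                value = None
        return POLL_OK                       # all generators are done

def wait(conn):
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        if state == POLL_READ:
            select.select([conn.fileno()], [], [])
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [])

ours, theirs = socket.socketpair()
theirs.sendall(b"hello world")
conn = Connection(ours)
conn.read(5)
wait(conn)          # poll, poll, poll
received = conn.buff
ours.close()
theirs.close()
```

Because poll() only reports what it is waiting for, the same Connection could be driven by this blocking wait() or by any ioloop that watches the file descriptor.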
Thank you …

