Python, async web frameworks, and MongoDB

A talk covering the state of the art for writing asynchronous web applications using Python and MongoDB.

Published in: Technology, Business

  1. Python, MongoDB, and asynchronous web frameworks
     A. Jesse Jiryu Davis
     jesse@10gen.com
     emptysquare.net
  2. Agenda
     • Talk about web services in a really dumb (“abstract”?) way
     • Explain when we need async web servers
     • Why is async hard?
     • What is Tornado and how does it work?
     • Why am I writing a new PyMongo wrapper to work with Tornado?
     • How does my wrapper work?
  3. CPU-bound web service
     [Diagram: Client and Server connected by a socket]
     • No need for async
     • Just spawn one process per core
  4. Normal web service
     [Diagram: Client, Server, and Backend (DB, web service, SAN, …), each pair connected by a socket]
     • Assume backend is unbounded
     • Service is bound by:
       • Context-switching overhead
       • Memory!
  5. What’s async for?
     • Minimize resources per connection
     • I.e., wait for backend as cheaply as possible
  6. CPU- vs. memory-bound
     [Diagram: spectrum from CPU-bound to memory-bound, with crypto at the CPU-bound end, chat at the memory-bound end, and “most web services?” in between]
  7. HTTP long-polling (“COMET”)
     • E.g., chat server
     • Async’s killer app
     • Short-polling is CPU-bound: tradeoff between latency and load
     • Long-polling is memory-bound
     • “C10K problem”: kegel.com/c10k.html
     • Tornado was invented for this
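The short-polling tradeoff between latency and load can be made concrete with a little arithmetic. This is only a back-of-the-envelope sketch, not something from the talk: the function name `short_polling_cost` and the client count are hypothetical, and it assumes messages arrive at a uniformly random moment within a polling interval, so a client waits half an interval on average.

```python
def short_polling_cost(clients, poll_interval_s):
    """Return (requests per second, average added latency in seconds).

    Assumes every client polls once per interval and that a new message
    arrives at a uniformly random moment within the interval.
    """
    requests_per_second = clients / poll_interval_s
    average_latency_s = poll_interval_s / 2.0
    return requests_per_second, average_latency_s


# Shrinking the interval lowers latency but raises load in proportion.
for interval in (10.0, 1.0, 0.1):
    rps, latency = short_polling_cost(clients=10000, poll_interval_s=interval)
    print("interval=%ss  load=%.0f req/s  avg latency=%.2fs"
          % (interval, rps, latency))
```

Long-polling sidesteps this tradeoff by holding each request open until there is something to send, which is why the cost moves from CPU to memory per open connection.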
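  8. Why is async hard to code?
     [Sequence diagram over time, columns Client, Server, Backend: the client sends a request to the server, the server sends a request to the backend and must store state while it waits, the backend responds, and the server responds to the client]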
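The "store state" step is the hard part: once the server hands the request to the backend without blocking, the function that started the work has already returned, so anything needed to build the response has to live somewhere other than its local variables. A minimal sketch of that situation follows, using only the standard library. `FakeBackend` and `Handler` are hypothetical stand-ins invented for illustration, not part of any framework mentioned in the talk.

```python
class FakeBackend(object):
    """Hypothetical backend that queues callbacks instead of blocking."""

    def __init__(self):
        self.pending = []

    def query(self, key, callback):
        # Returns immediately; the answer is delivered later.
        self.pending.append((key, callback))

    def deliver_all(self):
        # Stand-in for an event loop noticing that responses have arrived.
        pending, self.pending = self.pending, []
        for key, callback in pending:
            callback("value-for-" + key)


class Handler(object):
    def __init__(self, backend, client_id):
        self.backend = backend
        self.client_id = client_id  # state that must survive the wait
        self.response = None

    def on_request(self, key):
        self.backend.query(key, self.on_backend_response)
        # This function returns here, so its local variables are gone
        # by the time the backend answers.

    def on_backend_response(self, value):
        # Everything needed to finish the response comes from self.
        self.response = "%s: %s" % (self.client_id, value)


backend = FakeBackend()
handlers = [Handler(backend, "client-%d" % i) for i in range(3)]
for handler in handlers:
    handler.on_request("k")   # three requests now wait on the backend
backend.deliver_all()         # responses arrive; callbacks finish the work
print([handler.response for handler in handlers])
```

With threads or greenlets the same state would simply sit in local variables on a stack while the call blocks, which is the contrast the following slides draw.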
  9. Ways to store state
     [Chart, marked “this slide is in beta”: memory per connection plotted against coding difficulty for multithreading, greenlets / Gevent, and Tornado / Node.js]
  10. What’s a greenlet?
      • A.K.A. “green threads”
      • A feature of Stackless Python, packaged as a module for standard Python
      • Greenlet stacks are stored on the heap, copied to / from the OS stack on resume / pause
      • Cooperative
      • Memory-efficient
  11. Threads: State stored on OS stacks

          # pseudo-Python
          sock = listen()
          request = parse_http(sock.recv())
          mongo_data = db.collection.find()
          response = format_response(mongo_data)
          sock.sendall(response)
  12. Gevent: State stored on greenlet stacks

          # pseudo-Python
          from gevent import monkey; monkey.patch_all()
          sock = listen()
          request = parse_http(sock.recv())
          mongo_data = db.collection.find()
          response = format_response(mongo_data)
          sock.sendall(response)
  13. Tornado: State stored in RequestHandler

          import tornado.web
          from tornado.httpclient import AsyncHTTPClient

          class MainHandler(tornado.web.RequestHandler):
              @tornado.web.asynchronous
              def get(self):
                  # Returns immediately; on_response runs when the fetch completes
                  AsyncHTTPClient().fetch(
                      "http://example.com",
                      callback=self.on_response)

              def on_response(self, response):
                  formatted = format_response(response)
                  self.write(formatted)
                  self.finish()
  14. Tornado IOStream

          class IOStream(object):
              def read_bytes(self, num_bytes, callback):
                  self.num_bytes = num_bytes
                  self.read_callback = callback
                  io_loop.add_handler(
                      self.socket.fileno(),
                      self.handle_events,
                      events=READ)

              def handle_events(self, fd, events):
                  data = self.socket.recv(self.num_bytes)
                  self.read_callback(data)
  15. Tornado IOLoop

          class IOLoop(object):
              def add_handler(self, fd, handler, events):
                  self._handlers[fd] = handler
                  # _impl is epoll or kqueue or ...
                  self._impl.register(fd, events)

              def start(self):
                  while True:
                      event_pairs = self._impl.poll()
                      for fd, events in event_pairs:
                          self._handlers[fd](fd, events)
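The IOStream and IOLoop slides are simplified sketches of Tornado's internals, so they cannot be run as shown. The same register-then-dispatch pattern can be tried out with only the standard library. The version below is an illustration, not Tornado's code: `MiniLoop`, `run_once`, `remove_handler`, and the free function `read_bytes` are hypothetical names, and Python's `selectors` module stands in for the epoll / kqueue `_impl`.

```python
import selectors
import socket


class MiniLoop(object):
    """Toy event loop: maps file descriptors to handler functions."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()  # epoll, kqueue, ...
        self._handlers = {}

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._selector.register(fd, events)

    def remove_handler(self, fd):
        del self._handlers[fd]
        self._selector.unregister(fd)

    def run_once(self, timeout=1.0):
        # One pass of the loop body: poll, then dispatch each ready fd.
        for key, events in self._selector.select(timeout):
            self._handlers[key.fd](key.fd, events)


def read_bytes(loop, sock, num_bytes, callback):
    """Call callback(data) once the socket becomes readable."""

    def handle_events(fd, events):
        data = sock.recv(num_bytes)
        loop.remove_handler(fd)
        callback(data)

    loop.add_handler(sock.fileno(), handle_events, selectors.EVENT_READ)


received = []
loop = MiniLoop()
left, right = socket.socketpair()
right.setblocking(False)

read_bytes(loop, right, 5, received.append)  # returns at once, nothing read yet
left.sendall(b"hello")                       # now the socket becomes readable
loop.run_once()                              # loop dispatches to handle_events

left.close()
right.close()
print(received)
```

Note that `read_bytes` returns before any data exists. The pending read is remembered only as a handler in the loop's table plus the closure holding `callback`, which is the "state stored outside the stack" idea from the earlier slides.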
  16. Python, MongoDB, & concurrency
      • Threads work great with pymongo
      • Gevent works great with pymongo
        – monkey.patch_socket(); monkey.patch_thread()
      • Tornado works so-so
        – asyncmongo
          • No replica sets, only first batch, no SON manipulators, no document classes, …
        – pymongo
          • OK if all your queries are fast
          • Use extra Tornado processes
  17. Introducing: “Motor”
      • Mongo + Tornado
      • Experimental
      • Might be official in a few months
      • Uses Tornado IOLoop and IOStream
      • Presents standard Tornado callback API
      • Stores state internally with greenlets
      • github.com/ajdavis/mongo-python-driver/tree/tornado_async
  18. Motor

          class MainHandler(tornado.web.RequestHandler):
              def initialize(self):
                  self.c = MotorConnection()

              @tornado.web.asynchronous
              def post(self):
                  # No-op if already open
                  self.c.open(callback=self.connected)

              def connected(self, c, error):
                  self.c.collection.insert(
                      {'x': 1},
                      callback=self.inserted)

              def inserted(self, result, error):
                  self.write('OK')
                  self.finish()
  19. Motor internals
      [Sequence diagram with axes “stack depth” and “time”; columns Client, IOLoop, RequestHandler, greenlet, pymongo. Labels in order: request, start, switch(), IOStream.sendall(callback), return, callback(), switch(), parse Mongo response, schedule callback, callback(), HTTP response]
  20. Motor internals: wrapper

          class MotorCollection(object):
              def insert(self, *args, **kwargs):
                  callback = kwargs['callback']                      # (1)
                  del kwargs['callback']
                  kwargs['safe'] = True

                  def call_insert():
                      # Runs on child greenlet
                      result, error = None, None
                      try:
                          sync_insert = self.sync_collection.insert  # (3)
                          result = sync_insert(*args, **kwargs)
                      except Exception as e:
                          error = e

                      # Schedule the callback to be run on the main greenlet
                      tornado.ioloop.IOLoop.instance().add_callback(
                          lambda: callback(result, error)            # (8)
                      )

                  # Start child greenlet                             # (2)
                  greenlet.greenlet(call_insert).switch()            # (6)
                  return
  21. Motor internals: fake socket

          class MotorSocket(object):
              def __init__(self, socket):
                  # Makes socket non-blocking
                  self.stream = tornado.iostream.IOStream(socket)

              def sendall(self, data):
                  child_gr = greenlet.getcurrent()

                  # This is run by IOLoop on the main greenlet
                  # when data has been sent;
                  # switch back to child to continue processing
                  def sendall_callback():
                      child_gr.switch()                              # (7)

                  self.stream.write(data, callback=sendall_callback)  # (4)

                  # Resume main greenlet
                  child_gr.parent.switch()                           # (5)
  22. Motor
      • Shows a general method for asynchronizing synchronous network APIs in Python
      • Who wants to try it with MySQL? Thrift?
      • (Bonus round: resynchronizing Motor for testing)
  23. Questions?
      A. Jesse Jiryu Davis
      jesse@10gen.com
      emptysquare.net
      (10gen is hiring, of course: 10gen.com/careers)