Python, async web frameworks, and MongoDB

A talk covering the state of the art for writing asynchronous web applications using Python and MongoDB.

Transcript

  • 1. Python, MongoDB, and asynchronous web frameworks. A. Jesse Jiryu Davis, jesse@10gen.com, emptysquare.net
  • 2. Agenda
      • Talk about web services in a really dumb (“abstract”?) way
      • Explain when we need async web servers
      • Why is async hard?
      • What is Tornado and how does it work?
      • Why am I writing a new PyMongo wrapper to work with Tornado?
      • How does my wrapper work?
  • 3. CPU-bound web service
      [Diagram: Client, socket, Server]
      • No need for async
      • Just spawn one process per core
  • 4. Normal web service
      [Diagram: Client, socket, Server, socket, Backend (DB, web service, SAN, …)]
      • Assume backend is unbounded
      • Service is bound by:
          • Context-switching overhead
          • Memory!
  • 5. What’s async for?
      • Minimize resources per connection
      • I.e., wait for backend as cheaply as possible
  • 6. CPU- vs. Memory-bound
      [Diagram: a spectrum from CPU-bound to memory-bound, labeled in order: Crypto, Most web services?, Chat]
  • 7. HTTP long-polling (“COMET”)
      • E.g., chat server
      • Async’s killer app
      • Short-polling is CPU-bound: tradeoff between latency and load
      • Long-polling is memory-bound
      • “C10K problem”: kegel.com/c10k.html
      • Tornado was invented for this
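  • 8. Why is async hard to code?
      [Sequence diagram with a time axis. Participants: Client, Server, Backend. Labels in order: request, request, store state, response, response]
  • 9. Ways to store state (this slide is in beta)
      [Chart with axes "Memory per connection" and "Coding difficulty". Points plotted: Multithreading; Greenlets / Gevent; Tornado, Node.js]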
  • 10. What’s a greenlet?
      • A.K.A. “green threads”
      • A feature of Stackless Python, packaged as a module for standard Python
      • Greenlet stacks are stored on the heap, copied to / from the OS stack on resume / pause
      • Cooperative
      • Memory-efficient
  • 11. Threads: State stored on OS stacks
          # pseudo-Python
          sock = listen()
          request = parse_http(sock.recv())
          mongo_data = db.collection.find()
          response = format_response(mongo_data)
          sock.sendall(response)
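As a runnable counterpart to the pseudo-Python on slide 11, here is a minimal sketch using only the standard library. The names `handle_connection` and `serve_once` are hypothetical, and a `socket.socketpair()` stands in for a real listening socket. The point it illustrates is that the request and response live in local variables on the worker thread's stack while `recv()` blocks.

```python
import socket
import threading


def handle_connection(sock):
    # Per-request state (request, response) lives in local variables,
    # that is, on this thread's stack, while recv() blocks.
    request = sock.recv(1024)
    response = b"echo: " + request
    sock.sendall(response)
    sock.close()


def serve_once(payload):
    # A connected socket pair stands in for a client and an accepted connection.
    client, server = socket.socketpair()
    worker = threading.Thread(target=handle_connection, args=(server,))
    worker.start()
    client.sendall(payload)
    reply = client.recv(1024)
    worker.join()
    client.close()
    return reply
```

Each open connection costs a whole thread in this style, which is the memory overhead the earlier slides describe.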
  • 12. Gevent: State stored on greenlet stacks
          # pseudo-Python
          from gevent import monkey
          monkey.patch_all()
          sock = listen()
          request = parse_http(sock.recv())
          mongo_data = db.collection.find()
          response = format_response(mongo_data)
          sock.sendall(response)
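  • 13. Tornado: State stored in RequestHandler
          import tornado.web
          from tornado.httpclient import AsyncHTTPClient

          class MainHandler(tornado.web.RequestHandler):
              @tornado.web.asynchronous
              def get(self):
                  # Start the fetch and return; the request stays open
                  AsyncHTTPClient().fetch(
                      "http://example.com", callback=self.on_response)

              def on_response(self, response):
                  # format_response is the same placeholder as on slides 11 and 12
                  formatted = format_response(response)
                  self.write(formatted)
                  self.finish()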
  • 14. Tornado IOStream
          class IOStream(object):
              def read_bytes(self, num_bytes, callback):
                  # Stored as _read_bytes so the attribute does not shadow this method
                  self._read_bytes = num_bytes
                  self.read_callback = callback
                  io_loop.add_handler(
                      self.socket.fileno(), self.handle_events, events=READ)

              def handle_events(self, fd, events):
                  data = self.socket.recv(self._read_bytes)
                  self.read_callback(data)
  • 15. Tornado IOLoop
          class IOLoop(object):
              def add_handler(self, fd, handler, events):
                  self._handlers[fd] = handler
                  # _impl is epoll or kqueue or ...
                  self._impl.register(fd, events)

              def start(self):
                  while True:
                      event_pairs = self._impl.poll()
                      for fd, events in event_pairs:
                          self._handlers[fd](fd, events)
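The loop on slide 15 can be approximated with the standard library's `selectors` module. This is a toy sketch, not Tornado's implementation: `MiniLoop`, `remove_handler`, `stop`, and `demo` are hypothetical names, only read events are handled, and a `socket.socketpair()` supplies the two ends of a connection.

```python
import selectors
import socket


class MiniLoop:
    """Toy event loop that maps file descriptors to handlers."""

    def __init__(self):
        # DefaultSelector picks epoll, kqueue, or select for the platform.
        self._selector = selectors.DefaultSelector()
        self._handlers = {}
        self._running = False

    def add_handler(self, sock, handler):
        self._handlers[sock.fileno()] = handler
        self._selector.register(sock, selectors.EVENT_READ)

    def remove_handler(self, sock):
        del self._handlers[sock.fileno()]
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while self._running and self._handlers:
            # Block until at least one registered socket is readable.
            for key, events in self._selector.select(timeout=1):
                self._handlers[key.fd](key.fd, events)


def demo():
    loop = MiniLoop()
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    received = []

    def on_readable(fd, events):
        # Called by the loop once the socket has data waiting.
        received.append(reader.recv(1024))
        loop.remove_handler(reader)
        loop.stop()

    loop.add_handler(reader, on_readable)
    writer.sendall(b"hello")
    loop.start()
    reader.close()
    writer.close()
    return received
```

While no socket is ready, the only per-connection cost is an entry in the handler table and the callback's closure, which is why this style suits memory-bound services.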
  • 16. Python, MongoDB, & concurrency
      • Threads work great with pymongo
      • Gevent works great with pymongo
          – monkey.patch_socket(); monkey.patch_thread()
      • Tornado works so-so
          – asyncmongo
              • No replica sets, only first batch, no SON manipulators, no document classes, …
          – pymongo
              • OK if all your queries are fast
              • Use extra Tornado processes
  • 17. Introducing: “Motor”
      • Mongo + Tornado
      • Experimental
      • Might be official in a few months
      • Uses Tornado IOLoop and IOStream
      • Presents standard Tornado callback API
      • Stores state internally with greenlets
      • github.com/ajdavis/mongo-python-driver/tree/tornado_async
  • 18. Motor
          class MainHandler(tornado.web.RequestHandler):
              def initialize(self):
                  # Tornado's per-request hook; RequestHandler.__init__ takes extra arguments
                  self.c = MotorConnection()

              @tornado.web.asynchronous
              def post(self):
                  # No-op if already open
                  self.c.open(callback=self.connected)

              def connected(self, c, error):
                  self.c.collection.insert(
                      {'x': 1}, callback=self.inserted)

              def inserted(self, result, error):
                  self.write('OK')
                  self.finish()
  • 19. Motor internals
      [Sequence diagram with axes "stack depth" and "time". Participants: Client, IOLoop, RequestHandler, greenlet, pymongo. Labels in order: request; start; switch(); IOStream.sendall(callback); return; callback(); switch(); parse Mongo response; schedule callback; callback(); HTTP response]
  • 20. Motor internals: wrapper
          # Numbers in parentheses are the slide's step markers
          class MotorCollection(object):
              def insert(self, *args, **kwargs):
                  callback = kwargs['callback']  # (1)
                  del kwargs['callback']
                  kwargs['safe'] = True

                  def call_insert():
                      # Runs on child greenlet
                      result, error = None, None
                      try:
                          sync_insert = self.sync_collection.insert  # (3)
                          result = sync_insert(*args, **kwargs)
                      except Exception as e:
                          error = e

                      # Schedule the callback to be run on the main greenlet
                      tornado.ioloop.IOLoop.instance().add_callback(
                          lambda: callback(result, error)  # (8)
                      )

                  # Start child greenlet
                  greenlet.greenlet(call_insert).switch()  # (2)
                  return  # (6)
  • 21. Motor internals: fake socket
          # Numbers in parentheses are the slide's step markers
          class MotorSocket(object):
              def __init__(self, socket):
                  # Makes socket non-blocking
                  self.stream = tornado.iostream.IOStream(socket)

              def sendall(self, data):
                  child_gr = greenlet.getcurrent()

                  # This is run by IOLoop on the main greenlet
                  # when data has been sent;
                  # switch back to child to continue processing
                  def sendall_callback():
                      child_gr.switch()  # (7)

                  self.stream.write(data, callback=sendall_callback)  # (4)

                  # Resume main greenlet
                  child_gr.parent.switch()  # (5)
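The order of control transfers marked 1 through 8 on slides 20 and 21 can be traced with a standard-library analogy. This sketch substitutes a generator for the child greenlet and a queue of callbacks for the IOLoop, so `FakeLoop`, `sync_insert`, `async_insert`, and `demo` are all hypothetical names and none of this is Motor's code. One important difference: a generator must contain an explicit `yield` where it pauses, whereas greenlets let unmodified PyMongo code pause inside `sendall`, which is what makes the approach on the slides work without rewriting the driver.

```python
from collections import deque


class FakeLoop:
    """Stand-in for an IOLoop: runs queued callbacks in order."""

    def __init__(self):
        self._callbacks = deque()

    def add_callback(self, callback):
        self._callbacks.append(callback)

    def run(self):
        while self._callbacks:
            self._callbacks.popleft()()


def sync_insert(document, log):
    # Plays the role of the synchronous driver call. The yield marks the
    # point where the real code would switch to the parent greenlet.
    log.append("send")
    yield
    log.append("parse response")
    return document


def async_insert(loop, document, callback, log):
    child = sync_insert(document, log)

    def resume():
        try:
            next(child)  # analogous to switching into the child
        except StopIteration as done:
            result = done.value
            loop.add_callback(lambda: callback(result, None))
        except Exception as exc:
            error = exc
            loop.add_callback(lambda: callback(None, error))
        else:
            # Child paused; pretend the socket is ready on the next loop turn.
            loop.add_callback(resume)

    resume()  # start the child; control comes back at its first pause
    log.append("returned to caller")


def demo():
    loop = FakeLoop()
    log = []

    def inserted(result, error):
        log.append(("callback", result, error))

    async_insert(loop, {"x": 1}, inserted, log)
    loop.run()
    return log
```

Calling `demo()` shows that the caller regains control before the response is parsed, and that the user's callback runs last, from the loop.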
  • 22. Motor
      • Shows a general method for asynchronizing synchronous network APIs in Python
      • Who wants to try it with MySQL? Thrift?
      • (Bonus round: resynchronizing Motor for testing)
  • 23. Questions? A. Jesse Jiryu Davis, jesse@10gen.com, emptysquare.net (10gen is hiring, of course: 10gen.com/careers)