PyCon US 2012 - State of WSGI 2


  • Back in time, web servers capable of running Python web applications used different interfaces, so a web application had to support more than one of them if it wanted to be portable. The original WSGI PEP was created in late 2003 with the goal of standardising on one interface. Differing approaches, though, meant it ended up being a series of compromises.
  • Although not ideal, the WSGI specification achieved its goal. Then Python 3 came along. Python 3 was a problem because of the change to make the default string type Unicode. After much teeth pulling and unwelcome distraction, a revised PEP was produced to affirm how WSGI should work with Python 3.
  • What were the problems with WSGI? Apart from various ambiguities and corner cases that were never really addressed, one of the bigger issues was that writing correct WSGI middleware was hard. This is in part what has driven the idea that we need a WSGI 2.0.
  • The main suggestion that comes up for making WSGI simpler is to remove the duplication of having a write() callback as well as being able to return an iterable. Do away with write(), and the next step is to remove start_response(). An application then simply returns a tuple consisting of the status line, the headers and an iterable for the content.
  • Conceptually this is not too hard, but every time there was a discussion about improving WSGI, people would step up with their own wish list for some specific requirement they had. The loudest of these were those pushing for full support for async systems. The reality is that a simple interface which supports both sync and async is hard to achieve.
  • The other idea that kept popping up was standardised high-level request/response objects. The difficulty here is that you are introducing something that would likely need to keep evolving. That is a bad thing to have in a specification, because you are really forcing people onto one implementation rather than a specification.
  • Personally, I thought there were a range of other, more fundamental aspects of WSGI where we could do better. These include resource management, ensuring output filters don't consume or generate too much data, and the inability to have certain types of request content or mutating input filters.
  • So there are simple changes that could be made, and there are also deeper changes we could make to try to address some of the shortcomings and make WSGI somewhat better. The big question now, though, is whether anyone cares and whether it is all just too late.
  • I had my own ideas for changes that could be made to WSGI. Not having a lot of time, and being burned out by the whole WSGI on Python 3 discussion, I have refrained from pushing them. Some of the ideas I played around with are listed here, but I have had others as well.
  • It is perhaps not needed often and so is of somewhat academic interest, but specifying code which runs on completion of a request is just way too hard. It is not sufficient to do it when the application returns. It has to be done when the iterable has been consumed.
  • Doing this requires wrapping the existing iterable and implementing a close() method in that wrapper. One immediate problem is that wsgi.file_wrapper, as an optimisation for serving up files, no longer works, because the server doesn't see the original iterable type.
  • If we actually want this to work as generic middleware rather than forcing a change to the application, then we also need to provide a wrapper for the application.
  • Back when WSGI was created, nice ways of executing code at the completion of a task didn't really exist, which is why a close() method was used. Since then, though, we have gained context managers. If we have a response object where the content is an attribute, then we can make it a context manager as well, allowing both entry and exit actions to be specified.
  • It turns out that one can do the same thing for the request as well. If we are going to make such a change, though, let's be more radical and make access to the request content an iterable, just like with a response. If we make the content length advisory only, then we can support chunked requests and mutating input filters.
  • Using objects for both request and response, a hello world application becomes this. The arguments are the request object, but also a response object. The request object has the environ dictionary and the content as attributes. The response object is used to create the response, but can also be used for more than that.
  • As mentioned before, the request content length is advisory only and isn't needed to read the request content. You would only check it when generating a 413 response or to decide how you might handle the content. Using an iterable for request content does mean we need a new cgi.FieldStorage, or a file-object-like wrapper.
  • Using iterables for both request and response content means we now have symmetry. You can have one type of filter that can be applied to both. One could even technically send request content directly back out to a client via a filter.
  • Middleware is still possible. The status line, headers and content are just attributes of the response. Because the response is now an object, we can simplify things by allowing one to call it to substitute just parts of it and create a new response object. Dealing with content length, though, is still very painful.
  • We can make this simpler by separating content length out and treating it as a special attribute. Because it is an attribute, the iterator for the content can then enforce it, only returning the specified amount of data when the content is consumed.
  • For our middleware, if we don't know what the resulting content length will be after the filter is applied, we can just set it to None; otherwise we set it to the new known length. Note that the intent is not to create a full-on request/response object. We are only embodying the absolute minimum needed to enforce correct protocol and usage.
  • As well as using the request/response objects to enforce correct usage, the way they are passed around can be used to implement new behaviour. In mod_python, a handler could decline a request and a subsequent one would be used instead. Paste had this as its Cascade middleware, but we can make it cleaner by using the response as a marker.
  • So could this be an improvement to WSGI? Does anyone care anymore? It isn't possible to do it on top of WSGI, as the intent is to open up the server interface to features that can't be done with WSGI now. If you are interested, then come talk to me. History means I am quite wary of opening this one up on the WEB-SIG list.
  • If you do come and talk to me, let it be known that I don't care what colour it is. Arguments over what parameters are called or how they are composed together are largely unimportant. I want us to solve the underlying issues I can see rather than worry about whether it looks pretty, and there are more issues than just those I have covered here.
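The "easy targets" simplification is an application that takes only `environ` and returns a `(status, headers, iterable)` tuple. It is small enough that it can be bridged onto an existing WSGI 1.0 server.

The sketch below assumes that tuple convention. `tuple_app_adapter` and `call_wsgi` are hypothetical helpers. The latter is a minimal stand-in for what a server does with the iterable an application returns.

```python
def tuple_app_adapter(app):
    """Hypothetical helper: expose a WSGI 2.0-style app(environ) that
    returns (status, headers, body) as a WSGI 1.0 app(environ, start_response)."""
    def wsgi_app(environ, start_response):
        status, headers, body = app(environ)
        start_response(status, headers)
        return body
    return wsgi_app


def call_wsgi(wsgi_app, environ):
    """Hypothetical test driver: call a WSGI 1.0 app roughly as a server
    would (minus the network), consume the body, then close it."""
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append((status, headers))
        return lambda data: None  # the legacy write() callable, unused here

    result = wsgi_app(environ, start_response)
    try:
        body = b''.join(result)
    finally:
        if hasattr(result, 'close'):
            result.close()
    status, headers = captured[-1]
    return status, headers, body


def app(environ):
    # The "easy targets" hello world, using bytes for the body as PEP 3333 requires.
    status = '200 OK'
    content = b'Hello world'
    headers = [('Content-type', 'text/plain'),
               ('Content-length', str(len(content)))]
    return (status, headers, [content])
```

With the standard library, `wsgiref.simple_server.make_server('', 8000, tuple_app_adapter(app)).serve_forever()` would serve the adapted application. As the talk notes, the deeper request/response changes cannot be layered on WSGI 1.0 this way.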
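The notes point out that an iterable request body would need either a new cgi.FieldStorage or a file-object-like wrapper.

One way to build such a wrapper in modern Python is to adapt the iterable to `io.RawIOBase` and let `io.BufferedReader` supply `read()` and `readline()`. This is a sketch under that assumption, and `IterableReader` and `as_file` are hypothetical names. End of input is signalled by the iterable running out rather than by a Content-Length, which is what makes chunked request content workable.

```python
import io


class IterableReader(io.RawIOBase):
    """Hypothetical adapter: present an iterable of bytes chunks (such as
    chunked request content) as a read-only raw binary stream."""

    def __init__(self, iterable):
        super().__init__()
        self._chunks = iter(iterable)
        self._pending = b''

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull chunks until some data is available; empty chunks are skipped.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # end of input, detected without any length header
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


def as_file(iterable):
    """Hypothetical helper: wrap request content for code expecting a file."""
    return io.BufferedReader(IterableReader(iterable))
```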

    1. State of WSGI 2: Graham Dumpleton, US PyCon, March 2012, @GrahamDumpleton
    2. 1.0 (PEP 333): PEP created December 2003. A series of compromises. Iterable vs write(). Limited async capability.
    3. 1.0.1 (PEP 3333): PEP created September 2010. Clarifications to PEP 333 to handle the str/bytes distinction in Python 3.X. Apache/mod_wsgi had already been providing support for Python 3.X since April 2009. Was a quick and painless process. NOT!
    4. Shortcomings: Hard to write middleware correctly.
    5. Easy targets: Drop start_response() and write().

           def app(environ):
               status = '200 OK'
               content = 'Hello world'
               headers = [
                   ('Content-type', 'text/plain'),
                   ('Content-length', str(len(content)))]
               return (status, headers, [content])

    6. Unwanted advances: Support for async systems. The killer for async is that the way you interact with wsgi.input means it is blocking. It is hard to change this to work for async systems without making it too complicated for sync systems.
    7. Wishful thinking: Standardised high-level request/response objects, e.g. WebOb, Werkzeug.
    8. Interesting problems: Cleaning up at end of request. Middleware and response content length. Unknown request content length. No compressed request content. No chunked requests. No full duplex HTTP.
    9. Has the boat sailed? Too much legacy code relying on WSGI 1.0. Potential missed opportunity for significant change when updating the specification for Python 3.X. The possibilities of what could have been done are now probably only of academic interest.
    10. Ideas I toyed with: Use context managers to improve resource management. Implementing wsgi.input as an iterable. Support for chunked request content. Iterating through a list of applications until one agrees to handle it.
    11. Resource management: Need to override close() of the iterable. Way too complicated for mere mortals. Makes wsgi.file_wrapper impotent.
    12. How complicated:

            class Generator:
                """Wrap an iterable so a callback runs when it is closed."""

                def __init__(self, iterable, callback, environ):
                    self.__iterable = iterable
                    self.__callback = callback
                    self.__environ = environ

                def __iter__(self):
                    for item in self.__iterable:
                        yield item

                def close(self):
                    # Close the original iterable (if it can be closed),
                    # then always run the completion callback.
                    try:
                        if hasattr(self.__iterable, 'close'):
                            self.__iterable.close()
                    finally:
                        self.__callback(self.__environ)

    13. Not done yet:

            class ExecuteOnCompletion:
                def __init__(self, application, callback):
                    self.__application = application
                    self.__callback = callback

                def __call__(self, environ, start_response):
                    try:
                        result = self.__application(environ, start_response)
                    except:
                        # No iterable was returned, so clean up immediately.
                        self.__callback(environ)
                        raise
                    # Defer the callback until the server closes the iterable.
                    return Generator(result, self.__callback, environ)

    14. Response as object: Make response a context manager. Response content is an attribute. Consumer of response executes:

            with response:
                for data in response.content:
                    process(data)

        Can override __enter__()/__exit__().
    15. Request as object: Also make request a context manager. Request content is an attribute, but also make it an iterable. Reading all request content becomes:

            with request:
                content = ''.join(request.content)

        Content length becomes advisory only, so can implement chunked content and mutating input filters.
    16. Hello world:

            def application(request, response):
                status = '200 OK'
                headers = [('Content-type', 'text/plain'),]
                output = 'Hello world!'
                return response(status, headers, [output])

    17. Consuming input:

            def application(request, response):
                status = '200 OK'
                headers = [('Content-type', 'text/plain'),]
                with request:
                    output = ''.join(request.content)
                return response(status, headers, [output])

    18. Content filters:

            class Filter(object):
                def __init__(self, source, filter):
                    self.source = source
                    self.filter = filter

                def __iter__(self):
                    with self.source:
                        for data in self.source.content:
                            yield self.filter(data)

            def application(request, response):
                status = '200 OK'
                headers = [('Content-type', 'text/plain'),]
                return response(status, headers, Filter(request, str))

    19. Middleware:

            def middleware(application):
                def _middleware(request, response):
                    response = application(request, response)
                    # Remove every Content-Length header: doubling the
                    # content changes its length.
                    headers = [h for h in response.headers
                               if h[0].lower() != 'content-length']
                    return response(headers=headers,
                                    content=Filter(response, lambda x: 2*x))
                return _middleware

            application = middleware(application)

    20. Content length is painful:

            def application(request, response):
                status = '200 OK'
                headers = [('Content-type', 'text/plain'),]
                output = 'Hello world!'
                return response(status, headers, [output], len(output))

    21. Hiding the pain:

            def middleware(application):
                def _middleware(request, response):
                    response = application(request, response)
                    return response(
                        content=Filter(response, lambda x: 2*x),
                        length=None)
                return _middleware

            application = middleware(application)

    22. I decline your request!

            def declined(request, response):
                return response

            def middleware(*applications):
                def _middleware(request, response):
                    for application in applications:
                        interim = application(request, response)
                        if interim is not response:
                            return interim
                    else:
                        # for/else: only reached when every application declined.
                        return response
                return _middleware

            application = middleware(declined, application)

    23. Is it a lost cause? Is this all too late? Is anyone interested in WSGI 2.0 anymore? Although it is an interesting API, it cannot be done on top of WSGI, as it changes how one interacts with the web server. So, if we try to do this, it does need to supplant WSGI 1.0.
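The "response as object" idea on slide 14 can be sketched with a hypothetical `Response` class. Its only jobs are to hold the status, headers and content and to act as a context manager.

The completion callback from slides 11 to 13 then becomes an `__exit__()` override in a hypothetical `CompletionResponse` subclass, instead of a wrapper around the iterable. As a result, `content` keeps its original type, which is exactly what wrapping broke for `wsgi.file_wrapper`. `serve()` is a hypothetical stand-in for the consuming server.

```python
class Response:
    """Hypothetical minimal response: content is an attribute and the
    object is a context manager, as proposed on slide 14."""

    def __init__(self, status, headers, content):
        self.status = status
        self.headers = headers
        self.content = content

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the content if it supports close(), as a WSGI 1.0
        # server does with the iterable an application returns.
        close = getattr(self.content, 'close', None)
        if close is not None:
            close()
        return False  # never swallow exceptions


class CompletionResponse(Response):
    """Hypothetical subclass: run a callback once the consumer is done."""

    def __init__(self, status, headers, content, callback):
        super().__init__(status, headers, content)
        self.callback = callback

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            return super().__exit__(exc_type, exc_value, traceback)
        finally:
            self.callback()  # runs even if sending failed part way


def serve(response):
    """Hypothetical server loop: consume the content inside a with block."""
    sent = []
    with response:
        for data in response.content:
            sent.append(data)  # stand-in for writing to the client socket
    return sent
```

Unlike the close() approach, no application wrapper like `ExecuteOnCompletion` is needed here. The response object itself carries the exit action.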
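The request-side symmetry (slides 15, 17 and 18) can be sketched with a hypothetical `Request` class that holds `environ` and an iterable `content`, plus a stand-in `make_response()` that just returns a tuple.

`Filter` follows slide 18, using `transform` in place of the `filter` parameter name to avoid shadowing the built-in. The hypothetical `echo_upper` application never consults CONTENT_LENGTH, which is what lets chunked input or a mutating input filter change the body size. Bytes are used throughout, as in Python 3.

```python
class Request:
    """Hypothetical minimal request: environ plus an iterable of bytes
    chunks, usable as a context manager (slide 15)."""

    def __init__(self, environ, content):
        self.environ = environ
        self.content = content

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        close = getattr(self.content, 'close', None)
        if close is not None:
            close()  # e.g. finish a generator producing chunked input
        return False


class Filter:
    """Slide 18's filter: works on any source with a `content` attribute
    that is also a context manager, so on requests and responses alike."""

    def __init__(self, source, transform):
        self.source = source
        self.transform = transform

    def __iter__(self):
        with self.source:
            for data in self.source.content:
                yield self.transform(data)


def make_response(status, headers, content):
    # Hypothetical stand-in for the response object: bundles the parts.
    return (status, headers, content)


def echo_upper(request, response):
    """Hypothetical app: stream the request body back out, upper-cased."""
    headers = [('Content-type', 'text/plain')]
    return response('200 OK', headers, Filter(request, bytes.upper))


def read_all(request, response):
    """Slide 17: read the whole body by joining the iterable."""
    with request:
        output = b''.join(request.content)
    return response('200 OK', [('Content-type', 'text/plain')], [output])
```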
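Slides 19 to 21 rely on a response object with two abilities. It can be called to substitute some of its parts, and it enforces a separate `length` attribute while its content is consumed.

The following is a minimal sketch under those assumptions. The `Response` class, the `_UNCHANGED` sentinel and `doubling_middleware` are hypothetical. The filter is written as a generator expression over the response itself, because iterating `source.content` directly, as the slides' `Filter` does, would bypass length enforcement in this sketch.

```python
_UNCHANGED = object()  # hypothetical sentinel: "not passed" vs length=None


class Response:
    """Hypothetical minimal response for slides 19 to 21: status, headers,
    content and a separate length that iteration enforces."""

    def __init__(self, status=None, headers=None, content=(), length=None):
        self.status = status
        self.headers = list(headers or [])
        self.content = content
        self.length = length

    def __call__(self, status=None, headers=None, content=None,
                 length=_UNCHANGED):
        """Return a new response, replacing only the parts given."""
        return Response(
            self.status if status is None else status,
            self.headers if headers is None else headers,
            self.content if content is None else content,
            self.length if length is _UNCHANGED else length,
        )

    def __iter__(self):
        """Yield the content, never more than `length` bytes in total."""
        if self.length is None:
            yield from self.content
            return
        remaining = self.length
        for data in self.content:
            if remaining <= 0:
                break
            chunk = data[:remaining]
            remaining -= len(chunk)
            yield chunk


def application(request, response):
    # Slide 20: the length is passed separately, not as a header.
    output = b'Hello world!'
    return response('200 OK', [('Content-type', 'text/plain')],
                    [output], len(output))


def doubling_middleware(application):
    # Slide 21: transform the content and declare the new length unknown.
    def _middleware(request, response):
        response = application(request, response)
        return response(content=(2 * data for data in response),
                        length=None)
    return _middleware
```

The sentinel is needed because `length=None` already means "unknown", so None cannot also mean "keep the existing value". The sketch silently truncates over-long content; raising an error instead would be an equally plausible reading of "enforce".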
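Slide 22's "decline" idea relies only on object identity: an application that returns the very response object it was handed has declined the request.

A runnable sketch follows, under these assumptions:
- `Response` is hypothetical; calling it builds a new object.
- Requests are represented by a `SimpleNamespace` with an `environ` attribute.
- `static_files`, `hello`, `cascade` and `make_request` are illustrative names.

The loop is slide 22's, with the for/else written as a plain return after the loop.

```python
from types import SimpleNamespace


class Response:
    """Hypothetical minimal response: calling it builds a *new* response,
    so the object handed to an application can double as a marker."""

    def __init__(self, status=None, headers=None, content=None):
        self.status = status
        self.headers = headers
        self.content = content

    def __call__(self, status, headers, content):
        return Response(status, headers, content)


def declined(request, response):
    return response  # "not mine": hand the marker back untouched


def static_files(request, response):
    # Hypothetical handler that only claims URLs under /static/.
    if not request.environ.get('PATH_INFO', '').startswith('/static/'):
        return response
    return response('200 OK', [('Content-type', 'text/css')], [b'body {}'])


def hello(request, response):
    return response('200 OK', [('Content-type', 'text/plain')],
                    [b'Hello world!'])


def cascade(*applications):
    """Try each application until one returns something other than the
    response it was given (the middleware on slide 22)."""
    def _cascade(request, response):
        for application in applications:
            interim = application(request, response)
            if interim is not response:
                return interim
        return response  # everyone declined; the caller gets the marker back
    return _cascade


def make_request(path):
    # Hypothetical request: just an environ and empty content.
    return SimpleNamespace(environ={'PATH_INFO': path}, content=[])
```

What a server should do when it gets the original marker back (for example, respond with a 404) is left open here; the slides don't say.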