Using Coroutines to Create Efficient, High-Concurrency Web Applications
Matt Spitz
meebo, inc.
  • Introduction. Matt Spitz, software engineer at meebo. Here today to talk about building web applications in Python and the pros and cons of the various ways we can serve them.
  • Users make requests to an application, which uses a shared storage backend.
  • Same thing, just lots and lots and lots of concurrent requests
  • With such a large-scale application, small optimizations can have a huge impact: save money on hardware (machines, RAM, CPU); faster response time and a better user experience; handle more concurrent requests; substantially decrease the impact on shared resources. One example of a high-concurrency web application is the ad server we run at meebo. Before I talk about the ad server, let me introduce the meebo bar.
  • The meebo bar is deployed to our partner sites. It offers a neat way to share content on the site and lets users chat with other members of the site.
  • Show off the chat in the corner, the sharing buttons, and the ad unit. Can’t give you numbers, but suffice it to say that any ad server receiving those calls can be considered a “high-concurrency web application”.
  • Selecting the ad a user is most likely to click on. Serving the most valuable ads (e.g. highest CPC). Respecting whatever targeting the advertisers have selected. Ensuring smooth, complete delivery for each ad campaign. The ad server is a pretty complicated beast, and going through it wouldn’t really help make my point for this talk, so I wrote a sample application that has a similar structure and similar resource-usage patterns.
  • Describe Jaccard similarity: size(intersection(x, y)) / size(union(x, y)). It’s super arbitrary, just there to represent some CPU processing in the application. SHOW OFF THE CODE (make sure to show off the user fortune caching).
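The Jaccard formula in the note above can be sketched in a few lines of Python. This is not the FortuneTeller source, which the notes only refer to. It assumes the username and each fortune are compared as sets of lowercase characters, and the names `jaccard_similarity` and `pick_fortune` are made up for illustration.

```python
def jaccard_similarity(x, y):
    """size(intersection(x, y)) / size(union(x, y)), computed on sets."""
    a, b = set(x), set(y)
    union = a | b
    if not union:
        # Both inputs empty: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


def pick_fortune(username, fortunes):
    """Hypothetical selection rule: the fortune most similar to the username.

    Assumption: strings are compared as sets of lowercase characters.
    """
    name = username.lower()
    return max(fortunes, key=lambda fortune: jaccard_similarity(name, fortune.lower()))


score = jaccard_similarity("abc", "bcd")  # shares {b, c} out of {a, b, c, d}
```

Scoring every fortune for each request is the CPU-bound part of the sample app, which is all it needs to be for the benchmarks.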
  • We’re gonna try four different serving implementations
  • How difficult is it to write code for these applications? To what extent do they let us use 3rd-party libraries? How efficient is the application in terms of memory? Can we take advantage of multi-core machines?
  • SHOW OFF THE CODE
  • Advantages: simple to write; requests don’t affect one another. Disadvantages: need to reload the whole working set (all fortunes) with each request; no database connection caching. It’s a start, but it doesn’t scale.
  • Before I show you a performance graph, I want to go over the benchmark setup. 25ms delay on the interface between guest and host to exaggerate the effects of I/O on response time.
  • 8 processes maximum. Requires loading all fortunes with each request.
  • Apache spins up a number of worker processes to handle requests. Workers handle a configurable number of requests before being replaced. Workers handle exactly one request at a time. Memory is cached in the worker, so we can re-use the set of fortunes between requests. The operating system handles scheduling. MAKE SURE TO SHOW OFF THE HANDLER.
  • Advantages: using almost the same simple, synchronous code as we had in the CGI; memory is cached across requests in the same worker. Disadvantages: no shared memory between workers; need to load the set of all fortunes in each worker; more workers require more RAM; each worker load requires a DB request, which hammers the database on Apache restart.
  • Using 8 worker processes
  • Twisted is an asynchronous framework for building network applications. The developer structures code as events and callbacks. Twisted orchestrates context switches among requests, typically on things that take a long time (I/O). Twisted server: a single event loop means a single process, which handles multiple requests simultaneously in the event loop. And since we’re all in one process, memory is shared among requests.
  • Some of this may be review, but it’s important that everyone understands this. Blocking: connect and recv wait until their actions complete before returning. Nonblocking: connect and recv initiate the action (if it hasn’t been initiated already) and immediately return the data or raise an exception. Requires a lot more plumbing than the example above.
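The blocking/nonblocking difference can be seen with only the standard library. The sketch below assumes Python 3 and uses a local `socket.socketpair()` in place of a real server. `try_recv` is a hypothetical helper name.

```python
import select
import socket


def try_recv(sock, nbytes=1024):
    """Nonblocking read: return the bytes, or None if nothing has arrived yet."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        # The "I'm not done yet" exception: the call returned immediately.
        return None


left, right = socket.socketpair()
right.setblocking(False)

before = try_recv(right)             # nothing has been sent, so no data yet
left.sendall(b"hello")
select.select([right], [], [], 1.0)  # the extra plumbing: wait until readable
after = try_recv(right)              # now the data is there

left.close()
right.close()
```

With a blocking socket the first `recv` would simply wait. The nonblocking version returns right away, and the caller has to arrange to come back later, which is the job an event loop takes over.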
  • …so let’s go back to this slide (the next one)
  • Twisted is a framework built around an event loop. It provides a nice interface for setting up your functions and callbacks (for success or error). It keeps track of multiple execution paths simultaneously, just as we saw in the previous example. The big problem with Twisted is that you can’t just plug in your synchronous app: you have to set up these events and callbacks for every piece of code that might block. MAKE SURE TO SHOW OFF THE CODE AND HOW MUCH OF A PAIN IT IS.
  • Advantages: memory is shared among requests (we only have to load the fortunes once to service many simultaneous requests); context switches happen in user space (fast). Disadvantages: need to rewrite code to be asynchronous. Guido sez: “I hate callback-based programming.” It’s hard to wrap your brain around. You’re stuck in the framework: everything has to be asynchronous, and you have to use Twisted’s standard libraries, which may not behave quite as you’d like. 3rd-party libraries must also be asynchronous. No I/O in C libraries (at least not out of the box). CPU-intense requests monopolize the processor: with mod_wsgi, the O/S handles scheduling, processes can be scheduled at any time, and CPU time is shared “fairly”; with Twisted, the CPU is scheduled explicitly, and CPU-bound blocks of code prevent other requests from running. Taking advantage of multiple cores isn’t trivial (load balancer? multiprocessing module?).
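To make "events and callbacks" concrete, here is a minimal callback-driven event loop. It is deliberately not Twisted's API. It assumes Python 3 and uses the standard-library `selectors` module as a stand-in, and the names `make_on_readable` and `run_until_idle` are invented for this sketch.

```python
import selectors
import socket

results = []


def make_on_readable(selector, label):
    """Build the callback (the saved context) for one pending socket read."""
    def on_readable(sock):
        data = sock.recv(1024)
        results.append("%s | %s" % (label, data.decode("ascii")))
        selector.unregister(sock)
        sock.close()
    return on_readable


def run_until_idle(selector):
    """The event loop: wait for ready sockets and fire their callbacks."""
    while len(selector.get_map()) > 0:
        for key, _events in selector.select(timeout=1.0):
            callback = key.data
            callback(key.fileobj)


selector = selectors.DefaultSelector()
writers = []
for label in (80, 8080):
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    # Register interest in "readable" together with the callback to run.
    selector.register(reader, selectors.EVENT_READ, make_on_readable(selector, label))
    writers.append(writer)

# Data may arrive in any order; the loop dispatches whatever is ready.
writers[1].sendall(b"hello from 8080")
writers[0].sendall(b"hello from 80")
run_until_idle(selector)

for writer in writers:
    writer.close()
selector.close()
```

Even in this toy, the read logic is split between registering interest and handling the result. Doing that for every call that might block is the cost the notes complain about.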
  • Note that Twisted is running only on a single core
  • gevent: a networking library using libevent. It has an event loop, but its API is synchronous. Transforms synchronous applications to be asynchronous automatically!!! “Monkey patches” Python system modules (socket): rewrites socket calls to set up a callback and a context after writing the request to the socket. Function context lives in coroutines. Think of coroutines as lightweight threads: a pointer to code plus context, no stack (e.g. closures and generators). Uses an event loop to manage all concurrent requests. Context switches on network I/O (just like Twisted).
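The "lightweight threads" idea can be illustrated with plain Python generators. This is only an analogy. gevent itself is built on greenlets and switches on network I/O, whereas this sketch switches at every `yield` and uses a trivial round-robin scheduler. `handle_request` and `run_all` are hypothetical names.

```python
from collections import deque


def handle_request(name, steps, log):
    """A coroutine: reads like ordinary sequential code, pauses at each yield."""
    for step in range(steps):
        log.append("%s step %d" % (name, step))
        yield  # pretend we just started network I/O and hand control back


def run_all(coroutines):
    """Round-robin scheduler: resume each coroutine in turn until all finish."""
    ready = deque(coroutines)
    while ready:
        coroutine = ready.popleft()
        try:
            next(coroutine)           # run until the next yield
        except StopIteration:
            continue                  # this request is finished
        ready.append(coroutine)       # not finished: queue it again


log = []
run_all([handle_request("A", 2, log), handle_request("B", 1, log)])
# log is now ["A step 0", "B step 0", "A step 1"]
```

The two requests interleave inside one process without any O/S thread. A coroutine that never yields would keep the others waiting, which is the unfair-scheduling caveat.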
  • gunicorn: a fast, lightweight WSGI server written by Benoit. Uses multiple workers to handle requests. Big win: supports gevent workers out of the box. Each worker maintains a pool of coroutines to handle incoming requests. Those workers share memory among their requests. At this point, we look at the code.
  • Advantages: best of both worlds! From mod_wsgi: easy to write; no framework forcing everything to be asynchronous, just Python; can take advantage of multiple cores. From Twisted: shared memory among requests within each worker; context switches in user space. Disadvantages (similar to Twisted): no I/O in C libraries; CPU-intense requests monopolize the processor.
  • gunicorn_1 is comparable to Twisted. Negligible performance impact when the application is made asynchronous.
  • gunicorn_4-8 is faster than mod_wsgi. Making context switches deterministically and in user space is more efficient than OS scheduling.
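A gunicorn setup along these lines might be configured as below. gunicorn reads its configuration from a Python file. The setting names (`bind`, `workers`, `worker_class`, `worker_connections`, `max_requests`) are gunicorn settings, but every value here is an illustrative assumption, not the configuration used at meebo or in the benchmarks.

```python
# gunicorn.conf.py (hypothetical file name; all values are illustrative)
import multiprocessing

# Address the server listens on.
bind = "127.0.0.1:8000"

# One worker process per core, so all cores get used.
workers = multiprocessing.cpu_count()

# Use gevent workers: each process runs an event loop with many coroutines.
worker_class = "gevent"

# Upper bound on simultaneous connections handled by one gevent worker.
worker_connections = 1000

# Recycle a worker after this many requests.
max_requests = 10000
```

Assuming the WSGI callable lives in a module named `fortuneteller` (a made-up name), the server could then be started with `gunicorn -c gunicorn.conf.py fortuneteller:application`. The gevent worker class requires gevent to be installed.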
  • gevent takes care of transforming synchronous code, but it’s still executed in an event loop. Synchronous code is not necessarily executed synchronously. Duplicated loads: simultaneous database requests. 1) No fortunes? Load up the fortunes! 2) No fortunes? Load up the fortunes! => use “events” to guard against duplicate effort. Socket caching: can’t naively cache sockets, because you can’t use the same socket for two simultaneous operations; must create a new socket per connection or use a pool. CPU hogging: might want to offload CPU-intense things to another daemon/process.
  • Managing gunicorn: greins (Randall Leeds). Enables running multiple apps in a single gunicorn instance. Routes traffic based on URL. Allows for global and per-app server hooks: on worker startup (preloading a working set) and pre/post request (Apache-style request logging). Provides a standard start/stop/reload/restart interface to gunicorn. Debugging gevent applications: gevent-profiler (Shaun Lindsay). Provides a linear trace of all function calls and context switches. Analyzes where CPU time is spent in a given application.
  • Blocking code is easy to understand, but traditional deployments aren’t very efficient. Asynchronous applications make the best use of resources, but they’re a pain to write. Running gevent workers in gunicorn is both simple and efficient, as it allows you to write blocking code that is converted to be asynchronous automatically. At meebo, we've found this setup to be amazingly efficient and reliable, even under extreme load. A number of our mission-critical, high-concurrency web applications have been running under this setup for the last 7 months with no major issues or outages. We’ve been able to save money on hardware with no impact on response time… we even got a Halloween costume out of it.
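One way to read "use events to guard against duplicate effort" is sketched below. It is an assumption about the pattern, not the code from the talk. It uses the standard `threading` module as a stand-in so it runs without gevent, and the class name `LoadOnce` is invented. gevent offers a similar `Event` with `set()` and `wait()`.

```python
import threading
import time


class LoadOnce:
    """Let the first caller run the loader; later callers wait for its result."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._started = False
        self._value = None

    def get(self):
        with self._lock:
            first = not self._started
            self._started = True
        if first:
            try:
                self._value = self._loader()
            finally:
                # Always wake the waiters. Error handling is minimal here:
                # if the loader raised, waiters simply see None.
                self._done.set()
        else:
            self._done.wait()
        return self._value


def slow_loader():
    """Hypothetical stand-in for the database query that loads all fortunes."""
    time.sleep(0.05)
    return ["Generosity and perfection are your everlasting goals."]


fortunes = LoadOnce(slow_loader)
```

Without the guard, every request that arrives while the first load is still in flight would see "no fortunes" and issue its own database query.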

    1. Using Coroutines to Create Efficient, High-Concurrency Web Applications. Matt Spitz, meebo, inc.
    2. What’s a Web Application, Anyway? (diagram: Application, Database, Application)
    3. High-Concurrency Web Applications (diagram: Application, Database, Application)
    4. High-Concurrency Web Applications. Many requests per second. Optimization opportunities: Hardware cost; Response time; Concurrency; Database impact.
    5. Meebo Bar
    6. Meebo Bar. 1000+ sites. Quantcast: 197 MM monthly uniques (http://bit.ly/xAPXx). LOTS of pageviews. LOTS of ad requests.
    7. Meebo’s Ad Server. Given: User features; Available ads. Objective: Maximize revenue (P(click), Price); Satisfy advertisers (Respect targeting, Smooth campaign delivery). Complex application. Lots of concurrent requests.
    8. Sample App: FortuneTeller
    9. Sample App: FortuneTeller. Given: Username; Available fortunes. Objective: Select fortune for user using JaccardSimilarity(username, fortune). Example: username=PyConIsForLovers, “Generosity and perfection are your everlasting goals.”
    10. Hosting FortuneTeller. Apache CGI; Apache mod_wsgi; Twisted; gevent + gunicorn.
    11. Hosting FortuneTeller. Evaluation metrics: Code complexity; Library support; Memory efficiency; Multi-core support.
    12. Take One: Apache CGI. One process per request. O/S schedules CPU.
    13. Take One: Apache CGI. Advantages: Straightforward, synchronous code; Isolated requests. Disadvantages: Process overhead; Cold cache.
    14. Evaluation
    15. Performance. Environment: 4-core VM, 1 GB RAM; Ubuntu Server 10.10; MySQL on host machine; 25ms interface delay; 1024 requests, X concurrent.
    16. Performance
    17. Take Two: Apache mod_wsgi. Using mpm_prefork. Worker processes handle requests. One concurrent request per process. Memory cached between requests. O/S schedules CPU.
    18. Take Two: Apache mod_wsgi. Advantages: Straightforward, synchronous code; Cached memory. Disadvantages: Resource inefficient; Need working set in each process; Cold cache on restart; Managing worker count (Too few: 502. Too many: OOM? Database DoS?).
    19. Evaluation
    20. Performance
    21. Take Three: Twisted. Asynchronous framework: Events and callbacks; Twisted orchestrates context switches. Twisted server: Single event loop; Concurrent requests.
    22. Quick Break: Event Loops.
        s = socket.socket(…)
        s.setblocking(ISBLOCKING)
        s.connect((HOST, PORT))
        greeting = s.recv(1024)
        s.close()
        Blocking: Wait for data. Nonblocking: Initiate, return immediately; Data (if available); Exception: “I’m not done yet”; Requires more plumbing.
    23. Quick Break: Event Loops. Nonblocking sockets in an event loop.
        f(x):
            s = NonBlockingSocket(…)
            greeting = s.recv(1024)
            print x, “|”, greeting
        Events in the loop: fd=5, fp=g, {s: ‘hi’, a: 5}; fd=2, fp=f, {x: 8080}; fd=3, fp=myfunc, {}; fd=18, fp=f, {x: 80}.
        Steps: 1. f(x) runs. 2. Call recv(). 3. Create context, add to the event loop. 4. Process events that are ready (select/poll). 5. Return to context when data is ready. 6. “80 | Hello from socket s!”
    24. Take Three: Twisted. Asynchronous framework: Events and callbacks; Twisted orchestrates context switches. Twisted server: Single event loop; Concurrent requests.
    25. Take Three: Twisted. Advantages: Shared memory; User space context switches. Disadvantages: Develop asynchronously; Stuck in the framework; Asynchronous libraries; No I/O in C; Unfair scheduling; Using multiple cores.
    26. Evaluation
    27. Performance
    28. Take Four: gevent + gunicorn. gevent: Networking library; Uses event loop; Synchronous API (Synchronous code running asynchronously); Monkey patching (Rewrites standard modules); Coroutines for function context (Lightweight threads, no stack; greenlet implementation).
    29. Take Four: gevent + gunicorn. gunicorn (“Green Unicorn”): Lightweight WSGI server; Multiple worker processes; Share queued requests; gevent support.
    30. Take Four: gevent + gunicorn. Advantages: Best of both worlds! From mod_wsgi: Straightforward, synchronous code; No framework, just python; Multicore support. From Twisted: Shared memory; User space context switches. Disadvantages: Pure-python libraries; Unfair scheduling.
    31. Evaluation
    32. Performance
    33. Performance
    34. Performance
    35. Performance
    36. Performance
    37. “Evented” Development. Synchronous code still runs asynchronously. Requests aren’t independent. Things to keep in mind: Duplicate work; Socket caching; CPU hogging.
    38. gunicorn + gevent in Production. Managing gunicorn: greins, by Randall Leeds (tilgovi), github/meebo/greins: Multiple apps; URL routing; Server hooks (Worker launch, Pre/post requests); Daemon interface. Debugging gevent: gevent-profiler, by Shaun Lindsay (srlindsay), github/meebo/gevent-profiler: Execution trace; Time spent.
    39. Load-tested, unicorn-approved! Blocking code is simple. Nonblocking code is efficient. gevent + gunicorn: Simple; Efficient; Reliable.
    40. Load-tested, unicorn-approved!
    41. Thanks!