Using Coroutines to Create Efficient, High-Concurrency Web Applications
 

  • Introduction: Matt Spitz, software engineer at meebo. Here today to talk about building web applications in Python and the pros and cons of the various ways we can serve them up.
  • Users make requests to an application, which uses a shared storage backend.
  • Same thing, just lots and lots and lots of concurrent requests
  • With such a large-scale application, small optimizations can have a huge impact:
    Save money on hardware (machines, RAM, CPU)
    Faster response time, better user experience
    Handling more concurrent requests
    Substantially decrease impact on shared resources
    One example of a high-concurrency web application is the adserver we run at meebo. Before I talk about the adserver, let me introduce the meebo bar.
  • The meebo bar is deployed to our partner sites. It offers a neat way to share content on the site and lets users chat with other members of the site.
  • Show off the chat in the corner, the sharing buttons, and the ad unit. Can’t give you numbers, but suffice it to say that any adserver to which you’re making those calls can be considered a “high-concurrency web application”.
  • Selecting the ad a user is most likely to click on
    Serving the most valuable ads (e.g. highest CPC)
    Respecting whatever targeting the advertisers have selected
    Ensuring smooth, complete delivery for each ad campaign
    The adserver is a pretty complicated beast, and I think that going through it wouldn’t really help in making my point for this talk, so I wrote a sample application that has a similar structure and resource-usage patterns.
  • Describe Jaccard similarity: size(intersection(x, y)) / size(union(x, y)). It’s super arbitrary, just there to represent some CPU processing in the application. SHOW OFF THE CODE (make sure to show off the user fortune caching).
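The similarity measure described above can be sketched in a few lines of Python. The talk does not say which sets are compared, so this sketch assumes the sets of characters in the username and in the fortune text; `jaccard_similarity` and `pick_fortune` are illustrative names, not the talk's code.

```python
def jaccard_similarity(x, y):
    """Return |x ∩ y| / |x ∪ y| for two iterables, treating each as a set."""
    a, b = set(x), set(y)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty inputs
    return len(a & b) / len(union)


def pick_fortune(username, fortunes):
    """Pick the fortune whose characters overlap most with the username.

    Comparing character sets is an assumption made for this sketch.
    """
    name = username.lower()
    return max(fortunes, key=lambda f: jaccard_similarity(name, f.lower()))
```

The point of the computation is only to burn some CPU per request, so any comparable set operation would serve the same purpose.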
  • We’re gonna try four different serving implementations
  • How difficult is it to write code for these applications?
    To what extent do these applications allow us to use 3rd-party libraries?
    How efficient is the application in terms of memory?
    Can we take advantage of multi-core machines?
  • SHOW OFF THE CODE
  • Simple to write, and requests don’t affect one another. But we need to reload the whole working set (all fortunes) with each request, and there’s no database connection caching. It’s a start, but it doesn’t scale.
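A minimal sketch of what a CGI-style handler looks like, assuming a stand-in `load_fortunes` in place of the real database query. The function names, the second fortune, and the placeholder selection rule are all hypothetical. It shows the cold start on every request.

```python
import os
import sys
from urllib.parse import parse_qs


def load_fortunes():
    """Stand-in for the database query that fetches every fortune."""
    return [
        "Generosity and perfection are your everlasting goals.",
        "A placeholder fortune, invented for this sketch.",
    ]


def handle_cgi_request(query_string, out=sys.stdout):
    """Handle one request the way a CGI process would: start cold every time."""
    fortunes = load_fortunes()  # reloaded on every request; nothing is cached
    params = parse_qs(query_string)
    username = params.get("username", ["anonymous"])[0]
    # Placeholder selection; the sample app ranks fortunes by Jaccard similarity.
    fortune = fortunes[len(username) % len(fortunes)]
    out.write("Content-Type: text/plain\r\n\r\n")
    out.write(fortune + "\n")


if __name__ == "__main__":
    # Under CGI the web server passes the query string in the environment.
    handle_cgi_request(os.environ.get("QUERY_STRING", ""))
```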
  • Before I show you a performance graph, I want to go over the benchmarks. There’s a 25ms delay on the interface between guest and host to exaggerate the effects of I/O on response time.
  • 8 processes maximum. Requires loading all fortunes with each request.
  • Apache spins up a number of worker processes to handle requests
    Workers handle a configurable number of requests before being replaced
    Workers handle exactly one request at a time
    Memory is cached in the worker, so we can re-use the set of fortunes between requests
    Operating system handles scheduling
    MAKE SURE TO SHOW OFF THE HANDLER
  • Using almost the same simple, synchronous code as we had in the CGI
    Memory is cached across requests in the same worker
    No shared memory between workers
    Need to load the set of all fortunes in each worker
    More workers require more RAM
    Each worker load requires a DB request
    Hammers the database on Apache restart
  • Using 8 worker processes
  • Twisted is an asynchronous framework for building network applications
    Developer structures code as events and callbacks
    Twisted orchestrates context switches among requests, typically on things that take a long time (I/O)
    Twisted server
    Single event loop => single process
    Handles multiple requests simultaneously in the event loop
    And since we’re all in one process, memory is shared among requests
  • Some of this may be review, but it’s important that everyone understands this.
    Blocking: connect and recv wait until their actions complete before returning
    Nonblocking: connect and recv initiate the action (if it hasn’t been initiated already) and either return the data or raise an exception immediately
    Requires a lot more plumbing than the example above
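The difference between the two modes can be seen with the standard `socket` module. This sketch uses `socket.socketpair()` as a local stand-in for a client and a server, which is an assumption made so the example runs without a network.

```python
import socket

# A connected pair of local sockets stands in for a client and a server.
ours, theirs = socket.socketpair()
ours.setblocking(False)

try:
    ours.recv(1024)            # nothing has been sent yet
    outcome = "got data"
except BlockingIOError:
    outcome = "not done yet"   # nonblocking recv returns control immediately

theirs.sendall(b"Hello from socket s!")
ours.setblocking(True)
greeting = ours.recv(1024)     # blocking recv waits until data is available

ours.close()
theirs.close()
```

The extra plumbing comes from the `except` branch: something has to remember the unfinished work and come back to it later.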
  • …so let’s go back to this slide (the next one)
  • Twisted is a framework built around an event loop
    Provides a nice interface for setting up your functions and callbacks (for success or error)
    Keeps track of multiple execution paths simultaneously, just as we saw in the previous example
    The big problem with Twisted is that you can’t just plug in your synchronous app. You have to set up these events and callbacks for every piece of code that might block.
    MAKE SURE TO SHOW OFF THE CODE AND HOW MUCH OF A PAIN IT IS
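To make the events-and-callbacks structure concrete, here is a small event loop built on the standard `selectors` module. It is not Twisted's API, only a sketch of the same idea; `on_readable` and `run_event_loop` are illustrative names, and local socket pairs stand in for real connections.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
results = []


def on_readable(sock, label):
    """Callback: runs only once the loop reports that sock has data."""
    greeting = sock.recv(1024)
    results.append((label, greeting))
    sel.unregister(sock)
    sock.close()


def run_event_loop():
    """Dispatch ready events to their callbacks until none are registered."""
    while sel.get_map():
        for key, _mask in sel.select():
            callback, label = key.data
            callback(key.fileobj, label)


# Two pending "requests", each waiting on its own socket.
peers = []
for label in (80, 8080):
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, (on_readable, label))
    peers.append(writer)

# The loop handles whichever sockets are ready, in whatever order.
peers[1].sendall(b"Hello from 8080")
peers[0].sendall(b"Hello from 80")
run_event_loop()

for writer in peers:
    writer.close()
sel.close()
```

Even in this toy version the request logic is split between the registration site and the callback, which is the restructuring cost the notes complain about.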
  • Advantages
    Memory is shared among requests (we only have to load the fortunes once to service many simultaneous requests)
    Context switches happen in user space (fast)
    Disadvantages
    Need to rewrite code to be asynchronous. Guido sez: “I hate callback-based programming.” It’s hard to wrap your brain around.
    Stuck in the framework: everything has to be asynchronous, and you have to use Twisted’s standard libraries, which may not behave quite as you’d like
    3rd-party libraries must also be asynchronous
    No I/O in C libraries (at least not out of the box)
    CPU-intense requests monopolize the processor. mod_wsgi: O/S handles scheduling, processes can be scheduled at any time, and CPU time is shared “fairly”. Twisted: CPU is scheduled explicitly, so CPU-bound blocks of code prevent other requests from running.
    Taking advantage of multiple cores isn’t trivial: load balancer? multiprocessing module?
  • Note that Twisted is running only on a single core
  • gevent
    Networking library using libevent
    Has an event loop, but its API is synchronous
    Transforms synchronous applications to be asynchronous automatically!!!
    “Monkey patches” Python system modules (socket)
    Rewrites socket calls to set up a callback and a context after writing the request to the socket
    Function context in coroutines
    Think of coroutines as lightweight threads
    Pointer to code + context, no stack
    e.g. closures and generators
    Uses an event loop to manage all concurrent requests
    Context switch on network I/O (just like Twisted)
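The notes name generators as one example of a coroutine, so the idea of cooperative context switching can be sketched with generators alone. gevent itself uses greenlets and switches on network I/O automatically. In this hypothetical sketch each explicit `yield` marks a point where a handler would be waiting on I/O.

```python
from collections import deque


def handle_request(name, log):
    """A request handler that yields wherever it would wait on I/O."""
    log.append(f"{name}: send query")
    yield                      # pause here; the scheduler may run someone else
    log.append(f"{name}: got rows")
    yield
    log.append(f"{name}: respond")


def run_all(coroutines):
    """Round-robin scheduler: resume each coroutine until all have finished."""
    ready = deque(coroutines)
    while ready:
        coro = ready.popleft()
        try:
            next(coro)         # run until the next yield
            ready.append(coro)
        except StopIteration:
            pass               # this request is done


log = []
run_all([handle_request("A", log), handle_request("B", log)])
```

The two handlers interleave in one process without threads, and a handler that never yields would starve the other, which mirrors the unfair-scheduling caveat.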
  • gunicorn
    Fast, lightweight WSGI server written by Benoit
    Uses multiple workers to handle requests
    Big win: supports gevent workers out of the box
    Each worker maintains a pool of coroutines to handle incoming requests
    Each worker shares memory among the requests it handles
    At this point, we look at the code.
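gunicorn reads its settings from a Python file, so a gevent-worker setup can be sketched as below. The values are illustrative assumptions, not the configuration used in the talk, and the gevent worker class requires gevent to be installed.

```python
# gunicorn.conf.py (illustrative values, not the settings used in the talk)
import multiprocessing

bind = "127.0.0.1:8000"                 # address to listen on (assumed)
workers = multiprocessing.cpu_count()   # one worker process per core
worker_class = "gevent"                 # each worker runs a gevent event loop
worker_connections = 1000               # max simultaneous clients per worker
```

Such a file would be passed with `gunicorn -c gunicorn.conf.py myapp:application`, where `myapp` is a hypothetical module exposing a WSGI callable.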
  • Advantages
    Best of both worlds!
    Like mod_wsgi:
    Easy to write
    No framework to do everything asynchronously, just Python
    Can take advantage of multiple cores
    Like Twisted:
    Shared memory among requests within each worker
    Context switches in user space
    Disadvantages
    Similar to Twisted
    No I/O in C libraries
    CPU-intense requests monopolize the processor
  • gunicorn_1 is comparable to Twisted. Negligible performance impact when the application is made asynchronous.
  • gunicorn_4-8 is faster than mod_wsgi. Making context switches deterministically and in user space is more efficient than OS scheduling.
  • gevent takes care of transforming synchronous code, but it’s still executed in an event loop
    Synchronous code is not necessarily executed synchronously
    Duplicated loads: simultaneous database requests
    1) no fortunes? load up the fortunes!
    2) no fortunes? load up the fortunes!
    => use “events” to protect against duplicate efforts
    Socket caching: can’t naively cache sockets
    Can’t use the same socket for two simultaneous operations
    Must create a new socket per connection or use a pool
    CPU hogging
    Might want to offload CPU-intense things to another daemon/process
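The duplicated-load problem and the "event" fix can be sketched with the standard `threading` module, used here as a stand-in so the example runs without gevent. `gevent.event.Event` offers a similar `set()`/`wait()` interface. `LoadOnce` is a hypothetical helper, not code from the talk.

```python
import threading


class LoadOnce:
    """Run an expensive loader once, even if many callers ask at the same time."""

    def __init__(self, loader):
        self._loader = loader
        self._guard = threading.Lock()
        self._ready = threading.Event()
        self._started = False
        self._value = None

    def get(self):
        with self._guard:
            first = not self._started
            self._started = True
        if first:
            try:
                self._value = self._loader()  # only the first caller hits the database
            finally:
                self._ready.set()             # wake everyone who was waiting
        else:
            self._ready.wait()                # a load is in flight (or done); wait for it
        return self._value
```

If the loader raises, later callers in this sketch simply see `None`. A fuller version would record the failure and allow a retry.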
  • Managing gunicorn
    greins (Randall Leeds)
    Enables running multiple apps in a single gunicorn instance
    Routes traffic based on URL
    Allows for global and per-app server hooks
    On worker startup (preloading a working set)
    Pre/post requests (Apache-style request logging)
    Provides standard start/stop/reload/restart interface to gunicorn
    Debugging gevent applications
    gevent-profiler (Shaun Lindsay)
    Provides a linear trace of all function calls and context switches
    Analyzes where CPU time is spent in a given application
  • Blocking code is easy to understand, but traditional deployments aren’t very efficient.
    Asynchronous applications make the best use of resources, but they’re a pain to write.
    Running gevent workers in gunicorn is both simple and efficient, as it allows you to write blocking code that is converted to be asynchronous automatically.
    At meebo, we’ve found this setup to be amazingly efficient and reliable, even under extreme load.
    A number of our mission-critical, high-concurrency web applications have been running under this setup for the last 7 months with no major issues or outages. We’ve been able to save money on hardware with no impact on response time.
    …we even got a Halloween costume out of it.

Presentation Transcript

  • Using Coroutines to Create Efficient, High-Concurrency Web Applications
    Matt Spitz
    meebo, inc.
  • What’s a Web Application, Anyway?
    2
    Application
    Database
    Application
  • High-Concurrency Web Applications
    3
    Application
    Database
    Application
  • High-Concurrency Web Applications
    Many requests per second
    Optimization opportunities
    Hardware cost
    Response time
    Concurrency
    Database impact
    4
  • 5
    Meebo Bar
  • Meebo Bar
    1000+ sites
    Quantcast: 197 MM monthly uniques*
    LOTS of pageviews
    LOTS of ad requests
    6
    * http://bit.ly/xAPXx
  • Meebo’s Ad Server
    Given
    User features
    Available ads
    Objective
    Maximize revenue
    P(click)
    Price
    Satisfy advertisers
    Respect targeting
    Smooth campaign delivery
    Complex application
    Lots of concurrent requests
    7
  • Sample App: FortuneTeller
    8
  • Sample App: FortuneTeller
    Given
    Username
    Available fortunes
    Objective
    Select fortune for user
    JaccardSimilarity(username, fortune)
    username=PyConIsForLovers
    “Generosity and perfection are your everlasting goals.”
    9
  • Hosting FortuneTeller
    Apache CGI
    Apache mod_wsgi
    Twisted
    gevent + gunicorn
    10
  • Hosting FortuneTeller
    Evaluation metrics
    Code complexity
    Library support
    Memory efficiency
    Multi-core support
    11
  • Take One: Apache CGI
    One process per request
    O/S schedules CPU
    12
  • Take One: Apache CGI
    Advantages
    Straightforward, synchronous code
    Isolated requests
    Disadvantages
    Process overhead
    Cold cache
    13
  • Evaluation
    14
  • Performance
    Environment
    4-core VM, 1 GB RAM
    Ubuntu Server 10.10
    MySQL on host machine
    25ms interface delay
    1024 requests, X concurrent
    15
  • Performance
    16
  • Take Two: Apache mod_wsgi
    Using mpm_prefork
    Worker processes handle requests
    One concurrent request per process
    Memory cached between requests
    O/S schedules CPU
    17
  • Take Two: Apache mod_wsgi
    Advantages
    Straightforward, synchronous code
    Cached memory
    Disadvantages
    Resource inefficient
    Need working set in each process
    Cold cache on restart
    Managing worker count
    Too few: 502
    Too many: OOM? Database DoS?
    18
  • Evaluation
    19
  • Performance
    20
  • Take Three: Twisted
    Asynchronous framework
    Events and callbacks
    Twisted orchestrates context switches
    Twisted server
    Single event loop
    Concurrent requests
    21
  • Quick Break: Event Loops
    s = socket.socket(…)
    s.setblocking(ISBLOCKING)
    s.connect((HOST, PORT))
    greeting = s.recv(1024)
    s.close()
    Blocking
    Wait for data
    Nonblocking
    Initiate, return immediately
    Data (if available)
    Exception: “I’m not done yet”
    Requires more plumbing
    22
  • Quick Break: Event Loops
    Nonblocking sockets in an event loop
    23
    1. f(x):
           s = NonBlockingSocket(…)
           greeting = s.recv(1024)
           print x, “|”, greeting
    2. Call recv().
    3. Create context, add to the event loop.
    4. Process events that are ready (select/poll).
    5. Return to context when data is ready.
    6. “80 | Hello from socket s!”
    Events
    fd=5, fp=g, {s: ‘hi’, a: 5}
    fd=2, fp=f, {x: 8080}
    fd=3, fp=myfunc, {}
    fd=18, fp=f, {x: 80}
  • Take Three: Twisted
    Asynchronous framework
    Events and callbacks
    Twisted orchestrates context switches
    Twisted server
    Single event loop
    Concurrent requests
    24
  • Take Three: Twisted
    Advantages
    Shared memory
    User space context switches
    Disadvantages
    Develop asynchronously
    Stuck in the framework
    Asynchronous libraries
    No I/O in C
    Unfair scheduling
    Using multiple cores
    25
  • Evaluation
    26
  • Performance
    27
  • Take Four: gevent + gunicorn
    gevent
    Networking library
    Uses event loop
    Synchronous API
    Synchronous code running asynchronously
    Monkey patching
    Rewrites standard modules
    Coroutines for function context
    Lightweight threads, no stack
    greenlet implementation
    28
  • Take Four: gevent + gunicorn
    gunicorn (“Green Unicorn”)
    Lightweight WSGI server
    Multiple worker processes
    Share queued requests
    gevent support
    29
  • Take Four: gevent + gunicorn
    Advantages
    Best of both worlds!
    mod_wsgi
    Straightforward, synchronous code
    No framework, just python
    Multicore support
    Twisted
    Shared memory
    User space context switches
    Disadvantages
    Pure-python libraries
    Unfair scheduling
    30
  • Evaluation
    31
  • Performance
    32
  • Performance
    33
  • Performance
    34
  • Performance
    35
  • Performance
    36
  • “Evented” Development
    Synchronous code still runs asynchronously
    Requests aren’t independent
    Things to keep in mind
    Duplicate work
    Socket caching
    CPU hogging
    37
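For the socket-caching point, one common approach is a small pool that hands each concurrent request its own connection. The sketch below uses the standard `queue` module, and `ConnectionPool` and `create_connection` are hypothetical names. gevent also ships `gevent.queue.Queue` with a similar `put`/`get` interface.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hand each concurrent request its own connection; reuse them afterwards."""

    def __init__(self, create_connection, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(create_connection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()    # waits if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)   # return it for the next request


# Plain objects stand in for real sockets or database connections.
pool = ConnectionPool(object, size=2)
with pool.connection() as first:
    with pool.connection() as second:
        distinct = first is not second
```

Because a connection is checked out for the duration of one operation, two requests never read from or write to the same socket at once.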
  • gunicorn + gevent in Production
    Managing gunicorn
    greins
    Randall Leeds (tilgovi): github/meebo/greins
    Multiple apps
    URL routing
    Server hooks
    Worker launch
    Pre/post requests
    Daemon interface
    Debugging gevent
    gevent-profiler
    Shaun Lindsay (srlindsay): github/meebo/gevent-profiler
    Execution trace
    Time spent
    38
  • Load-tested, unicorn-approved!
    Blocking code is simple
    Nonblocking code is efficient
    gevent + gunicorn
    Simple
    Efficient
    Reliable
    39
  • Load-tested, unicorn-approved!
    40
  • Thanks!
    41