Asynchronous Architectures for Implementing Scalable Cloud Services
Designing for Graceful Degradation

EVAN COOKE
CO-FOUNDER & CTO, twilio (Cloud Communications)
Cloud services power the apps that are the backbone of modern society: how we work, play, and communicate.
Cloud Workloads Can Be Unpredictable
SMS API Usage

[Chart: SMS API traffic showing a 6x spike in 5 minutes]

Danger! Load higher than instantaneous throughput

[Chart: request latency vs. time; when load exceeds instantaneous throughput, latency climbs until requests FAIL]
Don’t Fail
Requests
[Diagram: Incoming Requests → Load Balancer → N instances of (AAA → Throttling → App Server backed by a worker pool → Throttling)]
Worker Pools (e.g., Apache/Nginx)

[Chart: worker-pool utilization over time, rising from 10% to 70% to 100%+; failed requests appear once utilization passes 100%]
Problem Summary

• Cloud services often use worker pools to handle incoming requests
• When load exceeds the size of the worker pool, requests fail
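To make this failure mode concrete, here is a toy sketch (hypothetical, stdlib-only, not any real server's implementation) of a fixed-size worker pool with a bounded queue: once all workers are busy and the queue is full, new requests fail immediately.

```python
import queue
import threading

class WorkerPool:
    """Toy worker pool: fixed worker count, bounded request queue."""

    def __init__(self, workers, queue_size):
        self.tasks = queue.Queue(maxsize=queue_size)
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        # Each worker handles one request at a time.
        while True:
            fn = self.tasks.get()
            fn()
            self.tasks.task_done()

    def submit(self, fn):
        # Non-blocking admission: when the queue is full, the request
        # fails outright -- the behavior described above.
        try:
            self.tasks.put_nowait(fn)
            return True
        except queue.Full:
            return False
```

With one worker and a queue of one, the third concurrent request is rejected even though it might only have needed to wait briefly.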
What next?
A few observations based on work
implementing and scaling the Twilio API
over the past 4 years...

 • Twilio Voice/SMS Cloud APIs
 • 100,000 Twilio Developers
 • 100+ employees
Observation 1

For many APIs, taking more time to
service a request is better than failing that
request
Implication: in many cases, it is better
to service a request with some delay
rather than failing it
Observation 2

Matching the amount of available resources precisely to the size of the incoming-request worker pool is challenging.
Implication: under load, it may be possible to delay or drop only those requests that truly impact resources.
What are we going to do?

Suggestion: if request concurrency were very cheap, we could implement delay and finer-grained resource controls much more easily...
Event-driven programming
 and the Reactor Pattern
Event-driven programming
 and the Reactor Pattern
Worker                      Time (relative cost)
req = 'GET /';              1
req.append('\r\n\r\n');     1
socket.write(req);          10000x
resp = socket.read();       10000000x
print(resp);                10
Event-driven programming
 and the Reactor Pattern
                            Time (relative cost)
req = 'GET /';              1
req.append('\r\n\r\n');     1
socket.write(req);          10000x
resp = socket.read();       10000000x
print(resp);                10

Huge IO latency blocks the worker.
Event-driven programming
 and the Reactor Pattern

req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});

Make IO operations async and "callback" when done.
Event-driven programming
 and the Reactor Pattern

req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});
reactor.run_forever();

Central dispatch to coordinate event callbacks.
Event-driven programming
 and the Reactor Pattern
                            Time (relative cost)
req = 'GET /';              1
req.append('\r\n\r\n');     1
socket.write(req, fn() {    10
  socket.read(fn(resp) {    10
    print(resp);            10
  });
});
reactor.run_forever();

Result: we don't block the worker.
(Some)
Reactor Pattern Frameworks
          c/libevent
          c/libev
        java/nio/netty
          js/node.js
       ruby/eventmachine
     python/twisted
     python/gevent
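To illustrate the pattern these frameworks implement, here is a minimal toy reactor built on Python's stdlib `selectors` module (a sketch, not any framework's actual API): callbacks are registered per socket, and a central loop dispatches each callback when its socket becomes readable.

```python
import selectors
import socket

class Reactor:
    """Toy single-threaded event loop dispatching read-ready callbacks."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.running = False

    def register(self, sock, callback):
        # callback(sock) fires when sock has data to read.
        self.sel.register(sock, selectors.EVENT_READ, callback)

    def run_forever(self):
        # Central dispatch: wait for ready sockets, invoke their callbacks.
        self.running = True
        while self.running:
            for key, _ in self.sel.select(timeout=0.1):
                key.data(key.fileobj)

    def stop(self):
        self.running = False

if __name__ == '__main__':
    # Usage: echo one message between a connected socket pair.
    reactor = Reactor()
    a, b = socket.socketpair()

    def on_readable(sock):
        print('got %r' % sock.recv(1024))
        reactor.stop()

    reactor.register(b, on_readable)
    a.sendall(b'ping')
    reactor.run_forever()
```

Real frameworks add write-readiness, timers, and error handling, but the core shape (register callbacks, let one loop dispatch them) is the same.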
The Callback Mess
      Python Twisted
req = 'GET /'
req += '\r\n\r\n'

def r(resp):
    print resp

def w():
    socket.read().addCallback(r)

socket.write().addCallback(w)
The Callback Mess
      Python Twisted
req = 'GET /'
req += '\r\n\r\n'

yield socket.write()
resp = yield socket.read()
print resp

Use deferred generators and inline callbacks.
The Callback Mess
      Python Twisted
req = 'GET /'
req += '\r\n\r\n'

yield socket.write()
resp = yield socket.read()
print resp

     Easy sequential
    programming with
  mostly implicit async IO
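The yield-based style works because a driver (a "trampoline") pumps the generator: it resolves each yielded operation and sends the result back in, so the async code reads top to bottom. A toy stand-in for that mechanism (hypothetical; real Twisted uses Deferreds and `@inlineCallbacks`):

```python
def run_inline(gen, io_results):
    """Toy driver for yield-based inline callbacks.

    Each yielded value names an IO operation; io_results maps the name
    to its (already-available) result, standing in for real async
    completions. The result is sent back into the generator.
    """
    result = None
    try:
        op = next(gen)                        # start the generator
        while True:
            op = gen.send(io_results[op])     # resume with the IO result
    except StopIteration as stop:
        result = getattr(stop, 'value', None)
    return result

def fetch():
    yield 'write'            # like: yield socket.write(req)
    resp = yield 'read'      # like: resp = yield socket.read()
    return resp              # becomes the overall result
```

For example, `run_inline(fetch(), {'write': None, 'read': 'hello'})` returns `'hello'` without `fetch` ever blocking a worker.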
Enter gevent
 “gevent is a coroutine-based Python
  networking library that uses greenlet
to provide a high-level synchronous API
   on top of the libevent event loop.”

          Natively Async
      socket.write()
      resp = socket.read()
      print resp
Enter gevent
               Simple Echo Server
from gevent.server import StreamServer

def echo(socket, address):
    print('New connection from %s:%s' % address)
    socket.sendall('Welcome to the echo server!\r\n')
    fileobj = socket.makefile()
    line = fileobj.readline()
    fileobj.write(line)
    fileobj.flush()
    print("echoed %r" % line)

if __name__ == '__main__':
    server = StreamServer(('0.0.0.0', 6000), echo)
    server.serve_forever()


              Easy sequential model
                    Fully async
Async Services with Ginkgo

Ginkgo is a simple framework for composing async gevent services with common configuration, logging, daemonizing, etc.
https://github.com/progrium/ginkgo

Let's look at a simple example that implements a TCP and HTTP server...
Async Services with Ginkgo
import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer

from ginkgo.core import Service

def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print 'new http request!'
    return ["hello world"]

def handle_tcp(socket, address):
    print 'new tcp connection!'
    while True:
        socket.send('hello\n')
        gevent.sleep(1)

app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()
Async Services with Ginkgo
import gevent
from gevent.pywsgi import WSGIServer    # Import WSGI/TCP Servers
from gevent.server import StreamServer

from ginkgo.core import Service

# HTTP Handler
def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print 'new http request!'
    return ["hello world"]

# TCP Handler
def handle_tcp(socket, address):
    print 'new tcp connection!'
    while True:
        socket.send('hello\n')
        gevent.sleep(1)

# Service Composition
app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()
[Diagram: Incoming Requests → Load Balancer → N Async Servers]

Using our async reactor-based approach, let's redesign our serving infrastructure.
[Diagram: Incoming Requests → Load Balancer → N instances of (AAA → Async Server)]

Step 1: define an authentication and authorization (AAA) layer that identifies the user and the resource being requested.
[Diagram: Incoming Requests → Load Balancer → N instances of (AAA → Throttling → Async Server), coordinated by a Concurrency Manager]

Step 2: add a throttling layer and a concurrency manager.
Concurrency Admission Control
• Goal: limit concurrency by delaying or selectively failing requests
• Common metrics
  - By Account
  - By Resource Type
  - By Availability of Dependent Resources
• What we've found useful
  - By (Account, Resource Type)
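A minimal sketch of such an admission controller, keyed by (account, resource type), with delay-then-deny semantics: under the limit a request is admitted immediately; over it, the request is delayed for up to `max_delay` before being denied. (Hypothetical, threading-based; a gevent version would use the same logic with green threads.)

```python
import threading
import time
from collections import defaultdict

class ConcurrencyManager:
    """Per-(account, resource) admission control: delay, then deny."""

    def __init__(self, limit, max_delay=0.5):
        self.limit = limit            # max in-flight requests per key
        self.max_delay = max_delay    # how long to delay before denying
        self.lock = threading.Lock()
        self.in_flight = defaultdict(int)

    def acquire(self, account, resource):
        key = (account, resource)
        deadline = time.time() + self.max_delay
        while True:
            with self.lock:
                if self.in_flight[key] < self.limit:
                    self.in_flight[key] += 1
                    return True       # admitted, possibly after a delay
            if time.time() >= deadline:
                return False          # deny: still saturated after max_delay
            time.sleep(0.01)          # delay instead of failing immediately

    def release(self, account, resource):
        with self.lock:
            self.in_flight[(account, resource)] -= 1
```

Because the key is (account, resource type), one account saturating one resource delays or denies only its own requests to that resource; other accounts and resources are unaffected.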
Delay - delay responses without failing requests

[Chart: under load, request latency rises but all requests still succeed]
Deny - deny requests based on resource usage

[Chart: under load, requests to the saturated resource /x are failed while latency for other resources /* stays flat]
[Diagram: Incoming Requests → Load Balancer → N instances of (AAA → Throttling → App Server → Throttling → Dependent Services), coordinated by a Concurrency Manager]

Step 3: allow backend resources to throttle requests.
Summary
Async frameworks like gevent allow you
to easily decouple a request from access
to constrained resources
[Chart: request latency vs. time; latency rises under load instead of causing service-wide failure]
Don’t Fail
 Requests
 Decrease
Performance
twilio
   Evan Cooke
            @emcooke


CONTENTS CONFIDENTIAL & COPYRIGHT © TWILIO INC. 2012

Asynchronous Architectures for Implementing Scalable Cloud Services - Evan Cooke - Gluecon 2012
