  1. 1. Building Web APIs that Scale: Designing for Graceful Degradation. Evan Cooke, Twilio, CTO (@emcooke)
  2. 2. Safe Harbor Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. 
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  3. 3. Cloud services and the APIs they power are becoming the backbone of modern society. APIs support the apps that structure how we work, play, and communicate.
  4. 4. Twilio: Observations today are based on experience building @twilio
        • Founded in 2008
        • Infrastructure APIs to automate phone and SMS communications
        • 120 employees
        • >1000 servers running 24x7
  5. 5. Cloud Workloads Can Be Unpredictable
  6. 6. Twilio SMS API Traffic: 6x spike in 5 mins. No time for…
        • a human to respond to a pager
        • new servers to boot
  7. 7. Typical Scenario. Danger! Load higher than instantaneous throughput. [Figure labels: Load, Request Latency, FAIL]
  8. 8. Goal Today: Support graceful degradation of API performance under extreme load. No Failure.
  9. 9. Why Failures? [Diagram labels: Incoming Requests, Load Balancer, Worker Pool (W), App Server, AAA, Throttling]
  10. 10. Worker Pools, e.g., Apache/Nginx. [Figure labels: Failed Requests, Time; 10%, 70%, 100%+]
  11. 11. Problem Summary
        • Cloud services often use worker pools to handle incoming requests
        • When load goes beyond the size of the worker pool, requests fail
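The failure mode summarized above can be sketched in a few lines. This is a minimal illustration, not Twilio's implementation: `WorkerPool` and `simulate` are hypothetical names, and the pool size and burst size are arbitrary. It assumes every request in a burst holds its worker for the whole burst (for example, while blocked on IO), so any request arriving after the pool is full fails.

```python
import threading


class WorkerPool:
    """Hypothetical fixed-size pool: a request is rejected when every worker is busy."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def try_acquire(self):
        # Non-blocking: returns False when the pool is already full.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


def simulate(pool_size, concurrent_requests):
    """Count accepted and failed requests when they all arrive at once."""
    pool = WorkerPool(pool_size)
    accepted = failed = 0
    for _ in range(concurrent_requests):
        if pool.try_acquire():
            accepted += 1  # this worker stays busy for the rest of the burst
        else:
            failed += 1    # no free worker: the request fails
    return accepted, failed


if __name__ == "__main__":
    print(simulate(pool_size=4, concurrent_requests=10))
```

With a pool of 4 and a burst of 10, only the first 4 requests get a worker; the rest fail even though the backend may have spare capacity.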
  12. 12. Queues to the rescue? [Diagram labels: Incoming Requests, Process & Respond]
        1. If we synchronously respond, each item in the queue still ties up a worker. Doh!
        2. If we close the incoming connection and free the worker, then we need an asynchronous callback to respond to the request. Doh!
  13. 13. Observation 1: A synchronous web API is often much easier for developers to integrate because it avoids the additional complexity of callbacks.
        Implication: Responding to requests synchronously is often preferable to queuing the request and responding with an asynchronous callback.
  14. 14. Synchronous vs. Asynchronous Interfaces
        Take POST data from a web form, send it to a geo lookup API, store the result in the DB and return a status page to the user.

        Sync:
            d = read_form();
            geo = api->lookup(d);
            db->store(d, geo);
            return "success";

        Async:
            d = read_form();
            api->lookup(d);

            # in /geo-result
            db->store(d, geo);
            ws->send("success");

        The async interface needs a separate URL handler and a websocket connection to return the result.
  15. 15. Observation 2: For many APIs, taking additional time to service a request is better than failing that specific request.
        Implication: In many cases, it is better to service a request with some delay than to fail it.
  16. 16. Observation 3: It is better to fail some requests than all incoming requests.
        Implication: Under load, it may be better to selectively drop expensive requests that can't be serviced and allow others.
  17. 17. Event-driven programming and the Reactor Pattern
  18. 18. Thread/Worker Model

        Worker                          Time
        req = 'GET /';                  1
        req.append('\r\n\r\n');         1
        socket.write(req);              10000x
        resp = socket.read();           10000000x
        print(resp);                    10
  19. 19. Thread/Worker Model

        Worker                          Time
        req = 'GET /';                  1
        req.append('\r\n\r\n');         1
        socket.write(req);              10000x
        resp = socket.read();           10000000x
        print(resp);                    10

        Huge IO latency blocks the worker.
  20. 20. Event-based Programming
        Make IO operations async and "callback" when done.

            req = 'GET /';
            req.append('\r\n\r\n');
            socket.write(req, fn() {
              socket.read(fn(resp) {
                print(resp);
              });
            });
  21. 21. Reactor Dispatcher
        Central dispatch to coordinate event callbacks.

            req = 'GET /';
            req.append('\r\n\r\n');
            socket.write(req, fn() {
              socket.read(fn(resp) {
                print(resp);
              });
            });
            reactor.run_forever();
  22. 22. Non-blocking IO

        Code                            Time
        req = 'GET /';                  1
        req.append('\r\n\r\n');         1
        socket.write(req, fn() {        10
          socket.read(fn(resp) {        10
            print(resp);                10
          });
        });
        reactor.run_forever();

        No delay blocking the worker while it waits for IO.
  23. 23. Request Response Decoupling
        Using this approach we can decouple the socket of an incoming connection from the processing of that connection.

            req = 'GET /';
            req.append('\r\n\r\n');
            socket.write(req, fn() {
              socket.read(fn(resp) {
                print(resp);
              });
            });
            reactor.run_forever();
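The effect of non-blocking IO can be sketched with Python's standard-library asyncio event loop, used here as a stand-in for the reactor frameworks discussed in the talk (it is not one of the frameworks the slides name). The function names, the request count and the 50 ms delay are assumptions for illustration; `asyncio.sleep` stands in for a slow socket read.

```python
import asyncio
import time


async def handle_request(request_id, io_delay):
    # asyncio.sleep stands in for a slow socket read. While this coroutine
    # waits, the event loop is free to run other requests on the same thread.
    await asyncio.sleep(io_delay)
    return request_id


async def serve_all(n_requests, io_delay):
    tasks = [handle_request(i, io_delay) for i in range(n_requests)]
    # gather returns results in the order the coroutines were passed in.
    return await asyncio.gather(*tasks)


def run_demo(n_requests=100, io_delay=0.05):
    """Run n simulated requests on one thread and time the whole batch."""
    start = time.monotonic()
    results = asyncio.run(serve_all(n_requests, io_delay))
    elapsed = time.monotonic() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(len(results), "requests finished in", round(elapsed, 3), "seconds")
```

A blocking worker would need roughly 100 × 0.05 s to serve these requests one after another; because the waits overlap, the single-threaded loop finishes the batch in a small fraction of that.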
  24. 24. (Some) Reactor-Pattern Frameworks
        • c/libevent
        • c/libev
        • java/nio/netty
        • js/node.js
        • ruby/eventmachine (Goliath, Cramp)
        • python/twisted
        • python/gevent
  25. 25. Callback Spaghetti
        Example of callback nesting complexity with Python Twisted (also node.js).

            req = 'GET /'
            req += '\r\n\r\n'

            def r(resp):
                print(resp)

            def w(_):  # callbacks receive the result of the previous step
                socket.read().addCallback(r)

            socket.write().addCallback(w)
  26. 26. inlineCallbacks to the Rescue
        We can clean up the callbacks using deferred generators and inline callbacks (similar frameworks also exist for js).

            req = 'GET /'
            req += '\r\n\r\n'
            yield socket.write()
            resp = yield socket.read()
            print(resp)
  27. 27. Easy Sequential Programming
        Easy sequential programming with mostly implicit asynchronous IO.

            req = 'GET /'
            req += '\r\n\r\n'
            yield socket.write()
            resp = yield socket.read()
            print(resp)
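To show why a generator lets callback code read sequentially, here is a toy sketch of the idea behind inline callbacks. It is not Twisted's implementation: `Loop`, `FakeSocket` and `inline` are hypothetical names invented for this example, and the socket completes its operations through the loop rather than doing real IO.

```python
from collections import deque


class Loop:
    """Toy reactor: runs queued callbacks until none remain."""

    def __init__(self):
        self.ready = deque()

    def call_soon(self, fn, arg):
        self.ready.append((fn, arg))

    def run(self):
        while self.ready:
            fn, arg = self.ready.popleft()
            fn(arg)


class FakeSocket:
    """Hypothetical async socket whose operations finish via callbacks."""

    def __init__(self, loop, response):
        self.loop = loop
        self.response = response
        self.sent = []

    def write(self, data, callback):
        self.sent.append(data)
        self.loop.call_soon(callback, None)

    def read(self, callback):
        self.loop.call_soon(callback, self.response)


def inline(gen_func):
    """Drive a generator: each yielded value is a function that takes a callback."""
    def start(*args):
        gen = gen_func(*args)

        def step(value):
            try:
                operation = gen.send(value)  # resume until the next yield
            except StopIteration:
                return
            operation(step)  # resume again once the operation completes

        step(None)
    return start


@inline
def fetch(sock, out):
    req = "GET /" + "\r\n\r\n"
    yield lambda cb: sock.write(req, cb)
    resp = yield lambda cb: sock.read(cb)
    out.append(resp)


if __name__ == "__main__":
    loop = Loop()
    sock = FakeSocket(loop, "HTTP/1.0 200 OK")
    out = []
    fetch(sock, out)
    loop.run()
    print(out)
```

The body of `fetch` reads top to bottom like blocking code, yet every `yield` hands control back to the loop, which is the property the slide describes.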
  28. 28. (Some) Reactor-Pattern Frameworks
        • c/libevent
        • c/libev
        • java/nio/netty
        • js/node.js
        • ruby/eventmachine (Goliath, Cramp)
        • python/twisted
        • python/gevent
  29. 29. Event Python: gevent
        "gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop."

        Natively asynchronous:

            socket.write()
            resp = socket.read()
            print(resp)
  30. 30. gevent Example: Simple Echo Server
        Easy sequential model yet fully asynchronous.

            from gevent.server import StreamServer

            def echo(socket, address):
                print('New connection from %s:%s' % address)
                socket.sendall(b'Welcome to the echo server!\r\n')
                # Added: the slide uses fileobj without defining it.
                fileobj = socket.makefile(mode='rwb')
                line = fileobj.readline()
                fileobj.write(line)
                fileobj.flush()
                print("echoed %r" % line)

            if __name__ == '__main__':
                server = StreamServer(('0.0.0.0', 6000), echo)
                server.serve_forever()
  31. 31. gevent Example: Simple Echo Server
        However, gevent requires daemonization, logging and other servicification functionality for production use, such as Twisted's twistd.

            from gevent.server import StreamServer

            def echo(socket, address):
                print('New connection from %s:%s' % address)
                socket.sendall(b'Welcome to the echo server!\r\n')
                # Added: the slide uses fileobj without defining it.
                fileobj = socket.makefile(mode='rwb')
                line = fileobj.readline()
                fileobj.write(line)
                fileobj.flush()
                print("echoed %r" % line)

            if __name__ == '__main__':
                server = StreamServer(('0.0.0.0', 6000), echo)
                server.serve_forever()
  32. 32. Async Services with Ginkgo
        Ginkgo is a simple framework for composing asynchronous gevent services with common configuration, logging, daemonizing etc.
        https://github.com/progrium/ginkgo
        Let's look at a simple example that implements a TCP and HTTP server...
  33. 33. Ginkgo Example: Import WSGI/TCP Servers

            import gevent
            from gevent.pywsgi import WSGIServer
            from gevent.server import StreamServer
            from ginkgo.core import Service
  34. 34. Ginkgo Example: HTTP Handler

            import gevent
            from gevent.pywsgi import WSGIServer
            from gevent.server import StreamServer
            from ginkgo.core import Service

            def handle_http(env, start_response):
                start_response('200 OK', [('Content-Type', 'text/html')])
                print('new http request!')
                return [b"hello world"]
  35. 35. Ginkgo Example: TCP Handler

            import gevent
            from gevent.pywsgi import WSGIServer
            from gevent.server import StreamServer
            from ginkgo.core import Service

            def handle_http(env, start_response):
                start_response('200 OK', [('Content-Type', 'text/html')])
                print('new http request!')
                return [b"hello world"]

            def handle_tcp(socket, address):
                print('new tcp connection!')
                while True:
                    socket.send(b'hello\n')
                    gevent.sleep(1)
  36. 36. Ginkgo Example: Service Composition

            import gevent
            from gevent.pywsgi import WSGIServer
            from gevent.server import StreamServer
            from ginkgo.core import Service

            def handle_http(env, start_response):
                start_response('200 OK', [('Content-Type', 'text/html')])
                print('new http request!')
                return [b"hello world"]

            def handle_tcp(socket, address):
                print('new tcp connection!')
                while True:
                    socket.send(b'hello\n')
                    gevent.sleep(1)

            app = Service()
            app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
            app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
            app.serve_forever()
  37. 37. Toward a Fully Asynchronous API
        Using Ginkgo or another async framework, let's look at our web-worker architecture and see how we can modify it to become fully asynchronous. [Diagram: worker pool (W)]
  38. 38. The Old Way [Diagram labels: Incoming Requests, Load Balancer, Worker Pool (W), App Server, AAA, Throttling]
  39. 39. Step 1 - Let's start by replacing our threaded workers with asynchronous app servers. [Diagram labels: Incoming Requests, Load Balancer, Async Server ... Async Server]
  40. 40. Step 1 - Let's start by replacing our threaded workers with asynchronous app servers. Huzzah, now idle open connections will use very few server resources. [Diagram labels: Incoming Requests, Load Balancer, Async Server ... Async Server]
  41. 41. Step 2 – Define an authentication and authorization layer to identify the user and the resource requested. [Diagram labels: Incoming Requests, Load Balancer, AAA, Async Server ... Async Server]
  42. 42. AAA Manager
        Goal: Perform authentication, authorization and accounting for each incoming API request.
        Extract key parameters:
        • Account
        • Resource Type
  43. 43. Step 3 – Add a concurrency manager that determines whether to throttle each request. [Diagram labels: Incoming Requests, Load Balancer, AAA, Throttling, Async Server ... Async Server, Concurrency Manager]
  44. 44. Concurrency Manager
        Goal: Determine whether to delay or drop an individual request to limit access to API resources.
        Possible inputs:
        • By Account
        • By Resource Type
        • By Availability of Dependent Resources
  45. 45. Concurrency Manager
        What we've found useful:
        • Tuple (Account, Resource Type)
        Supports multi-tenancy:
        • Protection between accounts
        • Protection within an account between resource types, e.g., Calls & SMS
  46. 46. Concurrency Manager
        The concurrency manager returns one of:
        1. Allow the request immediately
        2. Delay the request before it is processed
        3. Drop the request and return an error: HTTP 429 - Concurrency Limit Reached
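A minimal sketch of the three-way decision, keyed on the (Account, Resource Type) tuple described earlier. The talk does not give the thresholds or the bookkeeping, so `max_active`, `max_waiting`, the `Decision` enum and the `finish` method are all assumptions made for illustration.

```python
from collections import defaultdict
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DELAY = "delay"
    DROP = "drop"  # the caller would respond with HTTP 429


class ConcurrencyManager:
    """Hypothetical per-(account, resource type) limiter."""

    def __init__(self, max_active, max_waiting):
        self.max_active = max_active    # assumed limit on in-flight requests
        self.max_waiting = max_waiting  # assumed limit on delayed requests
        self.active = defaultdict(int)
        self.waiting = defaultdict(int)

    def decide(self, account, resource_type):
        key = (account, resource_type)
        if self.active[key] < self.max_active:
            self.active[key] += 1
            return Decision.ALLOW
        if self.waiting[key] < self.max_waiting:
            self.waiting[key] += 1
            return Decision.DELAY
        return Decision.DROP

    def finish(self, account, resource_type):
        """Call when an active request completes; promotes one waiter if any."""
        key = (account, resource_type)
        if self.waiting[key] > 0:
            self.waiting[key] -= 1  # a delayed request takes over the free slot
        else:
            self.active[key] -= 1


if __name__ == "__main__":
    manager = ConcurrencyManager(max_active=2, max_waiting=1)
    for _ in range(4):
        print(manager.decide("AC1", "sms"))
    print(manager.decide("AC1", "calls"))
```

Because the counters are kept per tuple, a burst of SMS requests on one account reaches DROP without affecting that account's calls or any other account's SMS traffic.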
  47. 47. Step 4 – Provide for concurrency control between the servers and backend resources. [Diagram labels: Incoming Requests, Load Balancer, AAA, Throttling, Async Server ... Async Server, Concurrency Manager, Throttling, Dependent Services]
  48. 48. Conclusion 1: A synchronous web API is often much easier for developers to integrate because it avoids the additional complexity of callbacks.
        The proposed asynchronous API framework provides for synchronous API calls without worrying about worker pools filling up. It is also easy to add callbacks where needed.
  49. 49. Conclusion 2: For many APIs, taking additional time to service a request is better than failing that specific request.
        The proposed asynchronous API framework provides the ability to inject delay into the processing of incoming requests rather than dropping them.
  50. 50. Example of Delay Injection: Spread load across a longer time period. [Figure labels: Load, Latency]
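Delay injection can be sketched with a semaphore in front of the constrained resource. This uses the standard-library asyncio rather than gevent, and the names, burst size, concurrency limit and 50 ms work time are assumptions for illustration. Requests beyond the limit wait their turn instead of failing, so the burst is spread over a longer period.

```python
import asyncio
import time


async def call_backend(limit, work_time):
    # Requests beyond the limit wait here instead of failing.
    async with limit:
        await asyncio.sleep(work_time)  # stands in for the constrained resource


async def handle(limit, work_time, start):
    await call_backend(limit, work_time)
    return time.monotonic() - start  # latency seen by this request


async def burst(n_requests, max_concurrent, work_time):
    limit = asyncio.Semaphore(max_concurrent)
    start = time.monotonic()
    handlers = (handle(limit, work_time, start) for _ in range(n_requests))
    return await asyncio.gather(*handlers)


def run_burst(n_requests=6, max_concurrent=2, work_time=0.05):
    """Return the latency of each request in a burst that exceeds the limit."""
    return asyncio.run(burst(n_requests, max_concurrent, work_time))


if __name__ == "__main__":
    for latency in run_burst():
        print(round(latency, 3))
```

Every request in the burst completes; the later ones simply report higher latency, which is the trade the talk argues for.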
  51. 51. Conclusion 3: It is better to fail some incoming requests than to fail all requests.
        The proposed asynchronous API framework provides the ability to selectively drop requests to limit contention on limited resources.
  52. 52. Example of Dropping Requests: Drop only the requests that we must due to scarce backend resources. [Figure labels: Load, Latency /x, Dropped, Latency /*]
  53. 53. Summary: Async frameworks like gevent allow you to easily decouple a request from access to constrained resources. [Figure labels: Request Latency, Time, API outage]
