Hopperx1 Seattle 2019 - Don't Let Clients Get You Down

  1. Don't Let Clients Get You Down: Configuring Fault-Tolerant Clients in Resilient Systems. Clare Liguori, Principal Engineer, Amazon Web Services. #Hopperx1Seattle
  2. Clients! Clients!
  3. LB
  4. LB =(
  5. LB =(
  6. LB =(
  7. X X LB =(
  8. Retry, Timeout, Backoff
  9. Retry, Timeout, Backoff
  10. LB =( If at first you don’t succeed
  11. LB =( Retry!
  12. $ npm install npm ERR! fetch failed git+ssh://...
  13. ~/.ssh/config: Host * ConnectionAttempts 3
  14. LB =( =( =( 10 Retries = 100 Requests x 10 Retries. Set sane retry limits!
  15. LB =( Only retry on idempotent APIs. Pizzas ordered: 0 =(
  16. LB =( Only retry on idempotent APIs. Pizzas ordered: 0 =( Try #1: Connection drops
  17. LB =( Only retry on idempotent APIs. Pizzas ordered: 0 =( Try #2: Connection drops
  18. LB =( Only retry on idempotent APIs. Pizzas ordered: 1 =( Try #3: Success!
  19. LB Only retry on idempotent APIs. Pizzas ordered: 3. Uh oh, those servers are healthy again
  20. Retry, Timeout, Backoff
  21. LB
  22. connection timeout, socket timeout, read timeout, write timeout, individual request timeout, overall timeout including retries
  23. Time ThreadsInUse =( =) Max Threads
  24. $ curl https://checkip.amazonaws.com
  25. $ curl --retry 3 --connect-timeout 10 --max-time 20 https://checkip.amazonaws.com
  26. Retry, Timeout, Backoff
  27. LB =( =( =( 10 Retries = 100 Requests x 10 Retries
  28. sleep(2)
  29. sleep(0.5) sleep(1) sleep(2) sleep(4) sleep(8)
  30. sleep(0.5) sleep(1) sleep(2) sleep(4) sleep(8) sleep(0.5) sleep(1) sleep(2) sleep(4) sleep(8) sleep(0.5) sleep(1) sleep(2) sleep(4) sleep(8) sleep(0.5) sleep(1) sleep(2) sleep(4) sleep(8)
  31. Time Requests =( =) =( =( =( =( =( =) =) =) =)
  32. sleep(0.6) sleep(1.1) sleep(2.1) sleep(4.1) sleep(8.1) sleep(0.7) sleep(1.2) sleep(2.2) sleep(4.2) sleep(8.2) sleep(0.8) sleep(1.3) sleep(2.3) sleep(4.3) sleep(8.3) sleep(0.9) sleep(1.4) sleep(2.4) sleep(4.4) sleep(8.4) Add jitter!
  33. Time Requests =) =) =) =) =) =) =) =) =) =) =) =) =) =) =) =) =) =) =) =) =) =)
  34. $ npm install retry-cli $ retry --retries=3 --factor=2 --randomize -- curl --connect-timeout 10 https://checkip.amazonaws.com
  35. $ npm install retry-cli $ retry --retries=3 --factor=2 --randomize -- curl --connect-timeout 10 https://checkip.amazonaws.com Retry, Exponential Backoff, Backoff Jitter, Timeout
  36. Retry, Timeout, Backoff
  37. Thank you!
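
The backoff slides (the `sleep(0.5) sleep(1) sleep(2) ...` sequences and the "Add jitter!" variant) can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `backoff_delays` and its parameters are hypothetical, and the jitter range of 0 to 0.4 seconds is an assumption chosen to roughly match the offsets shown on the jitter slide.

```python
import random


def backoff_delays(retries=5, base=0.5, factor=2.0, max_jitter=0.4, rng=random):
    """Return the number of seconds to sleep before each retry.

    Delay i is base * factor**i (exponential backoff) plus a random
    amount in [0, max_jitter] (jitter), so clients that fail at the
    same moment do not all retry at the same moment.
    All names and defaults here are illustrative assumptions.
    """
    return [base * factor ** i + rng.uniform(0, max_jitter) for i in range(retries)]


# Two clients with independent random sources get different schedules.
client_a = backoff_delays(rng=random.Random(1))
client_b = backoff_delays(rng=random.Random(2))
```

Calling `backoff_delays(max_jitter=0)` yields the plain exponential schedule of 0.5, 1, 2, 4, 8 seconds; with jitter enabled, each client's delays drift apart slightly, which is what spreads out the request load.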

Editor's Notes

  • Intro: I’m Clare
    Why do clients matter to resiliency?
  • Example multi-tier system with microservices
  • There are lots of clients in this system
  • Drill into service: load balancer + multiple servers
  • What happens when a single server has problems?
  • Hopefully something like a health check realizes it's down and removes it from the load balancer
  • BUT until then... clients are still going to that server!
  • And the failure cascades all the way up the system!
    So, how can we configure clients to be resilient to failures, so that our entire system is resilient?
  • Today we’ll talk about three important aspects of client resiliency
  • Let’s start with retries
  • Retries are self-explanatory
  • Retrying likely gets the request directed to a healthy node
  • Real world example: Installing npm modules during a cron job
  • Enable retries in SSH
  • Set retry limits (3-5), don’t make a bad situation worse
  • What happens when an API doesn’t de-duplicate requests?
  • For example, if a network connection drops, you don’t have confirmation of whether the request is still alive on that server
  • The servers may have just been processing requests slowly, but eventually complete them. Idempotent APIs will have a field like “idempotency token” or “retry token” where you can provide a token that signifies only ONE request with this token should be processed.
  • Timeouts give you an opportunity to retry in case of networking or load problems where requests are very slow. Timeouts also ensure slow requests don't hog all your threads
  • There are lots of different types of timeouts; how much control you get over each of them varies by client library
  • Favorite is socket read timeout: many systems have a default timeout of infinity (FOREVER!). Can result in hung threads that stack up over time
  • Real world example: curl script would randomly hang forever
  • Configure timeouts and retries
  • Too many retries can get you throttled, so backoff lets you retry only the number of times you need to
  • Simplest backoff, so you’re not hammering the servers
  • Better: exponential backoff: delay more every time you retry
  • What happens if every client backs off the same amount?
  • They all cause high load at the same time
  • Add jitter: each client randomizes the amount they delay a little bit
  • Smooths out the request load
  • Use retry-cli to get exponential backoff + randomized delay (i.e. jitter)
  • We’ve covered retries, timeouts, and backoffs. Go inspect your clients!
  • Questions?
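
The notes on retry limits, idempotency tokens, and the overall timeout including retries can be combined into one small sketch. Everything named here is hypothetical and only for illustration: `TransientError` stands in for a dropped connection or timeout, `PizzaServer` is a toy server that de-duplicates by token, and `call_with_retries` is an assumed helper, not an API from any real client library.

```python
import time
import uuid


class TransientError(Exception):
    """Hypothetical stand-in for a dropped connection or a timeout."""


class PizzaServer:
    """Toy server that de-duplicates orders by idempotency token."""

    def __init__(self, failures):
        self.failures = failures  # how many responses get lost
        self.orders = {}

    def order(self, token):
        # The order is processed even when the response is lost,
        # but a repeated token never creates a second order.
        self.orders[token] = "pizza"
        if self.failures > 0:
            self.failures -= 1
            raise TransientError("connection dropped after processing")
        return len(self.orders)


def call_with_retries(send, max_attempts=3, overall_timeout=20.0,
                      delays=(0.5, 1.0, 2.0), sleep=time.sleep,
                      clock=time.monotonic):
    """Call send(token) with a retry limit and an overall deadline."""
    token = str(uuid.uuid4())  # the same token is reused on every attempt
    deadline = clock() + overall_timeout
    for attempt in range(max_attempts):
        try:
            return send(token)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retry limit reached
            delay = delays[min(attempt, len(delays) - 1)]
            if clock() + delay > deadline:
                raise  # overall timeout, including retries, would be exceeded
            sleep(delay)


server = PizzaServer(failures=2)
pizzas = call_with_retries(server.order, sleep=lambda seconds: None)
```

In this sketch the first two attempts are processed by the server but their responses are lost; because every attempt carries the same token, only one pizza is recorded. Without the token, the same three attempts would have produced three orders.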
