Building and Monitoring Services at Lithium

Paul Cichonski's presentation from the SF CloudOps Meetup on building and monitoring fault-tolerant systems (http://www.meetup.com/CloudOps/events/159397622/).

Published in: Technology, Business

  1. Building and Monitoring Services at Lithium (fault tolerance, resiliency and monitoring). Paul Cichonski, Senior Software Engineer, @paulcichonski
  2. Services at Lithium Use:
  3. Failure is a Constant; Need to Avoid Cascading Failure (image source: Netflix Hystrix, https://github.com/Netflix/Hystrix/wiki)
  4. We All Know How to Simulate Failure:
  5. But how do we develop code to deal with failure?
  6. Need to build fault-tolerant and resilient services... How? Clustering for high availability is not enough to protect against cascading failure
  7. #1 Fail Fast: use timeouts aggressively
  8. #2 Use circuit breakers on network calls
  9. #3 Use async communication when possible
  10. #4 Have well-thought-out backpressure mechanisms
  11. #5 Use cross-region (or cross-datacenter) replication
  12. #6 Failure models should be built into the business requirements of a service
  13. Read:
  14. Even with all of that, your app will still fail, so how do you recover quickly?
  15. DevOps/CloudOps Model: OODA
  16. Observe and Orient: you need metrics and dashboards
  17. You Need Metrics
        • Reduce “map/territory” confusion
        • We use Yammer Metrics: timers, meters, histograms
        • We use them a lot: every class has at least one metric, most have multiple
  18. You Need to Visualize the Metrics
  19. You Need Dashboards Keyed to Business Functionality
  20. Use alerting as a last resort (because sometimes we need to sleep)
  21. Decide and Act: you need robust CI and fast code roll-outs
  22. Rinse and Repeat
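
The slides name "fail fast" and "circuit breakers on network calls" without showing code. A minimal Python sketch of the idea follows. It is not the code used at Lithium and not the Hystrix API; the class name `CircuitBreaker`, its parameters, and the exception `CircuitOpenError` are hypothetical names chosen for illustration. It assumes the wrapped function enforces its own aggressive timeout and raises an exception when that timeout is hit.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Illustrative breaker: closed -> open after repeated failures,
    then a single trial call once reset_timeout seconds have passed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: do not touch the struggling dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Otherwise the reset timeout has elapsed: let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each network call, for example `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is a hypothetical function that sets a short timeout on its own socket or HTTP request. Once the breaker is open, callers get `CircuitOpenError` immediately instead of waiting on a slow dependency, which is what keeps one failing service from tying up every upstream thread.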

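Slide 10 asks for well-thought-out backpressure but does not describe a mechanism. One common approach, sketched here in Python as an assumption rather than as what the talk showed, is a bounded queue that rejects new work when full, so that overload surfaces at the producer instead of growing an unbounded backlog. `BoundedInbox` and its methods are hypothetical names.

```python
import queue


class BoundedInbox:
    """Illustrative bounded work queue that sheds load when full."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0  # count of items turned away, useful as a metric

    def offer(self, item):
        """Try to enqueue without blocking; return False when the queue is full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest item; raises queue.Empty if none."""
        return self._queue.get_nowait()
```

A producer that gets `False` back from `offer` can retry later, drop the item, or return an error to its own caller. Which of those is right depends on the service, which is one reason the slides argue that failure models belong in the business requirements.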