Donatas Mažionis, Building low latency web APIs

Published in: Technology

  1. 1. This talk is not a hardcore latency talk. I will not talk about: •CPU caches •System.nanoTime •lockless concurrent queues •a magic low-latency framework
  2. 2. This talk is not a hardcore latency talk. Scaling from 500 to 150K QPS, the hard way
  3. 3. Latency: a measure of how long something took
  4. 4. http://www.techempower.com/benchmarks/#section=data-r9&hw=peak&test=json Typical latency benchmark on the internet
  5. 5. Why is the average a common metric? •Everyone understands it •It’s easy to calculate
  6. 6. Why is the average a common metric? •Everyone understands it •It’s easy to calculate •It can also hide important unwanted behaviour of the system!
  7. 7. Imagine we have a service with the following response latencies
  8. 8. Calculating latency average
  9. 9. Calculating latency average: 20% of the requests had latency about twice as high, above 10 ms
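
A minimal sketch of how an average can hide slow requests. The sample latencies below are hypothetical and are not the numbers from the slide; they are only chosen so that 20% of the requests are slow.

```python
from statistics import mean

# Hypothetical sample: 8 fast requests and 2 slow ones (values in ms).
latencies_ms = [2, 2, 3, 3, 2, 3, 2, 3, 20, 25]

# The average looks moderate even though some requests are very slow.
average = mean(latencies_ms)

# Share of requests that took longer than 10 ms.
slow_share = sum(1 for v in latencies_ms if v > 10) / len(latencies_ms)

print(f"average: {average:.1f} ms")
print(f"share of requests above 10 ms: {slow_share:.0%}")
```

The average alone gives no hint that one request in five is several times slower than the rest.
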
  10. 10. Percentiles: the value below which a given percentage of observations in a group of observations fall. For example, p50 = the maximum of the lowest 50% of the values
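
The definition above can be sketched with the nearest-rank method, which is one common way to compute percentiles (other methods interpolate between values). The function name and the sample data are assumptions made for illustration.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observed value such that
    at least p% of the observations are less than or equal to it."""
    if not values:
        raise ValueError("no observations")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical latencies in ms.
latencies_ms = [2, 2, 3, 3, 2, 3, 2, 3, 20, 25]
for p in (50, 90, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

With this sample the median stays low while the higher percentiles expose the slow requests.
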
  11. 11. Percentiles
  12. 12. Percentiles in real life
  13. 13. Libraries for tracking latencies. HdrHistogram: http://hdrhistogram.github.io/HdrHistogram/ uses fixed memory and constant CPU for recording (C, Java; C# work in progress). Finagle: https://twitter.github.io/finagle/ is a Scala and Java RPC framework by Twitter with built-in stats and latency tracking.
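
To illustrate the "fixed memory, constant CPU per recording" idea, here is a deliberately simplified sketch with 1 ms buckets. It is not HdrHistogram's algorithm or API; the class name `BucketHistogram` and its parameters are hypothetical.

```python
import math

class BucketHistogram:
    """Fixed-memory latency histogram with 1 ms buckets (hypothetical helper)."""

    def __init__(self, max_ms=1000):
        self.max_ms = max_ms
        self.counts = [0] * (max_ms + 1)  # all memory is allocated up front
        self.total = 0

    def record(self, latency_ms):
        # O(1): clamp to the tracked range and bump one counter.
        index = min(max(int(latency_ms), 0), self.max_ms)
        self.counts[index] += 1
        self.total += 1

    def percentile(self, p):
        # Walk the buckets until p% of the recorded values are covered.
        if self.total == 0:
            raise ValueError("no observations")
        target = math.ceil(p * self.total / 100)
        seen = 0
        for value_ms, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return value_ms
        return self.max_ms

hist = BucketHistogram(max_ms=200)
for v in [2, 2, 3, 3, 2, 3, 2, 3, 20, 25]:  # hypothetical latencies in ms
    hist.record(v)
print(hist.percentile(50), hist.percentile(99))
```

Recording never sorts or stores individual samples, so the cost per request stays flat regardless of how many values have been recorded.
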
  14. 14. APIs in online advertising
  15. 15. APIs in online advertising 98% of requests under 100 ms
  16. 16. APIs in online advertising 98% of requests under 100 ms HTTP
  17. 17. APIs in online advertising 98% of requests under 100 ms HTTP JSON
  18. 18. APIs in online advertising 98% of requests under 100 ms HTTP JSON Protocol Buffers
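
A requirement such as "98% of requests under 100 ms" can be checked directly against recorded latencies. This is a minimal sketch; the function name, defaults, and sample data are assumptions for illustration.

```python
def meets_sla(latencies_ms, threshold_ms=100, required_share=0.98):
    """Return True if at least required_share of the requests
    finished in under threshold_ms."""
    if not latencies_ms:
        raise ValueError("no observations")
    under = sum(1 for v in latencies_ms if v < threshold_ms)
    return under / len(latencies_ms) >= required_share

# Hypothetical samples: 2 slow requests out of 100 pass, 3 out of 100 fail.
print(meets_sla([50] * 98 + [150] * 2))
print(meets_sla([50] * 97 + [150] * 3))
```
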
  19. 19. Real-time bidding API: How much would you pay to show your 200x120 ad on youtube.com to a user from Belgium who is interested in Sports and Culture?
  20. 20. Real-time bidding API
  21. 21. Real-time bidding request processing: 1. Deserialize request 2. Process some rules 3. Get pre-calculated bid price from storage 4. Calculate some more 5. Serialize response. 40 ms for network latency. [Diagram labels: 40 ms, 60 ms, all the rest]
  22. 22. LVS + keepalived Profiler API User profiles Bid price calculators Bidder API Ad serving
  23. 23. Redis in 50 words or less Redis is an open source, BSD licensed, advanced key-value cache and store.
  24. 24. Redis as key-value store •Append write, flush every second •Operations on multiple keys •Works great, but watch out when writing/reading on the same node simultaneously
  25. 25. Redis latencies Simultaneous writes and reads on the same node
  26. 26. Cassandra in 50 words or less Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database
  27. 27. Why Cassandra is good •Fast writes •User profile is a natural key-value model •Easy to scale (especially with virtual nodes) •Seemed the most mature at that time (we started using it from v0.7) •Runs on spare legacy hardware •Runs on Windows :)
  28. 28. Why Cassandra is good •Fast writes •User profile is a natural key-value model •All nice features mentioned before •Seemed the most mature at that time (we started using it from v0.7) •Runs on spare legacy hardware •Runs on Windows :)
  29. 29. Why Cassandra is not so good
  30. 30. Why Cassandra is not so good GC pauses
  31. 31. Cassandra tuning tricks that worked •LeveledCompactionStrategy •Changing Java heap size (8 GB) •Client direct read of data (token aware strategy)
  32. 32. Cassandra tuning tricks that did not work GC tuning
  33. 33. Cassandra tuning tricks that did not work GC tuning 20% of requests exceeding 40 ms
  34. 34. Connecting to Cassandra Thrift version
  35. 35. Fail fast plan: 1. Set a TSocket timeout to 10 ms 2. If a node does not answer within 10 ms, try another from the same range 3. Repeat this 3 times
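
The fail fast plan can be sketched as a loop over replicas with a short per-attempt timeout. This is an illustration only: `read_from` is a hypothetical callable standing in for the real client call, the node names are made up, and "repeat this 3 times" is interpreted here as at most three attempts in total.

```python
def read_with_fail_fast(replicas, read_from, timeout_s=0.010, max_attempts=3):
    """Try up to max_attempts replicas, giving each at most timeout_s seconds.

    read_from(node, timeout_s) is a hypothetical function that returns the
    data or raises TimeoutError when the node is too slow.
    """
    last_error = None
    for node in replicas[:max_attempts]:
        try:
            return read_from(node, timeout_s)
        except TimeoutError as exc:
            last_error = exc  # slow node: move on to the next replica
    raise TimeoutError(
        f"no replica answered within {timeout_s * 1000:.0f} ms"
    ) from last_error

# Stand-in for a real read: the first node is slow, the others answer.
def fake_read(node, timeout_s):
    if node == "node-a":
        raise TimeoutError(node)
    return f"profile from {node}"

print(read_with_fail_fast(["node-a", "node-b", "node-c"], fake_read))
```

The worst case is bounded at roughly `max_attempts * timeout_s`, which is what makes the plan fit into a fixed latency budget.
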
  36. 36. Timeouts in .NET are broken •.NET Socket SendTimeout/ReceiveTimeout do not work for values less than 500 ms •The same applies to SocketAsyncEventArgs •The async version is even worse (timer queues, etc.)
  37. 37. The thing that worked: Socket.Poll(int microseconds, SelectMode mode) lets you block until data is available or the timeout occurs
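
The same "block until readable or timeout" pattern, sketched in Python rather than .NET. Python's standard `select.select` is used here as the analogue of `Socket.Poll`; the helper name `recv_with_timeout` and the local socket pair demo are assumptions for illustration.

```python
import select
import socket

def recv_with_timeout(sock, max_bytes, timeout_s):
    """Block until sock is readable or timeout_s elapses.
    Returns the received bytes, or None on timeout."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None
    return sock.recv(max_bytes)

# Demo on a local socket pair with a 10 ms timeout.
left, right = socket.socketpair()
print(recv_with_timeout(right, 1024, 0.010))  # nothing has been sent yet
left.sendall(b"bid")
print(recv_with_timeout(right, 1024, 0.010))  # data is now available
left.close()
right.close()
```

The calling thread blocks for at most the given timeout, so the caller can move on to another node as soon as the deadline passes.
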
  38. 38. Blocking is not always bad •Timeouts between 0 and 2% •Scale by adding new servers
  39. 39. Or scale by adding fewer servers •Cassandra is not very good at deterministic low latencies •We switched to Aerospike: same number of QPS, half as many servers, p99 for reads <= 10 ms •The whole story is here: “Married to Cassandra” http://vimeo.com/101290545
  40. 40. Takeaways •Don’t measure latency averages •It’s expensive to scale in .NET: •No decent Cassandra library, so you have to roll your own (while Java devs are having fun with astyanax, the datastax driver, etc.) •Even though we rewrote our WCF-based bidder on HttpListener (saving 10% CPU), netty throughput is still 15% better •Finagle is a great framework
  41. 41. Takeaways •Blocking is not always bad, measure •Choose the right NoSQL(s) for the job
  42. 42. Thank you!
