Donatas Mažionis, Building low latency web APIs

This talk is not a hardcore latency talk
I will not talk about:
•CPU caches
•System.nanoTime
•lockless concurrent queues
•magic low latency framework

Scaling from 500 to 150K QPS, the hard way

Latency
A measure of how long something took

http://www.techempower.com/benchmarks/#section=data-r9&hw=peak&test=json
Typical latency benchmark on the internet

Why is the average a common metric?
•Everyone understands it
•It’s easy to calculate
•It can also hide important unwanted behaviour of the system!

Imagine we have a service
with the following response latencies

Calculating latency average
20% of the requests had a latency of about twice the average, above 10 ms
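
A minimal sketch of how an average can hide a slow tail. The sample below is made up for illustration (it is not the data from the slide), and the 10 ms threshold is only used as a cut-off for "slow".

```python
from statistics import mean

# Hypothetical sample, not the slide's data:
# 80 fast requests at 2 ms and 20 slow requests at 12 ms.
latencies_ms = [2.0] * 80 + [12.0] * 20

average = mean(latencies_ms)
# Share of requests slower than the 10 ms cut-off.
slow_share = sum(1 for v in latencies_ms if v > 10.0) / len(latencies_ms)

print(f"average: {average:.1f} ms")                        # average: 4.0 ms
print(f"share of requests above 10 ms: {slow_share:.0%}")  # share of requests above 10 ms: 20%
```

The average looks healthy at 4 ms even though one request in five takes three times that long.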

Percentiles
The value below which a given percentage of observations in a group of observations fall
For example, p50 is the largest value among the lowest 50% of the observations
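
One common way to compute this is the nearest-rank method, sketched below. The function name `percentile` and the sample data are assumptions for illustration; other percentile definitions interpolate between values and give slightly different results.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest observation that has
    at least p percent of all observations at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample: 80 requests at 2 ms, 20 requests at 12 ms.
latencies_ms = [2.0] * 80 + [12.0] * 20
p50 = percentile(latencies_ms, 50)  # 2.0
p99 = percentile(latencies_ms, 99)  # 12.0
```

Here p50 says the typical request is fast, while p99 exposes the slow tail.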

Libraries for tracking latencies
HdrHistogram: http://hdrhistogram.github.io/HdrHistogram/
Records values with a fixed memory footprint and constant CPU cost. Available for C and Java; a C# port is a work in progress.
Finagle: https://twitter.github.io/finagle/
An RPC framework for Scala and Java by Twitter, with built-in stats and latency tracking.
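
The idea of recording latencies in fixed memory with constant work per sample can be illustrated with a plain bucketed histogram. This is a deliberately simplified sketch and not HdrHistogram's actual algorithm or API; the class name `FixedHistogram`, the 1 ms bucket width and the 1000 ms cap are all assumptions.

```python
import math


class FixedHistogram:
    """Latency histogram with a fixed set of 1 ms buckets (hypothetical helper)."""

    def __init__(self, max_ms=1000):
        self.max_ms = max_ms
        self.counts = [0] * (max_ms + 1)  # allocated once, never grows
        self.total = 0

    def record(self, latency_ms):
        # Constant work per sample: clamp into range, bump one counter.
        bucket = min(max(int(latency_ms), 0), self.max_ms)
        self.counts[bucket] += 1
        self.total += 1

    def percentile(self, p):
        """Lower bound (in ms) of the bucket holding the p-th percentile."""
        if self.total == 0:
            raise ValueError("no samples recorded")
        target = max(1, math.ceil(p * self.total / 100))
        seen = 0
        for bucket, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return bucket
        return self.max_ms


hist = FixedHistogram()
for value in [2.0] * 80 + [12.0] * 20:  # hypothetical sample
    hist.record(value)
```

With this sample, `hist.percentile(50)` returns 2 and `hist.percentile(99)` returns 12, without ever storing the individual samples.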

APIs in online advertising
98% of requests under 100 ms

HTTP
JSON
Protocol Buffers

Real-time bidding API
How much would you pay to show your 200x120 ad on youtube.com to a user from Belgium who is interested in Sports and Culture?

Real-time bidding request processing
1.Deserialize request
2.Process some rules
3.Get pre-calculated bid price from storage
4.Calculate some more
5.Serialize response
Processing budget: 60 ms
The remaining 40 ms is left for network latency
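
The five steps can be sketched as a single handler that gives up when the processing budget is spent. Everything concrete here is hypothetical: the function name `handle_bid_request`, the JSON field names, the toy rule, the price formula and the dictionary standing in for storage. The 60 ms budget is an assumption derived from the figures above.

```python
import json
import time

PROCESSING_BUDGET_S = 0.060  # assumed: 100 ms target minus 40 ms for the network


def handle_bid_request(raw_body, price_store, clock=time.monotonic):
    """Run the five steps; return a JSON response body, or None for 'no bid'."""
    deadline = clock() + PROCESSING_BUDGET_S

    request = json.loads(raw_body)                 # 1. deserialize request
    if request["width"] * request["height"] == 0:  # 2. process some rules (toy rule)
        return None
    key = (request["site"], request["country"])
    base_price = price_store.get(key)              # 3. pre-calculated price from storage
    if base_price is None or clock() > deadline:
        return None                                # no price, or out of budget: no bid
    # 4. calculate some more (made-up formula: +10% per matching interest)
    bonus = 1.0 + 0.1 * len(request.get("interests", []))
    price = round(base_price * bonus, 4)
    return json.dumps({"bid": price})              # 5. serialize response


store = {("youtube.com", "BE"): 0.5}  # hypothetical pre-calculated prices
body = json.dumps({
    "width": 200, "height": 120, "site": "youtube.com",
    "country": "BE", "interests": ["Sports", "Culture"],
})
response = handle_bid_request(body, store)
```

Passing the clock in as a parameter keeps the budget check testable without real delays.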

LVS + keepalived
Profiler API
User profiles
Bid price calculators
Bidder API
Ad serving

Redis in 50 words or less
Redis is an open source, BSD licensed, advanced key-value cache and store.

Redis as key-value store
•Append-only file writes, flushed to disk every second
•Operations on multiple keys
•Works great, but watch out when writing/reading on the same node simultaneously

Redis latencies
Simultaneous writes and reads on the same node

Cassandra in 50 words or less
Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database

Why Cassandra is good
•Fast writes
•User profile is a natural key-value model
•Easy to scale (especially with virtual nodes)
•All the nice features mentioned before
•Seemed the most mature at the time (we started with v0.7)
•Runs on spare legacy hardware
•Runs on Windows :)

Why Cassandra is not so good
GC pauses

Cassandra tuning tricks that worked
•LeveledCompactionStrategy
•Changing Java heap size (8 GB)
•Reading directly from the node that owns the data (token-aware strategy)

Cassandra tuning tricks that did not work
GC tuning
20% of requests exceeding 40 ms

Connecting to Cassandra
Thrift version

Fail fast plan
1.Set the TSocket timeout to 10 ms
2.If a node does not answer within 10 ms, try another node from the same token range
3.Repeat up to 3 times
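
The plan above can be sketched as a small retry loop. This is an illustration in Python, not the Thrift or .NET code from the talk: `fail_fast_read` and the `fetch(node, timeout_s)` callable are hypothetical names, `fetch` is assumed to raise `TimeoutError` when a node is too slow, and "3 times" is read here as at most three attempts in total.

```python
def fail_fast_read(nodes, fetch, timeout_s=0.010, max_attempts=3):
    """Ask replicas in turn, giving each at most timeout_s to answer.

    nodes -- replicas that own the key (same token range), in preference order
    fetch -- hypothetical callable fetch(node, timeout_s) that returns the
             data or raises TimeoutError when the node is too slow
    """
    last_error = None
    for node in nodes[:max_attempts]:
        try:
            return fetch(node, timeout_s)
        except TimeoutError as exc:
            last_error = exc  # this node was too slow; move on to the next replica
    tried = min(len(nodes), max_attempts)
    raise TimeoutError(f"no answer from {tried} node(s)") from last_error
```

The worst case is bounded: with a 10 ms timeout and three attempts, a read either succeeds or fails after roughly 30 ms.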

Timeouts in .NET are broken
•.NET Socket send and receive timeouts (SendTimeout, ReceiveTimeout) do not work for values below 500 ms
•The same applies to SocketAsyncEventArgs
•The async version is even worse (timer queues, etc.)

The thing that worked
Socket.Poll(int microSeconds, SelectMode mode) blocks until data is available or the timeout occurs
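
The same blocking-wait-with-timeout pattern can be illustrated in Python with `select.select`, which plays a role similar to polling a socket for readability. This is an analogous sketch and not the .NET code from the talk; the helper name `recv_with_deadline` is hypothetical.

```python
import select
import socket


def recv_with_deadline(sock, timeout_s, max_bytes=65536):
    """Block until sock is readable or timeout_s elapses.

    Returns the received bytes, or None if nothing arrived in time.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None  # timed out: the caller can fail fast and try another node
    return sock.recv(max_bytes)


# Local demonstration with a connected socket pair (no network needed).
left, right = socket.socketpair()
nothing_yet = recv_with_deadline(left, 0.010)  # None: no data was sent
right.sendall(b"pong")
answer = recv_with_deadline(left, 1.0)         # b"pong"
left.close()
right.close()
```

The wait is bounded by the timeout passed in, so a 10 ms limit is expressed directly rather than through a socket-level timeout setting.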

Blocking is not always bad
•Timeout rate stays between 0 and 2%
•Scale by adding new servers

Or scale by adding fewer servers
•Cassandra is not very good at deterministic low latencies
•We switched to Aerospike: same QPS, half as many servers, p99 for reads <= 10 ms
•The whole story here: “Married to Cassandra” http://vimeo.com/101290545

Takeaways
•Don’t measure latency averages
•It’s expensive to scale in .NET:
  •No decent Cassandra library, so you have to roll your own (while Java devs are having fun with Astyanax, the DataStax driver, etc.)
  •Even after we rewrote our WCF-based bidder on top of HttpListener (saving 10% CPU), Netty throughput is still 15% better
•Finagle is a great framework

Takeaways
•Blocking is not always bad, measure
•Choose the right NoSQL(s) for the job
