PAIN-FREE PIPELINING
John Loehrer
Joya Communications
efficient Redis network i/o
WHAT IS PIPELINING?
Single commands:
• Client: INCR X
• Server: 1
• Client: INCR X
• Server: 2
• Client: INCR X
• Server: 3
Pipelined:
• Client: INCR X
• Client: INCR X
• Client: INCR X
• Server: 1
• Server: 2
• Server: 3
SINGLE VS. PIPELINE
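To make the difference concrete, here is a minimal sketch that replays the two exchanges against an in-memory stand-in and counts round trips. `FakeServer` and its `send` method are hypothetical names invented for illustration; no real Redis server or client library is involved.

```python
class FakeServer:
    """In-memory stand-in for a Redis server that counts network round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def _incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def send(self, commands):
        """Handle one request carrying one or more (name, key) commands."""
        self.round_trips += 1  # one request/response exchange, however many commands
        return [self._incr(key) for name, key in commands if name == "INCR"]


# Single commands: three requests, three round trips.
single = FakeServer()
single_replies = [single.send([("INCR", "X")])[0] for _ in range(3)]

# Pipelined: the same three commands travel in one request.
pipelined = FakeServer()
pipelined_replies = pipelined.send([("INCR", "X")] * 3)

print(single_replies, single.round_trips)        # [1, 2, 3] 3
print(pipelined_replies, pipelined.round_trips)  # [1, 2, 3] 1
```

The replies are identical in both cases; only the number of trips across the network changes.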
WHY PIPELINE?
RTT
COMPARE THROUGHPUT
Single commands vs Pipelined
• single: 17k req/s
• pipelined: 260k req/s
QUICK BENCHMARK
redis-benchmark -t get -c 1
SO WHAT?
fast enough without pipelining
LOOPBACK != PROD
need to simulate real network conditions
TOXIPROXY THROTTLING
https://github.com/Shopify/toxiproxy
CONFIGURE TOXIPROXY
• toxiproxy-cli create redis -l localhost:26379 -u localhost:6379
• toxiproxy-cli toxic add redis -t latency -a latency=1
• single: 216 req/s
• pipelined: 4300 req/s
NETWORK BENCHMARK
1 ms throttle
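For a rough sense of scale, the quoted throughput figures can be turned into speedup ratios. This sketch only does arithmetic on the numbers from the slides; the label "loopback" for the first benchmark is an assumption based on the "LOOPBACK != PROD" slide.

```python
# Throughput figures quoted on the slides, in requests per second.
benchmarks = {
    "loopback": {"single": 17_000, "pipelined": 260_000},   # label is assumed
    "1 ms latency": {"single": 216, "pipelined": 4_300},
}

# Ratio of pipelined to single-command throughput for each setup.
speedups = {
    name: round(rates["pipelined"] / rates["single"], 1)
    for name, rates in benchmarks.items()
}
print(speedups)  # {'loopback': 15.3, '1 ms latency': 19.9}
```

The ratio is similar in both setups, but with added latency the absolute single-command rate collapses from thousands of requests per second to a few hundred.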
[Bar chart: localhost vs. production, split into CPU and Network; axis from 0 to 60]
LOW-HANGING FRUIT
Optimize CPU or Network?
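The question can be answered with a quick calculation. This sketch uses the illustrative (made-up) figures from the editor's note for this slide: 5 ms of computation in both environments, with 1 ms spent on Redis calls on localhost and 50 ms in production. The helper `network_share` is a hypothetical name.

```python
def network_share(cpu_ms, network_ms):
    """Fraction of total request time spent waiting on the network."""
    return network_ms / (cpu_ms + network_ms)


localhost = network_share(cpu_ms=5, network_ms=1)    # 6 ms total
production = network_share(cpu_ms=5, network_ms=50)  # 55 ms total

print(f"{localhost:.0%} {production:.0%}")  # 17% 91%
```

With those numbers, even eliminating all CPU time in production would save under a tenth of the request, while cutting round trips attacks the dominant cost.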
OPTIMIZE RTT
mitigate cost of network round-trips
PIPELINES FTW!
BUT …
PREPARE FOR THE PAIN
GIVE UP ENCAPSULATION?
REFACTORING HURTS
pipelining api is different from single calls
WHY NOT ASYNC?
• one connection for each command
• requires different programming approach
ASYNC I/O ISSUES
MY PARADIGM SHIFT
pipeline everything
LEARN FROM MY MISERY
After years of effort …
Reference Implementation
redpipe.readthedocs.io
FAMILIAR API
RedPipe works almost like redis-py
HOW IS IT DIFFERENT?
• redis-py pipelines make you wait till payday
• RedPipe gives you instant credit
REDPIPE RETURNS FUTURES
WHAT’S THE POINT?
Code Reuse & Flexibility
• assign data before you have it
• return the response before it has been calculated
• allows logical encapsulation even while pipelining
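The idea behind these points can be sketched with a toy pipeline that hands back a placeholder for each queued command. This is not RedPipe's implementation; `Future`, `ToyPipeline`, and `count_visit` are hypothetical names, and a plain dict stands in for Redis.

```python
class Future:
    """Placeholder for a reply that arrives when the pipeline executes."""

    _unset = object()

    def __init__(self):
        self._result = Future._unset

    def set(self, value):
        self._result = value

    @property
    def result(self):
        if self._result is Future._unset:
            raise RuntimeError("pipeline has not executed yet")
        return self._result


class ToyPipeline:
    """Queues INCR commands against a dict and resolves futures on execute()."""

    def __init__(self, store):
        self.store = store
        self.queue = []

    def incr(self, key):
        future = Future()
        self.queue.append((key, future))
        return future  # handed back immediately, before any work is done

    def execute(self):
        # A real client would send the whole queue in one round trip here.
        for key, future in self.queue:
            self.store[key] = self.store.get(key, 0) + 1
            future.set(self.store[key])
        self.queue = []


def count_visit(pipe, page):
    """Encapsulated helper: queues its command and returns the pending reply."""
    return pipe.incr(f"visits:{page}")


store = {}
pipe = ToyPipeline(store)
home = count_visit(pipe, "home")
about = count_visit(pipe, "about")
home_again = count_visit(pipe, "home")
pipe.execute()
print(home.result, about.result, home_again.result)  # 1 1 2
```

Each helper keeps its own logic and return value, yet all three commands are flushed by a single `execute()` call.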
FUTURES WRAP RESPONSES
Use them like the real thing
If it quacks …
DUCK TYPING
USE CALLBACKS TO BUILD YOUR OWN FUTURES
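One way to picture this: a helper queues its reads, registers a callback that runs after the pipeline executes, and returns a future of its own that the callback fills in. The sketch below is a toy under stated assumptions; `ToyPipeline`, `on_execute`, and `full_name` are illustrative names here, and a dict stands in for Redis.

```python
class Future:
    """Minimal placeholder whose result is filled in later."""

    def __init__(self):
        self.result = None

    def set(self, value):
        self.result = value


class ToyPipeline:
    """Queues GETs against a dict; runs registered callbacks after executing."""

    def __init__(self, store):
        self.store = store
        self.queue = []
        self.callbacks = []

    def get(self, key):
        future = Future()
        self.queue.append((key, future))
        return future

    def on_execute(self, callback):
        self.callbacks.append(callback)

    def execute(self):
        for key, future in self.queue:
            future.set(self.store.get(key))
        for callback in self.callbacks:  # replies are available by now
            callback()
        self.queue, self.callbacks = [], []


def full_name(pipe, user_id):
    """Return a future built from two other pending replies."""
    first = pipe.get(f"user:{user_id}:first")
    last = pipe.get(f"user:{user_id}:last")
    combined = Future()

    def build():
        combined.set(f"{first.result} {last.result}")

    pipe.on_execute(build)
    return combined


store = {"user:1:first": "Ada", "user:1:last": "Lovelace"}
pipe = ToyPipeline(store)
name = full_name(pipe, 1)
pipe.execute()
print(name.result)  # Ada Lovelace
```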
NESTED PIPELINES
WRAP NOTHING
instantly executes
WRAP ANOTHER PIPE
passes the commands upstream
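The two cases can be sketched with a toy pipeline that takes an optional parent. With no parent it runs its queue on exit; with a parent it hands the queue upstream and lets the outer pipeline decide when to run. This is an illustration, not RedPipe's code: `ToyPipeline`, `like`, and the list-as-reply trick are assumptions made for brevity.

```python
class ToyPipeline:
    """Queues INCR commands; execute() either runs them or hands them upstream."""

    def __init__(self, store, parent=None):
        self.store = store
        self.parent = parent
        self.queue = []

    def incr(self, key):
        reply = []  # filled with the result once the command really runs
        self.queue.append((key, reply))
        return reply

    def execute(self):
        queue, self.queue = self.queue, []
        if self.parent is not None:
            self.parent.queue.extend(queue)  # wrap another pipe: pass upstream
            return
        for key, reply in queue:             # wrap nothing: run right away
            self.store[key] = self.store.get(key, 0) + 1
            reply.append(self.store[key])

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.execute()


store = {}


def like(post_id, pipe=None):
    """Works standalone or as part of a caller's pipeline."""
    with ToyPipeline(store, parent=pipe) as inner:
        return inner.incr(f"likes:{post_id}")


alone = like(1)  # no parent: executed as soon as the inner block exits
print(alone)     # [1]

with ToyPipeline(store) as outer:
    a = like(1, pipe=outer)
    b = like(2, pipe=outer)
    pending = (list(a), list(b))  # snapshots taken before the outer pipe runs

print(pending, a, b)  # ([], []) [2] [1]
```

The same `like` function serves both callers, so a function written for single use can later be batched without being rewritten.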
WHAT ELSE?
RedPipe has some other fun things …
KEYSPACES
still works like redis-py
DEFINE YOUR DATA TYPE
• strings
• lists
• sets
• hashes
• sorted sets
• hyperloglog
• geo (in progress)
SPECIFY A CONNECTION
can talk to multiple backends transparently
CHARACTER ENCODING
translate fields to a consistent data-type
SUPPORT COMPLEX TYPES
hash fields support python primitives
• bool
• int
• float
• list
• dict
WRITE YOUR OWN
fields just need an encode/decode method
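As a rough sketch of what an encode/decode pair might look like, the classes below convert Python values to bytes on the way in and back on the way out. `BooleanField`, `JsonField`, `encode_hash`, and `decode_hash` are hypothetical names written for this example, not RedPipe's own field classes.

```python
import json


class BooleanField:
    """Stores a bool as a single byte, b"1" or b"0"."""

    @staticmethod
    def encode(value):
        return b"1" if value else b"0"

    @staticmethod
    def decode(raw):
        return raw == b"1"


class JsonField:
    """Stores lists and dicts as UTF-8 encoded JSON."""

    @staticmethod
    def encode(value):
        return json.dumps(value).encode("utf-8")

    @staticmethod
    def decode(raw):
        return json.loads(raw.decode("utf-8"))


# Per-field codecs for one hash; the field names are made up for illustration.
fields = {"active": BooleanField, "tags": JsonField}


def encode_hash(data):
    return {name: fields[name].encode(value) for name, value in data.items()}


def decode_hash(raw):
    return {name: fields[name].decode(value) for name, value in raw.items()}


record = {"active": True, "tags": ["a", "b"]}
stored = encode_hash(record)           # every value is now bytes
print(decode_hash(stored) == record)   # True
```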
STRUCTS
ability to manipulate redis like a dictionary …
And still pipeline it all
QUESTIONS?

RedisConf17 - Pain-free Pipelining

Editor's Notes

  • #3 We built our code so that any function can be pipelined with another. Took a lot of trial and error to realize we could do that and find efficient patterns to do so. Hindsight is 20/20.
  • #4 The picture is of the Keystone XL pipeline. Also think of HTTP pipelining. Also called boxcarring.
  • #6 Total API request time is often a function of the number of network round trips * network latency. Tell story of fighting fire behind my house as a kid, with burlap sacks. Most people think about transaction support, but I've always found Lua scripting a much better way to accomplish this. Pipelining is about improving network efficiency. You could try to do it with Lua, but then you can't flexibly chain together smaller bits of logic.
  • #8 unscientific, but you can test it in your own environment
  • #9 It all depends on what proportion of an API call is spent waiting on Redis responses. We use Redis and found that a big chunk of time was spent talking to it.
  • #14 Made up numbers, but illustrates a point. On localhost I profile my app and find that a particular API call takes 6ms, and 1ms is spent making calls to redis. So I think I should spend my time optimizing CPU. Maybe I should rewrite my python app in Go (actually, maybe I should, but that’s a different talk). But if you profile a production environment, you see that network is actually a much bigger slice of the pie. Now I still spend 5ms on computation in python, but I spend 50ms talking to redis. Network is clearly the more fertile ground for optimization.
  • #24 Tell story of the gate in west texas.
  • #26 rewrite of all the ideas I’ve learned so far. What I would have written 3 years ago if I knew what I know now.
  • #42 once you know the data type, you can make assumptions about the data going in and out of redis. Before that, you have to assume it’s all bytes.