Indexing Thousands of Writes per Second with Redis
My talk from RailsConf 2011. Indexing thousands of writes per second with Redis.

Presentation Transcript

  • Indexing Thousands of Writes per Second with Redis Paul Dix paul@pauldix.net @pauldix http://pauldix.net
  • I’m Paul Dix
  • I wrote this book
  • Benchmark Solutions (who I work for; we’re hiring, duh!) email: paul@benchmarksolutions.com
  • Before we get to the talk... (had a spiel about the suit)
  • That bastard stole my thunder!
  • Señor Software Engineer (you don’t think of a suit-wearing badass)
  • I work
  • Finance
  • Vice President (the janitors, cleaning staff, and the 18-year-old intern get this title too...)
  • Finance + VP + Suit = douchebag (how could I wear anything but a suit?)
  • Distraction http://www.flickr.com/photos/33562486@N07/4288275204/
  • Bet http://www.flickr.com/photos/11448492@N07/2825781502/
  • Bar http://www.flickr.com/photos/11448492@N07/2825781502/
  • @flavorjones, coauthor of Nokogiri (credit: @ebiltwin)
  • JSON vs. XML
  • XML Sucks Hard
  • JSON is teh awesome
  • XML parsing S-L-O-W
  • 10x slower
  • Mike called BS
  • A bet!
  • and I was like: “sure, for a beer”
  • and Mike was all like:“ok, but that’s lame”
  • “let’s make it interesting. Loser wears my daughter’s fairy wings during your talk”
  • Sure, that’ll be funny and original...
  • Dr. Nic in fairy wings
  • That bastard stole my thunder!
  • So who won? (Nic may have done it as part of the talk, but he didn’t lose a bet and put wings on in red-faced shame.)
  • credit: @jonathanpberger
  • Nokogiri ~ 6.8x slower
  • REXML (ActiveRecord.from_xml) ~ 400x slower
  • Lesson: Always use JSON
  • Lesson: Don’t make bar bets
  • However, the bet said nothing about my slides
  • Aaron Patterson, father of Nokogiri (3 slides with @tenderlove’s picture? wtf?!!)
  • Called Mike: “Nokogiri’s mother”
  • Fairy Godmother?
  • Lesson: Learn Photoshop (this shit is embarrassing)
  • Anyway, the point of the suit...
  • take me seriously, dammit!
  • On to the actual talk...
  • it’s about...
  • Redis
  • Sustained write load of ~ 5k per second
  • Redis + other datastores = bad assery
  • @flavorjones, and maybe about Mike being some kind of virgin mother (credit: @ebiltwin)
  • Lesson: Be specific about the terms of a bet (because at least someone can use Photoshop)
  • Who’s used Redis?
  • NoSQL
  • Key/Value Store
  • Created by Salvatore Sanfilippo @antirez
  • Data structure store
  • Basics
  • Keys
    require "redis"
    redis = Redis.new
    redis.set("mike", "grants wishes") # => "OK"
    redis.get("mike")                  # => "grants wishes"
  • Counters
    redis.incr("fairy_references")       # => 1
    redis.decr("dignity")                # => -1
    redis.incrby("fairy_references", 23) # => 24
    redis.decrby("dignity", 56)          # => -57
  • Expiration
    redis.expire("mike", 120)
    redis.expireat("mike", 1.day.from_now.midnight.to_i)
  • Hashes
    redis.hset("paul", "has_wings", true)
    redis.hget("paul", "has_wings") # => "true"
    redis.hmset("paul", :location, "Baltimore", :twitter, "@pauldix")
    redis.hgetall("paul")
    # => { "has_wings" => "true",
    #      "location"  => "Baltimore",
    #      "twitter"   => "@pauldix" }
    redis.hlen("paul") # => 3
  • Lists
    redis.lpush("events", "first")  # => 1
    redis.lpush("events", "second") # => 2
    redis.lrange("events", 0, -1)   # => ["second", "first"]
    redis.rpush("events", "third")  # => 3
    redis.lrange("events", 0, -1)   # => ["second", "first", "third"]
    redis.lpop("events")            # => "second"
    redis.lrange("events", 0, -1)   # => ["first", "third"]
    # pops "third" off the right of "events" and pushes it onto the list "fourth"
    redis.rpoplpush("events", "fourth") # => "third"
  • Sets
    redis.sadd("user_ids", 1)      # => true
    redis.scard("user_ids")        # => 1
    redis.smembers("user_ids")     # => ["1"]
    redis.sismember("user_ids", 1) # => true
    redis.srem("user_ids", 1)      # => true
  • Sets Continued
    # know_paul ["1", "3", "4"]
    # know_mike ["3", "5"]
    redis.sinter("know_paul", "know_mike") # => ["3"]
    redis.sdiff("know_paul", "know_mike")  # => ["1", "4"]
    redis.sdiff("know_mike", "know_paul")  # => ["5"]
    redis.sunion("know_paul", "know_mike") # => ["1", "3", "4", "5"]
  • Sorted Sets
    redis.zadd("wish_counts", 2, "paul") # => true
    redis.zcard("wish_counts")           # => 1
    redis.zscore("wish_counts", "paul")  # => "2"
    redis.zrem("wish_counts", "paul")    # => true
  • Sorted Sets Continued
    redis.zadd("wish_counts", 12, "rubyland")
    redis.zrange("wish_counts", 0, -1)
    # => ["paul", "rubyland"]
    redis.zrange("wish_counts", 0, -1, :with_scores => true)
    # => ["paul", "2", "rubyland", "12"]
    redis.zrevrange("wish_counts", 0, -1)
    # => ["rubyland", "paul"]
  • Sorted Sets Continued
    redis.zrevrangebyscore("wish_counts", "+inf", "-inf")
    # => ["rubyland", "paul"]
    redis.zrevrangebyscore("wish_counts", "+inf", "10")
    # => ["rubyland"]
    redis.zrevrangebyscore("wish_counts", "+inf", "-inf", :limit => [0, 1])
    # => ["rubyland"]
  • Lesson: Keeping examples consistent with a stupid story is hard
  • There’s more: pubsub, transactions, more commands (not covered here, leave me alone)
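  • A rough sketch of those transaction and pub/sub commands, assuming the redis-rb gem (the exact MULTI block syntax varies a bit between gem versions; channel and key names here are made up for illustration):
    # queue commands and run them atomically with MULTI/EXEC
    redis.multi do
      redis.incr("fairy_references")
      redis.lpush("events", "wish_granted")
    end

    # publish a message to anyone listening on a channel
    redis.publish("wishes", {"bond_id" => 1}.to_json)

    # in another process, block and handle messages as they arrive
    redis.subscribe("wishes") do |on|
      on.message do |channel, message|
        puts "#{channel}: #{message}"
      end
    end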
  • Crazy Fast
  • Faster than a greased cheetah
  • or a Delorean with 1.21 gigawatts
  • OMG Scaling Sprinkles!
  • No Wishes Granted (f-you, f-ball!)
  • Lesson: Getting someone to pose is easier (also, learn Photoshop)
  • Still monolithic (not horizontally scalable, oh noes!)
  • Can shard in the client like memcached (I know haters, you can do this)
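  • A minimal sketch of what that client-side sharding can look like, using CRC32-modulo hashing over a fixed server list (host names are placeholders; a production setup would typically use consistent hashing so that adding a node remaps fewer keys):
    require "redis"
    require "zlib"

    SHARDS = [
      Redis.new(:host => "redis1.example.com", :port => 6379),
      Redis.new(:host => "redis2.example.com", :port => 6379)
    ]

    # pick a shard by hashing the key, memcached-style
    def redis_for(key)
      SHARDS[Zlib.crc32(key) % SHARDS.size]
    end

    redis_for("bonds|1").hset("bonds|1", "bid_price", 96.01)
    redis_for("bonds|1").hget("bonds|1", "bid_price") # => "96.01"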
  • Still not highly available
  • Still susceptible to partitions
  • However, it’s wicked cool
  • Why Index with Redis?
  • Don’t (you probably don’t need it) http://www.flickr.com/photos/34353483@N00/205467442/
  • But I have to SCALE! (and you’re all like, “Paul, ...”)
  • No you don’t
  • Trust me, I’m wearing a suit
  • that means I have authority and... I know shit
  • But no, really... (and still you cry)
  • Sad SQL is Sad (thousands of writes per second? no me gusta!)
  • ok, fine.
  • My Use Cases
  • 40k unique things
  • Updating every 10 seconds
  • Plus other updates...
  • Average write load of 3k-5k writes per second
  • LVC (last value cache)
    redis.hset("bonds|1", "bid_price", 96.01)
    redis.hset("bonds|1", "ask_price", 97.53)
    redis.hset("bonds|2", "bid_price", 90.50)
    redis.hset("bonds|2", "ask_price", 92.25)
    redis.sadd("bond_ids", 1)
    redis.sadd("bond_ids", 2)
  • Index on the fly
  • SORT
    redis.sort("bond_ids", :by => "bonds|*->bid_price")
    # => ["2", "1"]
    redis.sort("bond_ids", :by => "bonds|*->bid_price",
      :get => "bonds|*->bid_price")
    # => ["90.5", "96.01"]
    redis.sort("bond_ids", :by => "bonds|*->bid_price",
      :get => ["bonds|*->bid_price", "#"])
    # => ["90.5", "2", "96.01", "1"]
  • SORT Continued
    redis.sort("bond_ids", :by => "bonds|*->bid_price", :limit => [0, 1])
    # => ["2"]
    redis.sort("bond_ids", :by => "bonds|*->bid_price", :order => "desc")
    # => ["1", "2"]
    redis.sort("bond_ids", :by => "bonds|*->ask_price")
    # => ["1", "2"]
    redis.sort("bond_ids", :by => "bonds|*->ask_price",
      :store => "bond_ids_sorted_by_ask_price", :expire => 300)
    # => 2
  • Getting Records
    # note that prices (high write-volume data) come from elsewhere (not the SQL db)
    ids = redis_sort_results.map { |id| id.to_i }
    bonds = Bond.find(ids)
    bond_ids_to_bond = {}
    bonds.each do |bond|
      bond_ids_to_bond[bond.id] = bond
    end
    results = ids.map do |id|
      bond_ids_to_bond[id]
    end
  • Getting From Redis
    # however, then you have to worry about keeping the two data stores in sync;
    # we'll talk about it later
    redis.hset("bonds|2", "values", data.to_json)
    raw_json = redis.sort("bond_ids",
      :by  => "bonds|*->bid_price",
      :get => "bonds|*->values")
    results = raw_json.map do |json|
      DataObject.new(JSON.parse(json))
    end
  • Pre-Indexing
  • Rolling Index
  • Last n Events
  • Activity Log
  • News Feed
  • Use a List
    # O(1) constant time complexity to add, O(start + n) for reading
    N = 500
    size = redis.lpush("bond_trades|1", trade_id)
    # roll the index
    redis.rpop("bond_trades|1") if size > N
    # get results
    redis.lrange("bond_trades|1", 0, 49)
  • Indexing Events Since Time T
  • Using a List
    redis.lpush("bond_trades|1|2011-05-19-10", trade_id)
    redis.lrange("bond_trades|1|2011-05-19-10", 0, -1)
    results = redis.pipelined do
      redis.lrange("bond_trades|1|2011-05-19-10", 0, -1)
      redis.lrange("bond_trades|1|2011-05-19-09", 0, -1)
    end.flatten
  • Rolling the Index
    # when something trades
    redis.sadd("bonds_traded|2011-05-19-10", bond_id)
    # cron task to remove old data
    traded_ids = redis.smembers("bonds_traded|2011-05-19-10")
    keys = traded_ids.map do |id|
      "bond_trades|#{id}|2011-05-19-10"
    end
    keys << "bonds_traded|2011-05-19-10"
    redis.del(*keys)
  • Using a Sorted Set
    # time-based rolling index using a sorted set
    # O(log(n)) writes, O(log(n) + M) reads
    redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
    # last 20 trades
    redis.zrevrange("bond_trades|1", 0, 20)
    # trades in the last hour
    redis.zrevrangebyscore("bond_trades|1", "+inf", 1.hour.ago.to_i)
  • Rolling the Index
    # cron task to roll the index
    bond_ids = redis.smembers("bond_ids")
    remove_since_time = 24.hours.ago.to_i
    redis.pipelined do
      bond_ids.each do |id|
        redis.zremrangebyscore(
          "bond_trades|#{id}", "-inf", remove_since_time)
      end
    end
  • Or roll on read or write
    redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
    redis.zremrangebyscore("bond_trades|1", "-inf", 24.hours.ago.to_i)
  • Indexing N Values
    redis.zadd("highest_follower_counts", 2300, 20)
    redis.zadd("lowest_follower_counts", 2300, 20)
    # rolling the indexes
    # keep the lowest N
    size = redis.zcard("lowest_follower_counts")
    redis.zremrangebyrank("lowest_follower_counts", N, -1) if size > N
    # keep the highest N
    size = redis.zcard("highest_follower_counts")
    redis.zremrangebyrank("highest_follower_counts", 0, size - N - 1) if size > N
  • Rolling requires more roundtrips (2 roundtrips only with complex pipelining)
  • Roll indexes with only one trip
  • Tweet to @antirez that you want scripting
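  • Server-side scripting did land later (EVAL, in Redis 2.6, after this talk). A sketch of rolling a sorted-set index in a single round trip, assuming a Redis server and redis-rb version new enough to support eval:
    ROLL_AND_ADD = <<-LUA
      redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])
      redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[3])
      return redis.call('ZCARD', KEYS[1])
    LUA

    # add the trade and drop anything older than 24 hours, one round trip
    redis.eval(ROLL_AND_ADD,
      :keys => ["bond_trades|1"],
      :argv => [Time.now.to_i, trade_id, 24.hours.ago.to_i])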
  • Keeping it Consistent
  • create/update/destroy
  • No transactions, application logic (database transactions can’t help you here; you’ll have to put it into your application logic)
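  • What that application logic might look like, as a rough sketch using ActiveRecord callbacks (the callback and helper names here are illustrative, not from the talk; assumes a `redis` connection is available to the model):
    class Bond < ActiveRecord::Base
      after_save    :index_in_redis
      after_destroy :remove_from_redis

      # keep the Redis indexes in step with the SQL record
      def index_in_redis
        redis.sadd("bond_ids", id)
        redis.hset("bonds|#{id}", "bid_price", bid_price)
        redis.hset("bonds|#{id}", "ask_price", ask_price)
      end

      def remove_from_redis
        redis.srem("bond_ids", id)
        redis.del("bonds|#{id}")
      end
    end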
  • Disaster Recovery
  • Two failure scenarios
  • Web app dies
  • Redis server dies
  • Could result in index inconsistency
  • Simple recovery script
  • Write Index Times
    redis.set("last_bond_trade_indexed", trade.created_at.to_i)
  • Restore Each Index
    # with a list you have to run this while not writing new data;
    # with a set it can be made to run while writing new data
    time_int = redis.get("last_bond_trade_indexed").to_i
    index_time = Time.at(time_int)
    trades = Trade.where(
      "created_at > :index_time AND created_at <= :now",
      {:index_time => index_time, :now => Time.now})
    trades.each do |trade|
      trade.index_in_redis
    end
  • Our scale
  • Single Process
  • Easy to Scale (consistent hashing), with caveats: sets don’t work with intersection, union, or diff, and SORT won’t work unless all those keys fall on the same server
  • Works like a champ
  • Final Thoughts
  • Use Only if you have to!
  • Index the minimum to keep memory footprint down: use rolling indexes, don’t keep more shit in memory than you need. Users won’t page through 20 pages of results, so don’t store that many
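  • One way to keep a list index bounded: instead of the LPUSH/RPOP pairing shown earlier, LTRIM caps the list in one command (a small sketch, keeping only the newest 500 entries):
    N = 500
    redis.lpush("bond_trades|1", trade_id)
    # keep only the newest N entries; anything past index N - 1 is dropped
    redis.ltrim("bond_trades|1", 0, N - 1)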
  • Plan for disaster and consistency checking
  • Finally...
  • Look at my circle, bitches!
  • Lesson: Never trust a guy in a suit to not pull a fast one on you
  • Thanks! Paul Dix paul@pauldix.net @pauldix http://pauldix.net