
Indexing thousands of writes per second with redis

My talk from RailsConf 2011. Indexing thousands of writes per second with Redis.



  1. Indexing Thousands of Writes per Second with Redis Paul Dix paul@pauldix.net @pauldix http://pauldix.net
  2. I’m Paul Dix
  3. I wrote this book
  4. Benchmark Solutions (who I work for; we’re hiring, duh!) email: paul@benchmarksolutions.com
  5. Before we get to the talk... (note: had a spiel about the suit)
  6. That bastard stole my thunder!
  7. Señor Software Engineer (note: you don’t think of a suit-wearing badass)
  8. I work
  9. Finance
  10. Vice President (note: the janitors, cleaning staff, and the 18-year-old intern get this title too...)
  11. Finance + VP + Suit = douchebag (note: how could I wear anything but a suit?)
  12. Distraction (photo: http://www.flickr.com/photos/33562486@N07/4288275204/)
  13. Bet (photo: http://www.flickr.com/photos/11448492@N07/2825781502/)
  14. Bar (photo: http://www.flickr.com/photos/11448492@N07/2825781502/)
  15. @flavorjones, coauthor of Nokogiri (credit: @ebiltwin)
  16. JSON vs. XML
  17. XML Sucks Hard
  18. JSON is teh awesome
  19. XML parsing S-L-O-W
  20. 10x slower
  21. Mike called BS
  22. A bet!
  23. and I was like: “sure, for a beer”
  24. and Mike was all like: “ok, but that’s lame”
  25. “let’s make it interesting. Loser wears my daughter’s fairy wings during your talk”
  26. Sure, that’ll be funny and original...
  27. Dr. Nic in fairy wings
  28. That bastard stole my thunder!
  29. So who won? (note: Nic may have done it as part of the talk, but he didn’t lose a bet... put wings on in red-faced shame)
  30. credit: @jonathanpberger
  31. Nokogiri ~ 6.8x slower
  32. REXML(ActiveRecord.from_xml) ~ 400x slower
  33. Lesson: Always use JSON
  34. Lesson: Don’t make bar bets
  35. However, the bet said nothing about my slides
  36. Aaron Patterson, father of Nokogiri (note: 3 slides with @tenderlove’s picture? wtf?!!)
  37. Called Mike:“Nokogiri’s mother”
  38. Fairy Godmother?
  39. Lesson: Learn Photoshop (this shit is embarrassing)
  40. Anyway, the point of the suit...
  41. take me seriously,dammit!
  42. On to the actual talk...
  43. it’s about...
  44. Redis
  45. Sustained write load of ~ 5k per second
  46. Redis + other datastores = bad assery
  47. @flavorjones, and maybe about Mike being some kind of virgin mother (credit: @ebiltwin)
  48. Lesson: Be specific about the terms of a bet (because at least someone can use Photoshop)
  49. Who’s used Redis?
  50. NoSQL
  51. Key/Value Store
  52. Created bySalvatore Sanfilippo @antirez
  53. Data structure store
  54. Basics
  55. Keys
      require "redis"
      redis = Redis.new
      redis.set("mike", "grants wishes") # => "OK"
      redis.get("mike")                  # => "grants wishes"
  56. Counters
      redis.incr("fairy_references")       # => 1
      redis.decr("dignity")                # => -1
      redis.incrby("fairy_references", 23) # => 24
      redis.decrby("dignity", 56)          # => -57
  57. Expiration
      redis.expire("mike", 120)
      redis.expireat("mike", 1.day.from_now.midnight.to_i)
  58. Hashes
      redis.hset("paul", "has_wings", true)
      redis.hget("paul", "has_wings") # => "true"
      redis.hmset("paul", :location, "Baltimore", :twitter, "@pauldix")
      redis.hgetall("paul")
      # => { "has_wings" => "true",
      #      "location"  => "Baltimore",
      #      "twitter"   => "@pauldix" }
      redis.hlen("paul") # => 3
  59. Lists
      redis.lpush("events", "first")  # => 1
      redis.lpush("events", "second") # => 2
      redis.lrange("events", 0, -1)   # => ["second", "first"]
      redis.rpush("events", "third")  # => 3
      redis.lrange("events", 0, -1)   # => ["second", "first", "third"]
      redis.lpop("events")            # => "second"
      redis.lrange("events", 0, -1)   # => ["first", "third"]
      redis.rpoplpush("events", "fourth") # => "third"
  60. Sets
      redis.sadd("user_ids", 1)      # => true
      redis.scard("user_ids")        # => 1
      redis.smembers("user_ids")     # => ["1"]
      redis.sismember("user_ids", 1) # => true
      redis.srem("user_ids", 1)      # => true
  61. Sets Continued
      # know_paul ["1", "3", "4"]
      # know_mike ["3", "5"]
      redis.sinter("know_paul", "know_mike") # => ["3"]
      redis.sdiff("know_paul", "know_mike")  # => ["1", "4"]
      redis.sdiff("know_mike", "know_paul")  # => ["5"]
      redis.sunion("know_paul", "know_mike") # => ["1", "3", "4", "5"]
  62. Sorted Sets
      redis.zadd("wish_counts", 2, "paul") # => true
      redis.zcard("wish_counts")           # => 1
      redis.zscore("wish_counts", "paul")  # => "2"
      # (there is no zismember command; zscore returns nil for non-members)
      redis.zrem("wish_counts", "paul")    # => true
  63. Sorted Sets Continued
      redis.zadd("wish_counts", 12, "rubyland")
      redis.zrange("wish_counts", 0, -1)
      # => ["paul", "rubyland"]
      redis.zrange("wish_counts", 0, -1, :with_scores => true)
      # => ["paul", "2", "rubyland", "12"]
      redis.zrevrange("wish_counts", 0, -1)
      # => ["rubyland", "paul"]
  64. Sorted Sets Continued
      redis.zrevrangebyscore("wish_counts", "+inf", "-inf")
      # => ["rubyland", "paul"]
      redis.zrevrangebyscore("wish_counts", "+inf", "10")
      # => ["rubyland"]
      redis.zrevrangebyscore("wish_counts", "+inf", "-inf", :limit => [0, 1])
      # => ["rubyland"]
  65. Lesson: Keeping examples consistent with a stupid story is hard
  66. There’s more (note: pubsub, transactions, more commands... not covered here, leave me alone)
  67. Crazy Fast
  68. Faster than a greased cheetah
  69. or a Delorean with 1.21 gigawatts
  70. OMG Scaling Sprinkles!
  71. No Wishes Granted (note: f-you, f-ball!)
  72. Lesson: Getting someone to pose is easier (also, learn Photoshop)
  73. Still monolithic (note: not horizontally scalable, oh noes!)
  74. Can shard in client like memcached (note: I know haters, you can do this; see the sharding sketch after the transcript)
  75. Still not highly available
  76. Still susceptible to partitions
  77. However, it’s wicked cool
  78. Why Index with Redis?
  79. Don’t (note: you probably don’t need it) http://www.flickr.com/photos/34353483@N00/205467442/
  80. But I have to SCALE! (note: and you’re all like, “Paul, ...”)
  81. No you don’t
  82. Trust me, I’m wearing a suit
  83. I know shit (note: that means I have authority and...)
  84. But no, really... (note: and still you cry)
  85. Sad SQL is Sad (note: thousands of writes per second? No me gusto!)
  86. ok, fine.
  87. My Use Cases
  88. 40k unique things
  89. Updating every 10 seconds
  90. Plus other updates...
  91. Average write load of 3k-5k writes per second
  92. LVC (last value cache)
      redis.hset("bonds|1", "bid_price", 96.01)
      redis.hset("bonds|1", "ask_price", 97.53)
      redis.hset("bonds|2", "bid_price", 90.50)
      redis.hset("bonds|2", "ask_price", 92.25)
      redis.sadd("bond_ids", 1)
      redis.sadd("bond_ids", 2)
  93. Index on the fly
  94. SORT
      redis.sort("bond_ids", :by => "bonds|*->bid_price")
      # => ["2", "1"]
      redis.sort("bond_ids", :by => "bonds|*->bid_price",
                 :get => "bonds|*->bid_price")
      # => ["90.5", "96.01"]
      redis.sort("bond_ids", :by => "bonds|*->bid_price",
                 :get => ["bonds|*->bid_price", "#"])
      # => ["90.5", "2", "96.01", "1"]
  95. SORT Continued
      redis.sort("bond_ids", :by => "bonds|*->bid_price", :limit => [0, 1])
      # => ["2"]
      redis.sort("bond_ids", :by => "bonds|*->bid_price", :order => "desc")
      # => ["1", "2"]
      redis.sort("bond_ids", :by => "bonds|*->ask_price")
      # => ["1", "2"]
      redis.sort("bond_ids", :by => "bonds|*->ask_price",
                 :store => "bond_ids_sorted_by_ask_price") # => 2
      redis.expire("bond_ids_sorted_by_ask_price", 300) # expire the stored index separately
  96. Getting Records
      # note: prices (high write volume data) come from elsewhere (not the SQL db)
      ids = redis_sort_results.map { |id| id.to_i }
      bonds = Bond.find(ids)
      bond_ids_to_bond = {}
      bonds.each do |bond|
        bond_ids_to_bond[bond.id] = bond
      end
      results = ids.map do |id|
        bond_ids_to_bond[id]
      end
  97. Getting From Redis
      # note: then you have to worry about keeping the two data stores in sync;
      # we’ll talk about it later
      redis.hset("bonds|2", "values", data.to_json)
      raw_json = redis.sort("bond_ids",
                            :by  => "bonds|*->bid_price",
                            :get => "bonds|*->values")
      results = raw_json.map do |json|
        DataObject.new(JSON.parse(json))
      end
  98. Pre-Indexing
  99. Rolling Index
  100. Last n Events
  101. Activity Log
  102. News Feed
  103. Use a List
      # O(1) to add, O(start + n) for reading
      N = 500
      size = redis.lpush("bond_trades|1", trade_id)
      # roll the index
      redis.rpop("bond_trades|1") if size > N
      # get results
      redis.lrange("bond_trades|1", 0, 49)
  104. Indexing Events Since Time T
  105. Using a List
      redis.lpush("bond_trades|1|2011-05-19-10", trade_id)
      redis.lrange("bond_trades|1|2011-05-19-10", 0, -1)
      results = redis.pipelined do
        redis.lrange("bond_trades|1|2011-05-19-10", 0, -1)
        redis.lrange("bond_trades|1|2011-05-19-09", 0, -1)
      end.flatten
  106. Rolling the Index
      # when something trades
      redis.sadd("bonds_traded|2011-05-19-10", bond_id)
      # cron task to remove old data
      traded_ids = redis.smembers("bonds_traded|2011-05-19-10")
      keys = traded_ids.map do |id|
        "bond_trades|#{id}|2011-05-19-10"
      end
      keys << "bonds_traded|2011-05-19-10"
      redis.del(*keys)
  107. Using a Sorted Set
      # time-based rolling index using a sorted set
      # O(log(n)) writes, O(log(n) + M) reads
      redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
      # last 20 trades (ranks are inclusive, so 0..19)
      redis.zrevrange("bond_trades|1", 0, 19)
      # trades in the last hour
      redis.zrevrangebyscore("bond_trades|1", "+inf", 1.hour.ago.to_i)
  108. Rolling the Index
      # cron task to roll the index
      bond_ids = redis.smembers("bond_ids")
      remove_since_time = 24.hours.ago.to_i
      redis.pipelined do
        bond_ids.each do |id|
          redis.zremrangebyscore("bond_trades|#{id}", "-inf", remove_since_time)
        end
      end
  109. Or roll on read or write
      redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
      redis.zremrangebyscore("bond_trades|1", "-inf", 24.hours.ago.to_i)
  110. Indexing N Values
      redis.zadd("highest_follower_counts", 2300, 20)
      redis.zadd("lowest_follower_counts", 2300, 20)
      # rolling the indexes
      # keep the lowest N (drop everything ranked N and above)
      size = redis.zcard("lowest_follower_counts")
      redis.zremrangebyrank("lowest_follower_counts", N, -1) if size > N
      # keep the highest N (drop everything below the top N; the range is inclusive)
      size = redis.zcard("highest_follower_counts")
      redis.zremrangebyrank("highest_follower_counts", 0, size - N - 1) if size > N
  111. Rolling requires more roundtrips (note: down to 2 roundtrips, and only with complex pipelining)
  112. Roll indexes with only one trip
  113. Tweet to @antirez that you want scripting (see the scripting sketch after the transcript)
  114. Keeping it Consistent
  115. create/update/destroy
  116. No transactions, application logic (note: database transactions can’t help you here, you’ll have to put it into your application logic; see the callback sketch after the transcript)
  117. Disaster Recovery
  118. Two failure scenarios
  119. Web app dies
  120. Redis server dies
  121. Could result in index inconsistency
  122. Simple recovery script
  123. Write Index Times
      redis.set("last_bond_trade_indexed", trade.created_at.to_i)
  124. Restore Each Index
      # note: with a list you have to run this while not writing new data;
      # a set can be made to run while writing new data
      time_int = redis.get("last_bond_trade_indexed").to_i
      index_time = Time.at(time_int)
      trades = Trade.where(
        "created_at > :index_time AND created_at <= :now",
        {:index_time => index_time, :now => Time.now})
      trades.each do |trade|
        trade.index_in_redis
      end
  125. Our scale
  126. Single Process
  127. Easy to Scale (consistent hashing) (note: set intersection, union, and diff won’t work, and SORT won’t work, unless all those keys fall on the same server)
  128. Works like a champ
  129. Final Thoughts
  130. Use Only if you have to!
  131. Index the minimum to keep memory footprint down (note: use rolling indexes, don’t keep more shit in memory than you need. Users won’t page through 20 pages of results, so don’t store that many)
  132. Plan for disaster and consistency checking (see the checking sketch after the transcript)
  133. Finally...
  134. Look at my circle, bitches!
  135. Lesson: Never trust a guy in a suit to not pull a fast one on you
  136. Thanks! Paul Dix paul@pauldix.net @pauldix http://pauldix.net
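
Sketch for slide 74: client-side sharding, memcached-style. This is a minimal illustration, not code from the talk; the server list and the redis_for helper are made up. Hash each key to pick a server so every command for that key lands on the same instance. Naive modulo hashing is shown; a real client (for example redis-rb's Redis::Distributed) uses a consistent hash ring so adding a server only remaps a fraction of the keys.

    require "redis"
    require "zlib"

    # two hypothetical Redis servers; any number works
    SERVERS = [
      Redis.new(:host => "127.0.0.1", :port => 6379),
      Redis.new(:host => "127.0.0.1", :port => 6380)
    ]

    # pick a server by hashing the key (naive modulo for illustration)
    def redis_for(key)
      SERVERS[Zlib.crc32(key) % SERVERS.size]
    end

    redis_for("bonds|1").hset("bonds|1", "bid_price", 96.01)
    redis_for("bonds|2").hset("bonds|2", "bid_price", 90.50)

Remember the caveat from slide 127: sinter/sunion/sdiff and SORT only work when all the keys involved live on the same server.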
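Sketch for slide 113: the tweeting worked, and Redis 2.6 (released after this talk) shipped server-side Lua scripting via EVAL. A minimal sketch of rolling the sorted-set index from slide 109 in a single round trip; the key and arguments follow the talk's examples, but the script itself is illustrative.

    # add the trade and trim old entries in one network round trip
    ROLL_AND_ADD = <<-LUA
      redis.call("ZADD", KEYS[1], ARGV[1], ARGV[2])
      redis.call("ZREMRANGEBYSCORE", KEYS[1], "-inf", ARGV[3])
      return redis.call("ZCARD", KEYS[1])
    LUA

    redis.eval(ROLL_AND_ADD,
               :keys => ["bond_trades|1"],
               :argv => [Time.now.to_i, trade_id, 24.hours.ago.to_i])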
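Sketch for slide 116: one shape the application logic can take, assuming Rails model callbacks and a redis client accessor on the model (the callbacks are illustrative, though trade.index_in_redis is the same hook slide 124 relies on). Every database write carries its index write with it.

    class Trade < ActiveRecord::Base
      after_create  :index_in_redis
      after_destroy :remove_from_redis

      def index_in_redis
        # same sorted-set index as slides 107-109: score is the trade time
        # (`redis` is assumed to be a client helper available on the model)
        redis.zadd("bond_trades|#{bond_id}", created_at.to_i, id)
      end

      def remove_from_redis
        redis.zrem("bond_trades|#{bond_id}", id)
      end
    end

There is still no cross-store transaction: if the process dies between the SQL commit and the Redis write, the index drifts, which is exactly what the recovery script on slides 123-124 is for.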
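Sketch for slide 132: a consistency check can be as simple as comparing the SQL source of truth against the Redis index and re-indexing whatever drifted. Entirely illustrative (check_bond_trade_index is a made-up name); it assumes the Trade model and index_in_redis hook from the sketch above.

    # re-index any recent trades missing from one bond's Redis index
    def check_bond_trade_index(bond_id)
      db_ids = Trade.where(:bond_id => bond_id).
                     order("created_at DESC").limit(500).
                     map { |t| t.id.to_s }
      redis_ids = redis.zrevrange("bond_trades|#{bond_id}", 0, 499)
      (db_ids - redis_ids).each do |id|
        Trade.find(id).index_in_redis
      end
    end

Run it from cron: the zrevrange read is cheap, and most runs will find nothing to fix.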
