Indexing thousands of writes per second with Redis

My talk from RailsConf 2011. Indexing thousands of writes per second with Redis.


    1. Indexing Thousands of Writes per Second with Redis Paul Dix paul@pauldix.net @pauldix http://pauldix.net
    2. I’m Paul Dix
    3. I wrote this book
    4. Benchmark Solutions (who I work for; we’re hiring, duh! email: paul@benchmarksolutions.com)
    5. Before we get to the talk... (had a spiel about the suit..)
    6. That bastard stole my thunder!
    7. Señor Software Engineer (you don’t think of a suit-wearing badass)
    8. I work
    9. Finance
    10. Vice President (the janitors, cleaning staff, and the 18-year-old intern get this title too...)
    11. Finance + VP + Suit = douchebag (how could I wear anything but a suit?)
    12. Distraction (http://www.flickr.com/photos/33562486@N07/4288275204/)
    13. Bet (http://www.flickr.com/photos/11448492@N07/2825781502/)
    14. Bar (http://www.flickr.com/photos/11448492@N07/2825781502/)
    15. @flavorjones, coauthor of Nokogiri (credit: @ebiltwin)
    16. JSON vs. XML
    17. XML Sucks Hard
    18. JSON is teh awesome
    19. XML parsing S-L-O-W
    20. 10x slower
    21. Mike called BS
    22. A bet!
    23. and I was like: “sure, for a beer”
    24. and Mike was all like:“ok, but that’s lame”
    25. “let’s make it interesting. Loser wears my daughter’s fairy wings during your talk”
    26. Sure, that’ll be funny and original...
    27. Dr. Nic in fairy wings
    28. That bastard stole my thunder!
    29. So who won? (Nic may have done it as part of the talk, but he didn’t lose a bet... put wings on in red-faced shame.)
    30. credit: @jonathanpberger
    31. Nokogiri ~ 6.8x slower
    32. REXML (ActiveRecord.from_xml) ~ 400x slower
    33. Lesson:Always use JSON
    34. Lesson:Don’t make bar bets
    35. However, the bet said nothing about my slides
    36. Aaron Patterson, father of Nokogiri (3 slides with @tenderlove’s picture? wtf?!!)
    37. Called Mike: “Nokogiri’s mother”
    38. Fairy Godmother?
    39. Lesson: Learn Photoshop (this shit is embarrassing)
    40. Anyway, the point of the suit...
    41. take me seriously, dammit!
    42. On to the actual talk...
    43. it’s about...
    44. Redis
    45. Sustained write load of ~ 5k per second
    46. Redis + other datastores = bad assery
    47. @flavorjones, and maybe about Mike being some kind of virgin mother (credit: @ebiltwin)
    48. Lesson: Be specific about the terms of a bet (because at least someone can use Photoshop)
    49. Who’s used Redis?
    50. NoSQL
    51. Key/Value Store
    52. Created by Salvatore Sanfilippo @antirez
    53. Data structure store
    54. Basics
    55. Keys
        require "redis"
        redis = Redis.new
        redis.set("mike", "grants wishes")  # => OK
        redis.get("mike")                   # => "grants wishes"
    56. Counters
        redis.incr("fairy_references")        # => 1
        redis.decr("dignity")                 # => -1
        redis.incrby("fairy_references", 23)  # => 24
        redis.decrby("dignity", 56)           # => -57
    57. Expiration
        redis.expire("mike", 120)
        redis.expireat("mike", 1.day.from_now.midnight.to_i)
    58. Hashes
        redis.hset("paul", "has_wings", true)
        redis.hget("paul", "has_wings")  # => "true"
        redis.hmset("paul", :location, "Baltimore", :twitter, "@pauldix")
        redis.hgetall("paul")
        # => { "has_wings"=>"true", "location"=>"Baltimore", "twitter"=>"@pauldix" }
        redis.hlen("paul")  # => 3
    59. Lists
        redis.lpush("events", "first")       # => 1
        redis.lpush("events", "second")      # => 2
        redis.lrange("events", 0, -1)        # => ["second", "first"]
        redis.rpush("events", "third")       # => 3
        redis.lrange("events", 0, -1)        # => ["second", "first", "third"]
        redis.lpop("events")                 # => "second"
        redis.lrange("events", 0, -1)        # => ["first", "third"]
        redis.rpoplpush("events", "fourth")  # => "third"
    60. Sets
        redis.sadd("user_ids", 1)       # => true
        redis.scard("user_ids")         # => 1
        redis.smembers("user_ids")      # => ["1"]
        redis.sismember("user_ids", 1)  # => true
        redis.srem("user_ids", 1)       # => true
    61. Sets Continued
        # know_paul ["1", "3", "4"]
        # know_mike ["3", "5"]
        redis.sinter("know_paul", "know_mike")  # => ["3"]
        redis.sdiff("know_paul", "know_mike")   # => ["1", "4"]
        redis.sdiff("know_mike", "know_paul")   # => ["5"]
        redis.sunion("know_paul", "know_mike")  # => ["1", "3", "4", "5"]
    62. Sorted Sets
        redis.zadd("wish_counts", 2, "paul")  # => true
        redis.zcard("wish_counts")            # => 1
        redis.zscore("wish_counts", "paul")   # => "2"
        redis.zrem("wish_counts", "paul")     # => true
    63. Sorted Sets Continued
        redis.zadd("wish_counts", 12, "rubyland")
        redis.zrange("wish_counts", 0, -1)
        # => ["paul", "rubyland"]
        redis.zrange("wish_counts", 0, -1, :with_scores => true)
        # => ["paul", "2", "rubyland", "12"]
        redis.zrevrange("wish_counts", 0, -1)
        # => ["rubyland", "paul"]
    64. Sorted Sets Continued
        redis.zrevrangebyscore("wish_counts", "+inf", "-inf")
        # => ["rubyland", "paul"]
        redis.zrevrangebyscore("wish_counts", "+inf", "10")
        # => ["rubyland"]
        redis.zrevrangebyscore("wish_counts", "+inf", "-inf", :limit => [0, 1])
        # => ["rubyland"]
    65. Lesson:Keeping examples consistent with a stupid story is hard
    66. There’s more: pub/sub, transactions, more commands (not covered here, leave me alone)
    67. Crazy Fast
    68. Faster than a greased cheetah
    69. or a Delorean with 1.21 gigawatts
    70. OMG Scaling Sprinkles!
    71. No Wishes Granted (f-you, f-ball!)
    72. Lesson: Getting someone to pose is easier (also, learn Photoshop)
    73. Still monolithic (not horizontally scalable, oh noes!)
    74. Can shard in the client like memcached (I know haters, you can do this)
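        A minimal sketch of what client sharding can look like with redis-rb’s Redis::Distributed, which consistent-hashes each key across a list of nodes (the node URLs here are made up for the example):

          require "redis"
          require "redis/distributed"

          # each key is routed to a node by hashing the key name, memcached-style
          redis = Redis::Distributed.new([
            "redis://redis1.example.com:6379",
            "redis://redis2.example.com:6379"
          ])

          redis.hset("bonds|1", "bid_price", 96.01)  # lands on whichever node "bonds|1" hashes to
          redis.hget("bonds|1", "bid_price")         # same hash, so the read finds it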
    75. Still not highly available
    76. Still susceptible to partitions
    77. However, it’s wicked cool
    78. Why Index with Redis?
    79. Don’t (you probably don’t need it) http://www.flickr.com/photos/34353483@N00/205467442/
    80. But I have to SCALE! (and you’re all like, “Paul, ...”)
    81. No you don’t
    82. Trust me, I’m wearing a suit
    83. that means I have authority and... I know shit
    84. and still you cry: But no, really...
    85. Sad SQL is Sad (thousands of writes per second? No me gusto!)
    86. ok, fine.
    87. My Use Cases
    88. 40k unique things
    89. Updating every 10 seconds
    90. Plus other updates...
    91. Average write load of 3k-5k writes per second
    92. LVC
        redis.hset("bonds|1", "bid_price", 96.01)
        redis.hset("bonds|1", "ask_price", 97.53)
        redis.hset("bonds|2", "bid_price", 90.50)
        redis.hset("bonds|2", "ask_price", 92.25)
        redis.sadd("bond_ids", 1)
        redis.sadd("bond_ids", 2)
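        At thousands of writes per second, round trips dominate; a sketch of batching the same writes with redis-rb’s pipelined block (the same call the deck uses on slide 105), which queues the commands and sends them in one round trip:

          # one network round trip instead of six
          redis.pipelined do
            redis.hset("bonds|1", "bid_price", 96.01)
            redis.hset("bonds|1", "ask_price", 97.53)
            redis.hset("bonds|2", "bid_price", 90.50)
            redis.hset("bonds|2", "ask_price", 92.25)
            redis.sadd("bond_ids", 1)
            redis.sadd("bond_ids", 2)
          end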
    93. Index on the fly
    94. SORT
        redis.sort("bond_ids", :by => "bonds|*->bid_price")
        # => ["2", "1"]
        redis.sort("bond_ids", :by => "bonds|*->bid_price",
                   :get => "bonds|*->bid_price")
        # => ["90.5", "96.01"]
        redis.sort("bond_ids", :by => "bonds|*->bid_price",
                   :get => ["bonds|*->bid_price", "#"])
        # => ["90.5", "2", "96.01", "1"]
    95. SORT Continued
        redis.sort("bond_ids", :by => "bonds|*->bid_price", :limit => [0, 1])
        # => ["2"]
        redis.sort("bond_ids", :by => "bonds|*->bid_price", :order => "desc")
        # => ["1", "2"]
        redis.sort("bond_ids", :by => "bonds|*->ask_price")
        # => ["1", "2"]
        redis.sort("bond_ids", :by => "bonds|*->ask_price",
                   :store => "bond_ids_sorted_by_ask_price", :expire => 300)
        # => 2
    96. Getting Records
        ids = redis_sort_results.map { |id| id.to_i }
        # note that prices (high write volume data) come from elsewhere (not the SQL db)
        bonds = Bond.find(ids)
        bond_ids_to_bond = {}
        bonds.each do |bond|
          bond_ids_to_bond[bond.id] = bond
        end
        results = ids.map do |id|
          bond_ids_to_bond[id]
        end
    97. Getting From Redis
        # however, then you have to worry about keeping the two data stores
        # in sync; we’ll talk about it later
        redis.hset("bonds|2", "values", data.to_json)
        raw_json = redis.sort("bond_ids",
                              :by => "bonds|*->bid_price",
                              :get => "bonds|*->values")
        results = raw_json.map do |json|
          DataObject.new(JSON.parse(json))
        end
    98. Pre-Indexing
    99. Rolling Index
    100. Last n Events
    101. Activity Log
    102. News Feed
    103. Use a List
        # O(1) constant time complexity to add, O(start + n) for reading
        N = 500
        size = redis.lpush("bond_trades|1", trade_id)
        # roll the index
        redis.rpop("bond_trades|1") if size > N
        # get results
        redis.lrange("bond_trades|1", 0, 49)
    104. Indexing Events Since Time T
    105. Using a List
        redis.lpush("bond_trades|1|2011-05-19-10", trade_id)
        redis.lrange("bond_trades|1|2011-05-19-10", 0, -1)
        results = redis.pipelined do
          redis.lrange("bond_trades|1|2011-05-19-10", 0, -1)
          redis.lrange("bond_trades|1|2011-05-19-09", 0, -1)
        end.flatten
    106. Rolling the Index
        # when something trades
        redis.sadd("bonds_traded|2011-05-19-10", bond_id)
        # cron task to remove old data
        traded_ids = redis.smembers("bonds_traded|2011-05-19-10")
        keys = traded_ids.map do |id|
          "bond_trades|#{id}|2011-05-19-10"
        end
        keys << "bonds_traded|2011-05-19-10"
        redis.del(*keys)
    107. Using a Sorted Set
        # time-based rolling index using a sorted set:
        # O(log(n)) writes, O(log(n) + M) reads
        redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
        # last 20 trades
        redis.zrevrange("bond_trades|1", 0, 19)
        # trades in the last hour
        redis.zrevrangebyscore("bond_trades|1", "+inf", 1.hour.ago.to_i)
    108. Rolling the Index
        # cron task to roll the index
        bond_ids = redis.smembers("bond_ids")
        remove_since_time = 24.hours.ago.to_i
        redis.pipelined do
          bond_ids.each do |id|
            redis.zremrangebyscore(
              "bond_trades|#{id}", "-inf", remove_since_time)
          end
        end
    109. Or roll on read or write
        redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
        redis.zremrangebyscore("bond_trades|1", "-inf", 24.hours.ago.to_i)
    110. Indexing N Values
        redis.zadd("highest_follower_counts", 2300, 20)
        redis.zadd("lowest_follower_counts", 2300, 20)
        # rolling the indexes
        # keep the lowest N
        size = redis.zcard("lowest_follower_counts")
        redis.zremrangebyrank("lowest_follower_counts", N, -1) if size > N
        # keep the highest N
        size = redis.zcard("highest_follower_counts")
        redis.zremrangebyrank("highest_follower_counts", 0, size - N - 1) if size > N
    111. Rolling requires more roundtrips (2 roundtrips only with complex pipelining)
    112. Roll indexes with only one trip
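        Redis didn’t have server-side scripting yet, but you can get close; a sketch assuming redis-rb’s multi block, which wraps the write and the trim from slide 109 in one MULTI/EXEC sent as a single round trip:

          # add the new trade and trim old entries together, one network round trip
          redis.multi do
            redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
            redis.zremrangebyscore("bond_trades|1", "-inf", 24.hours.ago.to_i)
          end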
    113. Tweet to @antirez that you want scripting
    114. Keeping it Consistent
    115. create/update/destroy
    116. No transactions, application logic (database transactions can’t help you here; you’ll have to put the consistency logic into your application)
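        One common shape for that application logic, sketched as ActiveRecord callbacks (the callback wiring and the REDIS constant are assumptions; Trade#index_in_redis and the key scheme come from slide 124):

          class Trade < ActiveRecord::Base
            # assumes a shared client, e.g. REDIS = Redis.new in an initializer
            after_create  :index_in_redis
            after_destroy :remove_from_redis

            def index_in_redis
              REDIS.zadd("bond_trades|#{bond_id}", created_at.to_i, id)
            end

            def remove_from_redis
              REDIS.zrem("bond_trades|#{bond_id}", id)
            end
          end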
    117. Disaster Recovery
    118. Two failure scenarios
    119. Web app dies
    120. Redis server dies
    121. Could result in index inconsistency
    122. Simple recovery script
    123. Write Index Times
        redis.set("last_bond_trade_indexed", trade.created_at.to_i)
    124. Restore Each Index
        # list indexes: you have to run this while not writing new data;
        # sorted set indexes can be made to rebuild while writing new data
        time_int = redis.get("last_bond_trade_indexed").to_i
        index_time = Time.at(time_int)
        trades = Trade.where(
          "created_at > :index_time AND created_at <= :now",
          {:index_time => index_time, :now => Time.now})
        trades.each do |trade|
          trade.index_in_redis
        end
    125. Our scale
    126. Single Process
    127. Easy to Scale (consistent hashing). Caveat: set intersection, union, and diff don’t work across nodes, and SORT won’t work unless all those keys fall on the same server
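        One way to keep multi-key commands working when you shard in the client: key tags. A sketch assuming Redis::Distributed’s {tag} convention, which hashes only the part inside the braces so tagged keys land on the same node:

          # both keys hash on "know", so they live on one node and SINTER can run there
          redis.sadd("{know}_paul", 1)
          redis.sadd("{know}_mike", 3)
          redis.sinter("{know}_paul", "{know}_mike")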
    128. Works like a champ
    129. Final Thoughts
    130. Use Only if you have to!
    131. Index the minimum to keep memory footprint down (use rolling indexes; don’t keep more shit in memory than you need. Users won’t page through 20 pages of results, so don’t store that many)
    132. Plan for disaster and consistency checking
    133. Finally...
    134. Look at my circle, bitches!
    135. Lesson: Never trust a guy in a suit to not pull a fast one on you
    136. Thanks! Paul Dix paul@pauldix.net @pauldix http://pauldix.net
