Josiah Carlson 2013-05-16 - Redis Analytics

These are the slides for the talk I presented at the LA Web Speed meetup hosted by Yahoo on May 17, 2013 - http://www.meetup.com/LAWebSpeed/events/115663212/

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,222
On SlideShare
0
From Embeds
0
Number of Embeds
6
Actions
Shares
0
Downloads
14
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Josiah carlson 2013-05-16 - redis analytics

  1. A High-Level Pass Through Redis Analytics*
     by Josiah Carlson
     www.dr-josiah.com
     @dr_josiah
     bit.ly/redis-in-action
  2. Agenda
     ● Quick overview of Redis
     ● Monthly unique return/churn
       ○ too much memory method
       ○ reasonable memory method
       ○ very low memory method
     ● Visitor action sequence analytics
       ○ sequence method
       ○ low-memory method
     ● Geographic notifications with partitioning*
  3. Quick Redis overview
     ● Remote key -> data structure server
       ○ Strings/integers/bitmaps
       ○ Lists of strings
       ○ Sets of unique string members
       ○ Hashes of key -> value
       ○ Sorted sets (ZSETs) mapping of member -> score
     ● Supports
       ○ Persistence
       ○ Replication
       ○ Publish/subscribe
       ○ Server-side Lua scripting (like a stored procedure)
       ○ Client-side sharding (server-side in progress)
  4. Monthly unique return/churn
     Problem:
     ● Say that you have millions of monthly visitors
     ● Need to know monthly churn, expected ~50%
     ● Don't want to waste too much memory
  5. Monthly unique return/churn
     Too much memory:
     ● Generate UUIDs for users, store in cookie
     ● Use a HASH mapping from UUIDs to int ids
     ● Use a HASH mapping from int ids to UUIDs
     ● Create a ZSET of short ids to timestamp
     ● Use per-month bitmaps for churn calculation
     ● Recycle int ids based on old timestamps, discarding UUIDs and resetting bits
  6. Monthly unique return/churn
     Drawbacks:
     ● Memory use based on size of HASHes and ZSET (up to about 400 bytes/unique user)
     ● Second HASH can be thrown away
     ● The other HASH, ZSET, and bitmaps can be thrown away and replaced by a "this month" and "last month" SET (about 120 bytes/user)
     ● With 63-bit integer UUIDs and sharding techniques, about 16 bytes/user
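The "this month" / "last month" SET replacement mentioned above could be sketched roughly as follows. This is only an illustration, not the speaker's code: `conn` is assumed to be a redis-py style client, and the `uniques:<month>` key names and both function names are hypothetical.

```python
def record_visit(conn, user_id, month):
    """Add the visitor to the SET of uniques for `month` (e.g. '2013-05')."""
    conn.sadd('uniques:' + month, user_id)


def monthly_churn(conn, last_month, this_month):
    """Return (returning, churned, churn_rate) between two months.

    Key names are hypothetical. SINTER sends the whole intersection to the
    client; for very large sets, SINTERSTORE followed by SCARD would avoid
    that transfer.
    """
    last_total = conn.scard('uniques:' + last_month)
    both = conn.sinter('uniques:' + last_month, 'uniques:' + this_month)
    returning = len(both)
    churned = last_total - returning
    rate = churned / last_total if last_total else 0.0
    return returning, churned, rate
```

Churn here means "seen last month, not seen this month"; new visitors this month do not affect the rate.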
  7. Monthly unique return/churn
     Reasonable memory solution:
     ● Store per-month id in a signed cookie (lower 32 bits is the unique id for the month, next 8 is the month)
     ● One month of bitmap
     ● If this month cookie, do nothing
     ● If last month cookie and bit isn't set for that id, mark the bitmap, generate a new cookie, increment unique and returning counts
     ● If last month cookie and bit is set, generate a new cookie
     ● If old cookie or no cookie, generate a new cookie, increment unique count
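The four cookie cases above can be sketched as below. Everything named here is an assumption for illustration: the HMAC signing scheme, `SECRET`, the key names, and the idea that `this_month` is an integer month counter that fits in 8 bits. `conn` is assumed to be a redis-py style client.

```python
import hashlib
import hmac

SECRET = b'change-me'  # hypothetical signing key


def _sign(value):
    return hmac.new(SECRET, str(value).encode(), hashlib.sha256).hexdigest()


def pack_cookie(month, uid):
    """Put the month (8 bits) above the per-month id (lower 32 bits), then sign."""
    value = ((month & 0xFF) << 32) | (uid & 0xFFFFFFFF)
    return '%d.%s' % (value, _sign(value))


def unpack_cookie(cookie):
    """Return (month, uid), or None if the cookie is missing or tampered with."""
    try:
        raw, sig = cookie.split('.')
        value = int(raw)
    except (AttributeError, ValueError):
        return None
    if not hmac.compare_digest(sig.encode(), _sign(value).encode()):
        return None
    return (value >> 32) & 0xFF, value & 0xFFFFFFFF


def handle_visit(conn, cookie, this_month):
    """Apply the four cases; return the cookie the client should now hold."""
    parsed = unpack_cookie(cookie)
    if parsed and parsed[0] == this_month:
        return cookie  # already counted this month
    count_unique = True
    if parsed and parsed[0] == ((this_month - 1) & 0xFF):
        # SETBIT returns the previous bit, so test-and-mark is one atomic call.
        if conn.setbit('seen:%d' % parsed[0], parsed[1], 1):
            count_unique = False  # this cookie was already exchanged once
        else:
            conn.incr('returning:%d' % this_month)
    if count_unique:
        conn.incr('unique:%d' % this_month)
    new_id = conn.incr('next-id:%d' % this_month)
    return pack_cookie(this_month, new_id)
```

Using the return value of `SETBIT` avoids a separate `GETBIT` round trip and the race between checking and marking.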
  8. Monthly unique return/churn
     Drawbacks:
     ● Memory use based on unique monthly counts, ~1 bit per user (not bad)
     ● If you push to hundreds of millions/billions of users, you should shard your bitmaps to minimize realloc cost on bitmap updates
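One way to shard a bitmap is to split the id into a shard number and an offset within that shard, so no single Redis string grows very large. The shard size and key layout below are assumptions, not values from the talk; `conn` is assumed to be a redis-py style client.

```python
SHARD_BITS = 8 * 1024 * 1024  # hypothetical shard size: 2**23 bits = 1 MiB per key


def shard_for(base_key, uid):
    """Map an integer id to (sharded key, bit offset inside that shard)."""
    shard, offset = divmod(uid, SHARD_BITS)
    return '%s:%d' % (base_key, shard), offset


def mark_seen(conn, base_key, uid):
    """Set the id's bit in its shard; returns the previous bit value."""
    key, offset = shard_for(base_key, uid)
    return conn.setbit(key, offset, 1)
```

At 1 bit per user, 100 million ids need about 12.5 MB in total, spread here over 12 keys of at most 1 MiB each.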
  9. Monthly unique return/churn
     Very low memory method:
     ● Store per-month id in a signed cookie
     ● If this month cookie, do nothing
     ● If last month cookie, generate a new cookie for the client, increment unique and returning counts
     ● If old cookie or no cookie, generate a new cookie, increment unique count
  10. Monthly unique return/churn
     Drawback:
     ● If someone sends you duplicate cookies, it is hard to detect (keep a "recently replaced" cache; 5-10 minutes' worth is likely good enough)
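A counters-only version with a short-lived "recently replaced" cache might look like this sketch. It assumes the cookie has already been verified and parsed by the caller, that months are a plain increasing integer counter, and that `conn` is a redis-py style client; the key names and `count_visit` are hypothetical.

```python
REPLACED_TTL = 600  # seconds; the slide suggests 5-10 minutes


def count_visit(conn, cookie_month, cookie_value, this_month):
    """Update counters; return True when the client needs a new cookie.

    `cookie_month` is None when there is no valid cookie.
    """
    if cookie_month == this_month:
        return False  # already counted this month
    if cookie_month == this_month - 1:
        # SET ... NX EX: only the first exchange of this cookie within the
        # TTL succeeds; a replayed cookie gets a new cookie but no counts.
        if not conn.set('replaced:%s' % cookie_value, 1,
                        nx=True, ex=REPLACED_TTL):
            return True
        conn.incr('returning:%d' % this_month)
    conn.incr('unique:%d' % this_month)
    return True
```

The only per-user state is the expiring `replaced:` key, so steady-state memory is a handful of counters plus a few minutes of recently exchanged cookies.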
  11. Tangent on ZSETs
     This slide is a filler so that I can talk about one of my favorite "get rid of ZSETs" tricks, which results in significant memory savings for a fairly large subset of problems
  12. Visitor action sequences
     Problem:
     ● How are my funnels performing?
     ● These suck:
  13. Visitor action sequences
     Sequence method:
     ● Each user gets a LIST
     ● All users are recorded in a ZSET with a score based on time
     ● Each action/page RPUSHes the action/page to the LIST
     ● Clean up/analyze old sequences based on timestamps in the ZSET
     Drawbacks:
     ● Memory use can be high for active users
     ● More detailed events can use more memory
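The sequence method could be sketched as follows. The key names (`actions:<user>`, `active-users`) and function names are hypothetical, and `conn` is assumed to be a redis-py client created with `decode_responses=True`, so that members come back as `str` rather than `bytes`.

```python
import time


def record_action(conn, user_id, action, now=None):
    """RPUSH the action onto the user's LIST and refresh their ZSET score."""
    now = time.time() if now is None else now
    conn.rpush('actions:%s' % user_id, action)
    conn.zadd('active-users', {user_id: now})


def collect_idle_sequences(conn, idle_seconds, now=None):
    """Pull and delete the sequences of users idle for `idle_seconds`."""
    now = time.time() if now is None else now
    sequences = {}
    # ZRANGEBYSCORE finds users whose last action is old enough.
    for user_id in conn.zrangebyscore('active-users', '-inf',
                                      now - idle_seconds):
        key = 'actions:%s' % user_id
        sequences[user_id] = conn.lrange(key, 0, -1)
        conn.delete(key)
        conn.zrem('active-users', user_id)
    return sequences
```

This sketch is not atomic: a user who acts between the `LRANGE` and the `DELETE` would lose that action, so a production version would likely wrap the clean-up in a transaction or a Lua script.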
  14. Visitor action sequences
     Low memory method:
     ● Each user gets a bitmap (limit your unique events)
     ● All actions are mapped to an index in the bitmap
     ● When a user performs the action/visits the page, set the bit and update the ZSET
     ● Clean up/analyze old bitmaps based on timestamps in the ZSET
     Drawbacks:
     ● No more strict sequence analysis possible
     ● Memory use is dominated by ZSET storage
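A minimal sketch of the bitmap variant, assuming a small fixed list of events: the event names, key names, and `funnel_depth` helper are all hypothetical, and `conn` is assumed to be a redis-py style client.

```python
EVENTS = ['landing', 'signup', 'add-to-cart', 'checkout']  # hypothetical events
EVENT_INDEX = {name: i for i, name in enumerate(EVENTS)}


def record_event(conn, user_id, event, now):
    """Set the event's bit in the user's bitmap and refresh the ZSET score."""
    conn.setbit('events:%s' % user_id, EVENT_INDEX[event], 1)
    conn.zadd('active-users', {user_id: now})


def funnel_depth(conn, user_id, funnel):
    """Count how many leading funnel steps have their bit set.

    The bitmap records that an event happened, not when, so this cannot
    tell whether the steps occurred in funnel order.
    """
    depth = 0
    for event in funnel:
        if not conn.getbit('events:%s' % user_id, EVENT_INDEX[event]):
            break
        depth += 1
    return depth
```

With four events each user's bitmap is a single byte, which is why the ZSET entry becomes the dominant cost.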
  15. Geo Notifications
     Problem:
     ● Want to send events to nearby users
     ● Don't want users to be notified too often
     ● Reduce radius of results as notifications rise
     ● Increase radius of results as notifications fall
     ● Allow for history to be received on connect
  16. Geo Notifications
     ● Consider the world as a recursively-divided series of blocks (highest level as 1x1 degree)
     ● Clients subscribe to all block levels that their user is in or is interested in
     ● When writing an event at point (lat,lon):
       ○ Add the event id to ZSETs to as deep a partition as you would ever expect to need
       ○ Trim the ZSETs along the way based on your desired history
       ○ Check the resulting size of the ZSETs to determine the highest-level block that is under your limit
       ○ Publish the event to a channel based on that level
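The write path above might be sketched as follows. Several details are assumptions rather than the speaker's design: each deeper level halves the block side, history is trimmed by age (`WINDOW`), the channel name equals the ZSET key, and `MAX_DEPTH`, `WINDOW`, `LIMIT`, and the `geo:<level>:<row>:<col>` key layout are made up for illustration. `conn` is assumed to be a redis-py style client.

```python
import math

MAX_DEPTH = 4    # hypothetical: deepest partition level
WINDOW = 3600    # hypothetical: seconds of history kept per block
LIMIT = 10       # hypothetical: max events per block within WINDOW


def block_key(level, lat, lon):
    """Key of the block containing (lat, lon); level 0 is 1x1 degree."""
    size = 1.0 / (2 ** level)
    row = int(math.floor((lat + 90.0) / size))
    col = int(math.floor((lon + 180.0) / size))
    return 'geo:%d:%d:%d' % (level, row, col)


def write_event(conn, event_id, lat, lon, now, limit=LIMIT):
    """Record the event at every level; publish at the largest quiet block."""
    target = None
    for level in range(MAX_DEPTH + 1):
        key = block_key(level, lat, lon)
        conn.zadd(key, {event_id: now})
        conn.zremrangebyscore(key, '-inf', now - WINDOW)  # trim old history
        if target is None and conn.zcard(key) <= limit:
            target = key  # largest block still under the limit
    if target is None:
        target = key  # every level is busy: fall back to the smallest block
    conn.publish(target, event_id)
    return target
```

As an area gets busier the large blocks go over the limit and events are published to smaller blocks, which shrinks the effective radius; as activity falls, trimming lets the large blocks qualify again.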
  17. Geo Notifications
     Drawbacks:
     ● Event id/timestamp information is duplicated
     ● Large histories may use significant memory (ZSETs can be replaced by LISTs with minimal changes)
     ● Old data in unvisited blocks isn't cleaned out (can add expiration)
  18. Other questions?
  19. Thank you
     @dr_josiah
     www.dr-josiah.com
     bit.ly/redis-in-action