Josiah Carlson 2013-05-16 - Redis analytics

These are the slides for the talk I presented at the LA Web Speed meetup hosted by Yahoo on May 17, 2013 - http://www.meetup.com/LAWebSpeed/events/115663212/

    Presentation Transcript

    • A High-Level Pass Through Redis Analytics*
      by Josiah Carlson
      www.dr-josiah.com
      @dr_josiah
      bit.ly/redis-in-action
    • Agenda
      ● Quick overview of Redis
      ● Monthly unique return/churn
        ○ too much memory method
        ○ reasonable memory method
        ○ very low memory method
      ● Visitor action sequence analytics
        ○ sequence method
        ○ low-memory method
      ● Geographic notifications with partitioning*
    • Quick Redis overview
      ● Remote key -> data structure server
        ○ Strings/integers/bitmaps
        ○ Lists of strings
        ○ Sets of unique string members
        ○ Hashes of key -> value
        ○ Sorted sets (ZSETs) mapping of member -> score
      ● Supports
        ○ Persistence
        ○ Replication
        ○ Publish/subscribe
        ○ Server-side Lua scripting (like a stored procedure)
        ○ Client-side sharding (server side in-progress)
    • Monthly unique return/churn
      Problem:
      ● Say that you have millions of monthly visitors
      ● Need to know monthly churn, expected ~50%
      ● Don't want to waste too much memory
    • Monthly unique return/churn
      Too much memory:
      ● Generate UUIDs for users, store in cookie
      ● Use a HASH mapping from UUIDs to int ids
      ● Use a HASH mapping from int ids to UUIDs
      ● Create a ZSET of short ids to timestamp
      ● Use per-month bitmaps for churn calculation
      ● Recycle int ids based on old timestamps, discarding UUIDs and resetting bits
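A minimal sketch of the structures listed above, not the speaker's actual code. It assumes `r` is a redis-py-style client (for example `redis.Redis()`); the key names `uuid_to_id`, `id_to_uuid`, `id_last_seen`, `next_short_id`, and `visited:<month>` are hypothetical, and id recycling is left out.

```python
def lookup_short_id(r, uuid, now):
    """Map a cookie UUID to a compact integer id, creating one if needed."""
    short_id = r.hget("uuid_to_id", uuid)
    if short_id is None:
        # Not atomic: two concurrent first visits could both allocate an id.
        short_id = r.incr("next_short_id")
        r.hset("uuid_to_id", uuid, short_id)
        r.hset("id_to_uuid", short_id, uuid)
    short_id = int(short_id)
    r.zadd("id_last_seen", {short_id: now})  # timestamps would drive recycling
    return short_id


def record_visit(r, uuid, month, now):
    """Set this user's bit in the bitmap for the given month."""
    r.setbit(f"visited:{month}", lookup_short_id(r, uuid, now), 1)


def return_and_churn(r, last_month, this_month):
    """Return (returning, churned) counts from two monthly bitmaps."""
    last_key = f"visited:{last_month}"
    # AND the two bitmaps; set bits mark users seen in both months.
    r.bitop("AND", "visited:both", last_key, f"visited:{this_month}")
    returning = r.bitcount("visited:both")
    return returning, r.bitcount(last_key) - returning
```

The two HASHes and the ZSET are what make this variant expensive: every unique visitor costs three extra entries on top of a single bit.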
    • Monthly unique return/churn
      Drawbacks:
      ● Memory use based on size of HASHes and ZSET (up to about 400 bytes/unique user)
      ● Second HASH can be thrown away
      ● The other HASH, ZSET, and bitmaps can be thrown away and replaced by a "this month" and "last month" SET (about 120 bytes/user)
      ● With 63 bit integer UUID and sharding techniques, about 16 bytes/user
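The "this month" / "last month" SET variant could look roughly like this. It is a sketch under assumptions: `r` is a redis-py-style client, and the `uniques:<month>` key naming is hypothetical.

```python
def record_visit(r, user_id, month):
    """Add the user to the SET of unique visitors for the given month."""
    r.sadd(f"uniques:{month}", user_id)


def monthly_return_and_churn(r, last_month, this_month):
    """Return (returning, churned) user counts between two months."""
    last_key = f"uniques:{last_month}"
    this_key = f"uniques:{this_month}"
    returning = len(r.sinter(last_key, this_key))  # seen in both months
    churned = r.scard(last_key) - returning        # seen last month only
    return returning, churned
```

`sinter` ships every common member back to the client, so with millions of users a server-side `sinterstore` followed by `scard` would probably be the better choice.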
    • Monthly unique return/churn
      Reasonable memory solution:
      ● Store per-month id in a signed cookie (the lower 32 bits are the unique id for the month, the next 8 bits are the month)
      ● One month of bitmap
      ● If this month cookie, do nothing
      ● If last month cookie and bit isn't set for that id, mark the bitmap, generate a new cookie, increment unique and returning counts
      ● If last month cookie and bit is set, generate a new cookie
      ● If old cookie or no cookie, generate a new cookie, increment unique count
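The cookie rules above can be sketched as follows. Everything not on the slide is an assumption made for illustration: the HMAC-SHA256 signing scheme, the `SECRET` key, the month numbering, allocating ids with `INCR`, and the key names `seen:`, `unique:`, `returning:`, and `next_id:`. `r` is any redis-py-style client.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical signing key


def month_index(year, month):
    """Number months so that consecutive months differ by exactly 1."""
    return year * 12 + (month - 1)


def make_cookie(unique_id, month_idx):
    """Pack a 32-bit id and an 8-bit month into one int, then sign it."""
    value = ((month_idx & 0xFF) << 32) | (unique_id & 0xFFFFFFFF)
    sig = hmac.new(SECRET, str(value).encode(), hashlib.sha256).hexdigest()
    return f"{value}.{sig}"


def parse_cookie(cookie):
    """Return (unique_id, month_byte), or None if missing or tampered with."""
    if not cookie:
        return None
    try:
        raw, sig = cookie.split(".")
        value = int(raw)
    except ValueError:
        return None
    expected = hmac.new(SECRET, raw.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return value & 0xFFFFFFFF, (value >> 32) & 0xFF


def handle_visit(r, cookie, year, month):
    """Apply the slide's rules; return the cookie the client should now hold."""
    now = month_index(year, month)
    parsed = parse_cookie(cookie)
    if parsed is not None:
        unique_id, cookie_month = parsed
        age = (now - cookie_month) & 0xFF  # months elapsed, modulo 256
        if age == 0:
            return cookie  # this month's cookie: do nothing
        if age == 1:
            # SETBIT returns the previous bit, so 0 means first return this month.
            if r.setbit(f"seen:{now - 1}", unique_id, 1) == 0:
                r.incr(f"unique:{now}")
                r.incr(f"returning:{now}")
            return make_cookie(r.incr(f"next_id:{now}"), now)
    # Old, invalid, or missing cookie: count a new unique visitor.
    r.incr(f"unique:{now}")
    return make_cookie(r.incr(f"next_id:{now}"), now)
```

Because the month is stored in only 8 bits, a cookie that is an exact multiple of 256 months old would be mistaken for a current one; for this purpose that seems an acceptable trade.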
    • Monthly unique return/churn
      Drawbacks:
      ● Memory use based on unique monthly counts, ~1 bit per user (not bad)
      ● If you push to hundreds of millions/billions of users, you should shard your bitmaps to minimize realloc cost on bitmap updates
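One way to shard a bitmap, as a sketch: split the global bit position into a shard number and an offset within that shard, so that no single key grows past a fixed size. The shard size and the `<base_key>:<shard>` naming are assumptions; `r` is a redis-py-style client.

```python
SHARD_BITS = 1 << 20  # hypothetical shard size: 2**20 bits = 128 KiB per key


def sharded_bit(base_key, unique_id):
    """Map a global bit position to a (shard key, offset within shard) pair."""
    shard, offset = divmod(unique_id, SHARD_BITS)
    return f"{base_key}:{shard}", offset


def mark_seen(r, base_key, unique_id):
    """Set the user's bit in its shard; return the previous bit value."""
    key, offset = sharded_bit(base_key, unique_id)
    return r.setbit(key, offset, 1)
```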
    • Monthly unique return/churn
      Very low memory method:
      ● Store per-month id in a signed cookie
      ● If this month cookie, do nothing
      ● If last month cookie, generate a new cookie for the client, increment unique and returning counts
      ● If old cookie or no cookie, generate a new cookie, increment unique count
    • Monthly unique return/churn
      Drawback:
      ● If someone sends you duplicate cookies, hard to detect (keep "recently replaced" cache, 5-10 minutes worth is likely good enough)
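The "recently replaced" cache could be as small as one expiring key per exchanged cookie. This is a sketch, assuming a redis-py-style client `r` and a hypothetical `replaced:<cookie>` key name; the 600-second default sits inside the 5-10 minute range suggested above.

```python
def first_time_replaced(r, old_cookie, ttl_seconds=600):
    """Return True only the first time this old cookie is exchanged within the TTL."""
    # SET with NX and EX is a single atomic command: it succeeds only if the
    # key is absent, and the key expires by itself after ttl_seconds.
    return bool(r.set(f"replaced:{old_cookie}", 1, nx=True, ex=ttl_seconds))
```

A caller would increment the unique and returning counters only when this returns `True`, and simply hand out a fresh cookie otherwise.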
    • Tangent on ZSETs
      This slide is a filler so that I can talk about one of my favorite "get rid of ZSETs" tricks, which results in significant memory savings for a fairly large subset of problems
    • Visitor action sequences
      Problem:
      ● How are my funnels performing?
      ● These suck:
    • Visitor action sequences
      Sequence method:
      ● Each user gets a LIST
      ● All users are recorded in a ZSET with a score based on time
      ● Each action/page RPUSHes the action/page to the LIST
      ● Clean-up/analyze old sequences based on timestamps in the ZSET
      Drawbacks:
      ● Memory use can be high for active users
      ● More detailed events can use more memory
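A rough sketch of the sequence method. The assumptions: `r` is a redis-py-style client created with `decode_responses=True` (so members come back as `str`), the key names `actions:<user>` and `users:last_seen` are hypothetical, and the ZSET score is taken to be the time of the user's latest action.

```python
import time


def record_action(r, user_id, action, now=None):
    """Append the action to the user's LIST and refresh the user's ZSET score."""
    now = time.time() if now is None else now
    r.rpush(f"actions:{user_id}", action)
    r.zadd("users:last_seen", {user_id: now})


def collect_idle_sequences(r, idle_seconds, now=None):
    """Pop and return the sequences of users idle for at least idle_seconds."""
    now = time.time() if now is None else now
    cutoff = now - idle_seconds
    sequences = {}
    for user_id in r.zrangebyscore("users:last_seen", "-inf", cutoff):
        key = f"actions:{user_id}"
        sequences[user_id] = r.lrange(key, 0, -1)  # full ordered sequence
        r.delete(key)
        r.zrem("users:last_seen", user_id)
    return sequences
```

The returned lists keep the order of actions, which is what makes strict funnel analysis possible with this method.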
    • Visitor action sequences
      Low memory method:
      ● Each user gets a bitmap (limit your unique events)
      ● All actions are mapped to an index in the bitmap
      ● When a user performs the action/visits the page, set the bit and update the ZSET
      ● Clean up/analyze old bitmaps based on timestamps in the ZSET
      Drawbacks:
      ● No more strict sequence analysis possible
      ● Memory use is dominated by ZSET storage
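The bitmap variant might be sketched like this. The event names in `EVENT_INDEX` and the key names `events:<user>` and `users:last_seen` are made up for illustration, and `r` is a redis-py-style client.

```python
# Hypothetical fixed mapping from event name to bit position.
EVENT_INDEX = {"home": 0, "search": 1, "cart": 2, "checkout": 3}


def record_event(r, user_id, event, now):
    """Set the bit for this event and refresh the user's ZSET score."""
    r.setbit(f"events:{user_id}", EVENT_INDEX[event], 1)
    r.zadd("users:last_seen", {user_id: now})


def funnel_counts(r, user_ids, steps):
    """For each step, count users whose bits are set for it and every earlier step."""
    counts = [0] * len(steps)
    for user_id in user_ids:
        for i, step in enumerate(steps):
            if not r.getbit(f"events:{user_id}", EVENT_INDEX[step]):
                break  # user never did this step; stop walking the funnel
            counts[i] += 1
    return counts
```

The counts say which steps a user ever reached, not in what order, which is the loss of strict sequence analysis noted in the drawbacks.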
    • Geo Notifications
      Problem:
      ● Want to send events to nearby users
      ● Don't want users to be notified too often
      ● Reduce radius of results as notifications rise
      ● Increase radius of results as notifications fall
      ● Allow for history to be received on connect
    • Geo Notifications
      ● Consider the world as a recursively-divided series of blocks (highest level as 1x1 degree)
      ● Clients subscribe to all block levels that their user is in or is interested in
      ● When writing an event at point (lat,lon):
        ○ Add the event id to ZSETs to as deep a partition as you would ever expect to need
        ○ Trim the ZSETs along the way based on your desired history
        ○ Check the resulting size of the ZSETs to determine the highest-level block that is under your limit
        ○ Publish the event to a channel based on that level
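The write path above could be sketched as follows. Several details are assumptions rather than anything stated on the slide: each level halves the block edge (a quadtree-style split), history is trimmed by age rather than by count, a block counts as "under your limit" when it holds at most `limit` events, and the smallest block is used if every level is over the limit. The `geo:<depth>:<row>:<col>` naming is hypothetical, and `r` is a redis-py-style client.

```python
import math


def block_keys(lat, lon, max_depth=4):
    """Block names from the 1x1 degree level down, halving the edge each level."""
    keys = []
    for depth in range(max_depth + 1):
        size = 1.0 / (2 ** depth)
        row, col = math.floor(lat / size), math.floor(lon / size)
        keys.append(f"geo:{depth}:{row}:{col}")
    return keys


def write_event(r, event_id, lat, lon, timestamp, window=3600, limit=10):
    """Record the event at every level, then publish to the largest quiet block."""
    keys = block_keys(lat, lon)
    target = keys[-1]  # fall back to the smallest block if every level is busy
    found = False
    for key in keys:  # largest block first
        r.zadd(key, {event_id: timestamp})
        r.zremrangebyscore(key, "-inf", timestamp - window)  # drop old history
        if not found and r.zcard(key) <= limit:
            target, found = key, True
    r.publish(target, event_id)
    return target
```

A client at a given point would subscribe to every channel returned by `block_keys` for its location, so it receives an event whichever level it is published on.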
    • Geo Notifications
      Drawbacks:
      ● Event id/timestamp information is duplicated
      ● Large histories may use significant memory (ZSETs can be replaced by LISTs with minimal changes)
      ● Old data in un-visited blocks isn't cleaned out (can add expiration)
    • Other questions?
    • Thank you
      @dr_josiah
      www.dr-josiah.com
      bit.ly/redis-in-action