RIDING REDIS @ask.fm
60M REGISTERED USERS
200M MONTHLY UNIQUE USERS
450M PAGEVIEWS A DAY
600 QUESTIONS / SECOND
20K REQUESTS / SECOND
50K LIKES / MINUTE
250K REGISTRATIONS / DAY
~600 SERVERS
~300 RUBY NODES
~50 JAVA NODES
~35 REDIS NODES
~90 MySQL NODES
COMPARISON
            PAGEVIEWS   UNIQUE USERS   CONTENT
ASK.FM      13B         200M           32M QUESTIONS ANSWERED
TUMBLR      13B*        300M**         76.9M POSTS CREATED*
TWITTER     130B***     N/A            340M TWEETS CREATED***
_________________________________________________
* http://www.tumblr.com/press
** http://yahoo.tumblr.com/post/50902111638/tumblr-yahoo
*** http://en.wikipedia.org/wiki/Twitter
REDIS
Open-source
Key-value database
In-memory
Data-structure server
REDIS
Salvatore Sanfilippo
@antirez
REDIS AUTHOR
TWITTER
INSTAGRAM
STACK-OVERFLOW
BLIZZARD
WHO IS USING REDIS?
6 SCENARIOS
6 REDIS DATA-STRUCTURES
COUNTERS STRING
MESSAGES QUEUE LIST
USER ACTIVITY MATRIX HASH
FUNCTIONAL SWITCHES BIT OPERATIONS
THE WALL SORTED SET
REAL-TIME MONITORING PUB / SUB
Scenario 1 : Counters
GLOBAL COUNTERS
ACTIVE USERS
QUESTIONS ASKED TODAY
TOTAL LIKES
REGISTRATIONS THIS MONTH
SELECT count(*) FROM users
WHERE state='active';
NAIVE SOLUTION
REDIS STRING
MOST BASIC REDIS TYPE
BINARY SAFE
UP TO 512 MB
OPERATIONS
INCR, DECR
@redis.incr "users/active/total"
...
@redis.decr "users/active/total"
users / active / total 61 771 480
users / inactive / total 6 943 366
EXAMPLE
USERS COUNTERS
@redis.incr "user/#{@user.id}/answers"
..
@redis.decr "user/#{@user.id}/likes"
..
@redis.incr "user/#{@user.id}/gifts"
user / :user_id/ answers 57
user / :user_id / likes 13 905
user / :user_id / gifts 27
EXAMPLE #2: USERS COUNTERS
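A minimal sketch of the counter pattern above, runnable without a Redis server. `CounterStore` is a hypothetical in-memory stand-in that only mimics the INCR / DECR semantics (a missing key starts at 0, the new value is returned); the user id is made up.

```ruby
# In-memory stand-in for the two Redis commands used on the slides.
# Hypothetical helper for illustration; a real app would call a Redis client.
class CounterStore
  def initialize
    @data = Hash.new(0)
  end

  # Like INCR: a missing key starts at 0, the new value is returned.
  def incr(key)
    @data[key] += 1
  end

  # Like DECR: same semantics, in the other direction.
  def decr(key)
    @data[key] -= 1
  end

  # Unlike Redis GET, this returns an Integer (0 when the key is missing).
  def get(key)
    @data[key]
  end
end

store = CounterStore.new
user_id = 42 # hypothetical user id

3.times { store.incr "user/#{user_id}/answers" }
store.incr "user/#{user_id}/likes"
store.decr "user/#{user_id}/likes"

answers = store.get("user/#{user_id}/answers")
likes = store.get("user/#{user_id}/likes")
puts "answers=#{answers} likes=#{likes}"
```

The counter is updated at write time, so reading it never needs a `count(*)` scan.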
COUNTERS CONSISTENCY
CONSISTENCY = 1 / PERFORMANCE
@redis.expire "user/#{@user.id}/inbox_questions", 1.day
SOLVED MOST CASES FOR US
REDIS TTL
SCALABILITY
@redis_shard_map = {
0 => Redis.new(:host=>"redis_host_0"),
1 => Redis.new(:host=>"redis_host_1"),
...,
7 => Redis.new(:host=>"redis_host_7")
}
@redis = @redis_shard_map[@user.id % 8]
@redis.incr "user/#{@user.id}/answers"
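The shard lookup above can be sketched without any Redis connection. The host names follow the slide; `shard_host_for` and the sample ids are assumptions made for illustration.

```ruby
SHARD_COUNT = 8
# Host names follow the naming on the slide.
SHARD_HOSTS = (0...SHARD_COUNT).map { |i| "redis_host_#{i}" }.freeze

# Every key of one user lands on the same shard: user_id modulo shard count.
def shard_host_for(user_id)
  SHARD_HOSTS[user_id % SHARD_COUNT]
end

hosts = [7, 8, 15, 16].map { |id| shard_host_for(id) }
puts hosts.inspect

# How many of 720 consecutive user ids change shard if a ninth shard is added?
moved = (0...720).count { |id| id % 8 != id % 9 }
puts "#{moved} of 720 ids would move"
```

Modulo sharding is simple, but changing the shard count remaps most users, so the number of shards is best fixed up front.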
SCALABILITY (2)
Scenario 2 : Messages Queue
THIRD PARTY INTERACTION
RUBY BLOCKING MODEL
Resque to the rescue
https://github.com/resque/resque
REDIS-BACKED LIBRARY FOR
● CREATING BACKGROUND JOBS
● PLACING THOSE JOBS ON MULTIPLE QUEUES
● AND PROCESSING THEM LATER
RESQUE
REDIS LIST
SIMPLY LISTS OF STRINGS
MAX LENGTH 2^32 - 1
CONSTANT TIME INSERTION
OPERATIONS:
LPUSH, LPOP, RPUSH, RPOP
LREM, LRANGE
WORKFLOW
Why Resque?
Resque.enqueue(PostToFacebookJob, "Hello world")
ADDING A NEW JOB
@redis.lpush 'queue:facebook',
'{
"class":"PostToFacebookJob",
"args":["Hello world"]
}'
O(1)
COMPLEXITY
@redis.rpop "queue:facebook"
O(1)
GETTING A JOB FROM QUEUE
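A sketch of the LPUSH / RPOP flow above as a FIFO queue. `ListStore` is a hypothetical in-memory stand-in, not the Resque or Redis client API; the payload shape is the one shown on the slide.

```ruby
require 'json'

# In-memory stand-in for a Redis list (hypothetical helper for illustration).
class ListStore
  def initialize
    @lists = Hash.new { |h, k| h[k] = [] }
  end

  # Like LPUSH: insert at the head, return the new length.
  def lpush(key, value)
    @lists[key].unshift(value)
    @lists[key].length
  end

  # Like RPOP: remove and return the tail, or nil when the list is empty.
  def rpop(key)
    @lists[key].pop
  end
end

queue = ListStore.new
%w[first second].each do |text|
  payload = JSON.generate('class' => 'PostToFacebookJob', 'args' => [text])
  queue.lpush 'queue:facebook', payload
end

# Workers pop from the opposite end, so jobs come out in insertion order.
job = JSON.parse(queue.rpop('queue:facebook'))
next_job = JSON.parse(queue.rpop('queue:facebook'))
empty = queue.rpop('queue:facebook')
puts job['args'].inspect
```

Pushing on one end and popping on the other is what turns a plain list into a queue.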
Scenario 3 :
User activity matrix
ROBOTS
CRAWLERS
SPAM
TIME-BASED RESTRICTIONS
POST
ASK A QUESTION
LOGIN
LIKE
GET
PROFILE VIEW
USER ACTIONS
LIMIT ACTIVITY
PER MINUTE
PER HOUR
PER DAY
REQUIREMENTS
REDIS HASH
MAPS BETWEEN
STRING FIELDS AND VALUES
MAX PAIRS 2^32 - 1
OPERATIONS:
HGET, HGETALL
HSET, HDEL
LIKES QUESTIONS REQUESTS
user/:uid/date/:today/minute/:minute 12 3 27
user/:uid/date/:today/hour/:hour 34 15 113
user/:uid/date/:today/day/:day 158 22 529
STRUCTURE
def register_users_like!(user_id)
  # minute_key, hour_key and day_key: this user's keys for the current minute, hour and day
  @redis.hincrby minute_key, 'likes', 1
  @redis.hincrby hour_key, 'likes', 1
  @redis.hincrby day_key, 'likes', 1
end
EXAMPLE: REGISTER ACTIVITY
def allowed_to_like_questions?(user_id)
  # HGET returns a string (or nil when the field is missing), so convert to integers
  per_minute = @redis.hget(minute_key, "likes").to_i
  per_hour = @redis.hget(hour_key, "likes").to_i
  per_day = @redis.hget(day_key, "likes").to_i
  per_minute < LIKES_PER_MINUTE_THRESHOLD &&
    per_hour < LIKES_PER_HOUR_THRESHOLD &&
    per_day < LIKES_PER_DATE_THRESHOLD
end
EXAMPLE: ALLOWED?
@redis.expire minute_key, 1.minute
@redis.expire hour_key, 1.hour
@redis.expire day_key, 1.day
CLEANUP
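The activity matrix above can be sketched end to end as follows. `HashStore` is a hypothetical in-memory stand-in for HINCRBY / HGET, and the threshold values, the key date formats and the helper names are assumptions for this sketch. Key expiry is left out.

```ruby
# In-memory stand-in for Redis hashes (hypothetical helper for illustration).
class HashStore
  def initialize
    @hashes = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  # Like HINCRBY: returns the new value of the field.
  def hincrby(key, field, by)
    @hashes[key][field] += by
  end

  # Like HGET, except that it returns an Integer (0 when missing)
  # where Redis returns a String or nil.
  def hget(key, field)
    @hashes[key][field]
  end
end

# Threshold values are assumptions for this sketch.
LIMITS = { minute: 2, hour: 50, day: 200 }.freeze

# One key per time window, following the key layout on the STRUCTURE slide.
def window_keys(user_id, now)
  base = "user/#{user_id}/date/#{now.strftime('%Y-%m-%d')}"
  {
    minute: "#{base}/minute/#{now.strftime('%H:%M')}",
    hour: "#{base}/hour/#{now.strftime('%H')}",
    day: "#{base}/day/#{now.strftime('%d')}"
  }
end

def register_like!(store, user_id, now)
  window_keys(user_id, now).each_value { |key| store.hincrby(key, 'likes', 1) }
end

def allowed_to_like?(store, user_id, now)
  window_keys(user_id, now).all? do |window, key|
    store.hget(key, 'likes') < LIMITS[window]
  end
end

store = HashStore.new
t0 = Time.utc(2013, 6, 1, 12, 0, 10)
before = allowed_to_like?(store, 1, t0)
2.times { register_like!(store, 1, t0) }
after = allowed_to_like?(store, 1, t0)            # minute limit reached
next_minute = allowed_to_like?(store, 1, t0 + 60) # new minute key
puts [before, after, next_minute].inspect
```

Because the window is part of the key name, a new minute starts from an empty counter and old keys can simply expire.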
SCALABILITY
SCALE BY
USER_ID
DATE
PHASE OF THE MOON
CONSISTENT HASHING
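A minimal consistent-hash ring, sketched to show the idea named above. The `HashRing` class, the node names, the replica count and the use of MD5 are all assumptions for illustration; the slides do not say which implementation was used.

```ruby
require 'digest'

# Minimal consistent-hash ring; node names and replica count are assumptions.
class HashRing
  def initialize(nodes, replicas = 100)
    @ring = {}
    nodes.each do |node|
      # Each node is placed on the ring at many pseudo-random points.
      replicas.times { |i| @ring[point("#{node}:#{i}")] = node }
    end
    @points = @ring.keys.sort
  end

  # Walk clockwise to the first point at or after the key's position,
  # wrapping around to the start of the ring when none is found.
  def node_for(key)
    pos = point(key)
    idx = @points.bsearch_index { |p| p >= pos } || 0
    @ring[@points[idx]]
  end

  private

  # First 32 bits of the MD5 digest as the position on the ring.
  def point(str)
    Digest::MD5.hexdigest(str)[0, 8].to_i(16)
  end
end

keys = (1..1000).map { |id| "user/#{id}" }
small = HashRing.new(%w[redis_host_0 redis_host_1 redis_host_2])
large = HashRing.new(%w[redis_host_0 redis_host_1 redis_host_2 redis_host_3])

moved = keys.count { |k| small.node_for(k) != large.node_for(k) }
moved_to_new = keys.count do |k|
  small.node_for(k) != large.node_for(k) && large.node_for(k) == 'redis_host_3'
end
puts "#{moved} keys moved, #{moved_to_new} of them to the new node"
```

Adding a node only takes keys over from existing nodes; keys never move between the old nodes.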
Scenario 4 :
Functional switches
TURN ON/OFF ANY FEATURE ON SITE
REQUIREMENTS
PHOTO ANSWERS
VIDEO ANSWERS
WALL
STREAM
DATABASE SHARDS
SET POSTS PER PAGE
FUNCTIONALITY
BIT OPERATIONS
SETBIT, GETBIT
BITCOUNT
@redis.setbit common_settings_key, WALL_ENABLED, 1
EXAMPLE
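A sketch of feature switches stored as single bits under one key. `FeatureBits` is a hypothetical in-memory stand-in for SETBIT / GETBIT / BITCOUNT, and the bit offsets assigned to each feature are assumptions.

```ruby
# Bit offsets per feature; the numbering is an assumption for this sketch.
PHOTO_ANSWERS_ENABLED = 0
VIDEO_ANSWERS_ENABLED = 1
WALL_ENABLED = 2

# In-memory stand-in for SETBIT / GETBIT / BITCOUNT on a single key.
class FeatureBits
  def initialize
    @bits = 0
  end

  # Like SETBIT: stores 0 or 1 and returns the previous bit value.
  def setbit(offset, value)
    previous = @bits[offset]
    @bits = value == 1 ? (@bits | (1 << offset)) : (@bits & ~(1 << offset))
    previous
  end

  # Like GETBIT: returns 0 or 1.
  def getbit(offset)
    @bits[offset]
  end

  # Like BITCOUNT: number of bits set to 1.
  def bitcount
    @bits.to_s(2).count('1')
  end
end

settings = FeatureBits.new
settings.setbit(WALL_ENABLED, 1)
settings.setbit(PHOTO_ANSWERS_ENABLED, 1)

wall_on = settings.getbit(WALL_ENABLED) == 1
enabled_count = settings.bitcount

previous = settings.setbit(WALL_ENABLED, 0) # switch the wall off again
wall_off = settings.getbit(WALL_ENABLED).zero?
puts "wall_on=#{wall_on} enabled=#{enabled_count} wall_off=#{wall_off}"
```

All switches share one small value, so checking a feature is a single bit read.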
MASTER / SLAVE REPLICATION
SCALABILITY
Scenario 5 : The Wall
THE WALL
SHOW FRIENDS POSTS
INITIAL REQUIREMENT
SELECT * FROM questions q
LEFT JOIN followships f ON (q.user_id = f.friend_id)
WHERE f.user_id = :my_user_id
ORDER BY q.answered_at
LIMIT 0,25
SOLUTION
LATER REQUIREMENTS
LIKES INTRODUCED
SHOW RETWEETS
UNIQUENESS OF ANSWERS
ORDERED BY FIRST OCCURRENCE
PAGINATION NEEDED
DO NOT SHOW OWN POSTS
SHOW RETWEETS SINCE STARTED FOLLOWING A FRIEND
MORE REQUIREMENTS
DO NOT SHOW RETWEETS IF ANSWERER OR
RETWEETER IS DISABLED
SHOW LATEST FRIENDS WHO LIKED A QUESTION
OUR SOLUTION
IDEA
STORE SEPARATE SET OF QUESTIONS FOR EVERY USER
NON REPEATING COLLECTIONS OF
STRINGS
EACH MEMBER ASSOCIATED WITH
SCORE
QUICKLY ACCESS
ELEMENTS IN ORDER
FAST EXISTENCE TEST
FAST ACCESS TO ELEMENTS IN THE
MIDDLE
REDIS SORTED SET
user/:user_id/wall
score_1 score_2 ... score_N
question_id_1 question_id_2 ... question_id_N
score - timestamp, when the question_id first occurred in a set
STRUCTURE
● GET USERS WALL
○ ZREVRANGEBYSCORE - O(log(N)+M)
● USER ANSWERED A QUESTION
○ ZADD - O(log(N))
● LIKE
○ ZRANK - O(log(N))
○ ZADD - O(log(N))
● REMOVE ANSWER
○ ZREM - O(M*log(N))
OPERATIONS
GUARANTEED 1000-1500 POSTS ON WALL
PERIODICALLY CALL
ZCARD
ZREMRANGEBYRANK
CLEANUP
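The wall operations above can be sketched with an in-memory stand-in for one user's sorted set. `Wall`, its method names and the sample ids and timestamps are assumptions for illustration; each method names the Redis command it imitates.

```ruby
# In-memory stand-in for one user's wall sorted set (hypothetical helper).
class Wall
  def initialize(max_size)
    @max_size = max_size
    @scores = {} # question_id => timestamp of first occurrence
  end

  # Mirrors the ZRANK-then-ZADD step: keep the score of the first occurrence.
  def add_once(question_id, timestamp)
    return false if @scores.key?(question_id)
    @scores[question_id] = timestamp
    true
  end

  # Like ZREM: true when the member existed.
  def remove(question_id)
    !@scores.delete(question_id).nil?
  end

  # Newest first, like ZREVRANGEBYSCORE with a LIMIT.
  def page(offset, count)
    sorted = @scores.sort_by { |id, ts| [-ts, id] }
    (sorted[offset, count] || []).map(&:first)
  end

  # Like ZCARD followed by ZREMRANGEBYRANK: drop the oldest entries.
  def trim!
    excess = @scores.size - @max_size
    return 0 if excess <= 0
    oldest = @scores.sort_by { |id, ts| [ts, id] }.first(excess)
    oldest.each { |id, _| @scores.delete(id) }
    excess
  end
end

wall = Wall.new(3)
wall.add_once(101, 1000)
wall.add_once(102, 1010)
repeated = wall.add_once(101, 1020) # a later like does not reorder the wall
wall.add_once(103, 1030)
wall.add_once(104, 1040)

trimmed = wall.trim!
first_page = wall.page(0, 2)
second_page = wall.page(2, 2)
puts first_page.inspect, second_page.inspect
```

Checking for the member before adding is what keeps answers unique and ordered by first occurrence.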
USER_ID
SHARDING
Scenario 6 :
Real time monitoring
PATTERN DETECTION
REQUIREMENT
HUMAN vs MACHINE
MySQL TABLE POLLING
INITIAL SOLUTION
PUBLISH / SUBSCRIBE MESSAGING
PARADIGM
ALLOW PATTERN-MATCHING
SUBSCRIPTIONS
OPERATIONS:
SUBSCRIBE, UNSUBSCRIBE
PUBLISH
REDIS PUB/SUB
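A sketch of publish / subscribe with pattern-matching subscriptions. `Broker` is a hypothetical in-memory stand-in modelled on SUBSCRIBE, PSUBSCRIBE and PUBLISH; the channel names and messages are made up, and only the `*` wildcard is handled.

```ruby
# In-memory stand-in for SUBSCRIBE / PSUBSCRIBE / PUBLISH (hypothetical helper).
class Broker
  def initialize
    @channels = Hash.new { |h, k| h[k] = [] }
    @patterns = []
  end

  def subscribe(channel, &handler)
    @channels[channel] << handler
  end

  # Only the "*" wildcard is handled here; Redis glob patterns support more.
  def psubscribe(pattern, &handler)
    regex = Regexp.new('\A' + Regexp.escape(pattern).gsub('\*', '.*') + '\z')
    @patterns << [regex, handler]
  end

  # Returns the number of handlers that received the message, like PUBLISH.
  def publish(channel, message)
    direct = @channels.key?(channel) ? @channels[channel] : []
    matched = @patterns.select { |regex, _| regex.match?(channel) }.map(&:last)
    (direct + matched).each { |handler| handler.call(channel, message) }.size
  end
end

broker = Broker.new
seen = []
broker.subscribe('activity.likes') { |ch, msg| seen << [:likes_panel, ch, msg] }
broker.psubscribe('activity.*') { |ch, msg| seen << [:moderators_panel, ch, msg] }

to_likes = broker.publish('activity.likes', 'user 42: 17 likes in a minute')
to_logins = broker.publish('activity.logins', 'user 42: 9 logins in a minute')
to_other = broker.publish('system.health', 'ok')
puts seen.map(&:first).inspect
```

A message goes only to subscribers listening at publish time; nothing is stored for clients that connect later.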
SCHEMA
MODERATORS PANEL
Time complexity: O(N+M) where N is the number of clients
subscribed to the receiving channel and M is the total
number of subscribed patterns (by any client).
SCALING
So, why Redis?
SIMPLE
FAST
FLEXIBLE
ROBUST
FREE
WHY REDIS?
CLUSTERING
WHAT'S MISSING
Not covered?
SETS
LUA SCRIPTING
TRANSACTIONS
PIPELINING
NOT COVERED
JAVA
Questions

Riding Redis @ask.fm