Redis & MongoDB
Stop Big Data Indigestion Before It Starts
@itamarhaber
@itamarhaber
A Redis Geek and Chief Developer Advocate at Redis Labs
Have you signed up for the newsletter?
[1] http://bit.ly/RedisWatch
You probably haven't seen anything like this before
MongoDB truly excels when it comes to volume and variety of data…
…but data coming in at extreme velocity poses a digestive challenge for any disk-based database
A talk about MongoDB performance
[2] WiredTiger iiBench Results
I'm hardly an expert, but with MongoDB v3 storage engines and future work this could very well be a moot point.
Data ingestion at high velocity
Mobile, online and IoT apps produce more and more data with every passing day. Simply storing the data as it comes in doesn't cut it anymore – real time processing is a must in order to distill information from the data as it rushes in.
A talk about more performance
By doing LESS
you can do MORE
(with MongoDB)
Put differently, "chew" your
data with Redis to prevent
data ingestion indigestion
[3] Redis (REmote Dictionary Server)
● "...an [4] open source, BSD licensed, advanced key-value cache and store"
● 5+2 data types, 160+ commands, entirely in RAM, Lua scripts, PubSub...
● Né circa 2009, by [5] antirez (a.k.a. Salvatore Sanfilippo)
● Sponsored by Pivotal
Why use Redis
Because it is: fun & easy, free, OSS, humane, pure, inspiring, simple, flexible, efficient, innovative, robust, scalable, highly clusterable, available, cool, sexy, fresh, portable, geeky, mature, stable, actively developed, has a ton of uses, has a client in every language, rich, dependable, lean & small, supple, has a proven production track record, tiny, has a vibrant community, and much moar...
❤❤ 1.5M ops / sec using a single EC2 instance!
[6] Recorded webinar
Getting started with Redis
• Try it online at [7] http://try.redis.io/
• Build it from the source:
git clone https://github.com/antirez/redis
cd redis
git checkout 3.0.1
make; make test; make install
• [8] Download Redis Labs Enterprise Cluster
• Run it in a container:
docker run -d --name redis -p 6379:6379 redis
• [9] Connect to it from any language
Use case A: Google Analytics
• A real time analytics platform provider
• Strongly focuses on users' behavior
• Primary data storage is MongoDB
• Activity is collected immediately or in bulk
• Raw data fed to Hadoop for offline crunching
• Real time metrics and initial information from
the stream are obtained with Redis
The tidal flow: session events fan out into real time analysis (Redis) and offline analysis (Hadoop)
Deep dive topic: sessionizing data
• Stream of events
• A session is a document
• Each has 10s–1000s of events
• Events from different users
arrive in order but interleaved
• The result: many small updates
to each session's document
• Peak load: 11M ops/sec and growing
You say potato, I say potato
Hash data type:
HSET session:1 event:1 data
HSET session:1 event:2 data
...
HINCRBY session:1 seq 1
JSON:
{
  session: 1,
  events: [
    { id: 1, data: data },
    { id: 2, data: data },
    ...
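The slide's HSET/HINCRBY write path can be sketched in Python. This is an illustrative sketch, not code from the talk: it assumes the redis-py client (passed in as `r`) and uses the per-session sequence counter as the event id.

```python
def session_key(session_id):
    # Key layout from the slide: one hash per session, e.g. 'session:1'
    return 'session:%s' % session_id

def ingest_event(r, session_id, data):
    # One small update per incoming event: bump the per-session
    # sequence counter (HINCRBY) and set the event field (HSET),
    # mirroring the commands on the slide
    key = session_key(session_id)
    seq = r.hincrby(key, 'seq', 1)
    r.hset(key, 'event:%d' % seq, data)
    return seq

# Usage against a live server (requires redis-py and a running Redis):
# import redis
# r = redis.Redis()
# ingest_event(r, 1, 'data')
```

Because each event is a single hash-field write, interleaved sessions don't contend with each other the way many small updates to one MongoDB document would.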
Swallowing in Python
import redis
import pymongo
r = redis.Redis()
# Read the whole session hash out of the Redis buffer
session = r.hgetall('session:1')
# {'event:1': 'data', 'event:2': 'data', 'seq': '2'}
...
m = pymongo.MongoClient()
db = m.rta
# insert_one() returns an InsertOneResult; the new document's
# id is on its .inserted_id attribute
sessionid = db.sessions.insert_one(session).inserted_id
Keeping track of sessions
• Sessions end after a logout or a timeout
• Logout events are trivial to detect
• Timeouts, e.g. 30 minutes of inactivity, are
trickier to manage considering there could
be 10,000s of active sessions
• This is where Redis' key expiry and
keyspace notifications come in very handy
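A sketch of timeout detection with key expiry plus keyspace notifications. Everything here is an assumption for illustration: a redis-py client `r`, a server configured with `notify-keyspace-events "Ex"` (expired-key events), and a per-session liveness key named `session:<id>:alive`.

```python
SESSION_TTL = 30 * 60  # 30 minutes of inactivity ends a session

def touch_session(r, session_id, ttl=SESSION_TTL):
    # Refresh the session's liveness marker on every event;
    # if no event arrives within the TTL, the key expires
    r.setex('session:%s:alive' % session_id, ttl, 1)

def expired_session_id(expired_key):
    # Parse the session id out of an expired-key event payload,
    # e.g. b'session:1:alive' -> '1'
    key = expired_key.decode() if isinstance(expired_key, bytes) else expired_key
    if key.startswith('session:') and key.endswith(':alive'):
        return key[len('session:'):-len(':alive')]
    return None

# Listening for timeouts (requires a live, suitably configured server):
# p = r.pubsub()
# p.psubscribe('__keyevent@0__:expired')
# for msg in p.listen():
#     sid = expired_session_id(msg['data'])
#     if sid is not None:
#         ...  # flush the finished session to MongoDB
```

Letting Redis expire the marker keys means no application-side sweep over 10,000s of active sessions is needed.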
Once you see it, it can't be unseen
Using Redis as a buffer in front of MongoDB for write-intensive, hot Big Data is a useful pattern that makes it easy to get information in real time as well as distribute the load more efficiently.
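The flush step of this buffer pattern can be sketched as a single function. A minimal sketch under stated assumptions: `r` is a redis-py client created with `decode_responses=True` (so the hash comes back as strings), `db` is a pymongo database, and the key layout and `sessions` collection name are illustrative.

```python
def flush_session(r, db, session_id):
    # Move one finished session from the Redis buffer into MongoDB:
    # read the whole hash, insert it as a document, then drop the key
    key = 'session:%s' % session_id
    doc = r.hgetall(key)
    if not doc:
        return None  # nothing buffered for this session
    doc['session'] = session_id
    result = db.sessions.insert_one(doc)
    r.delete(key)  # free the buffer once the document is persisted
    return result.inserted_id
```

Note the delete happens only after the insert succeeds, so a crash between the two steps leaves the session in Redis to be flushed again rather than lost.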
Use case B: Waze
• An international navigation app/service
• Strongly focuses on public transit
• 10s of millions of users during peak hours
• Primary data storage is MongoDB
• Base data is created in advance
• Real time updates (traffic, vehicles and
passengers) pour into Redis for scheduling
adjustments and notifications
Use case C: Tinder
• A dating app/service
• Strongly focuses on spatially-related groups
• Primary data storage is MongoDB
• Data includes user profiles & preferences
• An influx of positional and preferential
("swipes") events is first munched by Redis
Use case D: Clash of Clans
• A massive real time game
• Strongly focuses on matched team play
• 1000s of teams with 100s of members
• Primary data storage is MongoDB
• Match progress is sieved through Redis for
real time resource status, leaderboards and
scoring
Use case E: Weather.com
• IoT startup
• Focuses on environmental monitoring
• Pilot: real time fire fighting
• Primary data storage is MongoDB
• Sensor data (temperature, humidity, …) is
aggregated in Redis, providing warnings and
alarms in real time
Questions from the audience
Questions or feedback? Contact me!
Itamar Haber
Chief Developer Advocate
📧 itamar@redislabs.com
@itamarhaber
