Making Mongo realtime - oplog tailing in Meteor

Making Mongo Realtime
Oplog tailing in Meteor
David Glasser
Meteor DevShop 10, 2013-Dec-05

Meteor makes realtime the default
When data changes in your database, everybody's web UI updates
automatically without you having to write any custom code.

How the Meteor realtime stack works
The server runs its publish function, which typically returns a cursor
    Meteor.publish("top-scores", function (gameId) {
      check(gameId, String);
      return Scores.find({game: gameId},
        {sort: {score: -1}, limit: 5, fields: {score: 1, user: 1}});
    });

The server watches the query in the database and observes its
changes
When changes happen, the server sends DDP data messages to the
client
The client updates its local cache
Changes to the local cache cause Meteor UI to re-render templates
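
The last three steps can be sketched as a client-side cache that applies data messages. This is a simplified illustration, not Meteor's client code: the message shapes are modelled on DDP's added, changed and removed messages, while createCache, applyMessage and the onChange hook are hypothetical names standing in for the real cache and the template re-render.

```javascript
// Simplified sketch: a local cache keyed by collection name, then document id.
// `createCache`, `applyMessage` and `onChange` are hypothetical names.
function createCache(onChange) {
  const collections = new Map();

  function applyMessage(message) {
    if (!collections.has(message.collection)) {
      collections.set(message.collection, new Map());
    }
    const docs = collections.get(message.collection);

    if (message.msg === "added") {
      docs.set(message.id, { _id: message.id, ...message.fields });
    } else if (message.msg === "changed") {
      const doc = docs.get(message.id);
      if (!doc) return; // unknown document: nothing to update
      Object.assign(doc, message.fields || {});
      for (const name of message.cleared || []) delete doc[name];
    } else if (message.msg === "removed") {
      docs.delete(message.id);
    } else {
      return; // not a data message
    }
    onChange(message.collection); // stand-in for "re-render templates"
  }

  const find = (name) => [...(collections.get(name) || new Map()).values()];
  return { applyMessage, find };
}

const rendered = [];
const cache = createCache((name) => rendered.push(name));
cache.applyMessage({ msg: "added", collection: "scores", id: "xxx",
                     fields: { user: "avital", score: 150 } });
cache.applyMessage({ msg: "changed", collection: "scores", id: "xxx",
                     fields: { user: "avi" } });
console.log(cache.find("scores"));
```

Each applied message triggers one onChange call, which is where a UI layer would re-render.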

observeChanges makes Mongo a realtime database
observeChanges is a brand new API in Meteor's Mongo client interface.
    handle = Messages.find({room: roomId}).observeChanges({
      added: function (id, fields) {...},
      changed: function (id, fields) {...},
      removed: function (id) {...}
    });

observeChanges executes the query and calls the added callback for each
matching document
It continues to watch the database and notices when the query's
results change
When the results change, it calls the added, changed, and removed
callbacks asynchronously
This continues until you call handle.stop()
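
The lifecycle above can be illustrated with a toy in-memory stand-in. This is only a sketch of the contract, not Meteor's implementation: LiveCollection is a hypothetical class, the selector is a plain predicate function, the changed callback is omitted, and callbacks fire synchronously rather than asynchronously.

```javascript
// Toy in-memory stand-in for a live query. `LiveCollection` is a hypothetical
// name; callbacks fire synchronously here to keep the sketch short.
class LiveCollection {
  constructor() {
    this.docs = new Map();
    this.observers = new Set();
  }

  observeChanges(predicate, callbacks) {
    const observer = { predicate, callbacks };
    // Initial results: one `added` call per matching document.
    for (const [id, fields] of this.docs) {
      if (predicate(fields)) callbacks.added(id, fields);
    }
    this.observers.add(observer);
    // The handle stops all further callbacks for this observer.
    return { stop: () => this.observers.delete(observer) };
  }

  insert(id, fields) {
    this.docs.set(id, fields);
    for (const { predicate, callbacks } of this.observers) {
      if (predicate(fields)) callbacks.added(id, fields);
    }
  }

  remove(id) {
    const fields = this.docs.get(id);
    if (fields === undefined) return;
    this.docs.delete(id);
    for (const { predicate, callbacks } of this.observers) {
      if (predicate(fields)) callbacks.removed(id);
    }
  }
}

const messages = new LiveCollection();
messages.insert("m1", { room: "lobby", text: "hi" });

const log = [];
const handle = messages.observeChanges((doc) => doc.room === "lobby", {
  added: (id) => log.push("added " + id),
  removed: (id) => log.push("removed " + id),
});
messages.insert("m2", { room: "lobby", text: "hello" });
messages.remove("m1");
handle.stop();
messages.insert("m3", { room: "lobby", text: "too late" });
console.log(log); // [ 'added m1', 'added m2', 'removed m1' ]
```

The insert of "m3" produces no callback because the handle was stopped first.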

observeChanges supports all Mongo queries
Meteor turns the full query API of a real database into a live query API
No more custom per-query code to monitor the database and see
when it changes
It's our job to make observeChanges as efficient as possible for as
many queries as possible

poll-and-diff
Run a query over and over, and compare the results each time
    var results = {};
    var pollAndDiff = function () {
      cursor.rewind();
      var oldResults = results;
      results = {};
      cursor.forEach(function (doc) {
        results[doc._id] = doc;
        if (_.has(oldResults, doc._id))
          callbacks.changed(doc._id, changedFieldsBetween(oldResults[doc._id], doc));
        else
          callbacks.added(doc._id, doc);
      });
      _.each(oldResults, function (doc, id) {
        if (!_.has(results, id))
          callbacks.removed(id);
      });
    };
    setInterval(pollAndDiff, 10 * 1000);
    whenWeWriteToTheCollection(pollAndDiff);

When do we re-run the query?
Every time we think the query may have changed: specifically, any
time that the current Meteor server process writes to the collection
Additionally, every 10 seconds, to catch writes from other processes
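
For experimenting outside Meteor, the same idea can be written as a self-contained sketch. The assumptions here are mine: runQuery is a hypothetical function returning the current result array, changedFieldsBetween is a shallow stand-in for the recursive diff the slide refers to, and changed is only called when some field actually differs.

```javascript
// Shallow field diff; a real comparison would need to recurse into nested
// values. `undefined` marks a field that disappeared.
function changedFieldsBetween(oldDoc, newDoc) {
  const changed = {};
  for (const key of Object.keys(newDoc)) {
    if (key !== "_id" && oldDoc[key] !== newDoc[key]) changed[key] = newDoc[key];
  }
  for (const key of Object.keys(oldDoc)) {
    if (!(key in newDoc)) changed[key] = undefined;
  }
  return changed;
}

// `runQuery` is a hypothetical function returning the current result array.
function makePoller(runQuery, callbacks) {
  let results = new Map();
  return function pollAndDiff() {
    const oldResults = results;
    results = new Map(runQuery().map((doc) => [doc._id, doc]));
    for (const [id, doc] of results) {
      if (!oldResults.has(id)) {
        callbacks.added(id, doc);
        continue;
      }
      const changed = changedFieldsBetween(oldResults.get(id), doc);
      if (Object.keys(changed).length > 0) callbacks.changed(id, changed);
    }
    for (const id of oldResults.keys()) {
      if (!results.has(id)) callbacks.removed(id);
    }
  };
}

let database = [{ _id: "xxx", user: "avital", score: 150 }];
const events = [];
const poll = makePoller(() => database, {
  added: (id) => events.push("added " + id),
  changed: (id, fields) => events.push("changed " + id + " " + JSON.stringify(fields)),
  removed: (id) => events.push("removed " + id),
});
poll();
database = [{ _id: "xxx", user: "avi", score: 150 },
            { _id: "bbb", user: "glasser", score: 200 }];
poll();
database = [{ _id: "bbb", user: "glasser", score: 200 }];
poll();
console.log(events);
```

In a server, poll would be called from a timer and after each local write, as described above.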

Benefits of poll-and-diff
Code is short and correct
Writes from the current process are reflected on clients immediately
Writes from other processes are reflected on clients eventually

Drawbacks of poll-and-diff
1. Cost proportional to poll frequency: number of polls grows with the
frequency of data change
2. Cost proportional to query result size: Mongo bandwidth, CPU to
parse BSON and perform recursive diffs, etc
3. Latency depends on whether a write originated from the same server
(very low) or another process (10 seconds)

Optimizations to poll-and-diff
1. Infer that a write does not affect an observe, then skip the poll (e.g.,
when both the write and the query specify a specific _id).
2. Query de-duplication: if multiple connections want to subscribe to the
same query, use the same poll-and-diff fiber for all of them.
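
Both optimizations can be sketched in a few lines. Everything named here is hypothetical: writeMayAffectQuery only compares plain string or number _id values and otherwise conservatively answers true, and makeDeduplicator keys pollers on a JSON string of the query, which assumes identical queries are written with the same key order.

```javascript
// Optimization 1: a write that names one _id cannot affect a query that names
// a different _id, so the poll can be skipped. Anything that is not a plain
// string or number _id conservatively returns true.
function writeMayAffectQuery(writeSelector, querySelector) {
  const isPlainId = (value) => typeof value === "string" || typeof value === "number";
  if (isPlainId(writeSelector._id) && isPlainId(querySelector._id)) {
    return writeSelector._id === querySelector._id;
  }
  return true;
}

// Optimization 2: share one poller between identical queries.
// `startPoller` is a hypothetical factory returning an object with a
// `listeners` set and a `stop` function.
function makeDeduplicator(startPoller) {
  const pollers = new Map();
  return function subscribe(collection, selector, options, listener) {
    const key = JSON.stringify([collection, selector, options]);
    if (!pollers.has(key)) pollers.set(key, startPoller(collection, selector, options));
    const poller = pollers.get(key);
    poller.listeners.add(listener);
    return function unsubscribe() {
      poller.listeners.delete(listener);
      if (poller.listeners.size === 0) {
        poller.stop(); // last subscriber gone: stop polling
        pollers.delete(key);
      }
    };
  };
}

let started = 0;
let stopped = 0;
const subscribe = makeDeduplicator(() => {
  started += 1;
  return { listeners: new Set(), stop: () => { stopped += 1; } };
});
const stopA = subscribe("scores", { game: "carrom" }, { limit: 3 }, () => {});
const stopB = subscribe("scores", { game: "carrom" }, { limit: 3 }, () => {});
console.log(started); // 1: both subscriptions share one poller
stopA();
stopB();
console.log(stopped); // 1: the poller stops once nobody is listening
```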

The Mongo oplog
MongoDB replication uses an operation log describing exactly what
has changed in the database
You can tail the oplog: follow along and find out about every change
immediately
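
As a rough illustration of what following the oplog yields, the sketch below turns a raw entry into the simplified insert/update/remove form used in the walk-through that follows. The field names op, ns, o and o2 come from MongoDB's replica set oplog format; simplifyOplogEntry and its output shape are hypothetical, other entry fields such as the timestamp are left out, and the update payload is passed through untouched.

```javascript
// Field names (`op`, `ns`, `o`, `o2`) follow MongoDB's replica set oplog
// format; `simplifyOplogEntry` and its output shape are hypothetical.
function simplifyOplogEntry(entry, collectionNamespace) {
  if (entry.ns !== collectionNamespace) return null; // some other collection
  switch (entry.op) {
    case "i": // insert: `o` is the new document
      return { op: "insert", id: entry.o._id, doc: entry.o };
    case "u": // update: `o2` identifies the document, `o` describes the change
      return { op: "update", id: entry.o2._id, modifier: entry.o };
    case "d": // delete: `o` identifies the document
      return { op: "remove", id: entry.o._id };
    default: // commands, no-ops, and anything else
      return null;
  }
}

const rawUpdate = {
  op: "u",
  ns: "game.scores",
  o2: { _id: "xxx" },
  o: { $set: { user: "avi" } },
};
const simplified = simplifyOplogEntry(rawUpdate, "game.scores");
console.log(simplified);
```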

Using oplog tailing for observeChanges
We're going to let the database tell us what changed

Oplog tailing, conceptually
    Scores.find({game: "carrom"}, {sort: {score: -1}, limit: 3, fields: {score: 1, user: 1}})

Run the query and cache:
    {_id: "xxx", user: "avital", score: 150}
    {_id: "yyy", user: "naomi", score: 140}
    {_id: "zzz", user: "slava", score: 130}

Oplog says:
    {op: "insert", id: "www", {game: "skee-ball", user: "glasser", score: 1000}}

Ignore it: does not match selector.


Cache is unchanged.

Oplog says:
    {op: "insert", id: "aaa", {game: "carrom", user: "glasser", score: 10}}

Ignore it: selector matches, but the score is not high enough.


Cache is unchanged.

Oplog says:
    {op: "remove", id: "ppp"}

Ignore it: removing something we aren't publishing can't affect us
(unless skip option is set!)


Cache is unchanged.

Oplog says:
    {op: "update", id: "ccc", {$set: {color: "blue"}}}

This is a document not currently in the cursor. This change does not
affect the selector or the sort criteria, so it can't affect the results. Ignore
it!


Cache is unchanged.

Oplog says:
    {op: "update", id: "xxx", {$set: {color: "red"}}}

This is a document in the cursor, but it does not affect the selector, sort
criteria, or any published fields. Ignore it!


Cache is unchanged.

Oplog says:
    {op: "update", id: "ddd", {$set: {game: "dominion"}}}

This is a document not currently in the cursor. This change is to a field
from the selector, but it can't make it true. Ignore it!


Cache is unchanged.

Oplog says:
    {op: "update", id: "xxx", {$set: {user: "avi"}}}

Invoke changed("xxx", {user: "avi"}).
Cache is now:
    {_id: "xxx", user: "avi", score: 150}
    {_id: "yyy", user: "naomi", score: 140}
    {_id: "zzz", user: "slava", score: 130}



Oplog says:
    {op: "insert", id: "bbb", {user: "glasser", game: "carrom", score: 200}}

Matches and sorts at the top!
Invoke added("bbb", {user: "glasser", score: 200}) and
removed("zzz").
Cache is now:
    {_id: "bbb", user: "glasser", score: 200}
    {_id: "xxx", user: "avi", score: 150}
    {_id: "yyy", user: "naomi", score: 140}


Cache is:
    {_id: "bbb", user: "glasser", score: 200}
    {_id: "xxx", user: "avi", score: 150}
    {_id: "yyy", user: "naomi", score: 140}

Oplog says:
    {op: "update", id: "eee", {$set: {score: 500}}}

This matches if "eee" has game: "carrom". We have to fetch doc "eee"
from Mongo and check.
If it does, invoke added("eee", {user: "emily", score: 500})
and removed("yyy"). Otherwise, do nothing.
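
The whole walk-through can be replayed with a small model of the decision logic. This is a sketch under heavy assumptions, not the Meteor implementation: makeOplogObserver is a hypothetical name, the selector supports equality only, there is one descending sort field, updates are $set-only and passed as entry.set, fetch is synchronous, and any case the slides do not cover (removing or re-scoring a published document) simply reports that the query must be re-run.

```javascript
// Simplified model of the walk-through. Assumptions: equality-only selector,
// one descending sort field, $set-only updates, and a synchronous `fetch`.
function makeOplogObserver({ selector, sortField, limit, fields, initial, fetch, callbacks }) {
  const cache = new Map(initial.map((doc) => [doc._id, { ...doc }]));
  const matches = (doc) => Object.keys(selector).every((key) => doc[key] === selector[key]);
  const project = (doc) =>
    Object.fromEntries(fields.filter((name) => name in doc).map((name) => [name, doc[name]]));
  const lowest = () =>
    [...cache.values()].reduce((a, b) => (a[sortField] <= b[sortField] ? a : b));

  // Publish `doc` if it matches and sorts into the top `limit` documents.
  function consider(doc) {
    if (!matches(doc)) return "ignore: does not match selector";
    const evicted = cache.size >= limit ? lowest() : null;
    if (evicted && doc[sortField] <= evicted[sortField]) return "ignore: does not sort high enough";
    const published = project(doc);
    cache.set(doc._id, { _id: doc._id, ...published });
    callbacks.added(doc._id, published);
    if (evicted) {
      cache.delete(evicted._id);
      callbacks.removed(evicted._id);
    }
    return "added";
  }

  return function handle(entry) {
    const cached = cache.get(entry.id);
    if (entry.op === "insert") return consider({ _id: entry.id, ...entry.doc });
    if (entry.op === "remove") {
      return cached ? "re-run query: a published document was removed" : "ignore: not published";
    }
    // Remaining case: an update carrying a $set modifier in `entry.set`.
    const touchesQuery = Object.keys(entry.set).some((key) => key in selector || key === sortField);
    if (cached) {
      if (touchesQuery) return "re-run query: membership or order may change";
      const changed = project(entry.set);
      if (Object.keys(changed).length === 0) return "ignore: no published field changed";
      Object.assign(cached, changed);
      callbacks.changed(entry.id, changed);
      return "changed";
    }
    if (!touchesQuery) return "ignore: cannot affect results";
    const contradicts = Object.keys(selector).some(
      (key) => key in entry.set && entry.set[key] !== selector[key]
    );
    if (contradicts) return "ignore: cannot make selector true";
    const doc = fetch(entry.id); // the rest of the document decides
    return doc ? consider(doc) : "ignore: document no longer exists";
  };
}

const events = [];
const handle = makeOplogObserver({
  selector: { game: "carrom" }, sortField: "score", limit: 3, fields: ["score", "user"],
  initial: [
    { _id: "xxx", user: "avital", score: 150 },
    { _id: "yyy", user: "naomi", score: 140 },
    { _id: "zzz", user: "slava", score: 130 },
  ],
  fetch: (id) => (id === "eee" ? { _id: "eee", game: "carrom", user: "emily", score: 500 } : null),
  callbacks: {
    added: (id) => events.push("added " + id),
    changed: (id, changed) => events.push("changed " + id + " " + JSON.stringify(changed)),
    removed: (id) => events.push("removed " + id),
  },
});
const outcomes = [
  handle({ op: "insert", id: "www", doc: { game: "skee-ball", user: "glasser", score: 1000 } }),
  handle({ op: "insert", id: "aaa", doc: { game: "carrom", user: "glasser", score: 10 } }),
  handle({ op: "remove", id: "ppp" }),
  handle({ op: "update", id: "ccc", set: { color: "blue" } }),
  handle({ op: "update", id: "xxx", set: { color: "red" } }),
  handle({ op: "update", id: "ddd", set: { game: "dominion" } }),
  handle({ op: "update", id: "xxx", set: { user: "avi" } }),
  handle({ op: "insert", id: "bbb", doc: { user: "glasser", game: "carrom", score: 200 } }),
  handle({ op: "update", id: "eee", set: { score: 500 } }),
];
console.log(outcomes, events);
```

The first six entries are ignored without touching Mongo; only the last one needs a fetch.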

Minimongo on the server
In order to process the oplog, we need to be able to interpret Mongo
selectors, field specifiers, sort specifiers, etc
This was not the case for poll-and-diff
Fortunately, Meteor already can do this: minimongo, our client-side
local database cache!
When moving minimongo to the server, we need to be very careful
that we perfectly match Mongo's implementation, even in complex
cases (nested arrays, nulls, etc)
Synchronizing between the "initial query" and the oplog tailing is very
subtle
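
Two cases where Mongo's matching differs from plain equality give a feel for the care required: a scalar in a selector also matches an array field containing it, and null also matches a missing field. The sketch below covers only those two rules for top-level fields and scalar selector values; fieldMatches and matchesSelector are hypothetical helpers, not minimongo's code.

```javascript
// Hypothetical helpers covering only top-level fields and scalar selector
// values. Nested arrays, dotted paths and operators are out of scope.
function fieldMatches(docValue, selectorValue) {
  // A scalar selector value matches an array field that contains it.
  if (Array.isArray(docValue)) return docValue.includes(selectorValue);
  // `null` in a selector matches both an explicit null and a missing field.
  if (selectorValue === null) return docValue === null || docValue === undefined;
  return docValue === selectorValue;
}

function matchesSelector(doc, selector) {
  return Object.entries(selector).every(([field, value]) => fieldMatches(doc[field], value));
}

console.log(matchesSelector({ tags: ["a", "b"] }, { tags: "a" })); // true
console.log(matchesSelector({ user: "avi" }, { deleted: null })); // true
console.log(matchesSelector({ deleted: false }, { deleted: null })); // false
```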

Benchmarks
Running benchmarks with various high write loads
Benchmark with lots of inserts and few updates: 10x more connected
clients
Benchmark with lots of updates: 2.5x more connected clients
Goal: Scale Meteor so that the DB is the limiting factor
Bottleneck: Mongo server CPU/bandwidth
Can fix by reading from Mongo replicas
More unimplemented heuristics

On devel today!
Oplog tailing for an initial class of Mongo queries in the next release
Other classes of queries will be supported by 1.0
Current implementation runs automatically for dev-mode meteor run
and can be enabled in production with $MONGO_OPLOG_URL
Works with Galaxy!
