Using Change Streams to Keep Up With Your Data
Kevin Albertson
Software Engineer II, MongoDB

@kevinAlbs
Why Real-Time Data?
March 8th, 2018
[Figure: Point A, Point B]
[Timeline, built up over several slides:]
6:33 Tweet: everything ok
6:45 I arrive at station
6:54 Train scheduled
7:00 Still waiting
7:15 Still waiting
7:30 Still waiting
7:45 Still waiting
8:00 Announcement
[Final timeline slide, labeled "Unnecessary":]
7:00 Train passes
[Later labels, in extracted order: Arrive in NYC; 9:30; Take train in other direction; 9:30; Take train to NYC; 11:00]
Real-time is an Expectation
Change Streams Enable Existing Apps to Provide Real-time Data
const changeStream = db.collection('train').watch();
changeStream.on('change', (change) => { console.log(change); });
Added in 3.6
Improved in 4.0
{
"_id" : (resumeToken),
"operationType" : "insert",
"ns" : { "db" : "test", "coll" : "train" },
"documentKey" : {
"_id" : 123
},
"fullDocument" : {
"_id" : 123,
"text": "hello"
}
}
Change Stream Characteristics
1. Use Collection Access Controls
2. Present a Defined API
3. Scale Across Nodes
Change Streams Return 5 Operation Types
1. Insert
2. Update
3. Replace
4. Delete
5. Invalidate
Total Ordering of Changes Across Shards
[Diagram: mongos above Shard 1, Shard 2, and Shard 3, with changes numbered 1, 2, 3]
Documents Uniquely Identified
Shard key combined with _id to uniquely identify documents
[Diagram: mongos above Shard 1, Shard 2, and Shard 3]
Changes are Durable
[Diagram: collection.watch() against nodes labeled P, S, S]
Change Streams are Resumable
[Diagram: collection.watch() against nodes whose P and S labels change between slides]
{
  _id: <resumeToken>,
  operationType: 'update'
  ...
}
Change Streams are Resumable
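let cachedResumeToken; // _id of the most recent change seen

changeStream.on('change', (change) => {
  console.log(change);
  cachedResumeToken = change["_id"];
});
changeStream.on('error', () => {
  // establishChangeStream is an application-defined helper that reopens the stream
  if (cachedResumeToken) {
    establishChangeStream(cachedResumeToken);
  }
});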
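The resume pattern above can be tried without a database. The sketch below is only an illustration: `watchWithResume` and `openStream` are hypothetical names, and the fake stream is a plain Node.js EventEmitter. With the Node.js driver, `openStream` might wrap `collection.watch([], { resumeAfter: token })`.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: openStream(resumeToken) must return an
// EventEmitter-like change stream, opened after resumeToken if one is given.
function watchWithResume(openStream, onChange) {
  let cachedResumeToken; // _id of the last change we processed

  function establish() {
    const stream = openStream(cachedResumeToken);
    stream.on('change', (change) => {
      cachedResumeToken = change._id;
      onChange(change);
    });
    stream.on('error', () => {
      // Reopen from the last seen token, if we have one.
      if (cachedResumeToken) establish();
    });
    return stream;
  }
  return establish();
}

// Usage with a fake stream, so the sketch runs without a deployment.
const opened = [];
const seen = [];
const fakeOpen = (token) => {
  const stream = new EventEmitter();
  opened.push({ token, stream });
  return stream;
};
watchWithResume(fakeOpen, (change) => seen.push(change.operationType));
opened[0].stream.emit('change', { _id: 'token-1', operationType: 'insert' });
opened[0].stream.emit('error', new Error('network blip'));
opened[1].stream.emit('change', { _id: 'token-2', operationType: 'update' });
console.log(opened[1].token, seen);
```

The second stream is opened with the token of the last processed change, so processing continues from that point.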
Change Streams Utilize the Power of the Aggregation Framework
$match $project $addFields $replaceRoot $redact
var changeStream = coll.watch([{ $match: { operationType: { $in: ['delete', 'replace'] } } }]);
1. Collection Access Controls
2. Defined API
3. Enable Scaling
4. Total Ordering
5. Durable
6. Resumable
7. Power of Aggregation
Types of Changes
Insert
Shell 1:
> db.collection.watch([], {maxAwaitTimeMS: 30000}).pretty()
Shell 2:
> db.collection.insert({_id: 1, text: "hello"})
{
"_id" : (resumeToken),
"operationType" : "insert",
"ns" : {
"db" : "test",
"coll" : "collection"
},
"documentKey" : {
"_id" : 1
},
"fullDocument" : {
"_id" : 1,
"text": "hello"
}
}
Update
Shell 1:
> db.collection.watch([], {maxAwaitTimeMS: 30000}).pretty()
Shell 2:
> db.collection.updateOne({_id: 1}, { $set: {"text": "updated"} })
{
"_id" : (resumeToken),
"operationType" : "update",
"ns" : {
"db" : "test",
"coll" : "collection"
},
"documentKey" : {
"_id" : 1
},
"updateDescription" : {
"updatedFields" : {
"text" : "updated"
},
"removedFields" : [ ]
}
}
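An update event carries only the delta, so a client that keeps a local copy has to apply `updatedFields` and `removedFields` itself. A minimal sketch, assuming dotted names such as "position.lat" address nested fields; `applyUpdateDescription` is a hypothetical helper, not part of any driver.

```javascript
// Apply a change event's updateDescription to a local copy of a document.
// Assumption: dotted field names address nested fields; missing
// intermediate objects are created on the way down.
function applyUpdateDescription(doc, updateDescription) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, JSON-safe values only
  const updated = updateDescription.updatedFields || {};
  for (const [path, value] of Object.entries(updated)) {
    const keys = path.split('.');
    let target = result;
    for (const key of keys.slice(0, -1)) {
      if (typeof target[key] !== 'object' || target[key] === null) {
        target[key] = {};
      }
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  for (const path of updateDescription.removedFields || []) {
    const keys = path.split('.');
    let target = result;
    for (const key of keys.slice(0, -1)) {
      target = target && target[key];
    }
    if (target) delete target[keys[keys.length - 1]];
  }
  return result;
}

const before = { _id: 1, text: 'hello' };
const after = applyUpdateDescription(before, {
  updatedFields: { text: 'updated' },
  removedFields: [],
});
console.log(after);
```

The original object is left untouched; the function returns a patched copy.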
Update with fullDocument: updateLookup
Shell 1:
> db.collection.watch([], { fullDocument: "updateLookup", maxAwaitTimeMS: 30000 }).pretty()
Shell 2:
> db.collection.updateOne({_id: 1}, { $set: {"number": 5} })
{ "_id" : (resumeToken),
"operationType" : "update",
"fullDocument" : {
"_id" : 1,
"text" : "updated",
"number" : 5
},
"ns": {"db": "test", "coll": "collection"},
"documentKey" : { "_id" : 1 },
"updateDescription" : {
"updatedFields" : { "number" : 5 },
"removedFields" : [ ]
}
}
Replace
Shell 1:
> db.collection.watch([], {maxAwaitTimeMS: 30000}).pretty()
Shell 2:
> db.collection.replaceOne({_id: 1}, {text: "replaced"})
{
"_id" : (resumeToken),
"operationType" : "replace",
"ns" : {
"db" : "test",
"coll" : "collection"
},
"documentKey" : {
"_id" : 1
},
"fullDocument" : {
"_id" : 1,
"text" : "replaced"
}
}
Delete
Shell 1:
> db.collection.watch([], {maxAwaitTimeMS: 30000}).pretty()
Shell 2:
> db.collection.remove({_id: 1})
{
"_id" : (resumeToken),
"operationType" : "delete",
"ns" : {
"db" : "test",
"coll" : "collection"
},
"documentKey" : {
"_id" : 1
}
}
Invalidate
Shell 1:
> db.collection.watch([], {maxAwaitTimeMS: 30000}).pretty()
Shell 2:
> db.collection.drop()
{
"_id" : (token),
"operationType" : "invalidate"
}
New in 4.0
1. startAtOperationTime
2. Change stream on db or client
3. Resumable before first change
Use Case
Real-time Train Arrival Estimates
Collection
> db.train.find().pretty()
{
"_id" : 11,
"position" : {
"lat" : 40.04714,
"lon" : -74.0351
}
}
App Caches State of All Trains
App Cache: _id: 11 (40.0, -74.0)
Populate the Cache
db.train.find().readConcern("majority")
App Cache: _id: 11 (40.0, -74.0)
Watch for changes
db.train.watch([])
{
  _id: (resumeToken),
  operationType: 'update',
  updateDescription: {
    updatedFields: { lat: 40.41169 }
  },
  documentKey: {_id: 11}
}
Handled
{
  _id: (resumeToken),
  operationType: 'update',
  updateDescription: {
    updatedFields: {
      passengerCount: 5
    }
  },
  documentKey: {_id: 11}
}
Handled?
App Cache: _id: 11 (40.4, -74.0)
$match to Filter Irrelevant Changes
db.train.watch([{
  $match: {
    $or: [
      {'updateDescription.updatedFields.lat': {$exists: 1}},
      {'updateDescription.updatedFields.lon': {$exists: 1}}
    ]
  }
}])
{
  _id: (resumeToken),
  operationType: 'insert',
  fullDocument: {
    _id: 16,
    lat: 40.5889, lon: -76.1938
  },
  documentKey: {_id: 16}
}
Ignored?
{
  _id: (resumeToken),
  operationType: 'update',
  updateDescription: {
    updatedFields: {
      passengerCount: 5
    }
  },
  documentKey: {_id: 11}
}
Ignored
App Cache: _id: 11 (40.4, -74.0)
Handle All Document Changes
If Caching State of Documents:
db.train.watch([{
  $match: {
    $or: [
      {
        operationType: 'update',
        $or: [
          {'updateDescription.updatedFields.lat': {$exists: 1}},
          {'updateDescription.updatedFields.lon': {$exists: 1}}
        ]
      },
      { operationType: { $in: ['delete', 'insert', 'replace'] } }
    ]
  }
}])
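If the set of cached fields grows, the same $match stage can be generated rather than written by hand. A small sketch; `buildCachePipeline` is a hypothetical name, and the resulting array would be passed to watch() against a live deployment.

```javascript
// Build a pipeline that passes updates touching any of `fields`,
// plus every insert, delete, and replace.
function buildCachePipeline(fields) {
  return [{
    $match: {
      $or: [
        {
          operationType: 'update',
          $or: fields.map((field) => ({
            [`updateDescription.updatedFields.${field}`]: { $exists: true },
          })),
        },
        { operationType: { $in: ['delete', 'insert', 'replace'] } },
      ],
    },
  }];
}

const pipeline = buildCachePipeline(['lat', 'lon']);
console.log(JSON.stringify(pipeline, null, 2));
// Intended use (not run here): db.collection('train').watch(pipeline)
```

Like the pipeline on the slide, this only looks at updatedFields; an update that removes a cached field would show up in removedFields and would need its own clause.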
App Cache: _id: 11 (40.4, -74.0)
{
  _id: (resumeToken),
  operationType: 'insert',
  fullDocument: {
    _id: 16,
    lat: 40.5889, lon: -76.1938
  },
  documentKey: {_id: 16}
}
Handled
App Cache: _id: 16 (40.4, -74.0)
{
  _id: (resumeToken),
  operationType: 'delete',
  documentKey: {_id: 11}
}
Handled
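A cache handler that covers every operation type might look like the sketch below. It assumes the flat lat/lon fields used in the events above and a Map keyed by _id; `applyChangeToCache` is a hypothetical name, and clearing the cache on invalidate is one possible policy, not the only one.

```javascript
// Apply one change event to a Map of train positions keyed by _id.
function applyChangeToCache(cache, change) {
  const id = change.documentKey && change.documentKey._id;
  switch (change.operationType) {
    case 'insert':
    case 'replace':
      cache.set(id, {
        lat: change.fullDocument.lat,
        lon: change.fullDocument.lon,
      });
      break;
    case 'update': {
      const current = cache.get(id) || {};
      const { lat, lon } = change.updateDescription.updatedFields;
      cache.set(id, {
        lat: lat !== undefined ? lat : current.lat,
        lon: lon !== undefined ? lon : current.lon,
      });
      break;
    }
    case 'delete':
      cache.delete(id);
      break;
    case 'invalidate':
      cache.clear(); // assumption: the cache is rebuilt from scratch afterwards
      break;
  }
  return cache;
}

const cache = new Map([[11, { lat: 40.0, lon: -74.0 }]]);
applyChangeToCache(cache, {
  operationType: 'update',
  documentKey: { _id: 11 },
  updateDescription: { updatedFields: { lat: 40.41169 }, removedFields: [] },
});
applyChangeToCache(cache, {
  operationType: 'insert',
  documentKey: { _id: 16 },
  fullDocument: { _id: 16, lat: 40.5889, lon: -76.1938 },
});
const afterUpdate = cache.get(11);
applyChangeToCache(cache, { operationType: 'delete', documentKey: { _id: 11 } });
console.log(afterUpdate, [...cache.keys()]);
```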
Race Between Populating and Listening!
db.train.find().readConcern("majority")
...
(update occurs!)
...
db.train.watch()
startAtOperationTime
[Sequence diagram between Client and Server:]
Client to Server: start session
Client to Server: find
Server to Client: [ {_id: 11}…], $clusterTime: X
Client to Server: watch, startAtOperationTime: X
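// Run inside an async function. populateApplicationCache and
// updateApplicationCache are application-defined helpers.
const session = client.startSession();
const cursor = db.collection('train').find({}, { session: session });
// Finish reading the cursor first so session.operationTime reflects the find.
await populateApplicationCache(cursor);
const changeStream = db.collection('train').watch([], {
  session: session,
  startAtOperationTime: session.operationTime
});
changeStream.on('change', updateApplicationCache);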
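Why this closes the race can be seen in a toy model. The sketch below is not MongoDB code: `ToyServer` is a hypothetical in-memory stand-in that stamps each write with an increasing logical time, and `changesFrom` assumes changes are returned at or after the requested time.

```javascript
// Toy in-memory stand-in for a server with a change log.
class ToyServer {
  constructor() {
    this.time = 0;
    this.docs = new Map();
    this.log = []; // entries: { time, id, position }
  }
  update(id, position) {
    this.time += 1;
    this.docs.set(id, position);
    this.log.push({ time: this.time, id, position });
  }
  // Like find(): a snapshot plus the time it was taken at.
  find() {
    return { docs: new Map(this.docs), operationTime: this.time };
  }
  // Like watch() with startAtOperationTime: changes at or after startAt.
  changesFrom(startAt) {
    return this.log.filter((entry) => entry.time >= startAt);
  }
}

const server = new ToyServer();
server.update(11, { lat: 40.0, lon: -74.0 });
const { docs: trainCache, operationTime } = server.find(); // populate the cache
server.update(11, { lat: 40.4, lon: -74.0 }); // update occurs before we watch!
for (const change of server.changesFrom(operationTime)) {
  trainCache.set(change.id, change.position); // replaying a seen change is harmless
}
console.log(trainCache.get(11));
```

Starting the stream from the time of the read means the in-between update is replayed instead of lost.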
App Complete!
1. Doesn't miss updates
2. Handles network blips
3. Reduces bandwidth
4. Easy to prototype
MongoDB World 2018: Using Change Streams to Keep Up with Your Data