7. #MDBlocal
[Figure: update, replace, insert and delete events flow from MongoDB to the application, which takes an action]
@ALY_CABRAL
8. Characteristics of Change Streams
• Resumable
• Targeted Changes
• Total ordering
• Durability
• Security
• Ease of use
• Idempotence
9. Change streams
coll.watch()
coll.watch([], {resumeAfter: <cachedResumeToken>})
coll.watch([{$match: {operationType: "insert"}}])
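The resumeAfter pattern above can be illustrated without a server. The sketch below is a toy model in plain JavaScript (Node), not the driver API: `eventLog` and `toyWatch` are hypothetical stand-ins for the server and `coll.watch()`. Each event carries its resume token in `_id`, as change events do, and the consumer caches the token only after it has finished processing an event, so after a simulated crash it resumes exactly where it stopped, with nothing lost or repeated.

```javascript
// Toy model of resumable change-stream consumption (no MongoDB involved).
// Hypothetical event log; each event's _id plays the role of the resume token.
const eventLog = [
  { _id: { token: "t1" }, operationType: "insert", documentKey: { _id: 1 } },
  { _id: { token: "t2" }, operationType: "update", documentKey: { _id: 1 } },
  { _id: { token: "t3" }, operationType: "insert", documentKey: { _id: 2 } },
  { _id: { token: "t4" }, operationType: "delete", documentKey: { _id: 1 } },
];

// Hypothetical stand-in for coll.watch([], {resumeAfter}): returns the events
// after the given token. Unlike a real stream, starting without a token
// replays the whole log, and unknown tokens are not handled.
function toyWatch({ resumeAfter = null } = {}) {
  if (resumeAfter === null) return eventLog.slice();
  const i = eventLog.findIndex((e) => e._id.token === resumeAfter.token);
  return eventLog.slice(i + 1);
}

let cachedResumeToken = null; // in a real application, persist this durably
const processed = [];

// Process at most `limit` events, then stop as if the process had crashed.
function consume(limit) {
  let handled = 0;
  for (const event of toyWatch({ resumeAfter: cachedResumeToken })) {
    if (handled === limit) return;
    processed.push(`${event.documentKey._id}:${event.operationType}`);
    cachedResumeToken = event._id; // cache only after the event is fully handled
    handled += 1;
  }
}

consume(2);        // handles t1 and t2, then "crashes"
consume(Infinity); // restarts and resumes after t2
console.log(processed); // [ '1:insert', '1:update', '2:insert', '1:delete' ]
```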
[Figure: Introduction to Change Streams]
11. EXPRESSIBILITY
MORE EXPRESSIVE QUERY LANGUAGE
ARRAY UPDATE
12. Comparing fields within a document
db.products.find( {
  $expr: {
    $gt: [ "$currRating", "$prevRating" ]
  }
} )
13. Comparing fields within a document
db.products.find( {
  $expr: {
    $gt: [
      { $subtract: [ "$currRating", "$prevRating" ] },
      2
    ]
  }
} )
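To make the semantics of the two `$expr` filters concrete, here is the same comparison written as an ordinary JavaScript filter over a few hypothetical product documents. It is only an analogy: the server evaluates the expression per document using BSON comparison rules, which this sketch does not model (for example, for missing fields or mixed types).

```javascript
// Hypothetical products; the server would evaluate the same comparison per document.
const products = [
  { _id: 1, currRating: 4, prevRating: 3 },
  { _id: 2, currRating: 2, prevRating: 5 },
  { _id: 3, currRating: 5, prevRating: 1 },
];

// { $expr: { $gt: ["$currRating", "$prevRating"] } }
const improved = products.filter((p) => p.currRating > p.prevRating);

// { $expr: { $gt: [{ $subtract: ["$currRating", "$prevRating"] }, 2] } }
const improvedByMoreThanTwo = products.filter((p) => p.currRating - p.prevRating > 2);

console.log(improved.map((p) => p._id));              // [ 1, 3 ]
console.log(improvedByMoreThanTwo.map((p) => p._id)); // [ 3 ]
```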
14. Expressive array updates
• Update all matching items in an array
• Match nested arrays
15. Expressive array updates
{ _id: 1, name: "X",
  Medications: [
    { id: 23, name: "DrugName99", Sched: "I", Rx: [
      { id: 13, Qty: 60, started: "2009-01-01" },
      { id: 77, Qty: 30, started: "2011-02-01", current: true }
    ]},
    { id: 41, name: "OtherDrugName", Sched: "II" },
    { id: 59, name: "ThirdDrug", Sched: "I", Rx: [
      { id: 994, Qty: 60, started: "2012-01-01", current: true },
      { id: 1034, Qty: 90, started: "2007-02-01" }
    ]}
  ],
  "lastVisit": ISODate("2017-01-22T13:01:13.000Z")
}
16. Expressive array updates
{ _id: 1, name: "X",
  Medications: [
    { id: 23, Sched: "I", Rx: [
      { id: 13, Qty: 60 },
      { id: 77, Qty: 30, current: true }
    ]},
    { id: 41, Sched: "II" },
    { id: 59, Sched: "I", Rx: [
      { id: 994, Qty: 60, current: true },
      { id: 1034, Qty: 90 }
    ]}
  ]
}
db.patientRx.update(
  {},
  {"$set": {"Medications.$[med].Rx.$[rx].Qty": 30}},
  {"arrayFilters": [
     {"med.Sched": "II"},
     {"rx.current": true, "rx.Qty": {"$gt": 30}}
   ],
   "multi": true}
)
17. Expressive array updates
{ _id: 1, name: "X",
  Medications: [
    { id: 23, Sched: "I", Rx: [
      { id: 13, Qty: 60 },
      { id: 77, Qty: 30, current: true }
    ]},
    { id: 41, Sched: "II" },
    { id: 59, Sched: "I", Rx: [
      { id: 994, Qty: 60, current: true },
      { id: 1034, Qty: 90 }
    ]}
  ]
}
db.patientRx.update(
  {"Medications": {"$elemMatch": {
    "Sched": "II",
    "Rx": {"$elemMatch": {
      "current": true,
      "Qty": {"$gt": 30}
    }}
  }}},
  {"$set": {"Medications.$[med].Rx.$[rx].Qty": 30}},
  {"arrayFilters": [
     {"med.Sched": "II"},
     {"rx.current": true, "rx.Qty": {"$gt": 30}}
   ],
   "multi": true}
)
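The following plain-JavaScript sketch spells out what the `arrayFilters` update does to a single document: `med` binds to each element of `Medications` that passes the first filter, `rx` binds to each element of that medication's `Rx` array that passes the second, and every bound `Qty` is set to 30. The function name and the sample document are hypothetical; the sample gives the Schedule "II" medication an `Rx` array (the slide's document does not), and the sketch simply skips medications without one rather than modelling how the server handles that case.

```javascript
// Predicate standing in for the array filter {"med.Sched": "II"}.
const medFilter = (med) => med.Sched === "II";

// Predicate standing in for {"rx.current": true, "rx.Qty": {"$gt": 30}}.
const rxFilter = (rx) => rx.current === true && rx.Qty > 30;

// Hypothetical helper: applies {"$set": {"Medications.$[med].Rx.$[rx].Qty": cap}}
// to one document in place and returns how many Rx elements were changed.
function capScheduleIIQuantities(doc, cap) {
  let modified = 0;
  for (const med of doc.Medications) {
    if (!medFilter(med) || !Array.isArray(med.Rx)) continue;
    for (const rx of med.Rx) {
      if (rxFilter(rx)) {
        rx.Qty = cap;
        modified += 1;
      }
    }
  }
  return modified;
}

// Hypothetical document in which a Schedule II medication has prescriptions.
const patient = {
  _id: 2,
  name: "Y",
  Medications: [
    { id: 23, Sched: "I", Rx: [{ id: 77, Qty: 60, current: true }] },
    { id: 41, Sched: "II", Rx: [
      { id: 5, Qty: 90, current: true },  // matches both filters
      { id: 6, Qty: 60 },                 // not current
      { id: 7, Qty: 20, current: true },  // already at most 30
    ] },
  ],
};

const changed = capScheduleIIQuantities(patient, 30);
console.log(changed);                                         // 1
console.log(patient.Medications[1].Rx.map((rx) => rx.Qty));  // [ 30, 60, 20 ]
```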
18. EXPRESSIBILITY
MORE EXPRESSIVE QUERY LANGUAGE
ARRAY UPDATE
19. ANALYTICS
NEW OPERATORS
TIMEZONE SUPPORT
EXPRESSIVE $LOOKUP
R DRIVER
20. New operators
21. Timezone support
$dayOfYear, $dayOfMonth, $dayOfWeek, $year, $month, $week, $hour,
$minute, $second, $millisecond, $isoDayOfWeek, $isoWeek, $isoWeekYear
{ $operator: { date: <isoDateExpression>, timezone: <tzExpression> } }
22. More expressive $lookup
Prior to 3.6:
> db.collection.aggregate([{$lookup: {
    from: "coll2",
    localField: "x",
    foreignField: "y",
    as: "coll2details"
}}])
New in 3.6:
> db.collection.aggregate([{$lookup: {
    from: "coll2",
    let: {x: "$x"},
    pipeline: [
      {$match: {$expr:
        {$eq: ["$y", "$$x"]}
      }}
    ],
    as: "coll2details"
}}])
23. $lookup example
> db.users.aggregate([
{$match: { < condition matching subset of users > } },
{$lookup:{
from:"sessions",
let: {uid: "$_id"},
pipeline: [
{$match: {$expr: { $eq: [ "$user_id", "$$uid"] } } },
{$sort: { startTime: -1 } },
{$limit: 1},
{$project: {
startTime:1,
active:{$ne:[{$type:"$endTime"},"date"]},
_id: 0
}}
],
as:"lastSession"
}}
])
25. R Driver for MongoDB
Recommended MongoDB R driver for data scientists, developers & statisticians
• MongoDB read & write concerns to control data consistency & durability
• Idiomatic, native language access to the database
• Data security with enterprise authentication mechanisms
• BSON data type support, e.g., Decimal128 for high-precision scientific & financial analysis
26. ANALYTICS
NEW OPERATORS
TIMEZONE SUPPORT
EXPRESSIVE $LOOKUP
R DRIVER
27. APPLICATION AVAILABILITY
RETRYABLE WRITES
RC: AVAILABLE & TUNABLE CONSISTENCY
28. Retryable writes
[Figure: an application writing to a replica set (P = primary, S = secondaries); write successful vs. write unsuccessful]
29. Retryable writes
Documents in MongoDB:
{ _id: {
    team: "Manchester United",
    gameID: "game123"
  },
  coach: "José Mourinho",
  roster: [...],
  score: 2,
  league: "Premier League"
}
{ _id: {
    team: "Chelsea",
    gameID: "game123"
  },
  coach: "Antonio Conte",
  roster: [...],
  score: 2,
  league: "Premier League"
}
Update sent by the application:
db.games.update(
  {"_id.team": "Chelsea", "_id.gameID": "game123"},
  {$inc: {score: 1}}
)
30. Characteristics of Retryable Writes
• Automatic driver logic
• Network errors
• Elections
• NOT for logic errors
• Safe
• For both non-idempotent and idempotent writes
• NOT for multi: true
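Why is it safe to retry a non-idempotent write such as the `$inc` on the games document? In 3.6, drivers enable retries with `retryWrites=true` in the connection string. Under the hood, the driver attaches a logical session id and a transaction number to each write, and the server uses them to recognise a retry and return the recorded result instead of applying the write twice. The code below is a toy, single-process model of that idea, not driver or server code; `ToyServer` and `retryableIncScore` are hypothetical names.

```javascript
// Toy server: remembers the outcome of the last write per session.
class ToyServer {
  constructor() {
    this.doc = { _id: { team: "Chelsea", gameID: "game123" }, score: 2 };
    this.lastWrite = new Map(); // sessionId -> { txnNumber, result }
    this.dropNextReply = false; // when true, the next reply is "lost" after the write applies
  }

  incScore(sessionId, txnNumber, amount) {
    const seen = this.lastWrite.get(sessionId);
    let result;
    if (seen && seen.txnNumber === txnNumber) {
      result = seen.result;       // a retry: replay the recorded outcome
    } else {
      this.doc.score += amount;   // first attempt: apply the $inc
      result = { nModified: 1 };
      this.lastWrite.set(sessionId, { txnNumber, result });
    }
    if (this.dropNextReply) {
      this.dropNextReply = false;
      const err = new Error("connection reset");
      err.isNetworkError = true;
      throw err;
    }
    return result;
  }
}

// Driver-side sketch: retry once, with the same txnNumber, and only for network errors.
function retryableIncScore(server, sessionId, txnNumber, amount) {
  try {
    return server.incScore(sessionId, txnNumber, amount);
  } catch (err) {
    if (!err.isNetworkError) throw err; // logic errors are not retried
    return server.incScore(sessionId, txnNumber, amount);
  }
}

const server = new ToyServer();
server.dropNextReply = true;                  // the write applies but the reply is lost
retryableIncScore(server, "session-1", 1, 1);
console.log(server.doc.score);                // 3, not 4
```

The key design point is that the retry reuses the same transaction number; a new logical write gets a new number and is applied normally.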
31. Tunable consistency
[Figure: a balance between Availability and Consistency]
32. What are readConcern and writeConcern?
• readConcern (read isolation)
  • Local
  • Majority
  • Linearizable
• writeConcern (write acknowledgement)
  • <number> (e.g. 1)
  • Majority
  • Tag
33. Tunable consistency
[Figure: mongos routing reads to Shard 1, Shard 2 and Shard 3]
readConcern: available
34. Tunable consistency
• readConcern: available is equivalent to readConcern: local on replica sets
• You can pass readConcern: available to a primary in a sharded cluster
• readConcern: available is the default for secondaries in a sharded cluster
• Secondaries in sharded clusters now respect readConcern: local for safe reads
35. Tunable consistency
[Figure: mongos routing to Shard 1, Shard 2 and Shard 3]
Causal consistency: global logical clock
• A read waits until the cluster time has moved past the last time you saw
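The "wait until cluster time has moved past the last time you saw" rule can be sketched with a toy logical clock. In 3.6, a causally consistent session tracks the `operationTime` returned by its operations and sends it as `afterClusterTime` in the read concern of later reads; the node delays the read until it has caught up. The class below is hypothetical and single-process: instead of waiting, `readFromSecondary` drives replication forward one oplog entry at a time.

```javascript
// Toy replica set with a logical clock that ticks on every write.
class ToyReplicaSet {
  constructor() {
    this.clusterTime = 0;
    this.oplog = [];                                // entries { time, key, value }
    this.primary = { applied: 0, data: new Map() };
    this.secondary = { applied: 0, data: new Map() };
  }

  // Write on the primary; returns the operationTime the client session keeps.
  write(key, value) {
    const time = ++this.clusterTime;
    this.oplog.push({ time, key, value });
    this.primary.data.set(key, value);
    this.primary.applied = time;
    return time;
  }

  // Apply the next oplog entry the secondary has not seen yet.
  replicateOne() {
    const next = this.oplog.find((e) => e.time > this.secondary.applied);
    if (!next) return false;
    this.secondary.data.set(next.key, next.value);
    this.secondary.applied = next.time;
    return true;
  }

  // Causally consistent read: only answer once the secondary has applied
  // everything up to afterClusterTime.
  readFromSecondary(key, afterClusterTime) {
    while (this.secondary.applied < afterClusterTime) {
      if (!this.replicateOne()) throw new Error("cannot reach afterClusterTime");
    }
    return this.secondary.data.get(key);
  }
}

const rs = new ToyReplicaSet();
const session = { operationTime: 0 };              // what a client session tracks
session.operationTime = rs.write("cart", ["book"]);
console.log(rs.secondary.data.get("cart"));        // undefined: the secondary is behind
console.log(rs.readFromSecondary("cart", session.operationTime)); // [ 'book' ]
```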
36. APPLICATION AVAILABILITY
RETRYABLE WRITES
RC: AVAILABLE & TUNABLE CONSISTENCY
37. OPERATIONS
JSON SCHEMA
NETWORK SECURITY
SESSION MANAGEMENT
END-TO-END COMPRESSION
38. Network Security
• Binds to localhost by default
• IP whitelisting
  • Associate IP addresses or ranges with roles in auth
  • If the IP restrictions are not met, authentication fails
  • The __system user can be restricted to authenticate only from cluster nodes
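The IP whitelisting bullets correspond to the `authenticationRestrictions` option that 3.6 accepts when creating users and roles, whose entries list allowed `clientSource` and `serverAddress` values. As a rough illustration of the client-side check only, the sketch below tests an IPv4 address against one restriction's `clientSource` list of addresses and CIDR ranges. It is a toy, not server code: it ignores IPv6, `serverAddress`, and how several restrictions combine, and the function names and addresses are hypothetical.

```javascript
// Dotted-quad IPv4 string -> unsigned 32-bit integer (multiplication avoids sign issues).
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True if ip lies inside the CIDR block, e.g. "192.0.2.0/24".
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  // A shift by 32 is a no-op in JavaScript, so /0 needs its own case.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// One restriction document; entries without "/" are exact addresses.
function clientSourceAllows(ip, restriction) {
  return restriction.clientSource.some((src) =>
    src.includes("/") ? inCidr(ip, src) : ip === src);
}

// Hypothetical restriction, shaped like an authenticationRestrictions entry.
const restriction = { clientSource: ["192.0.2.0/24", "127.0.0.1"] };

console.log(clientSourceAllows("192.0.2.17", restriction));   // true
console.log(clientSourceAllows("198.51.100.7", restriction)); // false
```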
39. JSON Schema
{"$jsonSchema": {
  "type": "object",
  "required": ["_id", "name", "Medications"],
  "properties": {
    "_id": {"bsonType": "objectId"},
    "name": {"type": "string"},
    "Medications": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["id", "name", "Sched"],
        "properties": {
          "id": {"type": "number"},
          "name": {"type": "string"},
          "Rx": {
            "type": "array",
            "items": {
              "type": "object",
              "required": ["id", "Qty"],
              "properties": {
                "id": {"type": "number"},
                "Qty": {"bsonType": "int"},
                "started": {"type": "string"},
                "current": {"type": "boolean"}
              },
              "additionalProperties": false
            } /* Rx items */
          } /* Rx */
        }, /* medication properties */
        "additionalProperties": {"type": "string"}
      } /* Medications items */
    } /* Medications */
  }, /* properties */
  "additionalProperties": false
}} /* $jsonSchema */
40. Session Management
Server sessions
• In 3.6, every operation is wrapped in a server session by default
• killSessions by user
Client sessions
• Every operation within a defined client session is causally consistent
• Not the default; must be explicitly defined
41. End-to-end compression
MongoDB 3.6 adds compression of wire protocol traffic between client and database
• Up to 80% bandwidth savings
MongoDB end-to-end compression:
• Wire protocol
• Intra-cluster
• Indexes in memory
• Storage
[Figure: wire protocol compression between the application and the MongoDB primary; intra-cluster compression between the primary and the secondary replicas; compression of data on disk; compression of indexes in memory]
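To get a feel for why compressing wire traffic pays off, the sketch below compresses a batch of similar documents with Node's built-in zlib module. It is only an analogy under stated assumptions: it serializes to JSON rather than BSON, and zlib stands in for whichever compressor a deployment negotiates (in 3.6 this is configured through options such as `networkMessageCompressors` on the server and `compressors` in the connection string). The savings depend entirely on the data, so no figure is claimed here.

```javascript
const zlib = require("zlib");

// Hypothetical result batch: many documents repeating the same field names and values.
const docs = Array.from({ length: 200 }, (_, i) => ({
  _id: i,
  team: "Chelsea",
  league: "Premier League",
  gameID: `game${i % 10}`,
  score: i % 5,
}));

const raw = Buffer.from(JSON.stringify(docs)); // JSON, not BSON: a rough stand-in
const compressed = zlib.deflateSync(raw);      // zlib, not necessarily the negotiated compressor

const savings = 1 - compressed.length / raw.length;
console.log(`raw ${raw.length} B, compressed ${compressed.length} B, savings ${(savings * 100).toFixed(0)}%`);
```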
42. OPERATIONS
JSON SCHEMA
NETWORK SECURITY
SESSION MANAGEMENT
END-TO-END COMPRESSION
43.
Realtime
Expressibility
Analytics
Application Availability
Operations
Here’s why:
Arrays may need to have more than one element updated in a single update.
Before 3.6, the positional operator ($) updates only the first match.
Arrays may contain objects which themselves contain arrays.
Before 3.6, the positional operator ($) does not work at all for nested arrays.
3.6 introduces a new update option, "arrayFilters", which specifies exactly which elements in which arrays should be updated, including nested arrays several levels deep.
What if we must update all current prescriptions for Schedule "II" medications to a quantity of at most 30?
Before 3.6:
{"Medications.Sched": "II", rest of query} ... but then we can't easily update inside the Rx array and must process it client side. Plus, only the first matching medication gets updated!
New expressions allow richer data transformations within the aggregation pipeline, including converting objects to arrays of key-value pairs ($objectToArray) and arrays of key-value pairs back to objects ($arrayToObject). The $mergeObjects expression is useful for filling in missing fields with default values, while the $$REMOVE variable allows fields to be excluded from projections conditionally, based on evaluation criteria.
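For readers who think in JavaScript, the shapes produced by these expressions have close plain-JS analogues. The helpers below are hypothetical names that mimic `$objectToArray`, `$arrayToObject`, `$mergeObjects`, and a `$$REMOVE` projection on one document; the aggregation form of the last one appears in a comment. They illustrate the semantics only and are not MongoDB code.

```javascript
// $objectToArray turns { sku: "abc" } into [ { k: "sku", v: "abc" } ];
// $arrayToObject reverses it.
const objectToArray = (obj) => Object.entries(obj).map(([k, v]) => ({ k, v }));
const arrayToObject = (pairs) => Object.fromEntries(pairs.map(({ k, v }) => [k, v]));

// $mergeObjects: when several documents share a field, the last one wins,
// so putting the defaults first fills in only the missing fields.
const mergeObjects = (...docs) => Object.assign({}, ...docs);

// $$REMOVE: a projected expression that evaluates to $$REMOVE drops the field, e.g.
// { $project: { sku: 1, qty: { $cond: [{ $gt: ["$qty", 0] }, "$qty", "$$REMOVE"] } } }
const projectQty = (doc) => (doc.qty > 0 ? { sku: doc.sku, qty: doc.qty } : { sku: doc.sku });

const defaults = { qty: 0, status: "new" };
const item = { sku: "abc", qty: 5 };

console.log(objectToArray(item));                // [ { k: 'sku', v: 'abc' }, { k: 'qty', v: 5 } ]
console.log(arrayToObject(objectToArray(item))); // { sku: 'abc', qty: 5 }
console.log(mergeObjects(defaults, item));       // { qty: 5, status: 'new', sku: 'abc' }
console.log(projectQty({ sku: "xyz", qty: 0 })); // { sku: 'xyz' }
```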
MongoDB stores dates and timestamps in UTC (Zulu/GMT) and does not store time zone information. When reporting, users want to group documents into days or hours based on a particular (local) time zone. Even though the MongoDB "ISODate" type displays a time zone designator, the only time zone it supports is Zulu (UTC).
MongoDB 3.6 adds a time zone option to all aggregation expressions that deal with dates. There is no change to the BSON specification: the only date type that can be stored in the database is ISODate, which represents time as milliseconds since the epoch, without any time zone. However, all date-related aggregation expressions in 3.6 accept a new option, timezone, which specifies the time zone to which the stored ISODate value is converted.
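Since a stored date is just a UTC instant, the `timezone` option changes only how that instant is broken into calendar fields. The sketch below does the equivalent in Node with the standard `Intl.DateTimeFormat` API, using the `lastVisit` value from the sample patient document; the server-side form would be `{ $dayOfMonth: { date: "$lastVisit", timezone: "Pacific/Auckland" } }`. It assumes a Node build with full ICU time zone data, which recent releases include by default, and `localDayOfMonth` is a hypothetical helper name.

```javascript
// Day of the month of a UTC instant as seen in a given IANA time zone.
function localDayOfMonth(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", { timeZone, day: "numeric" }).formatToParts(date);
  return Number(parts.find((p) => p.type === "day").value);
}

// lastVisit from the sample patient document: an instant stored in UTC.
const lastVisit = new Date("2017-01-22T13:01:13.000Z");

console.log(lastVisit.getUTCDate());                         // 22
console.log(localDayOfMonth(lastVisit, "America/New_York")); // 22 (08:01 local time)
console.log(localDayOfMonth(lastVisit, "Pacific/Auckland")); // 23 (02:01 the next day, NZDT)
```

Grouping by the local day rather than the UTC day is exactly what moves this visit into a different reporting bucket for an Auckland-based report.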
Expressive $lookup
Relationships are everywhere
Just because you have relationships amongst data does not mean you need a relational database!
$lookup in 3.2 and 3.4
simple equijoin only
if you need only one matching document or a subset, you have to filter the $lookup result array later in the pipeline
$lookup in 3.6
$lookup runs a pipeline that has access to values from the original (input) collection
instead of looking up from the entire "from" collection, you look up from a "pipeline" run on the "from" collection
can use $sort, $limit, $match, $project, etc.
“A core belief of relational databases is that data should live in its most semantically pure state, and that joins let people model the real world. In MongoDB, we think modeling the real world is the most important thing, and $lookup lets you model relationships between discrete “things” as opposed to facts. For example, I have a relationship with my friends, but my addresses are a core fact about me.”
example: /* from the review collection, look up only the most recent review for each user, excluding its text */
db.user.aggregate([
  {$lookup: {from: "review",
    let: {userId: "$user_id"},
    pipeline: [
      {$match: {$expr: {$eq: ["$user_id", "$$userId"]}}},
      {$sort: {review_date: -1}},
      {$limit: 1},
      {$project: {text: 0}}
    ],
    as: "latestReviewWithoutText"}}
]);
You have two collections, one is users, the other is all sessions for the users. For each user that matches an expression you want to find their most recent session start time and whether it's still active:
Here for each user we look up their sessions, reverse sort them by startTime, keep just one (the one with the highest startTime value) and $project only two fields: startTime and whether or not endTime is set to a value of type "date" (if it's not, then the session is still active).
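The pipeline-style $lookup for users and sessions can be mirrored step by step in plain JavaScript, which makes the per-user evaluation explicit: filter the foreign collection with the `let` variable, sort, limit, project, and attach the result under the `as` field, which is always an array. The data and the helper name `lookupLastSession` are hypothetical; this is an in-memory analogy, not how the server executes the join.

```javascript
// Hypothetical users and sessions collections.
const users = [{ _id: 1, name: "ann" }, { _id: 2, name: "bob" }];
const sessions = [
  { user_id: 1, startTime: new Date("2017-10-01T09:00:00Z"), endTime: new Date("2017-10-01T10:00:00Z") },
  { user_id: 1, startTime: new Date("2017-10-02T09:00:00Z") }, // no endTime: still active
  { user_id: 2, startTime: new Date("2017-09-30T12:00:00Z"), endTime: new Date("2017-09-30T12:30:00Z") },
];

// One user document in, the same document plus "lastSession" out.
function lookupLastSession(user) {
  const uid = user._id;                             // let: { uid: "$_id" }
  const lastSession = sessions
    .filter((s) => s.user_id === uid)               // $match: { $expr: { $eq: ["$user_id", "$$uid"] } }
    .sort((a, b) => b.startTime - a.startTime)      // $sort: { startTime: -1 }
    .slice(0, 1)                                    // $limit: 1
    .map((s) => ({                                  // $project
      startTime: s.startTime,
      active: !(s.endTime instanceof Date),         // $ne: [{ $type: "$endTime" }, "date"]
    }));
  return { ...user, lastSession };                  // as: "lastSession" (always an array)
}

const result = users.map(lookupLastSession);
console.log(result.map((u) => [u.name, u.lastSession[0].active])); // [ [ 'ann', true ], [ 'bob', false ] ]
```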
A new recommended R driver for MongoDB is now available, giving you the same first-class experience with MongoDB as the other MongoDB drivers, with idiomatic, native language access to the database. The driver supports advanced MongoDB functionality, including:
Read and write concerns to control data consistency and durability.
Enterprise authentication mechanisms, such as LDAP and Kerberos, to enforce security controls against the database.
Support for advanced BSON data types such as Decimal128 for high-precision scientific and financial analysis.
You may be wondering, why do I need to know about this stuff anyway? And how the heck are these things even related?
Well, as you know, availability and consistency can be at odds with one another.
And, today, you may be making tradeoffs (that you may not be entirely aware of) that put availability over consistency or vice versa.
There is no wrong answer here; it entirely depends on your application’s needs. For instance, applications that always need to be online (user-facing front ends that want to keep their customers) may be willing to sacrifice consistency in degraded clusters so they can continue to serve their user base.
Think of a social media site that will gladly choose availability over the consistency of every post.
Then we have those of you using MongoDB for e-commerce applications where you would gladly choose consistency guarantees over some downtime.
Both of these extremes are valid.
You might fit somewhere in between.
I want you to walk away from the chapter armed with the tools MongoDB 3.6 gives you to tune toward your use case and consciously decide the best option for your needs.