4. The Hubba stack
● MEANR - Mongo, Express, Angular, Node, React
● Many Services (Payments, Products, Users, etc)
● Three Engineering teams (one Python, two JS)
● AWS, GCP, MongoDB Atlas, RabbitMQ, Redis, etc
● Mongoose, Mongo Native Driver
5. What is a change stream?
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a collection and immediately react to them. --
https://docs.mongodb.com/manual/changeStreams/
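As a minimal sketch of what "subscribing" looks like, the snippet below uses the MongoDB Node.js driver's `collection.watch()`. The `subscribe` helper and the `onChange` callback are illustrative names, not part of the talk; the code assumes it is handed a driver `Collection` from a MongoDB 3.6+ replica set.

```javascript
// Sketch: subscribe to every change on one collection.
// `collection` is assumed to be a MongoDB Node.js driver Collection
// (change streams need MongoDB 3.6+ running as a replica set).
function subscribe(collection, onChange) {
  // An empty pipeline means "all changes"; stages such as $match can filter.
  const changeStream = collection.watch([]);

  // The driver's ChangeStream emits a "change" event per change document.
  changeStream.on("change", (change) => {
    // operationType is e.g. "insert", "update", "replace" or "delete".
    onChange(change.operationType, change.documentKey, change);
  });

  changeStream.on("error", (err) => {
    console.error("change stream error:", err.message);
  });

  return changeStream; // call changeStream.close() to stop listening
}
```

A caller would pass something like `db.collection("products")` and a callback that records or reacts to each change.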
6. Why are we here?
Hubba
● 6-year-old company
● Networking for Brands and Buyers
● Microservices
● Decided to launch ordering
7. The exact product you ordered must be delivered to you
Ordering must-haves
[Image comparison: "This" vs. "Not this"]
8. What exactly is a version?
A particular form of something differing in certain
respects from an earlier form or other forms of the
same type of thing. --
https://www.google.com?q=define+version
9. 1. Options
➔ Not versioning
It wasn’t needed previously; continue not to use it
➔ Make a copy
How do we know what to copy?
➔ In-app versioning
Lean into Mongoose and version in the app
➔ Oplog versioning
Write a new service to consume the oplog
14. Denormalization:
Denormalization allows you to avoid some application-level joins, at the expense of having more complex and expensive updates. Denormalizing one or more fields makes sense if those fields are read much more often than they are updated. --
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-3
15. Denormalization Example - Pre bearer of the ring
Users / Hobbits
{
  _id: 1,
  name: "Frodo",
  occupation: "unemployed"
}, {
  _id: 2,
  name: "Sam",
  occupation: "unemployed"
}
Messages
{
  _id: 93,
  from: 2,
  to: 1,
  fromOccupation: "unemployed",
  toOccupation: "unemployed",
  message: "What do you call a hobbit party?"
}, {
  _id: 94,
  from: 1,
  to: 2,
  fromOccupation: "unemployed",
  toOccupation: "unemployed",
  message: "A little get together."
}
16. Denormalization Example - Post bearer of the ring
Users / Hobbits
{
  _id: 1,
  name: "Frodo",
  occupation: "Bearer of the ring"
}, {
  _id: 2,
  name: "Sam",
  occupation: "Protector of Frodo"
}
Messages
{
  _id: 93,
  from: 2,
  to: 1,
  fromOccupation: "Protector of Frodo",
  toOccupation: "Bearer of the ring",
  message: "What do you call a hobbit party?"
}, {
  _id: 94,
  from: 1,
  to: 2,
  fromOccupation: "Bearer of the ring",
  toOccupation: "Protector of Frodo",
  message: "A little get together."
}
message.update({to: 1}, {$set: {toOccupation: "Bearer of the ring"}}, {multi: true})
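The update above only covers messages sent to Frodo; the copies on messages he sent need the same treatment. A rough sketch of propagating both, using the Node.js driver's `updateMany`, might look like this. `propagateOccupation` is a hypothetical helper name, and `messages` is assumed to be a driver `Collection`.

```javascript
// Sketch: after a user's occupation changes, rewrite the denormalized
// copies stored on every message they sent or received.
// `messages` is assumed to be a MongoDB Node.js driver Collection.
async function propagateOccupation(messages, userId, occupation) {
  // Messages addressed to the user carry the value in toOccupation.
  const asRecipient = await messages.updateMany(
    { to: userId },
    { $set: { toOccupation: occupation } }
  );

  // Messages sent by the user carry it in fromOccupation.
  const asSender = await messages.updateMany(
    { from: userId },
    { $set: { fromOccupation: occupation } }
  );

  // Total number of message documents that were actually changed.
  return asRecipient.modifiedCount + asSender.modifiedCount;
}
```

Raw multi-document updates like these are the kind of write that a change stream still reports as one event per modified document.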
21. 2. Why Mongo 3.6
➔ Easy Versioning
Ability to do versioning without significant architecture changes
➔ Raw Queries
Allowed us to use existing raw queries without altering them to support versioning
➔ Many Sources
None of the front-end data consumers needed to be altered
40. Implementation
- Proof of concept
- Used full document to consume all changes
from Mongo and write them to our DB
- Successfully mirrored actual documents
41. Gotcha #1
Principle of eventual consistency
- The full document wouldn’t always represent the document as it existed in the DB at the time of the change
Lesson: If you’re using change streams for event sourcing, you can source attributes out of the update description, but not out of the full document
42. Solution:
Be more like a database
Change stream events are guaranteed to be delivered in the order in which the changes happen. Source the events as they happen, and apply each update description to the previous version
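To make "apply the update description to the previous version" concrete, here is a minimal sketch. It assumes the standard `updateDescription` shape of an update change event (`updatedFields` keyed by dotted paths, `removedFields` as a list of dotted paths) and plain JSON-like documents. The function names are illustrative and this is not the Histories service's actual code.

```javascript
// Sketch: rebuild the next version of a document from the previous version
// plus the updateDescription of an "update" change event.

// Copy arrays and plain objects; other values (numbers, strings, Dates,
// driver types) are passed through unchanged.
function deepCopy(value) {
  if (Array.isArray(value)) return value.map(deepCopy);
  if (value !== null && typeof value === "object" &&
      Object.getPrototypeOf(value) === Object.prototype) {
    const copy = {};
    for (const key of Object.keys(value)) copy[key] = deepCopy(value[key]);
    return copy;
  }
  return value;
}

// Walk a dotted path such as "address.city" or "tags.0" and return the
// parent container plus the final key (or null if the path is missing).
function resolveParent(doc, path, createMissing) {
  const keys = path.split(".");
  const last = keys.pop();
  let node = doc;
  for (const key of keys) {
    if (node[key] === undefined || node[key] === null) {
      if (!createMissing) return null;
      node[key] = {};
    }
    node = node[key];
  }
  return { node, last };
}

function applyUpdateDescription(previous, updateDescription) {
  const next = deepCopy(previous); // never mutate the stored version
  const { updatedFields = {}, removedFields = [] } = updateDescription;

  for (const [path, value] of Object.entries(updatedFields)) {
    const { node, last } = resolveParent(next, path, true);
    node[last] = deepCopy(value);
  }
  for (const path of removedFields) {
    const parent = resolveParent(next, path, false);
    if (parent) delete parent.node[parent.last];
  }
  return next;
}

// Usage with the hobbit example: version 2 is derived from version 1.
const v1 = { _id: 1, name: "Frodo", occupation: "unemployed" };
const v2 = applyUpdateDescription(v1, {
  updatedFields: { occupation: "Bearer of the ring" },
  removedFields: [],
});
```

Because events arrive in order, applying each description to the last stored version reproduces every intermediate state, even when the looked-up full document has already moved on.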
43. Gotcha #2
How do we bootstrap versions into the DB?
- Without insert events for all existing records, how do we get them into the DB?
49. Gotcha #5
We fell off the oplog
If your oplog can hold 100 entries, and I fill it up with writes to messages, you will not be able to resume the products change stream
50. Solution:
We got a bigger oplog and now update our versioned collections
We built a script that runs every hour and updates a random document in every versioned collection
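The toy simulation below illustrates why both fixes help. It models the oplog as a fixed-length queue shared by all collections and treats a write's position as a stand-in for a resume token. Real oplogs are capped by size rather than entry count, and `ToyOplog` and `simulate` are made-up names for this sketch, which also assumes the products stream is running and sees each heartbeat update.

```javascript
// Toy model of a capped oplog shared by every collection in the database.
class ToyOplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = [];
    this.nextToken = 1;
  }

  // Record a write and return its position, standing in for a resume token.
  write(collection) {
    const token = this.nextToken++;
    this.entries.push({ token, collection });
    if (this.entries.length > this.capacity) this.entries.shift(); // oldest entry rolls off
    return token;
  }

  // A stream can resume only if its last seen token is still in the oplog.
  canResume(token) {
    return this.entries.some((entry) => entry.token === token);
  }
}

// Simulate a burst of message writes. If heartbeatEvery is set, one product
// is touched every `heartbeatEvery` writes, like the hourly script.
// Returns whether the products change stream could still resume afterwards.
function simulate({ capacity, messageWrites, heartbeatEvery }) {
  const oplog = new ToyOplog(capacity);
  let productsToken = oplog.write("products"); // last event the products stream saw
  for (let i = 1; i <= messageWrites; i++) {
    oplog.write("messages");
    if (heartbeatEvery && i % heartbeatEvery === 0) {
      productsToken = oplog.write("products"); // heartbeat refreshes the token
    }
  }
  return oplog.canResume(productsToken);
}

console.log(simulate({ capacity: 100, messageWrites: 180 })); // false
console.log(simulate({ capacity: 100, messageWrites: 180, heartbeatEvery: 50 })); // true
```

A bigger oplog raises `capacity`; the heartbeat keeps the newest products token close to the head of the oplog, so a quiet collection no longer depends on how busy the others are.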
51. Cons
➔ Histories has no context
Histories is unable to validate the data it gets; it’s just a blind data store
➔ Spinning up a whole service
We could have solved this with in-app versioning; now we maintain an extra service
➔ Issues upgrading
We had a few road bumps with performance when we first released Mongo 3.6 + the new drivers
52. Benefits
➔ Histories is isolated
As long as our data is persisted we
have a history of it
➔ We get to keep our raw queries
Our denormalization strategies
continue to work
➔ Language support
We do not need to implement history support for every language we use, just a version generator if we want access.
54. Conclusion:
1. Change streams are a great way to follow updates to
your documents
2. Using change streams for event sourcing would be
amazing
3. If you are versioning data in a legacy app, change
streams may be for you
L:
Introductions to who we are
My name is Leigha, I am a developer at Hubba; this is Edward, also a developer at Hubba
L:
This presentation is about Mongo change streams and how we use them at Hubba to version our data
L
L
ED: Change streams were introduced in Mongo 3.6. They are a replacement for using the oplog to subscribe to changes to the documents in your database, and they can be used for all sorts of fancy things. At Hubba we currently use them to version our data, but would like to use them for event sourcing moving forward
Better explanation about change streams
Explain the oplog
L: Explain Hubba in terms of buyers and brands, introducing ordering
We need to focus more on Hubba and why it's important
L: The key part of this is that the exact product you ordered must be delivered to you, so when the product description changes, we need to know that the description changed, and capture that in our system.
ED: A version is a reference to a particular form of something, which makes versioning the act of creating those references. As developers we all use versioning every day: every time you use git, push a release or install a package.
L: We learned about the oplog last time at MongoDB World
Before this, what is versioning
L: Maybe not the best user experience
ED:
When you land on a page to order a product, that product can change before you add it to the cart
L: Different teams, using different ORMs, raw mongo queries
Leigha:
Help us make this slide better!!!!!!!!!!!! Explain Raw mongo queries
Highlight keywords here!
L:
L:
ED: We learned about a few people using the oplog for event sourcing and versioning last year at MongoDB World!
Mongo DB world
Rolled back
L:
L:
L: we were on Mongo 3.4 at the time
L:
ED: Our Changestream design
Explain architecture
Explain process of adding product to your cart
ED:6 fields
update
ED:6 fields
L: Show an example of a version number
Show an example of incrementing the version number
L:
Show an example of a version number
Show an example of incrementing the version number
We need to explain this better! (I need to set this up better)
ED:6 fields
Delete
Explain invalidate better (we don't do it; this happens when a collection is dropped)
Highlight the info
E
ED
L: This happens when a document is updated many times in quick succession. Change streams can be configured to only send an event once the change is durable; however, the full document is looked up separately from a node in your cluster, so it may already include later updates
Explain more, how it affected our user
History service would corrupt documents because it would overlap
Quorum
There’s previous data, since we can’t rely on the insert events for this older data because it doesn’t have them, how are we going to get this data into our shiny new History collection?
Explain more, how did it affect us, implications
E: How did bootstrapping affect us
But how are we sure that it’s only going to write once and in order?
But what are the chances that the documents will be too large and cause issues?
Example of our large document solution
Phrase this as it’s super unlikely because you have so much space