Working set: only one database is “heated” with inserts. Only that database must be in RAM; the others may page in and out with queries and map-reduces.
Again, in a perfect world, the rate at which I can throw documents at MongoDB would not depend in any way on total database size.
As the total data size goes up and the capped collection kicks in: <click>
Remember, real disk backs this, so we must be absolutely 100% sure the database size always fits in the allocated disk space.
We don’t have room to “accidentally” raise the blue line.
Same as before. It would be really nice if the documents-per-second rate stayed constant. <click>
Remember, we settled for the diminished throughput because we couldn’t afford to scale up the node to the point where our data fit in RAM, which would make it look more like this. <click>
Then we agreed that we really like the capped collection behavior, which made our graph look more like this. <click>
Apply a little customization and we help Mongo keep the most recent, dare I say the most important, data in RAM.
We helped Mongo expire data not at the per-document level, where it was forced to manage its giant B-tree index, but at the “bucket” level, letting Mongo hand whole data files to the OS for removal.
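As a rough illustration of that idea (a sketch, not MongoR’s actual code; the `events_` prefix and bucket count are assumptions), expiring a whole bucket is a single drop of a database rather than millions of index-maintaining deletes:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local mongod

def prune_oldest_bucket(prefix="events_", keep=100):
    """Drop the oldest bucket (a whole database) once more than `keep` exist;
    mongod then hands the backing files straight to the OS for removal."""
    buckets = sorted(name for name in client.list_database_names()
                     if name.startswith(prefix))
    if len(buckets) > keep:
        client.drop_database(buckets[0])  # one cheap drop, no per-document B-tree work
```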
It’s noisy, but in general it maintains performance regardless of data size and of whether it is working to prune data.
With only a little effort, we scaled out a single mongod without scaling up any of the physical resources.
The system hums along happily when each bucket is sized somewhere between one-quarter and one-third of RAM (a quick sizing sketch follows this list).
This is specific to the MongoR implementation!
Follow standard Mongo practices first, like fstab mount parameters and NUMA control.
This assumes the client application itself isn’t a HEAVY user of RAM.
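For concreteness, a trivial sketch of that rule of thumb (the function and the 120GB figure are illustrative; only the 30GB bucket size is ours):

```python
def bucket_size_bounds(ram_bytes):
    """Rule of thumb: size each bucket between one-quarter and one-third of RAM."""
    return ram_bytes // 4, ram_bytes // 3

# e.g. a 120GB box suggests 30GB-40GB buckets; we settled on 30GB.
low, high = bucket_size_bounds(120 * 2**30)
```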
Our implementation has the client application poll MongoR, checking whether there is a “new” database handle.
If there are many client applications on a box, their polling may not be aligned, so whenever a rotation happens there may be two ‘active’ buckets for minutes or even hours, depending on how often each client polls for a new database.
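A minimal polling sketch, assuming a hypothetical control document that MongoR keeps updated with the active bucket’s name (this is not MongoR’s real API):

```python
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
POLL_SECONDS = 300            # longer intervals widen the two-active-bucket window
_state = {"name": None, "checked": 0.0}

def current_bucket():
    """Return a handle to the active bucket, re-polling at most every POLL_SECONDS."""
    now = time.time()
    if _state["name"] is None or now - _state["checked"] > POLL_SECONDS:
        # {"_id": "active", "db": "<name>"} is an assumed control-document scheme
        doc = client["mongor"]["control"].find_one({"_id": "active"})
        _state.update(name=doc["db"], checked=now)
    return client[_state["name"]]
```

Each client keeps its own timer, so a rotation only takes effect as each poll fires, which is exactly why two buckets can be ‘active’ at once.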
There needs to be headroom so that queries, aggregations, and map-reduces can pull historical data into RAM without kicking the “warm” data out of RAM.
Remember the sawtooth pattern.
We set the rotation to occur overnight to minimize the impact of the possibility of two active buckets.
MongoDB preallocates data files up to 2GB at a time, so if the system absorbs more than 2GB between rotation checks, a bucket could go “over” its allocated space.
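A hedged sketch of that headroom check (the 2GB chunk matches mmapv1-era file preallocation; `fileSize` comes from the dbStats command, and the cap is our 30GB bucket size):

```python
BUCKET_CAP = 30 * 2**30   # 30GB per bucket, as in our deployment
ALLOC_CHUNK = 2 * 2**30   # mongod may preallocate up to 2GB in one step

def should_rotate(db):
    """db is a pymongo Database handle; rotate before the next
    preallocation could push the bucket past its cap."""
    file_size = db.command("dbStats")["fileSize"]  # bytes already allocated on disk
    return file_size + ALLOC_CHUNK >= BUCKET_CAP
```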
We found no practical limit to the number of buckets. We run about a hundred 30GB buckets per pizza box, roughly 3TB of data per node.
What do we want? I don’t think MongoR is the be-all and end-all solution to this problem. When I built this system, I got the feeling that a lot of people had this problem, but everyone was dealing with it separately.
We formed a great relationship with our MongoDB contact, who gave us enough hints to know we should be concerned with our working set.
This behavior is valuable to us, and I hope it is valuable to others. The best-case scenario is that we convince MongoDB to build a behavior set like this directly into MongoDB so I can abandon my implementation.
Leveraging MongoDB as a Data Store for Security Data