Lesson: Know Your Hardware<br />MongoDB on blades really sucks<br />Single 10k RPM disks can’t take it when data is noticeably larger than RAM<br />Mongo operations can hit the client timeout (30 sec default)<br />Even minutely cron jobs start to spew<br />Lots of time wasted in development environment, trying different kernels, tuning, etc.<br />Most noticeable during heavy writes but can happen if pages fall out of RAM for other reasons<br />
Lesson: Replica Sets Rock<br />Lots of reboots happened during dev environment troubleshooting<br />Each time, one of the remaining nodes took over<br />No “reclone”, no config file or DNS changes<br />Stuff “just worked” while nodes bounced up and down<br />
Lesson: Know Your Data<br />MongoDB is UTF-8<br />Some of our older data is decidedly NOT UTF-8<br />We had lots of sloppy encoding issues, and we had to clean them all up.<br />Start data load. Wait 12-36 hours. Witness fail. Fix code. Start over. Sigh.<br />This is a combination of having been sloppy and having old data. Even with a lot less history, this can bite you. Get your encoding house in order!<br />
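One way to avoid the "wait 12-36 hours, witness fail" loop is a dry-run pass that checks every byte string before the load starts. The following is a minimal sketch, not the code used at craigslist: the record layout, the function names, and the Latin-1 fallback are all assumptions made for illustration.

```python
def find_invalid_utf8(records):
    """Return (record index, field name, error text) for each bytes field
    that does not decode as UTF-8. `records` is an iterable of dicts."""
    problems = []
    for index, record in enumerate(records):
        for field, value in record.items():
            if isinstance(value, bytes):
                try:
                    value.decode("utf-8")
                except UnicodeDecodeError as exc:
                    problems.append((index, field, str(exc)))
    return problems


def to_utf8_text(raw, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to an ASSUMED legacy encoding.
    The right fallback depends on where the old data came from."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Hypothetical sample: one clean record, one with a stray Latin-1 byte.
sample = [{"title": b"caf\xc3\xa9"}, {"title": b"caf\xe9"}]
for index, field, error in find_invalid_utf8(sample):
    print(f"record {index}, field {field!r}: {error}")
```

Running the scan over the whole data set first surfaces every bad record in one pass, rather than one failure per load attempt.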
Lesson: Know Your Data Size<br />MongoDB has a document size limit<br />4MB in 1.6.x, 16MB in 1.8.x<br />What to do with outliers?<br />In our case, trim off some useless data.<br />But going from relational to document means this sort of problem is easy to have. One parent, many children.<br />It’d be nice if this was easier to change, but clients have it hard-coded too.<br />Compression would help, of course.<br />
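Outliers are easier to deal with when they are found before the load rather than during it. The sketch below flags documents that come close to the limit. It uses the JSON-encoded length as a rough stand-in for the stored size, which is an assumption: the real BSON size will differ, so the `headroom` factor (also an assumption) leaves a margin. All names here are hypothetical.

```python
import json

# Limits from the slide, expressed in bytes.
LIMIT_1_6 = 4 * 1024 * 1024
LIMIT_1_8 = 16 * 1024 * 1024


def approx_size(doc):
    """Rough size estimate: bytes of the UTF-8 JSON encoding.
    This is only a proxy for the BSON size the server enforces."""
    return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))


def find_outliers(docs, limit=LIMIT_1_6, headroom=0.9):
    """Return (index, approximate size) for docs above headroom * limit."""
    threshold = int(limit * headroom)
    outliers = []
    for index, doc in enumerate(docs):
        size = approx_size(doc)
        if size > threshold:
            outliers.append((index, size))
    return outliers


# One parent with many children, the pattern that tends to blow the limit.
parent = {"_id": 2, "children": ["x" * 1024] * 5000}
print(find_outliers([{"_id": 1, "children": ["x"]}, parent]))
```

A driver's own BSON encoder would give an exact figure; the proxy is used here only to keep the sketch free of third-party imports.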
Lesson: Know Your Data Types<br />Field Types and Conversions can be expensive to do after the fact!<br />MongoDB treats strings and numbers differently, but some programming languages (such as Perl) don’t make that distinction obvious<br />This has indexing implications when you later look for 123456789 but had unknowingly stored “123456789”<br />http://search.cpan.org/dist/MongoDB/lib/MongoDB/DataTypes.pod<br />
Data Types, continued<br />“If the type of a field is ambiguous and important to your application, you should document what you expect the application to send to the database and convert your data to those types before sending.”<br />Do you know how to do that in your language of choice?<br />Some drivers may make a “guess” that gets it right most of the time.<br />
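The advice quoted above, converting to the expected types before sending, can be sketched as a small coercion step in front of the insert. Python is used here for illustration (the slides discuss Perl); the schema, field names, and `coerce` helper are hypothetical, and values read from CSV or text dumps arrive as strings in most languages.

```python
# Hypothetical schema: the types the application expects to store.
SCHEMA = {"posting_id": int, "price": float, "title": str}


def coerce(doc, schema=SCHEMA):
    """Return a copy of doc with schema fields converted to their
    documented types. Fields outside the schema are left untouched;
    a value that cannot be converted raises ValueError or TypeError."""
    out = dict(doc)
    for field, wanted in schema.items():
        value = out.get(field)
        if value is not None and not isinstance(value, wanted):
            out[field] = wanted(value)
    return out


# A row as it might come out of a text export: everything is a string.
row = {"posting_id": "123456789", "price": "10.5", "title": "bike"}
print(coerce(row))
```

Failing loudly on an unconvertible value is a deliberate choice in this sketch: a load that stops is cheaper than an index full of strings where numbers were expected.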
Lesson: Know Some Sharding<br />The Balancer can be your frenemy<br />Initial insert rate: 8,000/sec<br />Later drops to 200/sec<br />Too much time spent waiting to page in data that’s going to be sent to another node and never looked at (locally) again<br />Pre-split your data if possible<br />http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/<br />
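Pre-splitting starts with choosing split points before any data is loaded, so that inserts land on their final shard and the balancer has little to move. The sketch below only computes evenly spaced points for a numeric shard key and a round-robin shard assignment; it assumes the keys are roughly uniform over the range, and the function and shard names are hypothetical. Applying the points to a cluster (via its split and chunk-move admin commands) is not shown.

```python
def split_points(min_key, max_key, chunks):
    """Return chunks - 1 evenly spaced integer split points strictly
    between min_key and max_key. Assumes a numeric, uniform shard key."""
    if chunks < 2:
        return []
    step = (max_key - min_key) / chunks
    return [min_key + round(step * i) for i in range(1, chunks)]


def assign_round_robin(points, shards):
    """Pair the chunk that begins at each split point with a shard,
    cycling through the shard list."""
    return [(point, shards[i % len(shards)]) for i, point in enumerate(points)]


# Hypothetical key range and shard names.
points = split_points(0, 1000, 4)
for point, shard in assign_round_robin(points, ["shard0", "shard1"]):
    print(f"chunk starting at {point} -> {shard}")
```

If the key distribution is skewed, sampling the existing data for quantiles would give better split points than even spacing.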
Lesson: Know Some Replica Sets<br />Replica Set re-sync requires index rebuilds on the secondary<br />Most painful when a slave is down too long and can’t catch up using the oplog<br />Typically during high write volumes<br />In a large data set, the index rebuilding can take a couple of days, even without many indexes<br />What if you lose another while that is happening?<br />
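How long a node can stay down before it needs a full re-sync comes down to simple arithmetic: the oplog holds a fixed amount of data, so the window is roughly the oplog size divided by the write rate. The sketch below works that out; the function names and the sample numbers are illustrative assumptions, not measurements from the talk.

```python
def oplog_window_hours(oplog_bytes, write_bytes_per_sec):
    """Approximate hours of history the oplog holds at a steady write rate."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return oplog_bytes / write_bytes_per_sec / 3600.0


def can_catch_up(downtime_hours, oplog_bytes, write_bytes_per_sec):
    """True if the node was down for less than the oplog window."""
    return downtime_hours < oplog_window_hours(oplog_bytes, write_bytes_per_sec)


# Illustrative numbers only: a 10 GiB oplog under 1 MiB/s of oplog writes.
GIB = 1024 ** 3
MIB = 1024 ** 2
print(f"{oplog_window_hours(10 * GIB, 1 * MIB):.2f} hours of headroom")
print(can_catch_up(4, 10 * GIB, 1 * MIB))
```

The window shrinks exactly when it matters most: at high write volumes the same oplog covers far less wall-clock time.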
MongoDB Wishlist<br />Replica set node re-sync without index rebuilding<br />Record (or field) compression (not everyone uses a filesystem that offers compression)<br />Method to tap into the oplog so that changes can be fed to external indexers (Sphinx, Redis, etc.)<br />Hash-based sharding (coming soon?)<br />Cluster snapshot/backup tool<br />
craigslist is hiring!<br />send resumes to: firstname.lastname@example.org<br />Plain Text or PDF, no Word Docs!<br />Laid back, non-corporate environment<br />Engineering driven culture<br />Lots of interesting technical challenges<br />Easy SF commute<br />Excellent benefits and pay<br />High-impact work<br />Millions use craigslist daily<br />