5 Pitfalls to Avoid with MongoDB

Learn how 5 of the most common MongoDB pitfalls can be avoided with Tokutek's TokuMX.

Transcript

  • 1. 5 Pitfalls to Avoid with MongoDB Tim Callaghan VP/Engineering, Tokutek tim@tokutek.com @tmcallaghan
  • 2. Tokutek: Database Performance Engines What is Tokutek? Tokutek® offers high performance and scalability for MySQL, MariaDB and MongoDB. Our easy-to-use open source solutions are compatible with your existing code and application infrastructure. Tokutek Performance Engines Remove Limitations: improve insertion performance by 20X, reduce HDD and flash storage requirements by up to 90%, no need to rewrite code. Tokutek Mission: Empower your database to handle the Big Data requirements of today’s applications
  • 3. A Global Customer Base
  • 4. Housekeeping • This presentation will be available for replay following the event • We welcome your questions; please use the console on the right of your screen and we will answer following the presentation • A copy of the presentation is available upon request
  • 5. Agenda • Describe use cases that lead to well-known pitfalls • How can they be avoided? • Test, Measure, and Analyze (benchmark)
  • 6. Pitfalls - 1982
  • 7. Pitfalls - 2013
  • 8. What is TokuMX? • TokuMX = MongoDB with improved storage • Drop-in replacement for MongoDB v2.4 applications • Including replication and sharding • Same data model • Same query language • Drivers just work • No full-text search or geospatial indexes • Open Source – http://github.com/Tokutek/mongo
  • 9. Pitfall 1 : Space
  • 10. 1a : Space • MongoDB databases often grow quite large • it easily allows users to... • store large documents • keep them around for a long time • de-normalized data needs more space • Operational challenges • Big disks are cheap, but not fast • Cloud storage is even slower • Fast disks (flash) are expensive • Backups are large as well • Unfortunately, MongoDB does not offer compression • goal = use less disk/flash
  • 11. 1a : Space : Avoidance • TokuMX offers built-in compression • 3 compression algorithms • quicklz, zlib, lzma, (none) • Everything is compressed • Field names and values • Secondary indexes too
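A quick way to gauge how much compression helps in practice is to compare a collection's logical data size to its size on disk via collStats. A minimal sketch in the standard mongo shell (the field names are those of MongoDB's collStats output and the collection name "mycollection" is only a placeholder):

    var s = db.mycollection.stats();   // collStats output: size, storageSize, totalIndexSize
    print("logical data size (bytes):  " + s.size);
    print("data size on disk (bytes):  " + s.storageSize);
    print("index size on disk (bytes): " + s.totalIndexSize);
    print("approx. compression ratio:  " + (s.size / s.storageSize).toFixed(1) + ":1");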
  • 12. 1a : Space : Test • BitTorrent Peer Snapshot Data (~31 million documents) • 3 Indexes : peer_id + created, torrent_snapshot_id + created, created { id: 1, peer_id: 9222, torrent_snapshot_id: 4, upload_speed: 0.0000, download_speed: 0.0000, payload_upload_speed: 0.0000, payload_download_speed: 0.0000, total_upload: 0, total_download: 0, fail_count: 0, hashfail_count: 0, progress: 0.0000, created: "2008-10-28 01:57:35" } http://cs.brown.edu/~pavlo/torrent/
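For reference, the three secondary indexes above could be declared from the shell as follows. This is an illustrative sketch only; the collection name "peersnapshot" is assumed, and ensureIndex is the MongoDB 2.4-era helper:

    db.peersnapshot.ensureIndex({ peer_id: 1, created: 1 });
    db.peersnapshot.ensureIndex({ torrent_snapshot_id: 1, created: 1 });
    db.peersnapshot.ensureIndex({ created: 1 });
    // load the ~31 million snapshot documents, then compare size on disk between engines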
  • 13. 1a : Space : Analyze size on disk, ~31 million inserts (lower is better)
  • 14. 1a : Space : Analyze size on disk, ~31 million inserts (lower is better) TokuMX achieved 11.6:1 compression
  • 15. 1b : Space • MongoDB stores field names in each document • Lots of redundant data • When field names are long, documents may contain more field name data than actual values • Google “mongodb long field names” • Lots of blogs and advice • ... but descriptive schemas are useful!
  • 16. 1b : Space : Avoidance • Again, TokuMX offers built-in compression • Field names are compressed along with values • Compression algorithms love redundant data • Be descriptive and toss that data dictionary! • Who knows what is in field “zq”? Not me.
  • 17. 1b : Space : Test schema 1 - long field names (10/20/20) { first_name : “Tim”, last_name : “Callaghan”, email_address : “tim@tokutek.com” } schema 2 - short field names (26 fewer bytes per doc) { fn : “Tim”, ln : “Callaghan”, ea : “tim@tokutek.com” }
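To see the field-name overhead for yourself, insert a document under each schema and compare the average object size reported by collStats. The collection names below ("people_long", "people_short") are made up for illustration:

    db.people_long.insert({ first_name: "Tim", last_name: "Callaghan", email_address: "tim@tokutek.com" });
    db.people_short.insert({ fn: "Tim", ln: "Callaghan", ea: "tim@tokutek.com" });
    // avgObjSize includes the field names stored inside every document
    print("long field names:  " + db.people_long.stats().avgObjSize + " bytes/doc");
    print("short field names: " + db.people_short.stats().avgObjSize + " bytes/doc");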
  • 18. 1b : Space : Analyze size on disk, 100 million inserts (lower is better)
  • 19. 1b : Space : Analyze size on disk, 100 million inserts (lower is better) TokuMX is substantially smaller, even without compression
  • 20. 1b : Space : Analyze size on disk, 100 million inserts (lower is better) In TokuMX, field name length has almost no impact on size due to compression; MongoDB was ~10% smaller with the short field names
  • 21. Pitfall 2 : Replication
  • 22. 2 : Replication • MongoDB natively supports replication • High availability • Read scaling • Shortcomings • lag, resource consumption on secondaries • Recommended reading • http://blog.mongolab.com/2013/03/replication-lag-the-facts-of-life/
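Replication lag is easy to watch from the shell. A small sketch using standard MongoDB shell helpers, run while connected to the replica set primary:

    rs.printSlaveReplicationInfo();   // seconds each secondary is behind the primary
    // or inspect member state and optimes directly
    rs.status().members.forEach(function (m) {
        print(m.name + "  " + m.stateStr + "  optime: " + m.optimeDate);
    });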
  • 23. 2 : Replication : Avoidance • TokuMX replication allows secondary servers to process replication without IO • Simply injecting messages into the Fractal Tree Indexes on the secondary server • The “Hard Work” was done on the primary • Read-before-write • Uniqueness checking • Elimination of replication lag • Your secondaries are fully available for read scaling! • Run multiple secondaries on a single server
  • 24. 2 : Replication : Test • Sysbench • Workload • point + range queries, update, delete, insert • 16 collections, 10mm rows, 16GB RAM • Setup • loaded data on single server • shutdown and copied data folder • created secondary • Ran benchmark
  • 25. 2 : Replication : Analyze Note: TokuMX @ 32 TPS, MongoDB @ 12 TPS
  • 26. Pitfall 3 : Declining Performance
  • 27. 3 : Declining Performance • MongoDB insert/update/delete performance drops dramatically when the indexes do not fit in memory • Operations are limited by IOPs • Generally 1 operation per available IO • Fewer if there is secondary index maintenance: 1 IO for each index • Solution: Add RAM or Shard.
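Before adding RAM or sharding, it is worth checking whether the indexes actually fit in memory. A rough back-of-envelope sketch from the shell; "mycollection" is a placeholder, and hostInfo() with system.memSizeMB is as reported by MongoDB 2.2+ era servers:

    var idxMB = db.mycollection.stats().totalIndexSize / (1024 * 1024);
    var ramMB = db.hostInfo().system.memSizeMB;
    print("total index size: " + idxMB.toFixed(0) + " MB, host RAM: " + ramMB + " MB");
    if (idxMB > ramMB) {
        print("indexes exceed RAM: expect roughly 1 random IO per secondary index per write");
    }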
  • 28. 3 : Declining Performance : Avoidance • TokuMX runs on Tokutek’s Fractal Tree indexes • Message buffers delay IO and reduce cache disruption • Perform many operations per IO • Many workloads don’t need additional memory or sharding, they just need better indexing • RAM = $$$ • Sharding = $$$ + Complexity
  • 29. 3 : Declining Performance : Test • indexed insertion workload (iibench) • http://github.com/tmcallaghan/iibench-mongodb { dateandtime: <date-time>, cashregisterid: 1..1000, customerid: 1..100000, productid: 1..10000, price: <double> } • insert only, 1000 documents per insert, 100 million inserts • indexes • price + customerid • cashregisterid + price + customerid • price + dateandtime + customerid
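A stripped-down shell version of that load (roughly what the linked iibench-mongodb driver does; the collection name "iibench" is assumed, and the random-value ranges mirror the document shown above):

    db.iibench.ensureIndex({ price: 1, customerid: 1 });
    db.iibench.ensureIndex({ cashregisterid: 1, price: 1, customerid: 1 });
    db.iibench.ensureIndex({ price: 1, dateandtime: 1, customerid: 1 });
    var batch = [];
    for (var i = 0; i < 1000; i++) {                 // 1000 documents per insert
        batch.push({
            dateandtime: new Date(),
            cashregisterid: Math.floor(Math.random() * 1000) + 1,
            customerid: Math.floor(Math.random() * 100000) + 1,
            productid: Math.floor(Math.random() * 10000) + 1,
            price: Math.random() * 500
        });
    }
    db.iibench.insert(batch);                        // repeat 100,000 times for 100mm documents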
  • 30. 3 : Declining Performance : Analyze • 100mm inserts into a collection with 3 secondary indexes
  • 31. 3 : Declining Performance : Analyze • 100mm inserts into a collection with 3 secondary indexes
  • 32. 3 : Declining Performance : Analyze • Array Index Insertion (100 values per document)
  • 33. Pitfall 4 : Concurrency
  • 34. 4 : Concurrency • MongoDB originally implemented a global write lock • 1 writer at a time • MongoDB v2.2 moved this lock to the database level • 1 writer at a time in each database • This severely limits the write performance of servers • 36 shards on 1 server example • Allows for more concurrency • High operational complexity • Google “mongodb multiple shards same server”
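One way to observe that lock pressure on a MongoDB 2.4-era server is the globalLock section of serverStatus. A small sketch; the exact field paths are as reported by that version and should be treated as an assumption:

    var gl = db.serverStatus().globalLock;
    print("writers queued on the lock: " + gl.currentQueue.writers);
    print("readers queued on the lock: " + gl.currentQueue.readers);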
  • 35. 4 : Concurrency : Avoidance • TokuMX performs locking at the document level • Extreme concurrency! [Diagram: lock granularity by version: MongoDB v2.0 locks the instance, MongoDB v2.2 locks the database, TokuMX locks individual documents]
  • 36. 4 : Concurrency : Test • Sysbench read-write workload • point and range queries, update, delete, insert • http://github.com/tmcallaghan/sysbench-mongodb { _id: 1..10000000, k: 1..10000000, c: <120 char random string ###-###-###>, pad: <60 char random string ###-###-###> }
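For illustration only, a tiny shell generator that produces documents of the shape above; the randStr helper and the "sbtest" collection name are made up and not part of sysbench-mongodb:

    function randStr(len) {               // crude stand-in for the ###-###-### random strings
        var digits = "0123456789";
        var s = "";
        for (var i = 0; i < len; i++) {
            s += (i % 4 == 3) ? "-" : digits.charAt(Math.floor(Math.random() * 10));
        }
        return s;
    }
    function sysbenchDoc(i) {
        return { _id: i, k: Math.floor(Math.random() * 10000000) + 1, c: randStr(120), pad: randStr(60) };
    }
    db.sbtest.insert(sysbenchDoc(1));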
  • 37. 4 : Concurrency : Analyze
  • 38. 4 : Concurrency : Analyze
  • 39. Pitfall 5 : Transactions
  • 40. 5 : Got Transactions? • MongoDB does not support “transactions” • Each operation is visible to everyone • There are work-arounds, Google “mongodb transactions” • http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ This document provides a pattern for doing multi-document updates or “transactions” using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide a rollback-like functionality. (the document is 8 web pages long) • MongoDB does not support multi-version concurrency control (MVCC) • Readers do not get a consistent view of the data, as they can be interrupted by writers • People try, Google “mongodb mvcc”
  • 41. 5 : Transactions : Avoidance • ACID • In MongoDB, multi-insertion operations allow for partial success • Asked to store 5 documents, 3 succeeded • TokuMX offers “all or nothing” behavior • Document level locking • MVCC • In MongoDB, queries can be interrupted by writers. • The effects of these writers are visible to the reader • TokuMX offers MVCC • Reads are consistent as of the operation start
  • 42. 5 : Transactions : Avoidance • Transactions in TokuMX • db.runCommand("beginTransaction") • ... perform 1 or more operations • db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction") • Note: not available in sharded environments • For more information • http://www.tokutek.com/2013/04/mongodb-transactions-yes/ • http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
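Putting those commands together, a multi-statement transaction might look like the sketch below (non-sharded deployments only; the "accounts" collection and the transfer logic are illustrative, and the string form of runCommand is shorthand for running the named command with value 1):

    db.runCommand("beginTransaction");
    db.accounts.update({ _id: "alice" }, { $inc: { balance: -100 } });
    db.accounts.update({ _id: "bob" },   { $inc: { balance:  100 } });
    db.runCommand("commitTransaction");    // both updates become visible together
    // on error, db.runCommand("rollbackTransaction") would discard both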
  • 43. Tokutek: Database Performance Engines 43 Any Questions? Download TokuMX at www.tokutek.com/download Register for product updates, access to premium content, and invitations at www.tokutek.com Join the Conversation