5 Pitfalls to Avoid with MongoDB

Learn how 5 of the most common MongoDB pitfalls can be avoided with Tokutek's TokuMX.

Published in: Technology

Transcript of "5 Pitfalls to Avoid with MongoDB"

  1. 5 Pitfalls to Avoid with MongoDB
     Tim Callaghan, VP/Engineering, Tokutek
     tim@tokutek.com, @tmcallaghan
  2. Tokutek: Database Performance Engines
     What is Tokutek?
     Tokutek® offers high performance and scalability for MySQL, MariaDB and MongoDB. Our easy-to-use open source solutions are compatible with your existing code and application infrastructure.
     Tokutek Performance Engines Remove Limitations
     - Improve insertion performance by 20X
     - Reduce HDD and flash storage requirements up to 90%
     - No need to rewrite code
     Tokutek Mission: Empower your database to handle the Big Data requirements of today’s applications
  3. A Global Customer Base
  4. Housekeeping
     • This presentation will be available for replay following the event
     • We welcome your questions; please use the console on the right of your screen and we will answer following the presentation
     • A copy of the presentation is available upon request
  5. Agenda
     • Describe use-cases that lead to well known pitfalls
     • How can they be avoided?
     • Test, Measure, and Analyze (benchmark)
  6. Pitfalls - 1982
  7. Pitfalls - 2013
  8. What is TokuMX?
     • TokuMX = MongoDB with improved storage
     • Drop-in replacement for MongoDB v2.4 applications
       • Including replication and sharding
       • Same data model
       • Same query language
       • Drivers just work
     • No Full Text or Geospatial
     • Open Source – http://github.com/Tokutek/mongo
  9. Pitfall 1 : Space
  10. 1a : Space
      • MongoDB databases often grow quite large
        • it easily allows users to...
          • store large documents
          • keep them around for a long time
        • de-normalized data needs more space
      • Operational challenges
        • Big disks are cheap, but not fast
        • Cloud storage is even slower
        • Fast disks (flash) are expensive
        • Backups are large as well
      • Unfortunately, MongoDB does not offer compression
      • goal = use less disk/flash
  11. 1a : Space : Avoidance
      • TokuMX offers built-in compression
        • 3 compression algorithms
          • quicklz, zlib, lzma, (none)
        • Everything is compressed
          • Field names and values
          • Secondary indexes too
  12. 1a : Space : Test
      • BitTorrent Peer Snapshot Data (~31 million documents)
      • 3 Indexes : peer_id + created, torrent_snapshot_id + created, created
      { id: 1, peer_id: 9222, torrent_snapshot_id: 4,
        upload_speed: 0.0000, download_speed: 0.0000,
        payload_upload_speed: 0.0000, payload_download_speed: 0.0000,
        total_upload: 0, total_download: 0,
        fail_count: 0, hashfail_count: 0,
        progress: 0.0000, created: "2008-10-28 01:57:35" }
      http://cs.brown.edu/~pavlo/torrent/
  13. 1a : Space : Analyze
      size on disk, ~31 million inserts (lower is better)
  14. 1a : Space : Analyze
      size on disk, ~31 million inserts (lower is better)
      TokuMX achieved 11.6:1 compression
  15. 1b : Space
      • MongoDB stores field names in each document
        • Lots of redundant data
        • When field names are long, documents may contain more field name data than actual values
      • Google “mongodb long field names”
        • Lots of blogs and advice
        • ... but descriptive schemas are useful!
  16. 1b : Space : Avoidance
      • Again, TokuMX offers built-in compression
        • Field names are compressed along with values
        • Compression algorithms love redundant data
      • Be descriptive and toss that data dictionary!
        • Who knows what is in field “zq”? Not me.
  17. 1b : Space : Test
      schema 1 - long field names (10/20/20)
      { first_name : "Tim", last_name : "Callaghan", email_address : "tim@tokutek.com" }
      schema 2 - short field names (26 fewer bytes per doc)
      { fn : "Tim", ln : "Callaghan", ea : "tim@tokutek.com" }
  18. 1b : Space : Analyze
      size on disk, 100 million inserts (lower is better)
  19. 1b : Space : Analyze
      size on disk, 100 million inserts (lower is better)
      TokuMX is substantially smaller, even without compression
  20. 1b : Space : Analyze
      size on disk, 100 million inserts (lower is better)
      In TokuMX, field name length has almost no impact on size due to compression
      MongoDB was ~10% smaller
  21. Pitfall 2 : Replication
  22. 2 : Replication
      • MongoDB natively supports replication
        • High availability
        • Read scaling
      • Shortcomings
        • lag, resource consumption on secondaries
      • Recommended reading
        • http://blog.mongolab.com/2013/03/replication-lag-the-facts-of-life/
  23. 2 : Replication : Avoidance
      • TokuMX replication allows secondary servers to process replication without IO
        • Simply injecting messages into the Fractal Tree Indexes on the secondary server
      • The “Hard Work” was done on the primary
        • Read-before-write
        • Uniqueness checking
      • Elimination of replication lag
      • Your secondaries are fully available for read scaling!
        • Run multiple secondaries on a single server
  24. 2 : Replication : Test
      • Sysbench
      • Workload
        • point + range queries, update, delete, insert
        • 16 collections, 10 million rows, 16GB RAM
      • Setup
        • loaded data on single server
        • shut down and copied data folder
        • created secondary
      • Ran benchmark
  25. 2 : Replication : Analyze
      Note: TokuMX @ 32 TPS, MongoDB @ 12 TPS
  26. Pitfall 3 : Declining Performance
  27. 3 : Declining Performance
      • MongoDB insert/update/delete performance drops dramatically when the indexes do not fit in memory
      • Operations are limited by IOPS
        • Generally 1 operation per available IO
        • Fewer when secondary indexes must be maintained, 1 IO for each
      • Solution: Add RAM or Shard.
  28. 3 : Declining Performance : Avoidance
      • TokuMX runs on Tokutek’s Fractal Tree indexes
        • Message buffers delay IO and reduce cache disruption
        • Perform many operations per IO
      • Many workloads don’t need additional memory or sharding, they just need better indexing
        • RAM = $$$
        • Sharding = $$$ + Complexity
  29. 3 : Declining Performance : Test
      • indexed insertion workload (iibench)
        • http://github.com/tmcallaghan/iibench-mongodb
      { dateandtime: <date-time>, cashregisterid: 1..1000, customerid: 1..100000, productid: 1..10000, price: <double> }
      • insert only, 1000 documents per insert, 100 million inserts
      • indexes
        • price + customerid
        • cashregisterid + price + customerid
        • price + dateandtime + customerid
  30. 3 : Declining Performance : Analyze
      • 100 million inserts into a collection with 3 secondary indexes
  31. 3 : Declining Performance : Analyze
      • 100 million inserts into a collection with 3 secondary indexes
  32. 3 : Declining Performance : Analyze
      • Array Index Insertion (100 values per document)
  33. Pitfall 4 : Concurrency
  34. 4 : Concurrency
      • MongoDB originally implemented a global write lock
        • 1 writer at a time
      • MongoDB v2.2 moved this lock to the database level
        • 1 writer at a time in each database
      • This severely limits the write performance of servers
      • 36 shards on 1 server example
        • Allows for more concurrency
        • High operational complexity
        • Google “mongodb multiple shards same server”
  35. 4 : Concurrency : Avoidance
      • TokuMX performs locking at the document level
        • Extreme concurrency!
      [Diagram: lock granularity across the hierarchy instance > database > collection > document, labeled MongoDB v2.0, MongoDB v2.2, and TokuMX]
  36. 4 : Concurrency : Test
      • Sysbench read-write workload
        • point and range queries, update, delete, insert
        • http://github.com/tmcallaghan/sysbench-mongodb
      { _id: 1..10000000, k: 1..10000000, c: <120 char random string ###-###-###>, pad: <60 char random string ###-###-###> }
  37. 4 : Concurrency : Analyze
  38. 4 : Concurrency : Analyze
  39. Pitfall 5 : Transactions
  40. 5 : Got Transactions?
      • MongoDB does not support “transactions”
        • Each operation is visible to everyone
        • There are work-arounds, Google “mongodb transactions”
        • http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
          “This document provides a pattern for doing multi-document updates or “transactions” using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide a rollback like functionality.”
          (the document is 8 web pages long)
      • MongoDB does not support multi-version concurrency control (MVCC)
        • Readers do not get a consistent view of the data, as they can be interrupted by writers
        • People try, Google “mongodb mvcc”
  41. 5 : Transactions : Avoidance
      • ACID
        • In MongoDB, multi-insertion operations allow for partial success
          • Asked to store 5 documents, 3 succeeded
        • TokuMX offers “all or nothing” behavior
        • Document level locking
      • MVCC
        • In MongoDB, queries can be interrupted by writers.
          • The effects of these writers are visible to the reader
        • TokuMX offers MVCC
          • Reads are consistent as of the operation start
  42. 5 : Transactions : Avoidance
      • Transactions in TokuMX
        • db.runCommand("beginTransaction")
        • ... perform 1 or more operations
        • db.runCommand("rollbackTransaction") | db.runCommand("commitTransaction")
      • Note: not available in sharded environments
      • For more information
        • http://www.tokutek.com/2013/04/mongodb-transactions-yes/
        • http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
  43. Tokutek: Database Performance Engines
      Any Questions?
      Download TokuMX at www.tokutek.com/download
      Register for product updates, access to premium content, and invitations at www.tokutek.com
      Join the Conversation
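
The "1b : Space : Test" slide says the short-name schema saves 26 bytes per document. A minimal sketch of that arithmetic follows, assuming only the field names differ between the two schemas. The helper `fieldNameBytes` and the 100 million document projection are illustrative and not part of the original benchmark; per-field BSON overhead (type byte and name terminator) is the same in both schemas, so it cancels out of the difference.

```javascript
// Sum the UTF-8 byte lengths of a document's top-level field names.
// (Hypothetical helper for illustration; Buffer is a Node.js global.)
function fieldNameBytes(doc) {
  return Object.keys(doc).reduce(
    (total, name) => total + Buffer.byteLength(name, "utf8"),
    0
  );
}

// The two schemas from the slide.
const longNames = {
  first_name: "Tim",
  last_name: "Callaghan",
  email_address: "tim@tokutek.com",
};
const shortNames = { fn: "Tim", ln: "Callaghan", ea: "tim@tokutek.com" };

// 10 + 9 + 13 = 32 bytes of long names versus 2 + 2 + 2 = 6 bytes of short names.
const savedPerDoc = fieldNameBytes(longNames) - fieldNameBytes(shortNames);

// Assumed projection: uncompressed field-name bytes saved over 100 million documents.
const savedTotalBytes = savedPerDoc * 100000000;

console.log(savedPerDoc, savedTotalBytes);
```

Under these assumptions the uncompressed difference is about 2.6 GB of field names over 100 million documents, before any index or storage-engine overhead is counted.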
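
The "3 : Declining Performance" slide describes an IO-bound regime: roughly one operation per available IO, plus one more IO for each secondary index. The sketch below turns that rule of thumb into a back-of-the-envelope estimate. It is a toy model, not a measurement: the function name `maxInsertsPerSecond` and the 2,000 IOPS figure are assumptions chosen for illustration, and the model only applies once the indexes no longer fit in memory.

```javascript
// Toy model of the slide's rule of thumb: each insert costs one IO for the
// primary index plus one IO per secondary index that must be maintained.
function maxInsertsPerSecond(iops, secondaryIndexes) {
  const iosPerInsert = 1 + secondaryIndexes;
  return iops / iosPerInsert;
}

// Assumed device capability; not a figure from the presentation.
const hypotheticalIops = 2000;

// The iibench test in the deck uses 3 secondary indexes.
const insertRate = maxInsertsPerSecond(hypotheticalIops, 3);

// Time to complete 100 million inserts at that steady-state rate.
const hoursFor100M = 100000000 / insertRate / 3600;

console.log(insertRate, hoursFor100M.toFixed(1));
```

With these assumed inputs the model allows 500 inserts per second, which puts 100 million inserts at a little over 55 hours. That is the kind of arithmetic behind the "add RAM or shard" advice the slide mentions.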
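
The "5 : Transactions : Avoidance" slide contrasts partial success ("asked to store 5 documents, 3 succeeded") with "all or nothing" behavior. The following in-memory simulation illustrates the difference. It is a sketch only: `ToyCollection`, `insertManyPartial`, and `insertManyAtomic` are hypothetical names, and the snapshot-and-restore rollback stands in for whatever mechanism a real storage engine uses. It does not reproduce MongoDB's or TokuMX's implementation.

```javascript
// Toy in-memory collection keyed by _id; rejects duplicate _id values.
class ToyCollection {
  constructor() {
    this.docs = new Map();
  }
  insertOne(doc) {
    if (this.docs.has(doc._id)) {
      throw new Error(`duplicate _id ${doc._id}`);
    }
    this.docs.set(doc._id, doc);
  }
}

// Non-atomic batch: stops at the first failure but keeps earlier inserts.
function insertManyPartial(coll, docs) {
  let inserted = 0;
  try {
    for (const doc of docs) {
      coll.insertOne(doc);
      inserted++;
    }
  } catch (err) {
    // Documents inserted before the failure remain in the collection.
  }
  return inserted;
}

// All-or-nothing batch: restores the pre-batch state on any failure.
function insertManyAtomic(coll, docs) {
  const snapshot = new Map(coll.docs);
  try {
    for (const doc of docs) {
      coll.insertOne(doc);
    }
    return docs.length;
  } catch (err) {
    coll.docs = snapshot; // roll back every insert from this batch
    return 0;
  }
}

// Five documents; the fourth collides with the third.
const batch = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 3 }, { _id: 5 }];

const partialColl = new ToyCollection();
const partialCount = insertManyPartial(partialColl, batch);

const atomicColl = new ToyCollection();
const atomicCount = insertManyAtomic(atomicColl, batch);

console.log(partialCount, partialColl.docs.size, atomicCount, atomicColl.docs.size);
```

In the partial case the caller has to work out which three documents landed and clean up by hand. In the all-or-nothing case the collection is unchanged after the failure.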