Introducing TokuMX: The Performance Engine for MongoDB
Leif Walsh
Senior Engineer, Tokutek
leif@tokutek.com
@leifwal...
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

Talk given to NYC.rb meetup on 2013-12-10 about TokuMX, a replacement engine for MongoDB.


  1. Introducing TokuMX: The Performance Engine for MongoDB
     Leif Walsh, Senior Engineer, Tokutek
     leif@tokutek.com, @leifwalsh
  2. What is TokuMX?
     • TokuMX = MongoDB with improved storage
     • Drop-in replacement for MongoDB v2.4 applications
     • Including replication and sharding
     • Same data model
     • Same query language
     • Drivers just work
     • No Full Text or Geospatial
     • Open Source
       – http://github.com/Tokutek/mongo
  3. B-tree Limitations
     Performance is IO limited when bigger than RAM: try to fit all internal nodes and some leaf nodes
     [Figure: B-tree with nodes 22; 10, 99 labeled RAM and leaf nodes (2, 3, 4), (10, 20), (22, 25), (99) labeled RAM/DISK]
     Plus, mmap.
  4. TokuMX : Indexed Insertion
  5. TokuMX : Indexed Insertion
  6. TokuMX : Concurrency (>RAM)
  7. TokuMX : Concurrency (<RAM)
  8. TokuMX : Raw Compression
     bittorrent data, size on disk, ~31 million inserts (lower is better)
     TokuMX achieved 11.6:1 compression
  9. TokuMX : Compression : Field Names
     synthetic data, size on disk, 100 million inserts (lower is better)
     TokuMX is substantially smaller, even without compression
  10. TokuMX : Compression : Field Names
      synthetic data, size on disk, 100 million inserts (lower is better)
      MongoDB was ~10% smaller
      In TokuMX, field name length has almost no impact on size due to compression
  11. TokuMX : ACID + MVCC
      • ACID
        – In MongoDB, multi-insertion operations allow for partial success
          o Asked to store 5 documents, 3 succeeded
        – TokuMX offers “all or nothing” behavior (atomic)
      • MVCC
        – In MongoDB, queries can be interrupted by writers
          o The effects of these writers are visible to the reader
        – TokuMX offers MVCC
          o Reads are consistent as of the operation start
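To make the partial-success versus all-or-nothing contrast on slide 11 concrete, here is a minimal in-memory sketch. It does not talk to MongoDB or TokuMX; the dict-based store, `DuplicateKeyError`, `insert_partial`, `insert_atomic`, and `try_insert` are all hypothetical names made up for illustration, and a duplicate `_id` is simply an assumed way to make one document in the batch fail.

```python
class DuplicateKeyError(Exception):
    """Raised when a document's _id is already in the store (hypothetical)."""


def insert_partial(store, docs):
    """Insert documents one at a time and stop at the first failure.

    Documents written before the failure stay in the store, which models
    the partial-success behavior the slide describes.
    """
    for doc in docs:
        if doc["_id"] in store:
            raise DuplicateKeyError(doc["_id"])
        store[doc["_id"]] = doc


def insert_atomic(store, docs):
    """Stage the whole batch on a copy, then commit only if nothing failed."""
    staged = dict(store)
    for doc in docs:
        if doc["_id"] in staged:
            raise DuplicateKeyError(doc["_id"])
        staged[doc["_id"]] = doc
    # Reached only when every document was accepted.
    store.clear()
    store.update(staged)


def try_insert(insert_fn, docs):
    """Run insert_fn against an empty store and return the surviving _ids."""
    store = {}
    try:
        insert_fn(store, docs)
    except DuplicateKeyError:
        pass
    return sorted(store)


# Five documents; the fourth repeats _id 3, so it is the one that fails.
batch = [{"_id": 1}, {"_id": 2}, {"_id": 3}, {"_id": 3}, {"_id": 4}]

print(try_insert(insert_partial, batch))  # three documents survive
print(try_insert(insert_atomic, batch))   # nothing survives
```

With the partial strategy the caller has to work out which documents landed before retrying; with the atomic strategy the store is unchanged after a failure, so the whole batch can be retried as a unit.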
  12. Questions?
      Leif Walsh, Senior Engineer, Tokutek
      leif@tokutek.com, @leifwalsh
  13. TokuMX : Indexed Insertion
      • indexed insertion workload (iibench)
      • http://github.com/tmcallaghan/iibench-mongodb

      { dateandtime: <date-time>,
        cashregisterid: 1..1000,
        customerid: 1..100000,
        productid: 1..10000,
        price: <double> }

      • insert only, 1000 documents per insert, 100 million inserts
      • indexes
        – price + customerid
        – cashregister + price + customerid
        – price + dateandtime + customerid
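The document shape and batch size on slide 13 can be sketched as a small generator. This is not the iibench-mongodb code, only an illustration of the workload's data: `make_document` and `make_batch` are hypothetical helpers, the price range is an assumption (the slide only says `<double>`), and the second index is written against `cashregisterid` on the assumption that the slide's `cashregister` refers to that field.

```python
import random
from datetime import datetime, timezone

# Index key lists from the slide. "cashregisterid" is assumed to be the
# field the slide abbreviates as "cashregister".
INDEXES = [
    ["price", "customerid"],
    ["cashregisterid", "price", "customerid"],
    ["price", "dateandtime", "customerid"],
]


def make_document(rng):
    """Build one document with the field ranges listed on the slide."""
    return {
        "dateandtime": datetime.now(timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": rng.uniform(0.0, 500.0),  # range is an assumption
    }


def make_batch(rng, size=1000):
    """Build one insert's worth of documents (1000 per insert on the slide)."""
    return [make_document(rng) for _ in range(size)]


rng = random.Random(0)  # seeded so runs are repeatable
batch = make_batch(rng)
print(len(batch), sorted(batch[0]))
```

Because `price` and `customerid` are random, every new document lands at an unpredictable position in all three secondary indexes, which is what makes this an index-maintenance benchmark rather than an append-only one.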
  14. TokuMX : Concurrency
      • Sysbench read-write workload
      • point and range queries, update, delete, insert
      • http://github.com/tmcallaghan/sysbench-mongodb

      { _id: 1..10000000,
        k: 1..10000000,
        c: <120 char random string ###-###-###>,
        pad: <60 char random string ###-###-###> }
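As a rough illustration of the Sysbench document shape on slide 14, the sketch below builds one document with the stated field lengths. It is not taken from sysbench-mongodb: `patterned_string` and `make_sysbench_document` are hypothetical, and the exact digit grouping is an assumption, since the slide only shows the `###-###-###` pattern and the target lengths.

```python
import random


def patterned_string(rng, length):
    """Return `length` characters of random digits in groups of three,
    separated by hyphens.

    The final position is forced to a digit so the string never ends
    in a hyphen (an assumption; the slide does not specify this).
    """
    chars = []
    for i in range(length):
        is_separator = i % 4 == 3 and i != length - 1
        chars.append("-" if is_separator else str(rng.randint(0, 9)))
    return "".join(chars)


def make_sysbench_document(rng, doc_id, table_size=10_000_000):
    """Build one document with the fields listed on the slide."""
    return {
        "_id": doc_id,
        "k": rng.randint(1, table_size),
        "c": patterned_string(rng, 120),
        "pad": patterned_string(rng, 60),
    }


doc = make_sysbench_document(random.Random(1), doc_id=42)
print(len(doc["c"]), len(doc["pad"]))
```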
  15. TokuMX : Raw Compression
      • BitTorrent Peer Snapshot Data (~31 million documents)
      • 3 Indexes : peer_id + created, torrent_snapshot_id + created, created

      { id: 1,
        peer_id: 9222,
        torrent_snapshot_id: 4,
        upload_speed: 0.0000,
        download_speed: 0.0000,
        payload_upload_speed: 0.0000,
        payload_download_speed: 0.0000,
        total_upload: 0,
        total_download: 0,
        fail_count: 0,
        hashfail_count: 0,
        progress: 0.0000,
        created: "2008-10-28 01:57:35" }

      http://cs.brown.edu/~pavlo/torrent/
  16. TokuMX : Compression : Field Names
      schema 1 - long field names (10/20/20)
      { first_name    : “Tim”,
        last_name     : “Callaghan”,
        email_address : “tim@tokutek.com” }

      schema 2 - short field names (26 fewer bytes per doc)
      { fn : “Tim”,
        ln : “Callaghan”,
        ea : “tim@tokutek.com” }
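The "26 bytes per doc" figure on slide 16 can be checked by counting the bytes in the field names of each schema. The helper `name_bytes` below is a hypothetical name for this sketch. It counts only the name characters; BSON also stores a terminator byte after each field name, but that overhead is the same in both schemas, so it does not change the difference.

```python
LONG_NAMES = ("first_name", "last_name", "email_address")
SHORT_NAMES = ("fn", "ln", "ea")


def name_bytes(names):
    """Total UTF-8 bytes used by a set of field names."""
    return sum(len(name.encode("utf-8")) for name in names)


saved_per_doc = name_bytes(LONG_NAMES) - name_bytes(SHORT_NAMES)
print(saved_per_doc)  # 32 - 6 = 26

# Uncompressed saving across the 100 million documents in the benchmark.
total_saved = saved_per_doc * 100_000_000
print(total_saved)  # 2,600,000,000 bytes, roughly 2.6 GB
```

This is the raw, pre-compression saving; it says nothing by itself about on-disk size once compression or storage-engine overhead is involved.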
