Introducing TokuMX: The Performance Engine for MongoDB
Leif Walsh
Senior Engineer, Tokutek
leif@tokutek.com
@leifwalsh

What is TokuMX?

• TokuMX = MongoDB with improved storage
• Drop-in replacement for MongoDB v2.4 applications
  – Including replication and sharding
  – Same data model
  – Same query language
  – Drivers just work
  – Exception: no full-text or geospatial indexes
• Open source
  – http://github.com/Tokutek/mongo

B-tree Limitations
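Performance is IO-limited when the data is bigger than RAM: try to fit all internal nodes and some leaf nodes in RAM.

[Figure: a B-tree with root 22 and internal nodes 10 and 99 held in RAM; leaves (2, 3, 4), (10, 20), (22, 25), (99) on disk.]

Plus, MongoDB's storage is built on mmap.
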
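To make the B-tree point concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not TokuMX or MongoDB code: internal nodes are assumed to always be in RAM, each insert touches one uniformly random leaf, RAM holds a fixed number of leaves under LRU replacement, and every cache miss counts as one disk read. The function name `simulate_leaf_misses` is hypothetical.

```python
import random
from collections import OrderedDict


def simulate_leaf_misses(num_leaves, cached_leaves, inserts, seed=0):
    """Return the fraction of random inserts whose target leaf is not cached.

    Toy model (assumption): internal nodes always fit in RAM, each insert
    touches one uniformly random leaf, and RAM holds `cached_leaves` leaves
    under LRU replacement. Each miss stands for one disk read.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # leaf id -> None, least recently used first
    misses = 0
    for _ in range(inserts):
        leaf = rng.randrange(num_leaves)
        if leaf in cache:
            cache.move_to_end(leaf)  # hit: mark the leaf as most recently used
        else:
            misses += 1  # miss: the leaf has to be read from disk
            cache[leaf] = None
            if len(cache) > cached_leaves:
                cache.popitem(last=False)  # evict the least recently used leaf
    return misses / inserts


# Everything fits in RAM: only the first touch of each leaf misses.
print(simulate_leaf_misses(num_leaves=1_000, cached_leaves=1_000, inserts=100_000))
# Only 10% of the leaves fit in RAM: most inserts pay for a disk read.
print(simulate_leaf_misses(num_leaves=10_000, cached_leaves=1_000, inserts=100_000))
```

With all leaves cached, only cold misses occur (at most 1,000 of 100,000 inserts, about 1%). With 10% of the leaves cached, uniform random access hits the cache about 10% of the time, so roughly 90% of inserts cost a random disk read, which is the IO limit the slide describes.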
TokuMX : Indexed Insertion

TokuMX : Concurrency (>RAM)

TokuMX : Concurrency (<RAM)

TokuMX : Raw Compression
BitTorrent data, size on disk, ~31 million inserts (lower is better)

TokuMX achieved 11.6:1 compression.

TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)

TokuMX is substantially smaller, even without compression.

TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)

MongoDB was ~10% smaller with the short field names.
In TokuMX, field name length has almost no impact on size, thanks to compression.

TokuMX : ACID + MVCC
• ACID
  – In MongoDB, multi-document inserts can partially succeed
    o e.g., asked to store 5 documents, only 3 succeeded
  – TokuMX offers “all or nothing” (atomic) behavior
• MVCC
  – In MongoDB, queries can be interrupted by writers
    o The effects of those writes are visible to the reader
  – TokuMX offers MVCC
    o Reads are consistent as of the operation's start

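The two behaviors on this slide can be illustrated with a toy in-memory store. This is a hypothetical Python sketch, not TokuMX's implementation: `ToyStore`, its methods, the duplicate-key error used to make a batch fail, and the single global commit counter are all assumptions made for illustration.

```python
class ToyStore:
    """Hypothetical in-memory store; not TokuMX code.

    Each key maps to a list of (commit_ts, value) versions, oldest first,
    and a single counter hands out commit timestamps.
    """

    def __init__(self):
        self.versions = {}  # key -> [(commit_ts, value), ...]
        self.clock = 0      # timestamp of the most recent commit

    def insert_many_partial(self, docs):
        """Insert (key, value) pairs one at a time; stop at the first duplicate key.

        Documents inserted before the failure stay inserted (partial success).
        """
        inserted = 0
        for key, value in docs:
            if key in self.versions:
                break
            self.clock += 1
            self.versions[key] = [(self.clock, value)]
            inserted += 1
        return inserted

    def insert_many_atomic(self, docs):
        """Check the whole batch first; insert all of it or none of it."""
        keys = [key for key, _ in docs]
        if len(set(keys)) != len(keys) or any(key in self.versions for key in keys):
            return 0  # one bad document rejects the entire batch
        self.clock += 1  # the whole batch commits at a single timestamp
        for key, value in docs:
            self.versions[key] = [(self.clock, value)]
        return len(docs)

    def update(self, key, value):
        """Commit a new version of `key` without discarding older versions."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self):
        """A reader records the current commit timestamp when its operation starts."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the reader's snapshot."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = ToyStore()
store.insert_many_partial([("a", 1)])
# The fourth document collides with "a": 3 of the 5 documents are stored.
print(store.insert_many_partial([("b", 2), ("c", 3), ("d", 4), ("a", 5), ("e", 6)]))

store.update("x", "old")
ts = store.snapshot()       # a reader starts here
store.update("x", "new")    # a writer commits after the reader started
print(store.read("x", ts))  # the reader still sees "old"
```

Running the same five-document batch through `insert_many_atomic` would store none of it, and a reader's view never changes partway through its operation because it only looks at versions committed before its snapshot.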
Questions?

TokuMX : Indexed Insertion
• Indexed insertion workload (iibench)
  – http://github.com/tmcallaghan/iibench-mongodb

{ dateandtime: <date-time>,
  cashregisterid: 1..1000,
  customerid: 1..100000,
  productid: 1..10000,
  price: <double> }

• Insert only, 1000 documents per insert, 100 million inserts
• Indexes
  – price + customerid
  – cashregisterid + price + customerid
  – price + dateandtime + customerid

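As an illustration of what this workload asks of the storage engine, the Python sketch below generates one 1000-document batch shaped like the iibench schema and lists the key each secondary index must store per document. The uniform value distributions, the price range, the starting timestamp, the ascending key order, and the function names are assumptions; the real generator is in the iibench-mongodb repository linked above.

```python
import random
from datetime import datetime, timedelta

# Secondary indexes from the slide as (field, direction) pairs.
# Ascending direction (1) is an assumption; the slide does not specify it.
INDEXES = [
    [("price", 1), ("customerid", 1)],
    [("cashregisterid", 1), ("price", 1), ("customerid", 1)],
    [("price", 1), ("dateandtime", 1), ("customerid", 1)],
]


def make_batch(rng, n=1000, start=datetime(2013, 1, 1)):
    """Generate n documents with the iibench field ranges (uniform values assumed)."""
    return [
        {
            "dateandtime": start + timedelta(milliseconds=i),
            "cashregisterid": rng.randint(1, 1000),
            "customerid": rng.randint(1, 100000),
            "productid": rng.randint(1, 10000),
            "price": round(rng.uniform(0.0, 500.0), 2),  # price range is hypothetical
        }
        for i in range(n)
    ]


def index_keys(docs, pattern):
    """Return the compound key (one tuple per document) an index on `pattern` would hold."""
    return [tuple(doc[field] for field, _direction in pattern) for doc in docs]


rng = random.Random(42)
batch = make_batch(rng)
for pattern in INDEXES:
    keys = index_keys(batch, pattern)
    # Keys arrive in random order, so one batch touches many parts of each index.
    print([field for field, _ in pattern], len(keys), min(keys), max(keys))
```

Because price and cashregisterid are random, every batch scatters its keys across all three indexes, which is the random-insertion pattern that the B-tree slide identifies as IO-bound once the indexes outgrow RAM.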
TokuMX : Concurrency
• Sysbench read-write workload
  – Point and range queries, update, delete, insert
  – http://github.com/tmcallaghan/sysbench-mongodb

{ _id: 1..10000000,
  k: 1..10000000,
  c: <120 char random string ###-###-###>,
  pad: <60 char random string ###-###-###> }

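A rough Python sketch of the sysbench-mongodb document shape and operation mix is shown below. The digit-group layout of `c` and `pad` (groups of three digits separated by dashes, as the slide's ###-###-### suggests), the operation weights, and all function names are assumptions for illustration; the actual generator lives in the sysbench-mongodb repository linked above.

```python
import random
import string

OPERATIONS = ["point_query", "range_query", "update", "delete", "insert"]


def digit_pattern(rng, length):
    """Random string like '###-###-###...' of exactly `length` characters."""
    out = []
    for i in range(length):
        # Every fourth character is a dash, but the string never ends with one.
        if i % 4 == 3 and i != length - 1:
            out.append("-")
        else:
            out.append(rng.choice(string.digits))
    return "".join(out)


def make_doc(rng, doc_id, table_size=10_000_000):
    """One document with the field layout from the slide."""
    return {
        "_id": doc_id,
        "k": rng.randint(1, table_size),
        "c": digit_pattern(rng, 120),
        "pad": digit_pattern(rng, 60),
    }


def pick_operation(rng, weights=(10, 4, 2, 1, 1)):
    """Choose the next operation for a client thread; the weights are hypothetical."""
    return rng.choices(OPERATIONS, weights=weights, k=1)[0]


rng = random.Random(7)
print(make_doc(rng, 1)["c"][:23])
print([pick_operation(rng) for _ in range(5)])
```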
TokuMX : Raw Compression
• BitTorrent Peer Snapshot Data (~31 million documents)
• 3 Indexes : peer_id + created, torrent_snapshot_id + created, created

{ id: 1,
  peer_id: 9222,
  torrent_snapshot_id: 4,
  upload_speed: 0.0000,
  download_speed: 0.0000,
  payload_upload_speed: 0.0000,
  payload_download_speed: 0.0000,
  total_upload: 0,
  total_download: 0,
  fail_count: 0,
  hashfail_count: 0,
  progress: 0.0000,
  created: "2008-10-28 01:57:35" }

http://cs.brown.edu/~pavlo/torrent/

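A compression ratio of 11.6:1 means the data occupies about 1/11.6, or roughly 8.6%, of its uncompressed size. The Python sketch below measures the same kind of ratio on synthetic documents shaped like the sample above. It is only an illustration: JSON stands in for BSON, Python's `zlib` stands in for TokuMX's compressor, the value ranges are invented, and the ratio it prints is not a reproduction of the benchmark.

```python
import json
import random
import zlib


def make_peer_doc(rng, doc_id):
    """Synthetic document with the same fields as the sample (value ranges are hypothetical)."""
    return {
        "id": doc_id,
        "peer_id": rng.randint(1, 20_000),
        "torrent_snapshot_id": rng.randint(1, 50),
        "upload_speed": 0.0,
        "download_speed": 0.0,
        "payload_upload_speed": 0.0,
        "payload_download_speed": 0.0,
        "total_upload": 0,
        "total_download": 0,
        "fail_count": 0,
        "hashfail_count": 0,
        "progress": 0.0,
        "created": f"2008-10-28 {rng.randint(0, 23):02d}:{rng.randint(0, 59):02d}:{rng.randint(0, 59):02d}",
    }


def compression_ratio(raw, level=6):
    """Uncompressed size divided by zlib-compressed size (11.6 means '11.6:1')."""
    return len(raw) / len(zlib.compress(raw, level))


rng = random.Random(0)
raw = "\n".join(json.dumps(make_peer_doc(rng, i)) for i in range(10_000)).encode()
print(f"{len(raw)} bytes raw, ratio {compression_ratio(raw):.1f}:1")
```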
TokuMX : Compression : Field Names
schema 1 - long field names (10/20/20)
{ first_name    : “Tim”,
  last_name     : “Callaghan”,
  email_address : “tim@tokutek.com” }

schema 2 - short field names (26 fewer bytes per doc)
{ fn : “Tim”,
  ln : “Callaghan”,
  ea : “tim@tokutek.com” }

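The 26-byte difference follows from the names alone: first_name, last_name, and email_address take 10 + 9 + 13 = 32 bytes, while fn, ln, and ea take 6. The Python sketch below checks that arithmetic and shows why compression makes the gap nearly disappear: a field name repeated in every document becomes a cheap back-reference. JSON and `zlib` are stand-ins for BSON and TokuMX's compression, and the random values and their lengths are invented.

```python
import json
import random
import string
import zlib

LONG_NAMES = ("first_name", "last_name", "email_address")
SHORT_NAMES = ("fn", "ln", "ea")


def make_docs(names, n, seed=0):
    """n documents using `names`; the same seed yields the same values for either schema."""
    rng = random.Random(seed)
    docs = []
    for _ in range(n):
        values = ["".join(rng.choices(string.ascii_lowercase, k=k)) for k in (6, 10, 16)]
        docs.append(dict(zip(names, values)))
    return docs


def sizes(names, n=5000, seed=0):
    """Return (raw bytes, zlib-compressed bytes) for n documents, one JSON object per line."""
    docs = make_docs(names, n, seed)
    raw = "\n".join(json.dumps(d, separators=(",", ":")) for d in docs).encode()
    return len(raw), len(zlib.compress(raw))


name_bytes_saved = sum(map(len, LONG_NAMES)) - sum(map(len, SHORT_NAMES))  # 32 - 6 = 26
long_raw, long_z = sizes(LONG_NAMES)
short_raw, short_z = sizes(SHORT_NAMES)
print(f"raw gap: {long_raw - short_raw} bytes, compressed gap: {long_z - short_z} bytes")
```

Uncompressed, the long-name file is exactly 26 bytes per document larger; after compression the gap shrinks to a small fraction of that, which mirrors the slide's observation that field name length barely matters in TokuMX.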

Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb, 2013-12-10)

Free Complete Python - A step towards Data Science
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 

Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

  • 1. Introducing TokuMX: The Performance Engine for MongoDB. Leif Walsh, Senior Engineer, Tokutek (leif@tokutek.com, @leifwalsh)
  • 2. What is TokuMX? TokuMX = MongoDB with improved storage. It is a drop-in replacement for MongoDB v2.4 applications, including replication and sharding: same data model, same query language, and drivers just work. No full-text or geospatial support. Open source: http://github.com/Tokutek/mongo
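  • 3. B-tree Limitations. Performance is I/O-limited when the data is bigger than RAM: try to fit all internal nodes and some leaf nodes in RAM. [Figure: B-tree with internal nodes (22; 10, 99) in RAM and leaf nodes (2, 3, 4 / 10, 20 / 22, 25 / 99) split between RAM and disk] Plus, mmap (MongoDB's memory-mapped storage).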
  • 4. TokuMX: Indexed Insertion
  • 5. TokuMX: Indexed Insertion
  • 6. TokuMX: Concurrency (data larger than RAM)
  • 7. TokuMX: Concurrency (data smaller than RAM)
  • 8. TokuMX: Raw Compression. BitTorrent data, size on disk, ~31 million inserts (lower is better). TokuMX achieved 11.6:1 compression.
  • 9. TokuMX: Compression: Field Names. Synthetic data, size on disk, 100 million inserts (lower is better). TokuMX is substantially smaller, even without compression.
  • 10. TokuMX: Compression: Field Names. Synthetic data, size on disk, 100 million inserts (lower is better). With short field names, MongoDB was ~10% smaller; in TokuMX, field name length has almost no impact on size, thanks to compression.
  • 11. TokuMX: ACID + MVCC. ACID: in MongoDB, multi-insert operations allow partial success (asked to store 5 documents, only 3 succeed); TokuMX offers “all or nothing” (atomic) behavior. MVCC: in MongoDB, queries can be interrupted by writers, and the effects of those writers are visible to the reader; TokuMX offers MVCC, so reads are consistent as of the operation's start.
  • 12. Questions?
  • 13. TokuMX: Indexed Insertion. Indexed insertion workload (iibench), http://github.com/tmcallaghan/iibench-mongodb. Documents: { dateandtime: <date-time>, cashregisterid: 1..1000, customerid: 1..100000, productid: 1..10000, price: <double> }. Insert only, 1000 documents per insert, 100 million inserts. Indexes: price + customerid; cashregisterid + price + customerid; price + dateandtime + customerid.
  • 14. TokuMX: Concurrency. Sysbench read-write workload: point and range queries, update, delete, insert (http://github.com/tmcallaghan/sysbench-mongodb). Documents: { _id: 1..10000000, k: 1..10000000, c: <120-char random string ###-###-###>, pad: <60-char random string ###-###-###> }
  • 15. TokuMX: Raw Compression. BitTorrent peer snapshot data (~31 million documents), http://cs.brown.edu/~pavlo/torrent/. 3 indexes: peer_id + created, torrent_snapshot_id + created, created. Documents: { id: 1, peer_id: 9222, torrent_snapshot_id: 4, upload_speed: 0.0000, download_speed: 0.0000, payload_upload_speed: 0.0000, payload_download_speed: 0.0000, total_upload: 0, total_download: 0, fail_count: 0, hashfail_count: 0, progress: 0.0000, created: "2008-10-28 01:57:35" }
  • 16. TokuMX: Compression: Field Names. Schema 1, long field names (10/20/20): { first_name: "Tim", last_name: "Callaghan", email_address: "tim@tokutek.com" }. Schema 2, short field names (26 fewer bytes per doc): { fn: "Tim", ln: "Callaghan", ea: "tim@tokutek.com" }
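Slide 3's point that a B-tree becomes I/O-bound once it outgrows RAM can be made concrete with a back-of-the-envelope model. The sketch below is an illustrative assumption, not Tokutek's analysis: it assumes a full tree with a uniform fanout and a fixed node size, caches whole levels from the root down until a RAM budget runs out, and counts one disk read for every uncached level a random insert must touch. It ignores partially cached levels (the "some leaf nodes" on the slide), so it is a pessimistic estimate; the numbers in the `__main__` block are hypothetical.

```python
from typing import List


def nodes_per_level(num_keys: int, fanout: int) -> List[int]:
    """Node counts per level of a full B-tree, root first (assumes each node holds `fanout` entries)."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("need at least one key and a fanout of 2 or more")
    counts = [-(-num_keys // fanout)]  # leaf level: ceil(num_keys / fanout)
    while counts[-1] > 1:
        counts.append(-(-counts[-1] // fanout))  # each parent covers `fanout` children
    return counts[::-1]


def disk_reads_per_random_insert(num_keys: int, fanout: int, node_bytes: int, ram_bytes: int) -> int:
    """Levels a random insert must read from disk when whole levels are cached top-down."""
    levels = nodes_per_level(num_keys, fanout)
    budget = ram_bytes
    cached = 0
    for count in levels:
        level_bytes = count * node_bytes
        if level_bytes > budget:
            break  # this level, and every larger level below it, stays on disk
        budget -= level_bytes
        cached += 1
    return len(levels) - cached


if __name__ == "__main__":
    # Hypothetical numbers: 1 billion keys, fanout 100, 16 KiB nodes.
    for ram_gib in (1, 4):
        reads = disk_reads_per_random_insert(10**9, 100, 16 * 1024, ram_gib * 2**30)
        print(f"{ram_gib} GiB RAM -> {reads} disk read(s) per random insert")
```

Under these assumptions, 4 GiB holds every internal level, so each random insert costs one leaf read (the slide's best case); with 1 GiB the lowest internal level no longer fits and the cost rises to two reads per insert.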
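Slide 11 contrasts partial-success batch inserts with all-or-nothing inserts, and reads that can observe concurrent writes with MVCC snapshot reads. The toy Python store below illustrates both ideas. It is a hypothetical in-memory model written for this explanation (the class and method names are invented), not TokuMX's transaction or storage code: each key keeps a list of (commit timestamp, value) versions, and a reader only sees versions committed at or before the snapshot it took when its operation started.

```python
class DuplicateKeyError(Exception):
    """Raised when a batch contains a key that already exists (or repeats within the batch)."""


class ToyVersionedStore:
    """Hypothetical multi-version store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self):
        self.versions = {}
        self.clock = 0  # last commit timestamp handed out

    def snapshot(self):
        # A reader records the latest commit timestamp when its operation starts.
        return self.clock

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the snapshot; later writes stay invisible.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def update(self, key, value):
        # A writer appends a new version instead of overwriting the old one.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def insert_many(self, docs):
        """All-or-nothing: validate the whole batch, then commit it under one timestamp."""
        seen = set()
        for key, _ in docs:
            if key in self.versions or key in seen:
                raise DuplicateKeyError(key)  # nothing from this batch has been written
            seen.add(key)
        self.clock += 1
        for key, value in docs:
            self.versions[key] = [(self.clock, value)]
        return len(docs)

    def insert_many_partial(self, docs):
        """Partial success: insert in order and stop at the first failure."""
        inserted = 0
        for key, value in docs:
            if key in self.versions:
                break  # documents inserted before the failure remain stored
            self.clock += 1
            self.versions[key] = [(self.clock, value)]
            inserted += 1
        return inserted
```

Feeding a five-document batch whose fourth key is a duplicate into `insert_many_partial` stores three documents, matching the slide's "asked to store 5 documents, 3 succeeded", while `insert_many` rejects the batch and stores none.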
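The iibench document shape and index list on slide 13 can be sketched as a Python generator. This is a hypothetical reconstruction from the slide, not the code in the iibench-mongodb repository: the price distribution is not given on the slide, so `rng.uniform(0, 500)` is an assumed placeholder, and `dateandtime` simply uses the current UTC time. Index specs are written as lists of `(field, direction)` pairs.

```python
import datetime
import random

# The three compound indexes from slide 13, as (field, direction) pairs.
IIBENCH_INDEXES = [
    [("price", 1), ("customerid", 1)],
    [("cashregisterid", 1), ("price", 1), ("customerid", 1)],
    [("price", 1), ("dateandtime", 1), ("customerid", 1)],
]


def make_document(rng: random.Random) -> dict:
    """One iibench-style document; integer ranges follow the slide (inclusive)."""
    return {
        "dateandtime": datetime.datetime.now(datetime.timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": rng.uniform(0, 500),  # assumption: the slide only says <double>
    }


def insert_batches(total_docs: int, batch_size: int = 1000, seed: int = 0):
    """Yield lists of documents, `batch_size` at a time (the slide uses 1000 per insert)."""
    rng = random.Random(seed)
    remaining = total_docs
    while remaining > 0:
        count = min(batch_size, remaining)
        yield [make_document(rng) for _ in range(count)]
        remaining -= count
```

With a driver such as pymongo, each index spec could be passed to `collection.create_index(spec)` and each yielded batch to `collection.insert_many(batch)`; for actual measurements, use the benchmark in the repository linked on the slide.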
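Slide 14's sysbench document can likewise be generated with a short Python sketch. It is an assumption-laden reconstruction, not the sysbench-mongodb code: the slide's `###-###-###` notation is read here as random digits in groups of three separated by dashes, filled out to the stated length (120 characters for `c`, 60 for `pad`). The helper names are hypothetical.

```python
import random


def pattern_string(length: int, rng: random.Random) -> str:
    """Random digits in '###-###-...' groups, exactly `length` characters, never ending in '-'."""
    chars = []
    for i in range(length):
        if i % 4 == 3 and i != length - 1:
            chars.append("-")
        else:
            chars.append(str(rng.randint(0, 9)))
    return "".join(chars)


def make_row(row_id: int, table_size: int, rng: random.Random) -> dict:
    """One sysbench-style document with the fields listed on the slide."""
    return {
        "_id": row_id,
        "k": rng.randint(1, table_size),
        "c": pattern_string(120, rng),
        "pad": pattern_string(60, rng),
    }
```

For example, `[make_row(i, 10_000_000, rng) for i in range(1, 1001)]` builds the first 1000 rows of a 10-million-row table.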
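The 26-byte figure on slide 16 is the difference in key lengths: `first_name`, `last_name`, and `email_address` total 10 + 9 + 13 = 32 bytes, while `fn`, `ln`, and `ea` total 6. The Python sketch below checks that arithmetic and then illustrates why slide 10 finds field-name length nearly irrelevant once data is compressed. As a stand-in assumption, it serializes documents as compact JSON rather than BSON and compresses them with the standard-library `zlib`; this is not how TokuMX stores or compresses data, and the generated values are made up.

```python
import json
import zlib

LONG_NAMES = ("first_name", "last_name", "email_address")
SHORT_NAMES = ("fn", "ln", "ea")


def key_bytes(names) -> int:
    """Bytes spent on field names in one document."""
    return sum(len(name) for name in names)


def raw_and_compressed_size(names, n_docs: int = 10_000):
    """Serialize n_docs documents with the given field names; return (raw, zlib-compressed) sizes."""
    lines = []
    for i in range(n_docs):
        values = (f"Tim{i}", f"Callaghan{i}", f"tim{i}@tokutek.com")  # made-up, varying values
        lines.append(json.dumps(dict(zip(names, values)), separators=(",", ":")))
    raw = "\n".join(lines).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 6))


if __name__ == "__main__":
    long_raw, long_zip = raw_and_compressed_size(LONG_NAMES)
    short_raw, short_zip = raw_and_compressed_size(SHORT_NAMES)
    print("raw bytes:", long_raw, "vs", short_raw)
    print("compressed bytes:", long_zip, "vs", short_zip)
    print("compression ratio (long names): %.1f:1" % (long_raw / long_zip))
```

The raw gap is exactly 26 bytes per document, while the compressed gap is only a small fraction of that, because repeated field names become cheap back-references. The absolute sizes and ratio will vary with the data and zlib version, so treat them as qualitative.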