MongoDB Cluster
• S× write throughput.
• R× read throughput.
• R/2 nodes can go down without losing availability.
• Data can survive destruction of R-1 nodes.
• S×R hardware & maintenance cost.
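To make the S-and-R arithmetic concrete, here is a tiny sketch with example values; S = 3 and R = 3 are assumptions chosen for illustration, not prescribed by the slide.

```python
# Hypothetical cluster sizing: S shards, each a replica set of R nodes.
S, R = 3, 3                # example values only

machines = S * R           # total hardware & maintenance cost
write_scaling = S          # one primary per shard accepts writes
read_scaling = R           # each replica can serve reads
survivable_losses = R - 1  # data survives destruction of R-1 copies

print(machines, write_scaling, read_scaling, survivable_losses)  # 9 3 3 2
```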
TokuMX: MongoDB with Fractal Trees
• MongoDB fork.
• Compression, performance, transactions.
• Details about Fractal Trees after lunch.
TokuMX: MongoDB with Fractal Trees
• Read-free Replication
• Fast Updates
• Optimized Sharding Migrations
• Ark Consensus for Replication Failover
• Partitioned Collections
• Clustering Indexes & Primary Keys
• tokutek.com/tokumx
Fractal Tree Performance Basics
Writes are cheap:
• O(1/B) I/Os per op.
• ≈10k/s
Reads are expensive:
• Ω(1) I/O per op.
• ≈100/s
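A back-of-the-envelope model of those two numbers; the disk speed and batching factor below are illustrative assumptions. A fractal tree amortizes writes by moving a batch of roughly B buffered messages per I/O (the 1/B factor), while a point read still needs at least one I/O to reach a leaf.

```python
# Illustrative numbers only: a disk serving ~100 random I/Os per second,
# and a fractal tree moving ~100 buffered messages per I/O (assumed B).
DISK_IOPS = 100
B = 100

write_throughput = DISK_IOPS * B  # each I/O carries B writes -> ~10k ops/s
read_throughput = DISK_IOPS       # each read costs >= 1 I/O  -> ~100 ops/s

print(write_throughput, read_throughput)  # 10000 100
```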
Read-free Replication
Updates are reads + writes.
Secondaries can trust the primary, so they only do writes.
In terms of I/O utilization, secondaries are very cheap compared to primaries.
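A minimal sketch of the idea, under the assumption that the primary logs the result of each update (the post-image) rather than the operation itself, so a secondary can apply it as a blind write with no read. The data structures and function names here are hypothetical stand-ins, not TokuMX internals.

```python
store = {}   # primary's key/value data (toy model)
oplog = []   # entries shipped to secondaries

def primary_update(key, delta):
    old = store.get(key, 0)    # read: the primary must find the old value
    new = old + delta          # compute the update
    store[key] = new           # write
    oplog.append((key, new))   # log the *result*, not the delta

secondary_store = {}

def secondary_apply(entry):
    key, new = entry
    secondary_store[key] = new  # blind write: no read required

primary_update("x", 5)
primary_update("x", 3)
for entry in oplog:
    secondary_apply(entry)
print(secondary_store)  # {'x': 8}
```

Because every oplog entry carries the final value, replaying it touches only the write path on the secondary.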
A Traditional TokuMX Cluster
• 9 machines, only 3× throughput benefit.
• Secondaries are under-utilized.
A TokuMX Cluster With Read-free Replication
• 3× write throughput.
• 3× read throughput (though perhaps not both at once).
• 1 node can go down without losing availability.
• Data can survive destruction of 2 nodes.
• Only 3× hardware cost, down from 9×.
Dynamo Architecture
• Developed at Amazon.
• Used by Cassandra, Riak, Voldemort.
• Many components; I will focus on data partitioning.
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Dynamo Architecture
• Servers are equal peers, not separate primaries and secondaries.
• Store overlapping subsets of data (MongoDB shards store disjoint subsets).
• Data partitioning determined by consistent hashing.
Dynamo Partitioning
• N servers in a ring.
• hash(K) is a location around the ring.
• Store data for K on the next R servers on the ring.
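The placement rule above can be sketched with consistent hashing. This `Ring` class is an illustration, not Dynamo's or TokuMX's actual code, and MD5 is just a convenient stand-in hash function.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32

def h(s):
    # Map a string to a position on the ring [0, 2^32).
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** RING_BITS)

class Ring:
    def __init__(self, servers, R=3):
        self.R = R
        # Each server owns one point on the ring, sorted by hash.
        self.points = sorted((h(s), s) for s in servers)

    def replicas(self, key):
        # Walk clockwise from hash(key), collecting the next R
        # distinct servers (wrapping around the end of the ring).
        i = bisect_left(self.points, (h(key), ""))
        out = []
        n = len(self.points)
        while len(out) < min(self.R, n):
            _, server = self.points[i % n]
            if server not in out:
                out.append(server)
            i += 1
        return out

ring = Ring(["a", "b", "c", "d", "e"], R=3)
print(ring.replicas("user:42"))  # the next 3 servers clockwise from hash(K)
```

Adding or removing a server moves only one point on the ring, so only neighboring key ranges need to be reshuffled.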
Dynamo Partitioning
• All nodes accept writes: ~linear write scaling.
• Data replicated R times: R× read performance/reliability.
Dynamo-style Sharding in TokuMX
• Each node is primary for some chunks, secondary for others.
• Nodes store overlapping subsets of the data set.
Dynamo-style Sharding in TokuMX
• S primaries in the ring: S× write throughput.
• R copies of each chunk on separate machines: R× read throughput, availability & recovery guarantees.
Dynamo-style Sharding in TokuMX
• Adding a node:
  – Move one secondary from each of the next 2 nodes to the new node.
  – Initialize a new replica set on the new node and the next 2 nodes.
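One way to picture that rebalance, under the simplifying assumption that each node is primary for exactly one chunk and each chunk is replicated on the next two ring positions. Node names and the ring direction here are hypothetical; the real chunk-to-node mapping is more involved.

```python
def replica_sets(ring, R=3):
    # Toy model: node ring[i] primaries one chunk, replicated on the
    # next R-1 nodes clockwise (so each replica set has R members).
    n = len(ring)
    return {ring[i]: [ring[(i + j) % n] for j in range(R)]
            for i in range(n)}

before = replica_sets(["n0", "n1", "n2"])
# Insert the new node into the ring between n0 and n1.
after = replica_sets(["n0", "nNew", "n1", "n2"])

# nNew primaries one chunk of its own and picks up secondary copies of
# its two ring neighbors' chunks -- it appears in exactly R = 3 sets.
print(sum("nNew" in rs for rs in after.values()))  # 3
```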
Future Work
The chunk balancer is not sophisticated:
• Adding/removing machines is rough and overloads the machine’s neighbors.
• Can we use ideas from Cassandra & Riak to improve this?
The MongoDB architecture requires managing multiple processes on each machine.
• We can do better with good tools. Talk to me if you want to write them.
Thanks!
Come to my talk after lunch for details about
Fractal Trees.
leif@tokutek.com
@leifwalsh
tokutek.com/tokumx
slidesha.re/13pxgH8