A New MongoDB Sharding Architecture, Leif Walsh (Tokutek)

Leif Walsh's talk at HighLoad++ 2014.

  1. A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization. Leif Walsh, @leifwalsh
  2. A Traditional MongoDB Cluster • 3 shards. • 3 replicas per shard.
  3. A Traditional MongoDB Cluster • 3x write throughput. • 3x read throughput.
  4. A Traditional MongoDB Cluster • 1 node can go down without losing availability.
  5. A Traditional MongoDB Cluster • Data can survive destruction of 2 nodes.
  6. General MongoDB Cluster (S shards, R replicas per shard) • Sx write throughput. • Rx read throughput. • Fewer than R/2 nodes can go down without losing availability (1 node when R = 3). • Data can survive destruction of R-1 nodes. • S×R hardware & maintenance cost.
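
The general formulas on slide 6 can be checked against the 3×3 example on slides 2 to 5 with a small sketch. The `Cluster` class and its attribute names are hypothetical, introduced only for illustration. The availability bound assumes a replica set needs a strict majority of its members to keep a primary, which is how "fewer than R/2" is read here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model of a traditional sharded cluster (names are illustrative)."""
    shards: int    # S on the slide
    replicas: int  # R on the slide: members per replica set

    @property
    def write_scale(self) -> int:
        # One primary per shard accepts writes: Sx write throughput.
        return self.shards

    @property
    def read_scale(self) -> int:
        # The slide's claim: R copies of the data give Rx read throughput.
        return self.replicas

    @property
    def failures_tolerated(self) -> int:
        # Assumption: a strict majority of the R members must stay up,
        # so the worst case tolerates fewer than R/2 failures.
        return (self.replicas - 1) // 2

    @property
    def copies_that_can_be_destroyed(self) -> int:
        # Data survives as long as one copy of every shard remains.
        return self.replicas - 1

    @property
    def machines(self) -> int:
        # One machine per replica set member: S x R hardware cost.
        return self.shards * self.replicas


if __name__ == "__main__":
    c = Cluster(shards=3, replicas=3)
    print(c.write_scale, c.read_scale, c.failures_tolerated,
          c.copies_that_can_be_destroyed, c.machines)
```

With S = 3 and R = 3 the model gives the figures from the earlier slides: 3x writes, 3x reads, 1 tolerated failure, 2 destroyed nodes survived, 9 machines.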
  7. TokuMX: MongoDB with Fractal Trees • MongoDB fork. • Compression, performance, transactions. • Details about Fractal Trees after lunch.
  8. TokuMX: MongoDB with Fractal Trees • Read-free Replication • Fast Updates • Optimized Sharding Migrations • Ark Consensus for Replication Failover • Partitioned Collections • Clustering Indexes & Primary Keys • tokutek.com/tokumx
  9. Fractal Tree Performance Basics • Writes are cheap: O(1/B) I/Os per op, ≈10k/s. • Reads are expensive: Ω(1) I/O per op, ≈100/s.
  10. Read-free Replication • An update is a read plus a write. • Secondaries can trust the primary and only do the write.
  11. Read-free Replication • An update is a read plus a write. • Secondaries can trust the primary and only do the write. • In terms of I/O utilization, secondaries are very cheap compared to primaries.
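
A back-of-the-envelope sketch of why secondaries are cheap, using the approximate rates from slide 9. Treating those rates as a per-disk time budget is an assumption made here for illustration, and the function names are hypothetical.

```python
# Approximate rates quoted on slide 9; everything derived from them is a rough model.
READS_PER_SEC = 100       # "≈100/s"
WRITES_PER_SEC = 10_000   # "≈10k/s"
MICROS_PER_SEC = 1_000_000

# Assumption: each operation consumes 1/rate seconds of disk time.
READ_COST_US = MICROS_PER_SEC // READS_PER_SEC     # disk time per read
WRITE_COST_US = MICROS_PER_SEC // WRITES_PER_SEC   # disk time per write


def update_cost_us(read_free: bool) -> int:
    """Disk time (microseconds) to apply one update under this model.

    A primary must read the document and then write it. With read-free
    replication a secondary trusts the primary and only performs the write.
    """
    return WRITE_COST_US if read_free else READ_COST_US + WRITE_COST_US


if __name__ == "__main__":
    primary = update_cost_us(read_free=False)
    secondary = update_cost_us(read_free=True)
    print(f"primary:   {primary} us per update")
    print(f"secondary: {secondary} us per update")
    print(f"ratio:     {primary // secondary}x")
```

Under these assumptions the read dominates the cost of an update, so a secondary that skips it spends roughly two orders of magnitude less disk time per update than the primary.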
  12. A Traditional TokuMX Cluster • 9 machines, only 3x throughput benefit. • Secondaries are under-utilized.
  13. A TokuMX Cluster With Read-free Replication • 3x write throughput. • 3x read throughput. • (maybe separately)
  14. A TokuMX Cluster With Read-free Replication • 1 node can go down without losing availability.
  15. A TokuMX Cluster With Read-free Replication • Data can survive destruction of 2 nodes.
  16. A TokuMX Cluster With Read-free Replication • Only 3x hardware cost, down from 9x.
  17. Dynamo Architecture • Developed at Amazon. • Used by Cassandra, Riak, Voldemort. • It has many components; I will focus on data partitioning. • http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
  18. Dynamo Architecture • Servers are equal peers, not separate primaries and secondaries. • Store overlapping subsets of data (MongoDB shards store disjoint subsets). • Data partitioning determined by consistent hashing.
  19. Dynamo Partitioning • N servers in a ring. • hash(K) is a location around the ring. • Store data for K on the next R servers on the ring.
  20. Dynamo Partitioning • All nodes accept writes: ~linear write scaling. • Data replicated R times: Rx read performance and reliability.
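
The partitioning rule on slide 19 (hash the key to a ring position, store it on the next R servers) can be sketched as follows. This is a minimal illustration, not Dynamo's implementation: the `Ring` class is hypothetical, SHA-256 is an arbitrary choice of hash, and each server gets a single ring position, whereas production systems typically use many virtual positions per server.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a key or server name to a 64-bit position on the ring (hash choice is arbitrary)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical consistent-hashing ring with one position per server."""

    def __init__(self, servers, replicas):
        if replicas > len(servers):
            raise ValueError("need at least as many servers as replicas")
        self.replicas = replicas
        # Servers sorted by their position around the ring.
        self._points = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def owners(self, key: str):
        """Return the R servers that follow hash(key) clockwise on the ring."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        n = len(self._points)
        # The modulo wraps around past the last server back to the first.
        return [self._points[(start + i) % n][1] for i in range(self.replicas)]


if __name__ == "__main__":
    ring = Ring(["s0", "s1", "s2", "s3", "s4"], replicas=3)
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.owners(key))
```

One consequence of this scheme is that removing a server that does not own a key leaves that key's owner list unchanged, which is what keeps data movement local when membership changes.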
  21. Dynamo-style Sharding in TokuMX • Each node is primary for some chunks, secondary for others. • Nodes store overlapping subsets of the data set.
  22. Dynamo-style Sharding in TokuMX • S primaries in the ring: Sx write throughput. • R copies of each chunk on separate machines: Rx read throughput, availability & recovery guarantees.
  23. Dynamo-style Sharding in TokuMX • Adding a node: (1) Move one secondary from each of the next 2 nodes to the new node. (2) Initialize a new replica set on the new node and the next 2 nodes.
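
The layout on slides 21 to 23 can be modelled in a few lines. The sketch assumes that each node is the primary of one replica set whose secondaries live on the next R-1 nodes around the ring, which is one reading of the slides rather than a description of TokuMX internals. The function names are hypothetical, and replica sets are named after their primary node for convenience.

```python
from collections import Counter


def ring_layout(nodes, replicas=3):
    """Map each replica set (named after its primary) to its members.

    Assumption: the primary lives on the node itself and the secondaries
    on the next `replicas - 1` nodes clockwise.
    """
    n = len(nodes)
    if replicas > n:
        raise ValueError("need at least as many nodes as replicas")
    return {
        nodes[i]: [nodes[(i + k) % n] for k in range(replicas)]
        for i in range(n)
    }


def processes_per_node(layout):
    """Count how many replica set members each machine hosts."""
    return Counter(member for members in layout.values() for member in members)


def add_node(nodes, new_node, index, replicas=3):
    """Insert `new_node` at `index` and report which secondaries must move."""
    before = ring_layout(nodes, replicas)
    grown = nodes[:index] + [new_node] + nodes[index:]
    after = ring_layout(grown, replicas)
    moves = []
    for rs, old_members in before.items():
        left = [m for m in old_members if m not in after[rs]]
        joined = [m for m in after[rs] if m not in old_members]
        # Each departing member is replaced by one arriving member.
        moves.extend((rs, src, dst) for src, dst in zip(left, joined))
    return grown, moves, after[new_node]


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3", "n4"]
    print(processes_per_node(ring_layout(nodes)))
    grown, moves, new_set = add_node(nodes, "x", index=3)
    for rs, src, dst in moves:
        print(f"move secondary of set {rs}: {src} -> {dst}")
    print("new replica set on:", new_set)
```

In this model every machine runs R processes (one primary and R-1 secondaries). Inserting `x` before `n3` moves one secondary from each of `n3` and `n4` onto `x` and creates a new set on `x`, `n3`, and `n4`, which matches the two steps on slide 23.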
  24. Future Work • The chunk balancer is not sophisticated: adding or removing a machine is rough and overloads that machine's neighbors. Can we use ideas from Cassandra & Riak to improve this? • The MongoDB architecture requires managing multiple processes on each machine. We can do better with good tools. Talk to me if you want to write them.
  25. Thanks! Come to my talk after lunch for details about Fractal Trees. • leif@tokutek.com • @leifwalsh • tokutek.com/tokumx • slidesha.re/13pxgH8
