SlideShare a Scribd company logo
1 of 25
Download to read offline
A New MongoDB Sharding 
Architecture for Higher 
Availability and Better 
Resource Utilization 
Leif Walsh 
@leifwalsh
A Traditional 
MongoDB Cluster 
• 3 shards. 
• 3 replicas per shard.
A Traditional 
MongoDB Cluster 
• 3x write throughput. 
• 3x read throughput.
A Traditional 
MongoDB Cluster 
• 1 node can go down 
without losing availability.
A Traditional 
MongoDB Cluster 
• Data can survive 
destruction of 2 nodes.
General 
MongoDB Cluster 
• Sx write throughput. 
• Rx read throughput. 
• R/2 nodes can go down 
without losing availability. 
• Data can survive 
destruction of R-1 nodes. 
• S×R hardware & 
maintenance cost.
TokuMX: MongoDB with Fractal Trees 
• MongoDB fork. 
• Compression, performance, transactions. 
• Details about Fractal Trees after lunch.
TokuMX: MongoDB with Fractal Trees 
• Read-free Replication 
• Fast Updates 
• Optimized Sharding Migrations 
• Ark Consensus for Replication Failover 
• Partitioned Collections 
• Clustering Indexes & Primary Keys 
• tokutek.com/tokumx
Fractal Tree 
Performance Basics 
Writes are cheap: 
• O(1/B) I/Os per op. 
• ≈10k/s 
Reads are expensive: 
• Ω(1) I/O per op. 
• ≈100/s
Read-free Replication 
Updates are reads + writes. 
Secondaries can trust the primary, 
only do writes.
Read-free Replication 
Updates are reads + writes. 
Secondaries can trust the primary, 
only do writes. 
Looking at I/O utilization, 
secondaries are very cheap 
compared to primaries.
A Traditional 
TokuMX Cluster 
• 9 machines, only 3x 
throughput benefit. 
• Secondaries are 
under-utilized.
A TokuMX Cluster With 
Read-free Replication 
• 3x write throughput. 
• 3x read throughput. 
• (maybe separately)
A TokuMX Cluster With 
Read-free Replication 
• 1 node can go down 
without losing availability.
A TokuMX Cluster With 
Read-free Replication 
• Data can survive 
destruction of 2 nodes.
A TokuMX Cluster With 
Read-free Replication 
• Only 3x hardware cost, 
down from 9x.
Dynamo Architecture 
• Developed at Amazon. 
• Used by Cassandra, Riak, Voldemort. 
• Many components, I will focus on data 
partitioning. 
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Dynamo Architecture 
• Servers are equal peers, not separate 
primaries and secondaries. 
• Store overlapping subsets of data 
(MongoDB shards store disjoint subsets). 
• Data partitioning determined by 
consistent hashing.
Dynamo Partitioning 
• N servers in a ring. 
• hash(K) is a location 
around the ring. 
• Store data for K on the 
next R servers on the 
ring.
Dynamo Partitioning 
• All nodes accept writes: 
~linear write scaling. 
• Data replicated R times: 
Rx read performance/ 
reliability.
Dynamo-style Sharding in TokuMX 
• Each node is primary for some 
chunks, secondary for others. 
• Nodes store overlapping 
subsets of the data set.
Dynamo-style Sharding in TokuMX 
• S primaries in the ring: 
Sx write throughput. 
• R copies of each chunk on 
separate machines: 
Rx read throughput, 
availability & recovery 
guarantees.
Dynamo-style Sharding in TokuMX 
• Adding a node: 
– Move one secondary from each 
of next 2 nodes to the new node. 
– Initialize a new replica set on the 
new node and next 2 nodes.
Future Work 
Chunk balancer is not 
sophisticated: 
• Adding/removing machines is 
rough, overloads the machine’s 
neighbors. 
• Can we use ideas from 
Cassandra & Riak to improve 
this? 
MongoDB architecture 
requires managing multiple 
processes on each machine. 
• We can do better with good 
tools. Talk to me if you want to 
write them.
Thanks! 
Come to my talk after lunch for details about 
Fractal Trees. 
leif@tokutek.com 
@leifwalsh 
tokutek.com/tokumx 
slidesha.re/13pxgH8

More Related Content

What's hot

MongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101MongoDB
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBRick Copeland
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Ontico
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDBMongoDB
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisArnab Mitra
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDBMongoDB
 
2011 mongo sf-scaling
2011 mongo sf-scaling2011 mongo sf-scaling
2011 mongo sf-scalingMongoDB
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About ShardingMongoDB
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDBMongoDB
 
Securing Your MongoDB Deployment
Securing Your MongoDB DeploymentSecuring Your MongoDB Deployment
Securing Your MongoDB DeploymentMongoDB
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014clairvoyantllc
 
Hadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureHadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureErik Krogen
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCErik Krogen
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity PlanningNorberto Leite
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeErik Krogen
 
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Jeremy Zawodny
 
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 MinutesLightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 MinutesMongoDB
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 

What's hot (20)

MongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB Deployment Checklist
MongoDB Deployment Checklist
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
 
2011 mongo sf-scaling
2011 mongo sf-scaling2011 mongo sf-scaling
2011 mongo sf-scaling
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
Securing Your MongoDB Deployment
Securing Your MongoDB DeploymentSecuring Your MongoDB Deployment
Securing Your MongoDB Deployment
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Hadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On AzureHadoop Meetup Jan 2019 - Hadoop On Azure
Hadoop Meetup Jan 2019 - Hadoop On Azure
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
 
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
 
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 MinutesLightning Talk: What You Need to Know Before You Shard in 20 Minutes
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 

Viewers also liked

Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDBMongoDB
 
Mongo db
Mongo dbMongo db
Mongo dbUPCI
 
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...MongoDB
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 
FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기Jongwon Kim
 
MongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerMongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerWiredTiger
 
Cleversafe august 2016
Cleversafe august 2016Cleversafe august 2016
Cleversafe august 2016Joe Krotz
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsTony Pearson
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseMike Dirolf
 
Realtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseRealtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseDataWorks Summit
 

Viewers also liked (11)

Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
 
Mongo db
Mongo dbMongo db
Mongo db
 
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기FIFA 온라인 3의 MongoDB 사용기
FIFA 온라인 3의 MongoDB 사용기
 
MongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerMongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTiger
 
Cleversafe august 2016
Cleversafe august 2016Cleversafe august 2016
Cleversafe august 2016
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source Database
 
Realtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseRealtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBase
 

Similar to A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization

CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allenjaxconf
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDBTim Callaghan
 
Introduction to Akka-Streams
Introduction to Akka-StreamsIntroduction to Akka-Streams
Introduction to Akka-Streamsdmantula
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014Avinash Ramineni
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical dataOleksandr Semenov
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXTim Callaghan
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormJohn Georgiadis
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overviewSean Murphy
 
LMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging LibraryLMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging LibrarySebastian Andrasoni
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...DataStax Academy
 
HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemAdam Marcus
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem yarapavan
 
cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance CachingScyllaDB
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talkSatish Mehta
 

Similar to A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization (20)

CPU Caches - Jamie Allen
CPU Caches - Jamie AllenCPU Caches - Jamie Allen
CPU Caches - Jamie Allen
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
Introduction to Akka-Streams
Introduction to Akka-StreamsIntroduction to Akka-Streams
Introduction to Akka-Streams
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
LMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging LibraryLMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging Library
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
 
HPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL EcosystemHPTS 2011: The NoSQL Ecosystem
HPTS 2011: The NoSQL Ecosystem
 
The NoSQL Ecosystem
The NoSQL Ecosystem The NoSQL Ecosystem
The NoSQL Ecosystem
 
cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Caching
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talk
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandra
 
Cassandra
CassandraCassandra
Cassandra
 

More from leifwalsh

The Language of Compression
The Language of CompressionThe Language of Compression
The Language of Compressionleifwalsh
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structuresleifwalsh
 
Write-optimization in external memory data structures
Write-optimization in external memory data structuresWrite-optimization in external memory data structures
Write-optimization in external memory data structuresleifwalsh
 
The Level Ancestor Problem simplified
The Level Ancestor Problem simplifiedThe Level Ancestor Problem simplified
The Level Ancestor Problem simplifiedleifwalsh
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structuresleifwalsh
 
Write-optimization in external memory data structures (Highload++ 2014)
Write-optimization in external memory data structures (Highload++ 2014)Write-optimization in external memory data structures (Highload++ 2014)
Write-optimization in external memory data structures (Highload++ 2014)leifwalsh
 
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)leifwalsh
 

More from leifwalsh (7)

The Language of Compression
The Language of CompressionThe Language of Compression
The Language of Compression
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structures
 
Write-optimization in external memory data structures
Write-optimization in external memory data structuresWrite-optimization in external memory data structures
Write-optimization in external memory data structures
 
The Level Ancestor Problem simplified
The Level Ancestor Problem simplifiedThe Level Ancestor Problem simplified
The Level Ancestor Problem simplified
 
Write optimization in external memory data structures
Write optimization in external memory data structuresWrite optimization in external memory data structures
Write optimization in external memory data structures
 
Write-optimization in external memory data structures (Highload++ 2014)
Write-optimization in external memory data structures (Highload++ 2014)Write-optimization in external memory data structures (Highload++ 2014)
Write-optimization in external memory data structures (Highload++ 2014)
 
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)
 

Recently uploaded

What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 

Recently uploaded (20)

What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 

A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization

  • 1. A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization Leif Walsh @leifwalsh
  • 2. A Traditional MongoDB Cluster • 3 shards. • 3 replicas per shard.
  • 3. A Traditional MongoDB Cluster • 3x write throughput. • 3x read throughput.
  • 4. A Traditional MongoDB Cluster • 1 node can go down without losing availability.
  • 5. A Traditional MongoDB Cluster • Data can survive destruction of 2 nodes.
  • 6. General MongoDB Cluster • Sx write throughput. • Rx read throughput. • R/2 nodes can go down without losing availability. • Data can survive destruction of R-1 nodes. • S×R hardware & maintenance cost.
  • 7. TokuMX: MongoDB with Fractal Trees • MongoDB fork. • Compression, performance, transactions. • Details about Fractal Trees after lunch.
  • 8. TokuMX: MongoDB with Fractal Trees • Read-free Replication • Fast Updates • Optimized Sharding Migrations • Ark Consensus for Replication Failover • Partitioned Collections • Clustering Indexes & Primary Keys • tokutek.com/tokumx
  • 9. Fractal Tree Performance Basics Writes are cheap: • O(1/B) I/Os per op. • ≈10k/s Reads are expensive: • Ω(1) I/O per op. • ≈100/s
  • 10. Read-free Replication Updates are reads + writes. Secondaries can trust the primary, only do writes.
  • 11. Read-free Replication Updates are reads + writes. Secondaries can trust the primary, only do writes. Looking at I/O utilization, secondaries are very cheap compared to primaries.
  • 12. A Traditional TokuMX Cluster • 9 machines, only 3x throughput benefit. • Secondaries are under-utilized.
  • 13. A TokuMX Cluster With Read-free Replication • 3x write throughput. • 3x read throughput. • (maybe separately)
  • 14. A TokuMX Cluster With Read-free Replication • 1 node can go down without losing availability.
  • 15. A TokuMX Cluster With Read-free Replication • Data can survive destruction of 2 nodes.
  • 16. A TokuMX Cluster With Read-free Replication • Only 3x hardware cost, down from 9x.
  • 17. Dynamo Architecture • Developed at Amazon. • Used by Cassandra, Riak, Voldemort. • Many components, I will focus on data partitioning. http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
  • 18. Dynamo Architecture • Servers are equal peers, not separate primaries and secondaries. • Store overlapping subsets of data (MongoDB shards store disjoint subsets). • Data partitioning determined by consistent hashing.
  • 19. Dynamo Partitioning • N servers in a ring. • hash(K) is a location around the ring. • Store data for K on the next R servers on the ring.
  • 20. Dynamo Partitioning • All nodes accept writes: ~linear write scaling. • Data replicated R times: Rx read performance/ reliability.
  • 21. Dynamo-style Sharding in TokuMX • Each node is primary for some chunks, secondary for others. • Nodes store overlapping subsets of the data set.
  • 22. Dynamo-style Sharding in TokuMX • S primaries in the ring: Sx write throughput. • R copies of each chunk on separate machines: Rx read throughput, availability & recovery guarantees.
  • 23. Dynamo-style Sharding in TokuMX • Adding a node: – Move one secondary from each of next 2 nodes to the new node. – Initialize a new replica set on the new node and next 2 nodes.
  • 24. Future Work Chunk balancer is not sophisticated: • Adding/removing machines is rough, overloads the machine’s neighbors. • Can we use ideas from Cassandra & Riak to improve this? MongoDB architecture requires managing multiple processes on each machine. • We can do better with good tools. Talk to me if you want to write them.
  • 25. Thanks! Come to my talk after lunch for details about Fractal Trees. leif@tokutek.com @leifwalsh tokutek.com/tokumx slidesha.re/13pxgH8