SoftwarePeople
Md Khairul Anam
Introduction to Availability &
Scalability in MongoDB
Availability
Replica Set – Creation
Replica Set – Initialize
Replica Set – Failure
Replica Set – Failover
Replica Set – Recovery
Replica Set – Recovered
Replica Set Roles
• Heartbeats
• Priority Comparisons
• Optime
• Connections
• Network Partitions
Factors and Conditions that Affect Elections
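The network-partition factor above can be sketched as a majority check: a member can only be elected primary if it can reach a strict majority of the voting members. This is a toy model with hypothetical member names; it does not model heartbeats, priority, or optime comparisons.

```python
def has_majority(reachable_votes: int, total_votes: int) -> bool:
    """True if the reachable votes form a strict majority of all votes."""
    return reachable_votes > total_votes // 2


def can_elect_primary(members: dict, reachable: set) -> bool:
    """Decide whether a candidate could win an election.

    members:   name -> number of votes (0 or 1); hypothetical layout.
    reachable: names the candidate can currently see, including itself.
    """
    total = sum(members.values())
    seen = sum(votes for name, votes in members.items() if name in reachable)
    return has_majority(seen, total)


# Three voting members; a partition isolates node "c" from "a" and "b".
members = {"a": 1, "b": 1, "c": 1}
print(can_elect_primary(members, {"a", "b"}))  # majority side of the partition
print(can_elect_primary(members, {"c"}))       # isolated minority side
```

The minority side of a partition cannot elect a primary, which is why an odd number of voting members is usually preferred.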
Strong Consistency
Delayed Consistency
Maintenance and Upgrade
• Rolling upgrade/maintenance
– Start with Secondary
– Primary last
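The ordering above can be sketched as a small helper that puts every non-primary member first and the primary last. The member names and role strings are illustrative assumptions, not output from any MongoDB command.

```python
def rolling_upgrade_order(members):
    """Return member names in a safe maintenance order.

    members: list of (name, role) tuples, where role is one of
             "PRIMARY", "SECONDARY", "ARBITER" (assumed labels).
    Non-primary members come first, the primary comes last.
    """
    non_primary = [name for name, role in members if role != "PRIMARY"]
    primary = [name for name, role in members if role == "PRIMARY"]
    return non_primary + primary


members = [("node1", "PRIMARY"), ("node2", "SECONDARY"), ("node3", "SECONDARY")]
print(rolling_upgrade_order(members))
```

In practice the primary is typically stepped down (for example with rs.stepDown() in the shell) before it is taken offline, so that a new primary is elected first.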
Replica Set – 1 Data Center
Single datacenter
Single switch & power
Points of failure:
– Power
– Network
– Data center
Automatic recovery of
single node crash
Replica Set – 2 Data Centers
Multi data center
DR node for safety
Can’t do multi data
center durable write
safely since only 1 node
in distant DC
Replica Set – 3 Data Centers
Three data centers
Can survive full data
center loss
Can do w = { dc : 2 } to
guarantee a write reaches
2 data centers
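A minimal sketch of what a rule like { dc : 2 } means: the write counts as durable only once members carrying at least 2 distinct values of the "dc" tag have acknowledged it. The node names and tag values below are hypothetical. In MongoDB itself, rules of this kind are set up with member tags and a custom mode under settings.getLastErrorModes in the replica set configuration; the function here only models the check.

```python
def satisfies_mode(mode, member_tags, acked):
    """Check a tag-based write rule against the members that acknowledged.

    mode:        e.g. {"dc": 2} -> need 2 distinct values of tag "dc".
    member_tags: member name -> dict of tags (hypothetical layout).
    acked:       names of members that acknowledged the write.
    """
    for tag, required in mode.items():
        distinct = {member_tags[m][tag] for m in acked if tag in member_tags[m]}
        if len(distinct) < required:
            return False
    return True


# Hypothetical 5-node set spread over three data centers.
member_tags = {
    "node-a": {"dc": "dc1"},
    "node-b": {"dc": "dc1"},
    "node-c": {"dc": "dc2"},
    "node-d": {"dc": "dc2"},
    "node-e": {"dc": "dc3"},
}

print(satisfies_mode({"dc": 2}, member_tags, ["node-a", "node-b"]))  # same DC twice
print(satisfies_mode({"dc": 2}, member_tags, ["node-a", "node-c"]))  # two DCs
```

Two acknowledgements from the same data center do not satisfy the rule; the count is over distinct tag values, not over nodes.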
Questions?
Scalability
User Growth
– 1995: 0.4% of the world’s population
– Today: 30% of the world is online (~2.2B)
Data Set Growth
– Facebook’s data set is around 100 petabytes
– 4 billion photos taken in the last year (4x a decade ago)
Examining Growth
Read/Write Throughput
Exceeds I/O
Working Set
– Indexes
– Data
Working Set Exceeds
Physical Memory
Vertical Scalability
(Scale Up)
Horizontal Scalability (Scale Out)
Custom Hardware
– Oracle
Custom Software
– Facebook + MySQL
– Google
MongoDB Auto-Sharding
A data store that is
– Free
– Publicly available
– Open Source (https://github.com/mongodb/mongo)
– Horizontally scalable
– Application independent
Data Store Scalability Solutions
Sharded Cluster Architecture
• Shard is a node of the cluster
• Shard can be a single mongod or a replica
set
What is a Shard?
Config Server
– Stores cluster chunk ranges and locations
– Can have only 1 or 3 (production must have 3)
– Not a replica set
Meta Data Storage
Mongos
– Acts as a router / balancer
– No local data (persists to config database)
– Can have 1 or many
Routing and Managing Data
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
Partitioning
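The "points on a line" picture can be sketched with a sorted list of split points: each pair of neighbouring split points bounds one range, and a binary search finds the range a given key falls in. The split points and shard names are made-up values for illustration only.

```python
import bisect

# Hypothetical split points on the key line. They define four ranges:
# (-inf, 100), [100, 200), [200, 300), [300, +inf)
SPLIT_POINTS = [100, 200, 300]

# Hypothetical owner of each range, in the same order as the ranges above.
RANGE_OWNERS = ["shard0", "shard1", "shard2", "shard0"]


def shard_for_key(key):
    """Return the shard owning the range that contains the given key."""
    # bisect_right makes each range inclusive of its lower bound.
    index = bisect.bisect_right(SPLIT_POINTS, key)
    return RANGE_OWNERS[index]


for key in (5, 100, 250, 999):
    print(key, "->", shard_for_key(key))
```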
• Shard key is used to partition your collection
• Shard key must exist in every document
• Shard key must be indexed
• Shard key is used to route requests to shards
What is a Shard Key
Shards and Shard Keys
Shard
Shard key
range
• Initially 1 chunk
• Default max chunk size: 64 MB
• MongoDB automatically splits & migrates
chunks when max reached
Data Distribution
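The split-and-migrate behaviour can be illustrated with a toy model. Assumptions: a chunk is a sorted list of keys, the size limit is a document count (a stand-in for the 64 MB limit), splits happen at the middle key, and balancing simply evens out chunk counts. This is not MongoDB's actual splitter or balancer.

```python
import bisect

MAX_CHUNK_DOCS = 4  # stand-in for the real size limit (assumption)


def insert_key(chunks, key):
    """Insert a key into the chunk covering it; split that chunk if too big.

    chunks: list of sorted key lists, ordered by key range. Modified in place.
    """
    index = 0
    for i, chunk in enumerate(chunks):
        if chunk and chunk[0] <= key:
            index = i  # last chunk whose lowest key is <= key
    bisect.insort(chunks[index], key)
    if len(chunks[index]) > MAX_CHUNK_DOCS:
        mid = len(chunks[index]) // 2
        left, right = chunks[index][:mid], chunks[index][mid:]
        chunks[index:index + 1] = [left, right]
    return chunks


def balance(owners, shards):
    """Move chunks from the fullest to the emptiest shard until the chunk
    counts differ by at most 1. owners[i] is the shard holding chunk i."""
    owners = list(owners)
    while True:
        counts = {s: owners.count(s) for s in shards}
        donor = max(shards, key=lambda s: counts[s])
        receiver = min(shards, key=lambda s: counts[s])
        if counts[donor] - counts[receiver] <= 1:
            return owners
        owners[owners.index(donor)] = receiver


chunks = [[]]  # initially 1 chunk
for k in range(1, 6):
    insert_key(chunks, k)
print(chunks)
print(balance(["s0", "s0", "s0", "s0"], ["s0", "s1"]))
```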
• Targeted Queries
• Scatter Gather Queries
• Scatter Gather Queries with Sort
Cluster Request Routing
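The three routing cases can be sketched with an in-memory stand-in for a router. Assumptions: "user_id" is the shard key, two hypothetical shards split the key space at 100, and queries are simple equality matches. A query containing the shard key goes to one shard; any other query is sent to all shards, and with a sort each shard sorts locally before the router merges the sorted streams.

```python
import heapq

# Hypothetical shard contents; "user_id" is the assumed shard key.
SHARDS = {
    "shard0": [{"user_id": 1, "age": 30}, {"user_id": 2, "age": 25}],
    "shard1": [{"user_id": 101, "age": 41}, {"user_id": 150, "age": 19}],
}


def shard_for_key(user_id):
    """Assumed range layout: keys below 100 on shard0, the rest on shard1."""
    return "shard0" if user_id < 100 else "shard1"


def route(query, sort_field=None):
    """Return (shards contacted, matching documents)."""
    if "user_id" in query:
        targets = [shard_for_key(query["user_id"])]  # targeted query
    else:
        targets = list(SHARDS)                       # scatter gather
    per_shard = []
    for name in targets:
        docs = [d for d in SHARDS[name]
                if all(d.get(k) == v for k, v in query.items())]
        if sort_field is not None:
            docs.sort(key=lambda d: d[sort_field])   # each shard sorts locally
        per_shard.append(docs)
    if sort_field is not None:
        # The router merges the already-sorted per-shard results.
        merged = list(heapq.merge(*per_shard, key=lambda d: d[sort_field]))
    else:
        merged = [d for docs in per_shard for d in docs]
    return targets, merged


print(route({"user_id": 150}))
print(route({"age": 25}))
print(route({}, sort_field="age"))
```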
Questions?
Thank You

Availability and scalability in mongo

Editor's Notes

  • #4 Basic explanation: 2 or more nodes form the set; quorum.
  • #5 Initialize -> election of a primary, then data replication from primary to secondary.
  • #6 Primary down or network failure: automatic election of a new primary if a majority exists.
  • #7 New primary elected; replication established from the new primary.
  • #8 Down node comes back up, rejoins the set, goes through recovery, and then becomes a secondary.
  • #10 Primary: data member. Secondary: hot standby. Arbiter: voting member.
  • #16 A good question to ask the audience : 'Why wouldn't you set w={dc:3}'… Why would you ever do that? What would be the complications?
  • #18 Consistency Write preferences Read preferences
  • #20 A good question to ask the audience : 'Why wouldn't you set w={dc:3}'… Why would you ever do that? What would be the complications?