SlideShare a Scribd company logo
1 of 43
Download to read offline
June-22-2018
How to Scale MongoDB
● Location: Skopje, Republic of Macedonia
● Education: MSc, Software Engineering
● Experience:
○ Lead Database Consultant (since 2016)
○ Database Consultant (2012 - 2016)
○ Web Developer, DBA (2007 - 2012)
● Certifications: C100DBA - MongoDB certified DBA (since 2016)
https://mk.linkedin.com/in/igorle @igorle
About me
© 2018 Pythian. Confidential
Overview
• What is HA, how MongoDB achieves HA
• Replica set, components, deployment topologies
• Scaling (Vertical vs Horizontal)
• What is a sharded cluster in MongoDB
• Cluster components - shards, config servers, mongos
• Shard keys and chunks
• Hashed vs range based sharding
• Choosing shard key
• QA
© 2018 Pythian. Confidential
HA
© 2018 Pythian. Confidential
What is HA?
• High availability refers to systems that are durable and likely to operate
continuously without failure for a long time
• single point of failure (SPOF) is a part of a system that, if it fails, will stop
the entire system from working
© 2018 Pythian. Confidential
Application Database
SPOF? SPOF?
Database
Replica
What is HA?
• HA on the Application vs Database layer
• Splitting Writes and Read operations
© 2018 Pythian. Confidential
Application Primary
Application
…………...
Replica
LoadBalancer
LoadBalancer?
Writes
Reads
Database
Replica
What is HA?
• Application node failure
• Database node failure
• Failover (elect new Primary node)
© 2018 Pythian. Confidential
Application Primary
Application
…………...
Replica
LoadBalancer
LoadBalancer?
Writes
Reads
• Group of mongod processes that maintain
the same data set
• Redundancy and high availability
• Increased read capacity (scaling reads)
• Automatic failover
HA with MongoDB - replica sets
© 2018 Pythian. Confidential
Replication concept
1. Write operations go to the Primary node
2. All changes are recorded into operations log
3. Asynchronous replication to Secondary
4. Secondaries copy the Primary oplog
5. Secondary can use sync source Secondary
© 2018 Pythian. Confidential
1.
Replication concept
1. Write operations go to the Primary node
2. All changes are recorded into operations log
3. Asynchronous replication to Secondary
4. Secondaries copy the Primary oplog
5. Secondary can use sync source Secondary
© 2018 Pythian. Confidential
2. oplog
1.
Replication concept
1. Write operations go to the Primary node
2. All changes are recorded into operations log
3. Asynchronous replication to Secondary
4. Secondaries copy the Primary oplog
5. Secondary can use sync source Secondary
© 2018 Pythian. Confidential
2. oplog
1.
3. 3.
Replication concept
1. Write operations go to the Primary node
2. All changes are recorded into operations log
3. Asynchronous replication to Secondary
4. Secondaries copy the Primary oplog
5. Secondary can use sync source Secondary
© 2018 Pythian. Confidential
2. oplog
1.
3. 3.
4. 4.
Replication concept
1. Write operations go to the Primary node
2. All changes are recorded into operations log
3. Asynchronous replication to Secondary
4. Secondaries copy the Primary oplog
5. Secondary can use sync source Secondary*
*settings.chainingAllowed (true by default)
© 2018 Pythian. Confidential
2. oplog
1.
3. 3.
4. 4.
5.
Configuration options
• 50 members per replica set (7 voting members)
• Arbiter node
• Priority 0 node
• Hidden node
• Delayed node
© 2018 Pythian. Confidential
• Does not hold copy of data
• Votes in elections
Arbiter node
hidden : true
© 2018 Pythian. Confidential
Arbiter
• Does not hold copy of data
• Votes in elections
Arbiter node
hidden : true
© 2018 Pythian. Confidential
Arbiter
Priority 0 node
Priority - floating point (i.e. decimal) number between 0 and 1000
• Cannot become primary, cannot trigger election
• Visible to application
© 2018 Pythian. Confidential
Secondary
priority : 0
Hidden node
• Not visible to application
• Never becomes primary
• Use cases
○ reporting
○ backups
hidden : true
© 2018 Pythian. Confidential
hidden: true priority:0
Secondary
hidden : true priority : 0
Delayed node
• Must be priority 0 member
• Should be hidden member (not mandatory)
• Mainly used for backups
• Recovery in case of human error
© 2018 Pythian. Confidential
Secondary
slaveDelay : 3600
priority : 0
hidden : true
Scaling
© 2018 Pythian. Confidential
• Working set - amount of memory that a process requires in a given time
interval
• Time for scaling ?
Scaling
© 2018 Pythian. Confidential
● Vertical scaling
○ Adding more power (CPU, RAM, DISK) to an existing machine
○ Might require downtime while scaling up
● Horizontal scaling
○ Adding more machines into your pool of resources
○ Partitioning the data on multiple machines
○ Parallelizing the processing
○ More complex to implement and maintain
Scaling
© 2018 Pythian. Confidential
Scaling (vertical)
© 2017 Pythian. Confidential
1TB
1TB
Scaling (horizontal) - Sharding
© 2017 Pythian. Confidential
1TB
256GB 256GB 256GB 256GB
1TB
Shard 1 Shard 2 Shard 3 Shard 4
* https://www.amazon.com/dp/0596102356
Scaling costs
© 2018 Pythian. Confidential
Scaling with MongoDB
© 2018 Pythian. Confidential
Scaling (sharding) with MongoDB
Shard
© 2018 Pythian. Confidential
Sharding with MongoDB
• Shard/Replica set
(subset of the sharded data)
• Config servers
(metadata and config settings)
• mongos
(query router, cluster interface)
mongos> sh.addShard("shardN")
© 2018 Pythian. Confidential
1 2 N-1 N
Shards
• Contains subset of sharded data
• Replica set for redundancy and HA
• Primary shard
• Non sharded collections
• --shardsvr in config file (port 27018)
© 2018 Pythian. Confidential
Config servers
• Stores the metadata for sharded cluster in
config database
• Authentication configuration information in
admin database
• Config servers as replica set only (>= 3.4)
• Holds balancer on Primary node (>= 3.4)
• --configsvr in config file (port 27019)
© 2018 Pythian. Confidential
mongos
• Caching metadata from config servers
• Routes queries to shards
• No persistent state
• Updates cache on metadata changes
• Holds balancer (mongodb <= 3.2)
• mongos version 3.4 cannot connect to
earlier mongod version
© 2018 Pythian. Confidential
Sharding collection
• Enable sharding on database "users"
sh.enableSharding("users")
• Shard collection history
sh.shardCollection("users.history", { user_id : 1 } )
• Shard key - indexed key that exists in every document in the collection
○ range based
sh.shardCollection("users.history", { user_id : 1 } )
○ hashed based
sh.shardCollection( "users.history", { user_id : "hashed" } )
© 2018 Pythian. Confidential
Ranged sharding
• Dividing data into contiguous ranges determined by the shard key values
• Documents with “close” shard key values are likely to be in the same
chunk or shard
• Query Isolation - more likely to target single shard for range queries
© 2018 Pythian. Confidential
Hashed sharding
• Uses a hashed index of a single field as the shard key to partition data
• More even data distribution at the cost of reducing Query Isolation
• Applications do not need to compute hashes
© 2018 Pythian. Confidential
Shard keys and chunks
Choosing Shard key
• Large Shard Key Cardinality
• Low Shard Key Frequency
• Non-Monotonically changing shard keys (ranged or hashed sharding)
Chunks
• A contiguous range of shard key values within a particular shard
• Inclusive lower and exclusive upper range based on shard key
• Default size of 64MB ([1 .. 1024] MB)
© 2018 Pythian. Confidential
Shard key cardinality
© 2018 Pythian. Confidential
Shard key frequency
© 2018 Pythian. Confidential
Monotonically changing shard key
© 2018 Pythian. Confidential
Choosing Shard key
Analyse your query patterns before making any decision (mtools - mloginfo)
● users
○ Querying data by username (unique key?). Consider {username} as
shard key
● user history
○ Querying data by user_id. Consider {user_id} as shard key
○ Querying data by user_id, date_created. Consider {user_id,
date_created} as shard key
● views
○ Write intensive workload. created_on (timestamp). Consider hashed
sharding on {timestamp}
© 2018 Pythian. Confidential
Shard keys limitations
• Must be ascending indexed key or indexed compound keys that exists in
every document in the collection
• Cannot be multikey index, a text index or a geospatial index
• Cannot exceed 512 bytes
• Immutable key - can not be updated/changed
• Update operations that affect a single document must include the shard
key or the _id field
• No option for sharding if unique indexes on other fields exist
• No option for second unique index if the shard key is unique index
© 2018 Pythian. Confidential
Summary
• Replica set with odd number of voting members
• Hidden or Delayed member for dedicated functions (reporting, backups …)
• Dataset fits into single server, keep unsharded deployment
• Horizontal scaling - shards running as replica sets
• Shard keys are immutable with max size of 512 bytes
• Shard keys must exists in every document in the collection
• Ranged sharding may not distribute the data evenly
• Hashed sharding distributes the data randomly
• Sharding adds complexity for maintenance (backups and upgrades)
© 2018 Pythian. Confidential
Questions?
© 2018 Pythian. Confidential
We’re Hiring!
https://www.pythian.com/careers/
© 2018 Pythian. Confidential

More Related Content

What's hot

MongoDB - External Authentication
MongoDB - External AuthenticationMongoDB - External Authentication
MongoDB - External AuthenticationJason Terpko
 
2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2MongoDB
 
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB WorldNoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB WorldAjay Gupte
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to ShardingMongoDB
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMongoDB
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceMongoDB
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...Antonios Giannopoulos
 
Webinar: Architecting Secure and Compliant Applications with MongoDB
Webinar: Architecting Secure and Compliant Applications with MongoDBWebinar: Architecting Secure and Compliant Applications with MongoDB
Webinar: Architecting Secure and Compliant Applications with MongoDBMongoDB
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentMongoDB
 
Advanced Sharding Features in MongoDB 2.4
Advanced Sharding Features in MongoDB 2.4 Advanced Sharding Features in MongoDB 2.4
Advanced Sharding Features in MongoDB 2.4 MongoDB
 
High Performance Applications with MongoDB
High Performance Applications with MongoDBHigh Performance Applications with MongoDB
High Performance Applications with MongoDBMongoDB
 
Sharding
ShardingSharding
ShardingMongoDB
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB
 
Sharding
ShardingSharding
ShardingMongoDB
 
Challenges with MongoDB
Challenges with MongoDBChallenges with MongoDB
Challenges with MongoDBStone Gao
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB
 

What's hot (20)

MongoDB - External Authentication
MongoDB - External AuthenticationMongoDB - External Authentication
MongoDB - External Authentication
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2
 
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB WorldNoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
 
MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best Practices
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...
 
Webinar: Architecting Secure and Compliant Applications with MongoDB
Webinar: Architecting Secure and Compliant Applications with MongoDBWebinar: Architecting Secure and Compliant Applications with MongoDB
Webinar: Architecting Secure and Compliant Applications with MongoDB
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production Deployment
 
Advanced Sharding Features in MongoDB 2.4
Advanced Sharding Features in MongoDB 2.4 Advanced Sharding Features in MongoDB 2.4
Advanced Sharding Features in MongoDB 2.4
 
High Performance Applications with MongoDB
High Performance Applications with MongoDBHigh Performance Applications with MongoDB
High Performance Applications with MongoDB
 
Sharding
ShardingSharding
Sharding
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo Seattle
 
Sharding
ShardingSharding
Sharding
 
Challenges with MongoDB
Challenges with MongoDBChallenges with MongoDB
Challenges with MongoDB
 
MongoDB + Spring
MongoDB + SpringMongoDB + Spring
MongoDB + Spring
 
MongoDB Sharding Fundamentals
MongoDB Sharding Fundamentals MongoDB Sharding Fundamentals
MongoDB Sharding Fundamentals
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big Compute
 

Similar to How to scale MongoDB

Advanced MySql Data-at-Rest Encryption in Percona Server
Advanced MySql Data-at-Rest Encryption in Percona ServerAdvanced MySql Data-at-Rest Encryption in Percona Server
Advanced MySql Data-at-Rest Encryption in Percona ServerSeveralnines
 
YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...
YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...
YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...Jimmy Guerrero
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions Yugabyte
 
Distributed Database Architecture for GDPR
Distributed Database Architecture for GDPRDistributed Database Architecture for GDPR
Distributed Database Architecture for GDPRYugabyte
 
Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)
Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)
Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)Lee Myring
 
Optimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real WorldOptimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real WorldDevOps.com
 
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixData Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixStefan Krawczyk
 
How YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLHow YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLYugabyte
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidImply
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteCarlos Andrés García
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteVMware Tanzu
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...Severalnines
 
Data Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixData Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixStefan Krawczyk
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysMongoDB APAC
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit
 

Similar to How to scale MongoDB (20)

Advanced MySql Data-at-Rest Encryption in Percona Server
Advanced MySql Data-at-Rest Encryption in Percona ServerAdvanced MySql Data-at-Rest Encryption in Percona Server
Advanced MySql Data-at-Rest Encryption in Percona Server
 
YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...
YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...
YugaByte DB - "Designing a Distributed Database Architecture for GDPR Complia...
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
Novinky v Oracle Database 18c
Novinky v Oracle Database 18cNovinky v Oracle Database 18c
Novinky v Oracle Database 18c
 
Distributed Database Architecture for GDPR
Distributed Database Architecture for GDPRDistributed Database Architecture for GDPR
Distributed Database Architecture for GDPR
 
Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)
Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)
Log aggregation: using Elasticsearch, Fluentd/Fluentbit and Kibana (EFK)
 
Optimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real WorldOptimizing Time Series Performance in the Real World
Optimizing Time Series Performance in the Real World
 
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch FixData Day Seattle 2017: Scaling Data Science at Stitch Fix
Data Day Seattle 2017: Scaling Data Science at Stitch Fix
 
How YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLHow YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQL
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
 
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by YugabyteA Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
A Planet-Scale Database for Low Latency Transactional Apps by Yugabyte
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
 
Data Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixData Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch Fix
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdays
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
Timesten Architecture
Timesten ArchitectureTimesten Architecture
Timesten Architecture
 

Recently uploaded

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 

Recently uploaded (20)

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 

How to scale MongoDB

  • 2. ● Location: Skopje, Republic of Macedonia ● Education: MSc, Software Engineering ● Experience: ○ Lead Database Consultant (since 2016) ○ Database Consultant (2012 - 2016) ○ Web Developer, DBA (2007 - 2012) ● Certifications: C100DBA - MongoDB certified DBA (since 2016) https://mk.linkedin.com/in/igorle @igorle About me © 2018 Pythian. Confidential
  • 3. Overview • What is HA, how MongoDB achieves HA • Replica set, components, deployment topologies • Scaling (Vertical vs Horizontal) • What is a sharded cluster in MongoDB • Cluster components - shards, config servers, mongos • Shard keys and chunks • Hashed vs range based sharding • Choosing shard key • QA © 2018 Pythian. Confidential
  • 4. HA © 2018 Pythian. Confidential
  • 5. What is HA? • High availability refers to systems that are durable and likely to operate continuously without failure for a long time • single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working © 2018 Pythian. Confidential Application Database SPOF? SPOF?
  • 6. Database Replica What is HA? • HA on the Application vs Database layer • Splitting Writes and Read operations © 2018 Pythian. Confidential Application Primary Application …………... Replica LoadBalancer LoadBalancer? Writes Reads
  • 7. Database Replica What is HA? • Application node failure • Database node failure • Failover (elect new Primary node) © 2018 Pythian. Confidential Application Primary Application …………... Replica LoadBalancer LoadBalancer? Writes Reads
  • 8. • Group of mongod processes that maintain the same data set • Redundancy and high availability • Increased read capacity (scaling reads) • Automatic failover HA with MongoDB - replica sets © 2018 Pythian. Confidential
  • 9. Replication concept 1. Write operations go to the Primary node 2. All changes are recorded into operations log 3. Asynchronous replication to Secondary 4. Secondaries copy the Primary oplog 5. Secondary can use sync source Secondary © 2018 Pythian. Confidential 1.
  • 10. Replication concept 1. Write operations go to the Primary node 2. All changes are recorded into operations log 3. Asynchronous replication to Secondary 4. Secondaries copy the Primary oplog 5. Secondary can use sync source Secondary © 2018 Pythian. Confidential 2. oplog 1.
  • 11. Replication concept 1. Write operations go to the Primary node 2. All changes are recorded into operations log 3. Asynchronous replication to Secondary 4. Secondaries copy the Primary oplog 5. Secondary can use sync source Secondary © 2018 Pythian. Confidential 2. oplog 1. 3. 3.
  • 12. Replication concept 1. Write operations go to the Primary node 2. All changes are recorded into operations log 3. Asynchronous replication to Secondary 4. Secondaries copy the Primary oplog 5. Secondary can use sync source Secondary © 2018 Pythian. Confidential 2. oplog 1. 3. 3. 4. 4.
  • 13. Replication concept 1. Write operations go to the Primary node 2. All changes are recorded into operations log 3. Asynchronous replication to Secondary 4. Secondaries copy the Primary oplog 5. Secondary can use sync source Secondary* *settings.chainingAllowed (true by default) © 2018 Pythian. Confidential 2. oplog 1. 3. 3. 4. 4. 5.
  • 14. Configuration options • 50 members per replica set (7 voting members) • Arbiter node • Priority 0 node • Hidden node • Delayed node © 2018 Pythian. Confidential
  • 15. • Does not hold copy of data • Votes in elections Arbiter node hidden : true © 2018 Pythian. Confidential Arbiter
  • 16. • Does not hold copy of data • Votes in elections Arbiter node hidden : true © 2018 Pythian. Confidential Arbiter
  • 17. Priority 0 node Priority - floating point (i.e. decimal) number between 0 and 1000 • Cannot become primary, cannot trigger election • Visible to application © 2018 Pythian. Confidential Secondary priority : 0
  • 18. Hidden node • Not visible to application • Never becomes primary • Use cases ○ reporting ○ backups hidden : true © 2018 Pythian. Confidential hidden: true priority:0 Secondary hidden : true priority : 0
  • 19. Delayed node • Must be priority 0 member • Should be hidden member (not mandatory) • Mainly used for backups • Recovery in case of human error © 2018 Pythian. Confidential Secondary slaveDelay : 3600 priority : 0 hidden : true
  • 20. Scaling © 2018 Pythian. Confidential
  • 21. • Working set - amount of memory that a process requires in a given time interval • Time for scaling ? Scaling © 2018 Pythian. Confidential
  • 22. ● Vertical scaling ○ Adding more power (CPU, RAM, DISK) to an existing machine ○ Might require downtime while scaling up ● Horizontal scaling ○ Adding more machines into your pool of resources ○ Partitioning the data on multiple machines ○ Parallelizing the processing ○ More complex to implement and maintain Scaling © 2018 Pythian. Confidential
  • 23. Scaling (vertical) © 2017 Pythian. Confidential 1TB 1TB
  • 24. Scaling (horizontal) - Sharding © 2017 Pythian. Confidential 1TB 256GB 256GB 256GB 256GB 1TB Shard 1 Shard 2 Shard 3 Shard 4
  • 26. Scaling with MongoDB © 2018 Pythian. Confidential
  • 27. Scaling (sharding) with MongoDB Shard © 2018 Pythian. Confidential
  • 28. Sharding with MongoDB • Shard/Replica set (subset of the sharded data) • Config servers (metadata and config settings) • mongos (query router, cluster interface) mongos> sh.addShard("shardN") © 2018 Pythian. Confidential 1 2 N-1 N
  • 29. Shards • Contains subset of sharded data • Replica set for redundancy and HA • Primary shard • Non sharded collections • --shardsvr in config file (port 27018) © 2018 Pythian. Confidential
  • 30. Config servers • Stores the metadata for sharded cluster in config database • Authentication configuration information in admin database • Config servers as replica set only (>= 3.4) • Holds balancer on Primary node (>= 3.4) • --configsvr in config file (port 27019) © 2018 Pythian. Confidential
  • 31. mongos • Caching metadata from config servers • Routes queries to shards • No persistent state • Updates cache on metadata changes • Holds balancer (mongodb <= 3.2) • mongos version 3.4 cannot connect to earlier mongod version © 2018 Pythian. Confidential
  • 32. Sharding collection • Enable sharding on database "users" sh.enableSharding("users") • Shard collection history sh.shardCollection("users.history", { user_id : 1 } ) • Shard key - indexed key that exists in every document in the collection ○ range based sh.shardCollection("users.history", { user_id : 1 } ) ○ hashed based sh.shardCollection( "users.history", { user_id : "hashed" } ) © 2018 Pythian. Confidential
  • 33. Ranged sharding • Dividing data into contiguous ranges determined by the shard key values • Documents with “close” shard key values are likely to be in the same chunk or shard • Query Isolation - more likely to target single shard for range queries © 2018 Pythian. Confidential
  • 34. Hashed sharding • Uses a hashed index of a single field as the shard key to partition data • More even data distribution at the cost of reducing Query Isolation • Applications do not need to compute hashes © 2018 Pythian. Confidential
  • 35. Shard keys and chunks Choosing Shard key • Large Shard Key Cardinality • Low Shard Key Frequency • Non-Monotonically changing shard keys (ranged or hashed sharding) Chunks • A contiguous range of shard key values within a particular shard • Inclusive lower and exclusive upper range based on shard key • Default size of 64MB ([1 .. 1024] MB) © 2018 Pythian. Confidential
  • 36. Shard key cardinality © 2018 Pythian. Confidential
  • 37. Shard key frequency © 2018 Pythian. Confidential
  • 38. Monotonically changing shard key © 2018 Pythian. Confidential
  • 39. Choosing Shard key Analyse your query patterns before making any decision (mtools - mloginfo) ● users ○ Querying data by username (unique key?). Consider {username} as shard key ● user history ○ Querying data by user_id. Consider {user_id} as shard key ○ Querying data by user_id, date_created. Consider {user_id, date_created} as shard key ● views ○ Write intensive workload. created_on (timestamp). Consider hashed sharding on {timestamp} © 2018 Pythian. Confidential
  • 40. Shard keys limitations • Must be ascending indexed key or indexed compound keys that exists in every document in the collection • Cannot be multikey index, a text index or a geospatial index • Cannot exceed 512 bytes • Immutable key - can not be updated/changed • Update operations that affect a single document must include the shard key or the _id field • No option for sharding if unique indexes on other fields exist • No option for second unique index if the shard key is unique index © 2018 Pythian. Confidential
  • 41. Summary • Replica set with odd number of voting members • Hidden or Delayed member for dedicated functions (reporting, backups …) • Dataset fits into single server, keep unsharded deployment • Horizontal scaling - shards running as replica sets • Shard keys are immutable with max size of 512 bytes • Shard keys must exists in every document in the collection • Ranged sharding may not distribute the data evenly • Hashed sharding distributes the data randomly • Sharding adds complexity for maintenance (backups and upgrades) © 2018 Pythian. Confidential