Scaling, Security & Performance
Sasidhar Gogulapati
Why MongoDB?
• When a high write load is needed. A single node can
reportedly handle 80,000 inserts/sec.
– Sharding is required only if the data set is larger than 50 million
• When high availability is required in an unreliable
environment
– Setting up a replica set is easy and fast
– Recovery from a node failure is automatic and fast
• When data needs to grow big
– MySQL table performance can degrade once a table reaches
5-10 GB
• MongoDB has a built-in, easy solution for partitioning and
sharding
• When data is location based
• With MongoDB's built-in geospatial functions, finding data for
specific locations is fast and accurate.
• At over 2,000 CDR inserts per second, MongoDB's architecture suits
a system that must support a high insert load. You can still get
transaction-like guarantees with findAndModify (which is slower)
and a two-phase commit implemented in the application
• Schema-less design enables rapid introduction of changes. Because
MongoDB is schema-less, adding a new field does not affect old rows
(or documents) and takes effect instantly
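The schema-less point can be illustrated with a minimal Python sketch. Plain dictionaries stand in for MongoDB documents, so no database is involved; the `calls` list, the field names, and the `is_roaming` helper are all hypothetical and chosen only for illustration.

```python
# Plain dicts stand in for MongoDB documents; no server is needed.
# The collection contents and field names are illustrative assumptions.
calls = [
    {"_id": 1, "caller": "555-0100", "duration_s": 42},  # old document
    {"_id": 2, "caller": "555-0101", "duration_s": 7},   # old document
]

# A new field appears only on documents written after the change.
calls.append({"_id": 3, "caller": "555-0102", "duration_s": 90, "roaming": True})


def is_roaming(doc):
    """Read the new field, treating documents that lack it as False."""
    return doc.get("roaming", False)


roaming_ids = [doc["_id"] for doc in calls if is_roaming(doc)]
print(roaming_ids)
```

The old documents are never rewritten; the reading code simply supplies a default for the missing field.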
How Secure Is MongoDB?
MongoDB provides various features, such as authentication, access
control, and encryption, to secure your MongoDB deployments. Some key
security features include:
Encryption at rest, when used in conjunction with transport
encryption and good security policies that protect relevant
accounts, passwords, and encryption keys, can help ensure
compliance with security and privacy standards, including
HIPAA, PCI-DSS, and FERPA
*Available only in the Enterprise version
• MongoDB supports TLS/SSL to encrypt all of
MongoDB’s network traffic
• TLS/SSL ensures that MongoDB network traffic is
readable only by the intended client. MongoDB’s
TLS/SSL implementation uses OpenSSL libraries.
• MongoDB’s SSL encryption allows only
strong SSL ciphers with a minimum 128-bit
key length for all connections.
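As a rough sketch of how a client might request transport encryption, the Python snippet below builds a connection string with the `tls` and `tlsCAFile` options. It assumes a driver and server recent enough to accept `tls` (older releases used `ssl`); the host name, the CA file path, and the `build_tls_uri` helper are hypothetical.

```python
from urllib.parse import urlencode


def build_tls_uri(host, port=27017, ca_file=None):
    """Build a MongoDB connection string that requests TLS."""
    options = {"tls": "true"}
    if ca_file is not None:
        # Path to the CA certificate used to verify the server.
        options["tlsCAFile"] = ca_file
    # urlencode percent-encodes reserved characters such as "/".
    return f"mongodb://{host}:{port}/?{urlencode(options)}"


# Hypothetical host and certificate path, for illustration only.
uri = build_tls_uri("db.example.com", ca_file="/etc/ssl/mongo-ca.pem")
print(uri)
```

The resulting string would be passed to whichever driver the application uses; the snippet itself opens no connection.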
Sharding
• Sharding: a method for distributing data across
multiple machines
• Should be used only with large data sets and
high-throughput operations
• MongoDB supports horizontal scaling through sharding
(increasing the number of instances where MongoDB is
installed)
A MongoDB sharded cluster consists of the
following components:
1. Shard: Each shard contains a subset of the sharded
data. Each shard can be deployed as a replica set
2. mongos: Acts as a query router, providing an
interface between client applications and the
sharded cluster
3. Config servers: Store metadata and configuration
settings for the cluster
MongoDB shards data at the collection level, distributing a collection's
data across the shards in the cluster.
• A shard contains a subset of the sharded data for a
sharded cluster
• Shards should be deployed as replica sets to provide
redundancy and high availability
• A query run directly on a shard returns only a subset of the data
• Users, clients, or applications should connect directly
to a shard only to perform local administrative or
maintenance operations
• Use mongos for cluster-level operations,
including reads and writes
• Config servers store the metadata of a sharded
cluster
• The metadata reflects the state and organization of all data
and components in the sharded cluster
• It also includes the list of chunks on every shard
and the ranges that define the chunks
• Deploy config servers as replica sets
• mongos instances cache this data and use it to route
read and write operations
• mongos instances route queries and write operations
to shards in a sharded cluster
• mongos tracks which data is on which shard by caching
the metadata from the config servers
• mongos has no persistent state and uses minimal
system resources
• The most common practice is to run mongos on the
same servers as the application
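The routing role described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how mongos is implemented: the split points, the shard names, and the `Router` class are hypothetical, and the chunk metadata is hard-coded rather than fetched from config servers.

```python
from bisect import bisect_right

# Hypothetical chunk metadata, as config servers might describe it:
# each split point is the inclusive lower bound of the next chunk.
SPLIT_POINTS = [1000, 2000, 3000]
CHUNK_OWNERS = ["shardA", "shardB", "shardC", "shardA"]  # one owner per chunk


class Router:
    """Toy stand-in for mongos: its only state is a metadata cache."""

    def __init__(self, split_points, chunk_owners):
        # Cache copies of the metadata so routing needs no extra lookup.
        self.split_points = list(split_points)
        self.chunk_owners = list(chunk_owners)

    def shard_for(self, key):
        """Return the shard owning the chunk whose range contains key."""
        return self.chunk_owners[bisect_right(self.split_points, key)]

    def shards_for_range(self, low, high):
        """Return the shards to contact for keys in [low, high]."""
        first = bisect_right(self.split_points, low)
        last = bisect_right(self.split_points, high)
        return sorted(set(self.chunk_owners[first:last + 1]))


router = Router(SPLIT_POINTS, CHUNK_OWNERS)
print(router.shard_for(1500))
print(router.shards_for_range(500, 1500))
```

A query on a single key value touches one shard, while a range query is sent only to the shards whose chunks overlap the range.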
• To distribute the documents in a collection, MongoDB
partitions the collection using the shard key
• The shard key is chosen when sharding the
collection. In older MongoDB versions it cannot be
changed after sharding
• A sharded collection can have only one shard key
• The choice of shard key affects the performance,
efficiency, and scalability of a sharded cluster
• Shard key specification:
https://docs.mongodb.com/manual/core/sharding-shard-key/#sharding-shard-key-creation
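Why the choice of shard key matters can be seen in a small Python simulation. It compares range partitioning with hash partitioning for monotonically increasing keys. The split points, the shard count, and both helper functions are assumptions made for illustration, and MD5 is used only as a stand-in hash function.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4
SPLIT_POINTS = [250, 500, 750]  # hypothetical range boundaries


def ranged_shard(key):
    """Range partitioning: nearby keys land on the same shard."""
    return bisect_right(SPLIT_POINTS, key)


def hashed_shard(key):
    """Hash partitioning: nearby keys are scattered across shards."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Monotonically increasing keys, such as new inserts after existing data.
new_keys = range(1000, 2000)
ranged_load = Counter(ranged_shard(k) for k in new_keys)
hashed_load = Counter(hashed_shard(k) for k in new_keys)
print(dict(ranged_load))
print(sorted(hashed_load.items()))
```

With the ranged key, every new insert lands on the last shard, which becomes a write hot spot; with the hashed key, the same inserts are spread over all shards, at the cost of range queries no longer mapping to a single shard.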
How to scale MongoDB?
• Sharding (clustering)
• Vertical Scaling (Increase in CPU, memory, etc.)
• Horizontal Scaling (Increase in instances)
Sharding is MongoDB's solution to scaling issues. Using sharding,
developers can scale the database horizontally over multiple servers: a single
dataset is divided among them. Each individual server is called a shard and is
an independent database. Together, all the shards make up a single logical database.
MongoDB Performance
Recently usain.com published a
comprehensive independent
database comparison,
measuring performance across
multiple dimensions using the
Yahoo! Cloud Serving
Benchmark (YCSB). In these
tests it observed that MongoDB
overwhelmingly outperformed
key value stores, in terms of
throughput and latency, across
a number of configurations
• When all three databases are configured the same way, MongoDB
provides 20% greater throughput than Cassandra and 50% greater
throughput than Couchbase
• When tested with a configuration that prevents any possible data loss,
MongoDB outperforms Cassandra and Couchbase by more than 25x, with
latency more than 95% lower than Cassandra's and more than
99.5% lower than Couchbase's
• Finally, when tested with a configuration that provides excellent
performance and minimal possible data loss in the event of a node failure,
MongoDB provides 3x greater throughput than Cassandra in
read-intensive workloads and 70% higher throughput in write-intensive
workloads, while providing 80% lower latency

Editor's Notes

  • #4  https://dzone.com/articles/mongodb-facts-over-80000 https://dzone.com/articles/when-use-mongodb-rather-mysql
  • #5 https://dzone.com/articles/when-use-mongodb-rather-mysql
  • #6 https://docs.mongodb.com/manual/security/
  • #7 https://docs.mongodb.com/manual/core/security-transport-encryption/
  • #19 https://kadira.io/blog/other/scaling-mongodb-at-kadira
  • #22 https://www.mongodb.com/blog/post/high-performance-benchmarking-mongodb-and-nosql-systems