Evolution of MongoDB Replicaset and Its Best Practices
1. MongoDB
Note:
• This PPT is made mostly from the official documentation.
• Contains information as per my understanding.
• I might not know the answer to your question right now.
2. Agenda
• Introduction and features of MongoDB
• Where to use (and not to use) MongoDB
• Supported and recommended platforms
• Installation, starting/stopping services, important terms
• Architecture
• Does MongoDB hold ACID?
• Properties of MongoDB and other considerations
• Backup
• What will be covered in the next session
• References
3. Introduction to MongoDB
• MongoDB is an open-source document database.
• Provides high performance.
• Uses a rich query language.
• Provides high availability (cluster).
• Provides horizontal scalability (sharding).
• Data in MongoDB is stored in BSON format.
• Collection-level concurrency with the MMAPv1 storage engine; WiredTiger goes further with document-level concurrency.
• Data consistency using journaling.
• Networking.
• Supports replicas.
• Supports many types of indexes.
• Supports compression.
• Provides backup and restore features.
• Is not a replacement for RDBMS or Hadoop systems.
4. Where/when (and where/when not) to use MongoDB
• Where/when to use:
– As the primary data store for operational applications with real-time needs.
– Applications that use unstructured or semi-structured data and have large-scale, multi-datacenter requirements.
• Where/when not to use:
– Applications using complex calculations.
– Finance data.
– Any application that scans a large subset of the data.
6. MongoDB recommended platforms
• As per the official note, it is advised to always use a 64-bit build.
• From version 3.2 onwards, 32-bit has been deprecated.
• The current release as of February 2017 is 3.4.2.
7. MongoDB Installation
• MongoDB Community Edition
• MongoDB Enterprise
• A MongoDB instance stores its data files in /var/lib/mongo.
• Log files are stored in /var/log/mongodb.
• Executables go in /usr/bin.
• User accounts created: mongod and mongo.
• The default user mongo can be changed; the new user then needs additional permissions on the data and log directories.
https://docs.mongodb.com/v3.2/tutorial/install-mongodb-on-red-hat/
9. Important Terms
• mongod: the primary daemon process for the MongoDB system.
• mongos: short for "MongoDB Shard", a routing service for MongoDB shard configurations.
• mongo: an interactive JavaScript shell interface to MongoDB.
10. • By default, all MongoDB service files are stored in /usr/bin.
11. MongoDB configuration
• MongoDB has a configuration file named mongod.conf, located in /etc.
• It contains all configuration-related information, including the location where the data will be stored.
• From version 3.2 onwards, MongoDB uses WiredTiger as the default storage engine; the previous default was MMAPv1.
15. Does MongoDB hold ACID?
• ACID: Atomicity, Consistency, Isolation and Durability.
• In one word: yes.
• In detail: yes, but in a limited sense, that is, at the document [row] level.
• As of version 3.4, atomic changes cannot span multiple documents.
• MongoDB handles consistency in its own way, covered in the next slides.
• Isolation: earlier versions used a single server-wide lock, whereas from version 2.2 onwards locking became database-wide.
• Durability:
– Not provided by default, but can be achieved using replicas, at the cost of performance.
– E.g. by requiring the write to reach multiple replicas before considering the write operation finished.
16. Concurrency
• Concurrency is maintained using collection [table] level locking from version 3.0 onwards (MMAPv1 storage engine).
• Each collection has its own readers-writer lock, which allows multiple clients to modify documents [rows] in different collections [tables] at the same time.
• Previous versions (2.2 through 2.6) allowed concurrent read access per database, but only one write operation per database at a time.
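The effect of per-collection locking can be pictured with a small sketch. It is a simplification under stated assumptions: it uses one plain exclusive lock per collection rather than a true readers-writer lock, and CollectionLocks, try_write and finish_write are hypothetical names for illustration, not MongoDB APIs.

```python
import threading
from collections import defaultdict


class CollectionLocks:
    """One exclusive lock per collection name (hypothetical helper)."""

    def __init__(self):
        # A lock is created lazily the first time a collection is touched.
        self._locks = defaultdict(threading.Lock)

    def try_write(self, collection):
        # Non-blocking acquire: True if the collection was free, False if busy.
        return self._locks[collection].acquire(blocking=False)

    def finish_write(self, collection):
        # Release the collection so the next writer can proceed.
        self._locks[collection].release()


locks = CollectionLocks()
first = locks.try_write("orders")    # a write starts on "orders"
second = locks.try_write("users")    # different collection, not blocked
third = locks.try_write("orders")    # same collection, would have to wait
locks.finish_write("orders")
fourth = locks.try_write("orders")   # lock released, the write can start
```

Writers on "orders" and "users" do not block each other, while a second writer on "orders" has to wait. With a database-wide lock, as in versions 2.2 through 2.6, the write to "users" would have waited too.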
17. Data Consistency
• Journaling:
• The storage engine uses checkpoints to provide a consistent view of data on disk and allows recovery from the last checkpoint.
• Journaling is needed to recover changes made after the last checkpoint when there is an unexpected shutdown.
• With journaling, one journal record is created for each client-initiated write operation.
• Checkpoints are written every 60 seconds, whereas the journal is committed every 100 milliseconds.
• The journal record includes any internal write operations caused by the initial write (for example, index updates).
• The storage engine uses in-memory buffering for storing journal records.
• Journal records up to 128 kB are buffered.
• The storage engine syncs the buffered journal records to disk when certain criteria are met.
• In spite of all this, updates that are still in the buffer may be lost in case of a hard shutdown.
• Journal files are created in a separate directory.
18. Journaling continued
• Recovery using journaling is a 3-step process:
– Look in the data files to find the identifier of the last checkpoint.
– Search the journal files for the record that matches the identifier of the last checkpoint.
– Apply the operations in the journal files since the last checkpoint.
• MongoDB uses write-ahead logging to an on-disk journal.
• Journaling guarantees that MongoDB can quickly recover write operations.
• It is advised to keep journaling enabled.
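The three recovery steps can be sketched as a toy replay in Python. This is only an illustration of the idea, not how the storage engine is implemented: JournalRecord, recover, and the layout of data_files (a "checkpoint_id" plus a "documents" dict) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JournalRecord:
    ident: int   # hypothetical, increasing identifier of the record
    key: str     # document key touched by the write
    value: str   # new value written


def recover(data_files: Dict, journal: List[JournalRecord]) -> Dict[str, str]:
    """Rebuild the documents by replaying the journal after the last checkpoint."""
    # Step 1: find the identifier of the last checkpoint in the data files.
    last_checkpoint = data_files["checkpoint_id"]
    # Step 2: find the journal record that matches that identifier.
    # If none matches, start stays at -1 and the whole journal is replayed.
    start = next(
        (i for i, rec in enumerate(journal) if rec.ident == last_checkpoint),
        -1,
    )
    # Step 3: apply every operation recorded after the checkpoint.
    documents = dict(data_files["documents"])
    for rec in journal[start + 1:]:
        documents[rec.key] = rec.value
    return documents


data_files = {"checkpoint_id": 2, "documents": {"a": "1", "b": "2"}}
journal = [
    JournalRecord(1, "a", "1"),
    JournalRecord(2, "b", "2"),    # state captured by the last checkpoint
    JournalRecord(3, "a", "10"),   # written after the checkpoint
    JournalRecord(4, "c", "3"),    # written after the checkpoint
]
recovered = recover(data_files, journal)
```

Only records 3 and 4 are replayed, so the recovered state contains the updated "a" and the new "c" on top of the checkpointed data.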
19. Read concern
• Two types:
– Majority
– Local
• Local is the default.
• Local: does not guarantee that the data read will not be rolled back.
• Majority: returns only data that has been written to a majority of the replicas, for more consistency.
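The difference between the two levels can be illustrated with a toy model. It assumes each replica set member is represented by the list of writes it has applied so far, and that every shorter list is a prefix of the longest one; read_local and read_majority are hypothetical helpers, not driver functions.

```python
def read_local(member_log):
    """Newest entry on the member being queried, replicated or not."""
    return member_log[-1] if member_log else None


def read_majority(member_logs):
    """Newest entry that a majority of the members have applied."""
    majority = len(member_logs) // 2 + 1
    # Sort log lengths from longest to shortest; the length at position
    # majority - 1 is reached by at least `majority` members.
    lengths = sorted((len(log) for log in member_logs), reverse=True)
    committed = lengths[majority - 1]
    longest = max(member_logs, key=len)
    return longest[committed - 1] if committed else None


primary = ["v1", "v2", "v3"]   # "v3" has not been replicated yet
secondary_a = ["v1", "v2"]
secondary_b = ["v1"]

local_value = read_local(primary)
majority_value = read_majority([primary, secondary_a, secondary_b])
```

Here the local read on the primary sees "v3", which could still be rolled back, while the majority read returns "v2", the newest write present on two of the three members.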
20. Write concern
• It is the level of acknowledgement requested from MongoDB for write operations.
• Weak: the data is written without waiting for an acknowledgement.
• Stronger: the client waits for an acknowledgement.
• A write concern can include the fields below:
• { w: <value>, j: <boolean>, wtimeout: <number> }
• w: the number of instances the write should be propagated to.
• j: true | false, whether the write must have been written to the journal.
• wtimeout: a time limit, in milliseconds, that prevents the write from blocking indefinitely.
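As a minimal sketch, the write concern document above can be assembled and checked in plain Python. make_write_concern and is_acknowledged are hypothetical helpers written for this example; only the field names w, j and wtimeout come from the slide.

```python
def make_write_concern(w=1, j=None, wtimeout=None):
    """Build a write concern document, leaving out fields that were not set."""
    is_count = isinstance(w, int) and not isinstance(w, bool) and w >= 0
    if not (is_count or isinstance(w, str)):
        raise ValueError("w must be a non-negative integer or a string such as 'majority'")
    concern = {"w": w}
    if j is not None:
        concern["j"] = bool(j)
    if wtimeout is not None:
        if wtimeout < 0:
            raise ValueError("wtimeout must not be negative")
        concern["wtimeout"] = int(wtimeout)   # milliseconds
    return concern


def is_acknowledged(concern):
    """w = 0 is the weak case: the client does not wait for an acknowledgement."""
    return concern.get("w", 1) != 0


weak = make_write_concern(w=0)
strong = make_write_concern(w="majority", j=True, wtimeout=5000)
```

The weak concern evaluates to { w: 0 }, and the strong one asks for a majority of members, a journal write, and a 5-second limit on waiting.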
21. Networking for MongoDB
• MongoDB should always run in a trusted environment.
• Allow access only from the servers and systems that really need it.
• By default, authorization (access control) is not enabled and has to be turned on.
• Disable the HTTP interface. MongoDB uses the HTTP interface to check the status of the server and run queries.
• The connection pool should be sized at around 100%-120% of the number of concurrent database requests.
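The sizing rule is simple arithmetic and can be sketched as follows. pool_size_range is a hypothetical helper; the 100 and 120 percent defaults are taken from the bullet above, and results are rounded up to whole connections.

```python
def pool_size_range(concurrent_requests, low_pct=100, high_pct=120):
    """Return the (minimum, maximum) suggested connection pool size."""
    def scaled(pct):
        # Integer ceiling division avoids floating-point rounding surprises.
        return (concurrent_requests * pct + 99) // 100

    return scaled(low_pct), scaled(high_pct)


small = pool_size_range(50)   # 50 concurrent requests
odd = pool_size_range(33)     # 33 concurrent requests, upper bound rounds up
```

For 50 concurrent requests this suggests a pool of 50 to 60 connections. The result would then be passed to whatever pool-size option the driver in use exposes.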
22. Hardware Considerations
• Can run on commodity systems; no high-end systems are needed.
• To run MongoDB with the traditional storage engine (MMAPv1) we need 2 real cores or one physical core.
• Increasing the number of cores improves performance only up to a point.
• Increasing RAM may help performance by reducing page faults.
• WiredTiger is multithreaded, so additional CPUs help performance.
• WiredTiger also uses some memory for caching, and the cache size can be adjusted.
23. Other considerations
• MongoDB supports compression and encryption.
• Supports SSDs; SSDs and RAM both help performance.
• Running MongoDB on NUMA [Non-Uniform Memory Access]* hardware causes problems.
• MongoDB systems prefer RAID-10. RAID-5 does not provide good performance.
• NFS is not recommended.
• Each OS has some specific settings, which can be found in the documentation.
* NUMA is a computer memory design where the memory access time depends on the memory location relative to the processor.
24. MongoDB backups
• Backup with MongoDB Cloud Manager
– Paid.
– UI based.
• Backup with MongoDB Ops Manager
– Same as Cloud Manager.
– Paid.
– UI based.
– On-premises.
• Backup by copying the underlying data files
– Uses the file system snapshot feature.
– Needs journaling enabled.
– To be taken separately on each shard after disabling the balancer.
• Backup with mongodump (and mongorestore)
– Efficient for small databases.
– Does not work well for big databases.
– Has performance issues.
– The output is smaller than a cp/rsync copy (like expdp/impdp in Oracle).
• Backup using cp/rsync
– This is a cold backup.
– Writes need to be stopped for this to be a good copy.
25. What's in the next session?
• Security
• Replication
• Sharding
• Working demo on:
– Installation
– Document creation
– DML statements / indexes
– Backups
– Replication and sharding (if possible)