MongoDB
Note:
• This presentation is based mostly on the official documentation.
• It reflects my own understanding of the material.
• I might not know the answer to every question right away.
Agenda
• Introduction and features of MongoDB
• Where to use (and not to use) MongoDB
• Supported and recommended platforms
• Installation, starting/stopping services, important terms
• Architecture
• Does MongoDB support ACID?
• Properties of Mongo and other considerations
• Backup
• What would be covered in the next session
• References
Introduction to MongoDB
• MongoDB is an open-source document database.
• Provides high performance.
• Uses a rich query language.
• Provides high availability through clustering (replica sets).
• Provides horizontal scalability through sharding.
• Data in MongoDB is stored in BSON format.
• Collection-level concurrency with MMAPv1; WiredTiger goes further and
offers document-level concurrency.
• Data consistency using journaling.
• Networking
• Supports replicas.
• Supports many types of indexes.
• Supports compression.
• Provides backup and restore features.
• Is not a replacement for RDBMS or Hadoop systems.
Where/when (and where/when not)
to use MongoDB
• Where/when to use:
– As the primary data store for operational
applications with real-time needs.
– Applications that use unstructured or semi-structured
data and have large-scale, multi-datacenter
requirements.
• Where/when not to use:
– Applications that rely on complex calculations.
– Financial data.
– Any application that scans a large subset of the data.
Mongo DB supported platforms
Mongo DB recommended platforms
• As per the official notes, it is advised to always use a 64-bit build.
• From version 3.2 onwards, 32-bit support is deprecated.
• The current release as of Feb 17 is 3.4.2.
Mongo Installation
• Editions: MongoDB Community Edition and MongoDB Enterprise.
• A MongoDB instance stores its data files in /var/lib/mongo.
• Log files are stored in /var/log/mongodb.
• Executables go in /usr/bin.
• The package creates a mongod user account, and the service runs as that user.
• The default user can be changed, but the new user must then be given
permissions on the data and log directories.
https://docs.mongodb.com/v3.2/tutorial/install-mongodb-on-red-hat/
Mongo Start / Stop / restart
• Start: sudo service mongod start
• Stop: sudo service mongod stop
• Restart: sudo service mongod restart
• Remove (stop the service first):
1. sudo yum erase $(rpm -qa | grep mongodb-org)
2. sudo rm -r /var/log/mongodb
3. sudo rm -r /var/lib/mongo
Important Terms
mongod: the primary daemon process for the MongoDB system.
mongos: short for "MongoDB Shard", a routing service for
sharded MongoDB configurations.
mongo: an interactive JavaScript shell interface to MongoDB.
• By default, all MongoDB executables are stored in /usr/bin.
Mongo configuration
• MongoDB has a configuration file, /etc/mongod.conf.
• It contains all configuration settings, including the location
where the data will be stored.
• From version 3.2 onwards, MongoDB uses the WiredTiger storage engine
by default; the previous default was MMAPv1.
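To make the shape of the configuration file concrete, here is a minimal sketch in Python that renders a mongod.conf-style YAML document. The render_yaml helper is hypothetical, and the values are only illustrative assumptions based on the default paths mentioned in this deck; check the option names against the documentation for your version.

```python
# Hypothetical helper: renders nested settings as mongod.conf-style YAML text.
# Values below are illustrative assumptions, not a recommended configuration.

def render_yaml(settings, indent=0):
    """Render nested dicts as indented YAML lines (scalars only, no lists)."""
    lines = []
    prefix = "  " * indent
    for key, value in settings.items():
        if isinstance(value, dict):
            lines.append(f"{prefix}{key}:")
            lines.extend(render_yaml(value, indent + 1))
        else:
            if isinstance(value, bool):
                value = "true" if value else "false"
            lines.append(f"{prefix}{key}: {value}")
    return lines

MONGOD_CONF = {
    "systemLog": {
        "destination": "file",
        "path": "/var/log/mongodb/mongod.log",  # log directory from the install slide
        "logAppend": True,
    },
    "storage": {
        "dbPath": "/var/lib/mongo",              # where the data files are stored
        "engine": "wiredTiger",
        "journal": {"enabled": True},
    },
    "net": {"port": 27017},
}

conf_text = "\n".join(render_yaml(MONGOD_CONF))
print(conf_text)
```

The storage.dbPath entry is the "location where the data will be stored" mentioned above.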
Mongo vs RDBMS mapping
MONGO         RDBMS
Databases     Databases
Collections   Tables
Documents     Rows
Fields        Columns
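The mapping can be illustrated with plain Python data structures. This is only a sketch: the customers collection, the field names, and the values are made up for illustration, and no MongoDB driver is involved.

```python
import json

# Illustrative data only: collection, field names, and values are made up.
# RDBMS view: a row is a fixed tuple of column values in a table.
columns = ("id", "name", "city")
row = (1, "Asha", "Pune")

# MongoDB view: the same record is a document (field -> value) in a collection.
document = dict(zip(columns, row))

# Unlike rows, documents in one collection may carry extra or nested fields.
document["phones"] = ["555-0100", "555-0101"]

collection = [document]               # a collection holds documents
database = {"customers": collection}  # a database holds collections

print(json.dumps(database, indent=2))
```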
Architecture of Mongo DB
• Mongo DB Nexus Architecture
• Mongo DB Flexible Storage Architecture
Does MongoDB support ACID?
• ACID: Atomicity, Consistency, Isolation and Durability.
• In one word: yes.
• In detail: yes, but in a limited sense, that is, at the document [row]
level.
• Atomic changes that span multiple documents are not possible.
• MongoDB handles consistency in its own way, covered in the next
slides.
• Isolation: earlier versions used a single server-wide lock, whereas
from version 2.2 onwards the lock became database-wide.
• Durability:
– Not guaranteed by default, but can be achieved using replicas, at the cost of
performance.
– E.g., by requiring the write to reach multiple replicas before considering the write operation finished.
Concurrency
• From version 3 onwards, concurrency is managed using collection
[table] level locking (MMAPv1 storage engine).
• Each collection has its own readers-writer lock,
which allows multiple clients to modify
documents [rows] in different collections
[tables] at the same time.
• Previous versions (2.2 through 2.6) allowed
concurrent read access per database, but only
one write operation per database at a time.
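The effect of per-collection locking can be sketched in a few lines of Python. This is a simplification under stated assumptions: a plain mutex stands in for the readers-writer lock, the collection names are made up, and nothing here reflects the server's actual locking code.

```python
import threading
from collections import defaultdict

# One lock per collection name. A plain mutex stands in for the
# readers-writer lock; this is a toy model, not the server's implementation.
collection_locks = defaultdict(threading.Lock)

def try_write(collection_name):
    """Try to take the write lock for one collection without blocking."""
    return collection_locks[collection_name].acquire(blocking=False)

def finish_write(collection_name):
    """Release the write lock for one collection."""
    collection_locks[collection_name].release()

first = try_write("orders")      # client A starts writing to "orders"
second = try_write("customers")  # client B writes to a different collection
third = try_write("orders")      # client C has to wait: "orders" is locked

print(first, second, third)

finish_write("orders")
finish_write("customers")
```

With database-level locking, as in versions 2.2 through 2.6, the second write would have had to wait as well.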
Data Consistency
• Journaling:
• The storage engine uses checkpoints to provide a consistent view of data on disk
and to allow recovery from the last checkpoint.
• Journaling is needed to recover writes made after the last checkpoint if the server shuts down unexpectedly.
• With journaling, one journal record is created for each client-initiated write
operation.
• Checkpoints are written every 60 seconds, whereas the journal is committed every 100
milliseconds.
• The journal record also includes any internal writes caused by the initial write (for example, index updates).
• The storage engine uses in-memory buffering for journal records.
• Journal records of up to 128 kB are buffered.
• The storage engine syncs the buffered journal records to disk when certain criteria are met.
• Despite all this, updates still held in the buffer may be lost on a hard shutdown.
• Journal files are created in a separate journal directory.
• Journaling continued
• Recovery using the journal is a three-step process:
– Look in the data files to find the identifier of the last checkpoint.
– Search the journal files for the record that matches that identifier.
– Apply the operations in the journal files since the last checkpoint.
• MongoDB uses write-ahead logging to an on-disk journal.
• Journaling ensures that MongoDB can quickly recover write operations that were journaled but not yet written to the data files.
• It is advised to keep journaling enabled.
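The three recovery steps can be sketched with a toy model in Python. The data layout (a dict for the data files, a list of records for the journal) and the recover function are hypothetical and chosen only to mirror the steps; the real on-disk formats are very different.

```python
# Toy model: the data files hold key/value state plus the id of the last
# checkpoint; the journal is an ordered list of records. All names are hypothetical.

def recover(data_files, journal):
    """Replay journal records written after the last checkpoint."""
    # Step 1: find the identifier of the last checkpoint in the data files.
    checkpoint_id = data_files["last_checkpoint"]

    # Step 2: find the journal record that matches that identifier.
    start = next(i for i, rec in enumerate(journal)
                 if rec.get("checkpoint") == checkpoint_id)

    # Step 3: apply every operation recorded since that checkpoint.
    state = dict(data_files["state"])
    for rec in journal[start + 1:]:
        state[rec["key"]] = rec["value"]
    return state

data_files = {"last_checkpoint": 7, "state": {"a": 1}}
journal = [
    {"key": "a", "value": 1},  # already covered by checkpoint 7
    {"checkpoint": 7},
    {"key": "b", "value": 2},  # written after the checkpoint
    {"key": "a", "value": 3},  # written after the checkpoint
]

recovered = recover(data_files, journal)
print(recovered)
```

Only the two records after the checkpoint marker are replayed, which is why recovery from a recent checkpoint is quick.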
• Read concern
• Two types:
– Majority
– Local
• Local is the default.
• Local: returns the most recent data on the instance, with no guarantee that
the data read will not be rolled back.
• Majority: returns only data that has been written to a majority of
replica set members, for stronger consistency.
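The difference between the two read concerns can be sketched with a toy replica set in Python. The member names, the write numbers, and both functions are assumptions made for illustration; this models the idea, not the driver API.

```python
# Toy replica set: each member records the highest write number it has applied.
# Member names and numbers are made up for illustration.

def read_local(applied, member):
    """'local': return the newest write on the queried member itself."""
    return applied[member]

def read_majority(applied):
    """'majority': return the newest write present on a majority of members."""
    needed = len(applied) // 2 + 1
    # Sorted descending, the value at index needed-1 is held by at least
    # `needed` members.
    return sorted(applied.values(), reverse=True)[needed - 1]

applied = {"primary": 12, "secondary1": 10, "secondary2": 9}

local_result = read_local(applied, "primary")  # newest write, may be rolled back
majority_result = read_majority(applied)       # newest majority-committed write
print(local_result, majority_result)
```

Writes 11 and 12 exist only on the primary in this model, so a local read can see them while a majority read cannot.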
• Write concern
• It is the level of acknowledgement requested from MongoDB for write
operations.
• Weak: the data is written without waiting for an acknowledgement.
• Stronger: the client waits for an acknowledgement.
• A write concern can include the following fields:
• { w: <value>, j: <boolean>, wtimeout: <number> }
• w: the number of instances the write must have propagated to before it is acknowledged.
• j: true | false, whether the write must have been written to the journal before it is acknowledged.
• wtimeout: a time limit (in milliseconds), so that the write does not block indefinitely.
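As a rough illustration of how the three fields fit together, the sketch below builds write concern documents and checks when a requested w would be satisfied. Both helper functions are hypothetical and not part of any MongoDB driver; the member counts are made up.

```python
# Hypothetical helpers for illustration; they are not part of any MongoDB driver.

def write_concern(w=1, j=None, wtimeout=None):
    """Build a write concern document with the fields { w, j, wtimeout }."""
    doc = {"w": w}
    if j is not None:
        doc["j"] = bool(j)
    if wtimeout is not None:
        doc["wtimeout"] = int(wtimeout)  # milliseconds
    return doc

def is_satisfied(concern, acks, members):
    """Check whether `acks` acknowledging members satisfy the requested w."""
    w = concern["w"]
    required = members // 2 + 1 if w == "majority" else w
    return acks >= required

weak = write_concern(w=0)  # weak: no acknowledgement requested
strong = write_concern(w="majority", j=True, wtimeout=5000)

print(strong)
print(is_satisfied(weak, acks=0, members=3))    # nothing to wait for
print(is_satisfied(strong, acks=1, members=3))  # still waiting for a second member
print(is_satisfied(strong, acks=2, members=3))  # majority of 3 reached
```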
Networking for Mongo
• MongoDB should always run in a trusted environment.
• Allow access only from the servers and systems that really
need it.
• By default, authorization (access control) is not
enabled; it has to be enabled explicitly.
• Disable the HTTP interface. MongoDB provides an HTTP
interface for checking server status and running
queries.
• The connection pool should be sized at around
100%-120% of the expected concurrent database requests.
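The pool sizing guideline is simple arithmetic; a small sketch, assuming the 100%-120% range from this slide and a made-up figure of 250 concurrent requests:

```python
def pool_size_range(concurrent_requests, low_pct=100, high_pct=120):
    """Return (min, max) pool size for the 100%-120% guideline, rounded up."""
    def ceil_pct(pct):
        # Integer ceiling division avoids floating-point rounding surprises.
        return -(-concurrent_requests * pct // 100)
    return ceil_pct(low_pct), ceil_pct(high_pct)

# Example: an application that typically issues 250 concurrent requests.
pool_min, pool_max = pool_size_range(250)
print(pool_min, pool_max)
```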
Hardware Considerations
• Can run on commodity systems; no high-end systems
are needed.
• To run MongoDB with the traditional storage
engine (MMAPv1), we need 2 real cores or one physical core.
• Increasing the number of cores improves
performance only up to a point.
• Increasing RAM may improve performance by
reducing page faults.
• WiredTiger is multithreaded, so additional CPUs help
performance.
• WiredTiger also uses some memory for caching, and the cache size
can be adjusted.
Other considerations
• MongoDB supports compression and encryption.
• Supports SSDs; both SSDs and additional RAM help
performance.
• Running MongoDB on NUMA [Non-Uniform Memory
Access]* hardware causes problems.
• MongoDB systems prefer RAID-10. RAID-5 does not
provide good performance.
• NFS is not recommended.
• Each OS has some specific settings, which can be found
in the documentation.
* NUMA is a computer memory design in which memory access time depends on the
memory location relative to the processor.
Mongo DB backups
• Backup with MongoDB Cloud Manager
– Paid
– UI based
• Backup with MongoDB Ops Manager
– Same as Cloud Manager
– Paid
– UI based
– On-premises
• Backup by copying the underlying data files
– Uses the file system snapshot feature.
– Needs journaling enabled.
– Must be taken separately on each shard after disabling the balancer.
• Backup with mongodump (and mongorestore)
– Efficient for small databases.
– Does not work well for big databases.
– Has performance issues.
– Produces smaller backups than cp/rsync (comparable to expdp/impdp in Oracle).
• Backup using cp / rsync
– This is a cold backup.
– Writes need to be stopped for this to be a good copy.
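For the mongodump / mongorestore option, a minimal sketch that builds (but does not run) the two command lines. The host, port, and backup directory are placeholder assumptions; pass the resulting lists to subprocess.run only on a machine where the MongoDB tools are installed.

```python
import shlex

# Builds, but does not run, mongodump/mongorestore command lines.
# Host, port, and backup directory are placeholder assumptions.

def dump_command(out_dir, host="localhost", port=27017):
    """Command that dumps all databases on host:port into out_dir."""
    return ["mongodump", "--host", host, "--port", str(port), "--out", out_dir]

def restore_command(dump_dir, host="localhost", port=27017):
    """Command that restores a dump directory into the server at host:port."""
    return ["mongorestore", "--host", host, "--port", str(port), dump_dir]

backup_dir = "/backup/mongo-dump"
dump_line = " ".join(shlex.quote(arg) for arg in dump_command(backup_dir))
restore_line = " ".join(shlex.quote(arg) for arg in restore_command(backup_dir))
print(dump_line)
print(restore_line)
```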
What’s in the next session?
• Security
• Replication
• Sharding
• Working demo on:
– Installation
– Document creation
– DML statements / indexes
– Backups
– Replication and sharding (if possible)
References
• https://docs.mongodb.com/v3.2/administration/production-notes/
• http://openmymind.net/mongodb.pdf
• https://docs.mongodb.com/v3.2/tutorial/install-mongodb-on-red-hat/
• https://www.mongodb.com/faq
• https://dzone.com/articles/how-acid-mongodb
• http://nosql.mypopescu.com/post/392868405/mongodb-durability-a-tradeoff-to-be-aware-of
• https://www.mongodb.com/collateral/mongodb-architecture-guide
