July 2015 - Tech Sharing
 Architecture
 Data Model
 Query Language
 Data Management
 References
 A storage engine is the part of a database that is responsible for managing how data is stored on disk.
 Many databases support multiple storage engines, where different engines perform better for specific
workloads.
For example, one storage engine might offer better performance for read-heavy workloads, and another
might support higher throughput for write operations.
 A replica set can have members that use different storage engines.
MongoDB
Example relational data model for a blogging
application
Data as documents: simpler for developers,
faster for users.
 Dynamic/Flexible
 Collections (the counterpart of tables) can be created without defining the structure of their documents
 Documents in a collection need not have an identical set of fields.
 In practice, it is common for the documents in a collection to have a largely
homogeneous structure; however, this is not a requirement
 The structure of documents can be changed simply by adding new fields or deleting
existing ones (which simplifies and facilitates iterative software development)
 Schema design is still important! Consider:
 Types of queries the application will perform
 How objects are managed in application code
 How documents will change over time
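The flexible schema described above can be pictured with plain Python dictionaries standing in for documents. This is only an illustrative sketch: the `posts` list and its field names are hypothetical, and no MongoDB driver is involved.

```python
# Plain dicts stand in for documents; a list stands in for a collection.
# All names and values here are hypothetical illustrations.
posts = [
    {"_id": 1, "title": "Hello", "author": "ann"},
    {"_id": 2, "title": "Schema notes", "author": "bob", "tags": ["design"]},
]

# Documents in the same collection need not share the same set of fields.
field_sets = [set(doc) for doc in posts]

# Changing the structure is just adding or deleting a field on a document.
posts[0]["views"] = 10   # add a new field
del posts[1]["tags"]     # delete an existing field

# In practice most documents still share a common core of fields.
common_fields = set.intersection(*(set(doc) for doc in posts))
print(sorted(common_fields))
```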
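 Embedding the publisher in every book document leads to repetition of publisher data
 If the number of books per publisher is small with limited growth, store the book references inside the publisher document
 To avoid mutable, growing arrays, store the publisher reference inside the book document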
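The three publisher/book options above might look as follows when written out as documents. This is a sketch with plain Python dicts; the titles, publisher name, and the `publisher_of` helper are hypothetical and not part of any MongoDB API.

```python
# Hypothetical data: dicts stand in for documents, no driver involved.

# Option A: publisher embedded in each book, so publisher data is repeated.
books_embedded = [
    {"_id": 1, "title": "Book One", "publisher": {"name": "Acme Press", "founded": 1980}},
    {"_id": 2, "title": "Book Two", "publisher": {"name": "Acme Press", "founded": 1980}},
]

# Option B: book references kept inside the publisher document.
# Reasonable while the array stays small; it grows with every new book.
publisher_with_books = {"_id": "acme", "name": "Acme Press", "books": [1, 2]}

# Option C: each book stores a reference to its publisher.
# The publisher document does not grow when books are added.
publishers = {"acme": {"_id": "acme", "name": "Acme Press", "founded": 1980}}
books_referencing = [
    {"_id": 1, "title": "Book One", "publisher_id": "acme"},
    {"_id": 2, "title": "Book Two", "publisher_id": "acme"},
]


def publisher_of(book):
    """Resolve the reference with a second lookup."""
    return publishers[book["publisher_id"]]
```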
With the referenced (normalized) data model, if your application frequently retrieves
the address data with the name information,
then it needs to issue multiple queries to resolve the references.
With the embedded data model,
your application can retrieve the
complete patron information with one query.
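The difference in query count can be sketched with a small in-memory stand-in for a collection, where each `find_one` call represents one round trip to the database. The `Collection` class and the sample patron data are assumptions made for illustration only.

```python
class Collection:
    """Hypothetical in-memory collection that counts lookups."""

    def __init__(self, docs):
        self.docs = {doc["_id"]: doc for doc in docs}
        self.queries = 0

    def find_one(self, _id):
        self.queries += 1  # each call stands in for one query
        return self.docs[_id]


# Referenced model: the address lives in its own collection.
patrons_ref = Collection([{"_id": "joe", "name": "Joe Bookreader", "address_id": "a1"}])
addresses = Collection([{"_id": "a1", "street": "123 Fake Street", "city": "Faketon"}])

patron = patrons_ref.find_one("joe")
address = addresses.find_one(patron["address_id"])
referenced_queries = patrons_ref.queries + addresses.queries

# Embedded model: the address is part of the patron document.
patrons_emb = Collection([
    {"_id": "joe", "name": "Joe Bookreader",
     "address": {"street": "123 Fake Street", "city": "Faketon"}},
])
full_patron = patrons_emb.find_one("joe")
embedded_queries = patrons_emb.queries

print(referenced_queries, embedded_queries)
```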
 Core processes
 mongod – database process
 mongos – controller/query router of sharded clusters
 mongo – interactive MongoDB shell
 Import / Export Tools
 Binary
 mongodump – create BSON dump files
 mongorestore – restore BSON dump files
 bsondump – convert BSON dump files to JSON
 mongooplog – stream oplog entries outside of normal replication
 JSON/CSV/TSV
 mongoimport – import data from JSON, CSV, or TSV files
 mongoexport – export data to JSON or CSV files
 Diagnostic Tools
 mongostat – status overview of a currently running mongod or mongos instance
 mongotop – the amount of time a MongoDB instance spends reading and writing data, reported per collection
 mongosniff – a low-level, real-time tracing/sniffing view into database activity (Unix-like systems only)
 mongoperf – utility to check disk I/O performance independently of MongoDB
 GridFS
 mongofiles – utility to manipulate files stored as GridFS objects in your MongoDB instance from the command line
 Linux
 mongod --dbpath <path to data directory>
 Windows
 mongod.exe --dbpath <path to data directory>
 Rich, interactive JavaScript shell
 Included in all MongoDB distributions
 Comparable to sqlcmd in MS SQL Server or SQL*Plus in Oracle
 Supports all commands/queries, including administrative operations
 A collection is created implicitly when the first document(s) are inserted
 find
 Query criteria
 Projection
 Cursor modifier
 pretty()
 findOne
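How query criteria, projection, and cursor modifiers combine can be sketched with a toy, in-memory `find`. This is not the MongoDB implementation: the function names, the `sort_key`/`limit` parameters, and the sample `users` data are hypothetical, only equality, `$gt`, and `$lt` are handled, and unlike MongoDB the projection here does not return `_id` by default.

```python
def matches(doc, criteria):
    """Equality match, plus {"$gt": v} / {"$lt": v} conditions."""
    for field, cond in criteria.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
            if "$lt" in cond and not (value is not None and value < cond["$lt"]):
                return False
        elif value != cond:
            return False
    return True


def find(collection, criteria=None, projection=None, sort_key=None, limit=None):
    """Filter by criteria, apply sort/limit modifiers, then project fields."""
    results = [doc for doc in collection if matches(doc, criteria or {})]
    if sort_key is not None:
        results.sort(key=lambda doc: doc[sort_key])
    if limit is not None:
        results = results[:limit]
    if projection:
        results = [{k: doc[k] for k in projection if k in doc} for doc in results]
    return results


def find_one(collection, criteria=None):
    """Return the first matching document, or None."""
    found = find(collection, criteria, limit=1)
    return found[0] if found else None


users = [
    {"_id": 1, "name": "ann", "age": 34},
    {"_id": 2, "name": "bob", "age": 27},
    {"_id": 3, "name": "cy", "age": 41},
]
over_30 = find(users, {"age": {"$gt": 30}}, projection=["name"], sort_key="name")
```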
 update
 $set/$unset
 Replace whole document
 Array operators ($addToSet, $push, $pop)
 upsert
 findAndModify
 upsert
 remove
 new
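The update behaviours listed above ($set/$unset, whole-document replacement, array operators, upsert) can be illustrated with a toy in-memory `update`. It is a simplified sketch under stated assumptions, not MongoDB's implementation: the function and its data are hypothetical, criteria are equality-only, and only the first matching document is changed.

```python
def update(collection, criteria, change, upsert=False):
    """Apply `change` to the first document whose fields equal `criteria`."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in criteria.items()):
            target = doc
            break
    else:
        if not upsert:
            return None
        # upsert: no match, so insert a new document built from the criteria
        target = dict(criteria)
        collection.append(target)

    if not any(key.startswith("$") for key in change):
        # No operators: replace the whole document, keeping its _id.
        kept_id = {"_id": target["_id"]} if "_id" in target else {}
        target.clear()
        target.update(kept_id)
        target.update(change)
        return target

    for field, value in change.get("$set", {}).items():
        target[field] = value
    for field in change.get("$unset", {}):
        target.pop(field, None)
    for field, value in change.get("$push", {}).items():
        target.setdefault(field, []).append(value)
    for field, value in change.get("$addToSet", {}).items():
        values = target.setdefault(field, [])
        if value not in values:  # only add when not already present
            values.append(value)
    return target


users = [{"_id": 1, "name": "ann", "tags": ["a"]}]
update(users, {"_id": 1}, {"$set": {"age": 34}, "$unset": {"name": ""}})
update(users, {"_id": 1}, {"$addToSet": {"tags": "a"}})
update(users, {"_id": 2}, {"$set": {"name": "bob"}}, upsert=True)
```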
 drop the collection
 remove
 All documents
 Based on criteria (multiple documents by default)
 justOne parameter
 findAndModify with remove option
 aggregate - Aggregation Pipeline (recommended/preferred)
 mapReduce - Map Reduce
 group
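The idea behind an aggregation pipeline, where documents flow through a sequence of stages, can be sketched in plain Python. The stage functions below loosely mirror match, group, and sort stages; their names and the sample `orders` data are hypothetical and this is not how MongoDB executes a pipeline.

```python
from collections import defaultdict

# Hypothetical order documents.
orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]


def match(docs, status):
    """Stage 1: keep only documents with the given status."""
    return (doc for doc in docs if doc["status"] == status)


def group_sum(docs, key, field):
    """Stage 2: group by `key` and sum `field` within each group."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[key]] += doc[field]
    return [{"_id": k, "total": v} for k, v in totals.items()]


def sort_desc(docs, field):
    """Stage 3: order the grouped results, largest first."""
    return sorted(docs, key=lambda doc: doc[field], reverse=True)


# Each stage consumes the output of the previous one.
result = sort_desc(group_sum(match(orders, "A"), "cust_id", "amount"), "total")
print(result)
```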
 Supported drivers: Java, .NET, Ruby, PHP, JavaScript, Node.js, Python,
Perl, Scala and others
 Queries are implemented as methods or functions within the API of a specific
programming language, as opposed to a completely separate language like
SQL
 [Example here]
 Types:
 Unique Indexes
 Compound Indexes
 Array (multikey) Indexes - for fields that contain an array, each array value is stored as a separate index
entry
 TTL (Time to Live) Indexes - allow the user to specify a period of time after which the data
will automatically be deleted from the database
 Geospatial Indexes - optimize queries related to location within a two-dimensional space
 Sparse Indexes - allow for smaller, more efficient indexes when fields are not present in all
documents
 Text Search Indexes - use language-specific linguistic rules for stemming,
tokenization and stop words
 Covered Queries - queries whose results contain only indexed fields can be
answered from the index without reading the source documents
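Two of the index types above, array (multikey) and sparse, can be pictured with a small in-memory index builder. This is a conceptual sketch only; `build_index`, its `sparse` flag, and the sample documents are hypothetical and do not reflect MongoDB's on-disk index format.

```python
from collections import defaultdict


def build_index(collection, field, sparse=False):
    """Map each indexed value to the _ids of the documents containing it."""
    index = defaultdict(list)
    for doc in collection:
        if field not in doc:
            if sparse:
                continue  # sparse: documents without the field are left out
            values = [None]  # non-sparse: a missing field is indexed as null
        else:
            value = doc[field]
            # Array field: every element becomes its own index entry.
            values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].append(doc["_id"])
    return index


docs = [
    {"_id": 1, "tags": ["db", "nosql"]},
    {"_id": 2, "tags": ["db"]},
    {"_id": 3},
]
full_index = build_index(docs, "tags")
sparse_index = build_index(docs, "tags", sparse=True)
```

The sparse index has one entry fewer because document 3 has no `tags` field.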
Sharding and replica sets:
- automatic sharding provides horizontal scalability
- replica sets help prevent database downtime
 Sharding, or horizontal scaling, divides the data set and distributes the
data over multiple servers, or shards. Each shard is an independent
database, and collectively, the shards make up a single logical database.
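Distributing documents over shards by a key can be sketched with a simple hash-and-modulo rule. This is a deliberately simplified assumption: MongoDB assigns ranges of shard key values to shards rather than using modulo placement, and `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Pick a shard index from a stable hash of the shard key value."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Three independent "shards" that together hold one logical data set.
shards = [[] for _ in range(3)]
for user_id in range(100):
    shards[shard_for(user_id, 3)].append({"_id": user_id})

total_documents = sum(len(shard) for shard in shards)
```

A query on the shard key only needs to visit `shards[shard_for(key, 3)]`, which is the routing job that mongos performs in a real cluster.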
 Replication provides redundancy and increases data availability.
 With multiple copies of data on different database servers, replication
protects a database from the loss of a single server
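Why extra copies protect against the loss of a single server can be shown with a toy replica set. The `ReplicaSet` class is hypothetical and heavily simplified: there is no election, no network, and the next member simply takes over when the first one is lost.

```python
class ReplicaSet:
    """Toy model: member 0 acts as primary, the rest as secondaries."""

    def __init__(self, member_count):
        self.members = [dict() for _ in range(member_count)]
        self.oplog = []  # ordered log of writes applied on the primary

    def write(self, key, value):
        self.members[0][key] = value
        self.oplog.append((key, value))

    def replicate(self):
        """Secondaries replay the primary's log of operations."""
        for secondary in self.members[1:]:
            for key, value in self.oplog:
                secondary[key] = value

    def lose_member(self, index):
        del self.members[index]

    def read(self, key):
        return self.members[0].get(key)


rs = ReplicaSet(3)
rs.write("greeting", "hello")
rs.replicate()
rs.lose_member(0)  # the primary is lost
surviving_value = rs.read("greeting")
```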
 Find/identify/target the most frequent (>80%) data access pattern
 A flexible schema supports agile development; be prepared to change the data
model as it improves
 To reduce storage:
 Set the _id field explicitly (otherwise it defaults to a 12-byte ObjectId)
 Use shorter field names
 Embed documents (data model consideration)
 Use indexes and profiling for performance
 docs.mongodb.org is a rich resource (offline copies are available
at http://docs.mongodb.org/manual/about)
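The "use shorter field names" tip follows from field names being stored in every document. A rough feel for the saving can be had by comparing encoded sizes; the sketch below uses compact JSON as a stand-in for BSON (an assumption, since real BSON sizes differ), and the field names are hypothetical.

```python
import json


def encoded_size(doc):
    """Size in bytes of the document as compact JSON (a stand-in for BSON)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


long_names = {"customer_first_name": "Ann", "customer_last_name": "Lee", "customer_age": 34}
short_names = {"fn": "Ann", "ln": "Lee", "age": 34}

# The values are identical, so the difference comes only from the field names.
saved_per_doc = encoded_size(long_names) - encoded_size(short_names)
print(saved_per_doc)
```

The saving repeats for every document in the collection, at the cost of less readable field names.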
 Documentation (http://docs.mongodb.org)
 Free Online Training (http://university.mongodb.com)
 Presentations (http://mongodb.com/presentations)
 Case Studies (http://mongodb.com/customers)
 http://www.newtonsoft.com/json
bembengarifin@gmail.com


Editor's Notes

  • #5 https://www.mongodb.com/blog/post/whats-new-mongodb-30-part-3-performance-efficiency-gains-new-storage-architecture
  • #7 http://bsonspec.org/ http://www.newtonsoft.com/json
  • #8 http://docs.mongodb.org/manual/reference/bios-example-collection/ http://bsonspec.org/faq.html
  • #9 http://docs.mongodb.org/manual/reference/sql-comparison/
  • #10 http://docs.mongodb.org/manual/faq/fundamentals/ http://docs.mongodb.org/manual/core/data-modeling-introduction/
  • #11 http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/#data-modeling-publisher-and-books
  • #14 https://docs.mongodb.org/manual/reference/program/
  • #15 http://docs.mongodb.org/manual/administration/install-on-linux/ http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/
  • #16 http://docs.mongodb.org/manual/administration/scripting/
  • #17 http://docs.mongodb.org/manual/core/read-operations-introduction/
  • #19 http://docs.mongodb.org/manual/reference/operator/query/
  • #22 http://docs.mongodb.org/manual/reference/aggregation-commands-comparison/ http://docs.mongodb.org/manual/meta/aggregation-quick-reference/
  • #23 http://docs.mongodb.org/ecosystem/drivers/
  • #29 http://docs.mongodb.org/manual/administration/analyzing-mongodb-performance
  • #31 https://university.mongodb.com/