Nosql part 2


Praxis Weekend Analytics

Published in: Education, Technology

  1. NoSQL & MongoDB..Part II (Arindam Chatterjee)
  2. Indexes in MongoDB
     • Indexes support the efficient resolution of queries in MongoDB.
       – Without indexes, MongoDB must scan every document in a collection to select the documents that match the query statement.
       – These collection scans are inefficient and require the mongod to process a large volume of data for each operation.
     • Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form.
       – The index stores the value of a specific field or set of fields, ordered by the value of the field.
     • Indexes in MongoDB are similar to indexes in other database systems.
     • MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection.
  3. Indexes in MongoDB..2
     • The diagram on this slide illustrates a query that selects documents using an index. MongoDB narrows the query by scanning only the range of index entries whose score value is less than 30, instead of scanning every document.
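A minimal sketch of the idea behind that range scan, written as plain JavaScript rather than against a real server. The sample documents, `buildIndex`, `lowerBound`, and `findLessThan` are all hypothetical names made up for illustration; MongoDB's real indexes are B-tree structures, which this sorted array only approximates.

```javascript
// Toy model of a single-field index: NOT MongoDB's implementation.
// Sample documents are invented for illustration.
const scoreDocs = [
  { _id: 1, score: 45 },
  { _id: 2, score: 18 },
  { _id: 3, score: 75 },
  { _id: 4, score: 30 },
  { _id: 5, score: 9 },
  { _id: 6, score: 27 },
];

// Build the "index": one (key, position) entry per document, sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => a.key - b.key);
}

// Binary search: position of the first entry whose key is >= bound.
function lowerBound(index, bound) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < bound) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Similar in spirit to find({ score: { $lt: bound } }):
// only the index entries below the bound are visited.
function findLessThan(docs, index, bound) {
  const end = lowerBound(index, bound);
  return index.slice(0, end).map((entry) => docs[entry.pos]);
}

const scoreIndex = buildIndex(scoreDocs, "score");
const lowScores = findLessThan(scoreDocs, scoreIndex, 30);
console.log(lowScores.map((doc) => doc.score));
```

Because the entries are kept in key order, the matching documents also come back already sorted by score, which is why an index can serve a sort without an extra sort phase.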
  4. Indexes in MongoDB..3
     • MongoDB can use indexes to return documents sorted by the index key directly from the index without requiring an additional sort phase.
     [Diagram: sorted results read from the index; label "Descending"]
  5. Indexes in MongoDB..4 Index Types
     • Default _id
       – All MongoDB collections have an index on the _id field that exists by default. If applications do not specify a value for _id, the driver or the mongod will create an _id field with an ObjectId value.
       – The _id index is unique, and prevents clients from inserting two documents with the same value for the _id field.
     • Single Field
       – MongoDB supports user-defined indexes on a single field of a document. Example: index on the score field (ascending).
  6. Indexes in MongoDB..5 Index Types
     • Compound Index
       – These are user-defined indexes on multiple fields.
       – Example: diagram of a compound index on the userid field (ascending) and the score field (descending). The index sorts first by the userid field and then by the score field.
  7. Indexes in MongoDB..6 Index Types
     • Multikey Index
       – MongoDB uses multikey indexes to index the content stored in arrays.
       – If we index a field that holds an array value, MongoDB creates separate index entries for every element of the array.
       – These multikey indexes allow queries to select documents that contain arrays by matching on one or more elements of the arrays.
       – MongoDB automatically creates a multikey index if the indexed field contains an array value; we do not need to explicitly specify the multikey type.
  8. Indexes in MongoDB..7 Index Types
     • Multikey Index: Illustration
       – Diagram of a multikey index on the addr.zip field. The addr field contains an array of address documents. The address documents contain the zip field.
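The "one index entry per array element" behaviour can be sketched in plain JavaScript. This is an illustration under assumptions, not MongoDB code: the sample documents and the helpers `buildMultikeyIndex` and `findByKey` are hypothetical, and only the addr and zip field names come from the slide.

```javascript
// Toy model of a multikey index: NOT MongoDB's implementation.
// Hypothetical sample documents; addr is an array of address sub-documents.
const people = [
  { _id: 1, user: "xyz", addr: [{ zip: "10036" }, { zip: "94301" }] },
  { _id: 2, user: "abc", addr: [{ zip: "94301" }] },
];

// Emit one (key, id) entry for every element of the array, sorted by key.
function buildMultikeyIndex(docs, arrayField, subField) {
  const entries = [];
  for (const doc of docs) {
    for (const element of doc[arrayField]) {
      entries.push({ key: element[subField], id: doc._id });
    }
  }
  return entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Return the ids of documents having at least one array element with this key.
function findByKey(index, key) {
  const ids = index.filter((entry) => entry.key === key).map((entry) => entry.id);
  return [...new Set(ids)];
}

const zipIndex = buildMultikeyIndex(people, "addr", "zip");
const zipHits = findByKey(zipIndex, "94301");
console.log(zipIndex.length, zipHits);
```

Two documents produce three index entries here, because the first document contributes one entry per address.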
  9. Indexes in MongoDB..8 Other Index Types
     • Geospatial Index
       – MongoDB provides two special indexes: 2d indexes that use planar geometry when returning results and 2dsphere indexes that use spherical geometry to return results.
     • Text Index
       – MongoDB provides a beta text index type that supports searching for string content in a collection.
       – These text indexes do not store language-specific stop words (e.g. “the”, “a”, “or”) and stem the words in a collection to only store root words.
     • Hashed Index
       – To support hash-based sharding, MongoDB provides a hashed index type, which indexes the hash of the value of a field. These indexes have a more random distribution of values along their range, but only support equality matches and cannot support range-based queries.
  10. Indexes in MongoDB..9 Explicit creation of Index
     • Using ensureIndex() from the shell
       – The following creates an index on the phone-number field of the people collection:
         db.people.ensureIndex( { "phone-number": 1 } )
       – The following operation creates a compound index on the item, category, and price fields of the products collection:
         db.products.ensureIndex( { item: 1, category: 1, price: 1 } )
       – A unique constraint prevents applications from inserting documents that have duplicate values for the indexed fields. The following example creates a unique index on the "tax-id" field of the accounts collection to prevent storing multiple account records for the same legal entity:
         db.accounts.ensureIndex( { "tax-id": 1 }, { unique: true } )
       – ensureIndex() only creates an index if an index of the same specification does not already exist.
       – Note: ensureIndex() has been deprecated since MongoDB 3.0 in favour of createIndex(), which takes the same arguments.
  11. Indexes in MongoDB..10 Indexing Strategies
     • Create Indexes to Support Your Queries
       – An index supports a query when the index contains all the fields scanned by the query. Creating indexes that support queries greatly increases query performance.
     • Use Indexes to Sort Query Results
       – To support efficient queries, use the strategies here when you specify the sequential order and sort order of index fields.
     • Ensure Indexes Fit in RAM
       – When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing.
     • Create Queries that Ensure Selectivity
       – Selectivity is the ability of a query to narrow results using the index. Selectivity allows MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
  12. Indexes in MongoDB..11
     • Indexes to Support Queries
       – For commonly issued queries, create indexes. If a query searches multiple fields, create a compound index. Scanning an index is much faster than scanning a collection.
       – Consider a posts collection containing blog posts. If we regularly issue a query that sorts on the author_name field, we can optimize the query by creating an index on the author_name field:
         db.posts.ensureIndex( { author_name : 1 } )
       – If we regularly issue a query that sorts on the timestamp field, we can optimize the query by creating an index on the timestamp field:
         db.posts.ensureIndex( { timestamp : 1 } )
       – If we want to limit the results to reduce network load, we can use limit():
         db.posts.find().sort( { timestamp : -1 } ).limit(10)
  13. Indexes in MongoDB..12
     • Index Administration
       – Detailed information about indexes is stored in the system.indexes collection of each database.
       – system.indexes is a reserved collection, so we cannot insert documents into it or remove documents from it. We can manipulate its documents only through ensureIndex and the dropIndexes database command.
     • Building Indexes in the Background
       – Building indexes is time-consuming and resource-intensive. The {"background" : true} option builds the index in the background, while the database keeps handling incoming requests:
         > db.people.ensureIndex({"username" : 1}, {"background" : true})
       – If we do not include the “background” option, the database will block all other requests while the index is being built.
       – Creating indexes on existing documents is faster than creating the index first and then inserting all of the documents.
  14. Indexes in MongoDB..13
     • Do’s and Don’ts
       – Create indexes only on the keys required for the query.
         • Indexes create additional overhead on the database.
         • Insert, update and delete operations become slow with too many indexes.
       – Index direction is important if there is more than one key.
         • Indexes on {"username" : 1, "age" : -1} and {"username" : 1, "age" : 1} store their entries in different orders, so they support different sorts.
       – There is a built-in maximum of 64 indexes per collection, which is more than almost any application should need.
       – Delete an index with “dropIndexes” if it is not required.
       – Sometimes the most efficient solution is actually not to use an index. In general, if a query returns half or more of the collection, it will be more efficient for the database to do a full collection scan instead of having to look up the index and then the value for almost every single document.
  15. Exercise 2
     • Insert records in collection userdetail
       – {"username" : "smith", "age" : 48, "user_id" : 0 }
       – {"username" : "smith", "age" : 30, "user_id" : 1 }
       – {"username" : "john", "age" : 36, "user_id" : 2 }
       – {"username" : "john", "age" : 18, "user_id" : 3 }
       – {"username" : "joe", "age" : 36, "user_id" : 4 }
       – {"username" : "john", "age" : 7, "user_id" : 5 }
       – {"username" : "simon", "age" : 3, "user_id" : 6 }
       – {"username" : "joe", "age" : 27, "user_id" : 7 }
       – {"username" : "jacob", "age" : 17, "user_id" : 8 }
       – {"username" : "sally", "age" : 52, "user_id" : 9 }
       – {"username" : "simon", "age" : 59, "user_id" : 10 }
     • Run the ensureIndex operation
       – db.userdetail.ensureIndex({"username" : 1, "age" : -1})
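To see what order the compound index from this exercise implies, the sketch below sorts the same eleven records in plain JavaScript by username ascending and then age descending. It is a model of the ordering only, not of how MongoDB stores the index; the names `userdetail`, `compoundOrder`, and `johnAges` are illustrative.

```javascript
// The exercise records, copied from the slide.
const userdetail = [
  { username: "smith", age: 48, user_id: 0 },
  { username: "smith", age: 30, user_id: 1 },
  { username: "john", age: 36, user_id: 2 },
  { username: "john", age: 18, user_id: 3 },
  { username: "joe", age: 36, user_id: 4 },
  { username: "john", age: 7, user_id: 5 },
  { username: "simon", age: 3, user_id: 6 },
  { username: "joe", age: 27, user_id: 7 },
  { username: "jacob", age: 17, user_id: 8 },
  { username: "sally", age: 52, user_id: 9 },
  { username: "simon", age: 59, user_id: 10 },
];

// Order implied by the key pattern { username: 1, age: -1 }:
// username ascending first, then age descending within each username.
const compoundOrder = [...userdetail].sort((a, b) => {
  if (a.username < b.username) return -1;
  if (a.username > b.username) return 1;
  return b.age - a.age;
});

// All "john" entries sit next to each other, already sorted by age descending.
const johnAges = compoundOrder
  .filter((doc) => doc.username === "john")
  .map((doc) => doc.age);

console.log(compoundOrder.map((doc) => doc.user_id));
console.log(johnAges);
```

With the key pattern {"username" : 1, "age" : 1} the ages inside each username group would run the other way, which is why the two patterns are not interchangeable for sorting.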
  16. Data Modelling in MongoDB
  17. Data Modelling in MongoDB
     • MongoDB has a flexible schema, unlike relational databases. We need not declare a schema, as we would for a relational table, before inserting data.
     • MongoDB’s collections do not enforce document structure.
     • There are 2 ways of mapping relationships:
       – References
       – Embedded documents
     • Example: References
       – Both the “contact” and “access” documents contain a reference to the “user” document.
       – These are normalized data models.
  18. Data Modelling in MongoDB..2
     • Example: Embedded Documents
       – “contact” and “access” are subdocuments embedded in the main document.
       – This is a “denormalized” data model.
  19. Data Modelling in MongoDB..3 References vs. Embedded Documents
     • References: used
       – when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
       – to represent more complex many-to-many relationships.
       – to model large hierarchical data sets.
     • Embedded documents: used when
       – we have “contains” relationships between entities.
       – we have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with or are viewed in the context of the “one” or parent documents.
       – we need applications to store related pieces of information in the same database record.
  20. Data Modelling in MongoDB..4 One-to-many relationships: Example where embedding is advantageous
     • Using References
         { _id: "chat", name: "ABC Chat" }
         { patron_id: "chat", street: "10 Simla Street", city: "Kolkata", zip: 700006 }
         { patron_id: "chat", street: "132 Lanka Street", city: "Mumbai", zip: 400032 }
       – Issue with the above: if the application frequently retrieves the address data with the name information, the application needs to issue multiple queries to resolve the references.
     • Using Embedded documents
         {
           _id: "chat",
           name: "ABC Chat",
           addresses: [
             { street: "10 Simla Street", city: "Kolkata", zip: 700006 },
             { street: "132 Lanka Street", city: "Mumbai", zip: 400032 }
           ]
         }
       – With the embedded data model, the application can retrieve the complete patron information with one query.
  21. Data Modelling in MongoDB..5 One-to-many relationships: Example where referencing is advantageous
     • Using Embedded documents
         {
           title: "MongoDB: The Definitive Guide",
           author: [ "Kristina Chodorow", "Mike Dirolf" ],
           published_date: ISODate("2010-09-24"),
           pages: 216,
           language: "English",
           publisher: { name: "O'Reilly Media", location: "CA" }
         }
         {
           title: "50 Tips and Tricks for MongoDB Developer",
           author: "Kristina Chodorow",
           published_date: ISODate("2011-05-06"),
           pages: 68,
           language: "English",
           publisher: { name: "O'Reilly Media", location: "CA" }
         }
       – Issue with the above: embedding leads to repetition of publisher data.
     • Using Reference
         { _id: "oreilly", name: "O'Reilly Media", location: "CA" }
         {
           _id: 123456789,
           title: "MongoDB: The Definitive Guide",
           author: [ "Kristina Chodorow", "Mike Dirolf" ],
           published_date: ISODate("2010-09-24"),
           pages: 216,
           language: "English",
           publisher_id: "oreilly"
         }
         {
           _id: 234567890,
           title: "50 Tips and Tricks for MongoDB Developer",
           author: "Kristina Chodorow",
           published_date: ISODate("2011-05-06"),
           pages: 68,
           language: "English",
           publisher_id: "oreilly"
         }
       – Publisher information is kept separately in the above example to avoid repetition.
  22. Data Modelling in MongoDB..6 Tree structure with parent references
  23. Data Modelling in MongoDB..7 Modelling Tree structure with Parent reference
     • The following lines of code describe the tree structure in the previous slide
       – db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
       – db.categories.insert( { _id: "dbm", parent: "Databases" } )
       – db.categories.insert( { _id: "Databases", parent: "Programming" } )
       – db.categories.insert( { _id: "Languages", parent: "Programming" } )
       – db.categories.insert( { _id: "Programming", parent: "Books" } )
       – db.categories.insert( { _id: "Books", parent: null } )
     • The query to retrieve the parent of a node
       – db.categories.findOne( { _id: "MongoDB" } ).parent;
     • Query by the parent field to find its immediate children nodes
       – db.categories.find( { parent: "Databases" } );
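With parent references, finding all ancestors of a node takes one lookup per level. The sketch below models that walk in plain JavaScript over an in-memory copy of the same six documents; the `categories` array and the helpers `parentOf`, `ancestorsOf`, and `childrenOf` are hypothetical stand-ins for the findOne() and find() calls, not MongoDB API.

```javascript
// In-memory copy of the categories documents from the slide.
const categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null },
];

// Stand-in for db.categories.findOne( { _id: id } ).parent
function parentOf(id) {
  const node = categories.find((doc) => doc._id === id);
  return node ? node.parent : null;
}

// Walk upward one level at a time until the root (parent: null) is reached.
function ancestorsOf(id) {
  const path = [];
  let current = parentOf(id);
  while (current !== null) {
    path.push(current);
    current = parentOf(current);
  }
  return path;
}

// Stand-in for db.categories.find( { parent: id } )
function childrenOf(id) {
  return categories.filter((doc) => doc.parent === id).map((doc) => doc._id);
}

console.log(ancestorsOf("MongoDB"));
console.log(childrenOf("Databases"));
```

The parent lookup and the immediate-children query are each a single step, while the full ancestor path costs as many lookups as the tree is deep.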
  24. Data Modelling in MongoDB..8 Modelling Tree structure with Child reference
     • The following lines of code describe the same tree structure
       – db.categories.insert( { _id: "MongoDB", children: [] } );
       – db.categories.insert( { _id: "dbm", children: [] } );
       – db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } );
       – db.categories.insert( { _id: "Languages", children: [] } );
       – db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } );
       – db.categories.insert( { _id: "Books", children: [ "Programming" ] } );
     • The query to retrieve the immediate children of a node
       – db.categories.findOne( { _id: "Databases" } ).children;
     • Query by the children field to find a node's parent
       – db.categories.find( { children: "MongoDB" } );
  25. Data Modelling in MongoDB..8 Data Modelling for “Atomic” operations
     • Example (online purchase portal):
       – Step I: Insert data in a collection called “books”, including the number of available copies
       – Step II: Check if the book is available during checkout
     • Code – Step I:
         db.books.insert({
           _id: 123456789,
           title: "MongoDB: The Definitive Guide",
           author: [ "Kristina Chodorow", "Mike Dirolf" ],
           published_date: ISODate("2010-09-24"),
           pages: 216,
           language: "English",
           publisher_id: "oreilly",
           available: 3,
           checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
         });
  26. Data Modelling in MongoDB..9 Data Modelling for “Atomic” operations
     • Code – Step II:
         db.books.findAndModify({
           query: { _id: 123456789, available: { $gt: 0 } },
           update: {
             $inc: { available: -1 },
             $push: { checkout: { by: "abc", date: new Date() } }
           }
         });
       – In the above example, the db.collection.findAndModify() method is used to atomically determine if a book is available for checkout and update it with the new checkout information.
       – Embedding the available field and the checkout field within the same document ensures that the updates to these fields are in sync.
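The check-and-update logic on this slide can be modelled in plain JavaScript to show why both fields belong in one document. This is only a sketch of the semantics: `checkoutBook` is a hypothetical function, the in-memory `book` object stands in for the stored document, and the real atomicity guarantee comes from the server applying the update to a single document, which this single-threaded example does not need to enforce.

```javascript
// In-memory stand-in for one document of the books collection.
// Starting values are chosen for the demo (one copy left, no checkouts yet).
const book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  available: 1,
  checkout: [],
};

// Mirrors the shape of the findAndModify call:
//   query:  available > 0
//   update: decrement available and push a checkout record
// Returns the updated document, or null when the query does not match.
function checkoutBook(doc, user, date) {
  if (!(doc.available > 0)) {
    return null; // no copy left: nothing is modified
  }
  doc.available -= 1;
  doc.checkout.push({ by: user, date });
  return doc;
}

const firstAttempt = checkoutBook(book, "abc", new Date());
const secondAttempt = checkoutBook(book, "xyz", new Date());

console.log(firstAttempt !== null, secondAttempt !== null);
console.log(book.available, book.checkout.length);
```

Because the counter and the checkout list live in the same document, a failed availability check leaves both untouched and a successful one changes both together.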
  27. Data Modelling in MongoDB..10 Keyword based Search
     • Example: Perform a keyword based search in a collection “volumes”
       – Step I: Insert data in a collection “volumes”
         db.volumes.insert({
           title : "Moby-Dick",
           author : "Herman Melville",
           published : 1851,
           ISBN : "0451526996",
           topics : [ "whaling", "allegory", "revenge", "American", "novel", "nautical", "voyage", "Cape Cod" ]
         });
         In the above example, the topics array lists the keywords on which we can search. The ISBN is stored as a string so that its leading zero is preserved.
       – Step II: Create a multikey index on the topics array
         db.volumes.ensureIndex( { topics: 1 } )
       – Step III: Search based on the keyword “voyage”
         db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
  28. Exercise
     • Create a collection named product meant for albums. The album can have several product types including Audio Album and Movie.
     • A record of an audio album can be created with the following attributes
       – Record 1 (music album): sku (character, unique identifier), type: Audio Album, title: “Remembering Manna De”, description: “By Music lovers”, physical_description (weight, width, height, depth), pricing (list, retail, savings, pct_savings), details (title, artist, genre (“bengali modern”, “bengali film”), tracks (“birth”, “childhood”, “growing up”, “end”))
       – Record 2 (movie): similar details and description pertaining to a movie (e.g. director, writer, music director, actors)
     • Assignment
       – Write a query to return all products with a discount > 10%
       – Write a query which will return the documents for the albums of a specific genre, sorted in reverse chronological order
       – Write a query which selects films that a particular actor starred in, sorted by issue date