Nosql part1 8th December



Weekend Business Analytics Praxis

Published in: Education, Technology


  1. NoSQL & MongoDB Part I (Arindam Chatterjee)
  2. Introduction to NoSQL
     • NoSQL stands for "Not only SQL"
     • NoSQL refers to non-relational database management systems that are
       – different from traditional relational database systems
       – designed for distributed data storage that typically does not require a fixed schema, avoids join operations, and scales horizontally
     • Used by Facebook, Google and other applications that handle large volumes of unstructured Web application data
  3. History of NoSQL
     • RDBMS systems have limitations with respect to
       – Scalability
       – Parallelization
       – Cost
     • Example: Google, which receives billions of requests a month across geographically distributed applications
     • These limitations led to research on the following systems:
       – GFS: distributed file system
       – Chubby: distributed coordination system
       – MapReduce: parallel execution system
       – Bigtable: column-oriented database
  4. NoSQL..Where to use
     • NoSQL is useful in the following cases
       – Online stores and portals such as Amazon, where one customer's transaction should not "lock" the database or part of it
       – Where a "committed" transaction is not critical. For example, if a buyer orders the last piece of an item while someone else clicks on the same item at the same moment, one of them may not get it; an "apology" email and a refund can settle the matter.
       – Where cost is a primary concern
     • NoSQL SHOULD NOT be used in the following cases
       – Stock exchanges or banking, where transactions are critical and cached or stale data will not work
       – Other non-financial applications where the completion of every transaction is critical
  5. Benefits of NoSQL
     • Schemaless data representation
       – Almost all NoSQL implementations offer schema-less data representation. We do not have to define a structure far in advance, and the structure can evolve over time, including adding new fields or nesting data (for example, in a JSON representation).
     • Development time
       – Reduced development time, because one does not have to deal with complex SQL queries and joins
     • Speed
       – For many workloads (for example, simple lookups by key or reads of denormalized data), NoSQL databases can be much faster than relational databases
     • Ability to plan ahead for scalability
       – Applications can be quite elastic and can handle sudden spikes in load
       – Provides horizontal scalability and partitioning across new servers
  6. List of NoSQL Databases
     • Document based: MongoDB, CouchDB, RavenDB, Terrastore
     • Key-value: Redis, Membase, Voldemort
     • XML based: BaseX, eXist
     • Column based: BigTable, Hadoop/HBase, Cassandra, SimpleDB, Cloudera
     • Graph based: Neo4J, FlockDB, InfiniteGraph
  7. Storage Types for NoSQL Databases
     • Column-oriented storage
       – Data is stored as columns, as opposed to rows (as in a traditional RDBMS)
       – Used for Online Analytical Processing (OLAP) type databases
     • Example: we want to store the following information

       Employee ID | First Name | Last Name | Dept
       1234        | Asim       | Das       | HR
       3242        | Noel       | David     | Marketing
       5678        | Raj        | Malhotra  | Production
       4543        | Rohan      | Singh     | R&D

       Traditional RDBMS approach: data serialized row by row
         1234, Asim, Das, HR
         3242, Noel, David, Marketing
         5678, Raj, Malhotra, Production
         4543, Rohan, Singh, R&D

       Column-oriented approach: data stored column by column
         1234, 3242, 5678, 4543
         Asim, Noel, Raj, Rohan
         Das, David, Malhotra, Singh
         HR, Marketing, Production, R&D
     • Advantages
       – A new column can be added without having to fill in default values for existing rows
       – Efficient for computing maxima, minima, averages and sums, specifically on large datasets
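To make the two layouts concrete, here is a toy JavaScript sketch (plain Node.js, no database) that builds both serializations from the employee table above. The field names id, first, last, dept and the helpers serializeRows and toColumns are hypothetical, chosen only for illustration.

```javascript
// Toy illustration (not a real column store): the employee table serialized
// row by row, as a traditional RDBMS would, and column by column, as a
// column-oriented store would.
const rows = [
  { id: 1234, first: "Asim", last: "Das", dept: "HR" },
  { id: 3242, first: "Noel", last: "David", dept: "Marketing" },
  { id: 5678, first: "Raj", last: "Malhotra", dept: "Production" },
  { id: 4543, first: "Rohan", last: "Singh", dept: "R&D" },
];

// Row-oriented: all values of one record are stored together.
function serializeRows(table) {
  return table.map((record) => Object.values(record).join(", "));
}

// Column-oriented: all values of one column are stored together.
function toColumns(table) {
  const columns = {};
  for (const record of table) {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);
// An aggregate such as a maximum reads only one column's values.
const maxId = Math.max(...columns.id);
```

Computing the maximum id touches only columns.id, which is the access pattern the "Advantages" bullet refers to.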
  8. Storage Types for NoSQL Databases..2
     • Document-oriented storage
       – Allows inserting, retrieving, and manipulating semi-structured data
       – Documents themselves act as records (or rows)
       – Two records may have completely different sets of fields or columns
       – Records may or may not adhere to a specific schema
       – Most databases in this category use XML, JSON or BSON data types
     • Example: different records contain different levels of employee information
       Record 1
         {"EmployeeID": "SM1", "FirstName": "Anuj", "LastName": "Sharma", "Age": 45, "Salary": 10000000}
       Record 2
         {"EmployeeID": "MM2", "FirstName": "Anand", "Age": 34, "Salary": 5000000,
          "Address": {"Line1": "123, 4th Street", "City": "Bangalore", "State": "Karnataka"},
          "Projects": ["nosql-migration", "top-secret-007"]}
  9. Storage Types for NoSQL Databases..3
     • Key-value storage
       – Similar to document-oriented storage, with the following differences
       – Unlike a document store, which can create a key when a new document is inserted, a key-value store requires the key to be specified
       – Unlike a document store, where the value can be indexed and queried, a key-value store treats the value as opaque, so the key must be known to retrieve the value
     • Advantages
       – Key-value stores are optimized for querying against keys
       – They make great in-memory caches
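A minimal sketch of the key-value contract described above, assuming nothing beyond a JavaScript Map: the caller supplies every key, and values are stored as opaque strings, so they cannot be queried by content. The KeyValueStore class is hypothetical and does not reproduce the API of Redis, Voldemort or any other product.

```javascript
// Toy key-value store: values are opaque blobs, so the only supported
// lookup is by the exact key. A sketch, not any real product's API.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // The caller must always supply the key; the store never generates one.
  put(key, value) {
    this.data.set(key, JSON.stringify(value)); // stored as an opaque string
  }
  // Without the exact key there is no way to reach a value.
  get(key) {
    const raw = this.data.get(key);
    return raw === undefined ? undefined : JSON.parse(raw);
  }
}

const cache = new KeyValueStore();
cache.put("session:42", { user: "abc123", cart: ["book"] });
const hit = cache.get("session:42");
const miss = cache.get("session:43"); // undefined: unknown key
```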
  10. RDBMS vs NoSQL
     RDBMS
     • Structured and organized data
     • Structured Query Language (SQL)
     • Data and its relationships are stored in separate tables
     • Follows ACID rules
     • Data Manipulation Language, Data Definition Language
     • Tight consistency
     • ACID: Atomic, Consistent, Isolated, Durable
     NoSQL
     • No declarative query language
     • No predefined schema
     • Key-value pair storage, column store, document store, graph databases
     • Eventual consistency rather than ACID properties
     • Unstructured and unpredictable data
     • CAP theorem
     • Prioritizes high performance, high availability and scalability
     • BASE: Basically Available, Soft state, Eventual consistency
     • CAP: Consistency, Availability, Partition tolerance
  11. CAP
     • CAP theorem (Brewer's theorem): three basic requirements that are in tension with one another when designing applications for a distributed architecture
       – Consistency: the data in the database remains consistent after an operation executes. For example, after an update operation all clients see the same data.
       – Availability: the system is always on (guaranteed service availability), with no downtime.
       – Partition tolerance: the system continues to function even when communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
     • It is theoretically impossible to fulfill all three requirements C, A and P at the same time
     • A distributed system can therefore guarantee at most 2 of the 3 requirements
     • Accordingly, current NoSQL databases follow different combinations of C, A and P from the CAP theorem
  12. BASE
     • A BASE system gives up on strong (immediate) consistency
       – Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
       – Soft state indicates that the state of the system may change over time, even without input, because of the eventual consistency model.
       – Eventual consistency indicates that the system will become consistent over time, provided that it does not receive input during that time.
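The soft-state and eventual-consistency ideas can be sketched with two in-memory replicas. This is a toy model under stated assumptions: writes land on a primary and are copied to a secondary only when replicate() runs, standing in for asynchronous background replication. All class and method names are hypothetical.

```javascript
// Toy model of eventual consistency with a primary and a secondary replica.
class EventuallyConsistentStore {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes not yet applied to the secondary
  }
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]); // replicated later, not immediately
  }
  readFromSecondary(key) {
    return this.secondary.get(key); // may be stale until replicate() runs
  }
  // Apply all queued writes; once no new writes arrive, the replicas agree.
  replicate() {
    for (const [key, value] of this.pending) this.secondary.set(key, value);
    this.pending = [];
  }
}

const db = new EventuallyConsistentStore();
db.write("stock:item1", 1);
const before = db.readFromSecondary("stock:item1"); // stale read
db.replicate();
const after = db.readFromSecondary("stock:item1");
```

The stale read before replicate() is exactly the window that BASE systems accept in exchange for availability.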
  13. MongoDB
  14. MongoDB
     • Open-source database written in C++
     • Document-oriented database
       – Example format: FirstName="Arun", Address="St. Xavier's Road", Spouse=[{Name:"Kiran"}], Children=[{Name:"Rihit", Age:8}]
     • Used to store data for very high-performance applications with unforeseen growth in data
       – If load increases (more storage space, more processing power), it can be distributed to other nodes across computer networks (sharding)
     • MongoDB supports the Map/Reduce framework for batch processing of data and aggregation operations
       – Map: a master node takes an input, splits it into smaller sections, and sends them to worker nodes. These nodes may in turn split their sections further and pass them on to other nodes. Each node processes its part of the problem and sends the result back to the master node.
       – Reduce: the master node aggregates those results to produce the output.
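The map and reduce steps can be illustrated in a single process. This sketch is plain JavaScript, not MongoDB's own map/reduce command: mapFn emits key/value pairs, the pairs are grouped by key, and reduceFn combines each group. The orders data and the helper names are made up for illustration.

```javascript
// Toy single-process map/reduce: map emits (key, value) pairs, pairs are
// grouped by key, and reduce combines the values of each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn(doc, emit); // "map" phase
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values); // "reduce" phase
  return result;
}

const orders = [
  { cust: "A", amount: 50 },
  { cust: "B", amount: 20 },
  { cust: "A", amount: 30 },
];

// Total order amount per customer.
const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.cust, doc.amount),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
```

In a distributed setting the map calls and the per-key reduce calls would run on different nodes; here they simply run in a loop.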
  15. RDBMS vs. MongoDB
       RDBMS          | MongoDB
       Record/Row     | Document/Object
       Table          | Collection
       Column         | Key, field
       Value          | Value
       Index          | Index
       Table join     | Embedded documents and linking
  16. NoSQL operations in MongoDB
     • Creating a table (collection)
       SQL schema statement:
         CREATE TABLE users (
           id MEDIUMINT NOT NULL AUTO_INCREMENT,
           user_id Varchar(30),
           age Number,
           status char(1),
           PRIMARY KEY (id)
         )
       MongoDB statement:
         db.users.insert( { user_id: "abc123", age: 55, status: "A" } )
         Alternatively: db.createCollection("users")
     • In MongoDB, collections are created implicitly on the first insert() operation. The primary key _id is added automatically if the _id field is not specified.
     • Reference: see insert() and db.createCollection() for more information.
  17. NoSQL operations in MongoDB
     • Altering a table (collection)
       Adding a column
         SQL:     ALTER TABLE users ADD join_date DATETIME
         MongoDB: db.users.update( { }, { $set: { join_date: new Date() } }, { multi: true } )
       Dropping a column
         SQL:     ALTER TABLE users DROP COLUMN join_date
         MongoDB: db.users.update( { }, { $unset: { join_date: "" } }, { multi: true } )
     • Collections do not describe or enforce the structure of their documents, i.e. there is no structural alteration at the collection level; adding or removing a field is an update applied to the documents themselves.
     • Reference: see Data Modeling Considerations for MongoDB Applications, update(), $set and $unset for more information.
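Because there is no collection-level schema, "adding a column" really means setting a field on every document. The toy sketch below (plain JavaScript arrays standing in for a collection; setField and unsetField are hypothetical helpers) mimics what the two updates with an empty filter and { multi: true } do to each document.

```javascript
// Toy model: apply the same $set / $unset to every document, one by one.
const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "abc001", age: 35, status: "U", nickname: "x" }, // extra field is fine
];

// Like $set with an empty filter and multi: true.
function setField(docs, field, value) {
  for (const doc of docs) doc[field] = value;
}

// Like $unset with an empty filter and multi: true.
function unsetField(docs, field) {
  for (const doc of docs) delete doc[field];
}

setField(users, "join_date", new Date("2013-12-08"));
const addedToAll = users.every((d) => d.join_date instanceof Date);
unsetField(users, "join_date");
const removedFromAll = users.every((d) => !("join_date" in d));
```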
  18. NoSQL operations in MongoDB
     • INSERT and SELECT operations
       Inserting data
         SQL:     INSERT INTO users(user_id, age, status) VALUES ("abc001", 35, "U")
         MongoDB: db.users.insert( { user_id: "abc001", age: 35, status: "U" } )
       Selecting data
         SQL:     SELECT * FROM users WHERE status = "A"
         MongoDB: db.users.find( { status: "A" } )
     • Reference: see insert() and find() for more information
     • Use pretty() to display data in a formatted way: db.users.find().pretty();
  19. NoSQL operations in MongoDB
     • UPDATE and DELETE operations
       Update
         SQL:     UPDATE users SET status = "C" WHERE age > 10
         MongoDB: db.users.update( { age: { $gt: 10 } }, { $set: { status: "C" } }, { multi: true } )
         SQL:     UPDATE users SET age = age + 5 WHERE status = "U"
         MongoDB: db.users.update( { status: "U" }, { $inc: { age: 5 } }, { multi: true } )
       Delete
         SQL:     DELETE FROM users WHERE status = "D"
         MongoDB: db.users.remove( { status: "D" } )
     • Reference: see update(), $gt, $inc, $set and remove() for more information
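The role of the multi option is easy to miss: in the mongo shell, update() modifies only the first matching document unless multi: true is passed. The following toy model (plain JavaScript; update is a hypothetical helper that takes a predicate function instead of a query document) sketches that behaviour together with $set and $inc.

```javascript
// Toy sketch of update() semantics, not MongoDB's implementation.
function update(docs, predicate, { $set = {}, $inc = {} }, { multi = false } = {}) {
  let modified = 0;
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    Object.assign(doc, $set); // $set: overwrite or create the listed fields
    for (const [field, delta] of Object.entries($inc)) {
      // $inc on a missing field creates it with the increment as its value
      doc[field] = doc[field] === undefined ? delta : doc[field] + delta;
    }
    modified += 1;
    if (!multi) break; // without multi, stop after the first match
  }
  return modified;
}

const users = [
  { user_id: "a", age: 20, status: "U" },
  { user_id: "b", age: 30, status: "U" },
  { user_id: "c", age: 5, status: "A" },
];
// Like: db.users.update({ status: "U" }, { $inc: { age: 5 } }, { multi: true })
const n1 = update(users, (d) => d.status === "U", { $inc: { age: 5 } }, { multi: true });
// Like the same $set update as above, but WITHOUT { multi: true }
const n2 = update(users, (d) => d.age > 10, { $set: { status: "C" } });
```

After the second call only the first matching user has status "C"; the SQL UPDATE on the slide corresponds to the multi: true form.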
  20. More examples in MongoDB
     • Run the database (Windows):
       – Open a command prompt
       – Go to the bin folder of the MongoDB installation directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-1004/bin)
       – Run mongod.exe
     • Connect to the database:
       – Open another command prompt
       – Go to the bin folder of the MongoDB installation directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-1004/bin)
       – Run mongo
       – A mongo "shell" will open
     • Show databases: show dbs
     • Select a database: use <database name>
  21. More examples in MongoDB..2
     • Switch to the database testData (use testData;)
     • Task 1: Insert data directly. The following operation inserts a row/document into the collection testData:
       – db.testData.insert( { name : "OtherDB" } );
     • Task 2: Insert data using JavaScript variables. The following operations insert 2 rows/documents into the collection testData:
       – j = { name : "mongo" }
       – k = { x : 3 }
       – db.testData.insert( j );
       – db.testData.insert( k );
     • Task 3: Check that the 3 records were inserted into the collection testData:
       – db.testData.find();
  22. More examples in MongoDB..3
     Inserting multiple documents using a for loop
     • Task: run the following loop from the mongo shell
       – for (var i = 1; i <= 25; i++) db.testData.insert( { x : i } )
     • Use find() to see the result. The 25 new documents appear after the 3 inserted earlier; the shell prints 20 documents at a time (type it to see more).
       – db.testData.find()
     • Note: if the collection and database do not exist, MongoDB creates them implicitly before inserting documents.
  23. More examples in MongoDB..4
     Queries with conditions
     • Task: the previous example created 25 documents. We want to show the documents where x is less than 15, limiting the display to the first 5.
       – db.testData.find( { "x": { $lt: 15 } } ).limit(5)
       – { "x": { $lt: 15 } } is the condition x < 15; .limit(5) limits the output to 5 documents
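As a plain-JavaScript analogue (no database involved), the sketch below builds the 25 documents from the for loop and applies the equivalent of find({ x: { $lt: 15 } }).limit(5) with filter and slice. One caveat: the real testData collection also holds the three documents from the earlier tasks, including { x : 3 }, which also satisfies x < 15 and could therefore appear among the five results; without sort(), MongoDB does not guarantee which five matching documents are returned.

```javascript
// Build the 25 documents that the shell loop inserts.
const testData = [];
for (let i = 1; i <= 25; i++) testData.push({ x: i });

// Equivalent of find({ x: { $lt: 15 } }).limit(5) on this array only.
const result = testData
  .filter((doc) => doc.x < 15) // condition: x < 15
  .slice(0, 5); // limit: first 5 matches
```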
  24. More examples in MongoDB..5
     Inserting with an explicit _id
     • Task: insert a document into a collection named "inventory" with an explicit _id, type and quantity
       – db.inventory.insert( { _id: 10, type: "misc", item: "card", qty: 15 } );   (_id: 10 is the explicit ID)
     Inserting with the update() method
     • Call the update() method with the upsert flag to create a new document if no document matches the update's query criteria
       – db.inventory.update( { type: "book", item : "journal" }, { $set : { qty: 10 } }, { upsert : true } );
     • The above example creates a new document if no document in the inventory collection contains { type: "book", item : "journal" }, and assigns it a unique _id
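The upserted document ends up with type, item and qty because MongoDB builds it from the equality fields of the query plus the update operators. A toy sketch of that rule in plain JavaScript follows; upsert is a hypothetical helper that handles only equality queries and $set, and the numeric _id counter stands in for MongoDB's generated ObjectId.

```javascript
// Toy sketch of upsert semantics, not MongoDB's implementation.
let nextId = 1; // stand-in for ObjectId generation (hypothetical)

function upsert(docs, query, $set) {
  // Equality-only matching: every query field must be equal.
  const match = docs.find((doc) =>
    Object.entries(query).every(([k, v]) => doc[k] === v)
  );
  if (match) {
    Object.assign(match, $set); // a document matched: update it
    return match;
  }
  // No match: new document = generated _id + query equality fields + $set fields.
  const created = { _id: nextId++, ...query, ...$set };
  docs.push(created);
  return created;
}

const inventory = [{ _id: 10, type: "misc", item: "card", qty: 15 }];
upsert(inventory, { type: "book", item: "journal" }, { qty: 10 }); // inserts
upsert(inventory, { type: "book", item: "journal" }, { qty: 12 }); // updates
```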
  25. More examples in MongoDB..6
     Inserting with the save() method
     • To insert a document with the save() method, pass the method a document that does not contain the _id field, or a document that contains an _id field that does not exist in the collection
       – db.inventory.save( { type: "book", item: "notebook", qty: 40 } )
     • The above example creates a new document in the inventory collection, adds the _id field and assigns it a unique value
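The three cases of save() (no _id, an _id not yet in the collection, an _id already present) can be sketched as follows. This is a toy model in plain JavaScript: save is a hypothetical helper, and string ids such as gen1 stand in for generated ObjectIds. Note the last case: an existing document is replaced as a whole, not merged field by field.

```javascript
// Toy sketch of save() behaviour, not MongoDB's implementation.
let counter = 0; // stand-in for ObjectId generation (hypothetical)

function save(docs, doc) {
  if (doc._id === undefined) {
    // No _id: insert with a generated _id.
    const inserted = { _id: `gen${++counter}`, ...doc };
    docs.push(inserted);
    return inserted;
  }
  const i = docs.findIndex((d) => d._id === doc._id);
  if (i >= 0) docs[i] = doc; // _id exists: full replacement, not a merge
  else docs.push(doc); // _id not found: insert as given
  return doc;
}

const inventory = [];
save(inventory, { type: "book", item: "notebook", qty: 40 });
save(inventory, { _id: 10, type: "misc", item: "card", qty: 15 });
save(inventory, { _id: 10, item: "postcard" });
```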
  26. More examples in MongoDB..6
     Conditional queries
     • Task: select all documents in the inventory collection where the value of the type field is either 'food' or 'snacks'
       – db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } );
     • Task: "AND" condition. Specify an equality match on the field type AND a less-than ($lt) comparison on the field price
       – db.inventory.find( { type: 'food', price: { $lt: 9.95 } } );
     • Task: "OR" condition. Select all documents in the collection where the field qty has a value greater than ($gt) 100 OR the value of the price field is less than ($lt) 9.95
       – db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } );
  27. More examples in MongoDB..7
     Compound queries (using both "AND" and "OR")
     • Task: select all documents in the collection where the value of the type field is 'food' and either qty is greater than ($gt) 100 or price is less than ($lt) 9.95
       – db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } );
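A minimal sketch of how these query documents are evaluated, assuming only the operators used in the conditional and compound queries ($in, $lt, $gt, $or). It is a toy matcher written for illustration, not MongoDB's query engine: top-level fields are combined with AND, and $or succeeds if any clause matches. The inventory data is made up.

```javascript
// Toy matcher for a tiny subset of MongoDB query syntax.
function matchValue(value, cond) {
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    // Operator document such as { $lt: 9.95 }: every operator must hold.
    return Object.entries(cond).every(([op, arg]) => {
      if (op === "$in") return arg.includes(value);
      if (op === "$lt") return value < arg;
      if (op === "$gt") return value > arg;
      throw new Error(`unsupported operator ${op}`);
    });
  }
  return value === cond; // plain equality
}

function matches(doc, query) {
  // Top-level entries are ANDed; $or needs at least one matching clause.
  return Object.entries(query).every(([key, cond]) =>
    key === "$or"
      ? cond.some((clause) => matches(doc, clause))
      : matchValue(doc[key], cond)
  );
}

const inventory = [
  { type: "food", qty: 150, price: 12.0 },
  { type: "food", qty: 20, price: 5.5 },
  { type: "snacks", qty: 300, price: 2.0 },
  { type: "tools", qty: 5, price: 40.0 },
];
const find = (query) => inventory.filter((doc) => matches(doc, query));

const inResult = find({ type: { $in: ["food", "snacks"] } });
const andResult = find({ type: "food", price: { $lt: 9.95 } });
const compound = find({ type: "food", $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] });
```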
  28. More examples in MongoDB..8
     Matching on "subdocuments"
     • When a field holds an embedded document (i.e. a subdocument), we can either specify the entire subdocument as the value of the field, or "reach into" the subdocument using "dot" notation to specify values for individual fields in the subdocument.
     • In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in exactly this order:
       – db.inventory.find( { producer: { company: 'ABC123', address: '123 Street' } } );
     • In the following example, the query uses dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields:
       – db.inventory.find( { 'producer.company': 'ABC123' } );
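The difference between the two subdocument queries can be sketched in plain JavaScript. In this toy version (hypothetical helpers exactSubdocMatch and getPath), JSON.stringify serves as a simple order-sensitive comparison that mimics the exact-match rule, and getPath walks a dotted path such as producer.company.

```javascript
// Exact subdocument match: same fields, same values, same order.
function exactSubdocMatch(value, expected) {
  return JSON.stringify(value) === JSON.stringify(expected);
}

// Dot notation: walk the path and return the leaf value (or undefined).
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((cur, key) => (cur == null ? undefined : cur[key]), doc);
}

const doc = { producer: { company: "ABC123", address: "123 Street", phone: "555" } };

// Fails: the stored subdocument has an extra field.
const exact = exactSubdocMatch(doc.producer, { company: "ABC123", address: "123 Street" });
// Succeeds: only producer.company is compared.
const dotted = getPath(doc, "producer.company") === "ABC123";
// Fails: same fields, different order.
const reordered = exactSubdocMatch(
  { address: "123 Street", company: "ABC123" },
  { company: "ABC123", address: "123 Street" }
);
```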
  29. More examples in MongoDB..9
     Matching on arrays
     • To specify an equality match on an array, use the query document { <field>: <value> }, where <value> is the array to match. Equality matches on an array require that the array field match the specified <value> exactly, including the element order.
     • Exact match: the following query matches all documents where the value of the field tags is an array that holds exactly three elements, 'fruit', 'food', and 'citrus', in this order:
       – db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } );
     • Matching array elements: the following query matches all documents where the value of the field tags is an array that contains 'fruit' as one of its elements:
       – db.inventory.find( { tags: 'fruit' } );
     • Matching a specific element: the following query uses dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit':
       – db.inventory.find( { 'tags.0' : 'fruit' } )
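The three array query forms can be mirrored with ordinary JavaScript array operations. The helper names below are hypothetical, and this toy sketch only covers arrays of scalar values.

```javascript
// { tags: [...] }: same length, same elements, same order.
function arrayExactMatch(arr, expected) {
  return arr.length === expected.length && arr.every((v, i) => v === expected[i]);
}

// { tags: 'fruit' }: the value appears anywhere in the array.
function arrayContains(arr, value) {
  return arr.includes(value);
}

// { 'tags.0': 'fruit' }: compare one position only.
function arrayIndexMatch(arr, index, value) {
  return arr[index] === value;
}

const tags = ["food", "fruit", "citrus"];
const exactOrder = arrayExactMatch(tags, ["fruit", "food", "citrus"]); // order differs
const exactSame = arrayExactMatch(tags, ["food", "fruit", "citrus"]);
const contains = arrayContains(tags, "fruit");
const firstIsFruit = arrayIndexMatch(tags, 0, "fruit"); // first element is "food"
```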
  30. More examples in MongoDB..10
     Arrays of subdocuments
     • Match a field in the subdocument using the array index: the following example selects all documents where memos contains an array whose first element (i.e. index 0) is a subdocument with the field by equal to 'shipping':
       – db.inventory.find( { 'memos.0.by': 'shipping' } )
     • Match a field without specifying the array index: the following example selects all documents where the memos field contains an array with at least one subdocument whose field by equals 'shipping':
       – db.inventory.find( { 'memos.by': 'shipping' } )
     • Match multiple fields: the following example uses dot notation to query for documents where the memos array has at least one subdocument with the field memo equal to 'on time' and at least one subdocument (not necessarily the same one) with the field by equal to 'shipping':
       – db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )
     • To require that a single subdocument satisfy both conditions, use $elemMatch:
       – db.inventory.find( { memos: { $elemMatch: { memo: 'on time', by: 'shipping' } } } )
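The following toy sketch (plain JavaScript, hypothetical helpers) shows why two dot-notation conditions on an array of subdocuments can be satisfied by different elements, while MongoDB's $elemMatch operator requires one element to satisfy all of them.

```javascript
// Dot notation with several conditions: each condition only needs
// SOME element of the array to satisfy it.
function dotNotationMatch(arr, conditions) {
  return Object.entries(conditions).every(([field, value]) =>
    arr.some((el) => el[field] === value)
  );
}

// $elemMatch-style: a SINGLE element must satisfy every condition.
function elemMatch(arr, conditions) {
  return arr.some((el) =>
    Object.entries(conditions).every(([field, value]) => el[field] === value)
  );
}

const memos = [
  { memo: "on time", by: "payment" },
  { memo: "delayed", by: "shipping" },
];
const cond = { memo: "on time", by: "shipping" };

const viaDots = dotNotationMatch(memos, cond); // conditions met by different elements
const viaElemMatch = elemMatch(memos, cond); // no single element meets both
```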
  31. More examples in MongoDB..11
     Using findOne()
       db.collection.findOne(<criteria>, <projection>)
     • Returns one document that satisfies the specified query criteria. If multiple documents satisfy the query, the method returns the first document according to the natural order, which reflects the order of documents on disk.
     • The <projection> parameter takes a document of the following form:
       – { field1: <boolean>, field2: <boolean> ... }
       – The boolean can be 1 (true, to include) or 0 (false, to exclude)
     • Example: in a collection named bios with multiple fields, return only the "name", "contribs" and "_id" fields (_id is included by default):
       – db.bios.findOne( { }, { name: 1, contribs: 1 } )
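A toy sketch of an inclusion projection like { name: 1, contribs: 1 }, in plain JavaScript: listed fields are kept, and _id is kept unless it is explicitly set to 0. The project helper and the sample bio document are hypothetical, and exclusion-style projections (listing fields with 0) are not handled.

```javascript
// Toy inclusion projection, not MongoDB's implementation.
function project(doc, projection) {
  const out = {};
  const includeId = projection._id !== 0; // _id is included by default
  if (includeId && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag && field in doc) out[field] = doc[field];
  }
  return out;
}

const bio = {
  _id: 1,
  name: { first: "Jane", last: "Doe" },
  contribs: ["Example"],
  awards: [],
};

const withId = project(bio, { name: 1, contribs: 1 });
const withoutId = project(bio, { name: 1, _id: 0 });
```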
  32. Exercise I
     • Go to database "test"
     • Insert data into a collection named userdetails with the following attributes:
       – {"user_id" : "ABCDBWN", "password" : "ABCDBWN", "date_of_join" : "15/10/2010", "education" : "B.C.A.", "profession" : "DEVELOPER", "interest" : "MUSIC", "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"], "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"], "community_members" : [500, 200, 1500], "friends_id" : ["MMM123", "NNN123", "OOO123"], "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]}
     • View the inserted data using find() and pretty()
     • Insert another document into the same collection with the following attributes:
       – {"user_id" : "testuser", "password" : "testpassword", "date_of_join" : "16/10/2010", "education" : "M.C.A.", "profession" : "CONSULTANT", "interest" : "MUSIC", "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"], "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"], "community_members" : [500, 200, 1500], "friends_id" : ["MMM123", "NNN123", "OOO123"], "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]}
  33. Exercise I..contd
     • Use update() to change the password to "Newpd" and date_of_join to "12/12/2010" for user_id "ABCDBWN"
     • Fetch only the "user_id" of all documents in the collection 'userdetails' whose educational qualification is "M.C.A."
     • Fetch the "user_id", "password" and "date_of_join" of all documents in the collection 'userdetails' whose educational qualification is "M.C.A."
     • Remove one record from the collection userdetails where user_id = "testuser"
     • Remove the entire collection userdetails using drop()