This document provides an introduction to NoSQL and MongoDB. It explains that NoSQL refers to non-relational databases designed to handle large volumes of unstructured data across distributed systems. It discusses the history and limitations of relational databases that led to the development of NoSQL technologies. The document also outlines the different NoSQL database types, compares NoSQL with SQL databases, and provides an overview of MongoDB's features and operations.
Information technology has led us into an era in which producing, sharing and using information are part of everyday life, often without our being aware of it: it is now almost impossible not to leave a digital trail of the things we do every day, for example through digital content such as photos, videos, blog posts and everything that revolves around social networks (Facebook and Twitter in particular). In addition, with the "Internet of Things" we see a growing number of devices such as watches, bracelets, thermostats and many other items that can connect to the network and therefore generate large data streams. This explosion of data explains the birth of the term Big Data: it denotes data produced in large quantities, at remarkable speed and in different formats, whose processing requires technologies and resources that go far beyond conventional data management and storage systems. It is immediately clear that (1) data storage models based on the relational model and (2) processing systems based on stored procedures and grid computations are not applicable in these contexts. Regarding point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when dealing with big data, variability, that is, the lack of a fixed structure, is also a significant problem. This has given a boost to the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."
These databases are distributed, open source, horizontally scalable, without a predetermined schema (key-value, column-oriented, document-based and graph-based), easily replicated, typically without full ACID guarantees, and able to handle large amounts of data. They are integrated, or can be integrated, with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2). The relational model taught in basic database design courses has many limitations compared with the demands posed by new Big Data applications, which use NoSQL databases to store data and MapReduce to process large amounts of data.
Course Website http://pbdmng.datatoknowledge.it/
Contact me for further information and to download the slides
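The MapReduce paradigm mentioned in the description above can be illustrated with a minimal, single-process sketch. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are assumptions made for this illustration and are not part of Hadoop or any other framework; a real cluster would run the map and reduce calls on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, between the map and reduce steps."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Everything runs sequentially here; the three phases are kept
    # separate only to mirror the structure of the paradigm.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["big data needs big storage", "data streams"])
print(counts)
```

Because each map call only sees one document and each reduce call only sees one key, both steps can in principle be spread over many machines, which is what makes the model attractive for large data sets.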
This document discusses document databases and MongoDB. It defines documents as the main concept, which are simply named collections of fields that can be in formats like JSON, XML, or BSON. It covers designing document databases through modeling data as documents, denormalizing or normalizing documents, handling complex relations, indexing, and summarizing. Features of MongoDB like consistency, replication, transactions, availability, querying and scaling are examined. Examples of suitable use cases and when not to use document databases are provided. The document includes samples of documents, architectures, cases for product catalogs and order histories, and MongoDB tools.
This document discusses different types of distributed databases. It covers data models like relational, aggregate-oriented, key-value, and document models. It also discusses different distribution models like sharding and replication. Consistency models for distributed databases are explained including eventual consistency and the CAP theorem. Key-value stores are described in more detail as a simple but widely used data model with features like consistency, scaling, and suitable use cases. Specific key-value databases like Redis, Riak, and DynamoDB are mentioned.
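As a rough illustration of sharding applied to a key-value model, the sketch below spreads keys over several in-memory shards by hashing the key. The class `ShardedKeyValueStore` and its methods are hypothetical names chosen for this example; they do not reflect how Redis, Riak or DynamoDB are implemented.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedKeyValueStore:
    """Toy key-value store that partitions keys across a fixed set of shards."""

    def __init__(self, shard_count: int) -> None:
        # Each shard is a plain dict standing in for a separate server.
        self.shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        """Pick a shard deterministically from a hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: str) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[self.shard_for(key)].get(key)


store = ShardedKeyValueStore(shard_count=3)
for user in ["alice", "bob", "carol", "dave"]:
    store.put(user, f"profile of {user}")

print(store.get("carol"))
print([len(shard) for shard in store.shards])
```

Plain modulo hashing is used here only for brevity: changing the number of shards would move most keys, which is why production systems tend to rely on schemes such as consistent hashing or range partitioning.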
MongoDB is a horizontally scalable, schema-free, document-oriented NoSQL database. It stores data in flexible, JSON-like documents, allowing for easy storage and retrieval of data without rigid schemas. MongoDB provides high performance, high availability, and easy scalability. Some key features include embedded documents and arrays to reduce joins, dynamic schemas, replication and failover for availability, and auto-sharding for horizontal scalability.
In this lecture we analyze document-oriented databases. In particular, we consider why they were the first approach to NoSQL and what their main features are. Then we analyze MongoDB as an example: we consider its data model, CRUD operations, write concerns and scaling (replication and sharding).
Finally, we present other document-oriented databases and discuss when to use, or not to use, document-oriented databases.
This presentation covers NoSQL databases and the types of NoSQL database. It also discusses how MongoDB works, MongoDB security and MongoDB sharding.
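The idea behind a write concern, namely how many replicas must acknowledge a write before it is reported as successful, can be sketched with a toy model. The `Replica` class and `write` function below are assumptions made for illustration only; they do not reproduce MongoDB's replication protocol, in which writes go to a primary and are then propagated to secondaries.

```python
from typing import Dict, List


class Replica:
    """Toy replica that stores documents in memory and may be unavailable."""

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up
        self.data: Dict[str, dict] = {}

    def apply(self, doc_id: str, document: dict) -> bool:
        """Store the document and acknowledge, unless the replica is down."""
        if not self.up:
            return False
        self.data[doc_id] = dict(document)
        return True


def write(replicas: List[Replica], doc_id: str, document: dict, w: int) -> bool:
    """Send the write to every replica; report success if at least w acknowledge."""
    acks = sum(1 for replica in replicas if replica.apply(doc_id, document))
    return acks >= w


replicas = [Replica("a"), Replica("b"), Replica("c", up=False)]
ok_with_two = write(replicas, "42", {"title": "NoSQL"}, w=2)
ok_with_three = write(replicas, "42", {"title": "NoSQL"}, w=3)
print(ok_with_two, ok_with_three)
```

A higher `w` gives stronger durability at the cost of availability: with one replica down, the write that requires three acknowledgements is reported as failed even though two replicas already hold the data.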
MongoDB NoSQL database a deep dive -MyWhitePaper, by Rajesh Kumar
This document provides an overview of MongoDB, a popular NoSQL database. It discusses why NoSQL databases were created, the different types of NoSQL databases, and focuses on MongoDB. MongoDB is a document-oriented database that stores data in JSON-like documents with dynamic schemas. It provides horizontal scaling, high performance, and flexible data models. The presentation covers MongoDB concepts like databases, collections, documents, CRUD operations, indexing, sharding, replication, and use cases. It provides examples of modeling data in MongoDB and considerations for data and schema design.
Hands on Big Data Analysis with MongoDB - Cloud Expo Bootcamp NYC, by Laura Ventura
One of the most popular NoSQL databases, MongoDB is one of the building blocks for big data analysis. MongoDB can store unstructured data and makes it easy to analyze files by commonly available tools. This session will go over how big data analytics can improve sales outcomes in identifying users with a propensity to buy by processing information from social networks. All attendees will have a MongoDB instance on a public cloud, plus sample code to run Big Data Analytics.
This document provides information about MongoDB, including:
- MongoDB is a cross-platform document-oriented database that provides high performance, high availability, and easy scalability.
- Data is stored in MongoDB in the form of JSON-like documents with dynamic schemas, instead of using fixed table schemas as in SQL-based databases.
- Relationships between documents can be modeled either by embedding one document inside another or by storing references between separate documents.
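The two ways of modelling relationships listed above, embedding and referencing, can be sketched with plain Python dictionaries standing in for JSON-like documents. The field names and the helper `resolve_customer` are hypothetical and were chosen only for this example.

```python
import json
from typing import Dict

# Embedded: the customer data lives inside the order document,
# so one read returns everything needed to display the order.
order_embedded = {
    "_id": 1,
    "customer": {"name": "Ada", "city": "Bari"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Referenced: the order stores only the customer's identifier,
# and the customer document is kept in a separate collection.
customers: Dict[int, dict] = {10: {"_id": 10, "name": "Ada", "city": "Bari"}}
order_referenced = {
    "_id": 1,
    "customer_id": 10,
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}


def resolve_customer(order: dict, customer_collection: Dict[int, dict]) -> dict:
    """Application-side lookup that follows the reference to the customer."""
    return customer_collection[order["customer_id"]]


print(json.dumps(order_embedded, indent=2))
print(resolve_customer(order_referenced, customers)["name"])
```

Embedding avoids the second lookup but duplicates customer data across orders; referencing keeps one copy of the customer at the price of an extra read.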
MongoDB is a document-oriented NoSQL database that stores data as JSON-like documents. It is schema-less, scales easily, supports dynamic queries on documents, and stores data in BSON format. MongoDB is well suited to high write loads, high availability requirements, and large, changing datasets. Installation is simple, and it supports replication and sharding for availability and scaling. Data can be embedded or referenced between documents. Indexes and text search are supported. Programming involves JavaScript and MongoDB methods.
This document provides an overview and introduction to NoSQL databases. It discusses key-value stores like Dynamo and BigTable, which are distributed, scalable databases that sacrifice complex queries for availability and performance. It also explains column-oriented databases like Cassandra that scale to massive workloads. The document compares the CAP theorem and consistency models of these databases and provides examples of their architectures, data models, and operations.
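One consistency mechanism often associated with Dynamo-style stores is quorum reads and writes: with N replicas, a write contacts W of them and a read contacts R, and choosing R + W > N guarantees that every read overlaps the latest write. The sketch below is a toy model under that assumption; the class `QuorumStore` is hypothetical and deliberately picks the worst case, writing to the first W replicas and reading from the last R.

```python
from typing import List, Optional, Tuple

Versioned = Tuple[int, str]  # (version number, value)


class QuorumStore:
    """Toy replicated register with configurable read and write quorums."""

    def __init__(self, n: int, r: int, w: int) -> None:
        # r and w are assumed to be between 1 and n.
        self.replicas: List[Optional[Versioned]] = [None] * n
        self.r = r
        self.w = w
        self.version = 0

    def write(self, value: str) -> None:
        """Update only the first w replicas, leaving the others stale."""
        self.version += 1
        for index in range(self.w):
            self.replicas[index] = (self.version, value)

    def read(self) -> Optional[str]:
        """Read the last r replicas and return the newest value seen."""
        seen = [entry for entry in self.replicas[-self.r:] if entry is not None]
        return max(seen)[1] if seen else None


strong = QuorumStore(n=3, r=2, w=2)   # R + W > N: quorums overlap
strong.write("v1")
strong.write("v2")

weak = QuorumStore(n=3, r=1, w=1)     # R + W <= N: a read may miss the write
weak.write("v1")

print(strong.read(), weak.read())
```

Smaller quorums make operations cheaper and more available, but as the second store shows, a read can then return stale or missing data until the replicas converge.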
The document discusses factors to consider when selecting a NoSQL database management system (DBMS). It provides an overview of different NoSQL database types, including document databases, key-value databases, column databases, and graph databases. For each type, popular open-source options are described, such as MongoDB for document databases, Redis for key-value, Cassandra for columnar, and Neo4j for graph databases. The document emphasizes choosing a NoSQL solution based on application needs and recommends commercial support for production systems.
This presentation explains why NoSQL databases emerged as an alternative to SQL databases, even though SQL databases have been a successful technology for more than twenty years. It also discusses the characteristics and classifications of NoSQL databases. Finally, the slides briefly cover four NoSQL databases.
1. Introduction to the Course "Designing Data Bases with Advanced Data Models... by Fabio Fumarola
Course Website http://pbdmng.datatoknowledge.it/
The presentation provides an overview of NoSQL databases, including a brief history of databases, the characteristics of NoSQL databases, and different data models such as key-value, document, column-family and graph databases. It discusses why NoSQL databases were developed, given that relational databases do not scale well for distributed applications. The CAP theorem is also explained: a distributed system can guarantee at most two of consistency, availability and partition tolerance at the same time.
MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.
The first part covers NoSQL databases and the shift in their design paradigm, focuses a little more on document-based NoSQL databases, and draws some parallels with SQL databases.
The second part is a hands-on session on MongoDB using the mongo shell; the slides alone are of limited help for this part.
Finally, it touches on advanced topics such as data replication for disaster recovery, handling big data with map-reduce, and sharding.
This document provides an introduction to NoSQL databases. It discusses the history and limitations of relational databases that led to the development of NoSQL databases. The key motivations for NoSQL databases are that they can handle big data and provide better scalability and flexibility than relational databases. The document describes some core NoSQL concepts, such as the CAP theorem, and different types of NoSQL databases, such as key-value, columnar, document and graph databases. It also outlines some remaining research challenges in the area of NoSQL databases.
Cassandra is an open source, distributed database management system designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability and performance, as well as flexibility in schemas. Cassandra finds use in large companies like Facebook, Netflix and eBay due to its abilities to scale and perform well under heavy loads. However, it may not be suited for applications requiring many joins, transactions or strong consistency guarantees.
- MongoDB is an open-source document database that provides high performance, a rich query language, high availability through clustering, and horizontal scalability through sharding. It stores data in BSON format and supports indexes, backups, and replication.
- MongoDB is best suited to operational applications that use unstructured or semi-structured data and require large scalability and multi-datacenter support. It is not recommended for applications with complex calculations, finance data, or those that scan large data subsets.
- The next session will provide a security and replication overview and include demonstrations of installation, document creation, queries, indexes, backups, and, if possible, replication and sharding.
Cassandra from the trenches: migrating Netflix (update), by Jason Brown
Update talk on Cassandra at Netflix, presented at the Silicon Valley NoSQL meetup on 9 Feb 2012. Includes an introduction to Astyanax, an open source Cassandra client written in Java.
The document compares NoSQL and SQL databases. It notes that NoSQL databases are non-relational and have dynamic schemas that can accommodate unstructured data, while SQL databases are relational and have strict, predefined schemas. NoSQL databases offer more flexibility in data structure, but SQL databases provide better support for transactions and data integrity. The document also discusses differences in queries, scaling, and consistency between the two database types.
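The contrast between a strict, predefined schema and a dynamic one can be shown with a small sketch. It uses Python's built-in `sqlite3` module for the relational side and a list of dictionaries as a stand-in for a document collection; the table, field names and the flag `sql_accepted_new_field` are assumptions made for this example.

```python
import sqlite3

# Relational side: the table schema is fixed before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # The 'twitter' column was never declared, so this insert is rejected.
    conn.execute(
        "INSERT INTO users (id, name, twitter) VALUES (?, ?, ?)",
        (2, "Bob", "@bob"),
    )
    sql_accepted_new_field = True
except sqlite3.OperationalError:
    sql_accepted_new_field = False

# Document side: each document carries its own fields,
# so a new field needs no schema change.
documents = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Bob", "twitter": "@bob"},
]

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(sql_accepted_new_field, row_count, len(documents))
```

The relational table would need an explicit `ALTER TABLE` before accepting the new field, which is the rigidity, and also the integrity guarantee, that the comparison refers to.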
This document provides an introduction and overview of NoSQL concepts and MongoDB database. It begins with the purpose of guiding beginners and discusses how the growth of data led to the development of NoSQL technologies. It then covers the history of databases, defines key terms, and describes the different types of NoSQL databases like key-value, column-oriented, document-oriented and graph oriented. Specifics about MongoDB are provided, including conceptual understanding, basic operations like insert and find, and comparison operators. The document aims to make learning MongoDB and NoSQL easy and fun for beginners.
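The basic operations and comparison operators mentioned above can be sketched without a running database. The toy functions below imitate the shape of a query document such as `{"price": {"$gt": 5}}` over an in-memory list; `insert`, `matches` and `find` are hypothetical helpers written for this example, not MongoDB's API or implementation, and they simplify its semantics (for instance, a document that lacks a queried field never matches here).

```python
import operator
from typing import Any, Callable, Dict, List

# Mapping from operator names used in query documents to Python comparisons.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def insert(collection: List[dict], document: dict) -> None:
    """Add a copy of the document to the in-memory collection."""
    collection.append(dict(document))


def matches(document: dict, query: dict) -> bool:
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gt": 5, "$lte": 40}: all must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain form, e.g. {"name": "pen"}: equality match.
            return False
    return True


def find(collection: List[dict], query: dict) -> List[dict]:
    """Return all documents in the collection that match the query."""
    return [document for document in collection if matches(document, query)]


products: List[dict] = []
insert(products, {"name": "pen", "price": 2})
insert(products, {"name": "book", "price": 15})
insert(products, {"name": "lamp", "price": 40})

mid_range = find(products, {"price": {"$gt": 5, "$lte": 40}})
print([product["name"] for product in mid_range])
```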
Bioinformaticians constantly face challenges with data, from large data volumes to the need to integrate diverse data types. Relational databases have a long and successful history of managing data but have been unable to meet the emerging needs of big data and highly integrated data stores. This talk discusses the limitations we face when using relational data models for bioinformatics applications. It describes the features, limitations and use cases of four alternative database models: key-value databases, document databases, wide-column data stores and graph databases. Their use in bioinformatics applications is demonstrated with text mining and atherosclerosis research projects. The talk concludes with guidance on choosing an appropriate database model for varying bioinformatics requirements.
Cassandra is an open source, distributed database management system designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability and performance, as well as flexibility in schemas. Cassandra finds use in large companies like Facebook, Netflix and eBay due to its abilities to scale and perform well under heavy loads. However, it may not be suited for applications requiring many joins, transactions or strong consistency guarantees.
- Mongo DB is an open-source document database that provides high performance, a rich query language, high availability through clustering, and horizontal scalability through sharding. It stores data in BSON format and supports indexes, backups, and replication.
- Mongo DB is best for operational applications using unstructured or semi-structured data that require large scalability and multi-datacenter support. It is not recommended for applications with complex calculations, finance data, or those that scan large data subsets.
- The next session will provide a security and replication overview and include demonstrations of installation, document creation, queries, indexes, backups, and replication and sharding if possible.
Cassandra from the trenches: migrating Netflix (update)Jason Brown
Update talk on Cassandra at Netflix, presented at the Silicon Valley NoSQL meetup on 9 Feb 2012. Includes an introduction to Astyanax, an open source cassandra client written in java.
The document compares NoSQL and SQL databases. It notes that NoSQL databases are non-relational and have dynamic schemas that can accommodate unstructured data, while SQL databases are relational and have strict, predefined schemas. NoSQL databases offer more flexibility in data structure, but SQL databases provide better support for transactions and data integrity. The document also discusses differences in queries, scaling, and consistency between the two database types.
This document provides an introduction and overview of NoSQL concepts and MongoDB database. It begins with the purpose of guiding beginners and discusses how the growth of data led to the development of NoSQL technologies. It then covers the history of databases, defines key terms, and describes the different types of NoSQL databases like key-value, column-oriented, document-oriented and graph oriented. Specifics about MongoDB are provided, including conceptual understanding, basic operations like insert and find, and comparison operators. The document aims to make learning MongoDB and NoSQL easy and fun for beginners.
Bioinformaticians constantly face challenges with data: from the large volumes of data to the need to integrate diverse data types. Relational databases have a long and successful history of managing data but have been unable to meet emerging needs of big data and highly integrated data stores. This talk discusses the limitations we face when using relational data models for bioinformatics applications. It describes the features, limitations and use cases of four alternative database models: key value databases, document databases, wide column data stores and graph databases. Use in bioinformatics applications is demonstrate with text mining and atherosclerosis research projects. The talk concludes with guidance on choosing an appropriate database model for varying bioinformatics requirements.
This document provides an overview of NoSQL databases and their concepts. It begins with an introduction from the presenter and an agenda outlining the topics to be covered. The document then discusses the history and evolution of database management systems. It introduces relational database concepts and outlines some of the limitations of relational databases in handling big data. This leads to a discussion of the need for database systems beyond relational databases and a paradigm shift in database management. NoSQL databases are then defined as providing alternatives beyond the relational model. The remainder of the document covers types of NoSQL databases and their usage, as well as the future of relational databases.
The document provides an overview of SQL vs NoSQL databases. It discusses how RDBMS systems focus on ACID properties to ensure consistency but sacrifice availability and scalability. NoSQL systems embrace the CAP theorem, prioritizing availability and partition tolerance over consistency to better support distributed and cloud-scale architectures. The document outlines different NoSQL database models and how they are suited for high volume operations through an asynchronous and eventually consistent approach.
This document provides an overview and introduction to NoSQL databases. It begins with an agenda that explores key-value, document, column family, and graph databases. For each type, 1-2 specific databases are discussed in more detail, including their origins, features, and use cases. Key databases mentioned include Voldemort, CouchDB, MongoDB, HBase, Cassandra, and Neo4j. The document concludes with references for further reading on NoSQL databases and related topics.
This document provides an overview of NoSQL databases. It discusses that NoSQL databases are non-relational and do not follow the RDBMS principles. It describes some of the main types of NoSQL databases including document stores, key-value stores, column-oriented stores, and graph databases. It also discusses how NoSQL databases are designed for massive scalability and do not guarantee ACID properties, instead following a BASE model ofBasically Available, Soft state, and Eventually Consistent.
This document provides an introduction to NoSQL databases, including the motivation behind them, where they fit, types of NoSQL databases like key-value, document, columnar, and graph databases, and an example using MongoDB. NoSQL databases are a new way of thinking about data that is non-relational, schema-less, and can be distributed and fault tolerant. They are motivated by the need to scale out applications and handle big data with flexible and modern data models.
The document provides an agenda for a two-day training on NoSQL and MongoDB. Day 1 covers an introduction to NoSQL concepts like distributed and decentralized databases, CAP theorem, and different types of NoSQL databases including key-value, column-oriented, and document-oriented databases. It also covers functions and indexing in MongoDB. Day 2 focuses on specific MongoDB topics like aggregation framework, sharding, queries, schema-less design, and indexing.
This document discusses NoSQL databases and compares them to relational databases. It provides information on different types of NoSQL databases, including key-value stores, document databases, wide-column stores, and graph databases. The document outlines some use cases for each type and discusses concepts like eventual consistency, CAP theorem, and polyglot persistence. It also covers database architectures like replication and sharding that provide high availability and scalability.
This document discusses relational and non-relational databases. It begins by introducing NoSQL databases and some of their key characteristics like not requiring a fixed schema and avoiding joins. It then discusses why NoSQL databases became popular for companies dealing with huge data volumes due to limitations of scaling relational databases. The document covers different types of NoSQL databases like key-value, column-oriented, graph and document-oriented databases. It also discusses concepts like eventual consistency, ACID properties, and the CAP theorem in relation to NoSQL databases.
This document provides an introduction and overview of MongoDB. It begins with definitions of NoSQL databases and describes the main types: key-value stores, wide column stores, document stores, and graph stores. It then discusses MongoDB specifically, describing it as a free, open-source, document-oriented database that uses JSON-like documents with dynamic schemas. The document outlines how to quickly install MongoDB using Docker, and how to perform basic CRUD operations like creating databases and collections, inserting, reading, updating, and deleting documents. It also discusses some key MongoDB concepts like its support for the CAP theorem prioritizing availability and partition tolerance over strong consistency.
The document provides an overview of Big Data technology landscape, specifically focusing on NoSQL databases and Hadoop. It defines NoSQL as a non-relational database used for dealing with big data. It describes four main types of NoSQL databases - key-value stores, document databases, column-oriented databases, and graph databases - and provides examples of databases that fall under each type. It also discusses why NoSQL and Hadoop are useful technologies for storing and processing big data, how they work, and how companies are using them.
This document provides an overview of NoSQL databases and MongoDB. It states that NoSQL databases are more scalable and flexible than relational databases. MongoDB is described as a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB uses collections and documents to store data in a flexible, JSON-like format.
The rising interest in NoSQL technology over the last few years resulted in an increasing number of evaluations and comparisons among competing NoSQL technologies From survey we create a concise and up-to-date comparison of NoSQL engines, identifying their most beneficial use from the software engineer point of view.
The document provides an overview of NoSQL and MongoDB. It discusses that NoSQL databases were built for large datasets and cloud applications. It covers some of the main types of NoSQL databases like document stores, key-value stores, and column family stores. The document also compares NoSQL to SQL/relational databases, discussing how NoSQL is more flexible and scales horizontally. MongoDB is presented as a popular document-oriented NoSQL database, covering its flexible schema, horizontal scaling, and replication features.
What is NoSQL? How does it come to the picture? What are the types of NoSQL? Some basics of different NoSQL types? Differences between RDBMS and NoSQL. Pros and Cons of NoSQL.
What is MongoDB? What are the features of MongoDB? Nexus architecture of MongoDB. Data model and query model of MongoDB? Various MongoDB data management techniques. Indexing in MongoDB. A working example using MongoDB Java driver on Mac OSX.
The document provides an overview of NoSQL databases and MongoDB. It discusses:
- What NoSQL is and why it was created
- The different categories of NoSQL databases, including key-value stores, document databases, column family stores, and graph databases
- MongoDB specifically, including its flexible schema, horizontal scalability, replication support, and data modeling approach
- Comparisons between relational and NoSQL databases
This document provides an introduction to MongoDB, a non-relational NoSQL database. It discusses what NoSQL databases are and their benefits compared to SQL databases, such as being more scalable and able to handle large, changing datasets. It then describes key features of MongoDB like high performance, rich querying, and horizontal scalability. The document outlines concepts like document structure, collections, and CRUD operations in MongoDB. It also covers topics such as replication, sharding, and installing MongoDB.
An overview of various database technologies and their underlying mechanisms over time.
Presentation delivered at Alliander internally to inspire the use of and forster the interest in new (NOSQL) technologies. 18 September 2012
The document provides an introduction and overview of NoSQL databases. It discusses why NoSQL databases were created, the different categories of NoSQL databases including column stores, document stores, and key-value stores. It also provides an overview of Hadoop, describing it as a framework that allows distributed processing of large datasets across computer clusters.
This document discusses how to implement operations like selection, joining, grouping, and sorting in Cassandra without SQL. It explains that Cassandra uses a nested data model to efficiently store and retrieve related data. Operations like selection can be performed by creating additional column families that index data by fields like birthdate and allow fast retrieval of records by those fields. Joining can be implemented by nesting related entity data within the same column family. Grouping and sorting are also achieved through additional indexing column families. While this requires duplicating data for different queries, it takes advantage of Cassandra's strengths in scalable updates.
Why noSQL? Unstructured data now represents most data. NoSQL databases allow for flexible schemas and easy scaling to accommodate dynamic user numbers. NoSQL databases are also often open source.
The main types of NoSQL databases are key-value stores, document databases, column-oriented databases, and graph databases. Couchbase is a high performance, scalable NoSQL database that can be used as a cache, key-value store, or document database. It uses a distributed architecture and allows zero downtime operations like adding or removing nodes.
2. Introduction to NoSQL
• NoSQL stands for “Not only SQL”
• NoSQL is
  – A term for non-relational database management systems
  – Different from traditional relational database systems
  – Designed for distributed data storage that
    • typically does not require a fixed schema,
    • avoids join operations, and
    • scales horizontally
• Used by Facebook, Google and other applications that handle large volumes of unstructured Web application data
3. History of NoSQL
• RDBMS systems have limitations with respect to the following
  – Scalability
  – Parallelization
  – Cost
• Example: Google, which gets billions of requests a month across applications that are geographically distributed.
• The above led to research on the following concepts
  – GFS: Distributed file system
  – Chubby: Distributed coordination system
  – MapReduce: Parallel execution system
  – Bigtable: Column-oriented database
4. NoSQL: Where to use
• NoSQL is useful in the following cases
  – Online stores and portals like Amazon, where the transaction of an individual should not “lock” a database or part of a database
  – Where a “committed” transaction is not critical (e.g. a buyer orders an item and someone else clicks for the same item at the same time; one of them may end up not getting the item if it is the last piece left. An “apology” mail and refund can sort the matter out.)
  – Where cost is a deciding factor
• NoSQL SHOULD NOT be used in the following cases
  – Stock exchanges or banking, where transactions are critical and cached or stale state data will not work
  – Other non-financial transactions where completion of the transaction is critical
5. Benefits of NoSQL
• Schemaless data representation:
– Almost all NoSQL implementations offer schema-less data representation. This means that we do not have to think too far ahead to define a structure, and the structure can continue to evolve over time, including adding new fields or even nesting the data, for example, in the case of JSON representation.
• Development time:
– Development time can be reduced because one does not have to deal with complex SQL queries and joins
• Speed:
– NoSQL databases can be faster than relational databases for simple reads and writes, since they avoid join operations and relax consistency guarantees
• Ability to plan ahead for scalability:
– The applications can be quite elastic and can handle sudden spikes of load.
– Provides horizontal scalability and partitioning to new servers
7. Storage Types for NoSQL Databases
• Column Oriented storage
– Data stored as columns as opposed to rows (in traditional RDBMS)
– Used for On Line Analytical Processing type databases
• Example: We want to store the following information

Employee ID   First Name   Last Name   Dept
1234          Asim         Das         HR
3242          Noel         David       Marketing
5678          Raj          Malhotra    Production
4543          Rohan        Singh       R&D

• Traditional RDBMS Approach: data serialized as follows
1234, Asim, Das, HR
3242, Noel, David, Marketing
5678, Raj, Malhotra, Production
4543, Rohan, Singh, R&D
• Column Oriented Approach: data stored as follows
1234, 3242, 5678, 4543
Asim, Noel, Raj, Rohan
Das, David, Malhotra, Singh
HR, Marketing, Production, R&D
• Advantages:
– A new column can be added without worrying about filling in default values for existing rows
– Efficient for computing maxima, minima, averages and sums, specifically on large datasets
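The two layouts can be sketched in plain JavaScript. This is only an illustration with in-memory arrays, not a real storage engine; the helper name toColumns is hypothetical.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rows = [
  { id: 1234, firstName: "Asim", lastName: "Das", dept: "HR" },
  { id: 3242, firstName: "Noel", lastName: "David", dept: "Marketing" },
  { id: 5678, firstName: "Raj", lastName: "Malhotra", dept: "Production" },
  { id: 4543, firstName: "Rohan", lastName: "Singh", dept: "R&D" },
];

// Hypothetical helper: regroup the same data so that each column is one array.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!(field in columns)) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An aggregate over one column only needs to scan that column's array.
const maxId = Math.max(...columns.id);

console.log(columns.dept.join(", "));
console.log(maxId);
```

The aggregate touches only the id array, which is the reason column stores suit analytical queries over a few columns of many rows.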
8. Storage Types for NoSQL Databases..2
• Document Oriented storage
– Allows the inserting, retrieving, and manipulating of semi-structured data
– Documents themselves act as records (or rows)
– Two records may have completely different sets of fields or columns
– The records may or may not adhere to a specific schema
– Most of the databases available under this category use XML, JSON or BSON data types
• Example: Different records contain different levels of Employee information, as follows
Record 1
{ "EmployeeID" : "SM1",
  "FirstName" : "Anuj",
  "LastName" : "Sharma",
  "Age" : 45,
  "Salary" : 10000000
}
Record 2
{ "EmployeeID" : "MM2",
  "FirstName" : "Anand",
  "Age" : 34,
  "Salary" : 5000000,
  "Address" : {
    "Line1" : "123, 4th Street",
    "City" : "Bangalore",
    "State" : "Karnataka"
  },
  "Projects" : [
    "nosql-migration",
    "top-secret-007"
  ]
}
9. Storage Types for NoSQL Databases..3
• Key value storage
– Similar to document oriented storage, with the following differences
– Unlike a document store that can create a key when a new document is inserted, a key-value store requires the key to be specified
– Unlike a document store where the value can be indexed and queried, for a key-value store the value is opaque and as such, the key must be known to retrieve the value
• Advantages:
– Key-value stores are optimized for querying against keys.
– They serve as great in-memory caches.
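A minimal sketch of these two differences, assuming a hypothetical in-memory class KeyValueStore built on a JavaScript Map. It is not the API of any particular product: the caller must supply the key, and the value is stored as an opaque string that can only be reached through that key.

```javascript
// Hypothetical in-memory key-value store, for illustration only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // The key must be specified by the caller; the store never generates one.
  put(key, value) {
    if (key === undefined || key === null) {
      throw new Error("a key must be specified");
    }
    // The value is serialized and treated as opaque: it is never indexed or inspected.
    this.data.set(key, JSON.stringify(value));
  }

  // Retrieval is possible only when the key is known.
  get(key) {
    const raw = this.data.get(key);
    return raw === undefined ? undefined : JSON.parse(raw);
  }
}

const cache = new KeyValueStore();
cache.put("session:42", { user: "abc123", status: "A" });

const hit = cache.get("session:42");
const miss = cache.get("session:99");

console.log(hit.user);
console.log(miss);
```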
10. RDBMS vs NoSQL
RDBMS
• Structured and organized data
• Structured query language (SQL)
• Data and its relationships are stored in separate tables.
• Follows ACID rules
• Data Manipulation Language, Data Definition Language
• Tight Consistency
ACID: Atomic, Consistent, Isolated, Durable

NoSQL
• No declarative query language
• No predefined schema
• Key-Value pair storage, Column Store, Document Store, Graph databases
• Eventual consistency rather than ACID property
• Unstructured and unpredictable data
• CAP Theorem
• Prioritizes high performance, high availability and scalability
BASE: Basically Available, Soft State, Eventual Consistency
CAP: Consistency, Availability, Partition Tolerance
11. CAP
• CAP theorem (Brewer’s Theorem): Three basic requirements which exist in a special relation when designing applications for a distributed architecture.
– Consistency: This means that the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
– Availability: This means that the system is always on (guaranteed service availability), with no downtime.
– Partition Tolerance: This means that the system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
• It is theoretically impossible to fulfill all 3 requirements C, A and P at the same time
• According to CAP, a distributed system can satisfy at most 2 of the 3 requirements.
• Therefore all current NoSQL databases follow different combinations of C, A and P from the CAP theorem.
12. BASE
• A BASE system gives up on strict Consistency
– Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
– Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
– Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
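Eventual consistency can be illustrated with a toy model. This is a sketch under stated assumptions: the class EventuallyConsistentStore and its two in-memory copies are hypothetical and do not reflect how any real database replicates data.

```javascript
// Hypothetical two-copy store: writes land on the primary copy first
// and reach the secondary copy only when replicate() runs.
class EventuallyConsistentStore {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes not yet applied to the secondary copy
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  readFromSecondary(key) {
    return this.secondary.get(key);
  }

  // Apply queued writes; with no new input, both copies now agree.
  replicate() {
    for (const [key, value] of this.pending) {
      this.secondary.set(key, value);
    }
    this.pending = [];
  }
}

const store = new EventuallyConsistentStore();
store.write("status", "C");

const staleRead = store.readFromSecondary("status"); // write has not arrived yet
store.replicate();
const freshRead = store.readFromSecondary("status"); // copies have converged

console.log(staleRead, freshRead);
```

The read before replicate() is the "soft state" window: the system stays available and answers, but the answer may be out of date.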
14. MongoDB
• Open Source database written in C++.
• Document Oriented database
– Example format: FirstName="Arun", Address="St. Xavier's Road", Spouse=[{Name:"Kiran"}], Children=[{Name:"Rihit", Age:8}]
• Used to store data for very high performance applications with unforeseen growth in data
– If load increases (more storage space, more processing power), it can be distributed to other nodes across computer networks (sharding)
• MongoDB supports the Map/Reduce framework for batch processing of data and aggregation operations
– Map: A master node takes an input, splits it into smaller sections and sends them to the associated nodes. These nodes may in turn perform the same operation and send smaller sections of the input to other nodes. Each node processes its part of the problem (taken as input) and sends the result back to the master node.
– Reduce: The master node aggregates those results to find the output.
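The Map and Reduce steps described above can be sketched in plain JavaScript on a single machine. This illustrates only the general idea and is not MongoDB's own mapReduce command; the sample documents and the function names split, mapChunk and reduceAll are hypothetical.

```javascript
// Hypothetical input: user documents to be summarised by status.
const users = [
  { status: "A", age: 55 },
  { status: "U", age: 35 },
  { status: "A", age: 20 },
  { status: "D", age: 41 },
  { status: "U", age: 30 },
];

// The "master" splits the input into smaller sections.
function split(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Map: each "node" processes its own section and returns a partial result
// (here, the sum of ages per status within that section).
function mapChunk(chunk) {
  const partial = {};
  for (const doc of chunk) {
    partial[doc.status] = (partial[doc.status] || 0) + doc.age;
  }
  return partial;
}

// Reduce: the master aggregates the partial results into the final output.
function reduceAll(partials) {
  const total = {};
  for (const partial of partials) {
    for (const [status, sum] of Object.entries(partial)) {
      total[status] = (total[status] || 0) + sum;
    }
  }
  return total;
}

const totals = reduceAll(split(users, 2).map(mapChunk));
console.log(totals);
```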
16. NoSQL operations in MongoDB
• Creating Table (Collections)
Other SQL Schema
    CREATE TABLE users (
        id MEDIUMINT NOT NULL AUTO_INCREMENT,
        user_id Varchar(30),
        age Number,
        status char(1),
        PRIMARY KEY (id)
    )
MongoDb statement
    db.users.insert( {
        user_id: "abc123",
        age: 55,
        status: "A"
    })
Alternatively,
    db.createCollection("users")
In MongoDB, collections are implicitly created on the first insert() operation. The primary key _id is automatically added if the _id field is not specified.
Reference: See insert() and db.createCollection() for more information.
17. NoSQL operations in MongoDB
• Altering Table (Collections)
Adding a Column
Other SQL Schema
    ALTER TABLE users
    ADD join_date DATETIME
MongoDb statement (adding a field)
    db.users.update(
        { },
        { $set: { join_date: new Date() } },
        { multi: true }
    )
Dropping a Column
Other SQL Schema
    ALTER TABLE users
    DROP COLUMN join_date
MongoDb statement (dropping a field)
    db.users.update(
        { },
        { $unset: { join_date: "" } },
        { multi: true }
    )
Collections do not describe or enforce the structure of their documents; i.e. there is no structural alteration at the collection level for adding/removing fields
Reference: See the Data Modeling Considerations for MongoDB Applications, update(), $set and $unset for more information
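The effect of $set and $unset with an empty filter and multi: true can be imitated on a plain array of objects. This is a rough in-memory analogy, not MongoDB code; the sample documents are hypothetical.

```javascript
// Hypothetical documents standing in for the users collection.
const users = [
  { _id: 1, user_id: "abc123", age: 55, status: "A" },
  { _id: 2, user_id: "abc001", age: 35, status: "U" },
];

// Like $set with an empty filter and multi: true: add the field to every document.
const joinDate = new Date();
for (const doc of users) {
  doc.join_date = joinDate;
}
const fieldsAfterSet = users.map((doc) => Object.keys(doc));

// Like $unset: remove the field from every document again.
for (const doc of users) {
  delete doc.join_date;
}
const fieldsAfterUnset = users.map((doc) => Object.keys(doc));

console.log(fieldsAfterSet);
console.log(fieldsAfterUnset);
```

Nothing about the array itself changes in either step; only the individual documents do, which mirrors the statement that there is no structural alteration at the collection level.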
18. NoSQL operations in MongoDB
• INSERT and SELECT operations
Inserting data
Other SQL Schema
    INSERT INTO users(user_id, age, status)
    VALUES ("abc001", 35, "U")
MongoDb statement
    db.users.insert( {
        user_id: "abc001",
        age: 35,
        status: "U"
    })
SELECT operation
Other SQL Schema
    SELECT *
    FROM users
    WHERE status = "A"
MongoDb statement (find operation)
    db.users.find(
        { status: "A" }
    )
Reference: See insert() and find() for more information
Use pretty() to display data in a formatted way:
    db.users.find().pretty();
19. NoSQL operations in MongoDB
• UPDATE and DELETE operations
UPDATE
Other SQL Schema
    UPDATE users
    SET status = "C"
    WHERE age > 10
MongoDb statement
    db.users.update(
        { age: { $gt: 10 } },
        { $set: { status: "C" } },
        { multi: true }
    )
Other SQL Schema
    UPDATE users
    SET age = age + 5
    WHERE status = "U"
MongoDb statement
    db.users.update(
        { status: "U" },
        { $inc: { age: 5 } },
        { multi: true }
    )
DELETE
Other SQL Schema
    DELETE FROM users
    WHERE status = "D"
MongoDb statement (REMOVE)
    db.users.remove( { status: "D" } )
Reference: See update(), $gt, $inc, $set and remove() for more information
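To make the semantics of the three statements concrete, the following sketch applies the same changes to a plain JavaScript array. It is an in-memory analogy under assumed sample data, not MongoDB code.

```javascript
// Hypothetical documents standing in for the users collection.
let users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "abc001", age: 9, status: "U" },
  { user_id: "abc002", age: 8, status: "D" },
];

// Filter { age: { $gt: 10 } } with { $set: { status: "C" } } and multi: true.
for (const doc of users) {
  if (doc.age > 10) doc.status = "C";
}

// Filter { status: "U" } with { $inc: { age: 5 } } and multi: true.
for (const doc of users) {
  if (doc.status === "U") doc.age += 5;
}

// remove( { status: "D" } ): keep every document that does not match.
users = users.filter((doc) => doc.status !== "D");

console.log(users);
```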
20. More examples in MongoDB
• Run the database (Windows):
– Open Command prompt
– Go to bin folder of Mongodb specific directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-1004/bin)
– Run mongod.exe
• Connect to the database:
– Open Command prompt
– Go to bin folder of Mongodb specific directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-1004/bin)
– Run mongo
– A mongo “shell” will open
• Show database: show dbs
• Select a database: use <database name>
21. More examples in MongoDB..2
• Switch to database testData (use testData;)
• Task 1: Insert data directly: The following operation inserts a row/document in the collection testData
– db.testData.insert( { name : "OtherDB" } );
• Task 2: Insert data with JavaScript operations: The following operations insert 2 rows/documents in the collection testData
    j = { name : "mongo" }
    k = { x : 3 }
    db.testData.insert( j );
    db.testData.insert( k );
• Task 3: Check to see that the 3 records are inserted in the collection testData
– db.testData.find();
22. More examples in MongoDB..3
Inserting multiple documents using a for loop
• Task: Use the following loop from the mongo shell
– for (var i = 1; i <= 25; i++) db.testData.insert( { x : i } )
• Use find() to see the result. 25 records will be shown
– db.testData.find()
Note: If the collection and database do not exist, MongoDB creates them implicitly before inserting documents.
23. More examples in MongoDB..4
Queries with conditions
• Task: In the above example, 25 rows were created. We want to show the rows where x is less than 15. We also want to limit the display to the first 5 rows.
– db.testData.find( { "x": { $lt: 15 } } ).limit(5)
Here { "x": { $lt: 15 } } is the condition (x < 15) and limit(5) limits the result to 5 rows.
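The same condition and limit can be expressed on a plain JavaScript array, which may help when reading the query. This is an in-memory analogy, not MongoDB code: filter() plays the role of the $lt condition and slice() the role of limit().

```javascript
// Build 25 documents { x: 1 } ... { x: 25 }, as the insert loop does.
const testData = [];
for (let i = 1; i <= 25; i++) {
  testData.push({ x: i });
}

// Condition x < 15, then keep only the first 5 matches.
const firstFive = testData.filter((doc) => doc.x < 15).slice(0, 5);

console.log(firstFive);
```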
24. More examples in MongoDB..5
Inserting with explicit “_id”
• Task: Insert a record in the collection named “inventory” with explicit _id, type and quantity.
– db.inventory.insert( { _id: 10, type: "misc", item: "card", qty: 15 } );
  Here _id: 10 is the explicit ID.
Inserting with update() method
• Call the update() method with the upsert flag to create a new document if no document matches the update’s query criteria.
– db.inventory.update(
      { type: "book", item : "journal" },
      { $set : { qty: 10 } },
      { upsert : true }
  );
The above example creates a new document if no document in the inventory collection contains { type: "book", item : "journal" } and assigns a unique ID
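The upsert behaviour (update if a match exists, otherwise insert) can be sketched as follows. The function upsert and the counter nextId are hypothetical; in particular, MongoDB generates its own unique _id value rather than a simple counter.

```javascript
let nextId = 1; // hypothetical ID source, for illustration only

// Update the first document matching every field of `query`;
// if none matches, insert a new document built from the query and the new fields.
function upsert(collection, query, setFields) {
  const existing = collection.find((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  if (existing) {
    Object.assign(existing, setFields);
    return "updated";
  }
  collection.push({ _id: nextId++, ...query, ...setFields });
  return "inserted";
}

const inventory = [];
const first = upsert(inventory, { type: "book", item: "journal" }, { qty: 10 });
const second = upsert(inventory, { type: "book", item: "journal" }, { qty: 25 });

console.log(first, second, inventory);
```

The first call finds nothing and inserts; the second call finds that document and only changes qty.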
25. More examples in MongoDB..6
Inserting with save() method
• To insert a document with the save() method, pass the method a document that does not contain the _id field, or a document that contains an _id value that does not exist in the collection.
– db.inventory.save( { type: "book", item: "notebook", qty: 40 } )
The above example creates a new document in the inventory collection, adds the _id field and assigns a unique ID
26. More examples in MongoDB..6
Conditional queries
• Task: Select all documents in the inventory collection where the value of the type field is either 'food' or 'snacks'
– db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } );
• Task: “AND” condition - specifying an equality match on the field type AND a less than ($lt) comparison match on the field price
– db.inventory.find( { type: 'food', price: { $lt: 9.95 } } );
• Task: “OR” condition - the query document selects all documents in the collection where the field qty has a value greater than ($gt) 100 OR the value of the price field is less than ($lt) 9.95
– db.inventory.find(
      { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] }
  );
27. More examples in MongoDB..7
Compound queries (using both “AND” and “OR”)
• Task: Select all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
– db.inventory.find( {
      type: 'food',
      $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ]
  } );
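How such query documents are evaluated can be sketched with a tiny matcher. This is a simplified illustration that supports only equality, $in, $lt, $gt and $or; the functions matches and matchesCondition and the sample data are hypothetical, and MongoDB's real matching rules cover many more cases.

```javascript
// Check one field value against either a plain value or an operator document.
function matchesCondition(value, condition) {
  if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
    return Object.entries(condition).every(([op, operand]) => {
      if (op === "$in") return operand.includes(value);
      if (op === "$lt") return value < operand;
      if (op === "$gt") return value > operand;
      throw new Error("unsupported operator: " + op);
    });
  }
  return value === condition;
}

// All top-level entries must hold (implicit AND); $or needs one sub-query to hold.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (field === "$or") return condition.some((sub) => matches(doc, sub));
    return matchesCondition(doc[field], condition);
  });
}

const inventory = [
  { item: "apple", type: "food", qty: 150, price: 12.5 },
  { item: "chips", type: "snacks", qty: 40, price: 2.5 },
  { item: "bread", type: "food", qty: 20, price: 3 },
  { item: "cake", type: "food", qty: 10, price: 20 },
  { item: "pen", type: "misc", qty: 500, price: 1 },
];

const find = (query) => inventory.filter((doc) => matches(doc, query)).map((doc) => doc.item);

const foodOrSnacks = find({ type: { $in: ["food", "snacks"] } });
const cheapFood = find({ type: "food", price: { $lt: 9.95 } });
const bigOrCheap = find({ $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] });
const compound = find({ type: "food", $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] });

console.log(foodOrSnacks, cheapFood, bigOrCheap, compound);
```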
28. More examples in MongoDB..8
Matching on “subdocuments”
• When the field holds an embedded document (i.e. subdocument), we can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using “dot” notation, to specify values for individual fields in the subdocument.
• In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:
– db.inventory.find( {
      producer: { company: 'ABC123', address: '123 Street' }
  } );
• In the following example, the query uses the dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields
– db.inventory.find( { 'producer.company': 'ABC123' } );
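The difference between whole-subdocument matching and dot notation can be imitated in plain JavaScript. This is a sketch under assumptions: the helper getPath and the sample documents are hypothetical, and JSON.stringify is used only as a simple stand-in for an exact, order-sensitive comparison.

```javascript
const inventory = [
  { item: "a", producer: { company: "ABC123", address: "123 Street" } },
  { item: "b", producer: { company: "ABC123", address: "123 Street", phone: "555" } },
  { item: "c", producer: { address: "123 Street", company: "ABC123" } },
];

// Exact match: same fields, same values, same field order, nothing extra.
const wanted = { company: "ABC123", address: "123 Street" };
const exact = inventory
  .filter((doc) => JSON.stringify(doc.producer) === JSON.stringify(wanted))
  .map((doc) => doc.item);

// Hypothetical helper: follow a dotted path such as "producer.company".
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Dot notation: only the named field is compared; other fields may be present.
const byCompany = inventory
  .filter((doc) => getPath(doc, "producer.company") === "ABC123")
  .map((doc) => doc.item);

console.log(exact, byCompany);
```

Document "b" has an extra field and document "c" has the fields in a different order, so only "a" passes the exact comparison, while all three pass the dotted one.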
29. More examples in MongoDB..9
Matching on Arrays
• To specify an equality match on an array, use the query document { <field>: <value> } where <value> is the array to match. Equality matches on the array require that the array field match exactly the specified <value>, including the element order.
• Exact Match: In the following example, the query matches all documents where the value of the field tags is an array that holds exactly three elements, 'fruit', 'food', and 'citrus', in this order:
– db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } );
• Matching Array Elements: In the following example, the query matches all documents where the value of the field tags is an array that contains 'fruit' as one of its elements:
– db.inventory.find( { tags: 'fruit' } );
• In the following example, the query uses the dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit'.
– db.inventory.find( { 'tags.0' : 'fruit' } )
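The three kinds of array match can be imitated on in-memory data. This is only an analogy in plain JavaScript with hypothetical sample documents, not MongoDB code.

```javascript
const inventory = [
  { item: "orange", tags: ["fruit", "food", "citrus"] },
  { item: "lemon", tags: ["citrus", "fruit", "food"] },
  { item: "apple", tags: ["fruit", "food"] },
];

const items = (docs) => docs.map((doc) => doc.item);

// Exact match: same elements in the same order.
const wanted = ["fruit", "food", "citrus"];
const exact = items(
  inventory.filter(
    (doc) =>
      doc.tags.length === wanted.length &&
      doc.tags.every((tag, index) => tag === wanted[index])
  )
);

// Element match: the array contains 'fruit' anywhere.
const containsFruit = items(inventory.filter((doc) => doc.tags.includes("fruit")));

// Positional match (like 'tags.0'): the first element is 'fruit'.
const fruitFirst = items(inventory.filter((doc) => doc.tags[0] === "fruit"));

console.log(exact, containsFruit, fruitFirst);
```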
30. More examples in MongoDB..10
Array of subdocuments
• Match a Field in the Subdocument Using the Array Index: The following example selects all documents where the memos field contains an array whose first element (i.e. index 0) is a subdocument with the field by with the value 'shipping':
– db.inventory.find( { 'memos.0.by': 'shipping' } )
• Match a Field without specifying Array Index: The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':
– db.inventory.find( { 'memos.by': 'shipping' } )
• Match multiple Fields: The following example uses dot notation to query for documents where the memos array has at least one subdocument with the field memo equal to 'on time' and at least one subdocument with the field by equal to 'shipping'. The two conditions may be satisfied by different array elements; to require both in the same subdocument, use the $elemMatch operator.
– db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )
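One point worth checking against the MongoDB documentation is how several dot-notation conditions on the same array combine. As far as the author of this sketch is aware, each condition may be satisfied by a different array element, and $elemMatch is the operator that requires a single element to satisfy all of them. The plain JavaScript below imitates both readings on hypothetical sample data.

```javascript
const inventory = [
  { item: "A", memos: [{ memo: "on time", by: "shipping" }] },
  {
    item: "B",
    memos: [
      { memo: "on time", by: "billing" },
      { memo: "delayed", by: "shipping" },
    ],
  },
  { item: "C", memos: [{ memo: "delayed", by: "billing" }] },
];

const items = (docs) => docs.map((doc) => doc.item);

// Like 'memos.0.by': only the first element is inspected.
const firstByShipping = items(
  inventory.filter((doc) => doc.memos.length > 0 && doc.memos[0].by === "shipping")
);

// Like 'memos.by': any element may match.
const anyByShipping = items(
  inventory.filter((doc) => doc.memos.some((m) => m.by === "shipping"))
);

// Two separate dot-notation conditions: each may match a different element.
const eachFieldSomewhere = items(
  inventory.filter(
    (doc) =>
      doc.memos.some((m) => m.memo === "on time") &&
      doc.memos.some((m) => m.by === "shipping")
  )
);

// Same-element reading (what $elemMatch expresses): one element satisfies both.
const sameElement = items(
  inventory.filter((doc) =>
    doc.memos.some((m) => m.memo === "on time" && m.by === "shipping")
  )
);

console.log(firstByShipping, anyByShipping, eachFieldSomewhere, sameElement);
```

Document "B" separates the two readings: it has an 'on time' memo and a memo by 'shipping', but not in the same subdocument.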
31. More examples in MongoDB..11
Using findOne()
db.collection.findOne(<criteria>, <projection>)
• The above returns one document that satisfies the specified query criteria. If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on the disk.
• The <projection> parameter takes a document in the following form
– { field1: <boolean>, field2: <boolean> ... }
– Boolean can be 1 (true, to include) or 0 (false, to exclude)
• Example: Create a collection named bios with multiple fields. Return the “name”, “contribs” and “_id” fields (_id is returned unless it is explicitly excluded):
db.bios.findOne(
    { },
    { name: 1, contribs: 1 }
)
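The effect of an inclusion projection can be sketched as a small function. The function project and the sample document are hypothetical, and the sketch handles only the inclusion form (fields set to 1, with _id kept unless set to 0).

```javascript
// Keep _id by default, plus every field whose projection flag is truthy.
function project(doc, projection) {
  const result = {};
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Hypothetical document standing in for an entry of the bios collection.
const bio = {
  _id: 1,
  name: "Example Person",
  contribs: ["first contribution", "second contribution"],
  title: "Engineer",
};

const withId = project(bio, { name: 1, contribs: 1 });
const withoutId = project(bio, { _id: 0, name: 1 });

console.log(Object.keys(withId));
console.log(Object.keys(withoutId));
```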
32. Exercise I
• Go to database “test”
• Insert data in a collection named userdetails with the following attributes
– {"user_id" : "ABCDBWN", "password" : "ABCDBWN", "date_of_join" : "15/10/2010", "education" : "B.C.A.", "profession" : "DEVELOPER", "interest" : "MUSIC", "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"], "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"], "community_members" : [500, 200, 1500], "friends_id" : ["MMM123", "NNN123", "OOO123"], "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]}
• View the inserted data using find() and pretty()
• Insert another set of data in the same collection with the following
– {"user_id" : "testuser", "password" : "testpassword", "date_of_join" : "16/10/2010", "education" : "M.C.A.", "profession" : "CONSULTANT", "interest" : "MUSIC", "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"], "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"], "community_members" : [500, 200, 1500], "friends_id" : ["MMM123", "NNN123", "OOO123"], "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]}
33. Exercise I..contd
• Use update() to change password to "Newpd" and date_of_join to "12/12/2010" for user_id "ABCDBWN"
• Fetch only the "user_id" for all documents from the collection 'userdetails' which hold the educational qualification "M.C.A."
• Fetch the "user_id", "password" and "date_of_join" for all documents from the collection 'userdetails' which hold the educational qualification "M.C.A."
• Remove one record from the collection userdetails where user_id is "testuser"
• Remove the entire collection userdetails using drop()