An Intro to NoSQL Databases -- NoSQL databases will not become the new dominant technology. Relational databases will remain popular and will be used in the majority of situations; they will, however, no longer be the automatic choice. (Source: http://martinfowler.com/)
This presentation is about NoSQL, which stands for "Not Only SQL". It covers the use of NoSQL for Big Data and the ways it differs from RDBMS.
This document provides an overview of non-relational (NoSQL) databases. It discusses the history and characteristics of NoSQL databases, including that they do not require rigid schemas and can automatically scale across servers. The document also categorizes major types of NoSQL databases, describes some popular NoSQL databases like Dynamo and Cassandra, and discusses benefits and limitations of both SQL and NoSQL databases.
This document provides an introduction to NoSQL databases. It discusses the history and limitations of relational databases that led to the development of NoSQL databases. The key motivations for NoSQL databases are that they can handle big data and provide better scalability and flexibility than relational databases. The document describes some core NoSQL concepts like the CAP theorem and different types of NoSQL databases like key-value, columnar, document and graph databases. It also outlines some remaining research challenges in the area of NoSQL databases.
The document provides an introduction to NoSQL databases. It discusses that NoSQL databases provide a mechanism for storage and retrieval of data without using tabular relations like relational databases. NoSQL databases are used in real-time web applications and for big data. They also support SQL-like query languages. The document outlines different data modeling approaches, distribution models, consistency models and MapReduce in NoSQL databases.
This document provides an overview of NoSQL databases and compares them to relational databases. It discusses the different types of NoSQL databases including key-value stores, document databases, wide column stores, and graph databases. It also covers some common concepts like eventual consistency, CAP theorem, and MapReduce. While NoSQL databases provide better scalability for massive datasets, relational databases offer more mature tools and strong consistency models.
Hive is a data warehouse infrastructure tool that allows users to query and analyze large datasets stored in Hadoop. It uses a SQL-like language called HiveQL to process structured data stored in HDFS. Hive keeps schema metadata in a metastore database, while the data itself resides in HDFS. It provides a familiar interface for querying large datasets using SQL-like queries and scales easily to large datasets.
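To make the HiveQL idea concrete, here is a minimal sketch that runs SQL-like statements from PySpark with Hive support enabled. It assumes a local Spark installation built with Hive support; the table and column names are made up for illustration.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hiveql-sketch")
         .enableHiveSupport()   # lets spark.sql() run HiveQL against a Hive metastore
         .getOrCreate())

# Create a table and query it with SQL-like HiveQL; the schema metadata goes
# to the metastore, the rows themselves land in HDFS (or the local warehouse dir).
spark.sql("CREATE TABLE IF NOT EXISTS pageviews (url STRING, hits INT)")
spark.sql("INSERT INTO pageviews VALUES ('/home', 42), ('/about', 7)")
spark.sql("SELECT url, hits FROM pageviews WHERE hits > 10").show()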
In this presentation, Raghavendra BM of Valuebound discusses the basics of MongoDB, an open-source document database and a leading NoSQL database.
The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a Master and Slave architecture with a NameNode that manages metadata and DataNodes that store data blocks. The NameNode tracks locations of data blocks and regulates access to files, while DataNodes store file blocks and manage read/write operations as directed by the NameNode. HDFS provides high-performance, scalable access to data across large Hadoop clusters.
HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks.
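As an illustration of the NameNode/DataNode division of labour, here is a minimal sketch using the third-party hdfs Python package (a WebHDFS client). The NameNode URL, user, and paths are assumptions for illustration only.

from hdfs import InsecureClient

# The client talks to the NameNode, which owns the metadata and resolves
# block locations; the actual block reads/writes are served by DataNodes.
client = InsecureClient("http://namenode:9870", user="hadoop")

client.write("/tmp/example.txt", data=b"hello hdfs", overwrite=True)
print(client.list("/tmp"))                    # directory listing from NameNode metadata
with client.read("/tmp/example.txt") as reader:
    print(reader.read())                      # block data streamed from DataNodes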
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ... (Simplilearn)
This presentation about Apache Spark covers all the basics that a beginner needs to know to get started with Spark. It covers the history of Apache Spark, what Spark is, and how Spark differs from Hadoop. You will learn the different components in Spark and how Spark works with the help of its architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva (a minimal word-count sketch follows the topic list below). Now, let's get started with what Apache Spark is.
Below topics are explained in this Spark presentation:
1. History of Spark
2. What is Spark
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark architecture
6. Applications of Spark
7. Spark use case
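As promised above, here is a minimal PySpark word-count sketch, assuming a local Spark installation. It shows the map, shuffle, and reduce phases expressed as chained RDD transformations rather than separate MapReduce jobs.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").master("local[*]").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["to be or not to be", "that is the question"])
counts = (lines.flatMap(lambda line: line.split())   # map phase: emit words
               .map(lambda word: (word, 1))          # pair each word with a count
               .reduceByKey(lambda a, b: a + b))     # shuffle + reduce phase
print(counts.collect())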
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn’s Apache Spark and Scala certification training is designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark shell scripting
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL (see the Spark SQL sketch after this list)
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
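The Spark SQL sketch referenced in item 5: a hedged example that registers a DataFrame as a temporary view and queries it with SQL, assuming a local Spark installation; the data is made up.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-sketch").master("local[*]").getOrCreate()

# Build a small DataFrame and expose it to SQL as a temporary view.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
df.createOrReplaceTempView("people")

spark.sql("SELECT name FROM people WHERE age > 30").show()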
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training
The document discusses factors to consider when selecting a NoSQL database management system (DBMS). It provides an overview of different NoSQL database types, including document databases, key-value databases, column databases, and graph databases. For each type, popular open-source options are described, such as MongoDB for document databases, Redis for key-value, Cassandra for columnar, and Neo4j for graph databases. The document emphasizes choosing a NoSQL solution based on application needs and recommends commercial support for production systems.
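To illustrate the key-value category named above, here is a minimal sketch with the redis-py client. It assumes a Redis server on localhost:6379; the key names are hypothetical.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Values are opaque to the store and addressed purely by key.
r.set("session:42:user", "alice", ex=3600)   # write with a one-hour TTL
print(r.get("session:42:user"))              # -> "alice"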
Graph databases are a type of NoSQL database that is optimized for storing and querying connected data and relationships. A graph database represents data in graphs consisting of nodes and edges, where the nodes represent entities and the edges represent relationships between the entities. Graph databases are well-suited for applications that involve complex relationships and connected data, such as social networks, knowledge graphs, and recommendation systems. They allow for flexible querying of relationships and connections via graph traversal operations.
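A minimal graph-traversal sketch using the official neo4j Python driver and Cypher; the connection URI, credentials, and the small social-network data model are assumptions for illustration.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Nodes are entities; relationships (edges) connect them.
    session.run("CREATE (a:Person {name:'Ann'})-[:FOLLOWS]->(b:Person {name:'Bob'})")
    # Traversal query: who does Ann follow?
    result = session.run(
        "MATCH (:Person {name:'Ann'})-[:FOLLOWS]->(f:Person) RETURN f.name AS name")
    for record in result:
        print(record["name"])

driver.close()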
This document provides an overview of NoSQL databases. It begins with a brief history of relational databases and Edgar Codd's 1970 paper introducing the relational model. It then discusses modern trends driving the emergence of NoSQL databases, including increased data complexity, the need for nested data structures and graphs, evolving schemas, high query volumes, and cheap storage. The core characteristics of NoSQL databases are outlined, including flexible schemas, non-relational structures, horizontal scaling, and distribution. The major categories of NoSQL databases are explained - key-value, document, graph, and column-oriented stores - along with examples like Redis, MongoDB, Neo4j, and Cassandra. The document concludes by discussing use cases and
The presentation provides an overview of NoSQL databases, including a brief history of databases, the characteristics of NoSQL databases, different data models like key-value, document, column family and graph databases. It discusses why NoSQL databases were developed as relational databases do not scale well for distributed applications. The CAP theorem is also explained, which states that only two out of consistency, availability and partition tolerance can be achieved in a distributed system.
This document provides an overview and introduction to NoSQL databases. It begins with an agenda that explores key-value, document, column family, and graph databases. For each type, 1-2 specific databases are discussed in more detail, including their origins, features, and use cases. Key databases mentioned include Voldemort, CouchDB, MongoDB, HBase, Cassandra, and Neo4j. The document concludes with references for further reading on NoSQL databases and related topics.
This presentation about HBase will help you understand what HBase is, what the applications of HBase are, how HBase differs from RDBMS, what HBase storage is, and what the architectural components of HBase are; at the end, we will also look at some HBase commands in a demo (a small client sketch follows the topic list below). HBase is an essential part of the Hadoop ecosystem. It is a column-oriented database management system derived from Google’s NoSQL database Bigtable that runs on top of HDFS. After watching this video, you will know how to store and process large datasets using HBase. Now, let us get started and understand HBase and what it is used for.
Below topics are explained in this HBase presentation:
1. What is HBase?
2. HBase Use Case
3. Applications of HBase
4. HBase vs RDBMS
5. HBase Storage
6. HBase Architectural Components
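The client sketch referenced above: a hedged example of HBase's row-key/column-family model via the third-party happybase client. It assumes an HBase Thrift gateway on localhost and a pre-created 'users' table with column family 'cf'.

import happybase

connection = happybase.Connection("localhost")   # Thrift gateway, default port 9090
table = connection.table("users")                # table assumed to already exist

# Rows are addressed by key; cells live under column families (here 'cf').
table.put(b"row-1", {b"cf:name": b"alice", b"cf:city": b"oslo"})
print(table.row(b"row-1"))   # -> {b'cf:name': b'alice', b'cf:city': b'oslo'}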
What is this Big Data Hadoop training course about?
Simplilearn’s Big Data Hadoop training course lets you master the concepts of the Hadoop framework and prepares you for Cloudera’s CCA175 Big Data certification. The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, its architecture, sources, sinks, channels, and configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand Resilient Distributed Datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
- Polyglot persistence involves using multiple data storage technologies to handle different data storage needs within a single application. This allows using the right technology for the job rather than trying to solve all problems with a single database.
- For example, a key-value store may be better for transient session or shopping cart data before an order is placed, while relational databases are better for structured transactional data after an order is placed (see the sketch after this list).
- Using services that abstract the direct usage of different data stores allows sharing of data between applications in an enterprise. This improves reuse of data across systems.
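A minimal polyglot-persistence sketch of that example, using Redis for the transient cart and SQLite for the durable order; the servers, names, and schema are illustrative only.

import sqlite3
import redis

kv = redis.Redis(decode_responses=True)          # key-value store for the cart
sql = sqlite3.connect(":memory:")                # relational store for orders
sql.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

# Before checkout: the cart lives in Redis, cheap to mutate and to expire.
kv.hset("cart:42", mapping={"item": "book", "qty": "2"})
kv.expire("cart:42", 1800)

# On checkout: promote the cart to a durable relational record.
cart = kv.hgetall("cart:42")
sql.execute("INSERT INTO orders (item, qty) VALUES (?, ?)",
            (cart["item"], int(cart["qty"])))
sql.commit()
print(sql.execute("SELECT * FROM orders").fetchall())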
Relational databases vs Non-relational databases (James Serra)
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
MongoDB is a document-oriented NoSQL database written in C++. It uses a document data model and stores data in BSON format, which is a binary form of JSON that is lightweight, traversable, and efficient. MongoDB is schema-less and supports replication, high availability, auto-sharding for scaling, and rich queries. It is suitable for big data, content management, mobile and social applications, and user data management.
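A minimal pymongo sketch of the document model described above; it assumes a local mongod, and the database, collection, and fields are illustrative.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client["shop"]["users"]

# Schema-less: documents in the same collection may differ in shape.
users.insert_one({"name": "alice", "tags": ["admin", "beta"]})
users.insert_one({"name": "bob", "address": {"city": "oslo"}})

print(users.find_one({"name": "alice"}))          # rich queries on nested data
print(users.count_documents({"tags": "admin"}))   # array fields match by element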
Here is my seminar presentation on NoSQL databases. It includes the types of NoSQL databases, the merits and demerits of NoSQL databases, examples of NoSQL databases, etc.
For the seminar report on NoSQL databases, please contact me: ndc@live.in
This document provides an overview and introduction to NoSQL databases. It discusses key-value stores like Dynamo and BigTable, which are distributed, scalable databases that sacrifice complex queries for availability and performance. It also explains column-oriented databases like Cassandra that scale to massive workloads. The document compares the CAP theorem and consistency models of these databases and provides examples of their architectures, data models, and operations.
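To ground the Cassandra discussion, here is a minimal sketch with the DataStax cassandra-driver, assuming a single local node; the keyspace, table, and replication settings are illustrative only.

from uuid import uuid4
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Replication is declared per keyspace; one copy is enough for a local demo.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
session.execute(
    "CREATE TABLE IF NOT EXISTS demo.events (id uuid PRIMARY KEY, payload text)")

session.execute("INSERT INTO demo.events (id, payload) VALUES (%s, %s)",
                (uuid4(), "hello"))
for row in session.execute("SELECT payload FROM demo.events"):
    print(row.payload)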
This document discusses graph databases and introduces DataStax Enterprise Graph. It defines a graph database as one that prioritizes relationships between entities over the entities themselves. It provides examples of problems well-suited for graph databases, such as customer 360 views, recommendations, and fraud detection. The document contrasts graph and relational databases, noting graphs are better for highly connected data. It then introduces DataStax Enterprise Graph as a native graph implementation built on Apache TinkerPop and integrated with Cassandra for scale-out performance and DSE's enterprise features.
This document provides an overview of NoSQL databases. It discusses that NoSQL databases are non-relational and were created to overcome limitations of scaling relational databases. The document categorizes NoSQL databases into key-value stores, document databases, graph databases, XML databases, and distributed peer stores. It provides examples like MongoDB, Redis, CouchDB, and Cassandra. The document also explains concepts like CAP theorem, ACID properties, and reasons for using NoSQL databases like horizontal scaling, schema flexibility, and handling large amounts of data.
The document discusses how the database world is changing with the rise of NoSQL databases. It provides an overview of different categories of NoSQL databases like key-value stores, column-oriented databases, document databases, and graph databases. It also discusses how these NoSQL databases are being used with cloud computing platforms and how they are relevant for .NET developers.
Agenda
- What is NOSQL?
- Motivations for NOSQL?
- Brewer’s CAP Theorem
- Taxonomy of NOSQL databases
- Apache Cassandra
- Features
- Data Model
- Consistency
- Operations
- Cluster Membership
- What does NOSQL mean for RDBMS?
The document provides an overview of NoSQL databases, including their history, characteristics, and categories. It discusses the evolution of database systems from hierarchical and network models to relational and object-oriented databases. It also covers the limitations of relational databases that led to the rise of NoSQL databases, including lack of scalability, availability, and flexibility. The key advantages of NoSQL databases like schema flexibility and horizontal scalability are outlined. Popular categories of NoSQL databases like key-value stores, document databases, and graph databases are also described.
A practical introduction to Oracle NoSQL Database - OOW2014 (Anuj Sahni)
Not familiar with Oracle NoSQL Database yet? This great product introduction session discusses the primary functionality included with the product as well as integration with other Oracle products. It includes a live demo that illustrates installation and configuration as well as data modeling and sample NoSQL application development.
Big Data and NoSQL for Database and BI Pros (Andrew Brust)
This document provides an agenda and overview for a conference session on Big Data and NoSQL for database and BI professionals held from April 10-12 in Chicago, IL. The session will include an overview of big data and NoSQL technologies, then deeper dives into Hadoop, NoSQL databases like HBase, and tools like Hive, Pig, and Sqoop. There will also be demos of technologies like HDInsight, Elastic MapReduce, Impala, and running MapReduce jobs.
Scaling SQL and NoSQL Databases in the Cloud (RightScale)
Database performance is the number-one cause of poor performance for scalable web applications, and the problem is magnified in cloud environments where I/O and bandwidth are generally slower and less predictable than in dedicated data centers. Database sharding is a highly effective method of removing the database scalability barrier by operating on top of proven RDBMS products such as MySQL and Postgres as well as the new NoSQL database platforms. In this session, you'll learn what it really takes to implement sharding, the role it plays in the effective end-to-end lifecycle management of your entire database environment, why it is crucial for ensuring reliability, and how to choose the best technology for a specific application. We'll also share a case study on a high-volume social networking application that demonstrates the effectiveness of database sharding for scaling cloud-based applications.
The document provides guidelines for creating an ontology, including defining what an ontology is, why they are useful, and the basic components and methodology for building one. It discusses evaluating ontology taxonomies and provides two examples - an e-commerce ontology and a banking ontology - to demonstrate the concepts. The key steps outlined are identifying important terms and concepts, organizing them hierarchically, defining attributes and relationships, and evaluating for issues like redundant or incomplete information.
This document provides an overview of NoSQL databases and their concepts. It begins with an introduction from the presenter and an agenda outlining the topics to be covered. The document then discusses the history and evolution of database management systems. It introduces relational database concepts and outlines some of the limitations of relational databases in handling big data. This leads to a discussion of the need for database systems beyond relational databases and a paradigm shift in database management. NoSQL databases are then defined as providing alternatives beyond the relational model. The remainder of the document covers types of NoSQL databases and their usage, as well as the future of relational databases.
This document discusses several existing ontologies for modeling restaurant menus and food data on the semantic web:
- Schema.org allows menus to be attached to restaurants as text or URLs but does not link individual menu items. Recipe classes could be used instead.
- RPI has published a Wine ontology and food ontology that could describe menu items. They also held a Food Semantic Web meetup to discuss ontology development.
- LOV includes a vocabulary focused on nutrition characteristics of food.
- Locu is a platform that enables multi-channel dissemination of data for restaurants and similar businesses.
- LinkedFood, from TU Berlin, aims to link food data on the semantic web.
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ... (DataStax Academy)
This session covers our experience with using the Spark and Shark frameworks for running real-time queries on top of Cassandra data. We will start by surveying the current Cassandra analytics landscape, including Hadoop and Hive, and touch on the use of custom input formats to extract data from Cassandra. We will then dive into Spark and Shark, two memory-based cluster computing frameworks, and how they enable often dramatic improvements in query speed and productivity over the standard solutions today.
Online Marketing with Schema.org and Multi-channel Communication (Anna Fensel)
This document discusses using semantic annotations and a multi-channel communication tool to increase the online visibility and marketing success of hotels. It describes how the Kaysers Hotel semantically annotated its website pages in multiple languages to improve search engine optimization. A multi-channel communication tool was also used to suggest and publish social media posts for the hotel across different channels. An evaluation found the hotel's website traffic and traffic from social media increased while time spent on social media marketing decreased. Future work could involve extending semantic standards for the hotel domain and applying semantic technologies more broadly to online communication.
An unprecedented amount of data is being created and is accessible. This presentation will instruct on using the new NoSQL technologies to make sense of all this data.
Data Analytics Meetup: Introduction to Azure Data Lake Storage (CCG)
Microsoft Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. Journey through Azure Data Lake Storage Gen1 with Microsoft Data Platform Specialist, Audrey Hammonds. In this video she explains the fundamentals of Gen 1 and Gen 2, walks us through how to provision a Data Lake, and gives tips to avoid turning your Data Lake into a swamp.
Learn more about Data Lakes with our blog - Data Lakes: Data Agility is Here Now https://bit.ly/2NUX1H6
A NoSQL (often interpreted as "Not Only SQL") database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners (Edelweiss Kammermann)
This session is based on a full-day big data workshop delivered to 40 database professionals at the German User Group (DOAG) conference in 2016, garnering fantastic feedback (www.munzandmore.com/2016/ora/big-data-cloudera-oracle-training-feedback-doag). There are zillions of open source big data projects these days. In the session, you will learn about the core principles of four key technologies that are most often used in projects: Hadoop, Spark, Hive, and Kafka. The presentation first explains the fundamentals of those four big data technologies. Then you will see how to take the first easy steps into the big data world yourself, with Oracle Big Data Cloud Service and Oracle Event Hub Cloud Service live demos.
This document discusses polyglot persistence and multi-cloud data management solutions. It begins by noting the huge amounts of data being generated and stored globally, such as the billions of pieces of content shared daily on social media platforms. It then discusses challenges in storing and accessing these massive datasets, which can range from the petabyte to exabyte scale. The document introduces the concept of polyglot persistence, where enterprises use a variety of data storage technologies suited to different types of data, rather than assuming a single relational database. It also discusses using NoSQL databases and deploying databases across multiple cloud platforms.
NoSQL, SQL, NewSQL - methods of structuring data (Tony Rogerson)
Today’s environment is a polyglot database, that is to say, it’s made up of a number of different database sources and possibly types. In this session we’ll look at some of the options of storing data – relational, key/value, document etc. I’ll overview what is SQL, NoSQL and NewSQL to give you some context for today’s world of data storage.
Presentation on NoSQL Database related to RDBMS (abdurrobsoyon)
This document discusses different types of databases including NoSQL databases. It describes four types of data: structured, unstructured, dynamic, and static. It then discusses scaling traditional relational databases vertically and horizontally. It introduces the concepts of data sharding, Amdahl's law, and data replication. The challenges of consistency in replicated databases and solutions like two-phase commit are covered. The CAP theorem and eventual consistency are explained. Finally, different types of NoSQL databases are classified including document stores, graph databases, key-value stores, and columnar databases. Specific NoSQL databases like MongoDB, Neo4j, DynamoDB, HBase, and Cassandra are also overviewed.
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you... (Andre Essing)
This document summarizes an introduction presentation about Azure Cosmos DB. It discusses key aspects of Cosmos DB including that it is a globally distributed, massively scalable database that supports multiple data models. It also covers request units, partitioning, indexing, consistency models, and other architectural aspects that allow Cosmos DB to elastically scale storage and throughput worldwide.
Cassandra is an open-source NoSQL database that provides high availability with no single point of failure. It uses a distributed system to keep multiple copies of data across different nodes, uses a column-oriented data model, and supports querying, managing, and recovering data across many servers. It can handle large amounts of data on commodity hardware. Security in Cassandra includes encryption of data both at rest and in transit between nodes and clients, as well as authentication and authorization of users.
In this technical overview of Azure Cosmos DB you will learn how easy it is to get started building planet-scale applications with Azure Cosmos DB. We’ll then take a closer look at important design aspects around global distribution, consistency, and server-side partitioning. How to model your data to fit your app’s needs using tools and APIs you love.
Next-generation databases mostly address some of the following points: they are non-relational, distributed, open-source, and horizontally scalable. The original intention was modern web-scale databases. The movement began in early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, a simple API, eventually consistent / BASE (not ACID), and handling huge amounts of data. So the misleading term "NoSQL" (which the community now mostly translates as "not only SQL") should be seen as an alias for something like the definition above.
This document discusses NoSQL databases and compares them to relational databases. It provides information on different types of NoSQL databases, including key-value stores, document databases, wide-column stores, and graph databases. The document outlines some use cases for each type and discusses concepts like eventual consistency, CAP theorem, and polyglot persistence. It also covers database architectures like replication and sharding that provide high availability and scalability.
Prague data management meetup 2018-03-27 (Martin Bém)
This document discusses different data types and data models. It begins by describing unstructured, semi-structured, and structured data. It then discusses relational and non-relational data models. The document notes that big data can include any of these data types and models. It provides an overview of Microsoft's data management and analytics platform and tools for working with structured, semi-structured, and unstructured data at varying scales. These include offerings like SQL Server, Azure SQL Database, Azure Data Lake Store, Azure Data Lake Analytics, HDInsight and Azure Data Warehouse.
The document provides an overview of NoSQL and MongoDB. It discusses that NoSQL databases were built for large datasets and cloud applications. It covers some of the main types of NoSQL databases like document stores, key-value stores, and column family stores. The document also compares NoSQL to SQL/relational databases, discussing how NoSQL is more flexible and scales horizontally. MongoDB is presented as a popular document-oriented NoSQL database, covering its flexible schema, horizontal scaling, and replication features.
OLAP (online analytical processing) allows users to easily extract and analyze data from different perspectives. It stores data in multidimensional databases to allow for complex queries. There are three main types of OLAP - relational, multidimensional, and hybrid. OLAP is used with data warehouses to enable analytics like data mining and decision making. It provides benefits over transactional systems by facilitating flexible analysis of integrated data over time.
B. Vinithamani, II M.Sc. Computer Science, Bon Secours College for Women, Thanjavur
This document provides an overview of NewSQL databases, comparing SQL, NoSQL, and NewSQL. It discusses that NewSQL is a class of modern relational database systems that provide the scalable performance of NoSQL systems for online transaction processing workloads while still maintaining the ACID guarantees of traditional databases. It provides examples of NewSQL databases like VoltDB and TokuDB and explains that NewSQL was created to have the scalability of NoSQL with the ACID properties and SQL interface of traditional databases. It also briefly discusses the multi-version concurrency control and architecture of NewSQL systems.
What is NoSQL? How did it come into the picture? What are the types of NoSQL? Some basics of the different NoSQL types. Differences between RDBMS and NoSQL. Pros and cons of NoSQL.
What is MongoDB? What are the features of MongoDB? Nexus architecture of MongoDB. Data model and query model of MongoDB? Various MongoDB data management techniques. Indexing in MongoDB. A working example using MongoDB Java driver on Mac OSX.
The document provides an overview of NoSQL databases and MongoDB. It discusses:
- What NoSQL is and why it was created
- The different categories of NoSQL databases, including key-value stores, document databases, column family stores, and graph databases
- MongoDB specifically, including its flexible schema, horizontal scalability, replication support, and data modeling approach
- Comparisons between relational and NoSQL databases
RELATIONAL DATABASE MANAGEMENT SYSTEM
Relational Model - data represented in terms of tuples (rows).
Key Concepts
o Table - a collection of data elements organized in terms of rows and columns
o Field - a column in a table designed to maintain specific information about every record in the table
o Record - a horizontal entity representing a set of related data
o Column - a vertical entity containing values of a particular type
RELATIONAL DATABASE MANAGEMENT SYSTEM INTEGRITY RULES
o Entity Integrity
o Domain Integrity
o Referential Integrity
o User-Defined Integrity
RELATIONAL DATABASE MANAGEMENT SYSTEM
Pros:
o Support simple data structures
o Limit redundancy
o Better integrity
o Offer logical database independence
o Support one-off queries using SQL
o Better backup & recovery procedures
Cons:
o Poor representation of the real world
o Difficult to represent hierarchies
o Difficult to represent complex data types
RDBMS VS NOSQL
RDBMS                     NoSQL
Scale up                  Scale out
Structured data           Semi-structured / unstructured data
Atomic transactions       Eventual consistency
Impedance mismatch        Object model
Strict schema             Schema-less
DISTRIBUTED SYSTEMS
A distributed database system consists of loosely-coupled sites that share no physical components.
Homogeneous DDBMS
o All sites have identical software & are aware of each other.
o They work cooperatively in processing user requests.
Heterogeneous DDBMS
o Different sites may use different schemas and software.
o They provide limited facilities for cooperation in transaction processing.
DISTRIBUTED SYSTEMS
Sharding
Split the data among multiple machines while ensuring that data is always accessed from the correct place.
(Figure: a 75GB dataset split across three machines holding 25GB each.)
Replication
Multiple instances of the database, each mirroring all the data of the others.
(Figure: a 75GB dataset mirrored in full on three machines, 75GB each.)
WHY NOSQL
The global NoSQL market is forecast to reach $3.4 billion in 2020, representing a compound annual growth rate (CAGR) of 21% for the period 2015-2020.
http://www.technologies.org/?p=102
http://www.marketresearchmedia.com/?p=568
WHAT IS ACID?
o Atomicity
  A transaction is all or nothing
o Consistency
  Only valid data is written to the database
o Isolation
  Pretend all transactions are happening serially and the data is correct
o Durability
  What you write is what you get
CAP THEOREM
o Consistency: all clients always have the same view of the data.
o Availability: each client can always read and write.
o Partition Tolerance: the system works well despite physical network partitions.
You can have at most two of these properties for any shared data system.
AN ALTERNATIVE TO ACID IS BASE
o Basic Availability
  The system seems to work all the time
o Soft-State
  It doesn't have to be consistent all the time
o Eventual Consistency
  Becomes consistent at some later time
NOSQL DATABASE CATEGORIES
o Key-Value Store
o Document Store
o Wide Column Store
o Graph Databases
KEY VALUE STORE - OVERVIEW
o Most basic type of NoSQL database and the basis for the other three
o Schema-free
o Stores data as key-value pairs
o Key-value stores can be used as collections, dictionaries, associative arrays etc.
Example DBs: Redis, Project Voldemort, Amazon DynamoDB
Example record (Key: Value):
Row_Id: 100
  First_Name: Saman
  Last_Name: Silva
  Address: 123, Galle Rd, Beruwala
  Last_Order: 2001
WIDE COLUMN STORE - OVERVIEW
o Stores data in a columnar format
o Semi-schematic
o Allows key-value pairs to be stored
o Each key (super column) is associated with multiple attributes
o Stores data in column-specific files
Example DBs: Apache HBase, Cassandra, Bigtable, Hadoop
Example (super columns with sub-column Key: Value pairs):
Super_Column: Name
  First_Name: Saman
  Last_Name: Silva
Super_Column: Address
  No: 125
  Road: Galle Rd
  City: Beruwala
DOCUMENT STORE - OVERVIEW
o Everything is stored in a document
o Schema-free
o Data is stored inside documents in JSON or BSON format
o A document is a key-value collection
Example DBs: MongoDB, CouchDB
Example (a Customers database whose documents reference an Orders database):
Database: Customers
  Document_Id: 100
    First_Name: Saman
    Last_Name: Silva
    Address: Number: 125, Road: Galle Rd, City: Beruwala
    Order: Most_Recent: 2001
Database: Orders
  Document_Id: 2001 - Price: Rs 450, Item1: 1001, Item2: 1002
  Document_Id: 2002 - Price: Rs 750, Item1: 1003, Item2: 1001
GRAPH DATABASE - OVERVIEW
o Collection of nodes & edges
o A node represents an entity & an edge represents a connection between two nodes
o Stores data in a graph
o Within nodes, data is stored as Key: Value pairs
o Mostly used in social network applications such as Facebook, Twitter etc.
o Example DBs: Neo4j, Titan
(Figure: nodes with Key: Value data - Name: Shelan and Name: Hansa joined by an IS_FRIEND_OF edge, each with a WORKS_IN edge to a node with WorkPlace: Virtusa.)
KEY VALUE STORE
o Most basic NoSQL database type
o Stores data as a dictionary or hash
o Dictionaries contain collections of objects or records
o Works very differently from an RDBMS
WHEN TO USE KEY VALUE STORE
o Caching: quickly storing and retrieving data
o Queueing: some K/V stores support lists, sets, queues and more
o Distributing information and tasks
o Keeping live information
ADVANTAGES OF KEY VALUE STORE
o Support horizontal scaling
o Highly performant
o Schema-less data store
o Work differently from an RDBMS
o Flexible, and more closely follow modern concepts like OOP
o Provide the basic K/V concept for the other 3 major NoSQL DB types
REDIS - KEY-VALUE STORE DATABASE
o Open source, advanced key-value store
o 3 main specialties:
  o Holds its database entirely in memory
  o Has a relatively rich set of data types
  o Can replicate data to any number of slaves
o 2 types of persistence:
  o RDB persistence
  o AOF persistence
o 5 data types
http://www.redis.io
http://redis.io/download
REDIS FEATURES
o Exceptionally fast
o Supports rich data types
o Operations are atomic
o Multi-utility tool
REDIS DATA TYPES
o String - e.g. "This is a String Value"
o Hashes - e.g. customer:1 -> name: Hasangi, address: Beruwala; customer:2 -> name: Rajith, address: Homagama
o Lists - e.g. Hasangi, Hasangi, Hansa, Rajith, Hijas (duplicates allowed, insertion order kept)
o Sets - e.g. Hasangi, Hansa, Rajith, Hijas (unique members, unordered)
o Sorted Sets - e.g. 0: Hansa, 1: Hasangi, 2: Hijas, 3: Rajith, 4: Shelan (ordered by score)
REDIS - STRING
> SET stringvalue "This is a String Value"
OK
> GET stringvalue
"This is a String Value"
REDIS - PUB/SUB
o Publish and subscribe to message channels
o Subscribers subscribe to one or more channels; any number of publishers can publish to a channel
(Figure: several publishers send "Hi, I'm RedisChat Publisher" / "I'm Another RedisChat Publisher" messages to the "RedisChat" channel, which delivers them to all subscribers.)
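A minimal command sketch of the pattern above, assuming two redis-cli sessions; the channel name "RedisChat" comes from the slide, the message text is illustrative.

# session 1 - subscriber
> SUBSCRIBE RedisChat
Reading messages... (press Ctrl-C to quit)

# session 2 - publisher
> PUBLISH RedisChat "Hi, I'm a RedisChat publisher"
(integer) 1    <- number of subscribers that received the message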
REDIS - TRANSACTIONS
o Execute a group of commands in a single step
o Has 2 properties:
  o All commands in a transaction are sequentially executed as a single isolated operation
  o A Redis transaction is also atomic
> SET accountA 100
OK
> SET accountB 100
OK
> GET accountA
"100"
> GET accountB
"100"
> MULTI
OK
> INCRBY accountA -50
QUEUED
> INCRBY accountB 50
QUEUED
> EXEC
1) (integer) 50
2) (integer) 150
> GET accountA
"50"
> GET accountB
"150"
REDIS - DISK PERSISTENCE
RDB Persistence:
o Point-in-time snapshots of the whole dataset
o Compact; ideal for regular backup/archive
o Multiple save-points available
o Faster restarts compared to AOF
o Very good for disaster recovery
AOF Persistence:
o Writes every command like a tape
o Gets rewritten when it gets too big
o Can be easily parsed & edited
o AOF files are bigger than RDB files
o Slower than RDB
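A minimal redis.conf sketch showing how the two modes are typically enabled; the save points and file names below are common defaults used for illustration, not values from the slides.

# RDB: snapshot if 1 key changed in 900s, 10 keys in 300s,
# or 10000 keys in 60s
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb

# AOF: log every write command, fsync once per second
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec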
REDIS - REPLICATION
o Uses asynchronous replication
o A master can have multiple slaves
o Slaves accept connections from other slaves
o Non-blocking on both the master and the slave side
o Redis Sentinel provides:
  • Automatic failover
  • Monitoring
  • Notification
  • Configuration provider
(Figures: scalability - a Redis master replicating to a tree of slaves; high availability - a Sentinel watching over a master and its slaves.)
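A hedged configuration sketch of the setup in the figures; the host addresses, ports and the quorum value of 2 are illustrative.

# on each slave (redis.conf) - replicate from the master
slaveof 192.168.1.10 6379

# sentinel.conf - watch the master, with a quorum of 2 sentinels
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000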
WIDE-COLUMN STORE DATABASES
o Store data as sections of columns of data rather than rows of data
o Able to hold very large numbers of dynamic columns
o The benefit of storing data in columns is fast search/access and data aggregation
o Advantageous for data warehouses and customer relationship management (CRM) systems
o A wide variety of companies and organizations use Hadoop for both research and production
HADOOP
o Not a single piece of software, but a framework of tools
o Objective: running applications on Big Data
o An open source set of tools distributed under the Apache license
o A distributed file system (HDFS)
o An environment to run Map-Reduce tasks - typically in batch mode
o A NoSQL database - HBase
o A real-time query engine (Impala)
HADOOP'S APPROACH
(Figure: Big Data is broken into pieces, a computation runs on each piece in parallel, and the partial results are merged into a combined result.)
HADOOP DISTRIBUTED MODEL
(Figure: a cluster of commodity hardware - each slave computer runs a Task Tracker and a Data Node; the master computer(s) run the Job Tracker and the Name Node.)
HADOOP DATA ACCESS
(Figure: an application talks to the Job Tracker and Name Node on the master, which direct the work to the Task Trackers and Data Nodes on the slave computers.)
HADOOP DATA FAULT TOLERANCE
(Figure: data is replicated across several Data Nodes, so when a slave's Task Tracker or Data Node fails, the application's work carries on using the remaining nodes.)
HOW HADOOP SOLVES BIG DATA CHALLENGES OF PROGRAMMERS
o File location
o Manage failures
o Break computations into pieces
o Scaling
o Focus on scale-free programs
HBASE
An open-source, distributed, versioned, non-relational database modeled after Google's Bigtable.
Features
o Linear and modular scalability
o Strictly consistent reads and writes
o Automatic and configurable sharding of tables
o Automatic failover support between Region Servers
o Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
o Easy-to-use Java API for client access
o Block cache and Bloom filters for real-time queries
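A small HBase shell sketch of the table model above, assuming an installation with the hbase shell available; the table name 'customers', column family 'info' and the row values are illustrative, and the output is elided.

> create 'customers', 'info'
> put 'customers', 'row100', 'info:first_name', 'Saman'
> put 'customers', 'row100', 'info:last_name', 'Silva'
> get 'customers', 'row100'
COLUMN               CELL
 info:first_name     timestamp=..., value=Saman
 info:last_name      timestamp=..., value=Silva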
GRAPH DATABASES
What is a Graph?
(Figure: a follower graph - the users Shelan, Hansa, Hijaz and Hasangi are nodes connected by "Follows" edges, e.g. following the account @Hansa and the hashtag #nosql.)
GRAPH DATABASES
What is a Graph Database?
A database that uses graph structures to represent & store data.
Key features
o Excellent at dealing with relationships
o High performance
o Flexible
o Query language support
(Figure: a node Rajith {Name: Rajith, City: Kottawa, Married: false} connected by a "Works for {Since: 2014/11/24}" relationship to a node Virtusa {Name: Virtusa, City: Colombo}.)
GRAPH DATABASES
Graph databases vs relational databases
Relational                         Graph
Tables                             Nodes
Schema with nullables              No schema
Relationships with foreign keys    Relationships are first-class citizens
Related data fetched with joins    Related data fetched with patterns
NEO4J - CYPHER
Node with properties:
( a { name: "rajith", born: 1989 } )
Relationship with properties:
( a ) - [ :WORKED_IN { roles: ["ASE"] } ] -> ( b )
Labels:
( a :Person { name: "rajith" } )
NEO4J - CYPHER
Querying with Cypher:
MATCH ( a ) --> ( b )
RETURN a, b;

MATCH ( a ) - [ r ] -> ( b )
RETURN a.name, type( r );

Using clauses:
MATCH ( a :Person )
WHERE a.name = "rajith"
RETURN a;
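For completeness, a hedged Cypher sketch that creates the data the queries above would match; the :Person label and property values reuse the earlier slide's example, while the :Company label is an assumption added for illustration.

CREATE ( a :Person { name: "rajith", born: 1989 } ),
       ( b :Company { name: "Virtusa" } ),
       ( a ) - [ :WORKED_IN { roles: ["ASE"] } ] -> ( b );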
DOCUMENT STORE
o A collection of documents
o Data in this model is stored inside documents
o A document is a key-value collection where the key allows access to its value
o Documents are not typically forced to have a schema and are therefore flexible and easy to change
o Documents are stored in collections in order to group different kinds of data
o Documents can contain many different key-value pairs, key-array pairs, or even nested documents
o Usually use a JSON (BSON)-like interchange model, so application logic can be written easily
WHAT IS MONGODB?
o A scalable, high-performance, open-source, document-oriented database written in C++
o Built for speed
o Rich document-based queries for easy readability
o Full index support for high performance
o Replication and failover for high availability
o Auto-sharding for easy scalability
o Map/Reduce for aggregation
MONGODB ADVANCED FEATURES
o Replication
o Indexing
o Aggregation
o Sharding
o Capped Collections
REPLICATION
o Replication is the process of synchronizing data across multiple servers
o Replication provides redundancy and increases data availability
(Figure: the minimum replica set in MongoDB - a Primary DB, a Secondary DB and an Arbiter DB.)
INDEXING
o Indexes support the efficient execution of queries in MongoDB
o MongoDB can use an index to limit the number of documents it must inspect
o Indexes use a B-tree data structure
o The "ensureIndex" method creates an index:
> db.COLLECTION_NAME.ensureIndex({ KEY: 1 })
o KEY is the name of the field on which to create the index
o 1 is for ascending order; -1 is for descending order
WITHOUT INDEXING
(Figure: a client sends a query; without an index the server has to read every document in the document storage to find the result.)
INDEX TYPES
o Default _id Index
o Single Field Index
o Compound Index
o Multikey Index
o Geo Index
o Text Index
o Hashed Index
AGGREGATIONS
Aggregations are operations that process data records and return computed results.
MongoDB provides a rich set of aggregation operations.
Aggregation concepts
o Aggregation Pipelines
o Map-Reduce
o Single Purpose Aggregation Operations
AGGREGATION PIPELINES
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.
SINGLE PURPOSE AGGREGATION OPERATIONS
MongoDB provides special-purpose database commands.
All of these operations aggregate documents from a single collection.
Common aggregation operations are:
o returning a count of matching documents
o returning the distinct values for a field
o grouping data based on the values of a field
SHARDING
Sharding is a method for storing data across multiple machines.
MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
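A hedged mongo shell sketch of turning sharding on; the database name "sales", the collection "customers" and the shard key "customer_id" are illustrative.

> sh.enableSharding("sales")
> sh.shardCollection("sales.customers", { customer_id: 1 })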
CAPPED COLLECTIONS
o Fixed-size circular collections that follow the insertion order, supporting high performance for create, read and delete operations
o Capped collections restrict updates to documents if the update would increase the document size
o Capped collections are best for storing log information, cache data or any other high-volume data
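A hedged example of creating a capped collection for logs; the collection name and the 1 MB / 5000-document limits are illustrative.

> db.createCollection("log", { capped: true, size: 1048576, max: 5000 })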
NOSQL DATABASE CATEGORIES (RECAP)
o Key-Value Store
o Document Store
o Wide Column Store
o Graph Databases
NOSQL DATABASES SUMMARY
Name: HBase | MongoDB | Neo4j | Redis
Database model: Wide column store | Document store | Graph DBMS | Key-value store
Initial release: 2008 | 2009 | 2007 | 2009
License: Open Source for all four
DBaaS: no for all four
Implementation language: Java | C++ | Java | C
Server operating systems: Linux, Unix, Windows | Linux, OS X, Solaris, Windows | Linux, OS X, Windows | BSD, Linux, OS X, Windows
Data scheme: schema-free for all four
Source: http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
NOSQL DATABASES SUMMARY
Name: HBase | MongoDB | Neo4j | Redis
2nd indexes: no | yes | yes | no
SQL: no for all four
APIs and other access methods:
  HBase - Java API, RESTful HTTP, Thrift
  MongoDB - proprietary protocol using JSON
  Neo4j - Cypher query language, Java API, RESTful HTTP
  Redis - proprietary protocol
Supported programming languages:
  HBase - C, C#, C++, Groovy, Java, PHP, Python, Scala
  MongoDB - Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk
  Neo4j - .Net, Clojure, Go, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
  Redis - C, C#, C++, Clojure, Dart, Erlang, Go, Haskell, Java, JavaScript, Lisp, Lua, Objective-C, Perl, PHP, Python, Ruby, Scala, Smalltalk, Tcl
Source: http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
NOSQL DATABASES SUMMARY
Name: HBase | MongoDB | Neo4j | Redis
Foreign keys: no | no | yes | no
Transaction concepts: no | no | ACID | optimistic locking
Concurrency: yes for all four
Durability: yes for all four
In-memory capabilities: yes (Redis)
User concepts:
  HBase - Access Control Lists (ACL)
  MongoDB - access rights for users and roles
  Neo4j - no
  Redis - very simple password-based access control
Source: http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
A Relational Database Management System (RDBMS) is a database management system based on the relational model introduced by E. F. Codd.
Many popular databases currently in use are based on the relational database model.
The data in an RDBMS is stored in database objects called tables. A table is a collection of related data entries and consists of columns and rows.
The table is the most common and simplest form of data storage in a relational database.
A field is a column in a table that is designed to maintain specific information about every record in the table.
A record, also called a row of data, is each individual entry that exists in a table; a record is a horizontal entity in a table that represents a set of related data.
A column is a vertical entity in a table that contains all information associated with a specific field; a column is a set of values of a particular type.
Entity integrity: there are no duplicate rows in a table; the rows in a relational table should all be distinct.
Domain integrity: enforces valid entries for a given column by restricting the type, the format, or the range of values. Column values must not be repeating groups or arrays.
Referential integrity: rows that are referenced by other records cannot be deleted.
User-defined integrity: enforces specific business rules that do not fall into entity, domain or referential integrity.
The concept of a null value: a blank is considered equal to another blank, and a zero is equal to another zero, but two null values are not considered equal.
The NoSQL market is expected to grow 21 percent annually and reach 3.4 billion US dollars in 2020.
Why is this growth expected? Because developing NoSQL applications at Facebook and Twitter, and in biotechnology, defense, image processing and many other fields, has proved successful. NoSQL is moving in to become a major player in the database marketplace.
NoSQL supports Big Users. In the early days, 10,000 concurrent users was an extreme case; now apps should support millions of different users a day, globally, 24 hours a day, 365 days a year. Supporting large numbers of concurrent users is important, but because app usage is hard to predict, it is just as important to dynamically support rapidly growing numbers of concurrent users. With relational technologies, many application developers find it difficult, or even impossible, to get the dynamic scalability and level of scale they need while also maintaining the performance users demand. NoSQL databases are designed to meet this target.
NoSQL also supports Big Data. As the graph shows, the usage of structured and semi-structured data has increased over time. Explosive growth in internet usage, in addition to the use of mobile and social apps and machine-to-machine communications, has introduced new data types. Capturing and using big data requires a very different type of database: the rigidly defined, schema-based approach used by relational databases makes it impossible to quickly incorporate new types of data and is a poor fit for unstructured and semi-structured data. NoSQL provides a much more flexible data model that better maps to an application's data organization.
Today 20 billion devices are connected to the Internet: smart phones, tablets, home appliances, and devices in cars, hospitals, warehouses and more. These devices gather data on environment, location, movement, temperature and so on. Innovative enterprises rely on NoSQL technology to scale concurrent data access to millions of connected devices and systems, to store billions of data points, and to meet the resulting performance requirements.
Today, most new applications run in a public, private, or hybrid cloud, support large numbers of users, and use a three-tier internet architecture. In the cloud, a load balancer directs the incoming traffic to a scale-out tier of web/application servers that process the logic of the application. NoSQL databases are built from the ground up to be distributed, scale-out technologies and are therefore a better fit with the highly distributed nature of the three-tier internet architecture.
Relational and NoSQL data models are very different. The relational model takes data and separates it into many interrelated tables that contain rows and columns. A single JSON document in NoSQL might hold all the data that would be stored in 20 tables of a relational database. Another major difference is that relational technologies have rigid schemas, while NoSQL has no strict schema: the format of the data being inserted can be changed at any time, without application disruption.
There are two options to deal with increased concurrent users and volume of data: scale the database up or scale it out. Relational databases have limitations in scaling up: to support more concurrent users and store more data, they require a bigger and more expensive server with more CPUs, memory, and disk storage. At some point the capacity of even the biggest server is outstripped and the relational database cannot scale further. Scaling out the database tier with NoSQL provides an easier, linear, and cost-effective approach to database scaling: as the number of concurrent users grows, simply add additional low-cost commodity servers to the cluster. There is no need to modify the application, since the application always sees a single (distributed) database.
A transaction is a logical unit that is independently executed for data retrieval or update.
ACID is a set of properties that apply specifically to database transactions; it describes how database transactions are processed reliably. Let's examine the ACID requirements for a database transaction system in more detail.
Atomicity means either the task or tasks within a transaction are performed or none are performed (all or none rule).
Consistency means the transaction meets all rules defined by the system at all times. The transaction does not violate those rules and the database must remain in a consistent state at the beginning and end of a transaction. There are no half-completed transactions.
Isolation: No transaction has access to any other transaction that is in an intermediate or unfinished state. Each transaction is independent.
Finally, durability means the transaction is complete and it will persist. The completed transaction will survive system failure, power loss and other types of system breakdowns.
The CAP Theorem, also known as Brewer's Theorem, says that there are three essential system requirements for the successful design, implementation and deployment of applications in distributed computing systems: Consistency, Availability and Partition Tolerance.
Consistency: each client always has the same view of the data. (This is related to, though not identical with, the consistency in ACID.)
High Availability: means that all clients can always read and write.
Partition-tolerance: means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.
Attaining all three, however, is not possible: it turns out a shared data system can have at most two of these three characteristics.
The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem. The types of large systems based on CAP aren't ACID, they are BASE. Everyone who builds big applications builds them on CAP and BASE: Google, Yahoo, Facebook, Amazon, eBay, etc. Let's review the BASE properties:
Basically Available: This constraint states that the system does guarantee the availability of the data as regards CAP Theorem; there will be a response to any request. But, that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.
Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’
Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one.
The BASE model isn't appropriate for every situation, but it is certainly a flexible alternative to the ACID model for databases that don't require strict adherence to a relational model.
Key Value Store
A global collection of key:value pairs, e.g. "Name" is the key and "Saman" is the value. Schema-free: every record can have different keys. The most common category, and the basis for the other 3 NoSQL database categories.
Examples: Redis, Amazon SimpleDB, Project Voldemort, Riak, Windows Azure.
Document Store
Similar to key/value, but the major difference is that the value is a document.
Flexible schema / schema-free: any number of fields can be added.
Values (documents) are stored in JSON or BSON.
Wide Column Store
Each key (key -> super column) is associated with multiple attributes.
Semi-schematic, not schema-free: we need to specify groups of columns (known as column families).
Data is stored in column-specific files.
Graph Databases
A collection of nodes and edges; each node represents an entity and each edge represents a connection or relationship between two nodes.
Data is stored in a graph.
Key Value Store
Column Oriented Store
Document Store
Graph Database
Multi-model Databases
Object Databases
Unresolved and Uncategorized
The basic NoSQL database category and the foundation for the other three major categories.
Schema-free: allows developers to store schema-less data (every record can have different keys).
The database stores data as key-value pairs; each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
Key-value stores can be used as collections, dictionaries, associative arrays etc.
For example, suppose we have a sales database with customer and order tables, each with unique rows. Here we have one row with key 100, holding the key-value pairs first name, last name and address, plus a last-order value that points to another table. But there is no explicit relation between customers and orders.
Data is stored in a columnar format, and those columns are treated individually.
Wide column stores have tables, but the tables don't belong to a database; there is no such thing as a database here. Tables have rows, and rows have super columns and columns within them. Super columns are defined when the tables are defined; in this example, Name and Address.
Everything is stored in a document; we can say a document store is a collection of documents.
Schema-free: documents are not typically forced to have a schema and are therefore flexible and easy to change.
Instead of containing rows, they contain documents, but conceptually a document is similar to a row, and there are still key-value pairs inside the documents.
The little difference is that the value of a key can itself be a document, or can point to a document in another database.
As an example, the customer document with id 100 has an address key whose value is itself a document,
and an orders key whose value is a document pointing to the Orders database document with id 2001.
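A hedged JSON sketch of the customer and order documents just described; the field names follow the slides, while the exact structure and casing are illustrative.

// Customers database
{ "_id": 100,
  "first_name": "Saman",
  "last_name": "Silva",
  "address": { "number": 125, "road": "Galle Rd", "city": "Beruwala" },
  "order": { "most_recent": 2001 } }

// Orders database
{ "_id": 2001, "price": "Rs 450", "items": [1001, 1002] }
{ "_id": 2002, "price": "Rs 750", "items": [1003, 1001] }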
Key / Value Store
KV can be considered the most basic and backbone implementation of NoSQL.
It is designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash.
Stores data as a hash table.
Each key is unique; keys may be strings, hashes, lists, sets, or sorted sets.
The value can be a string, JSON, a BLOB (binary large object), etc.
These types of databases work by matching keys with values, similar to a dictionary. There is no structure and no relations. After connecting to the database server (e.g. Redis), an application can state a key (e.g. Name) and provide a matching value (e.g. "Saman") which can later be retrieved the same way by supplying the key.
Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data.
These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database.
KV stores work in a very different fashion than the better known relational databases (RDB). RDBs pre-define the data structure in the database as a series of tables containing fields with well defined data types. Exposing the data types to the database program allows it to apply a number of optimizations. In contrast, key-value systems treat the data as a single opaque collection which may have different fields for every record.
This offers considerable flexibility and more closely follows modern concepts like object-oriented programming.
Some popular key / value based data stores are:
Redis: in-memory K/V store with optional persistence.
Riak: highly distributed, replicated K/V store.
Memcached / MemcacheDB: distributed memory-based K/V store.
Key / value DBMSs are usually used for quickly storing basic information, and sometimes not-so-basic information after performing, for example, a CPU- and memory-intensive computation. They are extremely performant, efficient and usually easily scalable.
When To Use
Caching: quickly storing data for - sometimes frequent - future use.
Queueing: some K/V stores (e.g. Redis) support lists, sets, queues and more.
Distributing information / tasks: they can be used to implement Pub/Sub.
Keeping live information: applications which need to keep a state can use K/V stores easily.
One of the biggest benefits of most NoSQL solutions, including key value stores, is horizontal scaling. We all know that horizontal scaling and SQL Server, while possible, do not play well together; typically, if you need more from SQL Server you scale vertically, which can be costly.
Key / value data stores are highly performant, easy to work with, and they usually scale well.
Another benefit of key value stores is the lack of schema, which allows the data structure to change as needed and is thus a bit more flexible, whereas with SQL Server altering a table could result in stored procedures, functions, views, etc. needing updates, which takes time and a DBA resource.
Because optional values are not represented by placeholders as in most RDBs, key-value stores often use far less memory to store the same database, which can lead to large performance gains in certain workloads.
key-value systems treat the data as a single opaque collection which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming.
The key value stores are typically written in some type of programming language, commonly Java. This gives the application developer the freedom to store data how they see fit, in a schema-less data store.
A subclass of the key-value store is the document-oriented database, which offers additional tools that use the metadata in the data to provide a richer key-value database that more closely matches the use patterns of RDBM systems. Some graph databases are also key-value stores internally, adding the concept of the relationships (pointers) between records as a first class data type.
Key value stores support "eventual consistency"; if a feature in your application doesn't need full ACID support, this may not be a significant drawback.
Redis is an open source, advanced key-value store and a serious solution for building high-performance, scalable web applications.
Redis has three main peculiarities that set it apart from much of its competition:
Redis holds its database entirely in memory, using the disk only for persistence.
Redis has a relatively rich set of data types when compared to many key-value data stores.
Redis can replicate data to any number of slaves – Redis Replication
Redis persists data in 2 ways:
RDB Persistence
AOF(Append Only File) Persistence
Redis is quite a bit different from other NoSQL databases, besides just being different from relational databases like SQL Server. You may be familiar with document databases like RavenDB or MongoDB, and while they are certainly good choices for NoSQL databases, they operate quite differently from Redis. With document databases like RavenDB or MongoDB, the focus is on creating documents which are persisted to disk and can be indexed, just like relational tables are indexed in SQL Server or Oracle. Redis, on the other hand, stores its data using keys, and the data it stores can take the form of different data structures, not just a document. The data is also stored in memory, with persistence as a secondary consideration, and there is no indexing of any kind. You can, of course, implement your own indexes by creating them as additional data, but Redis does not do any of that for you. This can be a bit of a shock to developers who are used to being able to query a database; after all, isn't that what databases are for? Databases like SQL Server and Oracle let you query the database using SQL; databases like RavenDB and MongoDB let you query the data using indexes you create ahead of time or on the fly. But Redis only lets you get data by specifying a key. At first this may seem like a ludicrous tradeoff - why would you want to give up the ability to query your data? And it's true that in some cases using Redis will not make any sense at all. But where Redis is appropriate, although you have to do a little extra work designing your data and working out how to access it, it will be extremely fast with very little overhead. That is really the advantage, and the consideration you need to take into account when deciding whether or not to use Redis.
Exceptionally fast: Redis is very fast and can perform about 110,000 SETs per second and about 81,000 GETs per second.
Supports rich data types: Redis natively supports most of the data types that developers already know, like lists, sets, sorted sets and hashes. This makes it very easy to solve a variety of problems, because we know which problem is handled better by which data type.
Operations are atomic: all Redis operations are atomic, which ensures that two clients concurrently accessing the Redis server will always get the updated value.
Multi-utility tool: Redis can be used in a number of use cases, like caching, messaging queues (Redis natively supports publish/subscribe), and any short-lived data in your application, such as web application sessions, web page hit counts, etc.
Redis supports five data types.
Bitmaps and HyperLogLogs
Redis also supports Bitmaps and HyperLogLogs, which are actually data types based on the String base type but having their own semantics.
Strings - a Redis string is a sequence of bytes.
Strings are binary safe, meaning they have a known length not determined by any special terminating characters.
One string can store anything up to 512 megabytes.
Lists - Redis lists are simply lists of strings, sorted by insertion order.
You can add elements to a Redis list at the head or the tail. The max length of a list is 2^32 - 1 elements (more than 4 billion elements per list).
Internally maintained as a linked list.
Ideal for queues, stacks, top-N lists, recent news, timelines.
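A minimal redis-cli sketch of the list operations described above; the key name "friends" and the member values are illustrative.

> LPUSH friends "Hasangi"
(integer) 1
> RPUSH friends "Hansa"
(integer) 2
> LRANGE friends 0 -1
1) "Hasangi"
2) "Hansa"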
Sets - Redis sets are an unordered collection of strings.
In Redis you can add, remove, and test for the existence of members in O(1) time complexity.
In the example above, Hasangi is added twice, but due to the uniqueness property of a set it is stored only once.
The max number of members in a set is 2^32 - 1 (4,294,967,295, more than 4 billion members per set).
Sample usage: tracking unique IPs, tagging.
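A minimal redis-cli sketch of the set behaviour just described; the key name "members" is illustrative.

> SADD members "Hasangi"
(integer) 1
> SADD members "Hasangi"
(integer) 0    <- duplicate is ignored
> SADD members "Hansa"
(integer) 1
> SMEMBERS members
1) "Hasangi"
2) "Hansa"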
Sorted Sets - Redis sorted sets are, similarly to Redis sets, non-repeating collections of strings.
The difference is that every member is defined with a score, which is used to keep the set ordered from the smallest to the greatest score.
Members are unique, but scores may be repeated.
Sample usage: leaderboards, most page views, sorting by a given range of age, friends, comments or likes.
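A minimal redis-cli sketch of a score-ordered set, matching the leaderboard usage above; the key name and scores are illustrative.

> ZADD leaderboard 0 "Hansa"
(integer) 1
> ZADD leaderboard 1 "Hasangi"
(integer) 1
> ZADD leaderboard 2 "Hijas"
(integer) 1
> ZRANGE leaderboard 0 -1 WITHSCORES
1) "Hansa"
2) "0"
3) "Hasangi"
4) "1"
5) "Hijas"
6) "2"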
Redis pub/sub implements a messaging system where senders (called publishers) send messages while receivers (subscribers) receive them. The link by which messages are transferred is called a channel.
In Redis a client (subscriber) can subscribe to any number of channels.
A subscriber also receives messages from multiple clients (publishers) publishing to a particular channel.
Redis transactions allow the execution of a group of commands in a single step. A transaction has two properties:
All commands in a transaction are sequentially executed as a single isolated operation; it is not possible for a request issued by another client to be served in the middle of the execution of a Redis transaction.
A Redis transaction is also atomic: either all of the commands are processed or none are.
Redis Persistence
Redis provides a different range of persistence options:
The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
the AOF persistence logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log on background when it gets too big.
If you wish, you can disable persistence entirely, if you want your data to exist only as long as the server is running.
It is possible to combine both AOF and RDB in the same instance. Notice that, in this case, when Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.
The most important thing to understand is the different trade-offs between the RDB and AOF persistence. Let's start with RDB:
RDB advantages
RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance you may want to archive your RDB files every hour for the latest 24 hours, and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters.
RDB is very good for disaster recovery, being a single compact file can be transferred to far data centers, or on Amazon S3 (possibly encrypted).
RDB maximizes Redis performances since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent instance will never perform disk I/O or alike.
RDB allows faster restarts with big datasets compared to AOF.
RDB disadvantages
RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, but you can have multiple save points). However you'll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason you should be prepared to lose the latest minutes of data.
RDB needs to fork() often in order to persist to disk using a child process. Fork() can be time consuming if the dataset is big, and may cause Redis to stop serving clients for some milliseconds, or even for one second if the dataset is very big and the CPU performance is not great. AOF also needs to fork(), but you can tune how often you want to rewrite your logs without any trade-off on durability.
AOF advantages
Using AOF, Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, or fsync at every query. With the default policy of fsync every second, write performance is still great (fsync is performed using a background thread, and the main thread will try hard to perform writes when no fsync is in progress), and you can only lose one second's worth of writes.
The AOF log is an append-only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with a half-written command for some reason (disk full or other reasons), the redis-check-aof tool is able to fix it easily.
Redis is able to automatically rewrite the AOF in background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.
AOF contains a log of all the operations one after the other in an easy to understand and parse format. You can even easily export an AOF file. For instance even if you flushed everything for an error using a FLUSHALL command, if no rewrite of the log was performed in the meantime you can still save your data set just stopping the server, removing the latest command, and restarting Redis again.
AOF disadvantages
AOF files are usually bigger than the equivalent RDB files for the same dataset.
AOF can be slower than RDB depending on the exact fsync policy. In general, with fsync set to every second, performance is still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still, RDB is able to provide more guarantees about the maximum latency even in the case of a huge write load.
In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to not reproduce exactly the same dataset on reloading. These bugs are rare, and we have tests in the test suite creating random complex datasets automatically and reloading them to check that everything is ok, but these kinds of bugs are almost impossible with RDB persistence. To make this point clearer: the Redis AOF works by incrementally updating an existing state, like MySQL or MongoDB do, while RDB snapshotting creates everything from scratch again and again, which is conceptually more robust. However: 1) it should be noted that every time the AOF is rewritten by Redis it is recreated from scratch, starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always-appending AOF file (or one rewritten by reading the old AOF instead of reading the data in memory); 2) we have never had a single report from users about an AOF corruption detected in the real world.
Redis replication is a very simple to use and configure master-slave replication that allows slave Redis servers to be exact copies of master servers. The following are some very important facts about Redis replication:
Redis uses asynchronous replication. Starting with Redis 2.8, however, slaves will periodically acknowledge the amount of data processed from the replication stream.
A master can have multiple slaves.
Slaves are able to accept connections from other slaves. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a graph-like structure.
Redis replication is non-blocking on the master side. This means that the master will continue to handle queries when one or more slaves perform the initial synchronization.
Replication is also non-blocking on the slave side. While the slave is performing the initial synchronization, it can handle queries using the old version of the dataset, assuming you configured Redis to do so in redis.conf. Otherwise, you can configure Redis slaves to return an error to clients if the replication stream is down. However, after the initial sync, the old dataset must be deleted and the new one must be loaded. The slave will block incoming connections during this brief window.
Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, heavy SORT operations can be offloaded to slaves), or simply for data redundancy.
http://blog.concretesolutions.com.br/2013/03/redis-parte-2/
http://redis.io/topics/sentinel
http://redis.io/topics/replication
Redis Sentinel provides high availability for Redis. In practical terms this means that using Sentinel you can create a Redis deployment that resists certain kinds of failures without human intervention.
Redis Sentinel also provides other collateral tasks such as monitoring and notifications, and acts as a configuration provider for clients.
This is the full list of Sentinel capabilities at a macroscopic level (i.e. the big picture):
Monitoring: Sentinel constantly checks whether your master and slave instances are working as expected.
Notification: Sentinel can notify the system administrator, or other computer programs via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover: if a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed of the new address to use when connecting.
Configuration provider: Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels to ask for the address of the current Redis master responsible for a given service, and if a failover occurs, Sentinels will report the new address.
The important difference here is that columns are created for each row rather than being predefined by the table structure.
Map-Reduce - An algorithm for efficiently processing large amounts of data in parallel
Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Avro™: A data serialization system.
Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
ZooKeeper™: A high-performance coordination service for distributed applications.
Cypher is a query language specially designed for the Neo4j graph database; it is still in active development.
Cypher is declarative: you specify what you need to retrieve, not how Neo4j should retrieve it.
Cypher uses patterns to match data in the database.
Cypher works with clauses, e.g. WHERE and ORDER BY.
A document store is a type of NoSQL database that stores a collection of documents; the data model is stored inside the documents.
Documents are not typically forced to have a schema and are therefore flexible and easy to change.
Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
They usually use a JSON (BSON)-like interchange model, so application logic can be written easily.
MongoDB is an open source document database written in C++.
Data is stored in an open format such as XML, JSON or Binary JSON (BSON), which makes the data easy to read.
It allows server-side operations on data, and makes it easy to create tools to manipulate data.
Full index support gives high performance.
MongoDB has automatic failure recovery and replication, which gives high availability.
MongoDB has horizontal scaling via sharding, which gives easy scalability.
It provides an aggregation framework, making it easy to handle large amounts of data.
A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.
A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table.
A document is a set of key-value pairs.
This is an example of a document.
MongoDB has a lot of its own features; these are some of the advanced features in MongoDB.
Now we review those features one by one.
Replication is the process of synchronizing data across multiple servers.
It provides redundancy and increases data availability, because it keeps multiple copies of data on different database servers;
replication protects a database from the loss of a single server.
Replication also allows you to recover from hardware failures and service interruptions. We can dedicate one copy to disaster recovery, reporting, or backup.
With a single-server DB there is a danger: when the DB crashes, all data is lost. If we have a backup, we can restore it, but this is only the traditional approach to fail-safety. Instead, MongoDB supports a concept called a replica set to achieve replication.
A replica set is a group of mongod instances that host the same data set. Generally a replica set contains a minimum of 3 nodes:
one primary node, one or more secondary nodes, and an arbiter node.
All data replicates from the primary to the secondary nodes.
1) Primary node: a replica set can have only one primary instance, which receives all write operations. Any client writing data to the database must be connected to the primary.
2) Secondary nodes: these are read-only databases. There can be many secondary databases, which means more scalability, because many reads can be performed against the replicas rather than hitting a single server.
3) Arbiter node: an arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for a primary.
It can be a smaller machine and does not need a lot of hardware.
At some point the primary DB may fail, and one of the secondaries will take over and become the primary. This is great, because mongod supports automatic recovery from a crash of the primary.
If one of the secondaries breaks it is not a big deal, because we still have the primary, and depending on the application there can be many other secondaries:
no data loss and no loss of functionality.
When the primary server fails, one of the secondaries takes over, but there can be multiple secondaries, so which one becomes primary? Mongo holds an election. The election requires a simple majority (more than 50% of the votes) for a node to become the primary server. The arbiter adds its vote in the election and helps decide the new primary.
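A hedged mongo shell sketch of initiating the minimal replica set described above; the set name "rs0" and the host names are illustrative.

> rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "db1.example.net:27017" },                      // primary candidate
      { _id: 1, host: "db2.example.net:27017" },                      // secondary
      { _id: 2, host: "arb.example.net:27017", arbiterOnly: true }    // arbiter
    ]
  })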
MongoDB holds large volumes of data.
Indexes speed up queries, and when using an index the number of documents to scan is limited.
Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. Indexes use a B-tree structure.
Using the "ensureIndex" method, an index can be created on a field.
Here, key is the name of the field on which you want to create the index, and 1 is for ascending order. To create an index in descending order, use -1.
The "ensureIndex" method can also be passed multiple fields, to create an index on multiple fields.
The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.
MongoDB without indexing
Let's look at this example:
you have a collection named "foo" and you want to find all documents where the value of field x is 10.
What does the server do to find them? It has to scan each and every document and check whether field x equals 10, comparing every document along the way. This is a very wasteful operation.
Without indexes, MongoDB must perform a collection scan; the solution is to use an index.
Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection.
MongoDB provides a number of different index types to support specific types of data and queries.
1) Default _id index: all MongoDB collections have an index on the _id field that exists by default. If applications do not specify a value for _id, the driver or the mongod will create an _id field with an ObjectId value. The _id index is unique and prevents clients from inserting two documents with the same value for the _id field.
2) Single field index: in addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
3) Compound index: MongoDB also supports user-defined indexes on multiple fields. The order of fields listed in a compound index is significant: the index sorts first by the first field and then, within each value of that field, by the next field.
4) Multikey index: if you index a field that holds an array value, MongoDB creates a multikey index on that field. Multikey indexes allow queries to select documents that contain arrays by matching on an element or elements of the arrays. MongoDB automatically determines whether to create a multikey index when the indexed field contains an array value; you do not need to explicitly specify the multikey type.
5) Geo index: to support efficient queries of geospatial coordinate data, MongoDB provides geospatial index types (2d and 2dsphere).
6) Text indexes: support searching for string content in a collection.
7) Hashed indexes: to support hash-based sharding, MongoDB provides a hashed index type, which indexes the hash of a field's value. These indexes have a more random distribution of values along their range, but they only support equality matches and cannot support range-based queries.
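Illustrative sketches of those last three index types (the collections and fields are hypothetical):

    db.places.ensureIndex({ location: "2dsphere" })  // geospatial index
    db.articles.ensureIndex({ body: "text" })        // text index
    db.users.ensureIndex({ username: "hashed" })     // hashed index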
Aggregations are operations that process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.
MongoDB provides a rich set of aggregation operations.
Like queries, aggregation operations in MongoDB use collections of documents as an input and return results in the form of one or more documents.
There are three concepts in aggregation:
1) Aggregation pipeline
2) Map-reduce
3) Single-purpose aggregation operations
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.
Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
Each stage takes documents as input, processes them, and produces output documents that become the input for the next stage, and so on.
Possible stages in the aggregation framework are $project, $match, $group, $sort, $skip, $limit, and $unwind.
This example has two stages, $match and $group.
The $match stage runs first and filters for documents whose status field equals “A”; its output documents become the input to the $group stage, which groups by cust_id and computes the sum of amount as total.
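A sketch of that two-stage pipeline, assuming an orders collection with status, cust_id, and amount fields:

    db.orders.aggregate([
      { $match: { status: "A" } },                                 // stage 1: filter by status
      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }  // stage 2: sum amount per customer
    ])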
MongoDB also provides map-reduce operations to perform aggregation.
In general, map-reduce operations have two phases: map and reduce.
Optionally, map-reduce can have a finalize stage to make final modifications to the result.
Map-reduce uses custom JavaScript functions to perform the map and reduce operations, as well as the optional finalize operation.
The main options are:
map – a JavaScript function that maps a value to a key and emits a key-value pair.
reduce – a JavaScript function that reduces, or groups together, all the documents having the same key.
out – specifies the location of the map-reduce query result.
query – specifies optional selection criteria for selecting documents.
sort – specifies optional sort criteria.
limit – specifies an optional maximum number of documents to be returned.
In this example we have an orders collection. The query first selects documents whose status field equals “A”; then the map function emits cust_id as the key and amount as the value. The reduce stage returns the sum of the values array for each key, and the result is stored in order_totals.
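A sketch of that map-reduce job, under the same assumed orders schema:

    db.orders.mapReduce(
      function () { emit(this.cust_id, this.amount); },      // map: key = cust_id, value = amount
      function (key, values) { return Array.sum(values); },  // reduce: sum the amounts per key
      {
        query: { status: "A" },  // select only documents with status "A"
        out: "order_totals"      // store the result in the order_totals collection
      }
    )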
MongoDB provides special purpose database commands.
These common aggregation operations are:
returning a count of matching documents,
returning the distinct values for a field,
and grouping data based on the values of a field.
All of these operations aggregate documents from a single collection.
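In the shell, the first two helpers look like this (the orders collection is the same assumed example; the third case was served by the now-deprecated group command, with the $group pipeline stage as the modern equivalent):

    db.orders.count({ status: "A" })  // count of matching documents
    db.orders.distinct("cust_id")     // distinct values for a field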
Sharding is a method for storing data across multiple machines.
MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
As the size of the data increases, a single machine may not be sufficient to store the data or to handle all the read and write requests.
Sharding solves this problem with horizontal scaling.
With sharding, you can add more machines to support data growth and the demands of read and write operations.
Shards: shards store the data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
Config servers: config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards.
Query routers (mongos): query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients.
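As an illustrative sketch (the database and collection names are hypothetical), sharding a collection through a mongos looks like:

    sh.enableSharding("mydb")                          // allow mydb to be sharded
    sh.shardCollection("mydb.orders", { cust_id: 1 })  // distribute by the cust_id shard key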
Capped collections are fixed-size, circular collections.
They give high performance for create, read, and delete operations.
By circular, we mean that when the fixed size allocated to the collection is exhausted, it starts deleting the oldest documents in the collection without requiring any explicit commands.
Capped collections restrict updates to documents if the update would increase the document's size.
Capped collections are best for storing log information.
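A minimal sketch of creating and using a capped collection for logs (the name and size limits are illustrative):

    db.createCollection("log", { capped: true, size: 1048576, max: 1000 })  // 1 MB, at most 1000 docs
    db.log.insert({ ts: new Date(), msg: "server started" })                // oldest entries are evicted once full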