Introduction to NoSQL



Published in: Technology


  1. 1. Introduction To NOSQL
  2. 2. Agenda
      • Overview of NoSQL
      • Why NoSQL?
      • NoSQL Market Overview
      • Categories of NoSQL databases
      • Hadoop – Overview
  3. 3. Overview of NoSQL A term which stands for
  4. 4. Overview of NoSQL (Contd…)
      • NoSQL does not mean that SQL should no longer be used.
      • The term refers to databases that differ from relational databases: simply, non-relational databases.
      • A NoSQL database is a non-relational database management system that differs from traditional relational database management systems in some significant ways.
      • It is designed for distributed data stores with very large storage needs (for example Google or Facebook, which collect terabytes of data every day for their users). This kind of data storage may not require a fixed schema, avoids join operations, and typically scales horizontally.
  5. 5. Overview of NoSQL (Contd…) Many NoSQL databases are eventually consistent and are described in terms of the CAP theorem rather than ACID guarantees. The CAP theorem states that a distributed data store cannot guarantee all three of the following properties at the same time:
      • Consistency - The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
      • Availability - The system is always on (guaranteed service availability), with no downtime. Node failures do not prevent the surviving nodes from continuing to operate.
      • Partition Tolerance - The system continues to function even if communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
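
To make eventual consistency concrete, here is a minimal single-process sketch. The `Replica` class and the `sync` function are hypothetical names invented for this illustration and do not correspond to any particular database's API. A write lands on one replica, a read from the other replica is briefly stale, and the two agree once the update has been propagated.

```python
class Replica:
    """One copy of the data, held as a plain dict (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.pending = []  # updates accepted locally but not yet forwarded

    def write(self, key, value):
        # Accept the write locally and remember it for later propagation.
        self.data[key] = value
        self.pending.append((key, value))

    def read(self, key):
        # Reads only consult the local copy, so they can be stale.
        return self.data.get(key)


def sync(source, target):
    """Forward all pending updates from source to target."""
    for key, value in source.pending:
        target.data[key] = value
    source.pending.clear()


a, b = Replica(), Replica()
a.write("status", "online")
stale = b.read("status")   # replica b has not seen the write yet
sync(a, b)
fresh = b.read("status")   # after propagation both replicas agree
print(stale, fresh)
```

Between the write and the call to `sync`, the system stays available on both replicas but is not consistent, which is the trade-off the slide describes.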
  6. 6. Overview of NoSQL (Contd…) NoSQL Features:
      1. Scalability: to maintain performance as load grows.
         • Horizontal scalability: increasing the number of machines while maintaining proportional performance.
         • Vertical scalability: adding more resources to a single machine to improve performance.
      2. Open Source: most NoSQL projects are open source, so anyone can use and modify them.
         • Cassandra, originally developed at Facebook.
         • Bigtable by Google, by contrast, is proprietary and is not available as open source.
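
Horizontal scaling usually relies on spreading keys across machines. The following is a rough sketch of hash-based sharding, assuming a simple modulo placement scheme; the function names `shard_for` and `distribute` are made up for this example.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(keys, num_nodes):
    """Return one list of keys per node."""
    nodes = [[] for _ in range(num_nodes)]
    for key in keys:
        nodes[shard_for(key, num_nodes)].append(key)
    return nodes


keys = [f"user:{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)
six_nodes = distribute(keys, 6)
# Adding machines leaves fewer keys on each one.
print([len(n) for n in three_nodes])
print([len(n) for n in six_nodes])
```

Plain modulo placement moves most keys when the node count changes, which is why real systems often use consistent hashing or a similar scheme instead.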
  7. 7. Overview of NoSQL (Contd…)
      3. Schema Freeness
         • NoSQL databases don’t use a fixed schema the way relational databases do.
         • Internal schema
         • External schema etc.
      • NoSQL was originally intended for modern web-scale databases. A large number of companies use NoSQL. To name a few:
         • Google • Facebook • Mozilla • Adobe • Foursquare • LinkedIn • Digg • McGraw-Hill Education
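
As a small illustration of schema freeness, the sketch below uses a plain Python list of dicts to stand in for a schema-free collection. The records, field names, and the `with_field` helper are all invented for the example.

```python
users = []  # stands in for a schema-free collection

users.append({"name": "Ada", "email": "ada@example.com"})
# A later requirement adds a field; no table alteration or migration is needed.
users.append({"name": "Lin", "email": "lin@example.com",
              "geo": {"lat": 52.5, "lon": 13.4}})


def with_field(collection, field):
    """Return the records that carry the given field."""
    return [rec for rec in collection if field in rec]


print(with_field(users, "geo"))
```

Records with and without the `geo` field coexist in the same collection, so the application has to tolerate missing fields when reading.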
  8. 8. WHY NOSQL? Benefits of NoSQL:
      1. Scaling: relational databases are not easy to scale out, whereas NoSQL databases are specifically designed to scale out.
      2. Big data: a single RDBMS struggles to handle today’s huge volumes of data and the transactions on that data, whereas non-relational databases are specifically designed to handle big data. Data is becoming easier to capture and access through third parties such as Facebook, D&B, and others. Personal user information, geolocation data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured.
      3. Needs no expert DBAs: although RDBMS vendors claim that their products provide management facilities, an RDBMS still needs an expert DBA to operate it. In contrast, NoSQL databases are designed to need less expert administration: they provide automatic repair, automatic data distribution, and simpler data models, which lowers the administration effort.
  9. 9. WHY NOSQL? (CONTD…)
      4. Economics: an RDBMS requires expensive components to provide efficient service. NoSQL uses clusters of cheap commodity servers to manage the same amount of data for which an RDBMS needs an expensive server, so NoSQL is economical as well.
      5. Flexibility of data models: an organization’s requirements can change over time. Changing an RDBMS schema after deployment creates many problems, can affect its services, and is sometimes almost impossible. A NoSQL database can be changed at any time: existing columns can be altered and new ones added.
  10. 10. WHY NOSQL? (CONTD…) Figure: Scale up with relational technology: limitations at the database tier. Source:
  11. 11. WHY NOSQL? (CONTD…) Figure: Scale out with NoSQL technology at the database tier. Source:
  12. 12. NOSQL MARKET OVERVIEW Figure: Hadoop/NoSQL Software and Services Market Share, 2012. Source: Wikibon 2013 ( NoSQL_Software_and_Services_Market_Forecast_2012-2017)
  13. 13. NOSQL MARKET OVERVIEW (CONTD…) Figure: Hadoop/NoSQL Software and Services Market Forecast, 2012-2017. Source: Wikibon 2013 ( NoSQL_Software_and_Services_Market_Forecast_2012-2017)
  14. 14. CATEGORIES OF NOSQL DATABASES There is a variety of types:
      • Column Store – each storage block contains data from only one column
      • Document Store – stores documents made up of tagged elements
      • Key-Value Store – hash table of keys
      1. Column Store
         • Each storage block contains data from only one column.
         • Example: Hadoop/HBase
            - Clients: Yahoo, Facebook
         • Example: Ingres VectorWise
            - Column store integrated with an SQL database
         • More efficient than a row (or document) store if:
            - Multiple rows/records/documents are inserted at the same time, so updates of column blocks can be aggregated
            - Retrievals access only some of the columns in a row/record/document
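
The difference between row and column layout can be sketched in a few lines. This is a simplified in-memory illustration with made-up sample data and a hypothetical `to_columns` helper, not how HBase or VectorWise store data on disk.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Lin", "age": 41},
    {"id": 3, "name": "Sam", "age": 29},
]


def to_columns(records):
    """Pivot row records into one list of values per column."""
    columns = {}
    for rec in records:
        for field, value in rec.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)
# A query over one column touches only that column's block.
average_age = sum(columns["age"]) / len(columns["age"])
print(columns["age"], average_age)
```

Computing the average reads only the `age` list, which mirrors the slide's point that column stores pay off when retrievals access only some of the columns.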
  15. 15. CATEGORIES OF NOSQL DATABASES (CONTD…)
      2. Document Store
         • Stores documents made up of tagged elements.
         • Example: CouchDB
            - Clients: BBC
         • Example: MongoDB
            - Clients: Foursquare, Shutterfly
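
A document store can be approximated as a mapping from document IDs to JSON documents whose tagged elements may be nested. The sketch below uses only Python's standard `json` module; the sample documents and the helpers `get_path` and `find` are invented for illustration and are not the CouchDB or MongoDB query API.

```python
import json

# Each document is stored as serialized JSON under its own ID.
documents = {
    "venue:1": json.dumps({"name": "Cafe Uno", "tags": ["coffee"],
                           "location": {"city": "Berlin"}}),
    "venue:2": json.dumps({"name": "Book Nook", "open_late": True,
                           "location": {"city": "Leeds"}}),
}


def get_path(doc, path):
    """Follow a dotted path such as 'location.city'; None if it is absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(store, path, expected):
    """Return the IDs of documents whose value at path equals expected."""
    matches = []
    for doc_id, raw in store.items():
        if get_path(json.loads(raw), path) == expected:
            matches.append(doc_id)
    return matches


print(find(documents, "location.city", "Berlin"))
```

Unlike a pure key-value store, the store here can look inside the value, so queries can target individual elements of a document.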
  16. 16. CATEGORIES OF NOSQL DATABASES (CONTD…)
      3. Key-Value Store
         • Hash table of keys
         • Values stored with keys
         • Fast access to small data values
         • Example: Project-Voldemort
            - Clients: LinkedIn
         • Example: MemCacheDB
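
The key-value model is essentially a hash table with a very small interface. Here is a minimal in-memory sketch; the `KeyValueStore` class and its method names are assumptions for this example and do not mirror the Voldemort or MemCacheDB client APIs.

```python
class KeyValueStore:
    """Tiny in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        """Store value under key, overwriting any previous value."""
        self._table[key] = value

    def get(self, key, default=None):
        """Return the value for key, or default if the key is absent."""
        return self._table.get(key, default)

    def delete(self, key):
        """Remove key if present; return True when something was removed."""
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", "alice")
print(store.get("session:42"))
print(store.get("session:99", "missing"))
```

The store treats values as opaque: it can fetch by key quickly but cannot query by anything inside the value.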
  17. 17. HADOOP - OVERVIEW
      • The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
      • It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
      • Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
      The Apache Hadoop framework is composed of the following modules:
      • Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
      • Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
      • Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
      • Hadoop MapReduce - a programming model for large-scale data processing.
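
The MapReduce programming model can be illustrated with a word count. The sketch below runs in a single Python process and does not use the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


lines = ["NoSQL scales out", "Hadoop scales out too"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce calls would run on different machines, with the framework handling the grouping step and any failures in between.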
  18. 18. Thank You