NoSql Databases
By Ankit Dubey
Introduction
• Non Relational Databases
• Not only SQL
• Doesn't require fixed schema
• Easy to scale & avoids join
• Used for Big Data & real time web app.
Why SQL?
Brief History of NoSQL Databases
• 1998- Carlo Strozzi use the term NoSQL for his
lightweight, open-source relational database
• 2000- Graph database Neo4j is launched
• 2004- Google BigTable is launched
• 2005- CouchDB is launched
• 2007- The research paper on Amazon Dynamo is
released
• 2008- Facebooks open sources the Cassandra project
• 2009- The term NoSQL was reintroduced
Features of NoSQL
• Non Relational
• Schema free
• Simple API
• Distributed
Types of NoSQL Databases
• Key-value Pair Based
• Column-oriented
• Graphs based
• Document-oriented
Key Value Pair Based:
1. Data is stored in
Key/value pair
2. Store data as a hash
table
3. Key=Unique,Value=JSON
,BLOB,String etc
Column Based
1. works on column
2. every column is treated
as separatly
3. values of each column
are stored contiguously
Document Oriented
1. Stores & retrieves data as
a key value pair
2. But the value is stored as
a Document
Graph Based
1. Stores entities as well as
relations between them
2. Entity=node,Relation=Ed
ges
3. Each node & edge has
unique identifier
4. Traversing is fast
The CAP Theorem
 The limitations of distributed databases can be
described in the so called the CAP theorem
 Consistency: every node always sees the same data at
any given instance (i.e., strict consistency)
 Availability: the system continues to operate, even if
nodes in a cluster crash, or some hardware or software
parts are down due to upgrades
 Partition Tolerance: the system continues to operate in
the presence of network partitions
CAP theorem: any distributed database with shared data, can have at
most two of the three desirable properties, C, A or P
The CAP Theorem (Cont’d)
 Let us assume two nodes on opposite sides of a
network partition:
 Availability + Partition Tolerance forfeit Consistency
 Consistency + Partition Tolerance entails that one side
of the partition must act as if it is unavailable, thus
forfeiting Availability
 Consistency + Availability is only possible if there is no
network partition, thereby forfeiting Partition Tolerance
Large-Scale Databases
 When companies such as Google and Amazon were
designing large-scale databases, 24/7 Availability was
a key
 A few minutes of downtime means lost revenue
 When horizontally scaling databases to 1000s of
machines, the likelihood of a node or a network failure
increases tremendously
 Therefore, in order to have strong guarantees on
Availability and Partition Tolerance, they had to
sacrifice “strict” Consistency (implied by the CAP
theorem)
Trading-Off Consistency
 Maintaining consistency should balance between the
strictness of consistency versus availability/scalability
 Good-enough consistency depends on your application
Trading-Off Consistency
 Maintaining consistency should balance between the
strictness of consistency versus availability/scalability
 Good-enough consistency depends on your application
Strict Consistency
Generally hard to implement,
and is inefficient
Loose Consistency
Easier to implement,
and is efficient
The BASE Properties
 The CAP theorem proves that it is impossible to guarantee
strict Consistency and Availability while being able to
tolerate network partitions
 This resulted in databases with relaxed ACID guarantees
 In particular, such databases apply the BASE properties:
 Basically Available: the system guarantees Availability
 Soft-State: the state of the system may change over time
 Eventual Consistency: the system will eventually
become consistent
Eventual Consistency
 A database is termed as Eventually Consistent if:
 All replicas will gradually become consistent in the
absence of updates
Eventual Consistency
 A database is termed as Eventually Consistent if:
 All replicas will gradually become consistent in the
absence of updates
Webpage-A
Event: Update
Webpage-AWebpage-A
Webpage-A
Webpage-A
Webpage-A
Webpage-A
Webpage-A
Webpage-A
Webpage-A
Webpage-A
Webpage-A
Webpage-A
Advantages of NoSql
• It can serve as the primary data source for online
applications.
• Handles big data which manages data velocity, variety,
volume, and complexity
• Excels at distributed database and multi-data center
operations
• Eliminates the need for a specific caching layer to store
data
• Offers a flexible schema design which can easily be
altered without downtime or service disruption
• Object-oriented programming which is easy to use and
flexible
• NoSQL databases don't need a dedicated high-
performance server
• Support Key Developer Languages and Platforms
• Simple to implement than using RDBMS
• No Single Point of Failure
• Easy Replication
• It provides fast performance and horizontal scalability.
• Can handle structured, semi-structured, and unstructured
data with equal effect

No sql databases

  • 1.
  • 2.
    Introduction • Non RelationalDatabases • Not only SQL • Doesn't require fixed schema • Easy to scale & avoids join • Used for Big Data & real time web app.
  • 3.
  • 4.
    Brief History ofNoSQL Databases • 1998- Carlo Strozzi use the term NoSQL for his lightweight, open-source relational database • 2000- Graph database Neo4j is launched • 2004- Google BigTable is launched • 2005- CouchDB is launched • 2007- The research paper on Amazon Dynamo is released • 2008- Facebooks open sources the Cassandra project • 2009- The term NoSQL was reintroduced
  • 5.
    Features of NoSQL •Non Relational • Schema free • Simple API • Distributed
  • 6.
    Types of NoSQLDatabases • Key-value Pair Based • Column-oriented • Graphs based • Document-oriented
  • 7.
    Key Value PairBased: 1. Data is stored in Key/value pair 2. Store data as a hash table 3. Key=Unique,Value=JSON ,BLOB,String etc
  • 8.
    Column Based 1. workson column 2. every column is treated as separatly 3. values of each column are stored contiguously
  • 9.
    Document Oriented 1. Stores& retrieves data as a key value pair 2. But the value is stored as a Document
  • 10.
    Graph Based 1. Storesentities as well as relations between them 2. Entity=node,Relation=Ed ges 3. Each node & edge has unique identifier 4. Traversing is fast
  • 12.
    The CAP Theorem The limitations of distributed databases can be described in the so called the CAP theorem  Consistency: every node always sees the same data at any given instance (i.e., strict consistency)  Availability: the system continues to operate, even if nodes in a cluster crash, or some hardware or software parts are down due to upgrades  Partition Tolerance: the system continues to operate in the presence of network partitions CAP theorem: any distributed database with shared data, can have at most two of the three desirable properties, C, A or P
  • 13.
    The CAP Theorem(Cont’d)  Let us assume two nodes on opposite sides of a network partition:  Availability + Partition Tolerance forfeit Consistency  Consistency + Partition Tolerance entails that one side of the partition must act as if it is unavailable, thus forfeiting Availability  Consistency + Availability is only possible if there is no network partition, thereby forfeiting Partition Tolerance
  • 14.
    Large-Scale Databases  Whencompanies such as Google and Amazon were designing large-scale databases, 24/7 Availability was a key  A few minutes of downtime means lost revenue  When horizontally scaling databases to 1000s of machines, the likelihood of a node or a network failure increases tremendously  Therefore, in order to have strong guarantees on Availability and Partition Tolerance, they had to sacrifice “strict” Consistency (implied by the CAP theorem)
  • 15.
    Trading-Off Consistency  Maintainingconsistency should balance between the strictness of consistency versus availability/scalability  Good-enough consistency depends on your application
  • 16.
    Trading-Off Consistency  Maintainingconsistency should balance between the strictness of consistency versus availability/scalability  Good-enough consistency depends on your application Strict Consistency Generally hard to implement, and is inefficient Loose Consistency Easier to implement, and is efficient
  • 17.
    The BASE Properties The CAP theorem proves that it is impossible to guarantee strict Consistency and Availability while being able to tolerate network partitions  This resulted in databases with relaxed ACID guarantees  In particular, such databases apply the BASE properties:  Basically Available: the system guarantees Availability  Soft-State: the state of the system may change over time  Eventual Consistency: the system will eventually become consistent
  • 18.
    Eventual Consistency  Adatabase is termed as Eventually Consistent if:  All replicas will gradually become consistent in the absence of updates
  • 19.
    Eventual Consistency  Adatabase is termed as Eventually Consistent if:  All replicas will gradually become consistent in the absence of updates Webpage-A Event: Update Webpage-AWebpage-A Webpage-A Webpage-A Webpage-A Webpage-A Webpage-A Webpage-A Webpage-A Webpage-A Webpage-A Webpage-A
  • 20.
    Advantages of NoSql •It can serve as the primary data source for online applications. • Handles big data which manages data velocity, variety, volume, and complexity • Excels at distributed database and multi-data center operations • Eliminates the need for a specific caching layer to store data • Offers a flexible schema design which can easily be altered without downtime or service disruption
  • 21.
    • Object-oriented programmingwhich is easy to use and flexible • NoSQL databases don't need a dedicated high- performance server • Support Key Developer Languages and Platforms • Simple to implement than using RDBMS • No Single Point of Failure • Easy Replication
  • 22.
    • It providesfast performance and horizontal scalability. • Can handle structured, semi-structured, and unstructured data with equal effect