No sq lv2

NoSQL
Presented By: Nusrat Sharmin

What is NoSQL?
 Stands for Not Only SQL
implying that when designing a software solution or product there is more than one
storage mechanism that could be used, depending on the needs
 Class of non-relational data storage systems
 Usually do not require a fixed table schema (that is, they are schema-less), nor do they use
the concept of joins
 Run well on clusters
 Mostly open-source, distributed, & built for 21st-century web estates
 Designed to cope with the scale & agility challenges that face modern
applications
 Built to take advantage of the cheap storage & processing power available today

Why NoSQL Databases?
 Allows developers to develop
without having to convert in-memory structures to relational structures

Why NoSQL Databases?
 Movement away from using databases as integration points, in favor of encapsulating databases within applications & integrating through services
 The rise of the web as a platform also brought a vital change in data storage
 need to support large volumes of data by running on clusters
 relational databases were not designed to run on clusters
 storage needs differ by application: for example, the data storage needs of an ERP application are very different from those of a Facebook or an Etsy

Data Models of NoSQL
 A data model is a set of constructs for representing information
 relational model: tables, columns & rows
 A storage model describes how the DBMS stores & manipulates the data internally
 A data model is usually independent of the storage model
 Data models for NoSQL systems
 Aggregate Data Models
 key-value
 document
 column-family
 Distribution Models

Aggregate Data Models
 Data as units that have a complex structure
 more structure than just a set of tuples
 example:
 complex record with: simple fields, arrays, records nested inside
 Aggregate in Domain-Driven Design
 a collection of related objects that we treat as a unit
 a unit for data manipulation and management of consistency
 Advantages of aggregates:
 easier for application programmers to work with
 easier for database systems to handle when operating on a cluster
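
A minimal sketch of an aggregate, assuming a hypothetical customer order (the field names and values are illustrative and not taken from the slides). It shows simple fields, an array, and nested records treated as one unit that can be serialised and moved as a whole.

```python
import json

# Hypothetical order aggregate: simple fields, an array, and nested records.
order = {
    "order_id": 1001,
    "customer": {"id": 7, "name": "Ann"},            # nested record
    "items": [                                       # array of nested records
        {"sku": "book-42", "qty": 1, "price": 12.50},
        {"sku": "pen-07", "qty": 3, "price": 1.20},
    ],
    "shipping_address": {"city": "Dhaka", "zip": "1207"},
}


def order_total(aggregate):
    """Compute the total using only data held inside the aggregate."""
    return sum(item["qty"] * item["price"] for item in aggregate["items"])


# The whole aggregate is serialised and stored (or moved) as a single unit.
serialized = json.dumps(order)
restored = json.loads(serialized)
```

Because everything needed for `order_total` lives inside the aggregate, no second lookup is required after the aggregate has been fetched.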

Distribution Models
 Aggregate-oriented databases make distribution of data easier
 the distribution mechanism only has to move the aggregate, because all the related data is contained in the aggregate
 There are two styles of distributing data
 Sharding
 distributes different data across multiple servers
 each server acts as the single source for a subset of data
 Replication
 copies data in multiple servers, so each bit of data can be found in multiple places
 comes in two forms
 Master-slave replication makes one node the authoritative copy that handles writes, while slaves synchronize with the master and may handle reads
 reduces the chance of update conflicts
 Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data
 avoids loading all writes onto a single server, which would create a single point of failure
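
The two distribution styles can be sketched as follows. This is a toy in-memory illustration under stated assumptions: the class names `ShardedStore` and `MasterSlaveStore`, the hash-based placement rule, and the explicit `synchronize` step are all hypothetical and do not describe any particular product.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (hash-based sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Each shard is the single source for its subset of the keys."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


class MasterSlaveStore:
    """One master takes all writes; slaves copy the master and serve reads."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        self.master[key] = value            # writes go to the master only

    def synchronize(self):
        for slave in self.slaves:           # slaves copy the master's state
            slave.clear()
            slave.update(self.master)

    def read(self, key, slave_index=0):
        return self.slaves[slave_index].get(key)


sharded = ShardedStore(3)
sharded.put("user:1", "Ann")

replicated = MasterSlaveStore(2)
replicated.write("user:1", "Ann")
stale_read = replicated.read("user:1")      # slaves have not synchronized yet
replicated.synchronize()
fresh_read = replicated.read("user:1", slave_index=1)
```

The read before `synchronize` returns nothing, which is the kind of temporary inconsistency that replication introduces when reads are served by slaves.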

CAP Theorem
 Proposed by Eric Brewer (keynote at the Symposium on Principles of Distributed Computing, July 2000)
 Three properties of a system: consistency, availability and partition tolerance
 Can have at most two of these three properties for any shared-data system
 To scale out, you have to partition, which leaves either consistency or availability to choose from
 In almost all cases, you would choose availability over consistency
[Figure: diagram of the three CAP properties (Consistency, Availability, Partition tolerance)]

CAP Theorem
 Consistency: once a writer has written, all readers will see that write
 Two kinds of consistency:
 strong consistency – ACID (Atomicity, Consistency, Isolation, Durability)
 weak consistency – BASE (Basically Available, Soft-state, Eventual consistency)

CAP Theorem
 Availability: the system stays available during software & hardware upgrades & node failures
 Traditionally thought of as the server/process being available five 9’s (99.999 %) of the time
 However, for a system with many nodes, at almost any point in time there’s a good chance that a node is down or that there is a network disruption among the nodes
 Want a system that is resilient in the face of network disruption
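
The point about large systems can be checked with simple arithmetic. The sketch below assumes that nodes fail independently and uses illustrative numbers (1000 nodes, per-node availability of 99.999 % and 99.9 %); neither the node count nor the independence assumption comes from the slides.

```python
def prob_any_node_down(per_node_availability, num_nodes):
    """Probability that at least one of num_nodes independent nodes is down."""
    return 1.0 - per_node_availability ** num_nodes


# Five 9's per node, 1000 nodes.
five_nines_1000 = prob_any_node_down(0.99999, 1000)

# Three 9's per node, 1000 nodes.
three_nines_1000 = prob_any_node_down(0.999, 1000)
```

Under these assumptions the chance that some node is down at a given moment is roughly 1 % with five 9's per node and roughly 63 % with three 9's per node, so the claim depends strongly on per-node availability and cluster size.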

CAP Theorem
 Partition tolerance: a system can continue to operate in the presence of network partitions

CAP Theorem
 Theorem: Can have at most two of
these properties for any shared-data
system
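
A toy illustration of the trade-off, assuming a hypothetical two-replica system (`TwoNodeSystem`, the `mode` flag, and the `partitioned` switch are invented for this sketch). During a partition, a system that prefers consistency refuses writes, while a system that prefers availability accepts them and lets the replicas diverge.

```python
class Replica:
    def __init__(self):
        self.data = {}


class TwoNodeSystem:
    """Two replicas; mode is "CP" (prefer consistency) or "AP" (prefer availability)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, key, value):
        if not self.partitioned:
            # Healthy network: update both replicas so they stay consistent.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Refuse the write rather than let the replicas diverge.
            return False
        # AP: accept the write locally; the other replica is now stale.
        replica.data[key] = value
        return True


cp = TwoNodeSystem("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.a, "x", 2)

ap = TwoNodeSystem("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.a, "x", 2)
```

After the partition, the CP system has rejected the write and both replicas still agree, whereas the AP system has accepted it and the two replicas hold different values for `x`.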

Types of NoSQL Databases
 Key-Value or ‘the big hash table’
 Schema-less
 Column-based
 Document-based
 Graph-based

Key-Value databases
 Simplest NoSQL data stores to use from
an API perspective
 The client can
 either get the value for the key
 put a value for a key
 or delete a key from the data store
 The value is a blob that the data store just stores, without caring what is inside
 Can store whatever you like in the aggregate
 Can only access an aggregate by lookup based on its key
 Examples: Riak, Redis, Memcached, Berkeley DB, HamsterDB, Amazon DynamoDB (not open-source), Project Voldemort & Couchbase
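
The get/put/delete interface described above can be sketched in a few lines. This is an in-memory toy, not the API of any of the listed products; the class name `KeyValueStore` and the choice of JSON for serialisation are assumptions made for the example.

```python
import json


class KeyValueStore:
    """The store treats every value as an opaque blob of bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes; the store does not interpret it")
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)          # None if the key is absent

    def delete(self, key):
        self._data.pop(key, None)           # deleting a missing key is a no-op


kv = KeyValueStore()
profile = {"name": "Ann", "langs": ["bn", "en"]}
kv.put("user:1", json.dumps(profile).encode("utf-8"))       # client serialises
loaded = json.loads(kv.get("user:1").decode("utf-8"))       # client deserialises
kv.delete("user:1")
```

The store never looks inside the value, so the only way to reach the profile is through its key, and any structure is the client's responsibility.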

Document databases
 Main concept is the ‘document’
 Database stores & retrieves documents which can be
 XML, JSON, BSON and so on
 Documents are
 Self-describing
 Hierarchical tree data structures that can consist of maps, collections & scalar values
 Stored documents are similar to each other but do not have to be exactly the same
 Store documents in the ‘value’ part of a key-value store
 i.e. a key-value store where the values are examinable
 Examples: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
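
A minimal sketch of the idea that the value is examinable, assuming a hypothetical in-memory `DocumentStore` with a dotted-path query (`find`). It is not the query API of MongoDB or any other listed product, and the sample documents are made up.

```python
_MISSING = object()   # marks a path that does not exist in a document


def lookup(document, field_path):
    """Follow a dotted path such as "address.city" into nested maps."""
    current = document
    for part in field_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentStore:
    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, field_path, value):
        """Return the sorted ids of documents whose field at field_path equals value."""
        return sorted(
            doc_id for doc_id, doc in self._docs.items()
            if lookup(doc, field_path) == value
        )


docs = DocumentStore()
# Documents are similar to each other but not identical in shape.
docs.insert("u1", {"name": "Ann", "address": {"city": "Dhaka"}})
docs.insert("u2", {"name": "Bob", "address": {"city": "Pune"}, "likes": ["chess"]})
docs.insert("u3", {"name": "Cara"})
in_dhaka = docs.find("address.city", "Dhaka")
```

Unlike a plain key-value store, the query reaches inside the stored value, and documents that lack the field are simply skipped.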

Column family stores
 Store data in column families as rows that have many columns associated with a row key
 Column families are groups of related data that is often accessed together
 Different rows do not have to have the same columns
 Columns can be added to any row at any time without having to add them to other rows
 Examples: Cassandra, HBase, Hypertable, Amazon DynamoDB
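
The row key / column family / column layout can be sketched with nested maps. The class `ColumnFamilyStore` and the family names `profile` and `orders` are hypothetical, and the sketch ignores how products such as Cassandra or HBase store data on disk.

```python
class ColumnFamilyStore:
    """Maps row key -> column family -> {column name: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Create the row on first use, with an empty map per column family.
        row = self.rows.setdefault(row_key, {f: {} for f in self.families})
        row[family][column] = value

    def get_family(self, row_key, family):
        """Return a copy of all columns of one family for one row."""
        return dict(self.rows.get(row_key, {}).get(family, {}))


cf = ColumnFamilyStore(["profile", "orders"])
cf.put("u1", "profile", "name", "Ann")
cf.put("u1", "profile", "email", "ann@example.com")
cf.put("u2", "profile", "name", "Bob")       # u2 has no email column
cf.put("u2", "orders", "o-1", "book")        # a column added to this row only
```

Rows `u1` and `u2` end up with different columns, and adding `o-1` to `u2` required no change to `u1`.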

Graph stores
 Allow you to store entities & relationships between these entities
 Entities are also known as nodes
 a node can be an instance of an object in the application
 Relations are known as edges
 Nodes are organized by relationships
 allows you to find interesting patterns between the nodes
 In an RDBMS, complex relationships require complex joins
 When storing a graph-like structure in a relational database, the graph has to be modeled beforehand, based on the traversal needed
 If the traversal changes, the data has to change, which means data movement

Graph stores
 In graph databases, traversing the joins or relationships is very fast
 Nodes can have
 different types of relationships
 Value of the graph databases
 derived from the relationships
 Relationships don’t only have a type but also
 a start node &
 an end node
 Adding new relationship types is easy
 Changing existing nodes & relationships is similar to
 data migration
 Examples: Neo4J, Infinite Graph, OrientDB or FlockDB
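
A small sketch of nodes, typed directed relationships, and traversal, assuming a hypothetical in-memory `GraphStore`. Real graph databases persist relationships so that traversal is fast; the linear scan over an edge list here is only for illustration.

```python
from collections import deque


class GraphStore:
    def __init__(self):
        self.nodes = {}     # node id -> properties
        self.edges = []     # (start node, relationship type, end node)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end):
        """Add a typed relationship with a start node and an end node."""
        self.edges.append((start, rel_type, end))

    def neighbors(self, node_id, rel_type):
        return [end for start, kind, end in self.edges
                if start == node_id and kind == rel_type]

    def traverse(self, start, rel_type):
        """Breadth-first: every node reachable from start via rel_type edges."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for nxt in self.neighbors(current, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


g = GraphStore()
for name in ("ann", "bob", "cara", "acme"):
    g.add_node(name)
g.relate("ann", "FRIEND", "bob")
g.relate("bob", "FRIEND", "cara")
g.relate("ann", "WORKS_AT", "acme")     # a new relationship type, no schema change
friends_of_friends = g.traverse("ann", "FRIEND")
```

Adding the `WORKS_AT` relationship type needed no change to existing nodes or edges, which is the flexibility the slide attributes to graph stores.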

Key/Value Vs. Schema-less
Key/Value
 Pros:
 very fast
 very scalable
 simple model
 able to distribute horizontally
 Cons:
 many data structures (objects) can’t be easily modeled as key-value pairs
Schema-less
 Pros:
 schema-less data model is richer than key/value pairs
 eventual consistency
 many are distributed
 still provide excellent performance and scalability
 Cons:
 typically no ACID transactions or joins

SQL Vs. NoSQL
Topics | SQL | NoSQL
Types | One type: SQL database (with minor variations) | Many different types: key-value, document databases, column stores, graph databases
Development history | Developed in the 1970s | Developed in the 2000s
Deals with | First wave of data storage applications | Limitations of SQL databases, particularly concerning scale, replication & unstructured data storage
Examples | MySQL, Postgres, Oracle | MongoDB, Cassandra, HBase, Neo4J
Data storage model | Individual records are stored as rows in tables with columns, much like a spreadsheet. Separate data is stored in separate tables & joined when querying | Varies by database type. For example, key-value stores function similarly to SQL databases but have only two columns, ‘key’ & ‘value’, with more information sometimes stored in ‘value’. Document databases do away with the table & row model, storing all relevant data in a single document in JSON, XML etc.

SQL Vs. NoSQL
Topics | SQL | NoSQL
Schemas | Predefined, i.e. structure & data types are fixed | Dynamic. Unlike SQL, can store dissimilar data if necessary
Scaling | Vertically, i.e. a single server must be made increasingly powerful. Spreading a SQL database over many servers requires additional engineering | Horizontally, i.e. to add capacity, a database administrator can simply add more commodity servers or cloud instances
Sharding | Manual sharding | Auto sharding
Development model | Mix of open-source (e.g. Postgres, MySQL) and closed-source (e.g. Oracle) | Open-source
Supports transactions | Yes, updates can be configured to complete entirely or not at all | In certain circumstances and at certain levels (e.g. document level vs. database level)
Data manipulation | Specific language using SELECT, INSERT & UPDATE statements, e.g. SELECT fields FROM table WHERE | Object-oriented APIs
Consistency | Strong consistency | Depends on product. Some provide strong consistency (e.g. MongoDB) whereas others offer eventual consistency (e.g. Cassandra)
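
To make the storage-model and data-manipulation rows concrete, the sketch below fetches the same information twice: once from two SQL tables with a join, using Python's built-in `sqlite3` module, and once from a single document held as a Python dict. The table names, column names, and sample data are assumptions made for the example.

```python
import sqlite3

# SQL side: separate tables, joined at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann');
    INSERT INTO orders VALUES (10, 1, 'book');
    INSERT INTO orders VALUES (11, 1, 'pen');
""")
sql_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "WHERE c.id = ? ORDER BY o.id",
    (1,),
).fetchall()
conn.close()

# Document side: all relevant data stored together in one document.
customer_doc = {
    "_id": 1,
    "name": "Ann",
    "orders": [{"id": 10, "item": "book"}, {"id": 11, "item": "pen"}],
}
doc_rows = [(customer_doc["name"], o["item"]) for o in customer_doc["orders"]]
```

Both approaches yield the same name and item pairs; the difference is that the SQL version assembles them with a join, while the document version reads them from one stored unit.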

Handling Relational Data
 NoSQL databases lack the ability to use joins in queries
 Three main techniques for handling relational data
 Multiple queries
 instead of retrieving all data with one query, it’s acceptable to do several queries
 Caching/replication/non-normalized data
 instead of storing only foreign keys, it’s common to store actual foreign values along with the model’s data
 Nesting data
 put more data in a smaller number of collections so that a single document can contain all the data needed for a specific task
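
The three techniques above can be sketched with plain Python dicts standing in for collections. The collections (`authors`, `posts`) and their fields are hypothetical and chosen only for illustration.

```python
# Two collections linked by a foreign key, as in a normalized design.
authors = {"a1": {"name": "Ann"}}
posts = {"p1": {"title": "Intro to NoSQL", "author_id": "a1"}}


# 1. Multiple queries: fetch the post, then fetch its author.
def post_with_author(post_id):
    post = posts[post_id]                       # first query
    author = authors[post["author_id"]]         # second query
    return {"title": post["title"], "author_name": author["name"]}


# 2. Non-normalized data: store the foreign value next to the foreign key.
posts_denormalized = {
    "p1": {"title": "Intro to NoSQL", "author_id": "a1", "author_name": "Ann"},
}

# 3. Nesting: one document holds everything needed for the task.
authors_nested = {
    "a1": {"name": "Ann", "posts": [{"title": "Intro to NoSQL"}]},
}
```

The second and third techniques trade duplicated data for fewer lookups, so an author rename has to be applied in more than one place.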

Benefits of NoSQL
 Cheap, easy to implement (open source)
 Data are replicated to multiple nodes (therefore identical & fault tolerant) and can
be partitioned
 Down nodes are easily replaced
 No single point of failure
 Easy to distribute
 Don’t require a schema
 Can scale up and down
 Relax the data consistency requirement (CAP)

Conclusion
 NoSQL databases don’t mean
 the demise of RDBMS databases
 Reasons to use NoSQL databases
 improve programmer productivity
 improve data access performance via some combination of
 handling larger data volumes
 reducing latency
 improving throughput
 Entering an era of ‘Polyglot Persistence’
 a technique that uses different data storage technologies to handle varying data storage
needs
 can apply across an enterprise or within a single application

