Consistency: A read sees all previously completed writes.
Availability: Reads and writes always succeed.
Partition tolerance: Guaranteed properties are maintained even when network failures prevent some machines from communicating with others.
https://foundationdb.com/white-papers/the-cap-theorem/
The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. Now you're faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?
NO SQL Databases, Big Data and the cloud
The Cloud, Big Data and
Problems with RDBMS
Understand NO SQL
Types of databases
Pros and Cons
Lots of Data
Data doubles every 18 months
. . . (Infinite list)
With the cloud it is now possible to mount any
size of cluster and conduct any computation in
The one who will make sense of all available
data will rule the world.
Use the cloud to analyze large scale of data.
Let's talk about data
When we think of data we think of …
Data has many forms
Yet data comes in many forms and shapes
Problems with RDBMS
Does not scale very well
Models data according to the relational model
Is this the best model for all data types?
Complex and Expensive
Require a DBA
Expensive to buy
Not all types of data fit well into the relational model
Not all data use cases fit well into the ACID model
The relational model does not scale very well
Difficult to distribute
Difficult to replicate
The CAP Theorem
During a network partition, a distributed system must choose
either Consistency or Availability.
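The choice above can be sketched with a toy two-replica cluster. This is only an illustration under simplifying assumptions: `Replica`, `TinyCluster`, and the `"CP"`/`"AP"` mode flag are hypothetical names, not part of any real database.

```python
class Replica:
    """One copy of the data, held as a simple key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas joined by a link that can be cut (hypothetical model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True while the replicas cannot talk

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise TimeoutError("cannot reach the other replica")
            # Availability first: accept locally, the other side goes stale.
            replica.data[key] = value
            return
        # Healthy network: both replicas receive the write.
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise TimeoutError("cannot confirm the latest value")
        return replica.data.get(key)


ap = TinyCluster("AP")
ap.write(ap.a, "x", 1)      # replicated to both sides
ap.partitioned = True       # the network splits
ap.write(ap.a, "x", 2)      # accepted on side a only
stale = ap.read(ap.b, "x")  # side b still answers, with the old value
```

In AP mode the read on side b succeeds but returns the value from before the partition. In CP mode the same read raises an error instead of answering.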
Large family of databases
No relations enforced
Designed for high scale and distribution
Types of NO SQL DB
Motivation for NO SQL
Large Scale and Distribution
Good fit with the data model
Volume, Velocity and Variety
What Is No Schema
Some data is structured, and some is not.
No SQL databases do not ENFORCE a
schema like RDBMS systems.
You can leverage data structure by creating
indexes and smart queries.
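A minimal sketch of what "no enforced schema, but indexes on the structure that exists" can look like. The sample documents, field names, and the `build_index` helper are invented for illustration and do not come from any particular product.

```python
from collections import defaultdict

# Hypothetical documents: nothing forces them to share the same fields.
documents = [
    {"id": 1, "name": "Alice", "country": "Israel"},
    {"id": 2, "name": "Bob", "country": "France", "tags": ["admin"]},
    {"id": 3, "title": "Untitled"},  # no "country" field at all
]


def build_index(docs, field):
    """Map each value of `field` to the ids of the documents that have it."""
    index = defaultdict(list)
    for doc in docs:
        if field in doc:  # documents without the field are simply skipped
            index[doc[field]].append(doc["id"])
    return index


country_index = build_index(documents, "country")
ids_in_france = country_index["France"]
```

The store accepts all three documents, and the index covers only those that carry the indexed field.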
Types of NO SQL Databases
Data is organized as key-value pairs
Query by key and values
Simple indexes (by partition key)
Azure Table Storage
Key1 Key2 Value1 Value2 Value3 Value4 Value5
Israel 1234 1 2 3
France 2345 4 5 8
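The two-key layout above can be sketched as a nested dictionary: a partition key selects a group of rows, and a row key selects one entity in that group. `KeyValueTable` and the property names are hypothetical; this is not the Azure Table Storage API.

```python
class KeyValueTable:
    """Toy table addressed by (partition key, row key)."""

    def __init__(self):
        # partition key -> {row key -> properties}
        self._partitions = {}

    def put(self, partition_key, row_key, properties):
        """Insert or overwrite the entity stored under the two keys."""
        rows = self._partitions.setdefault(partition_key, {})
        rows[row_key] = dict(properties)

    def get(self, partition_key, row_key):
        """Point lookup by both keys; returns None when nothing is stored."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def scan_partition(self, partition_key):
        """Return every entity that shares one partition key."""
        return dict(self._partitions.get(partition_key, {}))


table = KeyValueTable()
# Entities may carry different properties (names invented for this sketch).
table.put("Israel", "1234", {"color": "blue", "size": 3})
table.put("France", "2345", {"color": "red"})
entity = table.get("Israel", "1234")
```

Lookups by both keys touch a single entry, which is why such stores index on the partition key first.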
Wide column / Column Families
Data is organized as key-value groups
Store data by column
A column family is how the
data is stored on the disk
Query by key / key range only
No Indexes (on some dbs)
Example – Cassandra Data Model
Collection of columns
Dictionary of columns
Super Column Family
Dictionary of Column Families
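A rough sketch of the "dictionary of columns" idea with key and key-range queries. The `ColumnFamily` class and the sample row keys are assumptions made for illustration; this is not Cassandra's actual storage engine or client API.

```python
import bisect


class ColumnFamily:
    """Toy column family: row key -> dictionary of columns."""

    def __init__(self, name):
        self.name = name
        self._rows = {}         # row key -> {column name -> value}
        self._sorted_keys = []  # row keys kept sorted for range queries

    def insert(self, row_key, column, value):
        """Set one column on a row, creating the row if needed."""
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Query by key: return all columns of one row."""
        return dict(self._rows.get(row_key, {}))

    def key_range(self, start, end):
        """Query by key range: rows with start <= key <= end, in key order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_right(self._sorted_keys, end)
        return [(k, dict(self._rows[k])) for k in self._sorted_keys[lo:hi]]


users = ColumnFamily("users")
users.insert("user:001", "name", "Dana")
users.insert("user:002", "email", "eli@example.com")  # different columns per row
users.insert("user:010", "name", "Noa")
first_users = users.key_range("user:001", "user:005")
```

Rows need not share columns, and the only access paths in this sketch are the row key and a range of row keys.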
Data is organized as elements and relations.
Query by relations
Supports complicated mathematical graph
Stardog (used for the semantic web)
RDF and OWL
Subject - Predicate - Object
RDF (Resource Description Framework)
Defines some extra structure to triples.
Example: "rdf:type" is used to say that things are of certain types.
Defines some classes which represent the concept of subjects,
objects, predicates etc.
Enables making statements about classes of things, and types of
Adds semantics to the schema.
Expressed in triples.
Example: "If A isMarriedTo B" then this implies "B isMarriedTo A".
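The triple model and the isMarriedTo example can be sketched with a small in-memory store. `TripleStore` and `declare_symmetric` are hypothetical names; a real system would state the rule in OWL (for example with `owl:SymmetricProperty`) and leave inference to a reasoner.

```python
class TripleStore:
    """Toy store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self.symmetric = set()  # predicates where "A p B" implies "B p A"

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def declare_symmetric(self, predicate):
        """Record the semantic rule that this predicate works both ways."""
        self.symmetric.add(predicate)

    def _with_inferred(self):
        """Stored triples plus the ones implied by symmetric predicates."""
        inferred = set(self.triples)
        for s, p, o in self.triples:
            if p in self.symmetric:
                inferred.add((o, p, s))
        return inferred

    def query(self, subject=None, predicate=None, obj=None):
        """Pattern match; None acts as a wildcard for that position."""
        return {
            (s, p, o)
            for s, p, o in self._with_inferred()
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        }


store = TripleStore()
store.add("A", "rdf:type", "Person")
store.add("A", "isMarriedTo", "B")
store.declare_symmetric("isMarriedTo")
about_b = store.query(subject="B")
```

Only "A isMarriedTo B" was stored, yet a query about B returns the implied triple "B isMarriedTo A".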
There is no one NO SQL solution for all
There are more than 150 possible offerings…
Replication and Sharding
No SQL databases can span over a large
Copy the data to multiple servers
Usually each data element is copied 3 times
One master, two slaves
Result: High Availability
Split the data between servers
Horizontal partitioning of the data
Result: Horizontal scale
Replication and Sharding can be done together
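A minimal sketch of combining the two: a hash of the key picks the shard, and the data is then copied to the next servers in the list. The server names, `shard_for`, and `replicas_for` are invented for this example; real databases use more elaborate placement schemes.

```python
import hashlib


def shard_for(key, num_shards):
    """Sharding: map a key to one shard using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key, servers, copies=3):
    """Replication: the owning server plus the next ones in the list.

    The first entry plays the master role, the rest hold the extra copies.
    Assumes copies <= len(servers).
    """
    first = shard_for(key, len(servers))
    return [servers[(first + i) % len(servers)] for i in range(copies)]


servers = ["s0", "s1", "s2", "s3", "s4"]  # hypothetical cluster
placement = replicas_for("user:42", servers)
```

Each key lands on three distinct servers, so the data is both split across the cluster (scale) and duplicated (availability).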
The Cloud and NO SQL
All Cloud Providers have NO SQL solutions
Google Bigtable
NO SQL Databases are deployed on a cluster
There are a large number of cloud hosting offerings for
Cassandra on Google Compute Engine