NoSQL Databases

Agenda
● History
● Relational databases
● Horizontal vs vertical scaling
● CAP theorem
● Document databases
● Key-value databases
● Graph databases
● Column family databases

History
● Non-SQL (not a traditional tabular database)
● Facebook, Google, Amazon, etc. (big data and real-time applications)
● Horizontal scaling is hard for relational databases
● Not only SQL (some support SQL-like queries)

Relational Databases :)
● MySQL, Oracle, SQL Server, Postgres, etc.
● The carpenter's hammer (the default tool for every job)
● Easy & popular
● Normalization avoids data duplication, at the cost of complex queries
● Atomicity (transactions)

Relational Databases :(
● Fixed schema; optional attributes end up as NULLs
● Joins are needed to aggregate related data
● Hard to scale for large data volumes and high read rates

Scaling
source: https://commons.wikimedia.org/wiki/File:They_started_our_car_by_pushing_it_backwards_up_the_hill!_(3854246685).jpg

Scaling
source: http://slashnode.com/the-12-factor-php-app-part-2/

Horizontal (Master-Slave Replication)
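As an illustration of master-slave replication, here is a minimal in-memory Python sketch, assuming one master, asynchronous replication triggered by an explicit `replicate()` call, and round-robin reads across the slaves. All class and method names are hypothetical; no real database exposes this API.

```python
from itertools import cycle


class Master:
    """Accepts every write and appends it to an ordered replication log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) writes, shipped to the slaves

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Read-only copy that catches up by replaying the master's log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # how many log entries this slave has replayed

    def sync(self, master):
        for key, value in master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(master.log)

    def read(self, key):
        return self.data.get(key)


class ReplicatedCluster:
    """Sends writes to the master and spreads reads over the slaves (round-robin)."""

    def __init__(self, n_slaves=2):
        self.master = Master()
        self.slaves = [Slave() for _ in range(n_slaves)]
        self._next_slave = cycle(self.slaves)

    def write(self, key, value):
        self.master.write(key, value)

    def read(self, key):
        return next(self._next_slave).read(key)

    def replicate(self):
        for slave in self.slaves:
            slave.sync(self.master)


cluster = ReplicatedCluster()
cluster.write("user:1", "Alice")
print(cluster.read("user:1"))  # None: the slaves have not synced yet
cluster.replicate()
print(cluster.read("user:1"))  # Alice
```

Adding slaves spreads the read load, but until they sync, a read can return stale data: the consistency trade-off behind the CAP theorem.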

CAP Theorem
● Consistency (all nodes see the same data at the same time)

CAP Theorem
● Availability (every request receives a response indicating success or failure)

CAP Theorem
● Partition tolerance (the system continues to operate despite network partitions, i.e., messages lost or delayed between nodes)

Pick only “TWO”
source: http://www.abramsimon.com/
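A toy Python model of the trade-off, assuming exactly two replicas and a network partition that can be switched on: once partitioned, a CP store rejects writes it cannot replicate, while an AP store accepts them and lets the replicas diverge. Reconciling with last-writer-wins on a logical clock is an assumption chosen for brevity; real AP systems use various strategies. All names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (logical timestamp, value)


class TwoNodeStore:
    """Two replicas of the same data plus a switchable network partition.

    mode="CP": during a partition, reject writes that cannot reach the peer.
    mode="AP": during a partition, accept writes locally and let replicas diverge.
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica("A"), Replica("B")
        self.partitioned = False
        self.clock = 0  # logical clock used for last-writer-wins

    def _peer(self, node):
        return self.b if node is self.a else self.a

    def partition(self):
        self.partitioned = True

    def write(self, node, key, value):
        self.clock += 1
        entry = (self.clock, value)
        if not self.partitioned:
            node.data[key] = entry
            self._peer(node).data[key] = entry  # replicate synchronously
            return "ok"
        if self.mode == "CP":
            return "error: peer unreachable"  # consistent, but not available
        node.data[key] = entry  # available, but the peer is now stale
        return "ok"

    def read(self, node, key):
        entry = node.data.get(key)
        return entry[1] if entry else None

    def heal(self):
        """End the partition; replicas converge on the newest write per key."""
        self.partitioned = False
        for key in set(self.a.data) | set(self.b.data):
            candidates = [e for e in (self.a.data.get(key), self.b.data.get(key)) if e]
            self.a.data[key] = self.b.data[key] = max(candidates)


for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write(store.a, "x", 1)
    store.partition()
    result = store.write(store.a, "x", 2)
    print(mode, result, store.read(store.a, "x"), store.read(store.b, "x"))
    store.heal()
    print(mode, "after heal:", store.read(store.b, "x"))
```

Because a partition is treated as a given here, the only real choice left is between consistency and availability.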

SQL Vs NoSQL
Relational Databases | NoSQL Databases
Vertical scaling (limited horizontal) | Horizontal scaling
Consistent | Consistent or eventually consistent
Scalable reads | Scalable reads/writes
Transactions on multiple tables | Difficult to support transactions
No partition tolerance | Partition tolerance
Schema/tables | Schemaless
Flexible queries (joins) | Limited queries

1) Document Databases
● Simple & popular
● Close to relational databases
● MongoDB was a rising star in 2009
● Seven Databases in Seven Weeks

JSON Document Vs Row
● Document Vs Row
● Collection Vs Table
● Nesting instead of joins
● Queries can reach into sub-documents
● Duplicate data to avoid joins
● Schemaless
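A small Python sketch contrasting the two models: normalized rows that need a join versus nested documents. The `find` and `get_path` helpers are hypothetical; they only mimic the idea of MongoDB's dot notation (for example `"address.city"`) for querying inside embedded documents and are not the pymongo API.

```python
# Relational style: two normalized tables linked by user_id, combined with a join.
users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
addresses = [{"user_id": 1, "city": "Paris"}, {"user_id": 2, "city": "Berlin"}]
joined = [{**u, **a} for u in users for a in addresses if a["user_id"] == u["id"]]

# Document style: the address is nested inside each user document, so no join is needed.
# Documents in one collection need not share a schema (Bob has an extra "tags" field).
user_docs = [
    {"_id": 1, "name": "Alice", "address": {"city": "Paris", "zip": "75001"}},
    {"_id": 2, "name": "Bob", "address": {"city": "Berlin"}, "tags": ["admin"]},
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'address.city' into nested sub-documents."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(collection, query):
    """Return the documents whose fields (dotted paths allowed) equal the query values."""
    return [doc for doc in collection
            if all(get_path(doc, path) == want for path, want in query.items())]


print(find(user_docs, {"address.city": "Paris"}))
```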

MongoDB (CP)
● Consistency through master-slave replication (with master elections)
● CouchDB, by contrast, is AP
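The elections mentioned above can be sketched as a majority vote. This is a simplification, assuming each member has a fixed priority; the real MongoDB election protocol also considers factors such as how up to date each member is. The point it keeps is that a master needs votes from a strict majority of all members, so the minority side of a partition has no master and refuses writes (consistency over availability). The function name is hypothetical.

```python
def elect_primary(members, reachable):
    """Return the member elected as master, or None if no majority is reachable.

    members: dict of member name -> priority (higher wins, an assumption).
    reachable: the members on this side of a partition, able to vote for each other.
    """
    majority = len(members) // 2 + 1  # strict majority of ALL members
    if len(reachable) < majority:
        return None  # minority side: no master, so writes are refused
    return max(reachable, key=lambda name: members[name])


members = {"n1": 3, "n2": 2, "n3": 1}
print(elect_primary(members, {"n1", "n2", "n3"}))  # n1
# A partition isolates n1: only the majority side {n2, n3} can elect a master.
print(elect_primary(members, {"n1"}))  # None
print(elect_primary(members, {"n2", "n3"}))  # n2
```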

MongoDB Conclusion
● Simple
● Scalable
● Embedded documents
● CP
● No joins
● May need to duplicate data
● All writes go through the master node
● Built-in geospatial support

2) Key-Value Databases
● Light & compact
● Hash table (values: text, blobs, JSON, images, etc.)
● Reads are fast; writes are even faster

Key-Value Databases
● Redis Hash
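A minimal Python imitation of a Redis hash: a key whose value is itself a map of fields to values. The method names mirror the real Redis commands HSET, HGET and HGETALL, but `HashStore` is a hypothetical in-memory class, not a Redis client.

```python
class HashStore:
    """In-memory key-value store whose values are hashes (field -> value maps)."""

    def __init__(self):
        self._data = {}  # key -> {field: value}

    def hset(self, key, mapping):
        """Set fields on a hash; return how many fields were newly added (as HSET does)."""
        fields = self._data.setdefault(key, {})
        added = sum(1 for field in mapping if field not in fields)
        fields.update(mapping)
        return added

    def hget(self, key, field):
        """Return one field's value, or None if the key or field is missing."""
        return self._data.get(key, {}).get(field)

    def hgetall(self, key):
        """Return a copy of every field and value stored under the key."""
        return dict(self._data.get(key, {}))


store = HashStore()
store.hset("user:1", {"name": "Alice", "age": "30"})  # 2 new fields
store.hset("user:1", {"age": "31"})  # 0 new fields, "age" is overwritten
print(store.hgetall("user:1"))  # {'name': 'Alice', 'age': '31'}
```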

Redis Complex Data Types
● List
● Blocking list
● Publish-subscribe
● Set
● Expiry (caching)
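The data types listed above can be imitated in plain Python to show their semantics. This is a single-process toy under stated assumptions, not a Redis client: method names mirror the Redis commands LPUSH, RPOP, BRPOP, SUBSCRIBE, PUBLISH, SADD, SISMEMBER and SET with EX, but `brpop` returns only the value (real BRPOP also returns the key), pub-sub delivery is a synchronous callback, and the injectable `clock` parameter exists only so expiry can be tested.

```python
import threading
import time


class MiniRedis:
    """Toy, single-process imitation of a few Redis data types (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._lists = {}
        self._sets = {}
        self._strings = {}  # key -> (value, expires_at or None)
        self._subscribers = {}  # channel -> list of callbacks
        self._cond = threading.Condition()
        self._clock = clock

    # List: LPUSH adds at the head, RPOP removes from the tail, giving a FIFO queue.
    def lpush(self, key, value):
        with self._cond:
            self._lists.setdefault(key, []).insert(0, value)
            self._cond.notify_all()  # wake any consumer blocked in brpop

    def rpop(self, key):
        with self._cond:
            items = self._lists.get(key)
            return items.pop() if items else None

    # Blocking list: wait until an item arrives or the timeout (seconds) expires.
    def brpop(self, key, timeout):
        with self._cond:
            ready = self._cond.wait_for(lambda: self._lists.get(key), timeout)
            return self._lists[key].pop() if ready else None

    # Publish-subscribe: every subscriber of a channel receives each message.
    def subscribe(self, channel, callback):
        self._subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of receivers, like PUBLISH

    # Set: unordered collection without duplicates.
    def sadd(self, key, *members):
        members_set = self._sets.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before  # number of newly added members

    def sismember(self, key, member):
        return member in self._sets.get(key, ())

    # Expiry caching: a value set with ex=seconds disappears once it expires.
    def set(self, key, value, ex=None):
        expires_at = self._clock() + ex if ex is not None else None
        self._strings[key] = (value, expires_at)

    def get(self, key):
        entry = self._strings.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._strings[key]  # lazily evict the expired key
            return None
        return value
```

A list used with `lpush` and `brpop` behaves like a work queue: producers push to the head, and a consumer blocks on the tail until a job arrives.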

Redis in Memory
● Data lives in memory; by default, writes are not persisted immediately
● Persists periodically by taking snapshots
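A sketch of snapshot-style persistence, assuming a JSON file and a snapshot after every `every_n_writes` writes (Redis itself lets you configure snapshots by elapsed time and number of changes). Anything written after the last snapshot is lost on a crash, which is the trade-off described above. `SnapshottingStore` is hypothetical.

```python
import json
import os
import tempfile


class SnapshottingStore:
    """In-memory dict that reaches disk only when a snapshot is taken."""

    def __init__(self, path, every_n_writes=3):
        self.path = path
        self.every_n_writes = every_n_writes
        self.data = {}
        self._dirty = 0  # writes since the last snapshot
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)  # restore from the last snapshot

    def set(self, key, value):
        self.data[key] = value
        self._dirty += 1
        if self._dirty >= self.every_n_writes:
            self.snapshot()

    def snapshot(self):
        # Write to a temporary file first, then atomically replace the old snapshot.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)
        self._dirty = 0


path = os.path.join(tempfile.mkdtemp(), "dump.json")
store = SnapshottingStore(path, every_n_writes=3)
for key in ("a", "b", "c", "d"):
    store.set(key, 1)  # the third write triggers a snapshot; "d" stays in memory only
restored = SnapshottingStore(path)  # simulate a restart after a crash
print(sorted(restored.data))  # ['a', 'b', 'c']
```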

Redis CP
● Sharding (A,B,C)
● Replication A => A1, B => B1, C => C1
● If master B fails, B1 is promoted to master
● Redis is NOT strongly consistent (e.g., if both A and A1 fail)
● Riak is AP
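The sharding and failover above can be sketched in Python. Redis Cluster assigns each key to one of 16384 hash slots using the CRC16 of the key modulo 16384, and `binascii.crc_hqx(data, 0)` computes that CRC16 variant. The slot ranges for A, B and C are an example split, hash tags (`{...}` in keys) are ignored, and `SlotCluster` is hypothetical. Redis replication is also asynchronous, so a write acknowledged by a master can be lost if the master fails before its replica receives it.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384


def key_slot(key):
    """CRC16 (XMODEM variant) of the key, modulo 16384 (hash tags ignored here)."""
    return crc_hqx(key.encode(), 0) % HASH_SLOTS


class SlotCluster:
    """Hypothetical three-master cluster with one replica per master."""

    def __init__(self):
        # Example split of the 16384 slots between masters A, B and C.
        self.ranges = {"A": range(0, 5461), "B": range(5461, 10923), "C": range(10923, HASH_SLOTS)}
        self.replica_of = {"A": "A1", "B": "B1", "C": "C1"}
        self.up = {"A", "B", "C", "A1", "B1", "C1"}

    def owner(self, key):
        """Name of the node currently assigned this key's slot."""
        slot = key_slot(key)
        for master, slots in self.ranges.items():
            if slot in slots:
                return master

    def fail(self, node):
        """Mark a node as down; if it was a master with a live replica, promote it."""
        self.up.discard(node)
        if node in self.ranges:
            replica = self.replica_of.pop(node, None)
            if replica in self.up:
                self.ranges[replica] = self.ranges.pop(node)

    def node_for(self, key):
        """Node serving this key, or None if its master and replica are both down."""
        master = self.owner(key)
        return master if master in self.up else None


cluster = SlotCluster()
key = "user:42"
master = cluster.owner(key)
cluster.fail(master)
print(cluster.node_for(key))  # the promoted replica, e.g. "B1" if the key lives on B
cluster.fail(master + "1")
print(cluster.node_for(key))  # None: both the master and its replica are down
```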

Redis Conclusion
● Light & Compact
● Key-value
● Complex data types
● Fast in memory
● Dataset should fit in RAM
● Transforming data, caching, messaging
● CP but not strongly consistent
● Flexible persistence levels
● Rarely used alone

3) Graph Databases
● Directed graph
● Nodes have properties
● Relationships have properties
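A minimal property-graph sketch in Python, assuming nodes keyed by ID and an outgoing adjacency list per node; both nodes and relationships carry property dictionaries. The class and its breadth-first `reachable` traversal are illustrative, not a graph database API.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Directed graph in which nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}  # node id -> property dict
        self.out = defaultdict(list)  # node id -> [(relationship type, target id, property dict)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel_type, dst, **props):
        self.out[src].append((rel_type, dst, props))

    def neighbors(self, node_id, rel_type):
        return [dst for rel, dst, _ in self.out.get(node_id, []) if rel == rel_type]

    def reachable(self, start, rel_type):
        """Breadth-first traversal that follows only `rel_type` relationships."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in self.neighbors(queue.popleft(), rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}


g = PropertyGraph()
g.add_node("alice", name="Alice")
g.add_node("bob", name="Bob")
g.add_node("carol", name="Carol")
g.add_edge("alice", "FOLLOWS", "bob", since=2015)
g.add_edge("bob", "FOLLOWS", "carol", since=2019)
print(sorted(g.reachable("alice", "FOLLOWS")))  # ['bob', 'carol']
```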

Graph Databases (AP)
● Tens of billions of nodes and edges
● No sharding; the whole graph is replicated
● High availability over consistency
● Elects a master, but writes can also go to slaves directly
● Community edition is free, but the full version is NOT

4) Column-Family Databases
Row-oriented database:
● Many columns per row
● Disk seeks to read a single column across many rows
● Low compression rate

Column-Family Databases
● RDBMSs are optimized for heavy writes, so they store each row's values together
● Column stores are optimized for heavy reads, so they store each column's values together
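A small Python sketch of the two layouts, using made-up rows. In the row layout, reading one column means stepping over every other field; in the column layout, each column is contiguous, and repeated values compress well, shown here with run-length encoding. Column-family stores such as HBase group data by column family rather than by individual column, so treat this as an illustration of the general idea only.

```python
from itertools import groupby

rows = [
    {"id": 1, "country": "EG", "status": "active"},
    {"id": 2, "country": "EG", "status": "active"},
    {"id": 3, "country": "EG", "status": "inactive"},
    {"id": 4, "country": "US", "status": "active"},
]

# Row layout: each record's values sit next to each other (cheap to write a whole row).
row_layout = [value for row in rows for value in row.values()]

# Column layout: each column's values sit next to each other (cheap to scan one column).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Reading "country" from the row layout means skipping over the other fields.
n_fields = len(rows[0])
country_from_rows = row_layout[1::n_fields]


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(run_length_encode(columns["country"]))  # [('EG', 3), ('US', 1)]
```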

HBase
● Database for HDFS (RDBMS vs files)
● Widely used with Hadoop
● Scalability! At least five nodes in production
● Facebook messaging system infrastructure (2010)

HBase Column Family
● Key-value pairs (map of maps)
● Column families must be defined up front, but the columns within them are schemaless

HBase Versioning
● Versioning
● The model becomes a map of maps of maps (sorted ascending, ascending, descending)
● Garbage collection for expired data
● Everything is binary
● Compression rate
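HBase's versioned model can be sketched with nested dictionaries: row key, then column family, then column qualifier, then timestamp. Assumptions: string values instead of raw bytes, a hypothetical `VersionedTable` class, and old versions trimmed eagerly on write (HBase removes excess versions lazily, for example during compactions).

```python
from collections import defaultdict


class VersionedTable:
    """Sketch of a row -> family -> qualifier -> timestamp -> value model."""

    def __init__(self, families, max_versions=3):
        # Column families are fixed up front; qualifiers inside them are free-form.
        self.families = set(families)
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row][family][qualifier]
        versions[ts] = value
        # Keep only the newest max_versions versions of this cell.
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None

    def scan(self):
        """Yield cells sorted by row (asc), family and qualifier (asc), timestamp (desc)."""
        for row in sorted(self.rows):
            for family in sorted(self.rows[row]):
                for qualifier in sorted(self.rows[row][family]):
                    versions = self.rows[row][family][qualifier]
                    for ts in sorted(versions, reverse=True):
                        yield row, family, qualifier, ts, versions[ts]


table = VersionedTable(["msg"], max_versions=2)
for ts, text in [(1, "hi"), (2, "hello"), (3, "hey")]:
    table.put("user1", "msg", "body", text, ts)
print(table.get("user1", "msg", "body"))  # hey
print(list(table.scan()))  # [('user1', 'msg', 'body', 3, 'hey'), ('user1', 'msg', 'body', 2, 'hello')]
```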

FB Messaging Index Table
● The row keys are user IDs
● Column qualifiers are words that appear in that user’s messages
● Timestamps are message IDs of messages that contain that word
● The value is the offset of the word in the message
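The index table maps naturally onto the same nested structure. A Python sketch, assuming a simple `\w+` tokenizer, character offsets, and only the first occurrence of a repeated word per message; the helper names are hypothetical. Because message IDs act as timestamps, listing them in descending order returns the newest matching messages first.

```python
import re
from collections import defaultdict

# row key (user ID) -> column qualifier (word) -> timestamp (message ID) -> value (offset)
index = defaultdict(lambda: defaultdict(dict))


def index_message(user_id, message_id, text):
    for match in re.finditer(r"\w+", text.lower()):
        # Keep the offset of the first occurrence if a word repeats in one message.
        index[user_id][match.group()].setdefault(message_id, match.start())


def search(user_id, word):
    """Return (message ID, offset) pairs for messages containing `word`, newest first."""
    hits = index.get(user_id, {}).get(word.lower(), {})
    return sorted(hits.items(), reverse=True)


index_message("u1", 101, "See you at lunch")
index_message("u1", 205, "Lunch moved to noon")
print(search("u1", "lunch"))  # [(205, 0), (101, 11)]
```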

HBase Vs Cassandra
● HBase runs on Hadoop; Cassandra is standalone
● HBase community is more active
● HBase is CP, Cassandra is AP
● Cassandra is more suitable for highly concurrent writes

The right tool for the right job

More from BADR

Recently uploaded

NoSQL Databases