An Intro to NoSQL Databases -- NoSQL databases will not become the new dominant technology. Relational databases will remain popular and will be used in the majority of situations; they will, however, no longer be the automatic choice. (Source: http://martinfowler.com/)
This presentation is about NoSQL, which stands for "Not Only SQL". It covers the use of NoSQL for Big Data and the ways it differs from RDBMS.
This document provides an overview of non-relational (NoSQL) databases. It discusses the history and characteristics of NoSQL databases, including that they do not require rigid schemas and can automatically scale across servers. The document also categorizes major types of NoSQL databases, describes some popular NoSQL databases like Dynamo and Cassandra, and discusses benefits and limitations of both SQL and NoSQL databases.
This document provides an introduction to NoSQL databases. It discusses the history and limitations of relational databases that led to the development of NoSQL databases. The key motivations for NoSQL databases are that they can handle big data and provide better scalability and flexibility than relational databases. The document describes some core NoSQL concepts like the CAP theorem and different types of NoSQL databases like key-value, columnar, document and graph databases. It also outlines some remaining research challenges in the area of NoSQL databases.
The document provides an introduction to NoSQL databases. It discusses that NoSQL databases provide a mechanism for storage and retrieval of data without using tabular relations like relational databases. NoSQL databases are used in real-time web applications and for big data. They also support SQL-like query languages. The document outlines different data modeling approaches, distribution models, consistency models and MapReduce in NoSQL databases.
This document provides an overview of NoSQL databases and compares them to relational databases. It discusses the different types of NoSQL databases including key-value stores, document databases, wide column stores, and graph databases. It also covers some common concepts like eventual consistency, CAP theorem, and MapReduce. While NoSQL databases provide better scalability for massive datasets, relational databases offer more mature tools and strong consistency models.
Hive is a data warehouse infrastructure tool that allows users to query and analyze large datasets stored in Hadoop. It uses a SQL-like language called HiveQL to process structured data stored in HDFS. Hive keeps schema metadata in a metastore database, while the data itself resides in HDFS. It provides a familiar interface for querying large datasets using SQL-like queries and scales easily to large datasets.
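To make the HiveQL idea concrete, here is a minimal sketch that runs SQL-like statements from PySpark with Hive support enabled. It assumes a local Spark installation built with Hive support; the table and column names are made up for illustration.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hiveql-sketch")
         .enableHiveSupport()   # lets spark.sql() run HiveQL against a Hive metastore
         .getOrCreate())

# Create a table and query it with SQL-like HiveQL; the schema metadata goes
# to the metastore, the rows themselves land in HDFS (or the local warehouse dir).
spark.sql("CREATE TABLE IF NOT EXISTS pageviews (url STRING, hits INT)")
spark.sql("INSERT INTO pageviews VALUES ('/home', 42), ('/about', 7)")
spark.sql("SELECT url, hits FROM pageviews WHERE hits > 10").show()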
In this presentation, Raghavendra BM of Valuebound discusses the basics of MongoDB, an open-source document database and a leading NoSQL database.
The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications. It employs a Master and Slave architecture with a NameNode that manages metadata and DataNodes that store data blocks. The NameNode tracks locations of data blocks and regulates access to files, while DataNodes store file blocks and manage read/write operations as directed by the NameNode. HDFS provides high-performance, scalable access to data across large Hadoop clusters.
HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks.
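As an illustration of the NameNode/DataNode division of labour, here is a minimal sketch using the third-party hdfs Python package (a WebHDFS client). The NameNode URL, user, and paths are assumptions for illustration only.

from hdfs import InsecureClient

# The client talks to the NameNode, which owns the metadata and resolves
# block locations; the actual block reads/writes are served by DataNodes.
client = InsecureClient("http://namenode:9870", user="hadoop")

client.write("/tmp/example.txt", data=b"hello hdfs", overwrite=True)
print(client.list("/tmp"))                    # directory listing from NameNode metadata
with client.read("/tmp/example.txt") as reader:
    print(reader.read())                      # block data streamed from DataNodes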
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ... (Simplilearn)
This presentation about Apache Spark covers all the basics that a beginner needs to know to get started with Spark. It covers the history of Apache Spark, what Spark is, and how Spark differs from Hadoop. You will learn the different components in Spark and how Spark works with the help of its architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva (a minimal word-count sketch follows the topic list below). Now, let's get started with what Apache Spark is.
Below topics are explained in this Spark presentation:
1. History of Spark
2. What is Spark
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark architecture
6. Applications of Spark
7. Spark use case
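As promised above, here is a minimal PySpark word-count sketch, assuming a local Spark installation. It shows the map, shuffle, and reduce phases expressed as chained RDD transformations rather than separate MapReduce jobs.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").master("local[*]").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["to be or not to be", "that is the question"])
counts = (lines.flatMap(lambda line: line.split())   # map phase: emit words
               .map(lambda word: (word, 1))          # pair each word with a count
               .reduceByKey(lambda a, b: a + b))     # shuffle + reduce phase
print(counts.collect())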
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn’s Apache Spark and Scala certification training is designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark shell scripting
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL (see the Spark SQL sketch after this list)
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
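The Spark SQL sketch referenced in item 5: a hedged example that registers a DataFrame as a temporary view and queries it with SQL, assuming a local Spark installation; the data is made up.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-sketch").master("local[*]").getOrCreate()

# Build a small DataFrame and expose it to SQL as a temporary view.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
df.createOrReplaceTempView("people")

spark.sql("SELECT name FROM people WHERE age > 30").show()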
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training
The document discusses factors to consider when selecting a NoSQL database management system (DBMS). It provides an overview of different NoSQL database types, including document databases, key-value databases, column databases, and graph databases. For each type, popular open-source options are described, such as MongoDB for document databases, Redis for key-value, Cassandra for columnar, and Neo4j for graph databases. The document emphasizes choosing a NoSQL solution based on application needs and recommends commercial support for production systems.
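To illustrate the key-value category named above, here is a minimal sketch with the redis-py client. It assumes a Redis server on localhost:6379; the key names are hypothetical.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Values are opaque to the store and addressed purely by key.
r.set("session:42:user", "alice", ex=3600)   # write with a one-hour TTL
print(r.get("session:42:user"))              # -> "alice"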
Graph databases are a type of NoSQL database that is optimized for storing and querying connected data and relationships. A graph database represents data in graphs consisting of nodes and edges, where the nodes represent entities and the edges represent relationships between the entities. Graph databases are well-suited for applications that involve complex relationships and connected data, such as social networks, knowledge graphs, and recommendation systems. They allow for flexible querying of relationships and connections via graph traversal operations.
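A minimal graph-traversal sketch using the official neo4j Python driver and Cypher; the connection URI, credentials, and the small social-network data model are assumptions for illustration.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Nodes are entities; relationships (edges) connect them.
    session.run("CREATE (a:Person {name:'Ann'})-[:FOLLOWS]->(b:Person {name:'Bob'})")
    # Traversal query: who does Ann follow?
    result = session.run(
        "MATCH (:Person {name:'Ann'})-[:FOLLOWS]->(f:Person) RETURN f.name AS name")
    for record in result:
        print(record["name"])

driver.close()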
This document provides an overview of NoSQL databases. It begins with a brief history of relational databases and Edgar Codd's 1970 paper introducing the relational model. It then discusses modern trends driving the emergence of NoSQL databases, including increased data complexity, the need for nested data structures and graphs, evolving schemas, high query volumes, and cheap storage. The core characteristics of NoSQL databases are outlined, including flexible schemas, non-relational structures, horizontal scaling, and distribution. The major categories of NoSQL databases are explained - key-value, document, graph, and column-oriented stores - along with examples like Redis, MongoDB, Neo4j, and Cassandra. The document concludes by discussing use cases and
The presentation provides an overview of NoSQL databases, including a brief history of databases, the characteristics of NoSQL databases, different data models like key-value, document, column family and graph databases. It discusses why NoSQL databases were developed as relational databases do not scale well for distributed applications. The CAP theorem is also explained, which states that only two out of consistency, availability and partition tolerance can be achieved in a distributed system.
This document provides an overview and introduction to NoSQL databases. It begins with an agenda that explores key-value, document, column family, and graph databases. For each type, 1-2 specific databases are discussed in more detail, including their origins, features, and use cases. Key databases mentioned include Voldemort, CouchDB, MongoDB, HBase, Cassandra, and Neo4j. The document concludes with references for further reading on NoSQL databases and related topics.
This presentation about HBase will help you understand what HBase is, what the applications of HBase are, how HBase differs from RDBMS, what HBase storage is, and what the architectural components of HBase are; at the end, we will also look at some HBase commands in a demo (a small client sketch follows the topic list below). HBase is an essential part of the Hadoop ecosystem. It is a column-oriented database management system derived from Google’s NoSQL database Bigtable that runs on top of HDFS. After watching this video, you will know how to store and process large datasets using HBase. Now, let us get started and understand HBase and what it is used for.
Below topics are explained in this HBase presentation:
1. What is HBase?
2. HBase Use Case
3. Applications of HBase
4. HBase vs RDBMS
5. HBase Storage
6. HBase Architectural Components
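The client sketch referenced above: a hedged example of HBase's row-key/column-family model via the third-party happybase client. It assumes an HBase Thrift gateway on localhost and a pre-created 'users' table with column family 'cf'.

import happybase

connection = happybase.Connection("localhost")   # Thrift gateway, default port 9090
table = connection.table("users")                # table assumed to already exist

# Rows are addressed by key; cells live under column families (here 'cf').
table.put(b"row-1", {b"cf:name": b"alice", b"cf:city": b"oslo"})
print(table.row(b"row-1"))   # -> {b'cf:name': b'alice', b'cf:city': b'oslo'}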
What is this Big Data Hadoop training course about?
Simplilearn’s Big Data Hadoop training course lets you master the concepts of the Hadoop framework and prepares you for Cloudera’s CCA175 Big Data certification. The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, its architecture, sources, sinks, channels, and configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand Resilient Distributed Datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying DataFrames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
- Polyglot persistence involves using multiple data storage technologies to handle different data storage needs within a single application. This allows using the right technology for the job rather than trying to solve all problems with a single database.
- For example, a key-value store may be better for transient session or shopping cart data before an order is placed, while relational databases are better for structured transactional data after an order is placed (see the sketch after this list).
- Using services that abstract the direct usage of different data stores allows sharing of data between applications in an enterprise. This improves reuse of data across systems.
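A minimal polyglot-persistence sketch of that example, using Redis for the transient cart and SQLite for the durable order; the servers, names, and schema are illustrative only.

import sqlite3
import redis

kv = redis.Redis(decode_responses=True)          # key-value store for the cart
sql = sqlite3.connect(":memory:")                # relational store for orders
sql.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

# Before checkout: the cart lives in Redis, cheap to mutate and to expire.
kv.hset("cart:42", mapping={"item": "book", "qty": "2"})
kv.expire("cart:42", 1800)

# On checkout: promote the cart to a durable relational record.
cart = kv.hgetall("cart:42")
sql.execute("INSERT INTO orders (item, qty) VALUES (?, ?)",
            (cart["item"], int(cart["qty"])))
sql.commit()
print(sql.execute("SELECT * FROM orders").fetchall())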
Relational databases vs Non-relational databases (James Serra)
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
MongoDB is a document-oriented NoSQL database written in C++. It uses a document data model and stores data in BSON format, which is a binary form of JSON that is lightweight, traversable, and efficient. MongoDB is schema-less and supports replication, high availability, auto-sharding for scaling, and rich queries. It is suitable for big data, content management, mobile and social applications, and user data management.
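A minimal pymongo sketch of the document model described above; it assumes a local mongod, and the database, collection, and fields are illustrative.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client["shop"]["users"]

# Schema-less: documents in the same collection may differ in shape.
users.insert_one({"name": "alice", "tags": ["admin", "beta"]})
users.insert_one({"name": "bob", "address": {"city": "oslo"}})

print(users.find_one({"name": "alice"}))          # rich queries on nested data
print(users.count_documents({"tags": "admin"}))   # array fields match by element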
Here is my seminar presentation on NoSQL databases. It includes the types of NoSQL databases, the merits and demerits of NoSQL databases, examples of NoSQL databases, etc.
For the seminar report on NoSQL databases, please contact me: ndc@live.in
This document provides an overview and introduction to NoSQL databases. It discusses key-value stores like Dynamo and BigTable, which are distributed, scalable databases that sacrifice complex queries for availability and performance. It also explains column-oriented databases like Cassandra that scale to massive workloads. The document compares the CAP theorem and consistency models of these databases and provides examples of their architectures, data models, and operations.
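To ground the Cassandra discussion, here is a minimal sketch with the DataStax cassandra-driver, assuming a single local node; the keyspace, table, and replication settings are illustrative only.

from uuid import uuid4
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Replication is declared per keyspace; one copy is enough for a local demo.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
session.execute(
    "CREATE TABLE IF NOT EXISTS demo.events (id uuid PRIMARY KEY, payload text)")

session.execute("INSERT INTO demo.events (id, payload) VALUES (%s, %s)",
                (uuid4(), "hello"))
for row in session.execute("SELECT payload FROM demo.events"):
    print(row.payload)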
This document discusses graph databases and introduces DataStax Enterprise Graph. It defines a graph database as one that prioritizes relationships between entities over the entities themselves. It provides examples of problems well-suited for graph databases, such as customer 360 views, recommendations, and fraud detection. The document contrasts graph and relational databases, noting graphs are better for highly connected data. It then introduces DataStax Enterprise Graph as a native graph implementation built on Apache TinkerPop and integrated with Cassandra for scale-out performance and DSE's enterprise features.
This document provides an overview of NoSQL databases. It discusses that NoSQL databases are non-relational and were created to overcome limitations of scaling relational databases. The document categorizes NoSQL databases into key-value stores, document databases, graph databases, XML databases, and distributed peer stores. It provides examples like MongoDB, Redis, CouchDB, and Cassandra. The document also explains concepts like CAP theorem, ACID properties, and reasons for using NoSQL databases like horizontal scaling, schema flexibility, and handling large amounts of data.
The document discusses how the database world is changing with the rise of NoSQL databases. It provides an overview of different categories of NoSQL databases like key-value stores, column-oriented databases, document databases, and graph databases. It also discusses how these NoSQL databases are being used with cloud computing platforms and how they are relevant for .NET developers.
Agenda
- What is NOSQL?
- Motivations for NOSQL?
- Brewer’s CAP Theorem
- Taxonomy of NOSQL databases
- Apache Cassandra
- Features
- Data Model
- Consistency
- Operations
- Cluster Membership
- What does NOSQL mean for RDBMS?
The document provides an overview of NoSQL databases, including their history, characteristics, and categories. It discusses the evolution of database systems from hierarchical and network models to relational and object-oriented databases. It also covers the limitations of relational databases that led to the rise of NoSQL databases, including lack of scalability, availability, and flexibility. The key advantages of NoSQL databases like schema flexibility and horizontal scalability are outlined. Popular categories of NoSQL databases like key-value stores, document databases, and graph databases are also described.
A practical introduction to Oracle NoSQL Database - OOW2014 (Anuj Sahni)
Not familiar with Oracle NoSQL Database yet? This great product introduction session discusses the primary functionality included with the product as well as integration with other Oracle products. It includes a live demo that illustrates installation and configuration as well as data modeling and sample NoSQL application development.
Big Data and NoSQL for Database and BI Pros (Andrew Brust)
This document provides an agenda and overview for a conference session on Big Data and NoSQL for database and BI professionals held from April 10-12 in Chicago, IL. The session will include an overview of big data and NoSQL technologies, then deeper dives into Hadoop, NoSQL databases like HBase, and tools like Hive, Pig, and Sqoop. There will also be demos of technologies like HDInsight, Elastic MapReduce, Impala, and running MapReduce jobs.
Scaling SQL and NoSQL Databases in the Cloud (RightScale)
Database performance is the number-one cause of poor performance for scalable web applications, and the problem is magnified in cloud environments where I/O and bandwidth are generally slower and less predictable than in dedicated data centers. Database sharding is a highly effective method of removing the database scalability barrier by operating on top of proven RDBMS products such as MySQL and Postgres as well as the new NoSQL database platforms. In this session, you'll learn what it really takes to implement sharding, the role it plays in the effective end-to-end lifecycle management of your entire database environment, why it is crucial for ensuring reliability, and how to choose the best technology for a specific application. We'll also share a case study on a high-volume social networking application that demonstrates the effectiveness of database sharding for scaling cloud-based applications.
The document provides guidelines for creating an ontology, including defining what an ontology is, why they are useful, and the basic components and methodology for building one. It discusses evaluating ontology taxonomies and provides two examples - an e-commerce ontology and a banking ontology - to demonstrate the concepts. The key steps outlined are identifying important terms and concepts, organizing them hierarchically, defining attributes and relationships, and evaluating for issues like redundant or incomplete information.
This document provides an overview of NoSQL databases and their concepts. It begins with an introduction from the presenter and an agenda outlining the topics to be covered. The document then discusses the history and evolution of database management systems. It introduces relational database concepts and outlines some of the limitations of relational databases in handling big data. This leads to a discussion of the need for database systems beyond relational databases and a paradigm shift in database management. NoSQL databases are then defined as providing alternatives beyond the relational model. The remainder of the document covers types of NoSQL databases and their usage, as well as the future of relational databases.
This document discusses several existing ontologies for modeling restaurant menus and food data on the semantic web:
- Schema.org allows menus to be attached to restaurants as text or URLs but does not link individual menu items. Recipe classes could be used instead.
- RPI has published a Wine ontology and food ontology that could describe menu items. They also held a Food Semantic Web meetup to discuss ontology development.
- LOV includes a vocabulary focused on nutrition characteristics of food.
- Locu is a platform that enables multi-channel dissemination of data for restaurants and similar businesses.
- LinkedFood, from TU Berlin, aims to link food data on the semantic web.
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ... (DataStax Academy)
This session covers our experience with using the Spark and Shark frameworks for running real-time queries on top of Cassandra data. We will start by surveying the current Cassandra analytics landscape, including Hadoop and Hive, and touch on the use of custom input formats to extract data from Cassandra. We will then dive into Spark and Shark, two memory-based cluster computing frameworks, and how they enable often dramatic improvements in query speed and productivity over the standard solutions today.
Online Marketing with Schema.org and Multi-channel Communication (Anna Fensel)
This document discusses using semantic annotations and a multi-channel communication tool to increase the online visibility and marketing success of hotels. It describes how the Kaysers Hotel semantically annotated its website pages in multiple languages to improve search engine optimization. A multi-channel communication tool was also used to suggest and publish social media posts for the hotel across different channels. An evaluation found the hotel's website traffic and traffic from social media increased while time spent on social media marketing decreased. Future work could involve extending semantic standards for the hotel domain and applying semantic technologies more broadly to online communication.
An unprecedented amount of data is being created and is accessible. This presentation will instruct on using the new NoSQL technologies to make sense of all this data.
Data Analytics Meetup: Introduction to Azure Data Lake Storage (CCG)
Microsoft Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. Journey through Azure Data Lake Storage Gen1 with Microsoft Data Platform Specialist, Audrey Hammonds. In this video she explains the fundamentals of Gen 1 and Gen 2, walks us through how to provision a Data Lake, and gives tips to avoid turning your Data Lake into a swamp.
Learn more about Data Lakes with our blog - Data Lakes: Data Agility is Here Now https://bit.ly/2NUX1H6
A NoSQL (often interpreted as "Not Only SQL") database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners (Edelweiss Kammermann)
This session is based on a full-day big data workshop delivered to 40 database professionals at the German User Group (DOAG) conference in 2016, garnering fantastic feedback (www.munzandmore.com/2016/ora/big-data-cloudera-oracle-training-feedback-doag). There are zillions of open source big data projects these days. In the session, you will learn about the core principles of four key technologies that are most often used in projects: Hadoop, Spark, Hive, and Kafka. The presentation first explains the fundamentals of those four big data technologies. Then you will see how to take the first easy steps into the big data world yourself, with Oracle Big Data Cloud Service and Oracle Event Hub Cloud Service live demos.
This document discusses polyglot persistence and multi-cloud data management solutions. It begins by noting the huge amounts of data being generated and stored globally, such as the billions of pieces of content shared daily on social media platforms. It then discusses challenges in storing and accessing these massive datasets, which can range from the petabyte to exabyte scale. The document introduces the concept of polyglot persistence, where enterprises use a variety of data storage technologies suited to different types of data, rather than assuming a single relational database. It also discusses using NoSQL databases and deploying databases across multiple cloud platforms.
NoSQL, SQL, NewSQL - methods of structuring data (Tony Rogerson)
Today’s environment is a polyglot database, that is to say, it’s made up of a number of different database sources and possibly types. In this session we’ll look at some of the options of storing data – relational, key/value, document etc. I’ll overview what is SQL, NoSQL and NewSQL to give you some context for today’s world of data storage.
Presentation on NoSQL Database related to RDBMS (abdurrobsoyon)
This document discusses different types of databases including NoSQL databases. It describes four types of data: structured, unstructured, dynamic, and static. It then discusses scaling traditional relational databases vertically and horizontally. It introduces the concepts of data sharding, Amdahl's law, and data replication. The challenges of consistency in replicated databases and solutions like two-phase commit are covered. The CAP theorem and eventual consistency are explained. Finally, different types of NoSQL databases are classified including document stores, graph databases, key-value stores, and columnar databases. Specific NoSQL databases like MongoDB, Neo4j, DynamoDB, HBase, and Cassandra are also overviewed.
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you... (Andre Essing)
This document summarizes an introduction presentation about Azure Cosmos DB. It discusses key aspects of Cosmos DB including that it is a globally distributed, massively scalable database that supports multiple data models. It also covers request units, partitioning, indexing, consistency models, and other architectural aspects that allow Cosmos DB to elastically scale storage and throughput worldwide.
Cassandra is an open-source NoSQL database that provides high availability with no single point of failure. It uses a distributed system to keep multiple copies of data across different nodes, uses a column-oriented data model, and supports querying, managing, and recovering data across many servers. It can handle large amounts of data on commodity hardware. Security in Cassandra includes encryption of data both at rest and in transit between nodes and clients, as well as authentication and authorization of users.
In this technical overview of Azure Cosmos DB you will learn how easy it is to get started building planet-scale applications with Azure Cosmos DB. We’ll then take a closer look at important design aspects around global distribution, consistency, and server-side partitioning. How to model your data to fit your app’s needs using tools and APIs you love.
Next-generation databases mostly address some of the following points: they are non-relational, distributed, open-source, and horizontally scalable. The original intention was modern web-scale databases. The movement began in early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, a simple API, eventually consistent / BASE (not ACID), and handling huge amounts of data. So the misleading term "NoSQL" (which the community now mostly translates as "not only SQL") should be seen as an alias for something like the definition above.
This document discusses NoSQL databases and compares them to relational databases. It provides information on different types of NoSQL databases, including key-value stores, document databases, wide-column stores, and graph databases. The document outlines some use cases for each type and discusses concepts like eventual consistency, CAP theorem, and polyglot persistence. It also covers database architectures like replication and sharding that provide high availability and scalability.
Prague data management meetup 2018-03-27 (Martin Bém)
This document discusses different data types and data models. It begins by describing unstructured, semi-structured, and structured data. It then discusses relational and non-relational data models. The document notes that big data can include any of these data types and models. It provides an overview of Microsoft's data management and analytics platform and tools for working with structured, semi-structured, and unstructured data at varying scales. These include offerings like SQL Server, Azure SQL Database, Azure Data Lake Store, Azure Data Lake Analytics, HDInsight and Azure Data Warehouse.
The document provides an overview of NoSQL and MongoDB. It discusses that NoSQL databases were built for large datasets and cloud applications. It covers some of the main types of NoSQL databases like document stores, key-value stores, and column family stores. The document also compares NoSQL to SQL/relational databases, discussing how NoSQL is more flexible and scales horizontally. MongoDB is presented as a popular document-oriented NoSQL database, covering its flexible schema, horizontal scaling, and replication features.
OLAP (online analytical processing) allows users to easily extract and analyze data from different perspectives. It stores data in multidimensional databases to allow for complex queries. There are three main types of OLAP - relational, multidimensional, and hybrid. OLAP is used with data warehouses to enable analytics like data mining and decision making. It provides benefits over transactional systems by facilitating flexible analysis of integrated data over time.
B. Vinithamani, II M.Sc. Computer Science, Bon Secours College for Women, Thanjavur
This document provides an overview of NewSQL databases, comparing SQL, NoSQL, and NewSQL. It discusses that NewSQL is a class of modern relational database systems that provide the scalable performance of NoSQL systems for online transaction processing workloads while still maintaining the ACID guarantees of traditional databases. It provides examples of NewSQL databases like VoltDB and TokuDB and explains that NewSQL was created to have the scalability of NoSQL with the ACID properties and SQL interface of traditional databases. It also briefly discusses the multi-version concurrency control and architecture of NewSQL systems.
What is NoSQL? How did it come into the picture? What are the types of NoSQL? Some basics of the different NoSQL types. Differences between RDBMS and NoSQL. Pros and cons of NoSQL.
What is MongoDB? What are the features of MongoDB? Nexus architecture of MongoDB. Data model and query model of MongoDB? Various MongoDB data management techniques. Indexing in MongoDB. A working example using MongoDB Java driver on Mac OSX.
The document provides an overview of NoSQL databases and MongoDB. It discusses:
- What NoSQL is and why it was created
- The different categories of NoSQL databases, including key-value stores, document databases, column family stores, and graph databases
- MongoDB specifically, including its flexible schema, horizontal scalability, replication support, and data modeling approach
- Comparisons between relational and NoSQL databases
RELATIONAL DATABASE MANAGEMENT SYSTEM
Relational Model - data represented in terms of tuples (rows).
Key Concepts
o Table - a collection of data elements organized in terms of rows and columns
o Field - a column in a table designed to maintain specific information about every record in the table
o Record - a horizontal entity representing a set of related data
o Column - a vertical entity containing values of a particular type
RELATIONAL DATABASE MANAGEMENT SYSTEM INTEGRITY RULES
o Entity Integrity
o Domain Integrity
o Referential Integrity
o User-Defined Integrity
RELATIONAL DATABASE MANAGEMENT SYSTEM
Pros:
o Support simple data structures
o Limit redundancy
o Better integrity
o Offer logical database independence
o Support one-off queries using SQL
o Better backup & recovery procedures
Cons:
o Poor representation of the real world
o Difficult to represent hierarchies
o Difficult to represent complex data types
RDBMS VS NOSQL
RDBMS                     NoSQL
Scale up                  Scale out
Structured data           Semi-structured / unstructured data
Atomic transactions       Eventual consistency
Impedance mismatch        Object model
Strict schema             Schema-less
DISTRIBUTED SYSTEMS
A distributed database system consists of loosely-coupled sites that share no physical components.
Homogeneous DDBMS
o All sites have identical software & are aware of each other.
o They work cooperatively in processing user requests.
Heterogeneous DDBMS
o Different sites may use different schemas and software.
o They provide limited facilities for cooperation in transaction processing.
DISTRIBUTED SYSTEMS
Sharding
Split the data among multiple machines while ensuring that data is always accessed from the correct place.
(Figure: a 75GB dataset split across three machines holding 25GB each.)
Replication
Multiple instances of the database, each mirroring all the data of the others.
(Figure: a 75GB dataset mirrored in full on three machines, 75GB each.)
WHY NOSQL
The global NoSQL market is forecast to reach $3.4 billion in 2020, representing a compound annual growth rate (CAGR) of 21% for the period 2015-2020.
http://www.technologies.org/?p=102
http://www.marketresearchmedia.com/?p=568
WHAT IS ACID?
o Atomicity
  A transaction is all or nothing
o Consistency
  Only valid data is written to the database
o Isolation
  Pretend all transactions are happening serially and the data is correct
o Durability
  What you write is what you get
CAP THEOREM
o Consistency: all clients always have the same view of the data.
o Availability: each client can always read and write.
o Partition Tolerance: the system works well despite physical network partitions.
You can have at most two of these properties for any shared data system.
AN ALTERNATIVE TO ACID IS BASE
o Basic Availability
  The system seems to work all the time
o Soft-State
  It doesn't have to be consistent all the time
o Eventual Consistency
  Becomes consistent at some later time
NOSQL DATABASE CATEGORIES
o Key-Value Store
o Document Store
o Wide Column Store
o Graph Databases
KEY VALUE STORE - OVERVIEW
o Most basic type of NoSQL database and the basis for the other three
o Schema-free
o Stores data as key-value pairs
o Key-value stores can be used as collections, dictionaries, associative arrays etc.
Example DBs: Redis, Project Voldemort, Amazon DynamoDB
Example record (Key: Value):
Row_Id: 100
  First_Name: Saman
  Last_Name: Silva
  Address: 123, Galle Rd, Beruwala
  Last_Order: 2001
WIDE COLUMN STORE - OVERVIEW
o Stores data in a columnar format
o Semi-schematic
o Allows key-value pairs to be stored
o Each key (super column) is associated with multiple attributes
o Stores data in column-specific files
Example DBs: Apache HBase, Cassandra, Bigtable, Hadoop
Example (super columns with sub-column Key: Value pairs):
Super_Column: Name
  First_Name: Saman
  Last_Name: Silva
Super_Column: Address
  No: 125
  Road: Galle Rd
  City: Beruwala
DOCUMENT STORE - OVERVIEW
o Everything is stored in a document
o Schema-free
o Data is stored inside documents in JSON or BSON format
o A document is a key-value collection
Example DBs: MongoDB, CouchDB
Example (a Customers database whose documents reference an Orders database):
Database: Customers
  Document_Id: 100
    First_Name: Saman
    Last_Name: Silva
    Address: Number: 125, Road: Galle Rd, City: Beruwala
    Order: Most_Recent: 2001
Database: Orders
  Document_Id: 2001 - Price: Rs 450, Item1: 1001, Item2: 1002
  Document_Id: 2002 - Price: Rs 750, Item1: 1003, Item2: 1001
GRAPH DATABASE - OVERVIEW
o Collection of nodes & edges
o A node represents an entity & an edge represents a connection between two nodes
o Stores data in a graph
o Within nodes, data is stored as Key: Value pairs
o Mostly used in social network applications such as Facebook, Twitter etc.
o Example DBs: Neo4j, Titan
(Figure: nodes with Key: Value data - Name: Shelan and Name: Hansa joined by an IS_FRIEND_OF edge, each with a WORKS_IN edge to a node with WorkPlace: Virtusa.)
KEY VALUE STORE
o Most basic NoSQL database type
o Stores data as a dictionary or hash
o Dictionaries contain collections of objects or records
o Works very differently from an RDBMS
WHEN TO USE KEY VALUE STORE
o Caching: quickly storing and retrieving data
o Queueing: some K/V stores support lists, sets, queues and more
o Distributing information and tasks
o Keeping live information
ADVANTAGES OF KEY VALUE STORE
o Support horizontal scaling
o Highly performant
o Schema-less data store
o Work differently from an RDBMS
o Flexible, and more closely follow modern concepts like OOP
o Provide the basic K/V concept for the other 3 major NoSQL DB types
REDIS - KEY-VALUE STORE DATABASE
o Open source, advanced key-value store
o 3 main specialties:
  o Holds its database entirely in memory
  o Has a relatively rich set of data types
  o Can replicate data to any number of slaves
o 2 types of persistence:
  o RDB persistence
  o AOF persistence
o 5 data types
http://www.redis.io
http://redis.io/download
REDIS FEATURES
o Exceptionally fast
o Supports rich data types
o Operations are atomic
o Multi-utility tool
REDIS DATA TYPES
o String - e.g. "This is a String Value"
o Hashes - e.g. customer:1 -> name: Hasangi, address: Beruwala; customer:2 -> name: Rajith, address: Homagama
o Lists - e.g. Hasangi, Hasangi, Hansa, Rajith, Hijas (duplicates allowed, insertion order kept)
o Sets - e.g. Hasangi, Hansa, Rajith, Hijas (unique members, unordered)
o Sorted Sets - e.g. 0: Hansa, 1: Hasangi, 2: Hijas, 3: Rajith, 4: Shelan (ordered by score)
REDIS - STRING
> SET stringvalue "This is a String Value"
OK
> GET stringvalue
"This is a String Value"
REDIS - PUB/SUB
o Publish and subscribe to message channels
o Subscribers subscribe to one or more channels; any number of publishers can publish to a channel
(Figure: several publishers send "Hi, I'm RedisChat Publisher" / "I'm Another RedisChat Publisher" messages to the "RedisChat" channel, which delivers them to all subscribers.)
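A minimal command sketch of the pattern above, assuming two redis-cli sessions; the channel name "RedisChat" comes from the slide, the message text is illustrative.

# session 1 - subscriber
> SUBSCRIBE RedisChat
Reading messages... (press Ctrl-C to quit)

# session 2 - publisher
> PUBLISH RedisChat "Hi, I'm a RedisChat publisher"
(integer) 1    <- number of subscribers that received the message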
REDIS - TRANSACTIONS
o Execute a group of commands in a single step
o Has 2 properties:
  o All commands in a transaction are sequentially executed as a single isolated operation
  o A Redis transaction is also atomic
> SET accountA 100
OK
> SET accountB 100
OK
> GET accountA
"100"
> GET accountB
"100"
> MULTI
OK
> INCRBY accountA -50
QUEUED
> INCRBY accountB 50
QUEUED
> EXEC
1) (integer) 50
2) (integer) 150
> GET accountA
"50"
> GET accountB
"150"
REDIS - DISK PERSISTENCE
RDB Persistence:
o Point-in-time snapshots of the whole dataset
o Compact; ideal for regular backup/archive
o Multiple save-points available
o Faster restarts compared to AOF
o Very good for disaster recovery
AOF Persistence:
o Writes every command like a tape
o Gets rewritten when it gets too big
o Can be easily parsed & edited
o AOF files are bigger than RDB files
o Slower than RDB
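A minimal redis.conf sketch showing how the two modes are typically enabled; the save points and file names below are common defaults used for illustration, not values from the slides.

# RDB: snapshot if 1 key changed in 900s, 10 keys in 300s,
# or 10000 keys in 60s
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb

# AOF: log every write command, fsync once per second
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec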
REDIS - REPLICATION
o Uses asynchronous replication
o A master can have multiple slaves
o Slaves accept connections from other slaves
o Non-blocking on both the master and the slave side
o Redis Sentinel provides:
  • Automatic failover
  • Monitoring
  • Notification
  • Configuration provider
(Figures: scalability - a Redis master replicating to a tree of slaves; high availability - a Sentinel watching over a master and its slaves.)
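A hedged configuration sketch of the setup in the figures; the host addresses, ports and the quorum value of 2 are illustrative.

# on each slave (redis.conf) - replicate from the master
slaveof 192.168.1.10 6379

# sentinel.conf - watch the master, with a quorum of 2 sentinels
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000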
WIDE-COLUMN STORE DATABASES
o Store data as sections of columns of data rather than rows of data
o Able to hold very large numbers of dynamic columns
o The benefit of storing data in columns is fast search/access and data aggregation
o Advantageous for data warehouses and customer relationship management (CRM) systems
o A wide variety of companies and organizations use Hadoop for both research and production
HADOOP
o Not a single piece of software, but a framework of tools
o Objective: running applications on Big Data
o An open source set of tools distributed under the Apache license
o A distributed file system (HDFS)
o An environment to run Map-Reduce tasks - typically in batch mode
o A NoSQL database - HBase
o A real-time query engine (Impala)
HADOOP'S APPROACH
(Figure: Big Data is broken into pieces, a computation runs on each piece in parallel, and the partial results are merged into a combined result.)
HADOOP DISTRIBUTED MODEL
(Figure: a cluster of commodity hardware - each slave computer runs a Task Tracker and a Data Node; the master computer(s) run the Job Tracker and the Name Node.)
HADOOP DATA ACCESS
(Figure: an application talks to the Job Tracker and Name Node on the master, which direct the work to the Task Trackers and Data Nodes on the slave computers.)
HADOOP DATA FAULT TOLERANCE
(Figure: data is replicated across several Data Nodes, so when a slave's Task Tracker or Data Node fails, the application's work carries on using the remaining nodes.)
HOW HADOOP SOLVES BIG DATA CHALLENGES OF PROGRAMMERS
o File location
o Manage failures
o Break computations into pieces
o Scaling
o Focus on scale-free programs
HBASE
An open-source, distributed, versioned, non-relational database modeled after Google's Bigtable.
Features
o Linear and modular scalability
o Strictly consistent reads and writes
o Automatic and configurable sharding of tables
o Automatic failover support between Region Servers
o Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
o Easy-to-use Java API for client access
o Block cache and Bloom filters for real-time queries
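A small HBase shell sketch of the table model above, assuming an installation with the hbase shell available; the table name 'customers', column family 'info' and the row values are illustrative, and the output is elided.

> create 'customers', 'info'
> put 'customers', 'row100', 'info:first_name', 'Saman'
> put 'customers', 'row100', 'info:last_name', 'Silva'
> get 'customers', 'row100'
COLUMN               CELL
 info:first_name     timestamp=..., value=Saman
 info:last_name      timestamp=..., value=Silva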
GRAPH DATABASES
What is a Graph?
(Figure: a follower graph - the users Shelan, Hansa, Hijaz and Hasangi are nodes connected by "Follows" edges, e.g. following the account @Hansa and the hashtag #nosql.)
GRAPH DATABASES
What is a Graph Database?
A database that uses graph structures to represent & store data.
Key features
o Excellent at dealing with relationships
o High performance
o Flexible
o Query language support
(Figure: a node Rajith {Name: Rajith, City: Kottawa, Married: false} connected by a "Works for {Since: 2014/11/24}" relationship to a node Virtusa {Name: Virtusa, City: Colombo}.)
GRAPH DATABASES
Graph databases vs relational databases
Relational                         Graph
Tables                             Nodes
Schema with nullables              No schema
Relationships with foreign keys    Relationships are first-class citizens
Related data fetched with joins    Related data fetched with patterns
NEO4J - CYPHER
Node with properties:
( a { name: "rajith", born: 1989 } )
Relationship with properties:
( a ) - [ :WORKED_IN { roles: ["ASE"] } ] -> ( b )
Labels:
( a :Person { name: "rajith" } )
NEO4J - CYPHER
Querying with Cypher:
MATCH ( a ) --> ( b )
RETURN a, b;

MATCH ( a ) - [ r ] -> ( b )
RETURN a.name, type( r );

Using clauses:
MATCH ( a :Person )
WHERE a.name = "rajith"
RETURN a;
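For completeness, a hedged Cypher sketch that creates the data the queries above would match; the :Person label and property values reuse the earlier slide's example, while the :Company label is an assumption added for illustration.

CREATE ( a :Person { name: "rajith", born: 1989 } ),
       ( b :Company { name: "Virtusa" } ),
       ( a ) - [ :WORKED_IN { roles: ["ASE"] } ] -> ( b );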
DOCUMENT STORE
o A collection of documents
o Data in this model is stored inside documents
o A document is a key-value collection where the key allows access to its value
o Documents are not typically forced to have a schema and are therefore flexible and easy to change
o Documents are stored in collections in order to group different kinds of data
o Documents can contain many different key-value pairs, key-array pairs, or even nested documents
o Usually use a JSON (BSON)-like interchange model, so application logic can be written easily
WHAT IS MONGODB?
o A scalable, high-performance, open-source, document-oriented database written in C++
o Built for speed
o Rich document-based queries for easy readability
o Full index support for high performance
o Replication and failover for high availability
o Auto-sharding for easy scalability
o Map/Reduce for aggregation
MONGODB ADVANCED FEATURES
o Replication
o Indexing
o Aggregation
o Sharding
o Capped Collections
REPLICATION
o Replication is the process of synchronizing data across multiple servers
o Replication provides redundancy and increases data availability
(Figure: the minimum replica set in MongoDB - a Primary DB, a Secondary DB and an Arbiter DB.)
INDEXING
o Indexes support the efficient execution of queries in MongoDB
o MongoDB can use an index to limit the number of documents it must inspect
o Indexes use a B-tree data structure
o The "ensureIndex" method creates an index:
> db.COLLECTION_NAME.ensureIndex({ KEY: 1 })
o KEY is the name of the field on which to create the index
o 1 is for ascending order; -1 is for descending order
WITHOUT INDEXING
(Figure: a client sends a query; without an index the server has to read every document in the document storage to find the result.)
INDEX TYPES
o Default _id Index
o Single Field Index
o Compound Index
o Multikey Index
o Geo Index
o Text Index
o Hashed Index
AGGREGATIONS
Aggregations are operations that process data records and return computed results.
MongoDB provides a rich set of aggregation operations.
Aggregation concepts
o Aggregation Pipelines
o Map-Reduce
o Single Purpose Aggregation Operations
AGGREGATION PIPELINES
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.
SINGLE PURPOSE AGGREGATION OPERATIONS
MongoDB provides special-purpose database commands.
All of these operations aggregate documents from a single collection.
Common aggregation operations are:
o returning a count of matching documents
o returning the distinct values for a field
o grouping data based on the values of a field
SHARDING
Sharding is a method for storing data across multiple machines.
MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
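A hedged mongo shell sketch of turning sharding on; the database name "sales", the collection "customers" and the shard key "customer_id" are illustrative.

> sh.enableSharding("sales")
> sh.shardCollection("sales.customers", { customer_id: 1 })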
CAPPED COLLECTIONS
o Fixed-size circular collections that follow the insertion order, supporting high performance for create, read and delete operations
o Capped collections restrict updates to documents if the update would increase the document size
o Capped collections are best for storing log information, cache data or any other high-volume data
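A hedged example of creating a capped collection for logs; the collection name and the 1 MB / 5000-document limits are illustrative.

> db.createCollection("log", { capped: true, size: 1048576, max: 5000 })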
NOSQL DATABASE CATEGORIES (RECAP)
o Key-Value Store
o Document Store
o Wide Column Store
o Graph Databases
NOSQL DATABASES SUMMARY
Name: HBase | MongoDB | Neo4j | Redis
Database model: Wide column store | Document store | Graph DBMS | Key-value store
Initial release: 2008 | 2009 | 2007 | 2009
License: Open Source for all four
DBaaS: no for all four
Implementation language: Java | C++ | Java | C
Server operating systems: Linux, Unix, Windows | Linux, OS X, Solaris, Windows | Linux, OS X, Windows | BSD, Linux, OS X, Windows
Data scheme: schema-free for all four
Source: http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
NOSQL DATABASES SUMMARY
Name: HBase | MongoDB | Neo4j | Redis
2nd indexes: no | yes | yes | no
SQL: no for all four
APIs and other access methods:
  HBase - Java API, RESTful HTTP, Thrift
  MongoDB - proprietary protocol using JSON
  Neo4j - Cypher query language, Java API, RESTful HTTP
  Redis - proprietary protocol
Supported programming languages:
  HBase - C, C#, C++, Groovy, Java, PHP, Python, Scala
  MongoDB - Actionscript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MatLab, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala, Smalltalk
  Neo4j - .Net, Clojure, Go, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
  Redis - C, C#, C++, Clojure, Dart, Erlang, Go, Haskell, Java, JavaScript, Lisp, Lua, Objective-C, Perl, PHP, Python, Ruby, Scala, Smalltalk, Tcl
Source: http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
NOSQL DATABASES SUMMARY
Name: HBase | MongoDB | Neo4j | Redis
Foreign keys: no | no | yes | no
Transaction concepts: no | no | ACID | optimistic locking
Concurrency: yes for all four
Durability: yes for all four
In-memory capabilities: yes (Redis)
User concepts:
  HBase - Access Control Lists (ACL)
  MongoDB - access rights for users and roles
  Neo4j - no
  Redis - very simple password-based access control
Source: http://db-engines.com/en/system/HBase%3BMongoDB%3BNeo4j%3BRedis
A Relational Database Management System (RDBMS) is a database management system based on the relational model introduced by E. F. Codd.
Many popular databases currently in use are based on the relational database model.
The data in an RDBMS is stored in database objects called tables. A table is a collection of related data entries and consists of columns and rows.
The table is the most common and simplest form of data storage in a relational database.
A field is a column in a table that is designed to maintain specific information about every record in the table.
A record, also called a row of data, is each individual entry that exists in a table; a record is a horizontal entity in a table that represents a set of related data.
A column is a vertical entity in a table that contains all information associated with a specific field; a column is a set of values of a particular type.
Entity integrity: there are no duplicate rows in a table; the rows in a relational table should all be distinct.
Domain integrity: enforces valid entries for a given column by restricting the type, the format, or the range of values. Column values must not be repeating groups or arrays.
Referential integrity: rows that are referenced by other records cannot be deleted.
User-defined integrity: enforces specific business rules that do not fall into entity, domain or referential integrity.
The concept of a null value: a blank is considered equal to another blank, and a zero is equal to another zero, but two null values are not considered equal.
The NoSQL market is expected to grow 21 percent annually and reach 3.4 billion US dollars in 2020.
Why is this growth expected? Because developing NoSQL applications at Facebook and Twitter, and in biotechnology, defense, image processing and many other fields, has proved successful. NoSQL is moving in to become a major player in the database marketplace.
NoSQL supports Big Users. In the early days, 10,000 concurrent users was an extreme case; now apps should support millions of different users a day, globally, 24 hours a day, 365 days a year. Supporting large numbers of concurrent users is important, but because app usage is hard to predict, it is just as important to dynamically support rapidly growing numbers of concurrent users. With relational technologies, many application developers find it difficult, or even impossible, to get the dynamic scalability and level of scale they need while also maintaining the performance users demand. NoSQL databases are designed to meet this target.
NoSQL also supports Big Data. As the graph shows, the usage of structured and semi-structured data has increased over time. Explosive growth in internet usage, in addition to the use of mobile and social apps and machine-to-machine communications, has introduced new data types. Capturing and using big data requires a very different type of database: the rigidly defined, schema-based approach used by relational databases makes it impossible to quickly incorporate new types of data and is a poor fit for unstructured and semi-structured data. NoSQL provides a much more flexible data model that better maps to an application's data organization.
Today 20 billion devices are connected to the Internet: smart phones, tablets, home appliances, and devices in cars, hospitals, warehouses and more. These devices gather data on environment, location, movement, temperature and so on. Innovative enterprises rely on NoSQL technology to scale concurrent data access to millions of connected devices and systems, to store billions of data points, and to meet the resulting performance requirements.
Today, most new applications run in a public, private, or hybrid cloud, support large numbers of users, and use a three-tier internet architecture. In the cloud, a load balancer directs the incoming traffic to a scale-out tier of web/application servers that process the logic of the application. NoSQL databases are built from the ground up to be distributed, scale-out technologies and are therefore a better fit with the highly distributed nature of the three-tier internet architecture.
Relational and NoSQL data models are very different. The relational model takes data and separates it into many interrelated tables that contain rows and columns. A single JSON document in NoSQL might hold all the data that would be stored in 20 tables of a relational database. Another major difference is that relational technologies have rigid schemas, while NoSQL has no strict schema: the format of the data being inserted can be changed at any time, without application disruption.
There are two options to deal with increased concurrent users and volume of data: scale the database up or scale it out. Relational databases have limitations in scaling up: to support more concurrent users and store more data, they require a bigger and more expensive server with more CPUs, memory, and disk storage. At some point the capacity of even the biggest server is outstripped and the relational database cannot scale further. Scaling out the database tier with NoSQL provides an easier, linear, and cost-effective approach to database scaling: as the number of concurrent users grows, simply add additional low-cost commodity servers to the cluster. There is no need to modify the application, since the application always sees a single (distributed) database.
A transaction is a logical unit that is independently executed for data retrieval or update.
ACID is a set of properties that apply specifically to database transactions; it describes how database transactions are processed reliably. Let's examine the ACID requirements for a database transaction system in more detail.
Atomicity means either the task or tasks within a transaction are performed or none are performed (all or none rule).
Consistency means the transaction meets all rules defined by the system at all times. The transaction does not violate those rules and the database must remain in a consistent state at the beginning and end of a transaction. There are no half-completed transactions.
Isolation: No transaction has access to any other transaction that is in an intermediate or unfinished state. Each transaction is independent.
Finally, durability means the transaction is complete and it will persist. The completed transaction will survive system failure, power loss and other types of system breakdowns.
The CAP Theorem, also known as Brewer's Theorem, says that there are three essential system requirements for the successful design, implementation and deployment of applications in distributed computing systems: Consistency, Availability and Partition Tolerance.
Consistency: each client always has the same view of the data. (This is related to, though not identical with, the consistency in ACID.)
High Availability: means that all clients can always read and write.
Partition-tolerance: means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.
Attaining all three, however, is not possible: it turns out a shared data system can have at most two of these three characteristics.
The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem. The types of large systems based on CAP aren't ACID, they are BASE. Everyone who builds big applications builds them on CAP and BASE: Google, Yahoo, Facebook, Amazon, eBay, etc. Let's review the BASE properties:
Basically Available: This constraint states that the system does guarantee the availability of the data as regards CAP Theorem; there will be a response to any request. But, that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.
Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’
Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one.
The BASE model isn't appropriate for every situation, but it is certainly a flexible alternative to the ACID model for databases that don't require strict adherence to a relational model.
Key Value Store
A global collection of key:value pairs, e.g. "Name" is the key and "Saman" is the value. Schema-free: every record can have different keys. The most common category, and the basis for the other 3 NoSQL database categories.
Examples: Redis, Amazon SimpleDB, Project Voldemort, Riak, Windows Azure.
Document Store
Similar to key/value, but the major difference is that the value is a document.
Flexible schema / schema-free: any number of fields can be added.
Values (documents) are stored in JSON or BSON.
Wide Column Store
Each key (key -> super column) is associated with multiple attributes.
Semi-schematic, not schema-free: we need to specify groups of columns (known as column families).
Data is stored in column-specific files.
Graph Databases
A collection of nodes and edges; each node represents an entity and each edge represents a connection or relationship between two nodes.
Data is stored in a graph.
Key Value Store
Column Oriented Store
Document Store
Graph Database
Multi-model Databases
Object Databases
Unresolved and Uncategorized
The basic NoSQL database category and the foundation for the other three major categories.
Schema-free: allows developers to store schema-less data (every record can have different keys).
The database stores data as key-value pairs; each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
Key-value stores can be used as collections, dictionaries, associative arrays etc.
For example, suppose we have a sales database with customer and order tables, each with unique rows. Here we have one row with key 100, holding the key-value pairs first name, last name and address, plus a last-order value that points to another table. But there is no explicit relation between customers and orders.
Data is stored in a columnar format, and those columns are treated individually.
Wide column stores have tables, but the tables don't belong to a database; there is no such thing as a database here. Tables have rows, and rows have super columns and columns within them. Super columns are defined when the tables are defined; in this example, Name and Address.
Everything is stored in a document; we can say a document store is a collection of documents.
Schema-free: documents are not typically forced to have a schema and are therefore flexible and easy to change.
Instead of containing rows, they contain documents, but conceptually a document is similar to a row, and there are still key-value pairs inside the documents.
The little difference is that the value of a key can itself be a document, or can point to a document in another database.
As an example, the customer document with id 100 has an address key whose value is itself a document,
and an orders key whose value is a document pointing to the Orders database document with id 2001.
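A hedged JSON sketch of the customer and order documents just described; the field names follow the slides, while the exact structure and casing are illustrative.

// Customers database
{ "_id": 100,
  "first_name": "Saman",
  "last_name": "Silva",
  "address": { "number": 125, "road": "Galle Rd", "city": "Beruwala" },
  "order": { "most_recent": 2001 } }

// Orders database
{ "_id": 2001, "price": "Rs 450", "items": [1001, 1002] }
{ "_id": 2002, "price": "Rs 750", "items": [1003, 1001] }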
Key / Value Store
KV can be considered the most basic and backbone implementation of NoSQL.
It is designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash.
Stores data as a hash table.
Each key is unique; keys may be strings, hashes, lists, sets, or sorted sets.
The value can be a string, JSON, a BLOB (binary large object), etc.
These types of databases work by matching keys with values, similar to a dictionary. There is no structure and no relations. After connecting to the database server (e.g. Redis), an application can state a key (e.g. Name) and provide a matching value (e.g. "Saman") which can later be retrieved the same way by supplying the key.
Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data.
These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database.
KV stores work in a very different fashion than the better known relational databases (RDB). RDBs pre-define the data structure in the database as a series of tables containing fields with well defined data types. Exposing the data types to the database program allows it to apply a number of optimizations. In contrast, key-value systems treat the data as a single opaque collection which may have different fields for every record.
This offers considerable flexibility and more closely follows modern concepts like object-oriented programming.
Some popular key / value based data stores are:
Redis: in-memory K/V store with optional persistence.
Riak: highly distributed, replicated K/V store.
Memcached / MemcacheDB: distributed memory-based K/V store.
Key / value DBMSs are usually used for quickly storing basic information, and sometimes not-so-basic information after performing, for example, a CPU- and memory-intensive computation. They are extremely performant, efficient and usually easily scalable.
When To Use
Caching: quickly storing data for - sometimes frequent - future use.
Queueing: some K/V stores (e.g. Redis) support lists, sets, queues and more.
Distributing information / tasks: they can be used to implement Pub/Sub.
Keeping live information: applications which need to keep a state can use K/V stores easily.
One of the biggest benefits of most NoSQL solutions, including key value stores, is horizontal scaling. We all know that horizontal scaling and SQL Server, while possible, do not play well together; typically, if you need more from SQL Server you scale vertically, which can be costly.
Key / value data stores are highly performant, easy to work with, and they usually scale well.
Another benefit of key value stores is the lack of schema, which allows the data structure to change as needed and is thus a bit more flexible, whereas with SQL Server altering a table could result in stored procedures, functions, views, etc. needing updates, which takes time and a DBA resource.
Because optional values are not represented by placeholders as in most RDBs, key-value stores often use far less memory to store the same database, which can lead to large performance gains in certain workloads.
key-value systems treat the data as a single opaque collection which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming.
The key value stores are typically written in some type of programming language, commonly Java. This gives the application developer the freedom to store data how they see fit, in a schema-less data store.
A subclass of the key-value store is the document-oriented database, which offers additional tools that use the metadata in the data to provide a richer key-value database that more closely matches the use patterns of RDBM systems. Some graph databases are also key-value stores internally, adding the concept of the relationships (pointers) between records as a first class data type.
Key value stores support "eventual consistency"; if a feature in your application doesn't need full ACID support, this may not be a significant drawback.
Redis is an open source, advanced key-value store and a serious solution for building high-performance, scalable web applications.
Redis has three main peculiarities that set it apart from much of its competition:
Redis holds its database entirely in memory, using the disk only for persistence.
Redis has a relatively rich set of data types when compared to many key-value data stores.
Redis can replicate data to any number of slaves – Redis Replication
Redis persists data in 2 ways:
RDB Persistence
AOF(Append Only File) Persistence
Redis is quite a bit different from other NoSQL databases, besides just being different from relational databases like SQL Server. You may be familiar with document databases like RavenDB or MongoDB, and while they are certainly good choices for NoSQL databases, they operate quite differently from Redis. With document databases like RavenDB or MongoDB, the focus is on creating documents which are persisted to disk and can be indexed, just like relational tables are indexed in SQL Server or Oracle. Redis, on the other hand, stores its data using keys, and the data it stores can take the form of different data structures, not just a document. The data is also stored in memory, with persistence as a secondary consideration, and there is no indexing of any kind. You can, of course, implement your own indexes by creating them as additional data, but Redis does not do any of that for you. This can be a bit of a shock to developers who are used to being able to query a database; after all, isn't that what databases are for? Databases like SQL Server and Oracle let you query the database using SQL; databases like RavenDB and MongoDB let you query the data using indexes you create ahead of time or on the fly. But Redis only lets you get data by specifying a key. At first this may seem like a ludicrous tradeoff - why would you want to give up the ability to query your data? And it's true that in some cases using Redis will not make any sense at all. But where Redis is appropriate, although you have to do a little extra work designing your data and working out how to access it, it will be extremely fast with very little overhead. That is really the advantage, and the consideration you need to take into account when deciding whether or not to use Redis.
Exceptionally fast: Redis is very fast and can perform about 110,000 SETs per second and about 81,000 GETs per second.
Supports rich data types: Redis natively supports most of the data types that developers already know, like lists, sets, sorted sets and hashes. This makes it very easy to solve a variety of problems, because we know which problem is handled better by which data type.
Operations are atomic: all Redis operations are atomic, which ensures that two clients concurrently accessing the Redis server will always get the updated value.
Multi-utility tool: Redis can be used in a number of use cases, like caching, messaging queues (Redis natively supports publish/subscribe), and any short-lived data in your application, such as web application sessions, web page hit counts, etc.
Redis supports five data types.
Bitmaps and HyperLogLogs
Redis also supports Bitmaps and HyperLogLogs, which are actually data types based on the String base type but having their own semantics.
Strings - a Redis string is a sequence of bytes.
Strings are binary safe, meaning they have a known length not determined by any special terminating characters.
One string can store anything up to 512 megabytes.
Lists - Redis lists are simply lists of strings, sorted by insertion order.
You can add elements to a Redis list at the head or the tail. The max length of a list is 2^32 - 1 elements (more than 4 billion elements per list).
Internally maintained as a linked list.
Ideal for queues, stacks, top-N lists, recent news, timelines.
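A minimal redis-cli sketch of the list operations described above; the key name "friends" and the member values are illustrative.

> LPUSH friends "Hasangi"
(integer) 1
> RPUSH friends "Hansa"
(integer) 2
> LRANGE friends 0 -1
1) "Hasangi"
2) "Hansa"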
Sets - Redis sets are an unordered collection of strings.
In Redis you can add, remove, and test for the existence of members in O(1) time complexity.
In the example above, Hasangi is added twice, but due to the uniqueness property of a set it is stored only once.
The max number of members in a set is 2^32 - 1 (4,294,967,295, more than 4 billion members per set).
Sample usage: tracking unique IPs, tagging.
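A minimal redis-cli sketch of the set behaviour just described; the key name "members" is illustrative.

> SADD members "Hasangi"
(integer) 1
> SADD members "Hasangi"
(integer) 0    <- duplicate is ignored
> SADD members "Hansa"
(integer) 1
> SMEMBERS members
1) "Hasangi"
2) "Hansa"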
Sorted Sets - Redis sorted sets are, similarly to Redis sets, non-repeating collections of strings.
The difference is that every member is defined with a score, which is used to keep the set ordered from the smallest to the greatest score.
Members are unique, but scores may be repeated.
Sample usage: leaderboards, most page views, sorting by a given range of age, friends, comments or likes.
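A minimal redis-cli sketch of a score-ordered set, matching the leaderboard usage above; the key name and scores are illustrative.

> ZADD leaderboard 0 "Hansa"
(integer) 1
> ZADD leaderboard 1 "Hasangi"
(integer) 1
> ZADD leaderboard 2 "Hijas"
(integer) 1
> ZRANGE leaderboard 0 -1 WITHSCORES
1) "Hansa"
2) "0"
3) "Hasangi"
4) "1"
5) "Hijas"
6) "2"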
Redis pub/sub implements a messaging system where senders (called publishers) send messages while receivers (subscribers) receive them. The link by which messages are transferred is called a channel.
In Redis a client (subscriber) can subscribe to any number of channels.
A subscriber also receives messages from multiple clients (publishers) publishing to a particular channel.
Redis transactions allow the execution of a group of commands in a single step. A transaction has two properties:
All commands in a transaction are sequentially executed as a single isolated operation; it is not possible for a request issued by another client to be served in the middle of the execution of a Redis transaction.
A Redis transaction is also atomic: either all of the commands are processed or none are.
Redis Persistence
Redis provides a different range of persistence options:
The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
the AOF persistence logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log on background when it gets too big.
If you wish, you can disable persistence entirely, if you want your data to exist only as long as the server is running.
It is possible to combine both AOF and RDB in the same instance. Notice that, in this case, when Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.
The most important thing to understand is the different trade-offs between the RDB and AOF persistence. Let's start with RDB:
RDB advantages
RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance you may want to archive your RDB files every hour for the latest 24 hours, and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters.
RDB is very good for disaster recovery, being a single compact file can be transferred to far data centers, or on Amazon S3 (possibly encrypted).
RDB maximizes Redis performances since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent instance will never perform disk I/O or alike.
RDB allows faster restarts with big datasets compared to AOF.
RDB disadvantages
RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, but you can have multiple save points). However you'll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason you should be prepared to lose the latest minutes of data.
RDB needs to fork() often in order to persist to disk using a child process. Fork() can be time consuming if the dataset is big, and may cause Redis to stop serving clients for some milliseconds, or even for one second if the dataset is very big and the CPU performance is not great. AOF also needs to fork(), but you can tune how often you want to rewrite your logs without any trade-off on durability.
AOF advantages
Using AOF, Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, or fsync at every query. With the default policy of fsync every second, write performance is still great (fsync is performed using a background thread, and the main thread will try hard to perform writes when no fsync is in progress), and you can only lose one second's worth of writes.
The AOF log is an append-only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with a half-written command for some reason (disk full or other reasons), the redis-check-aof tool is able to fix it easily.
Redis is able to automatically rewrite the AOF in background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.
AOF contains a log of all the operations one after the other in an easy to understand and parse format. You can even easily export an AOF file. For instance even if you flushed everything for an error using a FLUSHALL command, if no rewrite of the log was performed in the meantime you can still save your data set just stopping the server, removing the latest command, and restarting Redis again.
AOF disadvantages
AOF files are usually bigger than the equivalent RDB files for the same dataset.
AOF can be slower than RDB depending on the exact fsync policy. In general, with fsync set to every second, performance is still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still, RDB is able to provide more guarantees about the maximum latency even in the case of a huge write load.
In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to not reproduce exactly the same dataset on reloading. These bugs are rare, and we have tests in the test suite creating random complex datasets automatically and reloading them to check that everything is ok, but these kinds of bugs are almost impossible with RDB persistence. To make this point clearer: the Redis AOF works by incrementally updating an existing state, like MySQL or MongoDB do, while RDB snapshotting creates everything from scratch again and again, which is conceptually more robust. However: 1) it should be noted that every time the AOF is rewritten by Redis it is recreated from scratch, starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always-appending AOF file (or one rewritten by reading the old AOF instead of reading the data in memory); 2) we have never had a single report from users about an AOF corruption detected in the real world.
Redis replication is a very simple to use and configure master-slave replication that allows slave Redis servers to be exact copies of master servers. The following are some very important facts about Redis replication:
Redis uses asynchronous replication. Starting with Redis 2.8, however, slaves will periodically acknowledge the amount of data processed from the replication stream.
A master can have multiple slaves.
Slaves are able to accept connections from other slaves. Aside from connecting a number of slaves to the same master, slaves can also be connected to other slaves in a graph-like structure.
Redis replication is non-blocking on the master side. This means that the master will continue to handle queries when one or more slaves perform the initial synchronization.
Replication is also non-blocking on the slave side. While the slave is performing the initial synchronization, it can handle queries using the old version of the dataset, assuming you configured Redis to do so in redis.conf. Otherwise, you can configure Redis slaves to return an error to clients if the replication stream is down. However, after the initial sync, the old dataset must be deleted and the new one must be loaded. The slave will block incoming connections during this brief window.
Replication can be used both for scalability, in order to have multiple slaves for read-only queries (for example, heavy SORT operations can be offloaded to slaves), or simply for data redundancy.
http://blog.concretesolutions.com.br/2013/03/redis-parte-2/
http://redis.io/topics/sentinel
http://redis.io/topics/replication
Redis Sentinel provides high availability for Redis. In practical terms this means that using Sentinel you can create a Redis deployment that resists certain kinds of failures without human intervention.
Redis Sentinel also provides other collateral tasks such as monitoring and notifications, and acts as a configuration provider for clients.
This is the full list of Sentinel capabilities at a macroscopic level (i.e. the big picture):
Monitoring: Sentinel constantly checks whether your master and slave instances are working as expected.
Notification: Sentinel can notify the system administrator, or other computer programs via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover: if a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed of the new address to use when connecting.
Configuration provider: Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels to ask for the address of the current Redis master responsible for a given service, and if a failover occurs, Sentinels will report the new address.
The important difference here is that columns are created for each row rather than being predefined by the table structure.
Map-Reduce - An algorithm for efficiently processing large amounts of data in parallel
Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
Avro™: A data serialization system.
Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
ZooKeeper™: A high-performance coordination service for distributed applications.
Cypher is a query language specially designed for the Neo4j graph database; it is still in active development.
Cypher is declarative: you specify what you need to retrieve, not how Neo4j should retrieve it.
Cypher uses patterns to match data in the database.
Cypher works with clauses, e.g. WHERE and ORDER BY.
A document store is a type of NoSQL database that stores a collection of documents; the data model is stored inside the documents.
Documents are not typically forced to have a schema and are therefore flexible and easy to change.
Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
They usually use a JSON (BSON)-like interchange model, so application logic can be written easily.
MongoDB is an open source document database written in C++.
Data is stored in an open format such as XML, JSON or Binary JSON (BSON), which makes the data easy to read.
It allows server-side operations on data, and makes it easy to create tools to manipulate data.
Full index support gives high performance.
MongoDB has automatic failure recovery and replication, which gives high availability.
MongoDB has horizontal scaling via sharding, which gives easy scalability.
It provides an aggregation framework, making it easy to handle large amounts of data.
A database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.
A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table.
A document is a set of key-value pairs.
This is an example of a document.
MongoDB has a lot of its own features; these are some of the advanced features in MongoDB.
Now we review those features one by one.
Replication is the process of synchronizing data across multiple servers.
It provides redundancy and increases data availability, because it keeps multiple copies of data on different database servers;
replication protects a database from the loss of a single server.
Replication also allows you to recover from hardware failures and service interruptions. We can dedicate one copy to disaster recovery, reporting, or backup.
With a single-server DB there is a danger: when the DB crashes, all data is lost. If we have a backup, we can restore it, but this is only the traditional approach to fail-safety. Instead, MongoDB supports a concept called a replica set to achieve replication.
A replica set is a group of mongod instances that host the same data set. Generally a replica set contains a minimum of 3 nodes:
one primary node, one or more secondary nodes, and an arbiter node.
All data replicates from the primary to the secondary nodes.
1) Primary node: a replica set can have only one primary instance, which receives all write operations. Any client writing data to the database must be connected to the primary.
2) Secondary nodes: these are read-only databases. There can be many secondary databases, which means more scalability, because many reads can be performed against the replicas rather than hitting a single server.
3) Arbiter node: an arbiter does not have a copy of the data set and cannot become a primary. Replica sets may have arbiters to add a vote in elections for a primary.
It can be a smaller machine and does not need a lot of hardware.
At some point the primary DB may fail, and one of the secondaries will take over and become the primary. This is great, because mongod supports automatic recovery from a crash of the primary.
If one of the secondaries breaks it is not a big deal, because we still have the primary, and depending on the application there can be many other secondaries:
no data loss and no loss of functionality.
When the primary server fails, one of the secondaries takes over, but there can be multiple secondaries, so which one becomes primary? Mongo holds an election. The election requires a simple majority (more than 50% of the votes) for a node to become the primary server. The arbiter adds its vote in the election and helps decide the new primary.
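A hedged mongo shell sketch of initiating the minimal replica set described above; the set name "rs0" and the host names are illustrative.

> rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "db1.example.net:27017" },                      // primary candidate
      { _id: 1, host: "db2.example.net:27017" },                      // secondary
      { _id: 2, host: "arb.example.net:27017", arbiterOnly: true }    // arbiter
    ]
  })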
MongoDB holds large volumes of data.
Indexes speed up queries, and when using an index the number of documents to scan is limited.
Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. Indexes use a B-tree structure.
Using the "ensureIndex" method, an index can be created on a field.
Here, key is the name of the field on which you want to create the index, and 1 is for ascending order. To create an index in descending order, use -1.
The "ensureIndex" method can also be passed multiple fields, to create an index on multiple fields.
The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.
MongoDB without indexing
Let's look at this example:
you have a collection named "foo" and you want to find all documents where the value of field x is 10.
What does the server do to find them? It has to scan each and every document and check whether field x equals 10, comparing every document along the way. This is a very wasteful operation.
Without indexes, MongoDB must perform a collection scan; the solution is to use an index.
Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection.
MongoDB provides a number of different index types to support specific types of data and queries.
1) Default _id index: all MongoDB collections have an index on the _id field that exists by default. If applications do not specify a value for _id, the driver or the mongod will create an _id field with an ObjectId value. The _id index is unique and prevents clients from inserting two documents with the same value for the _id field.
2) Single field index: in addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
3) Compound index: MongoDB also supports user-defined indexes on multiple fields. The order of fields listed in a compound index is significant: the index sorts first by the first field and then, within each value of that field, by the next field.
4) Multikey index: if you index a field that holds an array value, MongoDB creates a multikey index on that field. Multikey indexes allow queries to select documents that contain arrays by matching on an element or elements of the arrays. MongoDB automatically determines whether to create a multikey index when the indexed field contains an array value; you do not need to explicitly specify the multikey type.
5) Geo index: to support efficient queries of geospatial coordinate data, MongoDB provides geospatial index types (2d and 2dsphere).
6) Text indexes: support searching for string content in a collection.
7) Hashed indexes: to support hash-based sharding, MongoDB provides a hashed index type, which indexes the hash of a field's value. These indexes have a more random distribution of values along their range, but they only support equality matches and cannot support range-based queries.
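Illustrative sketches of those last three index types (the collections and fields are hypothetical):

    db.places.ensureIndex({ location: "2dsphere" })  // geospatial index
    db.articles.ensureIndex({ body: "text" })        // text index
    db.users.ensureIndex({ username: "hashed" })     // hashed index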
Aggregations are operations that process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.
MongoDB provides a rich set of aggregation operations.
Like queries, aggregation operations in MongoDB use collections of documents as an input and return results in the form of one or more documents.
There are three concepts in aggregation:
1) Aggregation pipeline
2) Map-reduce
3) Single-purpose aggregation operations
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.
Documents enter a multi-stage pipeline that transforms the documents into an aggregated result.
Each stage takes documents as input, processes them, and produces output documents that become the input for the next stage, and so on.
Possible stages in the aggregation framework are $project, $match, $group, $sort, $skip, $limit, and $unwind.
This example has two stages, $match and $group.
The $match stage runs first and filters for documents whose status field equals “A”; its output documents become the input to the $group stage, which groups by cust_id and computes the sum of amount as total.
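A sketch of that two-stage pipeline, assuming an orders collection with status, cust_id, and amount fields:

    db.orders.aggregate([
      { $match: { status: "A" } },                                 // stage 1: filter by status
      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }  // stage 2: sum amount per customer
    ])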
MongoDB also provides map-reduce operations to perform aggregation.
In general, map-reduce operations have two phases: map and reduce.
Optionally, map-reduce can have a finalize stage to make final modifications to the result.
Map-reduce uses custom JavaScript functions to perform the map and reduce operations, as well as the optional finalize operation.
The main options are:
map – a JavaScript function that maps a value to a key and emits a key-value pair.
reduce – a JavaScript function that reduces, or groups together, all the documents having the same key.
out – specifies the location of the map-reduce query result.
query – specifies optional selection criteria for selecting documents.
sort – specifies optional sort criteria.
limit – specifies an optional maximum number of documents to be returned.
In this example we have an orders collection. The query first selects documents whose status field equals “A”; then the map function emits cust_id as the key and amount as the value. The reduce stage returns the sum of the values array for each key, and the result is stored in order_totals.
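A sketch of that map-reduce job, under the same assumed orders schema:

    db.orders.mapReduce(
      function () { emit(this.cust_id, this.amount); },      // map: key = cust_id, value = amount
      function (key, values) { return Array.sum(values); },  // reduce: sum the amounts per key
      {
        query: { status: "A" },  // select only documents with status "A"
        out: "order_totals"      // store the result in the order_totals collection
      }
    )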
MongoDB provides special purpose database commands.
These common aggregation operations are:
returning a count of matching documents,
returning the distinct values for a field,
and grouping data based on the values of a field.
All of these operations aggregate documents from a single collection.
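In the shell, the first two helpers look like this (the orders collection is the same assumed example; the third case was served by the now-deprecated group command, with the $group pipeline stage as the modern equivalent):

    db.orders.count({ status: "A" })  // count of matching documents
    db.orders.distinct("cust_id")     // distinct values for a field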
Sharding is a method for storing data across multiple machines.
MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
As the size of the data increases, a single machine may not be sufficient to store the data or to handle all the read and write requests.
Sharding solves this problem with horizontal scaling.
With sharding, you can add more machines to support data growth and the demands of read and write operations.
Shards: shards store the data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
Config servers: config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards.
Query routers (mongos): query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients.
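As an illustrative sketch (the database and collection names are hypothetical), sharding a collection through a mongos looks like:

    sh.enableSharding("mydb")                          // allow mydb to be sharded
    sh.shardCollection("mydb.orders", { cust_id: 1 })  // distribute by the cust_id shard key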
Capped collections are fixed-size, circular collections.
They give high performance for create, read, and delete operations.
By circular, we mean that when the fixed size allocated to the collection is exhausted, it starts deleting the oldest documents in the collection without requiring any explicit commands.
Capped collections restrict updates to documents if the update would increase the document's size.
Capped collections are best for storing log information.
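A minimal sketch of creating and using a capped collection for logs (the name and size limits are illustrative):

    db.createCollection("log", { capped: true, size: 1048576, max: 1000 })  // 1 MB, at most 1000 docs
    db.log.insert({ ts: new Date(), msg: "server started" })                // oldest entries are evicted once full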