NoSQL databases

STATE OF THE ART
4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR

I - Overview

What is NoSQL?

Typically, a NoSQL database is:
Non-relational
Distributed
Horizontally scalable
Built for big data
Performant
Open source

Relational vs. NoSQL
Property                          Relational               NoSQL
Performance for high data volume  Low                      High
Horizontal scalability            Complex, error-prone     Simple
Flexibility                       Low                      High
Consistency                       Strong (ACID)            Eventual (BASE)
Indexing                          Multiple columns         Single column
Data duplication                  Avoided (normalization)  Allowed
Standard query language           Yes                      No
Data model                        Single                   Multiple

II - Models

Main NoSQL database models
Key-value
Document
Column
Graph

Key-value store. Data model
KEYS      VALUES
Key 1  →  Value 1
Key 2  →  Value 2
Key 3  →  Value 3

Key-value store. Characteristics
PROS
Frequent reads / writes
Simple data model
Rapid query execution
CONS
Limited to small reads / writes
Simple data model (values are opaque to queries)
Poor query capabilities
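To make the key-value trade-offs concrete, here is a minimal sketch in Python, assuming a plain in-memory dictionary stands in for the store (the `KeyValueStore` class and its methods are hypothetical, not any product's API). Reading or writing by key is a single hash lookup, while any question about the values themselves needs a full scan, which is the "poor query capabilities" point above.

```python
from typing import Any, Callable, Dict, List, Optional


class KeyValueStore:
    """Toy in-memory key-value store: opaque values addressed only by key."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # A write overwrites any previous value stored under the same key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # A read by key is a single hash lookup (average O(1)).
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def scan(self, predicate: Callable[[Any], bool]) -> List[str]:
        # Querying by value has no index to help: every pair is visited.
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("Key 1", "Value 1")
store.put("Key 2", "Value 2")
store.put("Key 3", "Value 3")
print(store.get("Key 2"))                     # Value 2
print(store.scan(lambda v: v.endswith("3")))  # ['Key 3']
```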

Key-value store. Implementations

Document store. Data model
Document 1 (ID 1, JSON)
{
  "id": "1",
  "name": "foo",
  "attributeX": "bar"
}
Document 2 (ID 2, JSON)
{
  "id": "2",
  "name": "bar"
}
Document 3 (ID 3, XML)
<element>
  <name>A</name>
  <content>
    <type>B</type>
    <color>red</color>
  </content>
</element>
Document 4 (ID 4, XML)
<element>
  <name>B</name>
  <value>5</value>
</element>

Document store. Characteristics
PROS
Flexible schema
Whole object stored in a single document
Rich querying capabilities
CONS
No joins
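A minimal Python sketch of the document model, reusing the two JSON documents from the data-model slide; the `orders` collection and the `find` / `join_orders_with_products` helpers are hypothetical. It shows that documents in one collection need not share fields, that queries can filter on any field, and that, without server-side joins, combining collections falls to the application.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

# Documents in one collection need not share a schema (document 2 has no attributeX).
products: List[Doc] = [
    {"id": "1", "name": "foo", "attributeX": "bar"},
    {"id": "2", "name": "bar"},
]

# Hypothetical second collection that references products by id.
orders: List[Doc] = [
    {"order_id": "o1", "product_id": "1", "qty": 3},
    {"order_id": "o2", "product_id": "2", "qty": 1},
]


def find(collection: List[Doc], **criteria: Any) -> List[Doc]:
    """Return documents that have every given field with the given value."""
    return [d for d in collection if all(k in d and d[k] == v for k, v in criteria.items())]


def join_orders_with_products() -> List[Doc]:
    """No server-side join: the application builds a lookup table and merges."""
    by_id = {p["id"]: p for p in products}
    return [{**o, "product_name": by_id[o["product_id"]]["name"]} for o in orders]


print(find(products, attributeX="bar"))                # only document 1 has attributeX
print(join_orders_with_products()[0]["product_name"])  # foo
```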

Document store. Implementations

Column store. Data model
Column Family
Row1 (row key Key1)
  Column1: name1 : value1, timestamp1
  Column2: name2 : value2, timestamp2
  …
  ColumnN: nameN : valueN, timestampN
Row2 (row key Key2)
  Column1: name1 : value1, timestamp1
  Column3: name3 : value3, timestamp3
  …
  ColumnM: nameM : valueM, timestampM
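A minimal sketch of the column-family layout in Python, assuming one dictionary per row that maps column names to (value, timestamp) pairs; the `ColumnFamily` class is hypothetical. Rows are sparse, so Key2 can hold only Column1 and Column3. Conflicting writes to the same cell are resolved here by keeping the newest timestamp (last write wins), the rule Cassandra applies to cells; tie-breaking and deletions are left out of this sketch.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int]  # (value, timestamp)


class ColumnFamily:
    """Toy column family: row key -> {column name -> (value, timestamp)}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Cell]] = {}

    def insert(self, row_key: str, column: str, value: str, timestamp: int) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column)
        # Last write wins: only a strictly newer timestamp replaces the stored cell.
        if current is None or timestamp > current[1]:
            row[column] = (value, timestamp)

    def get(self, row_key: str, column: str) -> Optional[str]:
        cell = self.rows.get(row_key, {}).get(column)
        return cell[0] if cell is not None else None

    def columns(self, row_key: str) -> List[str]:
        # A row holds only the columns written to it (rows are sparse).
        return sorted(self.rows.get(row_key, {}))


cf = ColumnFamily()
cf.insert("Key1", "Column1", "value1", timestamp=1)
cf.insert("Key1", "Column2", "value2", timestamp=2)
cf.insert("Key2", "Column1", "value1", timestamp=1)
cf.insert("Key2", "Column3", "value3", timestamp=3)
cf.insert("Key2", "Column3", "stale", timestamp=2)  # older timestamp: ignored
print(cf.columns("Key2"))         # ['Column1', 'Column3']
print(cf.get("Key2", "Column3"))  # value3
```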

Column store. Data model
Super Column Family
Row1 (row key Key1)
  SuperColumnX
    name1 : value1, timestamp1
    …
    nameN : valueN, timestampN
  SuperColumnY
    name1 : value1, timestamp1
    …
    nameM : valueM, timestampM
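The super column family adds one more level of nesting between the row and its columns. A minimal sketch with plain Python dictionaries, where the timestamps are made-up example values and `read_super_column` is a hypothetical helper that returns every column grouped under one super column:

```python
from typing import Dict, Tuple

# Row key -> super column -> column name -> (value, timestamp).
SuperColumnFamily = Dict[str, Dict[str, Dict[str, Tuple[str, int]]]]

scf: SuperColumnFamily = {
    "Key1": {
        "SuperColumnX": {"name1": ("value1", 1), "nameN": ("valueN", 5)},
        "SuperColumnY": {"name1": ("value1", 2), "nameM": ("valueM", 7)},
    }
}


def read_super_column(data: SuperColumnFamily, row_key: str, super_column: str) -> Dict[str, str]:
    """Return the plain values of every column grouped under one super column."""
    columns = data.get(row_key, {}).get(super_column, {})
    # Drop the timestamps and keep only name -> value.
    return {name: value for name, (value, _ts) in columns.items()}


print(read_super_column(scf, "Key1", "SuperColumnY"))  # {'name1': 'value1', 'nameM': 'valueM'}
```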

Column store. Characteristics
PROS
Large amounts of data (in dynamic columns)
Fast queries on columns (usually reads)
CONS
Slow queries on rows (usually writes)

Column store. Implementations

Graph store. Data model
(Figure: a graph of six nodes, Node1 to Node6, connected by six edges, Edge1 to Edge6; Edge1 carries Property1, Property2 and Property3.)

Graph store. Characteristics
PROS
Network modelling
Graph-like queries
Rapid deep traversal
Fully ACID
CONS
No sharding
Poor horizontal scalability
Complex data model
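A minimal sketch of why graph stores handle deep traversal well: relationships are stored directly as adjacency lists, so following them is a walk rather than a join. The edges below are hypothetical (the figure's exact connections are not recoverable), and `reachable_within` is an illustrative breadth-first search bounded by a hop count.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical directed edges between the six nodes of the figure.
edges: Dict[str, List[str]] = {
    "Node1": ["Node2", "Node3"],
    "Node2": ["Node4"],
    "Node3": ["Node5"],
    "Node4": ["Node6"],
    "Node5": [],
    "Node6": [],
}


def reachable_within(start: str, max_depth: int) -> Set[str]:
    """Breadth-first traversal following outgoing edges up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen - {start}


print(sorted(reachable_within("Node1", 2)))  # ['Node2', 'Node3', 'Node4', 'Node5']
```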

Graph store. Implementations

Other NoSQL database models
• Multi-model: combines several other models
• Object-oriented: follows OOP principles
• MultiValue: multi-valued attributes
• Time series: optimized to manage time series data
• …and many more

Comparison of NoSQL models *
Model       Performance  Scalability      Flexibility  Complexity  Functionality
Key-value   high         high             high         none        variable (none)
Document    high         variable (high)  high         low         variable (low)
Column      high         high             moderate     low         minimal
Graph       variable     variable         high         high        graph theory
Relational  variable     variable         low          moderate    relational algebra
* Summary of a presentation by Ben Scofield: https://www.slideshare.net/bscofield/nosql-codemash-2010

Comparison by data size / complexity
(Figure: Key-value, Column, Document and Graph stores positioned against data size and data complexity.)

III – Software

Criteria for evaluation
Popularity rank *
Data model
Consistency
Availability
Concurrency
Scalability
Querying
* According to the DB-Engines ranking https://db-engines.com/en/ranking (April 2017). Relational DBMSs were excluded.

TOP 4 Systems
Rank  System         Data model
1     MongoDB        Document
2     Cassandra      Column + key-value
3     Redis          In-memory key-value
4     Elasticsearch  Document (search engine)

Consistency
MongoDB
• Configurable
• Strong by default
Cassandra
• Configurable
Redis
• Eventual
Elasticsearch
• Configurable
• Consistent, with options
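Cassandra's "configurable" consistency comes from choosing, per request, how many replicas must answer (consistency levels such as ONE, QUORUM or ALL). A common rule of thumb for such replicated stores: with N replicas, reads from R replicas and writes to W replicas, every read overlaps the latest acknowledged write when R + W > N. A small Python check, where `overlaps` is a hypothetical helper and `quorum` follows Cassandra's floor(RF / 2) + 1 rule for a single data centre; it ignores node failures and timestamp conflicts.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(n: int, r: int, w: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n


N = 3
for r, w in [(1, 1), (quorum(N), quorum(N)), (1, N)]:
    print(f"N={N} R={r} W={w} -> read overlaps latest write: {overlaps(N, r, w)}")
```

With N = 3, reading and writing at one replica each can return stale data, while QUORUM on both sides (2 + 2 > 3) or a write to all replicas does not.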

Availability
MongoDB
• Replicated
Cassandra
• Distributed
Redis
• Replicated
Elasticsearch
• Replicated
All four systems: high availability

Concurrency
MongoDB
• Multi-granularity locking (MGL)
Cassandra
• Multiversion concurrency control (MVCC)
Redis
• Optimistic concurrency control (OCC)
Elasticsearch
• Optimistic concurrency control (OCC)
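Optimistic concurrency control, listed above for Redis and Elasticsearch, lets writers proceed without holding locks and rejects a commit if the data changed since it was read (Redis does this with WATCH / MULTI / EXEC, Elasticsearch with version checks on writes). A minimal Python sketch, assuming a hypothetical `VersionedStore` where each key carries a version counter; the internal lock only makes the validate-and-write step atomic.

```python
import threading
from typing import Any, Dict, Tuple


class ConflictError(Exception):
    """Raised when the version seen at read time is no longer current."""


class VersionedStore:
    """Toy store where each key carries a version number for OCC."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, int]] = {}
        self._lock = threading.Lock()  # guards only the short validate-and-write step

    def read(self, key: str) -> Tuple[Any, int]:
        with self._lock:
            return self._data.get(key, (None, 0))

    def write_if_version(self, key: str, value: Any, expected_version: int) -> int:
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != expected_version:
                # Someone committed since our read: reject instead of blocking.
                raise ConflictError(f"{key}: expected v{expected_version}, found v{current}")
            self._data[key] = (value, current + 1)
            return current + 1


def increment(store: VersionedStore, key: str, retries: int = 5) -> int:
    """Read, compute, try to commit; on conflict, re-read and retry."""
    for _ in range(retries):
        value, version = store.read(key)
        new_value = (value or 0) + 1
        try:
            store.write_if_version(key, new_value, version)
            return new_value
        except ConflictError:
            continue
    raise ConflictError(f"{key}: gave up after {retries} retries")


store = VersionedStore()
_, v = store.read("counter")             # clients A and B both read version 0
store.write_if_version("counter", 10, v)  # A commits: version becomes 1
try:
    store.write_if_version("counter", 20, v)  # B still holds version 0
except ConflictError as err:
    print("rejected:", err)
print(increment(store, "counter"))  # 11
```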

Scalability
MongoDB
• High (automatic data sharding)
Cassandra
• High (automatic addition / removal of nodes in the cluster)
Redis
• Poor
Elasticsearch
• High (dynamic sharding on a live cluster)
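Automatic sharding and adding nodes to a live cluster both rely on a rule that maps each key to a node. A minimal consistent-hashing sketch in Python, in the spirit of Cassandra's token ring: MD5 and eight virtual nodes per server are arbitrary choices here (Cassandra's default partitioner uses Murmur3), and `HashRing` is hypothetical. When a node joins, only the keys it takes over move.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # MD5 only spreads keys evenly here; it is not a security choice.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hashing: a key belongs to the first token clockwise from its hash."""

    def __init__(self, vnodes: int = 8) -> None:
        self.vnodes = vnodes
        self._tokens: List[int] = []       # sorted token positions on the ring
        self._owner: Dict[int, str] = {}   # token -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            token = _hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("empty ring")
        # Wrap around to the first token when the hash is past the last one.
        i = bisect.bisect_right(self._tokens, _hash(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Every key that moved now lives on the new node; the rest did not move.
print(len(moved), all(ring.node_for(k) == "node-d" for k in moved))
```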

Querying
MongoDB
• Internal API (MapReduce)
• Complex query support
Cassandra
• Internal API, SQL-like CQL
• Complex query support
Redis
• By key or value range
• Rapid
• No complex queries
Elasticsearch
• Own query language (Query DSL)
• Full-text search, filters
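MongoDB's MapReduce interface, listed above, runs a map function that emits key/value pairs for each document and a reduce function that combines all values emitted for the same key. A minimal in-process Python sketch of that shape, with a hypothetical `map_reduce` helper and `orders` data, totalling order amounts per customer:

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def map_reduce(
    docs: Iterable[Doc],
    mapper: Callable[[Doc], Iterable[Tuple[str, Any]]],
    reducer: Callable[[str, List[Any]], Any],
) -> Dict[str, Any]:
    """Map every document, group emitted values by key, then reduce each group."""
    groups: Dict[str, List[Any]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):  # the "emit" step
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 12},
    {"customer": "alice", "amount": 8},
]

totals = map_reduce(
    orders,
    mapper=lambda d: [(d["customer"], d["amount"])],
    reducer=lambda _key, values: sum(values),
)
print(totals)  # {'alice': 38, 'bob': 12}
```

MongoDB may call reduce more than once on partial results for the same key, so a reduce function should be associative and commutative; `sum` is, which makes it a safe choice here.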

IV – Geospatial

GIS (geographic information system)

Idea behind GIS "magic"
Geospatial data → Geohash API → GIS support
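A geohash turns a latitude/longitude pair into a short string by repeatedly halving the longitude and latitude ranges, interleaving the resulting bits (longitude first) and writing each group of 5 bits as a base-32 character. Points that share a prefix fall in the same cell, so a key-value or document store can index location as an ordinary string. A from-scratch Python sketch of the standard encoding; the slide does not name a specific geohash API (Redis, for instance, exposes a GEOHASH command), and the function names here are hypothetical.

```python
from typing import List, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 11) -> str:
    """Interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars: List[str] = []
    bits, bit_count, even = 0, 0, True  # even bit positions refine longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_bounds(code: str) -> Tuple[float, float, float, float]:
    """Decode a geohash to its cell as (lat_min, lat_max, lon_min, lon_max)."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    even = True
    for ch in code:
        n = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if even else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (n >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            even = not even
    return lat_range[0], lat_range[1], lon_range[0], lon_range[1]


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

Nearby points usually share a long prefix, but two close points on either side of a cell boundary may not, so proximity searches typically also check the neighbouring cells.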

Available solutions

Solutions
GeoJSON objects stored in documents (MongoDB)
GeoMesa + Apache Spark (Hadoop)
CQL extension (Cassandra)
GeoCouch extension (CouchDB)
Fast I/O in-memory geospatial operations (Redis)
Library Neo4j Spatial (Neo4j)
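MongoDB stores locations as GeoJSON objects inside ordinary documents and answers proximity queries through a 2dsphere index with operators such as $near. The sketch below only mimics the idea in plain Python: a brute-force `near` function (hypothetical, not MongoDB's API) ranks places by haversine great-circle distance. GeoJSON orders coordinates as [longitude, latitude]; the city coordinates are approximate.

```python
import math
from typing import Any, Dict, List

# GeoJSON Point geometries: coordinates are [longitude, latitude].
places: List[Dict[str, Any]] = [
    {"name": "Lille", "location": {"type": "Point", "coordinates": [3.0573, 50.6292]}},
    {"name": "Paris", "location": {"type": "Point", "coordinates": [2.3522, 48.8566]}},
    {"name": "London", "location": {"type": "Point", "coordinates": [-0.1278, 51.5074]}},
]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: List[float], b: List[float]) -> float:
    """Great-circle distance in km between two [lon, lat] pairs."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def near(point: List[float], max_km: float) -> List[str]:
    """Brute-force proximity query: names within max_km, closest first."""
    hits = [(haversine_km(point, p["location"]["coordinates"]), p["name"]) for p in places]
    return [name for dist, name in sorted(hits) if dist <= max_km]


print(near([2.3522, 48.8566], 250))  # ['Paris', 'Lille']
```

A real 2dsphere index avoids computing the distance to every document; the linear scan here is only for illustration.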

V - Conclusion
