NoSQL databases
STATE OF THE ART
4/14/2017 BY MARKIYAN RIZUN, UNIVERSITÉ LILLE 1, SOFTEAM, EMAIL: MRIZUN@SOFTEAM.FR 1
I - Overview
What is NoSQL?
(typically) NoSQL is …
Non-relational
Distributed
Horizontally scalable
Big data
Performant
Open source
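Horizontal scalability usually rests on spreading keys across the nodes of a cluster. The following is a minimal sketch, assuming plain hash partitioning; the node names and the `node_for` helper are hypothetical and not taken from any particular system.

```python
import hashlib

def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
placement = {key: node_for(key, nodes) for key in ["user:1", "user:2", "user:3"]}
```

With plain modulo hashing, adding a node moves most keys to a different node; real systems tend to use schemes such as consistent hashing to limit that reshuffling.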
Relational VS NoSQL
Property | Relational | NoSQL
Performance for high data volume | Low | High
Horizontal scalability | Complex, error-prone | Simple
Flexibility | Low | High
Consistency | Strong (ACID) | Eventual (BASE)
Indexing | Multiple columns | Single column
Data duplication | Not possible | Allowed
Standard query language | Yes | No
Data model | Single | Multiple
II - Models
Main NoSQL database models
Key-value
Document
Column
Graph
Key-value store. Data model
KEYS | VALUES
Key 1 | Value 1
Key 2 | Value 2
Key 3 | Value 3
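The key-value model can be sketched as a thin wrapper around a dictionary. This is only an illustration; the `KeyValueStore` class and its method names are hypothetical.

```python
from typing import Any, Optional

class KeyValueStore:
    """Toy in-memory key-value store: one value per key, lookups by key only."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value  # overwrites any previous value for the key

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()
store.put("Key 1", "Value 1")
store.put("Key 2", {"any": "data"})  # values may hold arbitrary data
```

There is no way to ask "which keys hold value X" without scanning everything, which is the poor query capability listed among the cons.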
Key-value store. Characteristics
PROS
Frequent reads / writes
Simple data model
Rapid query execution
CONS
Small reads / writes
Simple data model
Poor query capabilities
Key-value store. Implementations
Document store. Data model
Document 1 – ID 1
{
  "id": "1",
  "name": "foo",
  "attributeX": "bar"
}
JSON
Document 2 – ID 2
{
  "id": "2",
  "name": "bar"
}
JSON
Document 3 – ID 3
<element>
<name>A</name>
<content>
<type>B</type>
<color>red</color>
</content>
</element>
XML
Document 4 – ID 4
<element>
<name>B</name>
<value>5</value>
</element>
XML
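Querying documents by their internal structure might look like the sketch below. It assumes documents are held as nested Python dictionaries (Document 3 is transcribed from the XML example for illustration); `lookup` and `find` are hypothetical helpers, not the API of any real document store.

```python
_MISSING = object()  # sentinel for "path not present in this document"

documents = {
    "1": {"id": "1", "name": "foo", "attributeX": "bar"},
    "2": {"id": "2", "name": "bar"},
    "3": {"id": "3", "name": "A", "content": {"type": "B", "color": "red"}},
}

def lookup(doc, path):
    """Follow a dotted path such as 'content.color' through nested dicts."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node

def find(docs, path, expected):
    """Return IDs of documents whose value at `path` equals `expected`."""
    return [doc_id for doc_id, doc in docs.items() if lookup(doc, path) == expected]
```

Documents that lack the queried field are simply skipped, which reflects the schema flexibility of the model.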
Document store. Characteristics
PROS
Flexible
Object in single document
Rich querying capabilities
CONS
No joins
Document store. Implementations
Column store. Data model
Column Family
Row1 (Row Key1)
  Column1: name1 : value1, timestamp1
  Column2: name2 : value2, timestamp2
  ColumnN: nameN : valueN, timestampN
Row2 (Row Key2)
  Column1: name1 : value1, timestamp1
  Column3: name3 : value3, timestamp3
  ColumnM: nameM : valueM, timestampM
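The column family structure can be sketched as a map of row keys to maps of columns. The classes below are hypothetical, and the last-write-wins rule based on the timestamp is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    value: object
    timestamp: int  # used to decide which write is the most recent

class ColumnFamily:
    def __init__(self):
        # row key -> {column name -> Column}; rows may hold different columns
        self.rows: dict[str, dict[str, Column]] = {}

    def insert(self, row_key, name, value, timestamp):
        row = self.rows.setdefault(row_key, {})
        current = row.get(name)
        # Assumed last-write-wins rule: keep the column with the newest timestamp.
        if current is None or timestamp >= current.timestamp:
            row[name] = Column(name, value, timestamp)

    def get(self, row_key, name):
        column = self.rows.get(row_key, {}).get(name)
        return None if column is None else column.value

cf = ColumnFamily()
cf.insert("Row Key1", "name1", "value1", 1)
cf.insert("Row Key1", "name2", "value2", 2)
cf.insert("Row Key2", "name3", "value3", 3)
cf.insert("Row Key1", "name1", "stale", 0)  # older timestamp, ignored
```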
Column store. Data model
Super Column Family
Row1 (Row Key1)
  SuperColumnX
    name1 : value1, timestamp1
    …
    nameN : valueN, timestampN
  SuperColumnY
    name1 : value1, timestamp1
    …
    nameM : valueM, timestampM
Column store. Characteristics
PROS
Large volumes of data (in dynamic columns)
Fast queries on columns (usually reads)
CONS
Slow queries on rows (usually writes)
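The trade-off between column queries and row queries can be illustrated with a toy column-oriented layout. The sales data and function names below are made up for the example.

```python
sales_rows = [
    {"id": 1, "item": "pen", "price": 2.0},
    {"id": 2, "item": "book", "price": 12.5},
    {"id": 3, "item": "lamp", "price": 30.0},
]

# Column-oriented layout: one list per field, aligned by position.
sales_columns = {
    field: [row[field] for row in sales_rows]
    for field in ("id", "item", "price")
}

def average_price(columns):
    """Column query: touches only the 'price' column."""
    prices = columns["price"]
    return sum(prices) / len(prices)

def fetch_sale(columns, position):
    """Row query: rebuilding one record has to visit every column."""
    return {field: values[position] for field, values in columns.items()}

def append_sale(columns, sale):
    """Row write: a single insert writes to every column."""
    for field, values in columns.items():
        values.append(sale[field])
```

An aggregate over one field reads a single contiguous list, while fetching or inserting one sale touches as many lists as there are fields.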
Column store. Implementations
Graph store. Data model
[Figure: graph of six nodes (Node1 to Node6) connected by six edges (Edge1 to Edge6), with properties Property1 to Property3]
Graph store. Characteristics
PROS
Network modelling
Graph-like queries
Rapid deep traversal
Fully ACID
CONS
No sharding
Poor horizontal scalability
Complex data model
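A typical graph-like query is the shortest path between two nodes. Below is a minimal breadth-first search sketch over adjacency lists; the edge layout is hypothetical and does not reproduce the figure on the data model slide.

```python
from collections import deque

# Adjacency lists: each node points directly at its neighbours.
edges = {
    "Node1": ["Node2", "Node3"],
    "Node2": ["Node4"],
    "Node3": ["Node4", "Node5"],
    "Node4": ["Node6"],
    "Node5": ["Node6"],
    "Node6": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```

Each step follows direct references from a node to its neighbours, so no join-like lookup is needed during traversal.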
Graph store. Implementations
Other NoSQL database models
• Multimodel: based on a few other models
• Object-oriented: follows OOP principles
• MultiValue: multi-valued attributes
• Time series: optimized to manage time series data
• … and many more
Comparison of NoSQL models *
Model | Performance | Scalability | Flexibility | Complexity | Functionality
Key-value | high | high | high | none | variable (none)
Document | high | variable (high) | high | low | variable (low)
Column | high | high | moderate | low | minimal
Graph | variable | variable | high | high | graph theory
Relational | variable | variable | low | moderate | relational algebra
* Summary of a presentation by Ben Scofield: https://www.slideshare.net/bscofield/nosql-codemash-2010
Comparison by data size / complexity
[Figure: Key-value, Column, Document and Graph models plotted by data size and data complexity]
III – Software
Criteria for evaluation
Popularity rank *
Data model
Consistency
Availability
Concurrency
Scalability
Querying
* According to DB-Engines ranking https://db-engines.com/en/ranking (April 2017). Relational DBMSs were discarded.
TOP 4 Systems
Rank | System | Data model
1 | MongoDB | Document
2 | Cassandra | Column + key-value
3 | Redis | In-memory key-value
4 | Elasticsearch | Document (search engine)
Consistency
MongoDB
• Configurable
• Strong by default
Cassandra
• Configurable
Redis
• Eventual
Elasticsearch
• Configurable
• Consistent, with options
Availability
MongoDB
• Replicated
Cassandra
• Distributed
Redis
• Replicated
Elasticsearch
• Replicated
High availability (all four systems)
Concurrency
MongoDB
• Multi-granularity locking (MGL)
Cassandra
• Multiversion concurrency control (MVCC)
Redis
• Optimistic concurrency control (OCC)
Elasticsearch
• Optimistic concurrency control (OCC)
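Optimistic concurrency control can be sketched with a version number that is checked at commit time. The `VersionedStore` class below is a hypothetical single-process illustration, not how Redis or Elasticsearch implement it.

```python
class ConflictError(Exception):
    """Raised when the data changed since the transaction read it."""

class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); no lock is taken."""
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        """Write only if nobody else committed since `expected_version` was read."""
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise ConflictError(f"{key!r} changed: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, new_value)
        return current_version + 1

store = VersionedStore()
version, _ = store.read("doc")             # two writers both read version 0
store.commit("doc", version, "first writer")
# A second writer still holding version 0 would now get ConflictError
# and would have to re-read and retry.
```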
Scalability
MongoDB
• High (automatic data sharding)
Cassandra
• High (automatic addition / removal of nodes in cluster)
Redis
• Poor
Elasticsearch
• High (dynamic sharding on live cluster)
Querying
MongoDB
• Internal API (MapReduce)
• Complex query support
Cassandra
• Internal API, CQL (SQL-like)
• Complex query support
Redis
• By key or value range
• Rapid
• No complex queries
Elasticsearch
• Own query language (Query DSL)
• Full text search, filters
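The MapReduce programming model mentioned for MongoDB can be illustrated with the classic word count. This sketch runs in a single process and uses hypothetical function names; a real framework would distribute the map and reduce phases across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine the values of one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle run in parallel.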
IV – Geospatial
GIS (geographic information system)
Idea behind GIS « magic »
Geospatial data + Geohash + API → GIS support
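A geohash turns a latitude/longitude pair into a single sortable value by repeatedly halving the coordinate ranges and interleaving the resulting bits. The sketch below produces the common base-32 string form rather than an integer index; `geohash_encode` is a hypothetical helper name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(latitude, longitude, precision=9):
    """Encode a point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_longitude = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, longitude) if use_longitude else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Nearby points tend to share a prefix, so a prefix or range scan over an ordinary index can serve as a coarse proximity filter.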
Available solutions
Solutions
New document format GeoJSON (MongoDB)
GeoMesa + Apache Spark (Hadoop)
CQL extension (Cassandra)
GeoCouch extension (CouchDB)
Fast I/O in-memory geospatial operations (Redis)
Library Neo4j Spatial (Neo4j)
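As an illustration of the GeoJSON format and of a "find nearest" query, the sketch below stores points as GeoJSON `Point` geometries and compares them with the haversine great-circle distance. It is plain Python and does not use any database API; the city coordinates are approximate and the helper names are hypothetical.

```python
import math

def point(longitude, latitude):
    """GeoJSON Point geometry; coordinates are ordered [longitude, latitude]."""
    return {"type": "Point", "coordinates": [longitude, latitude]}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two GeoJSON points."""
    lon1, lat1 = map(math.radians, a["coordinates"])
    lon2, lat2 = map(math.radians, b["coordinates"])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def nearest(origin, candidates):
    """Return the name of the candidate closest to `origin`."""
    return min(candidates, key=lambda name: haversine_km(origin, candidates[name]))

lille = point(3.0573, 50.6292)  # approximate coordinates
places = {
    "Paris": point(2.3522, 48.8566),
    "Brussels": point(4.3517, 50.8503),
}
```

A database with geospatial support would answer the same question through an index instead of the linear scan used here.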
V - Conclusion

Editor's Notes

  • #3 A quick look at NoSQL
  • #4 NoSQL = Not Only SQL … OK, but what kind of properties does it have?
  • #5 Normally, with quite a few exceptions, NoSQL systems should satisfy the following list. All of them are non-relational, and one could argue that this is the main difference. They are distributed, meaning that they work on clusters of machines. Therefore, they should be horizontally scalable: one can easily add a new node to the cluster without the time-consuming process of restructuring the database. NoSQL systems are mostly designed for storing massive volumes of data while keeping a high level of performance. Usually, they are open source.
  • #6 NoSQL is designed to work with big data and still show high levels of performance. By contrast, relational DBs work well only until they have to deal with large amounts of data. Unlike NoSQL, relational DBs require hard work in order to scale them horizontally. Flexibility here means ease of INSERT / UPDATE operations: in the relational case data must have a predefined form, while in NoSQL it can have an arbitrary form. Relational DBs are always ACID (atomicity, consistency, isolation, durability), whereas NoSQL proposes the concept of eventual consistency, BASE (Basically Available, Soft state, Eventual consistency). There are many differences, but two very important ones are the standard query language and the single data model.
  • #7 As NoSQL may be presented with a number of varying models, we have to review them.
  • #8 These are the 4 main models of NoSQL databases that we are going to study in detail. First, KV is just a dictionary (a collection of kv pairs). Next we have Document, which is a collection of different documents such as JSON, XML and others. A Column DB consists of column families, each containing column collections that vary in name and size; we will see it later. And the graph model: this one focuses on connections between entities.
  • #9 The data model is straightforward: a collection of kv pairs, where each key has only one corresponding value. Keys are used as indexes and values may contain any data.
  • #10 Rapid query execution because of the simple model and keys as indexes.
  • #12 The main elements of these databases are documents, which are hierarchical tree data structures. Each document is represented by an indexed key (a unique identifier that may be a string, URI or path). Information about a given object is stored in a single document, unlike in relational databases, where it is scattered over different tables. Documents may be of different types (JSON, XML, etc.).
  • #13 They may offer an API that enables users to query documents based on their internal structure and content. No joins: instead, one would have to collect connected data manually.
  • #15 The central elements of the database are columns. A column contains a name (unique identifier), a value (the data itself) and a timestamp (it allows determining whether the content is valid, i.e. up to date). Then we have a row with a row key and an associated set of columns. A collection of rows forms a column family. Each row of the column family may contain a different number of columns and, additionally, there may be various column names.
  • #16 It is also possible to have supercolumns: columns whose value is a map of columns.
  • #17 Fast queries on columns: for example, if I'm looking at a database of Sales and I want to see how Price has changed over time, I need to look at the Price field for a lot of records, so it's nice to have those stored together in one column. Slow queries on rows: on the contrary, the query that the column store doesn't like is something like "show me all the information about a particular Sale" or "add a Sale to the database". Here you want lots of fields, but for a small number of rows (one).
  • #19 This type of databases uses graph structures to represent, store and manage data. Graph database has concepts of edges, nodes and properties. The relationships (edges) link entities (nodes) directly using pointers (unlike in relational databases). Properties can be applied to both nodes and edges, and they help to query data.
  • #20 Well-suited for network modelling, such as social networks. Graph-like queries, such as searching for the shortest path between nodes. The use of pointers allows connected data to be retrieved in one operation (instead of searching through the data and using join operations, as in the relational approach). This enables rapid and deep traversal of the graph structure. Unlike other NoSQL models, graph databases fully support ACID properties. They do not support data sharding, meaning that all data must be stored on a single server. Hence, poor horizontal scalability.
  • #25 As we have seen before, there are lots of different systems on the market. Now we will take a look at only a few of them and try to evaluate them. For that we need some criteria…
  • #27 We selected top 4 systems which are … They use corresponding models …
  • #28 All systems have configurable consistency, except Redis.
  • #29 Replicated means that data is divided into several replica sets (shards); in this case a master-slave model is usually used. Distributed means that each node in the cluster is responsible for a given data set. All of them are highly available.
  • #30 Concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible. Each system uses a different method and ensures concurrency. MGL locks objects that contain other objects. It exploits the hierarchical nature of MongoDB documents. MVCC takes a different approach: each user connected to the database sees a snapshot of the database at a particular instant in time. Any changes made by a writer will not be seen by other users of the database until the transaction has been committed. OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted.
  • #32 In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria (for example, text specified by a user). MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
  • #33 This is interesting for us in the context of the DataBio project that Softeam participates in.
  • #34 GIS allows you to record a map with a geospatial referencing system such as longitude and latitude and then to add additional layers of other information. Layers can be linked. Analysis of the information can then be undertaken using the statistical and analytical tools that are provided as part of the GIS. It is possible to provide visual representations of data. These representations can often reveal patterns and trends that might otherwise have gone unnoticed without the use of GIS techniques. Use cases: mapping of data (visual representation of data on a map); proximity analysis (distance between objects, points, polygons, etc.); finding clusters; finding the nearest object; finding what is in an area.
  • #35 Taxi manager example implemented with GeoMesa that is used in Hadoop-based NoSQL systems
  • #36 A NoSQL database must support geospatial data; store the geohash as an integer index (e.g. quadtree, R-tree or Hilbert curve index) converted from 2D, 3D or 4D coordinates and time; and provide an API / query language to work with the data. As a result, the NoSQL DB can be used in GIS.