NoSQL
Leo’s notes
These slides are Leopold Gault's notes, taken while reading:
• https://www.thoughtworks.com/insights/blog/nosql-databases-overview
• https://www.slideshare.net/arangodb/query-mechanisms-for-nosql-databases
• https://www.slideshare.net/arangodb/introduction-to-column-oriented-databases
• https://neo4j.com/developer/guide-data-modeling/
I am not a NoSQL expert; these notes are just my understanding of the aforementioned sources.
Aggregates
Relational data models (OLTP and OLAP) vs NoSQL data models
[Figure: relational data models (transactional/OLTP and analytical/OLAP) side by side with NoSQL data models]
Note that they represent a document as a hierarchical tree of data (it makes sense).
I think they meant to represent a star schema.
Who do I think is meant to be normalized?
[Figure: the same data models, each labelled with one of: normalized (twice), deliberately de-normalized, “Normalized?”, not normalized (twice)]
Who do I think natively supports ACID transactions?
• Relational data models: always
• Graph DBMS: most of the time (e.g. Neo4j)
• Other NoSQL data models: maybe, sometimes
Why aggregates
Let’s say that my application always uses a set of data like this one [figure not reproduced].
In an RDBMS, such a set of data would have to be fetched from many tables (requiring plenty of JOINs).
We can see that there is a big mismatch between the way this application uses the data (as one aggregated unit) and the way the data is scattered across the tables of the RDBMS.
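To make the mismatch concrete, here is a minimal sketch. The tables, field names and the `order:10` key are all made up for illustration; they are not taken from the sources.

```javascript
// Hypothetical normalized "tables" (arrays of rows), as an RDBMS might hold them.
const customers = [{ id: 1, name: "Ann" }];
const orders = [{ id: 10, customerId: 1 }];
const orderLines = [
  { orderId: 10, product: "book", qty: 2 },
  { orderId: 10, product: "pen", qty: 5 },
];

// The application wants one nested unit, so it has to combine three tables.
function loadOrderWithJoins(orderId) {
  const order = orders.find((o) => o.id === orderId);
  const customer = customers.find((c) => c.id === order.customerId);
  const lines = orderLines
    .filter((l) => l.orderId === orderId)
    .map(({ product, qty }) => ({ product, qty }));
  return { id: order.id, customer: { name: customer.name }, lines };
}

// In an aggregate-oriented store, the same unit is stored as-is under one key.
const aggregateStore = new Map([
  [
    "order:10",
    {
      id: 10,
      customer: { name: "Ann" },
      lines: [
        { product: "book", qty: 2 },
        { product: "pen", qty: 5 },
      ],
    },
  ],
]);

function loadOrderAggregate(orderId) {
  return aggregateStore.get(`order:${orderId}`);
}
```

Both functions return the same nested object; the second one gets it with a single lookup.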
Aggregate-oriented DBMS
NoSQL DBMS (bar Graph DBMS) are aggregate-oriented.
An aggregate is a set of data, that will form the boundaries for ACID
operations.
Hence, the “acidity scope” is not at the transaction level, but at the aggregate level. Note
however that some aggregate-oriented DBMS also support ACID transactions.
An aggregate’s data have been grouped together only because it makes sense to do so,
from the application’s point of view.
This grouping is masterminded by a human:
• the developer: when coding an app, the developer will try to identify which sets of data will be accessed together by the app, and will hence decide to write/read each such set as an aggregate;
• or the creator of materialized views, i.e. new aggregates emitted from disparate data.
Why aggregate-oriented DBMS
Working with aggregates is more performant. When writing, an aggregate is stored together, instead of being scattered among many tables. The same applies when reading: it is quicker to retrieve a set of data that has been stored together than one that has been scattered throughout many tables.
In a cluster of an aggregate-oriented DBMS, an aggregate can live on a single node (or be replicated on the same few nodes). Thus our cluster can scale out without degrading the response time, as sets of data frequently accessed together (i.e. aggregates) are not cut into pieces that are scattered across many nodes. The same logic applies for sharding (an aggregate belongs to a single shard, instead of many) and replication.
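A rough sketch of the sharding idea: the whole aggregate is routed by its key, so it lands on exactly one shard. The toy hash function and the `ShardedStore` class are assumptions made for illustration; real DBMS use their own partitioning schemes.

```javascript
// Toy string hash (an assumption for illustration; real systems use stronger hashes).
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return h;
}

// The aggregate's key alone decides which shard owns the whole aggregate.
function shardFor(aggregateKey, shardCount) {
  return hashKey(aggregateKey) % shardCount;
}

// Hypothetical in-memory cluster: one Map per shard.
class ShardedStore {
  constructor(shardCount) {
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }
  put(key, aggregate) {
    this.shards[shardFor(key, this.shards.length)].set(key, aggregate);
  }
  get(key) {
    return this.shards[shardFor(key, this.shards.length)].get(key);
  }
}

const cluster = new ShardedStore(4);
cluster.put("order:10", { id: 10, lines: [{ product: "book", qty: 2 }] });
```

Reading `order:10` back only ever touches the one shard chosen by `shardFor`, however many shards the cluster has.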
About aggregates
Here are 2 formal definitions :
• An aggregate is a collection of data that we interact with as a unit. These units
of data (aggregates) form the boundaries for ACID operations (at the
aggregate level) with the database. [source1]
• Aggregate defines a collection of related objects that we treat as a unit. This
unit is taken as a whole for the context of {data manipulation and
management of consistency}. We update aggregates via atomic operations
and communicate our data storage in terms of aggregates. NoSQL databases,
apart from graph databases , have aggregate data models.
However, relational databases have no concept of aggregates within their data
model. These are considered aggregate-ignorant.
An aggregate-ignorant model allows you to look at data in different ways, so
it’s good when you don’t have a primary structure for manipulating data.
Aggregate ignorant databases, like relational and graph databases, in general
support ACID transactions.
[source2]
Who do I think is aggregate-oriented?
Aggregate-oriented data models:
• Columnar: yes (1 aggregate = 1 column / segment of a column). Maybe also a column family? But I don’t think so.
• Document: yes (1 aggregate can be a whole document, identified by its key, or a materialized view generated using map-reduce).
• Key-value: yes (1 aggregate = 1 value, i.e. a BLOB that bundles together a bunch of data; this bunch is meaningful only for the app).
Aggregate-ignorant data models:
• Relational (OLTP and OLAP): no.
• Graph: no.
I think the reason why Graph DBs are not “aggregate oriented” is because, despite storing
data as interconnected nodes, a node is probably not considered as an aggregate; probably
because the boundaries of an ACID operation extend beyond one node.
Key-Value DBMS
Performance, but ignorance of what the values mean.
[Figure: a key-value DBMS as a list of key → value pairs]
Each value is a BLOB. The K-V DBMS doesn’t care what’s inside this BLOB value; it’s up to the app to figure that out.
Values are just BLOBs; they have no meaning for the DBMS.
How to query: with a very simple API:
• get the value for a key,
• put a value for a key,
• delete a key-value pair.
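The three-operation API can be sketched with an in-memory map. The `KeyValueStore` class and the `user:1` key are hypothetical; the point is only that the store never looks inside the value.

```javascript
// Minimal in-memory key-value store: get, put, delete, and nothing else.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  delete(key) {
    return this.data.delete(key); // true if the pair existed
  }
}

const kv = new KeyValueStore();

// The app serializes its data into an opaque BLOB before storing it...
const blob = Buffer.from(JSON.stringify({ name: "Ann", cart: ["book"] }), "utf8");
kv.put("user:1", blob);

// ...and it is the app, not the store, that knows how to decode it.
const decoded = JSON.parse(kv.get("user:1").toString("utf8"));
```

There is no way to ask this store for "all users whose cart contains a book": the only handle on a value is its key.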
Documents DBMS
Store hierarchical trees of data
[Figure: a document DBMS as a list of key → document pairs; the key is the document ID and the value is the document]
“Key-value stores where the value is examinable”; indeed, this value is a document.
Depending on the DBMS, the document may be in JSON, XML, BSON, etc.
Example with a JSON document [figure not reproduced]
How to query: with the document key, or (for some DBMS, like MongoDB) with attributes within documents.
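A small sketch of the two access paths, using an in-memory map rather than a real DBMS. The documents and the `find` helper are made up for illustration; with MongoDB the comparable call would be a `find({ topics: "music" })` on a collection.

```javascript
// Hypothetical document store: key = document ID, value = examinable document.
const documents = new Map([
  ["doc1", { name: "Ann", topics: ["music", "skating"] }],
  ["doc2", { name: "Bob", topics: ["sleeping"] }],
  ["doc3", { name: "Cid", topics: ["music"] }],
]);

// Access path 1: by document key, exactly like a key-value store.
function findById(id) {
  return documents.get(id);
}

// Access path 2: by attributes, possible because the store can look inside documents.
// Simplified matching: equality on top-level fields; an array field matches
// when it contains the wanted value.
function find(query) {
  const ids = [];
  for (const [id, doc] of documents) {
    const matches = Object.entries(query).every(([field, wanted]) => {
      const actual = doc[field];
      return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
    });
    if (matches) ids.push(id);
  }
  return ids;
}
```

This sketch scans every document; a real document DBMS would normally rely on indexes for attribute queries.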
Actually, with MongoDB, a document wouldn’t be stored as JSON, but as BSON. For instance, the JSON document {"BSON": ["awesome", 5.05, 1986]} would look like this:
\x31\x00\x00\x00
\x04BSON\x00
\x26\x00\x00\x00
\x02\x30\x00\x08\x00\x00\x00awesome\x00
\x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40
\x10\x32\x00\xc2\x07\x00\x00
\x00
\x00
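The byte dump above can be reproduced with a tiny hand-written encoder. This is only a sketch covering four BSON element types (string, double, int32, array/embedded document); the function names are made up, and treating every integer-valued number as an int32 is a simplification, not how an actual MongoDB driver decides.

```javascript
// A C-style string: UTF-8 bytes followed by a 0x00 terminator.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

// A little-endian 32-bit integer.
function int32(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// One element = type byte + element name (cstring) + payload.
function encodeElement(name, value) {
  if (typeof value === "string") {
    const body = cstring(value); // length prefix counts the trailing 0x00
    return Buffer.concat([Buffer.from([0x02]), cstring(name), int32(body.length), body]);
  }
  if (typeof value === "number" && Number.isInteger(value)) {
    return Buffer.concat([Buffer.from([0x10]), cstring(name), int32(value)]);
  }
  if (typeof value === "number") {
    const b = Buffer.alloc(8);
    b.writeDoubleLE(value, 0);
    return Buffer.concat([Buffer.from([0x01]), cstring(name), b]);
  }
  if (Array.isArray(value)) {
    // An array is a document whose element names are "0", "1", "2", ...
    return Buffer.concat([Buffer.from([0x04]), cstring(name), encodeDocument(value)]);
  }
  if (typeof value === "object" && value !== null) {
    return Buffer.concat([Buffer.from([0x03]), cstring(name), encodeDocument(value)]);
  }
  throw new Error("type not supported by this sketch");
}

// A document = total byte length (int32) + elements + 0x00 terminator.
function encodeDocument(obj) {
  const elements = Buffer.concat(
    Object.entries(obj).map(([k, v]) => encodeElement(k, v))
  );
  return Buffer.concat([int32(4 + elements.length + 1), elements, Buffer.from([0])]);
}

const bsonBytes = encodeDocument({ BSON: ["awesome", 5.05, 1986] });
console.log(bsonBytes.toString("hex"));
```

Each line of the dump maps to one step: the document length, the `BSON` array element, the array's own length, its three elements named `0`, `1` and `2`, and the two terminators.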
How to query: for some other DBMS (e.g. CouchDB), querying docs by anything other than their ID requires creating a materialized view, populated with JavaScript map-reduce code (for instance).
This function is run against all the documents in the store, and emits the docID of each doc where there is a match (where one of the topics is “music”).
The load of running a map function can be distributed between nodes.
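The slide's code is not reproduced here, so the following is a sketch of what such a map function could look like, together with a tiny harness that plays the role of the DBMS. In CouchDB, `emit` is provided to the map function by the server; here it is passed in as a parameter to keep the sketch self-contained. The documents and the `buildView` helper are made up.

```javascript
// Map function in the CouchDB style: called once per document.
function mapMusicDocs(doc, emit) {
  if (Array.isArray(doc.topics) && doc.topics.includes("music")) {
    emit(doc._id, null); // key = docID, no value needed
  }
}

// Hypothetical harness standing in for the DBMS: runs the map function
// over every document and collects the emitted rows, sorted by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

const sampleDocs = [
  { _id: "a", topics: ["music", "skating"] },
  { _id: "b", topics: ["sleeping"] },
  { _id: "c", topics: ["music"] },
];

const musicView = buildView(sampleDocs, mapMusicDocs);
```

Because the map function looks at one document at a time, each node can run it on its own share of the documents.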
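I think that this map function could be followed by a reduce function that simply returns what it has been fed as parameters (note that, in CouchDB, a view's reduce function is optional: a view with only a map function already returns the emitted rows): e.g.
var nonReduce = function (keys, values, rereduce) {
  // rereduce is true when the function is fed the results of earlier
  // reduce calls, and false when it is fed the emitted values.
  // Either way, hand the values back untouched.
  return values;
};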
Example with map and reduce (CouchDB)
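Annotations on this example (code figure not reproduced):
• “That’s a key” and “That’s a value” point at the two things being emitted.
• values: I think it’s a map from each key to an array of '1's, e.g.
  values = { 'skating': [1, 1], 'music': [1], 'sleeping': [1, 1, 1, 1] };
  so the reduce would return the length of each nested array?
• rereduce: a Boolean telling whether the function is being fed the results of earlier reduce calls (a re-reduce) rather than emitted values.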
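Here is a sketch of how such a count per topic could work, with a harness standing in for the DBMS. The harness, the documents and the way values are split in two are assumptions made to exercise the re-reduce path; CouchDB decides on its own when to re-reduce. Summing (rather than taking a length) keeps the result correct when the function is fed partial counts; CouchDB also ships built-in `_sum` and `_count` reducers for this kind of job.

```javascript
// Map: emit (topic, 1) for every topic of every document.
function mapTopics(doc, emit) {
  for (const topic of doc.topics || []) {
    emit(topic, 1);
  }
}

// Reduce: summing works both on emitted 1s and on partial sums (re-reduce).
function sumReduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

// Hypothetical harness: groups emitted values by key, reduces them in two
// halves, then re-reduces the two partial results.
function runGroupedView(docs, mapFn, reduceFn) {
  const grouped = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!grouped.has(key)) grouped.set(key, { keys: [], values: [] });
      grouped.get(key).keys.push([key, doc._id]);
      grouped.get(key).values.push(value);
    });
  }
  const result = {};
  for (const [key, { keys, values }] of grouped) {
    const half = Math.ceil(values.length / 2);
    const partials = [
      reduceFn(keys.slice(0, half), values.slice(0, half), false),
      reduceFn(keys.slice(half), values.slice(half), false),
    ];
    result[key] = reduceFn(null, partials, true);
  }
  return result;
}

const topicDocs = [
  { _id: "d1", topics: ["skating", "sleeping"] },
  { _id: "d2", topics: ["skating", "music", "sleeping"] },
  { _id: "d3", topics: ["sleeping"] },
  { _id: "d4", topics: ["sleeping"] },
];

const topicCounts = runGroupedView(topicDocs, mapTopics, sumReduce);
```

With these documents, `topicCounts` holds one count per topic, matching the lengths of the nested arrays in the note above.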
Columnar DBMS
To have the DBMS work on columns, instead of rows
Columnar DBMS vs RDBMS
How you use them
Columnar DBMS
• Data is stored in columns
• You specify column families (kind of entities), that are composed of
rows featuring some of the columns (among all the columns
mentioned in the column-family).
RDBMS
• Data is stored in tables; each row contains data for all columns (although
a value can be NULL)
[Figure: “Column family A” next to “Table A”, each drawn with rows row1 to row4 and columns Col 1 to Col 3]
Why columnar DBMS?
The benefits of column-oriented DBMS reside only in the way they store data on disk: they store data by column instead of by row.
This makes such DBMS more performant when you query only a few columns, but read/write many values in those few columns.
It also makes it possible to store the columns in a compressed state; only the columns being queried will be decompressed (on the fly).
Such DBMS are meant for analytics or batch-processing use-cases (and are not performant at all for OLTP).
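A toy illustration of why the layout matters for a query that touches a single column. The table, the column names and the "values read" counters are made up; a real DBMS reads pages or blocks rather than individual values, so this only shows the proportions.

```javascript
// Row-oriented layout: one record per row, all columns together.
const tableRows = [
  { id: 1, name: "Ann", country: "FR", amount: 10 },
  { id: 2, name: "Bob", country: "FR", amount: 20 },
  { id: 3, name: "Cid", country: "UK", amount: 30 },
];

// Column-oriented layout: one array per column.
function toColumns(rows) {
  const columns = {};
  for (const row of rows) {
    for (const [name, value] of Object.entries(row)) {
      if (!columns[name]) columns[name] = [];
      columns[name].push(value);
    }
  }
  return columns;
}

// Row store: every row is read in full, even though only `amount` is needed.
function sumAmountRowStore(rows) {
  let sum = 0;
  let valuesRead = 0;
  for (const row of rows) {
    valuesRead += Object.keys(row).length;
    sum += row.amount;
  }
  return { sum, valuesRead };
}

// Column store: only the `amount` column is read.
function sumAmountColumnStore(columns) {
  let sum = 0;
  for (const amount of columns.amount) sum += amount;
  return { sum, valuesRead: columns.amount.length };
}

const rowResult = sumAmountRowStore(tableRows);
const columnResult = sumAmountColumnStore(toColumns(tableRows));
```

Both layouts give the same sum, but the row layout reads every value of the table while the column layout reads one value per row.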
Column-oriented storage vs row-oriented storage
Column-oriented storage (columnar DBMS’ strategy)
• Each column is stored in its own datafile (e.g. datafile0, datafile1)
a. Adding/deleting a column is relatively cheap in I/O: it only requires working on a single small datafile.
b. Columns are stored compressed on the disk. Only the columns you query will be decompressed (on the fly).
Row-oriented storage (RDBMS’ strategy)
a. Adding/deleting a column might require rewriting the whole table...
b. Rows are harder to compress usefully, because a whole row has to be decompressed in order to be understandable (just as, in column-oriented storage, the whole column has to be decompressed, or at least the whole subset of a column, i.e. a “segment”?). In the worst case, this means the whole table would have to be decompressed in order to be queried. (I don’t think you could decompress only a subset of the table, because it is hard to think of a meaningful way the table could have been chunked. Maybe you could compress all the values except the ID, and chunk the table based on the ID; but that would only be useful for JOINs based on a foreign key.) A decompressed table is often too big to fit in memory, so you’d have to swap part of it to disk (which is slow) just to be able to query it.
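To see why a column lends itself to compression, here is a sketch using run-length encoding. The sources do not say which compression scheme columnar DBMS use, so run-length encoding is only an assumed example, and the `country` column is made up.

```javascript
// Run-length encoding: consecutive equal values collapse into [value, count].
function rleEncode(column) {
  const runs = [];
  for (const value of column) {
    const last = runs[runs.length - 1];
    if (last && last[0] === value) {
      last[1] += 1;
    } else {
      runs.push([value, 1]);
    }
  }
  return runs;
}

// Decoding expands each run back into its repeated values.
function rleDecode(runs) {
  const column = [];
  for (const [value, count] of runs) {
    for (let i = 0; i < count; i++) column.push(value);
  }
  return column;
}

// A column holds values of one kind, so long runs are common.
const countryColumn = ["FR", "FR", "FR", "UK", "UK", "FR"];
const encodedCountry = rleEncode(countryColumn);
// encodedCountry is [["FR", 3], ["UK", 2], ["FR", 1]]
```

The same values spread across rows would be interleaved with names, ids and amounts, which leaves far fewer repeats next to each other.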
Column-oriented storage vs row-oriented storage: when not to use
Column-oriented storage (columnar DBMS’ strategy)
• If you only want to work on a few rows (as is often the case in OLTP), it won’t be performant at all: you’ll have to read and decompress all the columns (or at least their relevant subsets), and then recompress and rewrite them.
Row-oriented storage (RDBMS’ strategy)
• If you only need to work on a few columns, but the table has many columns, and you want to read/write many things from those few columns, you’ll have to read whole rows just to get the few column values that interest you.
[Figure: a table with columns Col 1, Col 2 and Col 3, where you just want to modify a row]
FYI: a memory page is the smallest unit of data for virtual-memory management: the OS moves pages between the disk and the RAM as whole units. As it is the smallest unit, a page is read from disk as a whole, including unused space.
Graph DBMS
To store and query relationships
How to deal with many relationships
RDBMS
• You would use JOINs to compute relationships at query time. On top of being less intuitive, the JOINs get slower as the tables being joined grow, and more so as more JOINs are chained.
Graph DBMS
• The relationships are natively stored, so no relationship has to be computed at run time.
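The contrast can be sketched in a few lines. The people, the friendship table and the function names are made up, and the "graph" here is just objects holding references to each other, which is a simplification of how a graph DBMS stores relationships.

```javascript
// Relational style: relationships live in a separate table and are
// recomputed at query time by matching ids (one scan per hop here).
const people = [
  { id: 1, name: "Ann" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Cid" },
];
const friendships = [
  { from: 1, to: 2 },
  { from: 2, to: 3 },
];

function friendIdsViaJoin(personId) {
  return friendships.filter((f) => f.from === personId).map((f) => f.to);
}

function friendsOfFriendsViaJoin(personId) {
  return friendIdsViaJoin(personId)
    .flatMap((id) => friendIdsViaJoin(id))
    .map((id) => people.find((p) => p.id === id).name);
}

// Graph style: each node directly references its related nodes,
// so a traversal just follows the stored relationships.
const annNode = { name: "Ann", friends: [] };
const bobNode = { name: "Bob", friends: [] };
const cidNode = { name: "Cid", friends: [] };
annNode.friends.push(bobNode);
bobNode.friends.push(cidNode);

function friendsOfFriends(node) {
  return node.friends.flatMap((f) => f.friends).map((n) => n.name);
}
```

Both approaches find the same friend of a friend; the first one searches the `friendships` table at every hop, the second one never looks at anything but the nodes on the path.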
Labelled Property Graph Model (e.g. implemented by Neo4j)
A graph in such a model is composed of:
• Nodes
• Relationships (each between 2 nodes)
About Nodes
A node can contain:
• Properties: multiple key-value pairs
• Labels: tags representing the roles of the node in the data domain. They are used to group
nodes into sets. Labels may also serve to attach metadata (index or constraint information)
to certain nodes.
[Figure: nodes + labels = labelled nodes, here with the labels Person and Book; the names shown on the nodes are properties]
About Relationships
A relationship always has:
• a direction: a start node and an end node
• a type (i.e. a name)
It can also have properties: multiple key-value pairs.
[Figure: example graph with its properties annotated; the type of the relationship is “HAS_READ”]
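The model can be written down as plain data. This is a sketch, not Neo4j's storage format or API; the property values (`Ann`, `Dune`, `rating`) and the helper functions are made up, while the Person and Book labels and the HAS_READ type come from the slides.

```javascript
// Nodes: an identity, labels (roles in the domain) and key-value properties.
const graphNodes = [
  { id: 1, labels: ["Person"], properties: { name: "Ann" } },
  { id: 2, labels: ["Book"], properties: { title: "Dune" } },
];

// Relationships: a type, a direction (start -> end) and optional properties.
const graphRelationships = [
  { type: "HAS_READ", start: 1, end: 2, properties: { rating: 5 } },
];

// Labels group nodes into sets.
function nodesWithLabel(label) {
  return graphNodes.filter((n) => n.labels.includes(label));
}

// Follow relationships of a given type, from their start node to their end node.
function outgoing(nodeId, type) {
  return graphRelationships
    .filter((r) => r.start === nodeId && r.type === type)
    .map((r) => graphNodes.find((n) => n.id === r.end));
}
```

In Cypher, Neo4j's query language, the same little graph would be written as the pattern `(:Person {name: "Ann"})-[:HAS_READ {rating: 5}]->(:Book {title: "Dune"})`.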

More Related Content

What's hot

Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
Heman Hosainpana
 
Object relational database management system
Object relational database management systemObject relational database management system
Object relational database management system
Saibee Alam
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
Marin Dimitrov
 
PostgreSQL - Case Study
PostgreSQL - Case StudyPostgreSQL - Case Study
PostgreSQL - Case Study
S.Shayan Daneshvar
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
Simplilearn
 
Ms sql-server
Ms sql-serverMs sql-server
Ms sql-server
Md.Mojibul Hoque
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
Tariqul islam
 
No sql Database
No sql DatabaseNo sql Database
No sql Database
mymail2ashok
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
Fabio Fumarola
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
Fabio Fumarola
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Beat Signer
 
Column db dol
Column db dolColumn db dol
Column db dol
poojabi
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
Suvradeep Rudra
 
Apache Hive
Apache HiveApache Hive
Apache Hive
tusharsinghal58
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
nehabsairam
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
Ashwani Kumar
 
Sql Basics And Advanced
Sql Basics And AdvancedSql Basics And Advanced
Sql Basics And Advanced
rainynovember12
 
Apache Hive
Apache HiveApache Hive
Apache Hive
Amit Khandelwal
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
BADR
 
Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005
rainynovember12
 

What's hot (20)

Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Object relational database management system
Object relational database management systemObject relational database management system
Object relational database management system
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
PostgreSQL - Case Study
PostgreSQL - Case StudyPostgreSQL - Case Study
PostgreSQL - Case Study
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Ms sql-server
Ms sql-serverMs sql-server
Ms sql-server
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
 
No sql Database
No sql DatabaseNo sql Database
No sql Database
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
 
Column db dol
Column db dolColumn db dol
Column db dol
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
 
Sql Basics And Advanced
Sql Basics And AdvancedSql Basics And Advanced
Sql Basics And Advanced
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005
 

Similar to NoSQL - Leo's notes

Nosql
NosqlNosql
Datastores
DatastoresDatastores
Datastores
Raveen Vijayan
 
Big data technology unit 3
Big data technology unit 3Big data technology unit 3
Big data technology unit 3
RojaT4
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
Prakash Zodge
 
Choosing your NoSQL storage
Choosing your NoSQL storageChoosing your NoSQL storage
Choosing your NoSQL storage
Imteyaz Khan
 
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSHive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
RUHULAMINHAZARIKA
 
Some NoSQL
Some NoSQLSome NoSQL
Some NoSQL
Malk Zameth
 
Nosql seminar
Nosql seminarNosql seminar
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
Sitamarhi Institute of Technology
 
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
Sitamarhi Institute of Technology
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
Lee Theobald
 
Mongo db
Mongo dbMongo db
Mongo db
Gyanendra Yadav
 
unit2-ppt1.pptx
unit2-ppt1.pptxunit2-ppt1.pptx
unit2-ppt1.pptx
revathigollu23
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
Mohammed Ragab
 
Implementing the Databese Server session 02
Implementing the Databese Server session 02Implementing the Databese Server session 02
Implementing the Databese Server session 02
Guillermo Julca
 
NoSQL_Databases
NoSQL_DatabasesNoSQL_Databases
NoSQL_Databases
Rick Perry
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
Max Neunhöffer
 
Unit 3 MongDB
Unit 3 MongDBUnit 3 MongDB
Unit 3 MongDB
Praveen M Jigajinni
 
Nosql
NosqlNosql
Lecture3.ppt
Lecture3.pptLecture3.ppt
Lecture3.ppt
ShaimaaMohamedGalal
 

Similar to NoSQL - Leo's notes (20)

Nosql
NosqlNosql
Nosql
 
Datastores
DatastoresDatastores
Datastores
 
Big data technology unit 3
Big data technology unit 3Big data technology unit 3
Big data technology unit 3
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
 
Choosing your NoSQL storage
Choosing your NoSQL storageChoosing your NoSQL storage
Choosing your NoSQL storage
 
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICSHive_An Brief Introduction to HIVE_BIGDATAANALYTICS
Hive_An Brief Introduction to HIVE_BIGDATAANALYTICS
 
Some NoSQL
Some NoSQLSome NoSQL
Some NoSQL
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
 
Uint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdfUint-5 Big data Frameworks.pdf
Uint-5 Big data Frameworks.pdf
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
Mongo db
Mongo dbMongo db
Mongo db
 
unit2-ppt1.pptx
unit2-ppt1.pptxunit2-ppt1.pptx
unit2-ppt1.pptx
 
Introduction to mongodb
Introduction to mongodbIntroduction to mongodb
Introduction to mongodb
 
Implementing the Databese Server session 02
Implementing the Databese Server session 02Implementing the Databese Server session 02
Implementing the Databese Server session 02
 
NoSQL_Databases
NoSQL_DatabasesNoSQL_Databases
NoSQL_Databases
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 
Unit 3 MongDB
Unit 3 MongDBUnit 3 MongDB
Unit 3 MongDB
 
Nosql
NosqlNosql
Nosql
 
Lecture3.ppt
Lecture3.pptLecture3.ppt
Lecture3.ppt
 

More from Léopold Gault

OAuth OpenID Connect
OAuth OpenID ConnectOAuth OpenID Connect
OAuth OpenID Connect
Léopold Gault
 
SAML
SAMLSAML
Notes leo kafka
Notes leo kafkaNotes leo kafka
Notes leo kafka
Léopold Gault
 
Containers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes LeoContainers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes Leo
Léopold Gault
 
Leo's Notes about Apache Kafka
Leo's Notes about Apache KafkaLeo's Notes about Apache Kafka
Leo's Notes about Apache Kafka
Léopold Gault
 
Leo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 DaysLeo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 Days
Léopold Gault
 
Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c
Léopold Gault
 

More from Léopold Gault (7)

OAuth OpenID Connect
OAuth OpenID ConnectOAuth OpenID Connect
OAuth OpenID Connect
 
SAML
SAMLSAML
SAML
 
Notes leo kafka
Notes leo kafkaNotes leo kafka
Notes leo kafka
 
Containers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes LeoContainers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes Leo
 
Leo's Notes about Apache Kafka
Leo's Notes about Apache KafkaLeo's Notes about Apache Kafka
Leo's Notes about Apache Kafka
 
Leo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 DaysLeo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 Days
 
Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c
 

Recently uploaded

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 

Recently uploaded (20)

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !

NoSQL - Leo's notes

  • 1. NoSQL Leo’s notes. These slides are Leopold Gault's notes, taken while reading: • https://www.thoughtworks.com/insights/blog/nosql-databases-overview • https://www.slideshare.net/arangodb/query-mechanisms-for-nosql-databases • https://www.slideshare.net/arangodb/introduction-to-column-oriented-databases • https://neo4j.com/developer/guide-data-modeling/ I am not a NoSQL expert; these notes are just my understanding of the aforementioned sources.
  • 2.
  • 4. Relational data models (OLTP and OLAP) vs NoSQL data models. [Diagram with two groups: NoSQL data models; relational data models, including transactional (OLTP).] Note that they represent a document as a hierarchical tree of data (it makes sense). I think they meant to represent a star schema.
  • 5. Who do I think is meant to be normalized? [Diagram of NoSQL data models vs relational data models; labels in extraction order:] Transactional (OLTP); normalized; normalized; deliberately de-normalized; normalized?; not normalized; not normalized.
  • 6. Who do I think natively supports ACID transactions? [Diagram of NoSQL data models vs relational data models; labels in extraction order:] Transactional (OLTP); always; most of the time (e.g. Neo4j); maybe sometimes; always; maybe sometimes; maybe sometimes.
  • 7. Why aggregates: Let’s say that my application always uses a set of data like this one.
  • 8. Why aggregates: Let’s say that my application always uses a set of data like this one. In an RDBMS, such a set of data would have to be fetched from many tables (requiring plenty of JOINs).
  • 9. Why aggregates: Let’s say that my application always uses a set of data like this one. In an RDBMS, such a set of data would have to be fetched from many tables (requiring plenty of JOINs). We can see that there is a big mismatch between the way this application uses the data (as one aggregated set) and the way the RDBMS scatters it across tables.
  • 10. Aggregate-oriented DBMS: NoSQL DBMS (bar Graph DBMS) are aggregate-oriented. An aggregate is a set of data that forms the boundary for ACID operations. Hence, the “acidity scope” is not at the transaction level, but at the aggregate level. Note however that some aggregate-oriented DBMS also support ACID transactions. An aggregate’s data is grouped together only because it makes sense to do so from the application’s point of view. This grouping is decided by a human, either: • the developer: when coding an app, the developer tries to identify which sets of data the app will access together, and decides to write/read each such set as an aggregate; • or the creator of materialized views, i.e. new aggregates emitted from disparate data.
  • 11. Why aggregate-oriented DBMS: Working with aggregates is more performant. When writing, an aggregate is stored together, instead of being scattered among many tables. The same applies when reading: it is quicker to retrieve a set of data that has been stored together than one that has been scattered throughout many tables. In a cluster of an aggregate-oriented DBMS, an aggregate can live on a single node (or be replicated on the same few nodes). Thus our cluster can scale out without degrading the response time, as sets of data frequently accessed together (i.e. aggregates) are not cut into pieces scattered across many nodes. The same logic applies to sharding (an aggregate belongs to a single shard, instead of many) and to replication.
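The mismatch between scattered tables and an aggregate stored as one unit can be sketched in a few lines of Python. This is only an illustration under assumptions: the order data, the three "tables", and the helper names (`load_order_relational`, `load_order_aggregate`, `shard_for`) are all hypothetical, and plain dicts stand in for a real DBMS.

```python
import zlib

# (a) Relational style: one order is scattered over three "tables".
customers = {1: {"name": "Ann"}}
orders = {99: {"customer_id": 1}}
order_lines = [
    {"order_id": 99, "product": "book", "qty": 2},
    {"order_id": 99, "product": "pen", "qty": 5},
]


def load_order_relational(order_id):
    """Rebuild the order by following foreign keys (the work JOINs do)."""
    order = orders[order_id]
    return {
        "id": order_id,
        "customer": customers[order["customer_id"]]["name"],
        "lines": [
            {"product": line["product"], "qty": line["qty"]}
            for line in order_lines
            if line["order_id"] == order_id
        ],
    }


# (b) Aggregate style: the whole order is stored under a single key.
aggregate_store = {
    "order:99": {
        "id": 99,
        "customer": "Ann",
        "lines": [
            {"product": "book", "qty": 2},
            {"product": "pen", "qty": 5},
        ],
    }
}


def load_order_aggregate(order_id):
    """One lookup returns everything the application needs."""
    return aggregate_store[f"order:{order_id}"]


def shard_for(key, shard_count):
    """Hypothetical sharding rule: an aggregate's key selects exactly one shard."""
    return zlib.crc32(key.encode("utf-8")) % shard_count
```

Both loaders return the same order, but the aggregate version needs a single lookup, and because the whole aggregate hangs off one key, a rule like `shard_for` can place it on exactly one shard.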
  • 12. About aggregates: Here are 2 formal definitions: • An aggregate is a collection of data that we interact with as a unit. These units of data (aggregates) form the boundaries for ACID operations (at the aggregate level) with the database. [source1] • Aggregate defines a collection of related objects that we treat as a unit. This unit is taken as a whole for the context of {data manipulation and management of consistency}. We update aggregates via atomic operations and communicate our data storage in terms of aggregates. NoSQL databases, apart from graph databases, have aggregate data models. However, relational databases have no concept of aggregates within their data model. These are considered aggregate-ignorant. An aggregate-ignorant model allows you to look at data in different ways, so it’s good when you don’t have a primary structure for manipulating data. Aggregate-ignorant databases, like relational and graph databases, in general support ACID transactions. [source2]
  • 13. Who do I think is aggregate oriented? Aggregate-oriented data models (NoSQL): • Columnar: yes (1 aggregate = 1 column / segment of a column; maybe also a column family? But I don’t think so). • Document: yes (1 aggregate can be a whole document, identified by its key, or a materialized view generated using map-reduce). • Key-value: yes (1 aggregate = 1 value, i.e. a BLOB that bundles together a bunch of data; this bunch is meaningful only for the app). Aggregate-ignorant data models: • Graph: no. • Relational, transactional (OLTP) and OLAP: no. I think the reason why Graph DBs are not “aggregate oriented” is that, despite storing data as interconnected nodes, a node is probably not considered an aggregate, probably because the boundaries of an ACID operation extend beyond one node.
  • 14. Key-Value DBMS: performance, but ignorance of what the values mean.
  • 15. Key-value DBMS: Values are just BLOBs; they have no meaning for the DBMS. The K-V DBMS doesn’t care what’s inside a BLOB value; it’s up to the app to figure that out. [Diagram: a column of key → value pairs.]
  • 16. Key-value DBMS: How to query: with a very simple API: • get the value for a key, • put a value for a key, • delete a key-value pair. [Diagram: a column of key → value pairs, accessed through the API.]
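The three-call API above (get, put, delete) and the idea that values are opaque BLOBs can be sketched as follows. This is a toy in-memory stand-in, not the API of any particular key-value DBMS; the class name `KeyValueStore` and the sample data are assumptions made for the example.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes (BLOBs)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store never inspects the value; it only insists on raw bytes.
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes")
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
profile = {"name": "Ann", "topics": ["music"]}

# The application, not the store, decides how the BLOB is encoded...
store.put("user:1", json.dumps(profile).encode("utf-8"))

# ...and the application alone knows how to decode it again.
raw = store.get("user:1")
decoded = json.loads(raw.decode("utf-8"))

store.delete("user:1")
```

Note that the store cannot answer a question like "which users like music?": answering it would require looking inside the values, which is exactly what a key-value DBMS does not do.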
  • 18. Documents DBMS: “key-value stores where the value is examinable”; indeed, this value is a document (<Key=DocumentID>, <Value=Document>). Depending on the DBMS, the document may be in JSON, XML, BSON, etc. [Diagram: a column of key → document pairs.]
  • 19. Documents DBMS: Example with a JSON document. [Diagram: a column of key → document pairs.]
  • 20. Documents DBMS: How to query: with the document key, or (for some DBMS, like MongoDB) with attributes within documents. Actually, with MongoDB, it wouldn’t be a JSON doc, but a BSON one. So it’d look like this: \x31\x00\x00\x00 \x04BSON\x00 \x26\x00\x00\x00 \x02\x30\x00\x08\x00\x00\x00awesome\x00 \x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40 \x10\x32\x00\xc2\x07\x00\x00 \x00 \x00
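Because a document DBMS can examine its values, it can answer queries on attributes as well as on the key. A minimal sketch, assuming a hypothetical in-memory `DocumentStore` with a `find` method (this is not MongoDB's API, only an illustration of the idea):

```python
class DocumentStore:
    """Toy document store: values are dicts the store is able to inspect."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        self._docs[doc_id] = document

    def get(self, doc_id):
        """Query by document key, as in a key-value store."""
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Query by attributes: IDs of docs whose top-level fields match all criteria."""
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(name) == expected for name, expected in criteria.items())
        )


docs = DocumentStore()
docs.put("u1", {"name": "Ann", "city": "Paris"})
docs.put("u2", {"name": "Bob", "city": "Lyon"})
docs.put("u3", {"name": "Cy", "city": "Paris"})

by_key = docs.get("u2")             # lookup by document ID
in_paris = docs.find(city="Paris")  # lookup by an attribute inside the documents
```

A real DBMS would typically use indexes rather than scanning every document, but the contrast with an opaque BLOB is the same.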
  • 21. Documents DBMS: How to query: for some other DBMS (e.g. CouchDB), querying docs by anything other than their document ID requires creating a materialized view, populated with JavaScript map-reduce code (for instance). The map function shown on the slide parses all the documents in the store, and emits the docID of the docs where there is a match (where one of the topics is “music”). The load of running a map function can be distributed between nodes. I think that this map function should be followed by a reduce function that simply returns what it has been fed as parameters (though a CouchDB view can also be defined with a map function alone): e.g. nonReduce = function (keys, values, rereduce) { if (rereduce) { /* never run */ } else { /* returns the emitted data */ return values; } };
  • 22. Documents DBMS: Example with map and reduce (CouchDB). Annotations on the code: “That’s a key”; “That’s a value”; “Boolean to say whether or not a re-reduce is needed”. I think values is an array (with keys) of arrays (of '1's): values = [ 'skating': [1,1], 'music': [1], 'sleeping': [1,1,1,1] ]; Is the result the length() of each nested array?
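The map and reduce steps discussed above can be emulated in plain Python to see what each stage produces. This is a sketch under assumptions, not CouchDB code (CouchDB views are typically written in JavaScript): the four sample documents, `map_topics`, `reduce_count` and `run_view` are all hypothetical names, chosen so that the grouped values match the 'skating' / 'music' / 'sleeping' example.

```python
from collections import defaultdict

# Hypothetical documents, keyed by document ID.
documents = {
    "doc1": {"topics": ["skating", "sleeping"]},
    "doc2": {"topics": ["music", "sleeping"]},
    "doc3": {"topics": ["skating", "sleeping"]},
    "doc4": {"topics": ["sleeping"]},
}


def map_topics(doc_id, doc):
    """Map step: emit one (key, value) pair per topic found in a document."""
    for topic in doc.get("topics", []):
        yield topic, 1


def reduce_count(keys, values, rereduce):
    """Reduce step: summing works both on emitted 1s and on partial sums (re-reduce)."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Group the emitted values by key, then reduce each group once."""
    emitted = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc_id, doc):
            emitted[key].append((doc_id, value))
    grouped = {key: [value for _, value in pairs] for key, pairs in emitted.items()}
    reduced = {
        key: reduce_fn([(key, doc_id) for doc_id, _ in pairs],
                       [value for _, value in pairs],
                       False)
        for key, pairs in emitted.items()
    }
    return grouped, reduced


grouped_values, counts = run_view(documents, map_topics, reduce_count)
```

Here `grouped_values` holds one list of 1s per topic, and `counts` holds the sum of each list, which equals its length. Summing rather than taking the length keeps the function correct when it is fed partial results, e.g. `reduce_count(None, [1, 3], True)`.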
  • 23. Columnar DBMS: to have the DBMS work on columns, instead of rows.
  • 24. Columnar DBMS vs RDBMS: How you use them. Columnar DBMS: • Data is stored in columns. • You specify column families (kinds of entities), which are composed of rows featuring some of the columns (among all the columns mentioned in the column family). RDBMS: • Data is stored in tables; each row contains data for all columns (although a value can be NULL). [Diagrams: “Column family A” and “Table A”, each with columns Col 1 to Col 3 and rows row1 to row4.]
  • 25. Why columnar DBMS? The benefits of column-oriented DBMS reside only in the way they store data on disk: they store data by column instead of by row. This makes such DBMS more performant when you query only a few columns, but read/write many values in those few columns. It also makes it possible to store the columns in a compressed state, with only the columns being queried getting decompressed (on the fly). Such DBMS are meant for analytics or batch-processing use cases (and are not performant at all for OLTP).
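The idea of storing each column separately, compressed, and decompressing only the queried column can be sketched as follows. This is a toy illustration using Python's standard `zlib` and `json` modules; the sample rows and the helper names (`to_compressed_columns`, `read_column`) are assumptions, and real columnar engines use their own on-disk formats and compression schemes.

```python
import json
import zlib

# Hypothetical table, as an application would see it: one dict per row.
rows = [
    {"id": 1, "country": "FR", "amount": 10},
    {"id": 2, "country": "FR", "amount": 25},
    {"id": 3, "country": "DE", "amount": 7},
]


def to_compressed_columns(table):
    """Pivot rows into columns, and compress each column on its own."""
    names = list(table[0].keys())
    return {
        name: zlib.compress(json.dumps([row[name] for row in table]).encode("utf-8"))
        for name in names
    }


def read_column(columns, name):
    """Decompress a single column; the other columns stay compressed."""
    return json.loads(zlib.decompress(columns[name]).decode("utf-8"))


columns = to_compressed_columns(rows)

# An analytics-style query touching one column out of three.
total_amount = sum(read_column(columns, "amount"))
```

Computing `total_amount` never decompresses the `id` or `country` columns, which is the benefit described above for queries that touch only a few columns.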
  • 26. Column-oriented storage vs row-oriented storage. Column-oriented storage (columnar DBMS’ strategy): • Each column is stored in its own datafile (datafile0, datafile1) [source]. a. Adding/deleting a column is relatively cheap in I/O: it only requires working on a single small datafile. b. Columns are stored compressed on the disk. Only the columns you query will be decompressed (on the fly). Row-oriented storage (RDBMS’ strategy): a. Adding/deleting a column might require rewriting the whole table... b. You can’t compress rows, because the whole row has to be decompressed in order to be understandable (just like, in column-oriented storage, the whole column has to be decompressed, or at least the whole subset of a column, i.e. a “segment”?). This means the whole table would have to be decompressed in order to be queried. (I don’t think you could decompress only a subset of the table, because it is hard to think of a meaningful way the table could have been chunked. Maybe you could compress all the values except the ID, and chunk the table based on the ID; but it would only be useful for JOINs based on a foreign key.) A decompressed table is often too big to fit entirely in memory, so you’d have to swap part of it to disk (which is slow) just to be able to query it.
  • 27. Column-oriented storage vs row-oriented storage: when not to use. Column-oriented storage (columnar DBMS’ strategy) [source]: • If you only want to work on a few rows (as is often the case in OLTP), it won’t be performant at all: you’ll have to read and decompress all the columns (or at least their relevant subsets), and then recompress and rewrite them. [Diagram: columns Col 1 to Col 3; “You just want to modify a row”.] Row-oriented storage (RDBMS’ strategy): • If you only need to work on a few columns, but the table has many columns, and you want to read/write many things in those few columns, you’ll have to read each whole row just to get the few column values that interest you. FYI: Memory page: the smallest unit of data for virtual-memory management: the OS moves this unit between the hard disk and the RAM using I/O channels. As it is the smallest unit, a page is read from disk as a whole, including unused space.
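The two "when not to use" cases above can be made concrete by counting how much of each layout an operation touches. This is a rough sketch under assumptions: the sample table, the function names, and the "units touched" counters are hypothetical, and lists in memory stand in for datafiles and pages on disk.

```python
# The same hypothetical table in both layouts.
row_store = [
    [1, "Ann", "Paris"],
    [2, "Bob", "Lyon"],
    [3, "Cy", "Paris"],
]
column_store = {
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "city": ["Paris", "Lyon", "Paris"],
}


def update_row_in_row_store(table, row_index, new_row):
    """Row layout: the row is one contiguous unit, so one unit is rewritten."""
    table[row_index] = list(new_row)
    return 1


def update_row_in_column_store(columns, row_index, new_row):
    """Column layout: the row is spread over every column, so each one is touched."""
    touched = 0
    for name, new_value in zip(columns, new_row):
        columns[name][row_index] = new_value
        touched += 1
    return touched


def read_column_from_row_store(table, column_index):
    """Row layout: every cell of every row is read to extract a single column."""
    values = [row[column_index] for row in table]
    cells_read = sum(len(row) for row in table)
    return values, cells_read


def read_column_from_column_store(columns, name):
    """Column layout: only the requested column is read."""
    values = list(columns[name])
    return values, len(values)


new_row = [2, "Bob", "Nice"]
row_units = update_row_in_row_store(row_store, 1, new_row)
column_units = update_row_in_column_store(column_store, 1, new_row)

cities_from_rows, cells_from_rows = read_column_from_row_store(row_store, 2)
cities_from_columns, cells_from_columns = read_column_from_column_store(column_store, "city")
```

Modifying one row touches one unit in the row layout but all three columns in the column layout; reading one column reads nine cells in the row layout but only three in the column layout.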
  • 28. Graph DBMS: to store and query relationships.
  • 29. How to deal with many relationships. RDBMS: • you would use JOINs to compute relationships at query time. On top of being less intuitive, the performance of the JOINs degrades as the tables being joined grow (and as more JOINs are chained). Graph DBMS: • the relationships are natively stored, so no relationship has to be computed at run time.
  • 30. Labelled Property Graph Model (e.g. implemented by Neo4j): A graph in such a model is composed of: • Nodes • Relationships (between 2 nodes).
  • 31. Labelled Property Graph Model (e.g. implemented by Neo4j): About nodes. A node can contain: • Properties: multiple key-value pairs. • Labels: tags representing the roles of the node in the data domain. They are used to group nodes into sets. Labels may also serve to attach metadata (index or constraint information) to certain nodes. [Diagram: nodes + labels = labelled nodes, with the labels “Person” and “Book”; the names shown on the nodes are properties.]
  • 32. Labelled Property Graph Model (e.g. implemented by Neo4j): About relationships. A relationship always has: • a direction: a start node and an end node; • a type (i.e. a name). It can also have: • properties: multiple key-value pairs. [Diagram: two “Properties” callouts; the type of the relationship shown is “HAS_READ”.]
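The labelled property graph model described above (nodes with labels and properties; directed, typed relationships with properties) can be sketched with two small Python classes. This is an illustration under assumptions, not Neo4j's API or storage format: the classes, the helper functions, and the sample data ("Ann", "Dune", `rating`) are hypothetical; only the labels "Person" and "Book" and the type "HAS_READ" come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    labels: frozenset                                  # e.g. {"Person"}
    properties: dict = field(default_factory=dict)     # key-value pairs
    outgoing: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    rel_type: str                                      # the relationship's name
    start: Node                                        # direction: from start...
    end: Node                                          # ...to end
    properties: dict = field(default_factory=dict)


def relate(start, rel_type, end, **properties):
    """Store the relationship on its start node, so it is not recomputed later."""
    relationship = Relationship(rel_type, start, end, properties)
    start.outgoing.append(relationship)
    return relationship


def neighbours(node, rel_type):
    """Follow the stored relationships of a given type; no JOIN is involved."""
    return [rel.end for rel in node.outgoing if rel.rel_type == rel_type]


ann = Node(frozenset({"Person"}), {"name": "Ann"})
book = Node(frozenset({"Book"}), {"title": "Dune"})
has_read = relate(ann, "HAS_READ", book, rating=5)

titles = [node.properties["title"] for node in neighbours(ann, "HAS_READ")]
```

Answering "which books has Ann read?" only walks the relationships already attached to the `ann` node, which is the point made earlier about relationships being stored rather than computed at query time.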

Editor's Notes

  1. I think an aggregate is stored as: a BLOB value (associated with a key), in a Key-Value DBMS; a document, in a Document DBMS; a column, in a Columnar DBMS.