NoSQL databases
Filip Ilievski (f.ilievski@vu.nl)
Vrije Universiteit Amsterdam
What will you hear about today
1) Databases: A historical perspective
2) Modern trends and why NoSQL?
3) Properties of NoSQL
4) Categories and instances of NoSQL databases
5) RDBMSs vs NoSQL
6) Use cases & The Semantic Web
The origin of Relational DBMSs
“In 1970, Edgar F. Codd, an Oxford-educated mathematician working at the IBM
San Jose Research Lab, published a paper showing how information stored in large
databases could be accessed without knowing how the information was structured
or where it resided in the database.”
The origin of Relational DBMSs
“Until then, retrieving information required relatively sophisticated computer
knowledge, or even the services of specialists who knew how to write programs to
fetch specific information—a time-consuming and expensive task.”
“What Codd did was open the door to a new world of data independence. Users
wouldn’t have to be specialists, nor would they need to know where the information
was or how the computer retrieved it. They could now concentrate more on their
businesses and less on their computers.”
(http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/reldb/)
The origin of Relational DBMSs
After Codd’s paper described the relational ideas in theory, practical
implementations followed.
Soon, Relational DBMSs proved superior to the ad-hoc flat file
databases and became the de facto standard for modelling, representing and
accessing data.
Relational databases were (and often still are) ideal when the storage of data is
expensive and the data schemas are fairly simple.
But circumstances change.
Trend #1: Increase in modeling complexity
In the Big Data era, it is not uncommon for a database to have hundreds or
thousands of tables, many of which have to be combined to produce the
requested information.
=> Trend #1: Queries get more complex and require complex SQL
operations
Trend #2: Using tables is not always optimal
Think of social media, where it is more
intuitive to use something like a graph
network
=> Trend #2: Some use cases require
graphs or nested structures rather
than tables
Trend #3: A fixed schema is not always optimal
Our model might change over time
Applications in practice are typically incremental (“agile”), so the initial design
choices get altered -- and this influences the data model.
Having a fixed pre-defined schema is not always desired.
Trend #4: Increase in querying intensity
In the 80s the Internet as we know it did not exist: only a handful of people
would access and use a database.
Many more users access the data now, a demand driven mainly by social
networks and enabled by the increase in computing power.
Additionally, the intensity of access by an individual user is higher (imagine that
each Facebook like is yet another WRITE to the database).
● A single visit to a website can trigger tens or hundreds of database
queries.
=> Trend #4: Many more queries have to be handled
=> Trend #4’: The querying intensity varies
Trend #5: Cheap storage
Hardware is much cheaper, more accessible and powerful now.
When scaling up, it makes more sense to add more computers (horizontal scaling)
rather than to upgrade a single machine (vertical scaling)
=> Trend #5: Usage of a lot of cheap (cloud) hardware instead of a single
powerful machine
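A minimal sketch of what horizontal scaling can look like, assuming the simplest possible scheme: each key is hashed and the hash decides which machine stores the record. The class `ShardedStore`, the helper `node_for_key` and the node names are made up for illustration; each "machine" is just a Python dictionary here.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

class ShardedStore:
    """Toy key-value store whose records are spread over several 'machines'."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # Assumption for the sketch: every machine is simulated by a dict.
        self.shards = {name: {} for name in self.node_names}

    def put(self, key, value):
        self.shards[node_for_key(key, self.node_names)][key] = value

    def get(self, key):
        return self.shards[node_for_key(key, self.node_names)].get(key)

store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(9):
    store.put(f"user:{i}", {"likes": i})
```

Every read and write touches exactly one machine, so adding machines adds capacity. With plain modulo hashing most keys move when the number of machines changes, which is why real systems tend to use more careful schemes such as consistent hashing.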
Modern trends: summary
=> Trend #1: Queries get more complex and require complex SQL
operations
=> Trend #2: Some use cases require graphs or nested structures rather
than tables
=> Trend #3: The data model might evolve over time
=> Trend #4: Many more queries have to be handled
=> Trend #4’: The querying intensity varies
=> Trend #5: Usage of a lot of cheap (cloud) hardware instead of a single
powerful machine
Relational databases in the modern era
Relational databases are not designed to handle many users or large data.
Relational databases are not designed to be distributed over multiple computers.
Relational databases are not efficient with data that is not meant to be stored in
tables.
Sometimes, the relational database principles (e.g. normalization) are too complex
and slow when there is a lot of data or a complex model.
NoSQL (=Not-Only-SQL)
Main idea: adapt to the new trends/needs
1) To avoid making complex queries and joining many datasets, store as much
as possible in a single record
a) Consequently, the data structure in NoSQL is often not a table, but instead a dictionary, a tree,
an array, etc.
b) Yes, this leads to repetition and violates the normalization principle.
2) Use a data structure that is most appropriate for the problem (e.g. use a
graph to model social networks)
3) To handle many users and many queries, and to use the cheap hardware,
distribute the data over many simple machines.
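Point 1 can be illustrated with a small sketch. The data, the table names and the two `load_post_*` functions are invented for this example; plain Python dictionaries stand in for both kinds of database.

```python
# Normalized (relational style): three "tables", linked by ids.
users = {1: {"name": "Ann"}}
posts = {10: {"user_id": 1, "text": "Hello"}}
comments = [
    {"post_id": 10, "author": "Bob", "text": "Hi Ann"},
    {"post_id": 10, "author": "Cid", "text": "Welcome"},
]

def load_post_normalized(post_id):
    """Assemble one post by 'joining' the three tables."""
    post = posts[post_id]
    return {
        "author": users[post["user_id"]]["name"],
        "text": post["text"],
        "comments": [
            {"author": c["author"], "text": c["text"]}
            for c in comments
            if c["post_id"] == post_id
        ],
    }

# Denormalized (NoSQL style): everything about one post lives in one record.
post_documents = {
    10: {
        "author": "Ann",
        "text": "Hello",
        "comments": [
            {"author": "Bob", "text": "Hi Ann"},
            {"author": "Cid", "text": "Welcome"},
        ],
    }
}

def load_post_denormalized(post_id):
    """A single lookup, no joins."""
    return post_documents[post_id]
```

Both functions return the same post. The price of the second form is the repetition mentioned above: the name "Ann" would be copied into every post she writes.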
What is NoSQL?
It is a group of databases that attempts to provide a more flexible way of storing
data, while adapting to the new trends of intensive storage and accessible
hardware.
This idea started around 2009, and it has already been widely adopted.
What is NoSQL?
NoSQL stands for Not-Only-SQL (yes, some NoSQL databases support SQL
operations too).
NoSQL databases are NOT meant to replace relational databases; both have their
use cases.
Characteristics of a NoSQL database
● Flexible schema / schemaless
● Non-relational (forget about normalization today - but you have to know it for
the exam :) )
● Simple access compared to SQL (but not standard across products)
● Basically, does not support complicated queries
● Cheaper than RDBMSs
● Horizontally scalable
● Replicated
● Distributed
Image from https://www.slideshare.net/mikecrabb/a-beginners-guide-to-nosql/30-SIDENOTEcolour_tabbyname_Gunthercolour_gingername_Mylocolour
Half time
NoSQL DBs
A) KEY-VALUE DBs
Each record in the database has a unique key, which “hides” the value: records are looked up by key and the database does not interpret the value
Analogous to key-value pairs in dictionaries/JSON
Very simple and designed to work with a lot of data
Each record can have a different structure/type of data
A) KEY-VALUE DBs: Code for Redis
# Initialization (assumes a Redis server listening on localhost:6379)
import redis
pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)
# Store a record
my_key = "B"
my_value = "triangle"
r.set(my_key, my_value)
# Obtain a record (redis-py returns bytes by default, here b'triangle')
my_key = "B"
r.get(my_key)
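Since the value is opaque to a key-value store, structured records are usually serialized by the application, for example as JSON. The sketch below shows that pattern. `InMemoryKV` is a hypothetical stand-in written for this example so that it runs without a server; it only mimics the `set`/`get` calls used above and, like redis-py by default, hands back bytes.

```python
import json

class InMemoryKV:
    """Hypothetical stand-in for a key-value server: keys map to opaque bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # The store never looks inside the value; it just keeps the bytes.
        self._data[key] = value.encode("utf-8")

    def get(self, key):
        return self._data.get(key)

kv = InMemoryKV()

# Two records with different structures, each serialized by the application.
kv.set("shape:B", json.dumps({"kind": "triangle", "sides": 3}))
kv.set("shape:C", json.dumps({"kind": "circle", "radius": 2.5}))

# Reading: fetch by key, then decode on the application side.
record = json.loads(kv.get("shape:B"))
```

Note that there is no way to ask this store for "all triangles": without the key, the value cannot be found.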
B) DOCUMENT-STORE DBs
Again, simple and designed to work with a lot of data
Very similar to a key-value database
The main difference is that the database can actually see (and QUERY for) the contents of the values
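To make "query for the values" concrete, here is a small sketch in plain Python. The `find` function and the `shapes` collection are invented for illustration; real document stores offer much richer query languages, but the idea of matching on fields inside the documents is the same.

```python
def find(collection, criteria):
    """Return the documents whose fields match every key/value pair in criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]

# Documents in one collection do not need to share the same fields.
shapes = [
    {"_id": "A", "kind": "square", "colour": "red"},
    {"_id": "B", "kind": "triangle", "colour": "blue"},
    {"_id": "C", "kind": "triangle", "colour": "red", "label": "warning sign"},
]

red_triangles = find(shapes, {"kind": "triangle", "colour": "red"})
```

Here the query needs no key at all: the store looks inside each document, which a pure key-value store cannot do.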
B) DOCUMENT-STORE DBs: MongoDB
C) GRAPH DBs
Graph databases focus on modelling the structure of the data
Inspired by Euler’s graph theory, G=(V,E): a set of vertices V and a set of edges E
Motivated mainly by social media networks
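A minimal sketch of why a graph fits social networks, assuming an undirected friendship graph stored as an adjacency list. The people, the edges and the function name `friends_of_friends` are made up for this example.

```python
from collections import defaultdict

# E: the edges (friendships); V: every name that appears in an edge.
edges = [("Ann", "Bob"), ("Bob", "Cid"), ("Bob", "Dee"), ("Dee", "Eve")]

# Adjacency list: for each person, the set of direct friends.
friends = defaultdict(set)
for a, b in edges:
    friends[a].add(b)
    friends[b].add(a)

def friends_of_friends(person):
    """People exactly two steps away: friends of friends who are not yet friends."""
    direct = friends.get(person, set())
    result = set()
    for friend in direct:
        result |= friends[friend]
    return result - direct - {person}
```

For Ann this gives Cid and Dee. The question is answered by following edges from one node, whereas a table-based model would typically need to join the friendship table with itself.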
C) GRAPH DBs: Neo4j
D) COLUMN STORE
Values of the same column are stored together, instead of values of the same row
Super useful for data analytics
Inspired by Google Bigtable
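The difference between the two layouts can be sketched with plain Python lists. The sales data below is invented for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"product": "pen", "quantity": 3, "city": "Amsterdam"},
    {"product": "book", "quantity": 1, "city": "Utrecht"},
    {"product": "pen", "quantity": 5, "city": "Leiden"},
]

# Column-oriented layout: all values of one column are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query ("total quantity sold") in both layouts.
total_from_rows = sum(row["quantity"] for row in rows)   # walks over whole records
total_from_columns = sum(columns["quantity"])            # reads a single column
```

Both totals are the same, but the column layout only has to read the one column the query asks about, which is what makes it attractive for analytics over many records.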
D) COLUMN STORE: Cassandra (originally developed at Facebook)
It is mainly about the size
E) OTHER DATABASES
Many other NoSQL databases exist that do not fall in these four categories:
● Text search databases (e.g. Elasticsearch)
● XML databases
● ...
Comparison of NoSQL categories
Popularity
https://db-engines.com/en/ranking
So, which one to use?
It depends
In general, RDBMS is great for ensuring data consistency, when data validity is
very important (think financial transactions and similar corporate applications)
NoSQL is great for ensuring high availability and speed rather than validity
NoSQL is also great for many applications that are hard to imagine with RDBMSs:
indexing text documents, social networks, storing coordinates, etc.
Pick the right tool for the job!
Me and NoSQL: Map studio
Bachelor thesis: Use of political maps to render statistical data on countries,
provinces, areas, cities, etc.
We are drawing the maps from scratch, so we have to keep a huge number of
coordinates in our database
Typically the coordinates were stored only once and retrieved many times. So it
is very important that the READ operation is efficient; the WRITE operation matters less.
Me and NoSQL: Map studio
Requirements:
● Main one is response time (=how quickly the database can return an
answer/the data)
● Secondary goal is horizontal scalability (=the database should make it
easy to split the data over multiple machines)
What is less/not important:
● consistency
● concurrency
Map studio
Solution:
Knowledge graphs
A knowledge graph acquires and integrates information into an ontology
and applies a reasoner to derive new knowledge.
● It is a graph
● It is semantic = the meaning of the data is encoded alongside the data in the
graph, in the form of the ontology. A knowledge graph is self-descriptive, or,
simply put, it provides a single place to find the data and understand what it’s
all about.
● It is smart = the ontology allows implicit information to be inferred from explicit
data
● It is alive = it can grow and get updated
Knowledge graphs
These principles are shared with the research on the Semantic Web and Linked Open
Data, which aims to construct a Web of things.
Each world fact is represented as a subject-predicate-object triple and stored as
Linked Open Data.
By the end of 2016 Google’s knowledge graph apparently contained 70 billion
connected facts.
The Linked Open Data cloud also contains billions of facts (our LOD Laundromat
collection at VU might be among the largest -> last counted 40B, but next run will
contain much more).
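The subject-predicate-object triples mentioned above can be sketched as plain tuples. The predicate names (`isCapitalOf`, `type`, `locatedIn`) and the helper `match` are assumptions made for this example and do not come from any real vocabulary or library.

```python
# Each fact is one (subject, predicate, object) triple.
triples = [
    ("Amsterdam", "isCapitalOf", "Netherlands"),
    ("Amsterdam", "type", "City"),
    ("Netherlands", "type", "Country"),
    ("VU", "locatedIn", "Amsterdam"),
]

def match(pattern, store=triples):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]

# "What do we know about Amsterdam?"
about_amsterdam = match(("Amsterdam", None, None))

# "Which things have a type, and what is it?"
typed_things = match((None, "type", None))
```

Because every fact has the same three-part shape, new facts can be appended at any time without changing a schema, which is what lets such a graph grow and get updated.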
Takeaways from today’s lecture
1) NoSQL fills the spectrum between flat files and relational DBs
2) NoSQL databases were introduced to fit the new trends: many users, many
queries, complex data models, diverse data structures, and cheap hardware
3) The data structure in NoSQL is often not a table, but instead a dictionary, a
tree, an array, etc.
4) Horizontal scaling, repetition of data, flexible schema
5) Classes of NoSQL DBs: graph, column, key-value, document, other
6) RDBMS or NoSQL? Pick the right tool for the job
Acknowledgements
Some slides have been copied from existing slides on Slideshare
● https://www.slideshare.net/mikecrabb/a-beginners-guide-to-nosql/
● https://www.slideshare.net/bhaskar_vk/introduction-to-nosql-databases-47768468
● https://www.slideshare.net/RTigger/sql-vs-no-sql
● https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f
For an advanced presentation on NoSQL, see:
● https://www.slideshare.net/quipo/nosql-databases-why-what-and-when/34-
Thanks!
Anything you would like to talk about?

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

NoSQL databases

  • 1. NoSQL databases Filip Ilievski (f.ilievski@vu.nl) Vrije Universiteit Amsterdam
  • 2. What will you hear about today 1) Databases: A historical perspective 2) Modern trends and why NoSQL? 3) Properties of NoSQL 4) Categories and instances of NoSQL databases 5) RDBMSs vs NoSQL 6) Use cases & The Semantic Web
  • 3. The origin of Relational DBMSs “In 1970, Edgar F. Codd, an Oxford-educated mathematician working at the IBM San Jose Research Lab, published a paper showing how information stored in large databases could be accessed without knowing how the information was structured or where it resided in the database.”
  • 4. The origin of Relational DBMSs “Until then, retrieving information required relatively sophisticated computer knowledge, or even the services of specialists who knew how to write programs to fetch specific information—a time-consuming and expensive task.” “What Codd did was open the door to a new world of data independence. Users wouldn’t have to be specialists, nor would they need to know where the information was or how the computer retrieved it. They could now concentrate more on their businesses and less on their computers.” (http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/reldb/)
  • 5. The origin of Relational DBMSs After Codd’s paper described the relational ideas in theory, practical implementations followed. Soon, relational DBMSs proved superior to the ad-hoc flat-file databases and became the de facto standard for modelling, representing and accessing data. Relational databases were (and often still are) ideal when data storage is expensive and the data schemas are fairly simple.
  • 7. Trend #1: Increase in modeling complexity In the Big Data era, it is not uncommon for a database to have hundreds or thousands of tables, and many of these must be combined to produce the requested information. => Trend #1: Queries get more complex and require complex SQL operations
  • 8. Trend #2: Using tables is not always optimal Think of social media, where it is more intuitive to use something like a graph network => Trend #2: Some use cases require graphs or nested structures rather than tables
  • 10. Trend #3: A fixed schema is not always optimal Our model might change over time Applications in practice are typically incremental (“agile”), so the initial design choices get altered -- and this influences the data model. Having a fixed pre-defined schema is not always desired.
  • 12. Trend #4: Increase in querying intensity The Internet of the 80s was not the Internet we know today: only a handful of people would access and use a database. Many more users access data now, driven mainly by social networks and enabled by the increase in computing power. Additionally, the intensity of access by an individual user is higher (imagine that each Facebook like is yet another WRITE to the database). ● A single website visit can trigger tens to hundreds of database queries. => Trend #4: Many more queries have to be handled => Trend #4’: The querying intensity varies
  • 15. Trend #5: Cheap storage Hardware is much cheaper, more accessible and powerful now. When scaling up, it makes more sense to add more computers (horizontal scaling) rather than to upgrade a single machine (vertical scaling) => Trend #5: Usage of a lot of cheap (cloud) hardware instead of a single powerful machine
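To make horizontal scaling concrete, here is a minimal sketch of how records could be spread over several cheap machines by hashing their keys. It is an illustration, not how any particular product works: the names `shard_for` and `ShardedStore` are made up for this example, and plain Python dictionaries stand in for the machines.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to one of num_nodes machines using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class ShardedStore:
    """Toy key-value store; each dict plays the role of one machine."""

    def __init__(self, num_nodes: int):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key: str, value):
        # The hash decides which "machine" owns the key.
        self.nodes[shard_for(key, len(self.nodes))][key] = value

    def get(self, key: str):
        # The same hash finds the owning "machine" again on reads.
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=3)
for user in ["alice", "bob", "carol", "dave"]:
    store.put(user, {"likes": 0})
```

With simple modulo hashing, adding a machine changes where most keys live, which is why distributed stores often use consistent hashing instead.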
  • 16. Modern trends: summary => Trend #1: Queries get more complex and require complex SQL operations => Trend #2: Some use cases require graphs or nested structures rather than tables => Trend #3: The data model might evolve over time => Trend #4: Many more queries have to be handled => Trend #4’: The querying intensity varies => Trend #5: Usage of a lot of cheap (cloud) hardware instead of a single powerful machine
  • 17. Relational databases in the modern era Relational databases are not designed to handle many users or large data. Relational databases are not designed to be distributed over multiple computers. Relational databases are not efficient with data that is not meant to be stored in tables. Sometimes, the relational database principles (e.g. normalization) are too complex and slow when there is a lot of data or a complex model.
  • 20. NoSQL (=Not-Only-SQL) Main idea: adapt to the new trends/needs 1) To avoid making complex queries and joining many datasets, store as much as possible in a single record a) Consequently, the data structure in NoSQL is often not a table, but instead a dictionary, a tree, an array, etc. b) Yes, this leads to repetition and violates the normalization principle. 2) Use a data structure that is most appropriate for the problem (e.g. use a graph to model social networks) 3) To handle many users and many queries, and to use the cheap hardware, distribute the data over many simple machines.
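Point 1 can be sketched in a few lines of Python. The toy data and the function names below are assumptions made for this illustration: the first layout keeps users and posts apart and needs a join-like lookup, the second stores everything in a single nested record.

```python
# Normalized, relational-style layout: two "tables" linked by user_id.
users = {1: {"name": "Ann"}}
posts = [
    {"id": 10, "user_id": 1, "text": "Hello"},
    {"id": 11, "user_id": 1, "text": "NoSQL!"},
]


def posts_with_author_normalized(user_id):
    """Combine both tables, as a JOIN would."""
    author = users[user_id]["name"]
    return [(author, p["text"]) for p in posts if p["user_id"] == user_id]


# Denormalized, NoSQL-style layout: one record holds the user and the posts.
user_record = {
    "name": "Ann",
    "posts": [{"text": "Hello"}, {"text": "NoSQL!"}],
}


def posts_with_author_denormalized(record):
    """Everything needed is already in the single record."""
    return [(record["name"], p["text"]) for p in record["posts"]]
```

Both functions return the same pairs; the second avoids the lookup across tables, at the cost of repeating data whenever the same information is needed in several records.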
  • 22. What is NoSQL? It is a group of databases that attempts to provide a more flexible way of storing data, while adapting to the new trends of intensive storage and accessible hardware. This idea started around 2009, and it has already been widely adopted.
  • 23. What is NoSQL? NoSQL stands for Not-Only-SQL (yes, some NoSQL databases support SQL operations too). NoSQL databases are NOT meant to replace relational databases; both have their use cases.
  • 27. Characteristics of a NoSQL database ● Flexible schema / schemaless ● Non-relational (forget about normalization today - but you have to know it for the exam :) ) ● Simple access compared to SQL (but not standardized across products) ● Generally does not support complicated queries ● Cheaper than RDBMSs ● Horizontally scalable ● Replicated ● Distributed (Image from https://www.slideshare.net/mikecrabb/a-beginners-guide-to-nosql/30-SIDENOTEcolour_tabbyname_Gunthercolour_gingername_Mylocolour)
  • 29. NoSQL databases Filip Ilievski (f.ilievski@vu.nl) Vrije Universiteit Amsterdam Half time
  • 31. A) KEY-VALUE DBs Each record in the database contains a unique key, which “hides” the value. Analogous to key-value pairs in dictionaries/JSON. Very simple and designed to work with a lot of data. Each record can have a different structure/type of data.
  • 35. A) KEY-VALUE DBs: Code for Redis
        # Initialization (assumes a Redis server running on localhost:6379)
        import redis
        pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
        r = redis.Redis(connection_pool=pool)
        # Store a record
        my_key = "B"
        my_value = "triangle"
        r.set(my_key, my_value)
        # Obtain a record (redis-py returns bytes by default, here b"triangle")
        my_key = "B"
        r.get(my_key)
  • 36. B) DOCUMENT-STORE DBs Again, simple and designed to work with a lot of data Very similar to a key-value database Main difference is that you can actually see (and QUERY for) the values
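The difference from a key-value store can be sketched with plain Python dictionaries. This is an illustration only, not the API of any document database: the `find` helper and the sample documents are made up for the example.

```python
# Each document is a nested, schema-free record stored under an id.
documents = {
    "u1": {"name": "Ann", "city": "Amsterdam", "age": 31},
    "u2": {"name": "Bob", "city": "Utrecht"},
    "u3": {"name": "Cem", "city": "Amsterdam", "tags": ["db", "ml"]},
}


def find(collection, **criteria):
    """Return the ids of documents whose fields match all criteria."""
    return [
        doc_id
        for doc_id, doc in collection.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Query by value, which a pure key-value store cannot do.
amsterdam_ids = find(documents, city="Amsterdam")
```

Note that the three documents do not share the same fields, and the query still works: documents lacking a field simply do not match.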
  • 41. C) GRAPH DBs Graph databases focus on modelling the structure of the data. Inspired by Euler’s graph theory, G=(V,E), with vertices V and edges E. Motivated mainly by social media networks.
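As a small illustration of why a graph fits social networks, the sketch below stores a friendship graph as an adjacency dictionary and answers a typical "friends of friends" question. The people and the function name are invented for this example; a real graph database would answer this with its own query language.

```python
# Undirected friendship graph: vertex -> set of neighbouring vertices.
friends = {
    "ann": {"bob", "cem"},
    "bob": {"ann", "dia"},
    "cem": {"ann", "dia", "eli"},
    "dia": {"bob", "cem"},
    "eli": {"cem"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are not
    already direct friends and are not the person themselves."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}
```

In a relational model the same question needs a self-join on a friendship table, and every extra hop adds another join.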
  • 47. C) GRAPH DBs: Neo4j
  • 48. D) COLUMN STORE Column data is saved together, instead of row data Super useful for data analytics Inspired by Google BigTable
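The idea of saving column data together can be sketched by converting a row layout into a column layout. The sales figures below are toy data made up for the illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "country": "NL", "sales": 120},
    {"id": 2, "country": "DE", "sales": 300},
    {"id": 3, "country": "NL", "sales": 80},
]

# Column-oriented layout: all values of one field are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query touches only the column it needs.
total_sales = sum(columns["sales"])
```

Summing `sales` reads a single list and never looks at `id` or `country`, which is the property that makes this layout attractive for analytics.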
  • 49. D) COLUMN STORE: Cassandra (originally developed at Facebook, now Apache Cassandra)
  • 50. It is mainly about the size
  • 51. E) OTHER DATABASES Many other NoSQL databases exist that do not fall into these four categories: ● Text search databases (e.g. Elasticsearch) ● XML databases ● ...
  • 53. Comparison of NoSQL categories
  • 56. So, which one to use? It depends. In general, an RDBMS is great for ensuring data consistency, when data validity is very important (think financial transactions and similar corporate applications). NoSQL is great for ensuring high availability and speed rather than validity. NoSQL is also great for many applications that are hard to imagine with RDBMSs: indexing text documents, social networks, storing coordinates, etc. Pick the right tool for the job!
  • 58. Me and NoSQL: Map studio Bachelor thesis: Use of political maps to render statistical data on countries, provinces, areas, cities, etc. We were drawing the maps from scratch, so we had to keep a huge number of coordinates in our database. Typically the coordinates were stored only once and retrieved many times, so READ operations had to be efficient, while WRITE efficiency mattered much less.
  • 59. Me and NoSQL: Map studio Requirements: ● The main one is response time (= how quickly the database can return an answer/the data) ● The secondary goal is horizontal scalability (= the database should make it easy to split the data across multiple machines) What is less/not important: ● consistency ● concurrency
  • 63. Knowledge graphs A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge. ● It is a graph ● It is semantic = the meaning of the data is encoded alongside the data in the graph, in the form of the ontology. A knowledge graph is self-descriptive, or, simply put, it provides a single place to find the data and understand what it’s all about. ● It is smart = the ontology allows implicit information to be inferred from explicit data ● It is alive = it can grow and get updated
  • 64. Knowledge graphs These principles are shared in the research on Semantic Web and Linked open data, aiming to construct a Web of things. Each world fact is represented as a subject-predicate-object triple and stored as Linked Open Data. By the end of 2016 Google’s knowledge graph apparently contained 70 billion connected facts. The Linked Open Data cloud also contains billions of facts (our LOD Laundromat collection at VU might be among the largest -> last counted 40B, but next run will contain much more).
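The subject-predicate-object representation can be sketched with plain Python tuples. This is a toy illustration, not an RDF library: the three facts, the predicate names and the `match` helper are assumptions made for the example.

```python
# Each fact is a (subject, predicate, object) triple.
triples = [
    ("Amsterdam", "capitalOf", "Netherlands"),
    ("Amsterdam", "locatedIn", "Netherlands"),
    ("VU", "locatedIn", "Amsterdam"),
]


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What is located where?" leaves subject and object open.
located = match(triples, p="locatedIn")
```

Fixing one position and leaving the others open is the basic building block of triple-pattern queries over Linked Open Data.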
  • 65. Takeaways from today’s lecture 1) NoSQL fills the spectrum between flat files and relational DBs 2) NoSQL databases were introduced to fit the new trends: many users, many queries, complex data models, diverse data structures, and cheap hardware 3) The data structure in NoSQL is often not a table, but instead a dictionary, a tree, an array, etc. 4) Horizontal scaling, repetition of data, flexible schema 5) Classes of NoSQL DBs: graph, column, key-value, document, other 6) RDBMS or NoSQL? Pick the right tool for the job
  • 66. Acknowledgements Some slides have been copied from existing slides on Slideshare ● https://www.slideshare.net/mikecrabb/a-beginners-guide-to-nosql/ ● https://www.slideshare.net/bhaskar_vk/introduction-to-nosql-databases-47768468 ● https://www.slideshare.net/RTigger/sql-vs-no-sql https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f For an advanced presentation on NoSQL, see: ● https://www.slideshare.net/quipo/nosql-databases-why-what-and-when/34-
  • 67. Thanks! Anything you would like to talk about?