DATABASES AND HOW TO CHOOSE THEM
Databases and how to choose them - January 2017
Index
1 Key concepts
2 Database types
3 Use cases
4 Best and bad practices
Key concepts
ACID vs BASE
● ACID:
● Atomicity. A transaction is a group of operations performed against a database as a single unit. If any operation in a transaction fails, the entire transaction fails and none of its changes are applied.
● Consistency. This is usually defined as the property that guarantees that a transaction brings the database from one valid state (in a formal sense, not in a functional one) to another. In ACID, consistency only implies compliance with the defined rules, such as constraints, triggers, etc.
● Isolation. Each transaction must be independent, meaning that it should not “see” the effects of other concurrent transactions.
● Durability. This property ensures that once a transaction is committed, its effects survive system failure, power loss and other types of system breakdowns.
● BASE:
● Basically Available. The system guarantees availability in the sense that there will be a response to any request (although the response could be inconsistent data or even an error).
● Soft state. While the system converges from eventual consistency to actual consistency, its state can change over time, even when no input operation is being applied to the database. Thus, the state of the system is called “soft”.
● Eventual consistency. Once the system stops receiving input and the data has been propagated to every node, it will eventually become consistent.
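As an illustration of atomicity, here is a minimal sketch using Python's standard sqlite3 module. The accounts table, the names and the amounts are hypothetical and only serve the example; the point is that when the second statement of a transfer fails, the first one is rolled back as well.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
ok = transfer(conn, "alice", "bob", 500)  # more than alice owns
print(ok, balances(conn))
```

The failed transfer leaves both balances untouched: the credit to the destination account is not kept once the debit is rejected.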
CAP THEOREM
● CAP:
● Consistency. The C in CAP actually means “linearizability”, a very specific and strong notion of consistency that has nothing to do with the C in ACID (indeed, it has more to do with Atomicity and Isolation). A typical way to define it is: “if operation B started after operation A successfully completed, B must see the system in the same state as it was on completion of operation A, or a newer state”. Thus, a system is consistent if an update appears to be applied to all nodes at the same time.
[Figure: a write of name=Alice, followed by a read (name?) that returns Alice.]
CAP THEOREM
● CAP:
● Availability. The A in CAP is defined as “every request received by a non-failing database node must result in a non-error response”. This is both a strong and a weak requirement: 100% of the requests must return a response, but the response can take an unbounded (but finite) amount of time. Since people tend to care more about latency, in practice a very slow response makes a system “not available” for its users.
CAP THEOREM
● CAP:
● Partition Tolerance. The P in CAP means… well, it is not clear. Some definitions state that the system keeps on working even if some nodes, or the connection between two of them, fail. This kind of definition is what leads people to apply the CAP theorem to monolithic, single-node relational databases (which would qualify as CA). A multi-node system that did not require partition tolerance would have to run on a network that never drops messages and whose nodes cannot fail. Since no such system exists, the P in CAP cannot be given up by choice.
CAP THEOREM
Isolation
● Isolation.
In database systems, isolation determines how and when the changes made by one transaction become visible to other users and systems. Though it is often applied in a relaxed way, this ACID property is an important part of any transactional DBMS (Database Management System): it specifies when and how the changes made by an operation become visible to other parallel operations.
Acquiring locks on data is the usual way to achieve a given isolation level: the more locks a running transaction takes, the higher the isolation level. On the other hand, locks have an impact on performance.
Isolation
● Isolation levels.
ISOLATION LEVELS: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE
CONCURRENCY PHENOMENA: DIRTY READS, UNREPEATABLE READS, PHANTOM READS
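The relation between the levels and the phenomena listed above can be sketched as a small lookup table. The mapping below follows the ANSI SQL definition of the four levels and is an assumption about what the original slide figure showed; actual engines may be stricter than the standard requires. The helper name weakest_level_preventing is hypothetical.

```python
# Phenomena that each isolation level still allows, from weakest to strongest
# level, as defined by the ANSI SQL standard (real engines may be stricter).
ALLOWED = {
    "READ UNCOMMITTED": {"dirty read", "unrepeatable read", "phantom read"},
    "READ COMMITTED": {"unrepeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(unwanted):
    """Return the weakest level that allows none of the unwanted phenomena."""
    for level, allowed in ALLOWED.items():  # dicts keep insertion order
        if not (allowed & set(unwanted)):
            return level
    raise ValueError("no isolation level prevents: %r" % (unwanted,))


print(weakest_level_preventing({"dirty read"}))
print(weakest_level_preventing({"dirty read", "unrepeatable read"}))
```

Choosing the weakest level that still rules out the unwanted phenomena keeps locking, and therefore the performance impact, as low as the use case allows.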
Indexes
● Indexes
A database index is a data structure that speeds up searches on a database table, at the cost of slower writes and of the extra storage space needed to maintain the index itself. Indexes are used to locate data quickly without having to scan every row in a table.
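The trade-off can be sketched with a toy in-memory table. This is only an illustration under simplifying assumptions (a hash-style index on one field, no deletes or updates); the class and function names are hypothetical and do not correspond to any real database engine.

```python
from collections import defaultdict


def full_scan(rows, field, value):
    """Without an index: look at every row."""
    return [row for row in rows if row[field] == value]


class IndexedTable:
    """Toy table that maintains an index on the 'name' field."""

    def __init__(self):
        self.rows = []
        self.by_name = defaultdict(list)  # name -> positions in self.rows

    def insert(self, row):
        # Every write pays an extra cost: the index must be updated too.
        self.rows.append(row)
        self.by_name[row["name"]].append(len(self.rows) - 1)

    def find_by_name(self, name):
        # Reads jump straight to the matching positions.
        return [self.rows[i] for i in self.by_name.get(name, [])]


table = IndexedTable()
for row in (
    {"id": 1, "name": "Alice", "age": 23},
    {"id": 2, "name": "Bob", "age": 42},
    {"id": 3, "name": "Charles", "age": 34},
):
    table.insert(row)

print(table.find_by_name("Bob"))
```

Both lookups return the same rows; the indexed one simply avoids touching the rows that cannot match.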
Indexes
● Inverted indexes
An inverted index is a data structure that maps content to its locations in a database file (in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing and an intensive use of resources.
Document 1: “This document can be stored in ElasticSearch.”
Document 2: “ElasticSearch is a document oriented database”
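A minimal sketch of the idea, using the two example documents above. The tokenizer (lowercase alphanumeric words) and the function names are assumptions made for illustration; this is not how ElasticSearch is implemented internally.

```python
import re
from collections import defaultdict

docs = {
    1: "This document can be stored in ElasticSearch.",
    2: "ElasticSearch is a document oriented database",
}


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_inverted_index(docs)
print(search(index, "document database"))
```

Building the index is where the processing cost is paid; each search then only intersects a few small sets instead of reading the documents.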
Sharding
● Sharding
Shards are partitions of data within a database. Since each partition is smaller than the whole database, a query that uses the shard key (the field that determines the partition) only has to look into one shard instead of scanning the whole data set, which can improve search performance dramatically.
On the other hand, sharding implies a strong dependency on the network, with higher latency when querying several shards, as well as consistency concerns when data is replicated among several shards (as it should be, for high availability).
It also introduces additional complexity in design (the partition key must be carefully chosen) and development (load balancing, replication, failover, etc.).
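The routing logic can be sketched as follows. This is a simplified, single-process illustration that assumes hash-based partitioning on the shard key; the ShardedStore class is hypothetical, and replication and failover are deliberately left out.

```python
import hashlib


class ShardedStore:
    """Toy store that spreads records over N in-memory shards by key hash."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash keeps the same key on the same shard across runs.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, record):
        self.shards[self.shard_for(key)][key] = record

    def get_by_key(self, key):
        # Query by shard key: exactly one shard is consulted.
        return self.shards[self.shard_for(key)].get(key)

    def find(self, predicate):
        # Query by any other field: every shard must be consulted.
        return [
            record
            for shard in self.shards
            for record in shard.values()
            if predicate(record)
        ]


store = ShardedStore(num_shards=4)
store.put(1, {"name": "Alice", "age": 23})
store.put(2, {"name": "Bob", "age": 42})
store.put(3, {"name": "Charles", "age": 34})

print(store.get_by_key(2))
print(sorted(r["name"] for r in store.find(lambda r: r["age"] > 30)))
```

The contrast between get_by_key and find is the reason why the partition key has to be chosen according to the most frequent access pattern.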
Database types
Database types
● Database types
As a first approach, we consider the following kinds of databases:
● Relational
● Key-value, column-oriented
● Document-oriented
● Graph
We deliberately exclude the popular pure key-value type, because its main players take a naive approach to several production use cases and because some of its features overlap with those of the aforementioned types.
Database types
● Relational columnar storage.
The concept of relational databases is widely known and involves some of the topics already covered in this document, especially ACID. Recently, RDBMSs have also covered the need for schema-less data, so their strengths are consistency under heavy read and write loads and the widespread knowledge of both their design and their query language.
Columnar storage can be seen as a transposition of the common row storage, meaning that the values of each column are stored together:
1, 2, 3; Alice, Bob, Charles; Adams, Brown, Cooper; 23, 42, 34
Columnar models are very useful for some use cases. A common example is selecting a single field, or calculating an average. Instead of going through every row and accessing the field age, a columnar model allows direct access to the area where age is stored.
These models are still relational (thus, ACID), and they are suitable for use cases that need very good read performance up to a certain limit in volume (say, under one terabyte).
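The transposition can be sketched with the same three records. The field names id, name, surname and age and the function names are assumptions for illustration; real columnar engines add compression and on-disk layouts that are not modeled here.

```python
FIELDS = ("id", "name", "surname", "age")

# Row storage: one tuple per record.
rows = [
    (1, "Alice", "Adams", 23),
    (2, "Bob", "Brown", 42),
    (3, "Charles", "Cooper", 34),
]


def to_columns(records):
    """Transpose row storage into one list of values per field."""
    return {
        field: [record[i] for record in records]
        for i, field in enumerate(FIELDS)
    }


def average_age_row_store(records):
    # Every record is visited, even though only one field is needed.
    return sum(record[3] for record in records) / len(records)


def average_age_column_store(cols):
    # Only the 'age' column is read.
    ages = cols["age"]
    return sum(ages) / len(ages)


columns = to_columns(rows)
print(columns["age"])
print(average_age_column_store(columns))
```

Both functions compute the same average; the columnar version only reads the values it actually needs.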
Database types
● Column-oriented databases.
● Columnar storage in relational databases is commonly confused with column-oriented databases, such as Cassandra.
● Column-oriented databases store data in column families, as rows that have many columns associated with a row key. Column families are groups of related data that are often accessed together.
● Each column family can be compared to a container of rows in an RDBMS table, where the key identifies the row and the row consists of multiple columns (and this is where the key-value concept appears).
● These databases depend strongly on their design, since they are meant to be accessed by key. Secondary indexes are allowed, but they do not perform well enough for operational needs.
users
1 “Name”: “Alice”, “Surname”: “Adams”, “age”: “23”
2 “Name”: “Bob”, “Surname”: “Brown”, “age”: “42”
3 “Name”: “Charles”, “Surname”: “Cooper”, “age”: “34”
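The users column family above can be modeled as a map from row key to a map of columns. This is a conceptual sketch only, with hypothetical helper names; it does not reflect Cassandra's storage format or its query language.

```python
# Column family 'users': row key -> {column name: value}
users = {
    1: {"Name": "Alice", "Surname": "Adams", "age": "23"},
    2: {"Name": "Bob", "Surname": "Brown", "age": "42"},
    3: {"Name": "Charles", "Surname": "Cooper", "age": "34"},
}


def get_row(column_family, row_key):
    """Access by row key: the access path this model is designed for."""
    return column_family.get(row_key)


def get_column(column_family, row_key, column):
    """Read a single column of a single row."""
    return column_family.get(row_key, {}).get(column)


def find_by_column(column_family, column, value):
    """Access by a non-key column: every row has to be inspected."""
    return [
        row_key
        for row_key, cols in column_family.items()
        if cols.get(column) == value
    ]


print(get_column(users, 2, "Surname"))
print(find_by_column(users, "Name", "Charles"))
```

Reads by row key are direct lookups, while reads by any other column degrade into a scan, which is why the design has to start from the access patterns.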
Database types
● Document-oriented databases.
● Just as the name suggests, document-oriented databases store documents, typically in JSON format. They are a kind of key-value store, with the nuance that documents have an internal structure which the engine uses to query the data.
● The way of viewing the data is similar to that of relational databases, except that no schema or relational constraints are required.
● The main difference between the two worlds lies in the ACID vs BASE distinction, which translates into horizontal scaling capabilities.
● Thus, these systems can offer good performance when operating on several terabytes.
● ElasticSearch is an unusual example of a document-oriented database. It is very well suited to Full Text Search, and its capabilities (making use of the aforementioned inverted indexes) allow searches that are not defined in advance to be resolved within the response times required by operational use cases.
Database types
● Graph databases.
● Graph databases use the mathematical concept of a graph to store data. Graphs consist of nodes, edges and properties, which are used to query for the desired information.
● The main advantage of these systems is their high performance in use cases that would involve a lot of SQL joins in a relational database, since those cases are about following relations between nodes.
● Write performance (and read performance without joins) is below that offered by other systems, so these databases are quite polarized regarding the use case.
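Following relations between nodes can be sketched as a breadth-first traversal over an adjacency map. The follows graph, its node names and the within_hops function are hypothetical; a graph database would run this kind of traversal natively, whereas a relational model would typically need one self-join per hop.

```python
from collections import deque

# Hypothetical directed graph: who follows whom.
follows = {
    "alice": {"bob"},
    "bob": {"charles"},
    "charles": {"dana"},
    "dana": set(),
}


def within_hops(graph, start, max_hops):
    """Return {node: distance} for every node reachable in at most max_hops."""
    seen = {start}
    reached = {}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                reached[neighbour] = dist + 1
                frontier.append((neighbour, dist + 1))
    return reached


print(within_hops(follows, "alice", 2))
```

Each extra hop only extends the traversal from the nodes already reached, instead of joining the whole table with itself once more.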
Database types
● High-level comparison.
● Relational (row-based)
○ Basic description: data structured in rows
○ Strengths: ACID; good performance; low complexity
○ Weaknesses: scalability
○ Typical use cases: online operational with ACID needs
○ Key players: PostgreSQL
● Relational (columnar)
○ Basic description: data stored in columns
○ Strengths: ACID; good read performance
○ Weaknesses: scalability; counter-intuitive
○ Typical use cases: read-only without scaling out
○ Key players: PostgreSQL
● Document-oriented
○ Basic description: data stored in (potentially) unstructured documents
○ Strengths: scalability; good read performance
○ Weaknesses: consistency; complexity
○ Typical use cases: heavy reads with a high volume of records
○ Key players: ElasticSearch, MongoDB
● Key-value, column-oriented
○ Basic description: data structured as key-value maps
○ Strengths: scalability; good write performance
○ Weaknesses: strong design dependency; use-case polarization
○ Typical use cases: heavy writes with a high volume of records, and reads by key
○ Key players: Cassandra
● Graph
○ Basic description: data structured as nodes and edges (graphs) with relations
○ Strengths: ACID; good read performance
○ Weaknesses: scalability; complexity; counter-intuitive
○ Typical use cases: SQL joins (relations)
○ Key players: Neo4J
Database types
● Radar graph.
Use cases
Use cases
● CRUD over an entity
● For typical CRUD operations (and, maybe, listing) over a certain entity, in a RESTful way, the very first option should be an RDBMS. They provide:
○ good write and read performance
○ (typically) lots of features
○ (typically) the advantage of SQL modeling and the SQL language, which makes them straightforward to use.
● Note that CRUD over an entity usually implies accessing data by a unique key, which would be the entity id. Accessing one or several (listing) entities by other fields would require creating indexes.
● Both scenarios fit well in an RDBMS as long as the fields used in the WHERE clause are known in advance, but the possibility of scaling out has to be considered. If the volume of data may grow too much, a document-oriented database could be the logical alternative.
● In particular, MongoDB covers essentially the same use cases as PostgreSQL, with the former being the choice when volume is (or could be) high, and the latter being the choice when ACID capabilities are more important.
Use cases
● FTS or searching by any field
● Searching by any field would require creating lots of indexes, given the way PostgreSQL or MongoDB handle them.
● Using ElasticSearch instead would be much more effective. The same logic applies to Full Text Search (FTS), with the inverted indexes of Elastic being the solution.
● The intensive use of resources made by ElasticSearch makes it a poor fit for other use cases, like the aforementioned CRUD over an entity or much more specific accesses (by id or by known fields).
Use cases
● High-volume loads
● Among the systems discussed, Cassandra is the one that provides the best write performance and scalability.
● A typical use case could be a log system, if it is only accessed by date or by component name.
● If there is a high volume of online writes, but access cannot be done by a unique field, then we can choose among the other products, taking the previous considerations into account.
● It is important to know that reindexing operations on the database have a big impact on performance. If it is not possible to switch off the indexes while writing (as in a typical online operation), MongoDB and PostgreSQL could be worse options than ElasticSearch.
● On the other hand, in scenarios with heavy writing and reading, consistency becomes relevant, so PostgreSQL may have the edge.
Use cases
● Relations
● Fraud detection and recommendation engines are typical cases in which a lot of SQL joins are needed, since they are all about relating several entities of the same type by a variety of fields, and maybe relating them to entities of a different type.
● In a graph, that amounts to following a path among several nodes, so a graph database is natively more efficient.
● Scalability or consistency could be concerns in those cases.
Use cases
● Analytics
● Analytics use cases usually involve:
○ a huge volume of data
○ a much more relaxed processing time
○ a much lower level of concurrency.
● For those cases, jobs accessing a DFS (distributed file system) can be enough.
Best and bad practices
Best and bad practices
● Best practices:
● Choose the right database for each use case.
● A new “materialized view” is better than fighting with problems. There is no silver bullet.
● Avoid BLOBs.
● Schemas are good: they keep order and are intuitive.
● Mind the CAP.
Best and bad practices
● Bad practices:
● Over…
○ indexing
○ normalization
○ provisioning of resources
● Relational mindset
● Split brain
● Fashion victim
Questions
Thanks!

More Related Content

What's hot

Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Spark Summit
 
Resilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKResilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARK
Taposh Roy
 
A Travel Through Mesos
A Travel Through MesosA Travel Through Mesos
A Travel Through Mesos
Datio Big Data
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
Omid Vahdaty
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
Alessandro Menabò
 
RDD
RDDRDD
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
Petr Zapletal
 
Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013
Vijay Srinivas Agneeswaran, Ph.D
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug Grall
Spark Summit
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to Spark
Kyle Burke
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Martin Zapletal
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Spark Summit
 
Cassandra + Spark + Elk
Cassandra + Spark + ElkCassandra + Spark + Elk
Cassandra + Spark + Elk
Vasil Remeniuk
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
Databricks
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5
SAP Concur
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
Databricks
 
Time series database by Harshil Ambagade
Time series database by Harshil AmbagadeTime series database by Harshil Ambagade
Time series database by Harshil Ambagade
Sigmoid
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
Databricks
 
Apache Cassandra Lunch #70: Basics of Apache Cassandra
Apache Cassandra Lunch #70: Basics of Apache CassandraApache Cassandra Lunch #70: Basics of Apache Cassandra
Apache Cassandra Lunch #70: Basics of Apache Cassandra
Anant Corporation
 

What's hot (20)

Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
Making Sense of Spark Performance-(Kay Ousterhout, UC Berkeley)
 
Resilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKResilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARK
 
A Travel Through Mesos
A Travel Through MesosA Travel Through Mesos
A Travel Through Mesos
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
RDD
RDDRDD
RDD
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013Big data analytics_beyond_hadoop_public_18_july_2013
Big data analytics_beyond_hadoop_public_18_july_2013
 
Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug Grall
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to Spark
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
 
Cassandra + Spark + Elk
Cassandra + Spark + ElkCassandra + Spark + Elk
Cassandra + Spark + Elk
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Time series database by Harshil Ambagade
Time series database by Harshil AmbagadeTime series database by Harshil Ambagade
Time series database by Harshil Ambagade
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
 
Apache Cassandra Lunch #70: Basics of Apache Cassandra
Apache Cassandra Lunch #70: Basics of Apache CassandraApache Cassandra Lunch #70: Basics of Apache Cassandra
Apache Cassandra Lunch #70: Basics of Apache Cassandra
 

Viewers also liked

Del Mono al QA
Del Mono al QADel Mono al QA
Del Mono al QA
Datio Big Data
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Datio Big Data
 
DC/OS: The definitive platform for modern apps
DC/OS: The definitive platform for modern appsDC/OS: The definitive platform for modern apps
DC/OS: The definitive platform for modern apps
Datio Big Data
 
Security&Governance
Security&GovernanceSecurity&Governance
Security&Governance
Datio Big Data
 
Kafka Connect by Datio
Kafka Connect by DatioKafka Connect by Datio
Kafka Connect by Datio
Datio Big Data
 
PDP Your personal development plan
PDP Your personal development planPDP Your personal development plan
PDP Your personal development plan
Datio Big Data
 

Viewers also liked (6)

Del Mono al QA
Del Mono al QADel Mono al QA
Del Mono al QA
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
DC/OS: The definitive platform for modern apps
DC/OS: The definitive platform for modern appsDC/OS: The definitive platform for modern apps
DC/OS: The definitive platform for modern apps
 
Security&Governance
Security&GovernanceSecurity&Governance
Security&Governance
 
Kafka Connect by Datio
Kafka Connect by DatioKafka Connect by Datio
Kafka Connect by Datio
 
PDP Your personal development plan
PDP Your personal development planPDP Your personal development plan
PDP Your personal development plan
 

Similar to Databases and how to choose them

Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
ijdms
 
مقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربيمقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربي
Mohamed Galal
 
NOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfNOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdf
ajajkhan16
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
Ramakant Soni
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
IJCERT JOURNAL
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Mohamed Galal
 
Report 2.0.docx
Report 2.0.docxReport 2.0.docx
Report 2.0.docx
pinstechwork
 
No sql database
No sql databaseNo sql database
No sql database
vishal gupta
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
Prakash Zodge
 
Report 1.0.docx
Report 1.0.docxReport 1.0.docx
Report 1.0.docx
pinstechwork
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
Navdeep Charan
 
NoSQL(NOT ONLY SQL)
NoSQL(NOT ONLY SQL)NoSQL(NOT ONLY SQL)
NoSQL(NOT ONLY SQL)
Rahul P
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
Editor Jacotech
 
Nosql
NosqlNosql
Nosql
NosqlNosql
Nosql
ROXTAD71
 
Artigo no sql x relational
Artigo no sql x relationalArtigo no sql x relational
Artigo no sql x relational
Adenilson Lima Diniz
 
Unit-10.pptx
Unit-10.pptxUnit-10.pptx
Unit-10.pptx
GhanashyamBK1
 
the rising no sql technology
the rising no sql technologythe rising no sql technology
the rising no sql technology
INFOGAIN PUBLICATION
 
DATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptx
DATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptxDATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptx
DATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptx
Laxmi Pandya
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
ijiert bestjournal
 

Similar to Databases and how to choose them (20)

Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
 
مقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربيمقدمة عن NoSQL بالعربي
مقدمة عن NoSQL بالعربي
 
NOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfNOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdf
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
 
Report 2.0.docx
Report 2.0.docxReport 2.0.docx
Report 2.0.docx
 
No sql database
No sql databaseNo sql database
No sql database
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
 
Report 1.0.docx
Report 1.0.docxReport 1.0.docx
Report 1.0.docx
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
NoSQL(NOT ONLY SQL)
NoSQL(NOT ONLY SQL)NoSQL(NOT ONLY SQL)
NoSQL(NOT ONLY SQL)
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
Nosql
NosqlNosql
Nosql
 
Nosql
NosqlNosql
Nosql
 
Artigo no sql x relational
Artigo no sql x relationalArtigo no sql x relational
Artigo no sql x relational
 
Unit-10.pptx
Unit-10.pptxUnit-10.pptx
Unit-10.pptx
 
the rising no sql technology
the rising no sql technologythe rising no sql technology
the rising no sql technology
 
DATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptx
DATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptxDATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptx
DATABASE MANAGEMENT SYSTEM-MRS. LAXMI B PANDYA FOR 25TH AUGUST,2022.pptx
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
 

More from Datio Big Data

Búsqueda IA
Búsqueda IABúsqueda IA
Búsqueda IA
Datio Big Data
 
Descubriendo la Inteligencia Artificial
Descubriendo la Inteligencia ArtificialDescubriendo la Inteligencia Artificial
Descubriendo la Inteligencia Artificial
Datio Big Data
 
Learning Python. Level 0
Learning Python. Level 0Learning Python. Level 0
Learning Python. Level 0
Datio Big Data
 
Learn Python
Learn PythonLearn Python
Learn Python
Datio Big Data
 
How to document without dying in the attempt
How to document without dying in the attemptHow to document without dying in the attempt
How to document without dying in the attempt
Datio Big Data
 
Developers on test
Developers on testDevelopers on test
Developers on test
Datio Big Data
 
Ceph: The Storage System of the Future
Ceph: The Storage System of the FutureCeph: The Storage System of the Future
Ceph: The Storage System of the Future
Datio Big Data
 
Datio OpenStack
Datio OpenStackDatio OpenStack
Datio OpenStack
Datio Big Data
 
Quality Assurance Glossary
Quality Assurance GlossaryQuality Assurance Glossary
Quality Assurance Glossary
Datio Big Data
 
Data Integration
Data IntegrationData Integration
Data Integration
Datio Big Data
 
Gamification: from buzzword to reality
Gamification: from buzzword to realityGamification: from buzzword to reality
Gamification: from buzzword to reality
Datio Big Data
 
Pandas: High Performance Structured Data Manipulation
Pandas: High Performance Structured Data ManipulationPandas: High Performance Structured Data Manipulation
Pandas: High Performance Structured Data Manipulation
Datio Big Data
 

Databases and how to choose them

  • 2. Databases and how to choose them - January 2017. Index: 1. Key concepts 2. Database types 3. Use cases 4. Best and bad practices
  • 3. Key concepts
  • 4. ACID vs BASE
    ● ACID:
      ● Atomicity. A transaction is a group of operations performed against a database as a single unit. If one operation in a transaction fails, the entire transaction fails and none of its changes are applied.
      ● Consistency. This is usually defined as the property guaranteeing that a transaction brings the database from one valid state (in a formal sense, not a functional one) to another. In ACID, consistency only implies compliance with the defined rules, such as constraints, triggers, etc.
      ● Isolation. Each transaction must be independent of the others, meaning that it should not “see” the effects of other concurrent operations.
      ● Durability. Once a transaction is complete, its result survives system failure, power loss and other kinds of breakdown.
    ● BASE:
      ● Basically Available. The system guarantees availability in the sense that there will be a response to any request (though it could be inconsistent data or even an error).
      ● Soft state. While the system converges from eventual consistency to actual consistency, its state can change over time, even when no input operation is being performed on the database. Thus, the state of the system is called “soft”.
      ● Eventual consistency. Once the system stops receiving input and the data has propagated to every node, it will eventually become consistent.
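The atomicity property described above can be tried out with Python's built-in `sqlite3` module. This is only a minimal sketch: the `accounts` table, its `CHECK` constraint and the `transfer` and `balances` helpers are hypothetical names chosen for the illustration, not something taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # rule enforced by the database
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit made just
        # before it was rolled back as well.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 500))  # overdraft: the transfer fails
print(balances(conn))                       # neither account has changed
```

The credit to `bob` is executed first on purpose: when the later debit fails, the rollback shows that an already-applied step of the same transaction is undone too.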
  • 5. CAP THEOREM
    ● CAP:
      ● Consistency. C in CAP actually means “linearizability”, which is a very specific and strong notion of consistency that has nothing to do with the C in ACID (indeed, it has more to do with Atomicity and Isolation). A typical way to define it is: “if operation B started after operation A successfully completed, B must see the system in the same state as it was on completion of operation A, or a newer state”. Thus, a system is consistent if an update is applied to all nodes at the same time.
      (Figure: after a write name=Alice, a later read name? returns Alice.)
  • 6. CAP THEOREM
    ● CAP:
      ● Availability. A in CAP is defined as “every request received by a non-failing database node must result in a non-error response”. This is both a strong and a weak requirement: 100% of the requests must return a response, but the response can take an unbounded (but finite) amount of time. Since people tend to care more about latency, a very slow response usually makes a system “unavailable” from the users’ point of view.
  • 7. CAP THEOREM
    ● CAP:
      ● Partition Tolerance. P in CAP means… well, it is not clear. Some definitions state that the system keeps on working even if some nodes, or the connection between two of them, fail. This kind of definition is what leads people to apply the CAP theorem to monolithic, single-node relational databases (which would qualify as CA). A multi-node system that did not require partition tolerance would have to run on a network that never drops messages and whose nodes can’t fail. Since no such system exists, P in CAP cannot be given up by choice.
  • 8. CAP THEOREM
  • 9. Isolation
    ● Isolation. In database systems, isolation determines how transaction integrity is visible to other users and systems. Though it is often treated in a relaxed way, this ACID property of a DBMS (Database Management System) is an important part of any transactional system. It specifies when and how the changes made by one operation become visible to other operations running in parallel. Acquiring locks on data is the way to achieve a good isolation level: the more locks a running transaction takes, the higher its isolation level. On the other hand, locks have an impact on performance.
  • 10. Isolation
    ● Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE.
    ● Concurrency phenomena: DIRTY READS, UNREPEATABLE READS, PHANTOM READS.
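The slide only names the levels and the phenomena. As a hedged illustration, the sketch below encodes the usual ANSI SQL-92 mapping between them; the `PERMITTED` table and the `weakest_level_preventing` helper are hypothetical names, and individual engines may be stricter than the standard at a given level.

```python
# Phenomena that each ANSI SQL-92 isolation level still permits,
# ordered from the weakest level to the strictest one.
PERMITTED = {
    "READ UNCOMMITTED": {"dirty read", "unrepeatable read", "phantom read"},
    "READ COMMITTED": {"unrepeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def weakest_level_preventing(*phenomena):
    """Return the least strict level that rules out all given phenomena."""
    unwanted = set(phenomena)
    unknown = unwanted - PERMITTED["READ UNCOMMITTED"]
    if unknown:
        raise ValueError(f"unknown phenomena: {sorted(unknown)}")
    for level, permitted in PERMITTED.items():  # dicts keep insertion order
        if not (unwanted & permitted):
            return level


print(weakest_level_preventing("dirty read"))
print(weakest_level_preventing("dirty read", "phantom read"))
```

Picking the weakest level that still prevents the phenomena an application cares about is one way to limit the locking cost mentioned on the previous slide.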
  • 11. Indexes
    ● Indexes. A database index is a data structure that speeds up searches on a database table, at the cost of slower writes and extra storage space, since the index data structure has to be maintained on every write. Indexes are used to locate data quickly without having to scan every row in a database table.
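To see an index change how a query is resolved, the following sketch asks SQLite (through Python's `sqlite3` module) for its query plan before and after creating an index. The `users` table, the `idx_users_age` index and the `query_plan` helper are assumptions made for this example; the exact wording of the plan depends on the SQLite version, so no output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users (name, age) VALUES (?, ?)",
                 [("Alice", 23), ("Bob", 42), ("Charles", 34)])


def query_plan(conn, sql):
    """Return the textual 'detail' column of SQLite's plan for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


sql = "SELECT name FROM users WHERE age = 42"

plan_before = query_plan(conn, sql)   # no index on age: full table scan
conn.execute("CREATE INDEX idx_users_age ON users (age)")
plan_after = query_plan(conn, sql)    # the planner can now use the index

print(plan_before)
print(plan_after)
```

The write-side cost is not measured here: every later `INSERT` or `UPDATE` on `age` also has to update `idx_users_age`.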
  • 12. Indexes
    ● Inverted indexes. An inverted index is a data structure that maps content to its locations in a database file (in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing and intensive use of resources.
    Example documents: 1: “This document can be stored in ElasticSearch.” 2: “ElasticSearch is a document oriented database”
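The two example documents from the slide are enough to sketch the idea in Python. This is a deliberately naive illustration (whitespace tokenizing, no ranking, no term positions) and is not meant to describe how ElasticSearch builds its indexes internally; `tokenize`, `build_inverted_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict

docs = {
    1: "This document can be stored in ElasticSearch.",
    2: "ElasticSearch is a document oriented database",
}

PUNCTUATION = ".,;:!?"


def tokenize(text):
    """Split on whitespace, strip punctuation and lowercase each term."""
    terms = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [term for term in terms if term]


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_inverted_index(docs)
print(search(index, "document"))
print(search(index, "stored elasticsearch"))
```

A lookup touches only the entries for the query terms, which is why full-text search is fast; the price is that the index must be built and kept up to date for every term of every document.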
  • 13. Sharding
    ● Sharding. Shards are partitions of data within a database. Since each partition is smaller than the whole database, a query that uses the shard key (the field that determines the partition) avoids a full scan, which can improve search performance dramatically. On the other hand, sharding implies a strong dependency on the network, with higher latency when several shards are queried, as well as consistency concerns when data is replicated among several shards (as it should be, for high-availability needs). It also introduces additional complexity in design (the partition key must be chosen carefully) and in development (load balancing, replication, failover, etc.).
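The routing idea behind a shard key can be sketched in a few lines of Python. The example assumes a simple hash-modulo scheme, which is only one of several possible strategies (range-based and consistent hashing are common alternatives), and `shard_for` and `ShardedStore` are hypothetical names for an in-memory stand-in, with no network, replication or failover.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard key to a shard number with a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """In-memory stand-in for a database split into several shards."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # A query by shard key touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def find(self, predicate):
        # A query by any other field has to visit every shard.
        return [value
                for shard in self.shards
                for value in shard.values()
                if predicate(value)]


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Alice", "age": 23})
store.put("user:2", {"name": "Bob", "age": 42})

print(store.get("user:2"))
print(store.find(lambda user: user["age"] > 30))
```

The contrast between `get` and `find` is the design point: the first is cheap because the key alone decides where to look, while the second pays the cross-shard cost the slide warns about.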
  • 14. Database types
  • 15. Database types
    ● Database types. As a first approach, we consider the following kinds of databases:
      ● Relational
      ● Key-value, column-oriented
      ● Document-oriented
      ● Graph
    We deliberately exclude the popular key-value type, because its players take a naive approach to several production use cases and because some of its features overlap with those of the types listed above.
  • 16. Database types
    ● Relational and columnar storage. The concept of relational databases is widely known and involves some of the topics already covered in this document, especially ACID. Recently, RDBMSs have also started to cover the need for schema-less data, so their strengths are consistency under heavy read and write loads and the widespread knowledge of both their design and their query language.
    Columnar storage can be seen as a transposition of the usual row storage, meaning that the data is laid out column by column: 1, 2, 3; Alice, Bob, Charles; Adams, Brown, Cooper; 23, 42, 34
    Columnar models are very useful for some use cases. A common example is selecting a single field, or calculating an average. Instead of going through every row and reading the field age, a columnar model allows going straight to the area where age is stored. These models are still relational (thus, ACID), and they are suitable for use cases that need very good read performance up to a certain limit in volume (say, under one terabyte).
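The transposition described above can be shown with the same three sample users in plain Python. This is only a sketch of the two layouts in memory, not of how any particular engine stores data on disk; `to_columns` is a hypothetical helper.

```python
# Row storage: one record per user.
rows = [
    {"id": 1, "name": "Alice", "surname": "Adams", "age": 23},
    {"id": 2, "name": "Bob", "surname": "Brown", "age": 42},
    {"id": 3, "name": "Charles", "surname": "Cooper", "age": 34},
]


def to_columns(rows):
    """Transpose a list of records into one list of values per field."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited just to read one field.
average_from_rows = sum(row["age"] for row in rows) / len(rows)

# Columnar layout: only the 'age' column is read.
ages = columns["age"]
average_from_columns = sum(ages) / len(ages)

print(columns["age"])
print(average_from_columns)
```

Both computations give the same average; the difference is how much unrelated data has to be read to get there, which is what makes the columnar layout attractive for aggregations over a single field.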
  • 17. Database types
    ● Column-oriented databases.
      ● Columnar storage in relational databases is commonly confused with column-oriented databases, such as Cassandra.
      ● Column-oriented databases store data in column families, as rows that have many columns associated with a row key. Column families are groups of related data that are often accessed together.
      ● Each column family can be compared to a container of rows in an RDBMS table, where the key identifies the row and the row consists of multiple columns (this is where the key-value concept appears).
      ● These databases depend strongly on design, since they are meant to be accessed by key. Secondary indexes are allowed, but they do not perform well enough for operational needs.
    Example (column family users): 1 → “Name”: “Alice”, “Surname”: “Adams”, “age”: “23”; 2 → “Name”: “Bob”, “Surname”: “Brown”, “age”: “42”; 3 → “Name”: “Charles”, “Surname”: “Cooper”, “age”: “34”
  • 18. Database types
    ● Document-oriented databases.
      ● Just as the name suggests, document-oriented databases store documents, typically in JSON format. They are a kind of key-value store, with the nuance that the documents have an internal structure which the engine uses to query the data.
      ● The way of viewing the data looks similar to that of relational databases, except that no schema or relational constraints are required.
      ● The main difference between the two worlds lies in the ACID vs BASE distinction, which translates into horizontal scaling capabilities.
      ● Thus, these systems can offer good performance when operating on several terabytes.
      ● ElasticSearch is an unusual example of a document-oriented database. It is very well suited to Full Text Search, and its capabilities (which rely on the aforementioned inverted indexes) allow it to resolve searches that are not defined in advance within the response times that operational use cases require.
  • 19. Database types
    ● Graph databases.
      ● Graph databases use the mathematical concept of a graph to store data. Graphs consist of nodes, edges and properties, which are used to query for the desired information.
      ● The main advantage of these systems is their high performance in use cases that would involve a lot of SQL joins, since those cases come down to following the relations between nodes.
      ● Write performance (and read performance without joins) is below that offered by other systems, so these databases are quite polarized with regard to the use case.
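Following relations between nodes, instead of joining tables, can be sketched with a breadth-first traversal over an adjacency list. The `follows` graph and the `within_hops` helper below are hypothetical and kept in memory; a real graph database adds storage, properties on nodes and edges, and a query language on top of this idea.

```python
from collections import deque

# Directed edges: who follows whom (made-up sample data).
follows = {
    "alice": ["bob", "charles"],
    "bob": ["dave"],
    "charles": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def within_hops(graph, start, max_hops):
    """Return {node: distance} for nodes reachable in at most max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


print(within_hops(follows, "alice", 2))
```

In a relational model, each extra hop would typically be another self-join on a relation table; here it is one more step along the edges already attached to each node.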
  • 20. Database types
    ● High-level comparison.
      ● Relational (row-based)
        Basic description: Data structured in rows
        Strengths: ACID; good performance; low complexity
        Weaknesses: Scalability
        Typical use cases: Online operational with ACID needs
        Key players: PostgreSQL
      ● Relational (columnar)
        Basic description: Data stored in columns
        Strengths: ACID; good read performance
        Weaknesses: Scalability; counter-intuitive
        Typical use cases: Read-only without scaling out
        Key players: PostgreSQL
      ● Document-oriented
        Basic description: Data stored in (potentially) unstructured documents
        Strengths: Scalability; good read performance
        Weaknesses: Consistency; complexity
        Typical use cases: Heavy reads with a high volume of records
        Key players: ElasticSearch, MongoDB
      ● Key-value column-oriented
        Basic description: Data structured as key-value maps
        Strengths: Scalability; good write performance
        Weaknesses: Strong design dependency, use-case polarization
        Typical use cases: Heavy writes with a high volume of records and reads by key
        Key players: Cassandra
      ● Graph
        Basic description: Data structured as nodes and edges (graphs) with relations
        Strengths: ACID; good read performance
        Weaknesses: Scalability; complexity; counter-intuitive
        Typical use cases: SQL joins (relations)
        Key players: Neo4J
  • 21. Database types ● Radar graph.
  • 22. Use cases
  • 23. Use cases
    ● CRUD over an entity
      ● For typical CRUD operations (and, maybe, listing) over a certain entity, in a RESTful way, the very first option should be an RDBMS. RDBMSs provide:
        ○ good write and read performance
        ○ (typically) lots of features
        ○ (typically) the advantage of SQL modeling and the SQL language, which makes them straightforward to use.
      ● Note that CRUD over an entity usually implies accessing data by a unique key, which would be the entity id. Accessing one or several (listing) entities by other fields requires creating indexes.
      ● Both scenarios fit well in an RDBMS as long as the fields in the WHERE clause are known in advance, but the possibility of scaling out has to be considered. If the volume of data may grow too much, a document-oriented database could be the logical alternative.
      ● In particular, MongoDB covers essentially the same use cases as PostgreSQL: the former is the choice when volume is (or could be) high, and the latter when ACID capabilities are more important.
  • 24. Use cases
    ● FTS or searching by any field
      ● Searching by any field involves creating lots of indexes, given the way PostgreSQL or MongoDB handle them.
      ● Using ElasticSearch instead would be much more effective. The same logic applies to Full Text Search, where the inverted indexes of Elastic are the solution.
      ● The intensive use of resources made by ElasticSearch prevents it from being used in other use cases, like the aforementioned CRUD over an entity or much more specific accesses (by id or by known fields).
  • 25. Use cases
    ● High-volume loads
      ● Cassandra is the system that provides the best write performance and scalability.
      ● A typical use case could be a log system, if it is only accessed by date or by component name.
      ● If there is a high volume of online writes, but access cannot be done by a unique field, then we can choose among the other products, according to the previous considerations.
      ● It is important to know that reindexing operations over the database have a big impact on performance. If it is not possible to switch off the indexes while writing (as in a typical online operational system), MongoDB and PostgreSQL could be worse options than ElasticSearch.
      ● On the other hand, in scenarios with heavy writes and heavy reads, consistency becomes relevant, so PostgreSQL may have the edge.
  • 26. Use cases
    ● Relations
      ● Fraud detection and recommendation engines are typical cases in which a lot of SQL joins are needed, since they are all about querying several entities of the same type by a variety of fields, and maybe relating them to entities of a different type.
      ● In a graph, that amounts to following a path among several nodes, so a graph database is natively more efficient.
      ● Scalability or consistency could be concerns in those cases.
  • 27. Use cases
    ● Analytics
      ● Analytics use cases usually involve:
        ○ a huge volume of data
        ○ much more relaxed processing times
        ○ a much lower level of concurrency.
      ● For those cases, jobs accessing a DFS (distributed file system) can be enough.
  • 28. Best and bad practices
  • 29. Best and bad practices
    ● Best practices:
      ● Choose the right database for each use case.
      ● A new “materialized view” is better than fighting with problems. There is no silver bullet.
      ● Avoid BLOBs.
      ● Schemas are good: they keep order and are intuitive.
      ● Mind the CAP.
  • 30. Best and bad practices
    ● Bad practices:
      ● Over…
        ○ indexing
        ○ normalization
        ○ provisioning of resources
      ● Relational mindset
      ● Split brain
      ● Fashion victim
  • 31. Questions
  • 32. Thanks!