Database Awareness
09 Mar 2021
@irensaltali
Who am I?
İren Saltalı
.NET Consultant @kloia
Blog : irensaltali.medium.com
Tweet : @irensaltali
LinkedIn : /in/irensaltali
GitHub : github.com/irensaltali
Agenda
• Why do we need Database Awareness?
• Document vs Relational
• Row-based vs Column-based
• In-memory Database vs In-memory Data grids
• Graph
• Time-series
• Solr vs ElasticSearch
• Event Store
Why do we need Database Awareness?
Databases directly affect our system's performance, scalability, durability, consistency,
cost, and even how we write code. We need to choose the database that best meets our demands. To
do that, we have to understand two things:
• How databases work (Database Awareness)
• How our system works (System Awareness)
Document vs Relational
Document
• Unstructured
• Frequent updates to the data structure
• Application-level joins
• Horizontal scaling
• Document-based data modeling
• Examples: MongoDB, Apache CouchDB, Couchbase
Table (relational)
• Schema
• Few or no updates to the data structure
• Server-level joins
• Vertical scaling
• Relational data modeling
• Examples: MSSQL, MySQL, PostgreSQL
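The difference between server-level and application-level joins can be sketched in Python. This is only an illustration: SQLite stands in for a relational server, plain dictionaries stand in for documents, and the `authors`/`posts` data and the `application_join` helper are hypothetical names invented for the example.

```python
import sqlite3

# Relational side: the schema is fixed up front and the server performs the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO posts VALUES (10, 1, 'Database Awareness'), (11, 2, 'Column Stores');
""")
server_join = conn.execute(
    "SELECT p.title, a.name FROM posts p "
    "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
).fetchall()

# Document side: documents need not share a structure (note the extra "city"
# field), and the application combines the two collections itself.
authors = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob", "city": "Yalova"},
]
posts = [
    {"_id": 10, "author_id": 1, "title": "Database Awareness"},
    {"_id": 11, "author_id": 2, "title": "Column Stores"},
]


def application_join(posts, authors):
    """Join posts to authors in application code (hypothetical helper)."""
    by_id = {author["_id"]: author for author in authors}
    return [(post["title"], by_id[post["author_id"]]["name"]) for post in posts]


app_join = application_join(posts, authors)
print(server_join == app_join)
```

Both approaches produce the same pairs here; what differs is where the work happens and who has to maintain the join logic.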
Document vs Relational – Use Cases
Document
• Content management
• Logging
• Storing third-party systems' data
• Web analytics
RDBMS
• Banking/Finance
• Booking
• ERP
Row-based vs Column-based
Name      City      Age
İren      Ankara    34
Seren     Yalova    31
Bilgehan  İstanbul  25
Row-based: İren Ankara 34 | Seren Yalova 31 | Bilgehan İstanbul 25
Column-based: İren Seren Bilgehan | Ankara Yalova İstanbul | 34 31 25
Row-based vs Column-based - Write
New data: Doğa Ankara 2
Row-based: İren Ankara 34 | Seren Yalova 31 | Bilgehan İstanbul 25 | Doğa Ankara 2
Column-based: İren Seren Bilgehan Doğa | Ankara Yalova İstanbul Ankara | 34 31 25 2
Row-based vs Column-based - Read
Row-based: İren Ankara 34 | Seren Yalova 31 | Bilgehan İstanbul 25 | Doğa Ankara 2
Column-based: İren Seren Bilgehan Doğa | Ankara Yalova İstanbul Ankara | 34 31 25 2
• Select * needs every column: İren Seren Bilgehan Doğa | Ankara Yalova İstanbul Ankara | 34 31 25 2
• Select Sum(Age) needs only the Age column: 34 31 25 2
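The two layouts and their write and read behaviour can be sketched with plain Python lists. This is a toy model of the idea, not how any particular engine stores data. The helper names `insert_row_store` and `insert_column_store` are made up for the illustration.

```python
COLUMNS = ("Name", "City", "Age")

rows = [
    ("İren", "Ankara", 34),
    ("Seren", "Yalova", 31),
    ("Bilgehan", "İstanbul", 25),
]

# Row-based layout: all values of one record sit next to each other.
row_store = [value for row in rows for value in row]

# Column-based layout: all values of one column sit next to each other.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def insert_row_store(store, record):
    """A write is a single contiguous append."""
    store.extend(record)


def insert_column_store(store, record):
    """A write touches every column separately."""
    for name, value in zip(COLUMNS, record):
        store[name].append(value)


new_record = ("Doğa", "Ankara", 2)
insert_row_store(row_store, new_record)
insert_column_store(column_store, new_record)

# Sum(Age): the row store has to step across every record to pick out the ages,
# while the column store reads one contiguous list.
sum_from_rows = sum(row_store[i] for i in range(2, len(row_store), len(COLUMNS)))
sum_from_columns = sum(column_store["Age"])
print(sum_from_rows, sum_from_columns)
```

The sketch mirrors the usual trade-off: row layouts favour whole-record writes and reads, column layouts favour aggregates over a few columns.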
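IMDB vs IMDG
In-memory database (IMDB)
• SQL support
• No MPP (massively parallel processing)
• Can replace an RDBMS
• Accessed over the network (network latency)
• Example: Redis
In-memory data grid (IMDG)
• Little or no SQL support
• Massively parallel processing (MPP)
• Can't replace an RDBMS
• Runs on the same server as the application
• Example: Hazelcast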
Graph Databases
A graph is composed of two kinds of elements: nodes and relationships (edges).
• Nodes represent entities.
• Edges (relationships) are the lines that connect nodes
to other nodes.
• Edges can be directed or undirected.
• Edges can store properties represented by key/value pairs.
• High performance on graph-like queries.
Some graph databases: Amazon Neptune, Neo4j, OrientDB
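A minimal sketch of the node and edge model described above, in plain Python. The people, the company, and the `KNOWS`/`WORKS_AT` edge types are hypothetical sample data, and a real graph database would use its own storage and query language instead of these helpers.

```python
from collections import deque

# Nodes represent entities; each node carries its own properties.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "acme": {"label": "Company"},
}

# Directed edges: (source, target, key/value properties).
edges = [
    ("alice", "bob", {"type": "KNOWS", "since": 2015}),
    ("bob", "carol", {"type": "KNOWS", "since": 2019}),
    ("carol", "acme", {"type": "WORKS_AT", "role": "Engineer"}),
]


def neighbors(node):
    """Nodes reachable from `node` by following one outgoing edge."""
    return [target for source, target, _ in edges if source == node]


def reachable(start):
    """Breadth-first traversal: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


print(reachable("alice"))
```

Because the edges are directed, `reachable("acme")` is empty even though `acme` is reachable from every person in the sample.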
image from https://aws.amazon.com/nosql/graph/
Time Series Database (TSDB)
A time series database (TSDB) is a database optimized for time-stamped
or time series data.
• Built specifically for handling metrics, events, and
measurements that are time-stamped.
• Stores discrete, time-stamped samples of continuously changing values.
• Best for server metrics, application performance monitoring,
network data, sensor data, events, clicks, trades in a market.
Some time series databases: Prometheus, Graphite, InfluxDB, Amazon Timestream
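A typical time series operation is aggregating time-stamped points into fixed windows. The sketch below shows the idea with an in-memory list. The `write` and `downsample` helpers and the sample values are assumptions for illustration and do not correspond to the API of any of the products listed.

```python
from collections import defaultdict
from statistics import mean

points = []  # append-only list of (timestamp_in_seconds, value)


def write(timestamp, value):
    """Record one time-stamped measurement."""
    points.append((timestamp, value))


def downsample(samples, window_seconds):
    """Average the values that fall into each fixed-size time window."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical CPU readings taken at 0 s, 20 s, 60 s and 90 s.
write(0, 10.0)
write(20, 20.0)
write(60, 30.0)
write(90, 50.0)

per_minute = downsample(points, 60)
print(per_minute)
```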
Solr vs ElasticSearch
Solr
• Input formats: XML, CSV, JSON, DB, Word, Pdf
• Data import: JDBC, CSV, XML, Tika, URL, Flat File
• REST, Schemaless
• Query: Lucene Query
• Span queries, Autocomplete, Faceting, Spatial/geo search
• Visualisation: Banana (Port of Kibana)
• Hard to manage scaling
ElasticSearch
• Input format: JSON
• Data import: ActiveMQ, Amazon SQS, CouchDB, DynamoDB, FileSystem, Git,
GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j,
OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter,
Wikipedia
• Schemaless
• Query: Lucene Query, Query DSL
• Span queries, Autocomplete, Faceting, Spatial/geo search
• Visualisation: Kibana
• Built for horizontal scaling
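To make the "Lucene Query" versus "Query DSL" distinction concrete, here is the same search expressed both ways, built as plain Python data. The `title` and `year` fields are hypothetical, the two forms are only roughly equivalent (exact matching depends on how the fields are indexed), and nothing is sent to a server.

```python
import json

# Lucene query string syntax, understood by both Solr and Elasticsearch.
lucene_query = "title:database AND year:[2020 TO *]"

# Elasticsearch can take the Lucene string wrapped in a query_string query...
query_string_body = {"query": {"query_string": {"query": lucene_query}}}

# ...or a roughly equivalent structured request written in the JSON Query DSL.
query_dsl_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "database"}},
                {"range": {"year": {"gte": 2020}}},
            ]
        }
    }
}

print(json.dumps(query_dsl_body, indent=2))
```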
Event Store
An event store is a database optimized for storing events.
• Events are immutable: once written, they are not allowed to change.
• Optimized for writes
• Reproducibility
• Snapshots
Some event stores: IBM Db2 Event Store, EventStoreDB, NEventStore
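The properties above (immutable events, append-only writes, reproducibility, snapshots) can be sketched in a few lines of Python. The bank-account events and the `EventStore`, `apply_event` and `replay` names are hypothetical, and this is not the API of any of the listed products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class Event:
    kind: str
    amount: int


class EventStore:
    """Append-only log; there is deliberately no update or delete."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read(self, start=0):
        return tuple(self._events[start:])

    def __len__(self):
        return len(self._events)


def apply_event(balance, event):
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(store, snapshot=(0, 0)):
    """Rebuild state from a (version, balance) snapshot plus later events."""
    version, balance = snapshot
    for event in store.read(version):
        balance = apply_event(balance, event)
    return len(store), balance


store = EventStore()
store.append(Event("deposited", 100))
store.append(Event("withdrawn", 30))
snapshot = replay(store)            # state after the first two events

store.append(Event("deposited", 5))
from_snapshot = replay(store, snapshot)  # only the newest event is applied
from_scratch = replay(store)             # full replay gives the same state
print(from_snapshot, from_scratch)
```

Replaying from the snapshot and replaying from the beginning give the same result, which is the reproducibility the slide refers to. The snapshot only shortens the replay.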
Q & A
Thank you for listening.
blog.kloia.com @kloia_com
kloia.com
@irensaltali
Resources mentioned in this session will be available on my Twitter.
Sources
• https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models
• https://www.flydata.com/blog/whats-unique-about-a-columnar-database/
• https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
• https://dataschool.com/data-modeling-101/row-vs-column-oriented-databases/
• https://www.youtube.com/watch?v=Vw1fCeD06YI
• https://en.wikipedia.org/wiki/Time_series_database
• https://www.influxdata.com/time-series-database/
• https://logz.io/blog/solr-vs-elasticsearch/
• https://solr-vs-elasticsearch.com/
• https://en.wikipedia.org/wiki/Event_store
• https://docs.microsoft.com/en-us/dotnet/architecture/cloud-native/relational-vs-nosql-data
