| © Copyright 2022, InfluxData
How to choose the right database for your application
Charles Mahler, InfluxData
November 2-3
Influxdays.com
Agenda
● Overview of the current database landscape and trends
● What makes databases perform differently?
● Overview of different database types
● Future Database trends/features
● Q/A
Things to keep in mind
● Don’t fall for hype: keep it simple
● Nothing is magical; there are always tradeoffs
● Many databases overlap across categories and use cases
● Some generalizations are made here for simplicity
Types of Databases
● Relational
● Document
● Key-Value
● Graph
● Time series
● Columnar
● In-Memory
● Search
● Vector
● NewSQL
● Multi-model
Current Database Landscape
Database Trends
What makes databases perform differently?
Database Design Considerations
● On disk data format
● Indexing data structure
● Disk-based vs memory-based
● Compression
● Hot vs Cold data
● Durability/Recovery
How to Choose the Right Database
● Data access patterns
● Transaction isolation requirements
● Scalability requirements
● Business stage
● In-house knowledge
Relational vs NoSQL
Relational:
● B-tree index
● Table-based structure
● Defined schema
● Strong consistency (ACID)
● Vertical scaling
NoSQL:
● LSM tree index
● Multiple types of storage
● Flexible schema
● Consistency tradeoffs (BASE)
● Horizontal scaling
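To make the LSM tree side of the comparison concrete, here is a minimal sketch of the write path, assuming a toy design: an in-memory memtable that is flushed to immutable sorted runs. The class name `TinyLSM` and the `memtable_limit` parameter are hypothetical, and real engines add a write-ahead log, compaction, and on-disk files that are not shown.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs (all in memory)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # most recent writes
        self.runs = []       # flushed sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the memtable, so they never rewrite existing runs.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A flush emits one sorted run in a single sequential pass.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):   # the newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Reads may have to check several runs, which is the usual price paid for cheap writes; a B-tree makes roughly the opposite tradeoff by updating its pages in place.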
Relational Databases
Relational Database Overview
● Data follows the relational model and is stored in tabular form
● Stored on disk as rows
● SQL is used for queries
● Examples: PostgreSQL, MySQL
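As a small illustration of tables with a defined schema queried through SQL, the sketch below uses Python's built-in `sqlite3` module rather than PostgreSQL or MySQL. The `users` and `orders` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), total REAL)"
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0)],
)

# Related rows live in separate tables and are combined at query time with a join.
rows = conn.execute(
    """
    SELECT u.name, SUM(o.total)
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
    """
).fetchall()
```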
Relational Database Pros and Cons
Pros:
● Solid performance across a broad range of applications
● Strong ecosystem and battle tested
● Data consistency
Cons:
● Challenging to scale horizontally
● Can struggle with high write volume workloads
● Defined schema
Relational Database Use Cases
● Suitable for most general applications
● Any application where ACID is required
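The atomicity part of ACID can be sketched with `sqlite3` from the Python standard library. The `accounts` table, the `transfer` helper, and the balances are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically; returns True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft, so the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 150)` credits bob first and then fails on the debit, and the rollback removes the credit as well.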
Key-Value Databases
Key-Value Database Overview
● Simplest type of NoSQL database: effectively a hash table
● Keys point to an unstructured blob of data
● Examples: DynamoDB, Redis, Riak
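The "effectively a hash table" description can be sketched in a few lines. `KeyValueStore` is a hypothetical in-process stand-in, not the API of DynamoDB, Redis, or Riak; note that the store never looks inside the blob, which is why querying is limited to lookups by key.

```python
import json


class KeyValueStore:
    """Toy key-value store: opaque byte blobs addressed only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, blob: bytes) -> None:
        self._data[key] = blob

    def get(self, key: str):
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
# The application decides how to encode the value; the store sees only bytes.
store.put("session:42", json.dumps({"user": "ada", "cart": [3, 7]}).encode())
session = json.loads(store.get("session:42"))
```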
Key-Value Database Pros and Cons
Pros:
● Great read and write performance
● Flexible schema
● Scalable
Cons:
● Weak consistency
● Minimal querying capabilities
Key-Value Database Use Cases
● Personalized recommendations
● Session management
● Real-time features
Document Databases
Document Database Overview
● Extension of the key-value database with added support for metadata
● Semi-structured data format
● Documents hold all relevant data rather than relying on joins
● Examples: MongoDB, CouchDB
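The idea that a document holds all relevant data can be shown with plain JSON. The order document, its field names, and the `order_total` helper are invented for this sketch, and no real document database client is used.

```python
import json

# Customer details and line items are embedded in the order itself,
# so reading one order needs a single lookup and no join.
order_doc = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "kb-01", "qty": 1, "price": 49.0},
        {"sku": "ms-02", "qty": 2, "price": 19.5},
    ],
}

# A "collection" here is just a mapping from document id to serialized JSON.
collection = {order_doc["_id"]: json.dumps(order_doc)}


def order_total(collection, order_id):
    doc = json.loads(collection[order_id])
    return sum(item["qty"] * item["price"] for item in doc["items"])
```

The flip side is the con listed for this category: if the same customer appears in many orders, the embedded copies have to be kept in sync by the application.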
Document Database Pros and Cons
Pros:
● Flexible schema
● Performance
● Data locality
● Documents can map better to application needs
● Scalable
Cons:
● Consistency
● Relationships between documents
Document Database Use Cases
● Suitable for most general applications
● Applications where rapid iteration is beneficial
Graph Databases
Graph Database Overview
● Used for storing and analyzing relationships between connected data sets
● Specialized query languages
● Native vs non-native graph storage
● Examples: Neo4j, Dgraph
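A relationship query such as "who is within two hops of this user" is a graph traversal. The sketch below runs a breadth-first search over a small in-memory adjacency map; the names and the `within_hops` helper are made up, and a graph database would express the same thing in its own query language.

```python
from collections import deque

# Undirected "knows" relationships stored as an adjacency map.
edges = {
    "ada": {"bob", "cy"},
    "bob": {"ada", "dee"},
    "cy": {"ada", "dee"},
    "dee": {"bob", "cy", "eve"},
    "eve": {"dee"},
}


def within_hops(graph, start, max_hops):
    """Return {node: hop distance} for every node reachable within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist
```

In a relational schema each extra hop would typically be another self-join, which is where the "fast queries for graph analysis" advantage comes from.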
Graph Database Pros and Cons
Pros:
● Fast queries for graph analysis
● Developer productivity: query language, built-in algorithms
● Flexible schema
● Easy to establish new relationships between data points
Cons:
● Not ideal if application data isn’t highly connected
Graph Database Use Cases
● Fraud detection
● Social networks
● Recommendation features
Time Series Databases
Time Series Database Overview
● Designed for time series data
● Must be able to handle queries at both ends of the database spectrum (transactional and analytical)
● Optimized for high write throughput and querying based on time ranges
● Built-in features for data lifecycle management
● Examples: InfluxDB, TimescaleDB, QuestDB
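Time-range queries and data lifecycle management both become cheap when points are kept ordered by timestamp. This is a minimal in-memory sketch, not how InfluxDB, TimescaleDB, or QuestDB store data; the `Series` class and its method names are hypothetical.

```python
import bisect


class Series:
    """Append-mostly series of (timestamp, value) points kept sorted by time."""

    def __init__(self):
        self.times = []
        self.values = []

    def write(self, ts, value):
        # In-order writes land at the end, which is the common case.
        i = bisect.bisect_right(self.times, ts)
        self.times.insert(i, ts)
        self.values.insert(i, value)

    def range(self, start, stop):
        """Points with start <= ts < stop, found by binary search."""
        lo = bisect.bisect_left(self.times, start)
        hi = bisect.bisect_left(self.times, stop)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))

    def enforce_retention(self, now, max_age):
        """Drop points older than now - max_age; returns how many were dropped."""
        cut = bisect.bisect_left(self.times, now - max_age)
        del self.times[:cut]
        del self.values[:cut]
        return cut
```

Retention removes a contiguous prefix of old data, whereas updating or deleting an arbitrary point in the middle is the slower path, in line with the con listed for this category.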
Time Series Database Pros and Cons
Pros:
● Very fast data ingest and query performance
● Developer productivity: data retention, aggregations, query languages
Cons:
● Update and deletion performance
Time Series Database Use Cases
● Monitoring
● IoT applications
● Financial data
● Analytics and event tracking
Columnar Databases
Columnar Database Overview
● Data is stored in column format rather than row format
● Column format has benefits for compression and processing speed
● Used heavily for analytics workloads
● Examples: ClickHouse, Vertica, Redshift
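The difference between row and column layout, and why columns compress well, can be sketched with plain Python lists. The sample rows and the `run_length_encode` helper are illustrative only; production systems use more elaborate encodings.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"host": "a", "region": "us", "cpu": 0.5},
    {"host": "b", "region": "us", "cpu": 0.25},
    {"host": "c", "region": "eu", "cpu": 0.75},
]

# Column layout: each field's values are stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


# An aggregate over one field reads a single column and skips the others.
avg_cpu = sum(columns["cpu"]) / len(columns["cpu"])
# Values of the same type and similar content sit side by side, so they compress well.
region_rle = run_length_encode(columns["region"])
```

Rebuilding one full record means touching every column, which matches the con about accessing multiple column values.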
Columnar Database Pros and Cons
Pros:
● More efficient for analytic-type queries
● Better data compression
● SIMD processing
Cons:
● Writing data can be less efficient
● Slower if accessing multiple column values
Performance
Columnar Database Use Cases
● Analytics
● Data warehousing
● Observability
In-Memory Databases
In-Memory Database Overview
● Databases where data is stored entirely in RAM
● Can use data structures optimized purely for memory, without the tradeoffs needed for interacting with disk
● Examples: Redis, Memcached
In-Memory Database Pros and Cons
Pros:
● High performance
● Low latency
● Versatile data types
Cons:
● RAM is more expensive than disk
● Large data sets will need to be scaled horizontally
● Secondary database needed for persistence in most cases
In-Memory Database Use Cases
● Caching
● Real-time applications
● Pub/Sub
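The caching use case usually pairs an in-memory store with a slower persistent database. Below is a minimal sketch of that read-through pattern with expiring entries; `TTLCache`, `read_through`, and the loader callback are hypothetical names, and a plain dict stands in for something like Redis or Memcached.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}      # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)


def read_through(cache, key, load_from_primary):
    """Serve from memory when possible; otherwise load from the primary store."""
    value = cache.get(key)
    if value is None:
        value = load_from_primary(key)
        cache.set(key, value)
    return value
```

Because the cache can lose or expire entries at any time, the primary database remains the source of truth, which is the persistence point made in the cons.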
Search Databases
Search Database Overview
● Designed for efficiently storing and querying text-based documents
● Can handle structured or unstructured data
● Examples: Elasticsearch, Solr, MeiliSearch
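Text search is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below is a toy version with a naive tokenizer and AND-only matching; the helper names and sample documents are invented, and real engines add relevance scoring, stemming, and much more.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [word.strip(".,!?").lower() for word in text.split()]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token:
                index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Disk is full on host A.",
    2: "Host B disk latency is high",
    3: "Login failed for user ada",
}
index = build_index(docs)
```

Every write has to update the postings for each term it contains, which is one reason high write volumes can force tradeoffs in indexing.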
Search Database Pros and Cons
Pros:
● Developer productivity: built-in algorithms, query languages
● Performance
● Scalability
Cons:
● With high write volumes you may need to sacrifice indexing or replication
Search Database Use Cases
● Full-text search
● Log analysis
● Autocompletion
● Analytics
Vector Databases
Vector Database Overview
● Designed for storing and searching vector embeddings of unstructured data
● Allows for searching of images, videos, text, audio, etc.
● Examples: Milvus, Pinecone
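Searching embeddings comes down to finding the stored vectors closest to a query vector. This sketch does a brute-force scan with cosine similarity over three made-up, three-dimensional embeddings; dedicated vector databases rely on approximate nearest-neighbor indexes instead of scanning everything, and the function names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(embeddings, query, k=1):
    """Return the k item names whose vectors are most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(embeddings[name], query),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings; in practice these come from a machine learning model.
embeddings = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "invoice pdf": [0.0, 0.2, 0.9],
}
matches = nearest(embeddings, [1.0, 0.2, 0.0], k=2)
```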
Vector Database Pros and Cons
Pros:
● Efficient vector search
● Scalability
● Hybrid storage
Cons:
● Very specialized database; the overhead might not be worth it
Vector Database Use Cases
● Duplicate removal
● Anomaly detection
● Ranking and recommendation engines
● Similarity search for unstructured data
● Semantic search
NewSQL Databases
NewSQL Database Overview
● Attempt to get the best of both worlds by combining features of relational and non-relational databases
● Capable of strong consistency while maintaining high availability
● Examples: CockroachDB, Spanner, TiDB
NewSQL Database Pros and Cons
Pros:
● SQL support
● Horizontal scalability
● Typically “cloud native” architecture friendly
Cons:
● Complexity, even if hidden
● Potential latency
● Some SQL limitations
● Very new technology
NewSQL Database Use Cases
● Work for most general applications, like a normal relational database
● Some can also support analytics workloads
Future database trends
● Multi-Model Databases
● ML Indexing
○ 2-3x faster reads
○ Index several orders of magnitude smaller
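ML indexing is usually described as replacing a tree index with a model that predicts where a key sits in sorted data. The sketch below is one simple way to illustrate that idea, assuming a single linear model plus a bounded local search over non-empty, unique integer keys; the `LearnedIndex` class is hypothetical and makes no claim about the speed or size figures above.

```python
import bisect


class LearnedIndex:
    """Linear model mapping key -> position in a sorted list, plus an error bound."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(sorted_keys))
        # Least-squares fit of position as a function of key.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over stored keys bounds the search window.
        self.max_error = max(
            abs(self._predict(k) - p) for p, k in enumerate(sorted_keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_error)
        hi = min(len(self.keys), guess + self.max_error + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The whole "index" here is two floats and an error bound, which hints at why a learned index can be small; how well it works depends on how closely the model fits the key distribution.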
Questions?
Website - influxdata.com
Slack - influxdata.com/slack
InfluxDB University - university.influxdata.com
InfluxDays - influxdays.com