Demystifying the Distributed Database Landscape

A survey of technologies in 2021

Peter Corless
Director of Technical Advocacy, ScyllaDB
+ Listen to & share user stories
+ Write blogs & case studies
+ Play (and design) strategy & roleplaying games

Distributed Database Landscape 2021
SQL
+ Distributed SQL
+ NewSQL
NoSQL
+ Key-value
+ Document
+ Wide-column
+ Graph
Multi-model
+ SQL + NoSQL
+ Multiple NoSQL
Production Environments
+ On-premises
+ Co-location
+ Public cloud
+ Private cloud
+ Hybrid cloud
+ Multicloud
+ Edge
+ IoT / Embedded
Business / Use Models
+ Open Source License
+ Enterprise License
+ OEM License
+ Service Agreements
Use Cases
+ OLTP
+ OLAP
+ HTAP
+ Time Series

This Next Tech Cycle
The wave of innovation we’re currently riding.

Hardware, software, and methodologies are all co-evolving to create this next tech cycle.

This Next Tech Cycle
[Timeline chart, 2000 to 2025+]
Transistor Count / Core Count
+ Pentium 4 (2000): 42M transistors, 1 core
+ Pentium D (2005): 228M transistors, 2 cores
+ Xeon Nehalem-EX (2010): 2.3B transistors, 8 cores
+ SPARC M7 (2015): 10B transistors, 32 cores
+ Epyc Rome (2019): 39B transistors, 64 cores
+ Epyc Genoa (2022): ~60B? transistors, 96 cores
+ Epyc Bergamo (2023): ~80B? transistors, 128 cores
Broadband Speeds
+ 1.5 Mbps (2002)
+ 16 Mbps (2008)
+ 105 Mbps (2014)
+ 1 Gbps (2018)
+ 3 Gbps (2021)
Wireless Services
+ 3G (2002)
+ 4G (2014)
+ 5G (2018)
Public Cloud
+ AWS (2006)
+ GCP (2008)
+ Azure (2010)
Zettabyte Era (1 ZB = 10^21 bytes)
+ 2 ZB data stored (2010)
+ 1.2 ZB IP traffic (2016)
+ 64 ZB data stored (2020)
+ ~180 ZB data stored (2025)
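
The growth implied by the timeline can be made concrete with a little arithmetic. The sketch below is not from the original deck; it takes two pairs of data points from the chart and computes the average time per doubling. The function name `doubling_time_years` is a hypothetical helper introduced only for illustration.

```python
import math

def doubling_time_years(start_value, end_value, start_year, end_year):
    """Average number of years per doubling between two data points."""
    doublings = math.log2(end_value / start_value)
    return (end_year - start_year) / doublings

# Transistor counts from the timeline: Pentium 4 (2000) to Epyc Rome (2019).
transistor_doubling = doubling_time_years(42e6, 39e9, 2000, 2019)

# Data stored: 2 ZB (2010) to 64 ZB (2020), a 32x (2**5) increase.
data_doubling = doubling_time_years(2, 64, 2010, 2020)

print(f"Transistors doubled roughly every {transistor_doubling:.1f} years")
print(f"Stored data doubled roughly every {data_doubling:.1f} years")
```

Both series come out at close to one doubling every two years, which is one way to read the claim that hardware and data volumes are co-evolving.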

Hardware Still Vertically Scaling
+ Compute
  + From >100 cores to >1,000 cores per server
  + From multicore CPUs → full System on a Chip (SoC) designs (CPU, GPU, Cache, Memory)
+ Memory
  + Terabyte-scale RAM per server
  + DDR5: 4600 MHz in 2020, 8000 MHz by 2024
  + DDR6: 9600 MHz by 2025
  + Persistent memory: memory mode
+ Storage
  + Petabyte-scale storage per server
  + NVMe 2.0 [2021]: separation of base and transport
  + Persistent memory: app direct (storage) mode

Methodologies Still Evolving
+ Agile [c. 2000]
+ CI/CD = CI [1991] + CD [2009]
+ DevOps [2009]
+ Chaos Monkey [2011]
+ Kubernetes [2014]
+ GitOps [2017]
+ DevSecOps [2018]
[Images: How It Started · How It’s Going · How It Evolved]

Hybrid & Multicloud is Now-ish

Poll Question
How much data do you have under management in your own transactional database systems?
+ <1 terabyte
+ 1 to 50 terabytes
+ 50 to 100 terabytes
+ >100 terabytes

The Distributed
Database Landscape
Here there be monstrous databases!

DB-Engines.com
+ 381 databases
+ Some are distributed databases
+ Others are not distributed databases
+ Some are SQL
+ Some are NoSQL
+ Some support both SQL + NoSQL
+ Some support multiple NoSQL types
+ Some are… not easily classifiable
+ A huge industry with some well-known names
+ But popularity (by itself) ≠ fitness for use for your use case

Top 100 Databases
+ Narrowing the field helps scope the analysis
+ Still results in a wide variety of databases
+ Many SQL
+ Many NoSQL
+ ScyllaDB is in the Top 100!

Top 100 Databases (and Database-like systems) on DB-Engines.com [as of November 2021]
+ 49 SQL
+ 32 NoSQL
+ 5 Both SQL + NoSQL
+ 5 Search Engines
+ 6 Time Series
+ 3 Others

Are these all really distributed databases?

What do you mean by a “Distributed Database?”
+ Clustering & Distribution Strategies
  + Local clustering: multiple nodes in the same datacenter share updates
  + Cross-cluster updates: multiple clusters can share data between them
  + Multi-datacenter clustering: geographically, even globally, dispersed, but the same logical cluster
+ Node Roles, High Availability & Failover Strategies
  + Primary-replica (Active-passive; writes to primary only; read-only replicas; “hot standby” modes)
  + Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)
  + Load balancing (client side or service in front of database)
+ Data Replication & Sharding Strategies
  + Replication Factors & Consistency Levels
  + Horizontal Scalability: Manual vs. Auto-sharding
  + Topology Awareness: Rack-awareness, Datacenter-awareness
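
To illustrate how replication factors and consistency levels interact, here is a minimal sketch of the usual quorum arithmetic. It is not taken from the deck or from any specific database's API; `quorum` and `reads_see_latest_write` are hypothetical helper names. The rule it encodes is the standard overlap condition: if the number of replicas acknowledging a read plus the number acknowledging a write exceeds the replication factor, every read set shares at least one replica with every write set.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_acks + write_acks > rf

rf = 3
q = quorum(rf)  # 2 of 3 replicas

quorum_both = reads_see_latest_write(rf, q, q)  # quorum writes + quorum reads
one_and_one = reads_see_latest_write(rf, 1, 1)  # one replica for each

print(q, quorum_both, one_and_one)
```

With a replication factor of 3, quorum reads and writes overlap, while reading and writing a single replica each does not. The overlap is only part of the picture: a real system still needs timestamps or another conflict-resolution scheme to decide which overlapping copy is newest.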

The Short List: Systems of Interest
SQL + NewSQL    | NoSQL
PostgreSQL      | MongoDB
CockroachDB     | Redis
                | ScyllaDB

PostgreSQL — distributed SQL
+ Horizontal Scalability: Manual Sharding vs. Auto-sharding
Part of base offering
Can be added, but not part of base
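
As a rough picture of what manual sharding means in practice, the sketch below routes keys to shards in application code. It is an assumption-laden illustration, not PostgreSQL functionality: the shard names are placeholders, and `shard_for` is a hypothetical helper.

```python
import hashlib

# Hypothetical shard map: placeholder names, not real hosts.
SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]

def shard_for(key: str, shards=SHARDS) -> str:
    """Route a key to a shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would route inconsistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

print(shard_for("customer:42"))
```

With simple modulo routing like this, adding a shard remaps most keys, so the application team owns the data movement. That operational burden is what auto-sharding systems aim to take over.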

CockroachDB — NewSQL
+ Topology Awareness: Rack-awareness*, Datacenter-awareness
* Can be manually configured using localities
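
CockroachDB localities are key-value tiers (for example region, zone, rack) assigned to each node at startup via `--locality`. The sketch below shows the general idea of rack-aware replica placement using such labels. It is a simplified greedy illustration under assumed node names and tiers, not CockroachDB's actual allocator.

```python
from typing import Dict, List

Node = Dict[str, str]

# Hypothetical cluster: node names and locality tiers are assumptions.
NODES: List[Node] = [
    {"name": "n1", "region": "us-east", "rack": "r1"},
    {"name": "n2", "region": "us-east", "rack": "r1"},
    {"name": "n3", "region": "us-east", "rack": "r2"},
    {"name": "n4", "region": "us-east", "rack": "r3"},
]

def place_replicas(nodes: List[Node], replication_factor: int) -> List[str]:
    """Greedily pick nodes, preferring racks that hold no replica yet."""
    chosen: List[Node] = []
    used_racks = set()
    # First pass: at most one replica per rack.
    for node in nodes:
        if len(chosen) == replication_factor:
            break
        if node["rack"] not in used_racks:
            chosen.append(node)
            used_racks.add(node["rack"])
    # Second pass: fill remaining slots when racks are fewer than replicas.
    for node in nodes:
        if len(chosen) == replication_factor:
            break
        if node not in chosen:
            chosen.append(node)
    return [node["name"] for node in chosen]

placement = place_replicas(NODES, 3)
print(placement)
```

Here the three replicas land on three different racks, so losing any single rack leaves two copies available.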

MongoDB — the leading document store

+ Multi-datacenter clustering: geographically, even globally, dispersed, but the same logical cluster*
+ Peer-to-peer, leaderless (Active-Active, multi primaries; can write to any replica; no SPOF)*
+ Replication Factors & Consistency Levels (e.g., strong locally; causal consistency in active-active*)
Redis — key-value in-memory DB/cache
* Redis Enterprise feature

+ Load balancing (client side or service in front of database*)
ScyllaDB
* For DynamoDB-compatible API

But for now, let’s move on...

Where are Distributed
Databases Headed Next?
Time to read the tea leaves

The Trend for SQL
+ Google Trends interest in “SQL” is at 25% of its 2004 rate
+ Book citations for “SQL” peaked in 2008 and were down to 28% of that rate by 2019
+ Back to 1994 levels of interest, basically
+ Still dwarfs other database terms like “NoSQL” or “NewSQL” or “RDBMS”
+ No single term or technology sums up the distributed database market anymore

Where are Distributed Databases Going?
+ Cambrian Explosion will Continue: “What is a database anyway?”
  + Distributed Databases of all kinds
  + Distributed Streaming: “Kafka as a database?” (kSQL says “Yes!”)
  + Distributed Ledgers: “Blockchains/DAGs as a database?”
  + Further fragmentation of the market
+ NoSQL and SQL increasingly blending
  + Evolution of NoSQL back to SQL assumptions
  + Adding back Strong Consistency, Schema Constraints, Strict Typing

Further Trends in Distributed Databases
+ Elasticity: faster provisioning/decommissioning, autoscaling
+ Uncoupling Compute from Storage: Tiered Storage, Plug-in Storage
+ Data over Time
  + Built for Event Streaming, Time Series
+ Data over Space
  + Geospatial queries, Geoindexing
  + Geographic / political boundaries: GDPR, data localization regulatory compliance
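
As a small illustration of what a geospatial query computes, the sketch below filters points by great-circle distance using the haversine formula. It is a brute-force scan for clarity only; databases with geoindexing avoid scanning every point. The site names and approximate coordinates are assumptions made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def within_radius(points, center, radius_km):
    """Names of all points no farther than radius_km from center."""
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat, lon, center[0], center[1]) <= radius_km]

# Hypothetical sites with approximate coordinates.
SITES = {"palo_alto": (37.44, -122.14), "herzliya": (32.16, 34.84)}
nearby = within_radius(SITES, center=(37.77, -122.42), radius_km=100)
print(nearby)
```

The same distance test, combined with a region label per record, is also one simple way to reason about data localization: a record's location determines which sites are allowed to hold it.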

Database as a Development Platform
+ Increasing Focus on Developer Enablement and Developer Experience (DX)
  + APIs for extensibility: extensions, plugins, modules, add-ons, integration layers
  + Database Specific: PostgreSQL extensions, Redis modules
  + Cross-industry: GraphQL, OpenAPI (Swagger), etc.
+ AI/ML integration and incorporation into databases
  + “Building models where your data resides” (Martin Heller, Apr 2021)
  + Amazon Redshift ML
  + BigQuery ML
  + Oracle, Db2, Microsoft SQL Server

Data Teaming
+ Tighter Coupling of Data Engineering + Data Sciences + Operations
  + Repairing rifts of the past decade
  + Bridging huge divides between people and systems
+ From “Data Pipelining” (production-oriented) to “Data Supply Chains” (consumption-oriented)
  + Like “Software Supply Chain,” but for data and data products.

We Need Different Kinds of “Easy”
+ Specializing databases to run in the cloud (and cloud-only)
  + Providing “concierge” services
  + Ecosystem: can integrate into cloud vendor’s (or partners’) offerings
  + Managed for you, at a price
+ Making Open Source databases easier to run at the infrastructure level
  + Making self-managed operations simpler
  + Flexibility: can run on premises or in the cloud
  + Self-service model, so long as you have the skillz

Hope You Enjoyed Your Trip!
http://slack.scylladb.com/

Thanks
People who helped educate me
+ Kostja Osipov
+ Serge Leontiev
Disclaimer
Any errors, omissions, misinterpretations, misrepresentations or misunderstandings are purely my own.
Please send suggestions and corrections to peter@scylladb.com

United States
2445 Faber St, Suite #200
Palo Alto, CA USA 94303
Israel
Maskit 4
Herzliya, Israel 4673304
www.scylladb.com
@scylladb
Learn NoSQL for free!
university.scylladb.com
@petercorless
