High Performance NoSQL Masterclass
Survey of High Performance
NoSQL Systems
Peter Corless
Peter Corless
● Director of Technical Advocacy,
ScyllaDB
● Editor / contributor to ScyllaDB blog
● Program chair for ScyllaDB Summit
and P99 CONF
● Host of ScyllaDB Masterclass series
● @PeterCorless on Twitter
NoSQL Database
Landscape
DB-Engines.com “Top 100”
As of November 2022
NoSQL/Multimodel Databases in the Top 100
Key Value (9)
Redis
Memcached
Hazelcast
Etcd
Ehcache
Aerospike
Riak KV
RocksDB
LevelDB
Wide Column (8)
Apache Cassandra
Amazon DynamoDB
ScyllaDB
Apache HBase
DataStax Enterprise
Azure Table Storage
Google Cloud Bigtable
Accumulo
Document (12)
MongoDB
Couchbase
Firebase Realtime
CouchDB
Google Cloud Firestore
Realm
MarkLogic
Google Cloud Datastore
RavenDB
IBM Cloudant
RethinkDB
PouchDB
Graph (1)
Neo4j
Multimodel (5)
Azure Cosmos DB
ArangoDB
OrientDB
Oracle NoSQL
Yugabyte
Time Series (5)
InfluxDB
kdb+
Graphite
Prometheus
TimescaleDB [SQL]
Document Databases
● “Documents” are encoded formats
○ JavaScript Object Notation (JSON) or
Binary JSON (BSON)
○ Extensible Markup Language (XML)
○ (We’re not talking about managing
PDFs or Word files)
● Allows “tree”-style data models
● “Parent” and “child” nodes
ADVANTAGE
● Easy for developers to get started
DISADVANTAGE
● Primary-replica clustering bottlenecks
write-heavy workloads at scale
Discover more differences: MongoDB vs. ScyllaDB Production Experience from a Dev & Ops Standpoint
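A minimal sketch of the "tree"-style model described above, using Python's standard `json` module. The `order` document and the `walk` helper are hypothetical illustrations, not taken from any particular document database.

```python
import json

# Hypothetical order document: a parent node with nested child nodes.
order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

def walk(node, path=""):
    """Yield (path, value) for every leaf in the document tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{path}/{index}")
    else:
        yield path, node

encoded = json.dumps(order)               # the encoded (JSON) form of the document
leaves = dict(walk(json.loads(encoded)))  # flatten the tree into path -> value
```

Each leaf is reached through its chain of parents, for example `leaves["/items/1/qty"]`.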
Key Value Databases
● Keys are simple indexes for a record
● Values can be simple data types
(e.g., text or integer values), or more
complex (lists, maps, collections)
● Often used for in-memory caching
ADVANTAGE
● Fast, simple
DISADVANTAGE
● Multi-datacenter clustering is an anti-pattern
Why that might be a bad idea: 7 Reasons Not to Put an External Cache in Front of Your Database
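As a rough illustration of the key-value model used as an in-memory cache, here is a small sketch. The `TinyCache` class, its capacity, and the least-recently-used eviction policy are assumptions chosen for the example; the slide does not name an eviction policy.

```python
from collections import OrderedDict

class TinyCache:
    """In-memory key-value cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)          # mark as most recently used
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)
        return self._data[key]

cache = TinyCache(capacity=2)
cache.put("user:1:name", "Ada")               # simple value
cache.put("user:1:tags", ["admin", "beta"])   # more complex value (a list)
cache.get("user:1:name")                      # touch the name so the tags become oldest
cache.put("user:2:name", "Grace")             # exceeds capacity, evicts "user:1:tags"
```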
Graph Databases
● Models domains as vertices
(entities/objects) and edges
(relationships)
● “Edges” are vital for understanding
interrelationships
● Complexity grows as an n² problem
● Query languages need to
understand how to navigate
topology (limit query depth, avoid
infinite loops, etc.), e.g., Cypher,
Gremlin/TinkerPop
ADVANTAGE
● Models object relational complexities
well
DISADVANTAGE
● Data set size often limited by
complexity / computational power
Did you know… You can use ScyllaDB or Cassandra as Storage Backend for JanusGraph?
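The two traversal safeguards mentioned above (limit query depth, avoid infinite loops) can be sketched in plain Python. The `edges` graph and the `reachable` function are hypothetical; a real graph database would express this in Cypher or Gremlin rather than application code.

```python
from collections import deque

# Hypothetical graph: vertices are people, edges are "follows" relationships.
edges = {
    "ann": ["bob", "cat"],
    "bob": ["cat", "ann"],   # bob -> ann closes a cycle
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
}

def reachable(graph, start, max_depth):
    """Breadth-first traversal that stops at max_depth and never revisits a vertex."""
    seen = {start: 0}                 # vertex -> depth at which it was first reached
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        depth = seen[vertex]
        if depth == max_depth:
            continue                  # depth limit: do not expand this vertex
        for neighbor in graph.get(vertex, []):
            if neighbor not in seen:  # visited check prevents infinite loops
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    return seen

within_two = reachable(edges, "ann", max_depth=2)
```

Without the `seen` check, the ann/bob cycle would keep the queue growing forever.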
Wide Column Databases
● Row-based store
● “Key-key-value”
● Can be used as a simple key-value store
● Many (but not all) share the SQL-like
Cassandra Query Language (CQL)
● Designed for horizontal scaleout
● ScyllaDB is also architected for
vertical scale-up
ADVANTAGE
● Great scaleout, global clustering
DISADVANTAGE
● Intimidating to newcomers
The Case for Wide
Column NoSQL
Horizontal (and Vertical) Scalability
● Scale out to any number of
nodes (Cassandra, ScyllaDB)
● Scale up to any number of
cores per node (ScyllaDB)
Wide Column = “Key Key Value”
■ Wide column databases are row-based
● Use partitioning & clustering (or sort) keys
● Mostly used for transaction processing
(OLTP)
● Examples: Cassandra, ScyllaDB, DynamoDB
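One way to picture "key key value" is a map from partition key to rows kept sorted by clustering key. The `WideColumnTable` class and the sensor data below are hypothetical and only model the lookup shape, not how any of the listed databases stores data internally.

```python
import bisect

class WideColumnTable:
    """Toy model: partition key -> rows sorted by clustering key -> value."""

    def __init__(self):
        self._partitions = {}

    def insert(self, partition_key, clustering_key, value):
        keys, rows = self._partitions.setdefault(partition_key, ([], {}))
        if clustering_key not in rows:
            bisect.insort(keys, clustering_key)   # keep clustering keys sorted
        rows[clustering_key] = value

    def get(self, partition_key, clustering_key):
        _, rows = self._partitions[partition_key]
        return rows[clustering_key]

    def range(self, partition_key, start, end):
        """Return rows of one partition with start <= clustering key <= end."""
        keys, rows = self._partitions[partition_key]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [(k, rows[k]) for k in keys[lo:hi]]

readings = WideColumnTable()
readings.insert("sensor-1", "2022-11-01T10:05", 21.5)
readings.insert("sensor-1", "2022-11-01T10:00", 21.0)
readings.insert("sensor-1", "2022-11-01T10:10", 22.0)
readings.insert("sensor-2", "2022-11-01T10:00", 18.4)

window = readings.range("sensor-1", "2022-11-01T10:00", "2022-11-01T10:05")
```

Because rows inside a partition are sorted, a range over the clustering key is a contiguous slice rather than a scan.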
Wide Column ≠ Column Store
■ Don’t confuse a wide column database with
a columnar database (aka column store)
■ Column stores store data in columnar format
● Can count “runs” of repeated values in
columns to minimize data repetition
● Mostly used for analytics processing
(OLAP)
● Examples: Druid, Pinot, ClickHouse,
BigQuery
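Counting "runs" of repeated values is run-length encoding. A minimal sketch follows; the `country` column is made-up sample data, and real column stores combine this with other encodings.

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical column of ten values stored as four runs.
country = ["US", "US", "US", "DE", "DE", "US", "FR", "FR", "FR", "FR"]
runs = run_length_encode(country)
```

The encoding only pays off when equal values sit next to each other, which is why it suits data laid out column by column.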
Automatic Data Sharding & Replication
Autosharding based on Token Ranges
With RF=3, each data record is automatically copied
to two other replica nodes
■ Data automatically partitioned and
balanced across cluster based on
partition key using token ranges
■ Data within partitions is organized
by clustering key (or sort key)
■ Each record is automatically
replicated across cluster based on
replication factor (typically RF=3) to
ensure durability
■ Multi-datacenter replication is built in
Token ranges shown in the diagram (each held by three replica nodes): 0-100, 101-200, 201-300
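The partitioning and replication described above can be sketched with a toy token ring. The three node names, the 0-300 token space, and the MD5-based `token_for` hash are assumptions for illustration; they are not the partitioner or ring layout of any real cluster.

```python
import bisect
import hashlib

# Toy ring: (upper token bound, owning node). Ranges are 0-100, 101-200, 201-300.
RING = [(100, "node-A"), (200, "node-B"), (300, "node-C")]
TOKEN_SPACE = 301

def token_for(partition_key):
    """Map a partition key to a token (illustrative hash, not a real partitioner)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_SPACE

def replicas_for_token(token, rf=3):
    """Primary owner of the token's range, plus the next rf-1 nodes on the ring."""
    bounds = [upper for upper, _ in RING]
    first = bisect.bisect_left(bounds, token) % len(RING)
    count = min(rf, len(RING))
    return [RING[(first + i) % len(RING)][1] for i in range(count)]

# With RF=3 on three nodes, every node holds a copy of every range.
all_nodes = replicas_for_token(token_for("user:42"), rf=3)
# With RF=2, a token in 101-200 lives on its owner and the next node on the ring.
two_copies = replicas_for_token(150, rf=2)
```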
Leaderless Topology
Peer-to-Peer Active-Active (Multi-Datacenter)
Each node accepts reads+writes
Inherently better load balancing
Deals better w/ write-heavy or mixed read-write workloads
■ No single point of failure
■ No bottleneck at a “leader” node
■ Every node can be read-write
Coordinator Node per Operation
■ Client makes request to any
replica node
■ This “coordinator” node forwards
the request to other replicas.
■ Replicas acknowledge the operation
to coordinator, which responds to
client
■ Various forms of load balancing
● Simple round-robin
● Datacenter aware round-robin
● Heat-weighted load balancing
Coordinator Node
Using token awareness, for an update, the coordinator
node will be chosen from one of the current replicas
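The difference between simple round-robin and token-aware coordinator selection can be sketched as follows. The node names, the four-node cluster, and the `replicas` list are hypothetical; real drivers implement these policies internally.

```python
from itertools import cycle

NODES = ["node-A", "node-B", "node-C", "node-D"]

def round_robin(nodes):
    """Simple round-robin: rotate through every node regardless of the key."""
    return cycle(nodes)

def token_aware(replicas, request_number):
    """Token-aware: rotate only among nodes that hold a replica of the key."""
    return replicas[request_number % len(replicas)]

rr = round_robin(NODES)
first_four = [next(rr) for _ in range(4)]

# Hypothetical replica set for one partition key (RF=3 on a four-node cluster).
replicas = ["node-B", "node-C", "node-D"]
picks = [token_aware(replicas, i) for i in range(4)]
```

With token awareness the coordinator is itself a replica, so the request skips the extra hop that a non-replica coordinator would add.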
Tunable Consistency Levels per Operation
■ “AP”-mode as per CAP theorem
● Emphasizes high availability
over strong consistency
■ Many consistency levels
● ONE
● QUORUM
● LOCAL_QUORUM
● EACH_QUORUM
● ALL
● LOCAL_ONE
Example: Quorum Consistency
In a cluster of 3 nodes, so long as 2 of the 3 nodes
succeed, the operation will succeed.
The third node will eventually get updated & be made
consistent, in-sync with the rest of the cluster
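The quorum example above comes down to simple arithmetic: a quorum is a majority of the replicas, floor(RF / 2) + 1. The sketch below covers only three of the levels, and the function names are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

# Replica acknowledgements required per consistency level (subset, for illustration).
REQUIRED_ACKS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}

def operation_succeeds(level, acks, rf=3):
    """True when enough replicas acknowledged for the requested level."""
    return acks >= REQUIRED_ACKS[level](rf)

# The slide's case: RF=3, two replicas answer OK, one does not.
quorum_ok = operation_succeeds("QUORUM", acks=2, rf=3)
all_ok = operation_succeeds("ALL", acks=2, rf=3)
```

The same two acknowledgements satisfy QUORUM but not ALL, which is the availability trade-off that tunable consistency exposes per operation.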
Write & Read Paths
■ Writes are acknowledged once they
are in both the in-memory memtable
and the durable commitlog.
■ Periodically memtables are
flushed to immutable on-disk
Sorted Strings Tables (SSTables)
■ Reads first check the in-memory
row-based cache, then fetch data
from SSTables on disk if needed
■ Bloom filters help the system
figure out where the data is [or
isn’t] stored
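The write and read paths above can be sketched as a toy store. Everything here is a simplification under stated assumptions: Python lists and tuples stand in for on-disk files, a `frozenset` stands in for a Bloom filter, the row cache is invalidated on write, and `None` means "not found". The class names are hypothetical.

```python
import bisect

class SSTable:
    """Immutable, sorted run of key/value pairs plus a membership filter."""

    def __init__(self, items):
        pairs = sorted(items)
        self.keys = tuple(k for k, _ in pairs)
        self.values = tuple(v for _, v in pairs)
        # Stand-in for a Bloom filter. A real Bloom filter is probabilistic:
        # it may report "maybe present" for an absent key, never the reverse.
        self.key_filter = frozenset(self.keys)

    def get(self, key):
        if key not in self.key_filter:
            return None                       # definitely not here, skip the search
        return self.values[bisect.bisect_left(self.keys, key)]

class TinyStore:
    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        self.commitlog = []                   # stands in for the durable log
        self.memtable = {}
        self.sstables = []                    # oldest first
        self.row_cache = {}

    def write(self, key, value):
        self.commitlog.append((key, value))   # log the write
        self.memtable[key] = value            # apply it in memory
        self.row_cache.pop(key, None)         # drop any stale cached row
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"

    def flush(self):
        """Turn the memtable into an immutable SSTable and start a fresh one."""
        self.sstables.append(SSTable(self.memtable.items()))
        self.memtable = {}
        self.commitlog = []                   # flushed data no longer needs the log

    def read(self, key):
        if key in self.row_cache:
            return self.row_cache[key]
        value = self.memtable.get(key)
        if value is None:
            for table in reversed(self.sstables):   # newest SSTable first
                value = table.get(key)
                if value is not None:
                    break
        if value is not None:
            self.row_cache[key] = value
        return value

store = TinyStore(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)      # second write fills the memtable and triggers a flush
store.write("a", 3)      # the newer value for "a" lives only in the memtable
```

Reading `"a"` returns the memtable value and never touches the older SSTable copy, while reading `"b"` falls through to the SSTable.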
Discover More in ScyllaDB University
university.scylladb.com
Keep in touch!
Peter Corless
Director of Technical Advocacy
ScyllaDB
peter@scylladb.com
@PeterCorless
