What a Modern Database Enables
Srini Srinivasan
CTO and Founder
Aerospike
All rights reserved. © 2023 Aerospike, Inc.
Our Driving Design Centers
Optimizations for Modern System Architectures
• CPU and NUMA pinning
• Storage tiers (DRAM, NVMe)
• Hybrid Memory Architecture
• Network to application alignment
Massive Parallelism with Indexing
• Multi-threaded NUMA architecture
• Data distribution across disks, nodes
• Client accesses server in single hop
• AI/ML Processing
Strongly Consistent Transactions
• Zero Data Loss
• Linearizable reads (tunable)
• Read one, write all scheme
• Roster concept maximizes availability
Geo-Distributed Active-Active System
• Uniform data partitioning
• Mixed workload handling
• Self-managed rack aware clusters
• Sync and Async Replication
Strongly Consistent Transactions
Aerospike Strong Consistency @ 33% less H/W
Failure Support – Big Hardware Savings
• 1 failure => 2 copies
• 2 failures => 3 copies
When is data consistent?
• Once all nodes respond
Aerospike consensus is non-quorum, roster-based
How is consistency maintained?
• With a roster
• Determines cluster health
Heartbeats
• Exchanged by nodes
• CPU unaffected as data/node increases
[Figure: Application connected to node A (Leader) and node B (Follower)]
Aerospike passes Jepsen tests: https://jepsen.io/analyses/aerospike-3-99-0-3
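The hardware figure in this slide's title can be checked with a short calculation. This is a minimal sketch that assumes the comparison baseline is a majority-quorum scheme needing 2f + 1 copies to survive f failures; the slide does not name its baseline, so `quorum_copies` is an assumption, and all function names are illustrative.

```python
def roster_copies(failures: int) -> int:
    """Copies per the slide: 1 failure => 2 copies, 2 failures => 3 copies."""
    return failures + 1


def quorum_copies(failures: int) -> int:
    """Assumed baseline (not stated on the slide): majority quorum, 2f + 1 copies."""
    return 2 * failures + 1


def hardware_saving(failures: int) -> float:
    """Fraction of copies saved relative to the assumed quorum baseline."""
    return 1 - roster_copies(failures) / quorum_copies(failures)


for f in (1, 2):
    print(f, roster_copies(f), quorum_copies(f), round(hardware_saving(f) * 100))
# prints "1 2 3 33" then "2 3 5 40"
```

Under that assumption, tolerating one failure takes 2 copies instead of 3, which is consistent with the "33% less H/W" figure.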
Strong Consistency (SC) – Write Logic
Writes are applied to all replicas before returning to the client, with commits that add minimal friction
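Participants: Client, Master, Replica 1, Replica 2
1. Request (Client to Master)
2. Write Local (Master)
3. Replicate (Master to Replica 1 and Replica 2)
4. Response (Replica 1 and Replica 2 to Master)
5. Advise Replicated* (Master to Replica 1 and Replica 2)
6. Success (Master to Client)
*Advise Replicated is one-way and sent only when there is more than 1 copy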
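The numbered write steps can be sketched as a small in-memory simulation. This is an illustration only: `Node`, `sc_write` and the trace strings are hypothetical names, replicas are contacted one after another rather than in parallel, and failure handling is reduced to a single check on the acknowledgements.

```python
class Node:
    """Hypothetical in-memory stand-in for a cluster node."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.advised = set()

    def write_local(self, key, value):
        self.data[key] = value
        return True  # acknowledgement sent back to the caller

    def advise_replicated(self, key):
        # One-way notification: nothing is returned or awaited.
        self.advised.add(key)


def sc_write(master, replicas, key, value):
    """Return (success, trace) for one write following steps 1-6."""
    trace = ["1. Request"]
    master.write_local(key, value)
    trace.append("2. Write Local")
    acks = []
    for replica in replicas:
        trace.append(f"3. Replicate -> {replica.name}")
        acks.append(replica.write_local(key, value))
        trace.append(f"4. Response <- {replica.name}")
    if not all(acks):
        return False, trace  # never report success before every replica answered
    for replica in replicas:  # loop body is skipped when there is only one copy
        replica.advise_replicated(key)
        trace.append(f"5. Advise Replicated -> {replica.name}")
    trace.append("6. Success")
    return True, trace


master, r1, r2 = Node("Master"), Node("Replica 1"), Node("Replica 2")
ok, trace = sc_write(master, [r1, r2], "k", "v")
print(ok, trace[-1])
```

The point of the sketch is the ordering: success is reported to the client only after every replica has responded.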
SC – Linearizable Read Logic
Master or replica read alone is sufficient for Sequential Consistency
Participants: Client, Master, Replica 1, Replica 2
1. Request (Client to Master)
2. Read Local (Master)
3. Status Request (Master to Replica 1 and Replica 2)
4. Status Response (Replica 1 and Replica 2 to Master)
5. Response (Master to Client)
• No stale reads possible
• Extra network round-trip
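The read steps can be sketched in the same spirit. The slide does not say what the status exchange compares, so this sketch assumes each replica reports an identifier for the cluster view it holds and that the master answers only if every replica agrees with its own; `ReplicaStub`, `view_id` and `linearizable_read` are hypothetical names.

```python
class ReplicaStub:
    """Hypothetical replica that reports which cluster view it currently holds."""

    def __init__(self, view_id):
        self.view_id = view_id

    def status(self):
        return self.view_id


def linearizable_read(master_data, master_view_id, replicas, key):
    """Read locally, then confirm with every replica before answering."""
    value = master_data.get(key)                         # 2. Read Local
    statuses = [replica.status() for replica in replicas]  # 3./4. Status Request/Response
    if any(status != master_view_id for status in statuses):
        # Assumed behaviour: refuse to answer rather than risk a stale read.
        raise RuntimeError("replica status disagrees with master; retry the read")
    return value                                         # 5. Response


data = {"k": "v"}
print(linearizable_read(data, 7, [ReplicaStub(7), ReplicaStub(7)], "k"))
```

The status exchange is the extra network round-trip the slide mentions; a sequentially consistent read would return after the local read.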
High Availability in an RF2 Strong Consistency System
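[Figure: nodes A, B, C, ... Z spread across Rack 1 (Zone 1) and Rack 2 (Zone 2), holding partition copies 1M, 1R, 2M, 2R, 3M, 3R; example operations Write (3M), Read (3M) and Read (3R)]
• Synchronous replication with Automatic Sync
• RF2: Complete copy of data
• Writes < 10ms
• Reads < 1ms
RF – Replication Factor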
Rack Awareness pegs data copies to racks distributed across zones or datacenters within a cluster
Data Availability During Split Brain Events
[Figure: five-node cluster (A B C D E) shown under several split scenarios, marking the Roster Master (RM), Roster Replica (RR), Master (M) and Replica (R) for a partition]
In a healthy cluster the Roster Master is the same as the Master, and the Roster Replica is the same as the Replica.
Rule 1: A sub-cluster is Active if it has the Roster Master and all Roster Replicas, and at least 1 is full.
Rule 2: A sub-cluster is Active if it has the majority of nodes and at least one full Roster Master or Roster Replica, OR exactly ½ the roster nodes and the Roster Master, and the partition is full.
Rule 3: A sub-cluster is Active if it is a Super Majority Cluster and the partitions are full or subsets.
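Rules 1 and 2 can be expressed as predicates over sets of node names. This is a sketch of one reading of the slide, not a definitive implementation: "the partition is full" is interpreted as the Roster Master holding a full copy, and Rule 3 is left out because the slide does not define a Super Majority Cluster. `PartitionView` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionView:
    roster: frozenset           # every node on the roster
    roster_master: str
    roster_replicas: frozenset
    full: frozenset             # nodes holding a full copy of the partition


def rule1(sub, p):
    """Roster Master and all Roster Replicas present, at least one of them full."""
    needed = {p.roster_master} | p.roster_replicas
    return needed <= sub and bool(needed & p.full)


def rule2(sub, p):
    """Majority with a full roster copy, or exactly half plus a full Roster Master."""
    present = sub & p.roster
    holders = ({p.roster_master} | p.roster_replicas) & sub
    if len(present) * 2 > len(p.roster) and holders & p.full:
        return True
    exactly_half = len(present) * 2 == len(p.roster)
    # Assumption: "the partition is full" means the Roster Master's copy is full.
    return exactly_half and p.roster_master in sub and p.roster_master in p.full


def is_active(sub, p):
    return rule1(sub, p) or rule2(sub, p)


view = PartitionView(frozenset("ABCDE"), "A", frozenset("B"), frozenset("AB"))
print(is_active({"A", "B"}, view), is_active({"C", "D", "E"}, view))
```

In the example split, the two-node side stays active for this partition through Rule 1, while the three-node side has a majority but no full copy and is not active.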
Geo-Distributed Active-Active System
Node Add/Remove/Update without Disruption
Self-healing, auto-sharding, algorithmic cluster management
[Figure: nodes A, B, C and Z, each holding 25% of cluster data]
High uptime
• “Shared Nothing” architecture
• No single points of failure
Self-healing capability
• Auto rebalance upon node add/remove
• Data migrates automatically, evenly
Set-and-forget DevOps
• Automatic sharding of data
• No re-tuning of cluster for use-case changes
Global Transactions – Sync Active-Active
Geographically distributed strongly consistent transactions at scale
[Figure: a single cluster with Racks 1, 2, 3 spanning USA West (Rack 1), USA East (Rack 2) and United Kingdom (Rack 3); Nodes 1 to 9 are spread across the racks, with Node 9 marked M and Nodes 1 and 4 marked R1; local apps at each site; Automatic Sync between racks]
• Roster Membership Based
• Synchronous active-active replication
• Strong Consistency (linearizable)
• No data loss
• Conflict avoidance
• Auto recovery on single site failure
• Low latency reads from local rack
• Writes ~ 200 ms
• Reads < 1ms
Distributed Data Hub – Async Active-Active
Multiple clusters connected via XDR
› Asynchronous active-active replication
› Dynamic fine-grained data routing
› Relaxed consistency (lag ~ 100ms)
[Figure: Data Sources (Web, Social, Streaming Video, Gaming, Enterprise Applications, IoT, 3rd Party, Mobile) feed clusters at Location A (SOE), Location B (SOE) and Location C (SOE), which are linked by XDR to a Real Time System of Record (Single Source of Truth). Other labeled components: Streaming, AI/ML Engines, Predictive Analytics, Features, Query & Reporting, Store, Legacy Data Store. Tiers: Edge (ms), Core (ms), Warehouse (sec-to-mins). Data volume labels: TB’s, 100’s PB’s, PB’s]
Optimizations for Modern System Architectures
Real-time Read Access to Data in SSD
Patented Hybrid Memory Architecture™ (HMA) places data on SSD and only indexes in DRAM
Software written in C to talk natively to hardware, not through an API layer
HYBRID-MEMORY ARCHITECTURE™
• Direct SSD device access
• Highly Parallelized
• Large Block Writes to SSD
• SSD vendor-optimized
• Continuous, non-disruptive defrag
[Figure: the Hybrid-Memory Architecture reaches SSD and NVMe devices directly through the BLOCK INTERFACE; OTHER DATABASE goes through the OS FILE SYSTEM and PAGE CACHE before the BLOCK INTERFACE and SSDs]
Storage Tier Configurations
All DRAM
› Index and Data in DRAM
› Sub-millisecond reads & writes
Hybrid DRAM/Flash
› Index in DRAM, Data in Flash
› Sub-millisecond reads & writes
› 5-10X lower server footprint
All Flash
› Index and Data in Flash
› Sub 5-millisecond reads & writes
› Lower DRAM usage than HMA
› Suitable for lots of small objects
› Server footprint reduction similar to HMA
[Figure: one diagram per configuration showing OPERATIONS against the index (DRAM INDEX or FLASH INDEX: EXPIRY, DIGEST & TREE INFO, RECORD METADATA, STORAGE POINTER) and STORAGE (BIN 1, BIN 2, BIN 3), with WRITE QUEUE, DEFRAG, READS and DATA IN FLASH where data is on flash]
SLAs versus Scale on Storage Tiers
20 TB Data
Memory Optimized (r6in.16xlarge)
• 512 GiB memory
• 2 x 1900 GB SSD
• Addressable memory space: 512 GiB/node
• 37 nodes
Storage Optimized (im4gn.8xlarge)
• 128 GiB memory
• 2 x 7500 GB SSD
• Addressable memory space: 15 TB/node
• 6 nodes
[Figure: In-Memory, Hybrid Memory and All-Flash tiers placed between Performance + Cost and Affordable Scale, ranging from Terabytes to Petabytes, with latency labels 99% < 1ms, 99% < 1ms and 99% < 10ms]
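The 37-node figure can be reproduced with a one-line capacity calculation. This sketch assumes "20 TB" is decimal (10^12 bytes), that 512 GiB is binary (2^30 bytes per GiB), and that node count is simply data size divided by addressable space per node, rounded up; replication, index overhead and headroom are ignored. `nodes_needed` is an illustrative name.

```python
import math

GIB = 2**30   # binary gibibyte
TB = 10**12   # decimal terabyte (assumption about the slide's units)


def nodes_needed(data_bytes: int, addressable_bytes_per_node: int) -> int:
    """Smallest node count whose combined addressable space holds the data."""
    return math.ceil(data_bytes / addressable_bytes_per_node)


memory_optimized = nodes_needed(20 * TB, 512 * GIB)
storage_optimized = nodes_needed(20 * TB, 15 * TB)
print(memory_optimized, storage_optimized)  # prints "37 2"
```

The same division gives 2 for the 15 TB/node configuration, so the slide's 6-node figure presumably reflects factors this sketch leaves out (for example replication or capacity headroom); the slide does not say which.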
C-based DB kernel
Optimizations for CPU, Memory, Network
➤ Multi-threaded data structures (NUMA pinned)
➤ Nested locking model for synchronization
➤ Lockless data structures
➤ Partitioned single threaded data structures
➤ Index entries are aligned to cache line (64 bytes)
➤ Custom memory management (arenas)
Memory Arena Assignment
Multi-core Architecture
[Figure: two CPU Sockets, each with four Cores; NIC Queues are tied to the cores through NIC IRQ Binding]
Massive Parallelism with Indexing
Data distribution
Intelligent Data Partitioning Eliminates Hotspots
Data distribution is deterministic, uniform and algorithmic
• Even amount of data on every node and on every flash device
• Load balanced continually and automatically on all servers, even while scaling up/down or with cluster reconfigurations
• No retuning for new use cases (same scheme/algos)

Partition Id | Leader | Replica 1 | Replica 2 | Replica 3 | Replica 4
0            | B      | D         | E         | A         | C
1            | E      | C         | A         | D         | B
2            | C      | B         | E         | A         | D
…            | …      | …         | …         | …         | …
4095         | A      | E         | B         | D         | C

[Figure: nodes A B C D E]
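A partition table like the one on this slide can be generated purely from node names and partition ids. The sketch below is a generic hash-and-sort scheme chosen for illustration, not a statement of Aerospike's actual algorithm; SHA-256 and the names `succession` and `partition_table` are assumptions.

```python
import hashlib
from collections import Counter

N_PARTITIONS = 4096  # partition ids 0..4095, as on the slide


def succession(partition_id, nodes):
    """Order nodes for one partition: first entry is the leader, the rest replicas."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{partition_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=score)


def partition_table(nodes):
    """Deterministic table: the same node list always yields the same rows."""
    return {pid: succession(pid, nodes) for pid in range(N_PARTITIONS)}


nodes = ["A", "B", "C", "D", "E"]
table = partition_table(nodes)
leaders = Counter(row[0] for row in table.values())
print(sorted(leaders.items()))  # leader counts per node, each near 4096 / 5
```

Because each node's score depends only on its own name and the partition id, removing a node leaves the relative order of the remaining nodes unchanged, which is one way rebalancing can stay algorithmic with no central lookup.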
Remove bottlenecks: Same low latency from 1st GB to the 1st PB…
Smart Client™
Direct Path to Data (single-hop)
Each node knows where all data resides via Smart Client™
• Client is a first-class participant in the architecture and data fabric
• Continuously updates
• Calculates Partition ID to determine Node ID
• Cluster-spanning operations (scan, query, batch) sent to all processing nodes for parallel processing
• Executes operations APIs (e.g. CRUD+)
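The single-hop idea, where the client computes a Partition ID and maps it to a Node ID, can be sketched as follows. SHA-1 is used here only as a stand-in hash, and `partition_id`, `SmartClientSketch` and the modulo mapping are hypothetical; the slide does not specify how the ID is derived.

```python
import hashlib

N_PARTITIONS = 4096  # matches the 0..4095 partition ids shown earlier in the deck


def partition_id(key: bytes) -> int:
    """Map a key to a partition id (stand-in hash, not the product's own)."""
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


class SmartClientSketch:
    """Hypothetical client holding a local copy of the partition map."""

    def __init__(self, partition_map):
        self.partition_map = dict(partition_map)  # partition id -> node id

    def update_map(self, partition_map):
        # Called whenever the cluster reports a new map, so routing stays current.
        self.partition_map = dict(partition_map)

    def node_for(self, key: bytes) -> str:
        # One local lookup, then one network hop to the owning node.
        return self.partition_map[partition_id(key)]


nodes = ["A", "B", "C"]
client = SmartClientSketch({pid: nodes[pid % 3] for pid in range(N_PARTITIONS)})
print(client.node_for(b"user:42"))
```

Because the lookup is local to the client, no proxy or coordinator node sits between the application and the node that owns the record.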
Secondary Indexes – Parallel Query Execution
[Figure: per-partition (P1, P2, … Px) SECONDARY INDEX entries (b1:r1, b2:r1, …) and PRIMARY INDEX in DRAM, pointing to RECORDs on SSD]
Query
• Value-based lookup
• Via secondary index
• Similar to SQL “select”
Parallel execution
• Per partition
• Scatter-gather scheme
• Multiple threads across nodes
Parallel access is efficient for “low selectivity” indexes
Supports equality matches and range queries on: Integer, double, string, blob
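The scatter-gather scheme can be illustrated with a thread pool that queries each partition's index in parallel and merges the results. The per-partition index here is a plain dictionary from indexed value to record ids, which is a simplification for illustration; `query_partition` and `scatter_gather` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def query_partition(index, low, high):
    """Range lookup on one partition's secondary index (value -> record ids)."""
    return [rid for value, rids in index.items() if low <= value <= high for rid in rids]


def scatter_gather(partition_indexes, low, high, max_workers=4):
    """Scatter the range query to every partition, then gather and merge the hits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(lambda idx: query_partition(idx, low, high), partition_indexes)
        return sorted(rid for part in parts for rid in part)


partitions = [
    {10: ["r1"], 25: ["r2"]},   # P1
    {18: ["r4"], 40: ["r9"]},   # P2
    {21: ["r3"]},               # Px
]
print(scatter_gather(partitions, 15, 30))  # prints ['r2', 'r3', 'r4']
```

An equality match is the same call with `low == high`, for example `scatter_gather(partitions, 40, 40)`.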
Massively Parallel Architecture
Data distribution is deterministic, uniform and algorithmic
• Even amount of data on every node and on every flash device
• Load balanced continually and automatically on all servers, even while scaling up/down or with cluster reconfigurations
• No retuning for new use cases (same scheme/algos)
No hot spots with intelligent auto-sharding
[Figure: CLIENT connected to nodes A, B and C, each holding 33% of cluster data; within a node, SSD 1, SSD 2 and SSD 3 each hold 11% of cluster data]
Summary
Optimizations for Modern System Architectures
• CPU and NUMA pinning
• Storage tiers (DRAM, NVMe)
• Hybrid Memory Architecture
• Network to application alignment
Massive Parallelism with Indexing
• Multi-threaded NUMA architecture
• Data distribution across disks, nodes
• Client accesses server in single hop
• AI/ML Processing
Strongly Consistent Transactions
• Zero Data Loss
• Linearizable reads (tunable)
• Read one, write all scheme
• Roster concept maximizes availability
Geo-Distributed Active-Active System
• Uniform data partitioning
• Mixed workload handling
• Self-managed rack aware clusters
• Sync and Async Replication
Thank You


What a Modern Database Enables_Srini Srinivasan.pdf

  • 1. What a Modern Database Enables Srini Srinivasan CTO and Founder Aerospike
  • 2. All rights reserved. © 2023 Aerospike, Inc. Our Driving Design Centers 2 Optimizations for Modern System Architectures • CPU and NUMA pinning • Storage tiers (DRAM, NVMe) • Hybrid Memory Architecture • Network to application alignment Massive Parallelism with Indexing • Multi-threaded NUMA architecture • Data distribution across disks, nodes • Client accesses server in single hop • AI/ML Processing Strongly Consistent Transactions • Zero Data Loss • Linearizable reads (tunable) • Read one, write all scheme • Roster concept maximizes availability Geo-Distributed Active-Active System • Uniform data partitioning • Mixed workload handling • Self-managed rack aware clusters • Synch and Asynch Replication
  • 3. Strongly Consistent Transactions
  • 4. Aerospike Strong Consistency @ 33% less H/W. Failure Support – Big Hardware Savings: 1 failure => 2 copies; 2 failures => 3 copies. When is data consistent? Once all nodes respond. Aerospike consensus is non-quorum, roster-based. How is consistency maintained? With a roster, which determines cluster health. Heartbeats: exchanged by nodes; CPU unaffected as data/node increases. Diagram labels: Application; A (Leader); B (Follower). Aerospike passes Jepsen tests: https://jepsen.io/analyses/aerospike-3-99-0-3
  • 5. Strong Consistency (SC) – Write Logic. Write to all replicas before returning to the client, with commits with minimum friction. Nodes: Client, Master, Replica 1, Replica 2. Steps: 1. Request; 2. Write Local; 3. Replicate; 4. Response; 5. Advise Replicated*; 6. Success. Steps 3 to 5 are shown once per replica. *Advise Replicated is one-way and is sent only when there is more than 1 copy.
  • 6. SC – Linearizable Read Logic. A master or replica read alone is sufficient for Sequential Consistency. Nodes: Client, Master, Replica 1, Replica 2. Steps: 1. Request; 2. Read Local; 3. Status Request; 4. Status Response; 5. Response. Steps 3 and 4 are shown once per replica. No stale reads possible. Extra network round-trip.
  • 7. High Availability in an RF2 Strong Consistency System. RF2 (RF = Replication Factor): complete copy of data. Writes < 10 ms; reads < 1 ms. Rack Awareness pegs data copies to racks distributed across zones or datacenters within a cluster. Diagram labels: Synchronous; Automatic Sync; Rack 1, Zone 1; Rack 2, Zone 2; nodes A, B, C, Z; partition copies 1M, 1R, 2M, 2R, 3M, 3R; operations Read (3R), Write (3M), Write (3M), Read (3M).
  • 8. Data Availability During Split Brain Events. In a healthy cluster the Roster Master (RM) is the same as the Master (M), and the Roster Replica (RR) is the same as the Replica (R). Rule 1: A sub-cluster is Active if it has the Roster Master and all Roster Replicas and at least 1 is full. Rule 2: A sub-cluster is Active if it has the majority of nodes and at least one full Roster Master or Roster Replica, OR exactly ½ the roster nodes and the Roster Master and the partition is full. Rule 3: A sub-cluster is Active if it is a Super Majority Cluster and the partitions are full or subsets. Diagrams: four five-node clusters (A, B, C, D, E) with nodes labeled RM, RR, M, and R.
  • 9. Geo-Distributed Active-Active System
  • 10. Node Add/Remove/Update without Disruption. Self-healing, auto-sharding, algorithmic cluster management. High uptime; “Shared Nothing” architecture; no single points of failure; self-healing capability; auto rebalance upon node add/remove; data migrates automatically, evenly; set-and-forget DevOps; automatic sharding of data; no re-tuning of cluster for use-case changes. Diagram: nodes A, B, C, Z, each holding 25% of cluster data.
  • 11. Global Transactions – Sync Active-Active. Geographically distributed strongly consistent transactions at scale. Single cluster with Racks 1, 2, 3; roster membership based. Synchronous active-active replication; Strong Consistency (linearizable); no data loss; conflict avoidance; auto recovery on single site failure; low-latency reads from local rack. Automatic sync. Writes ~ 200 ms; reads < 1 ms. Diagram labels: USA West (Rack 1; Node 1, Node 2, Node 3), USA East (Rack 2; Node 4, Node 5, Node 6), United Kingdom (Rack 3; Node 7, Node 8, Node 9), each with local apps; M and R1 mark the master and replica copies.
  • 12. Distributed Data Hub – Async Active-Active. Multiple clusters connected via XDR: asynchronous active-active replication; dynamic fine-grained data routing; relaxed consistency (lag ~ 100 ms). Diagram labels: tiers Edge (ms), Core (ms), Warehouse (sec-to-mins); sizes TBs, 100s of PBs, PBs; Location A (SOE), Location B (SOE), Location C (SOE) connected via XDR; Real Time System of Record; Query & Reporting Store; Predictive Analytics; Single Source of Truth; Legacy Data Store; Streaming; AI/ML Engines; Features. Data sources: Web, Social, Streaming Video, Gaming, Enterprise Applications, IoT, 3rd Party, Mobile.
  • 13. Optimizations for Modern System Architectures
  • 14. Real-time Read Access to Data in SSD. The patented Hybrid Memory Architecture™ (HMA) places data on SSD and only indexes in DRAM. Software is written in C to talk natively to hardware, not through an API layer. Hybrid Memory Architecture™: direct SSD device access; highly parallelized; large block writes to SSD; SSD vendor-optimized; continuous, non-disruptive defrag. Diagram: the Hybrid Memory Architecture™ stack goes from the block interface straight to SSD and NVMe SSD devices, while the "other database" stack goes through the OS file system and page cache before the block interface and SSDs.
  • 15. Storage Tier Configurations. All DRAM: index and data in DRAM; sub-millisecond reads & writes. Hybrid DRAM/Flash: index in DRAM, data in flash; sub-millisecond reads & writes; 5-10X lower server footprint. All Flash: index and data in flash; sub-5-millisecond reads & writes; lower DRAM usage than HMA; suitable for lots of small objects; server footprint reduction similar to HMA. Diagram labels (per configuration): index with operations, expiry, digest & tree info, record metadata, storage pointer; write queue; defrag; reads; storage holding Bin 1, Bin 2, Bin 3.
  • 16. SLAs versus Scale on Storage Tiers. Memory Optimized (r6in.16xlarge): 512 GiB memory, 2 x 1900 GB SSD; addressable memory space 512 GiB/node; 20 TB data on 37 nodes. Storage Optimized (im4gn.8xlarge): 128 GiB memory, 2 x 7500 GB SSD; addressable memory space 15 TB/node; 20 TB data on 6 nodes. Chart labels: In-Memory, All-Flash, Hybrid Memory; 99% < 1 ms, 99% < 1 ms, 99% < 10 ms; Performance + Cost; Affordable Scale; Terabytes; Petabytes.
  • 17. C-based DB Kernel: Optimizations for CPU, Memory, Network. Multi-threaded data structures (NUMA pinned); nested locking model for synchronization; lockless data structures; partitioned single-threaded data structures; index entries aligned to cache line (64 bytes); custom memory management (arenas). Diagram labels: Memory Arena Assignment; Multi-core Architecture; two CPU sockets of four cores each; NIC IRQ binding; NIC queues; NIC.
  • 18. Massive Parallelism with Indexing
  • 19. Intelligent Data Partitioning Eliminates Hotspots. Data distribution is deterministic, uniform and algorithmic. An even amount of data sits on every node and on every flash device. Load is balanced continually and automatically on all servers, even while scaling up/down or during cluster reconfigurations. No retuning for new use cases (same scheme/algos). Partition map for nodes A, B, C, D, E:
      Partition Id | Leader | Replica 1 | Replica 2 | Replica 3 | Replica 4
      0            | B      | D         | E         | A         | C
      1            | E      | C         | A         | D         | B
      2            | C      | B         | E         | A         | D
      …            | …      | …         | …         | …         | …
      4095         | A      | E         | B         | D         | C
  • 20. Smart Client™: Direct Path to Data (single-hop). Remove bottlenecks: same low latency from the 1st GB to the 1st PB. Each node knows where all data resides via Smart Client™. The client is a 1st-class participant in the architecture and data fabric: it continuously updates, calculates the Partition ID to determine the Node ID, and executes operations APIs (e.g. CRUD+). Cluster-spanning operations (scan, query, batch) are sent to all processing nodes for parallel processing.
  • 21. Secondary Indexes – Parallel Query Execution. Query: value-based lookup via secondary index, similar to SQL “select”. Parallel execution: per partition; scatter-gather scheme; multiple threads across nodes. Parallel access is efficient for “low selectivity” indices. Supports equality matches and range queries on integer, double, string, and blob values. Diagram labels: partitions P1, P2, …, Px, each with a secondary index (entries such as b1:r1, b2:r1; b1:r2, b2:r4; b5:r3, b2:r9) and a primary index in DRAM, and records on SSD.
  • 22. Massively Parallel Architecture. Data distribution is deterministic, uniform and algorithmic. An even amount of data sits on every node and on every flash device. Load is balanced continually and automatically on all servers, even while scaling up/down or during cluster reconfigurations. No retuning for new use cases (same scheme/algos). No hot spots with intelligent auto-sharding. Diagram: a client and nodes A, B, C, each holding 33% of cluster data, shown as 11% on each of SSD 1, SSD 2, and SSD 3.
  • 23. Summary. Optimizations for Modern System Architectures: CPU and NUMA pinning; storage tiers (DRAM, NVMe); Hybrid Memory Architecture; network-to-application alignment. Massive Parallelism with Indexing: multi-threaded NUMA architecture; data distribution across disks and nodes; client accesses server in a single hop; AI/ML processing. Strongly Consistent Transactions: zero data loss; linearizable reads (tunable); read-one, write-all scheme; roster concept maximizes availability. Geo-Distributed Active-Active System: uniform data partitioning; mixed workload handling; self-managed rack-aware clusters; sync and async replication.
  • 24. Thank You
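
The "33% less H/W" figure on slide 4 can be checked with a little arithmetic. The sketch below compares the copies the slide lists for the roster-based scheme (1 failure => 2 copies, 2 failures => 3 copies, i.e. f + 1) with a majority-quorum scheme, which needs 2f + 1 copies to survive f failures. The function names are hypothetical, and the quorum bound is the standard majority rule rather than something the slides state.

```python
def copies_roster(failures: int) -> int:
    """Copies needed per the slide: 1 failure => 2 copies, 2 failures => 3 copies."""
    return failures + 1


def copies_majority_quorum(failures: int) -> int:
    """Assumed comparison point: a majority quorum needs 2f + 1 copies."""
    return 2 * failures + 1


def hardware_saving(failures: int) -> float:
    """Fraction of copies saved by the roster scheme relative to a majority quorum."""
    quorum = copies_majority_quorum(failures)
    return (quorum - copies_roster(failures)) / quorum


if __name__ == "__main__":
    for f in (1, 2):
        print(f, copies_roster(f), copies_majority_quorum(f), round(hardware_saving(f), 2))
```

For a single tolerated failure this gives 2 copies instead of 3, which is the one-third saving quoted in the slide title.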
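
The six-step write sequence on slide 5 can be traced with a toy in-memory model. This is only a sketch under stated assumptions: the `Node` class and `sc_write` function are hypothetical, replication is done sequentially for readability, and none of this is Aerospike's server code. It shows that success is reported only after every replica has acknowledged, and that the one-way "advise replicated" step is skipped when there is only one copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy in-memory node (hypothetical, for illustration only)."""
    name: str
    store: dict = field(default_factory=dict)
    advised: set = field(default_factory=set)

    def write_local(self, key, value) -> bool:
        self.store[key] = value
        return True  # acknowledgement returned to the caller

    def advise_replicated(self, key) -> None:
        # One-way message: no reply is expected.
        self.advised.add(key)


def sc_write(master: Node, replicas: list, key, value) -> list:
    """Apply one write and return the ordered list of steps taken."""
    steps = ["1. request"]
    master.write_local(key, value)
    steps.append("2. write local")
    for replica in replicas:
        steps.append(f"3. replicate to {replica.name}")
        if not replica.write_local(key, value):
            raise RuntimeError(f"{replica.name} did not acknowledge")
        steps.append(f"4. response from {replica.name}")
    # Step 5 only happens when there is more than one copy.
    for replica in replicas:
        replica.advise_replicated(key)
        steps.append(f"5. advise replicated to {replica.name}")
    steps.append("6. success")
    return steps


if __name__ == "__main__":
    nodes = [Node("replica1"), Node("replica2")]
    for step in sc_write(Node("master"), nodes, "user:1", {"balance": 10}):
        print(step)
```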
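
Slides 19 and 20 describe a client that calculates a Partition ID and uses a partition map to find the owning node in a single hop. A minimal sketch of that lookup follows. The map rows are the ones shown on slide 19; everything else is an assumption made for illustration: the slides do not name the hash function, so SHA-1 is used as a stand-in because it ships with Python, and `partition_id` and `leader_for` are hypothetical names.

```python
import hashlib

N_PARTITIONS = 4096  # slide 19 shows partition IDs 0 through 4095

# Rows copied from slide 19: first entry is the leader, the rest are replicas.
PARTITION_MAP = {
    0: ["B", "D", "E", "A", "C"],
    1: ["E", "C", "A", "D", "B"],
    2: ["C", "B", "E", "A", "D"],
    4095: ["A", "E", "B", "D", "C"],
}


def partition_id(key: str) -> int:
    """Map a key to a partition ID deterministically.

    Assumption: SHA-1 is a stand-in hash; the slides do not say which is used.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Keep 12 bits of the digest, giving a value in 0..4095.
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def leader_for(pid: int, partition_map: dict) -> str:
    """Return the node that leads the given partition."""
    return partition_map[pid][0]


if __name__ == "__main__":
    print(partition_id("user:42"))
    print(leader_for(0, PARTITION_MAP))
```

Because the mapping depends only on the key and the shared map, every client computes the same owner without asking a coordinator, which is what makes the single-hop access on slide 20 possible.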
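
The per-partition scatter-gather scheme on slide 21 can be illustrated with a small range query run in parallel. This is a sketch, not the product's query engine: the `Partition` class, its sorted-list index, and `scatter_gather` are hypothetical, and a thread pool stands in for "multiple threads across nodes".

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """Toy partition: records plus a sorted secondary index on one bin."""

    def __init__(self, records: dict, bin_name: str):
        self.records = records  # record id -> dict of bins
        pairs = sorted(
            (rec[bin_name], rid) for rid, rec in records.items() if bin_name in rec
        )
        self.values = [value for value, _ in pairs]
        self.record_ids = [rid for _, rid in pairs]

    def range_query(self, lo, hi) -> list:
        """Return ids of records whose indexed value lies in [lo, hi]."""
        start = bisect.bisect_left(self.values, lo)
        end = bisect.bisect_right(self.values, hi)
        return self.record_ids[start:end]


def scatter_gather(partitions: list, lo, hi, workers: int = 4) -> list:
    """Scatter the query to every partition, then gather the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(p.range_query, lo, hi) for p in partitions]
        results = []
        for future in futures:
            results.extend(future.result())
    return results


if __name__ == "__main__":
    parts = [
        Partition({"r1": {"age": 25}, "r2": {"age": 35}}, "age"),
        Partition({"r3": {"age": 40}, "r4": {"age": 52}}, "age"),
    ]
    print(sorted(scatter_gather(parts, 30, 40)))
```

Each partition answers from its own index, so the work grows with the number of matching records per partition rather than with the total data set.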