OldSQL to NewSQL

CrateDB & PostgreSQL
OldSQL to NewSQL
11th July 2017
@claus__m

About
~2yrs at Crate.io
DevRel/Field Engineering/Support/
Integrations/…
Speaking
Conferences, meetups, ...
Working with customers
Consulting, pre- and post-sales

Agenda
Failures
What, how, and when?
PostgreSQL
Concept overview
CrateDB
Concept overview
Discussion
NewSQL or not? Benefits and drawbacks.
Use Cases
Wrap up

Database Failures
Consequences
Data loss
Lost updates, dirty reads, ...
Service interruptions
Services can’t work without their database
Slow performance
Users may lose interest
Pressure
DBAs in the spotlight

What Makes Databases Fail?
Overloaded
Insufficient hardware (RAM, CPU, disk),
swapping, inefficient queries
Failure
Hardware may fail on many levels: e.g.
Network, disk, RAM
Platform
Configuration errors, updates, resource
sharing, bugs
People
Malicious intent, sloppiness, ...

Assessment
Overview
Concepts and other things
Index and data
How the database creates indices, stores and
retrieves data
Search and scans
How the data is found
Replication and high availability
Distribution and achieving zero downtime

PostgreSQL: Overview
Multi-process System
fork() to clone processes from postmaster to
postgres instances with shared memory
Technology
Written in C, natively compiled
Optimization
Cost-based optimizer
Transactional
ACID compliant

Index And Data
Tree-based
A B-tree by default, created with CREATE INDEX
or implicitly by PRIMARY KEY/UNIQUE constraints
In Memory & On Disk
8K data pages in shared buffer cache and on
disk
Item Pointers
Index entries point to rows in the data pages;
only some changes are reflected in the index
(e.g. INSERT/DELETE)

Source: http://use-the-index-luke.com/sql/anatomy/the-tree

Searches And Scans
Sequential
Go over every block and execute a predicate
Index-based
Find something using an index on that column,
or a full index scan
Bitmap-based
Build a bitmap of matching rows per index,
combine the bitmaps (AND/OR), then fetch results
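
The three scan types above can be illustrated with a small in-memory sketch. This is not PostgreSQL code: the table, the column layout, and all function names are assumptions made up for illustration, and a sorted list stands in for the B-tree.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: rows of (id, city, age). Not from the talk.
ROWS = [
    (1, "Berlin", 31),
    (2, "Vienna", 45),
    (3, "Berlin", 27),
    (4, "Zurich", 45),
    (5, "Vienna", 31),
]


def sequential_scan(rows, predicate):
    """Visit every row and keep the positions that match the predicate."""
    return [pos for pos, row in enumerate(rows) if predicate(row)]


def build_index(rows, column):
    """Sorted (value, row position) pairs, standing in for a B-tree."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def index_scan(index, value):
    """Binary-search the sorted index for all entries equal to value."""
    keys = [key for key, _ in index]
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    return [pos for _, pos in index[lo:hi]]


def bitmap_scan(rows, index_a, value_a, index_b, value_b):
    """Build one bitmap per index, AND them, then collect the matches."""
    bitmap_a = [False] * len(rows)
    bitmap_b = [False] * len(rows)
    for pos in index_scan(index_a, value_a):
        bitmap_a[pos] = True
    for pos in index_scan(index_b, value_b):
        bitmap_b[pos] = True
    return [pos for pos in range(len(rows)) if bitmap_a[pos] and bitmap_b[pos]]


city_index = build_index(ROWS, 1)
age_index = build_index(ROWS, 2)

# Same question answered two ways: city = 'Vienna' AND age = 45.
seq = sequential_scan(ROWS, lambda r: r[1] == "Vienna" and r[2] == 45)
bmp = bitmap_scan(ROWS, city_index, "Vienna", age_index, 45)
```

Both approaches return the same row positions. The sequential scan touches every row, while the bitmap scan only touches index entries and then the matching rows.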

Replication And High Availability
Disk based
By sharing a disk or continuously cloning a disk
Log-shipping
Send the write-ahead-log to the standby server,
which can answer reads
Master/Master
Sends rows to the other master, can answer
reads and writes, locks rows/tables
Client-sharding
Shard the data on a client/proxy and route
accordingly
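
Client-side sharding can be sketched as a routing function that hashes a key to pick a server. The host names and the `shard_for` helper are hypothetical, and real setups typically add resharding and failover logic that is left out here.

```python
import hashlib

# Hypothetical shard endpoints; none of these hosts come from the talk.
SHARDS = ["pg-shard-0.example", "pg-shard-1.example", "pg-shard-2.example"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Route a key to one shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


# The client (or proxy) calls this before every read or write.
target = shard_for("user-42")
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings and would route the same key differently after a restart.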

CrateDB: Overview
Multi-threaded System
Thread-pools to read/write Lucene segments
Technology
Java/JVM based
Optimization
Naive optimization on query levels
Eventually Consistent
Atomic operations per row, optimistic
concurrency only
Distributed By Default
Transparent partitioning and sharding

Index And Data
Inverted index
Term dictionary where field values point to
rows (posting list)
Field cache
“Inverted inverted index”, column names point
to the possible values and their rows
On disk, cached in memory
Immutable segments on disk, binary search in
each segment, cached with mmap() into RAM
pages

Example Posting List
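
A minimal sketch of a posting list, assuming a single text column split on whitespace. The rows and function names are made up for illustration and this is not how Lucene stores its data on disk.

```python
from collections import defaultdict

# Hypothetical rows: row id -> text value of one column.
ROWS = {
    0: "fast distributed sql",
    1: "distributed search engine",
    2: "fast search",
}


def build_inverted_index(rows):
    """Map each term to the sorted list of row ids containing it."""
    postings = defaultdict(list)
    for row_id in sorted(rows):
        for term in sorted(set(rows[row_id].split())):
            postings[term].append(row_id)
    return dict(postings)


def search_all(postings, terms):
    """Rows containing every term: intersect the posting lists."""
    lists = [set(postings.get(term, [])) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []


POSTINGS = build_inverted_index(ROWS)
both = search_all(POSTINGS, ["fast", "search"])
```

Looking up a term is a dictionary access, and a multi-term query becomes an intersection of posting lists instead of a scan over all rows.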

Index And Data
Shards
Compounds of multiple immutable segments,
merged occasionally
Rows are documents, columns are fields
Vector space model to weight and score
searches (_score field)
Multi-threaded index access
A shard consists of multiple segments, each
read by its own thread
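
The idea of immutable segments that are searched in parallel and merged occasionally can be sketched as follows. Sorted tuples stand in for segments, and the segment contents and function names are assumptions for illustration only.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard: three immutable segments, each a sorted tuple of terms.
SEGMENTS = [
    ("apple", "cherry", "mango"),
    ("banana", "cherry"),
    ("kiwi", "mango", "peach"),
]


def segment_contains(segment, term):
    """Binary search inside one sorted, immutable segment."""
    i = bisect_left(segment, term)
    return i < len(segment) and segment[i] == term


def search_shard(segments, term):
    """Search each segment on a worker thread; return the hit segment numbers."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        hits = list(pool.map(lambda seg: segment_contains(seg, term), segments))
    return [n for n, hit in enumerate(hits) if hit]


def merge_segments(segments):
    """Write one new sorted segment; the old segments are never modified."""
    return tuple(sorted(set().union(*segments)))


found_in = search_shard(SEGMENTS, "cherry")
merged = merge_segments(SEGMENTS)
```

Because segments never change after they are written, readers need no locks. A merge produces a new segment and the old ones can be dropped afterwards.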

Replication And High Availability
Shared nothing architecture
Every node handles every task
Shard-based
Replicas are copies of shards that are
distributed in the cluster evenly
Consistency
Elected leader maintains and distributes a
consistent cluster state
CAP
Tunable consistency with synchronous inserts

PostgreSQL: Strengths
Single-Node-Performance
Predictable and fast
SQL Sophistication
Lots of features, many of them heavily
optimized
Transactions
ACID compliance, concurrency control

PostgreSQL: Weaknesses
Distribution
High availability or working with huge data sets
requires third-party software, partitioning
Ingest speed
ACID compliance slows down inserts
Operational Complexity/DevOps Readiness
Highly controllable features make it hard to
manage
Schema Flexibility
Schema evolution management required

CrateDB: Strengths
Distribution
Distributed by nature, with tunable consistency
Ingest speed
Solid insert speeds with bulk inserts
Operational Complexity/DevOps Readiness
High flexibility, containerization, sane defaults
Schema Flexibility
Schema evolution on the fly
Built-in Search
Fulltext capabilities

CrateDB: Weaknesses
Single-Node-Performance
Distribution overhead requires a certain cluster
size to be efficient
SQL Features
Many features are still missing or hard to do in
a distributed system
Transactions
No ACID compliance, eventual
consistency/optimistic concurrency requires
client-side handling
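
What client-side handling of optimistic concurrency can look like, as a minimal sketch: read a row together with its version, write only if the version is unchanged, and retry on conflict. The `Table` class is a hypothetical in-memory stand-in and does not call CrateDB; the `before_write` hook exists only to simulate a concurrent writer.

```python
class VersionConflict(Exception):
    """Raised when a row changed since it was read."""


class Table:
    """Hypothetical in-memory table: key -> (value, version)."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, value):
        self._rows[key] = (value, 1)

    def read(self, key):
        return self._rows[key]

    def update_if_version(self, key, new_value, expected_version):
        """Write only if nobody else has bumped the version in between."""
        _, current = self._rows[key]
        if current != expected_version:
            raise VersionConflict(key)
        self._rows[key] = (new_value, current + 1)


def increment(table, key, max_retries=3, before_write=None):
    """Read-modify-write with retries, as the client has to do it."""
    for _ in range(max_retries):
        value, version = table.read(key)
        if before_write:
            before_write()  # hook that simulates a concurrent writer
        try:
            table.update_if_version(key, value + 1, version)
            return value + 1
        except VersionConflict:
            continue  # someone else won; re-read and try again
    raise VersionConflict(f"gave up on {key!r} after {max_retries} attempts")
```

Without transactions, the retry loop and the give-up policy live in application code, which is the cost this slide refers to.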

Use Cases: PostgreSQL
ORMs
Broad integration in various object-relational
mappers in frameworks (Hibernate, …)
Transaction-based workloads
Single, high-value transactions
Extensive SQL compliance
Required support for views, stored procedures,
…
Small data sets
Hundreds of MBs to several GB

Use Cases: CrateDB
DevOps
Flexible schemas, ad-hoc queries, easy
maintenance
Analytics, machine learning
Large scale inserts/queries, high concurrency,
SQL
Fulltext search
Built-in tools for text-mining/analysis, built on
the de-facto standard of search

Thanks!
Links
https://github.com/crate
https://crate.io
Follow us on Twitter
@crateio @claus__m
Next webinar: Scale your SQL database
with Docker, 27th July
