Scylla Summit 2018: OLAP or OLTP? Why Not Both?

OLAP or OLTP
Why not both?
Glauber Costa
VP Field Engineering, ScyllaDB

Presenter bio
Glauber Costa is VP of Field Engineering at ScyllaDB. He splits
his time between the engineering department, working on
upcoming Scylla features, and helping customers succeed.
Before ScyllaDB, Glauber worked on virtualization in the Linux
kernel for 10 years, with contributions to the Xen and KVM
hypervisors and all sorts of guest functionality and containers.

The road ahead
▪ Scylla celebrates its 4th birthday.
• Performance leadership solidified, TPC design spreading.
▪ Performance is always on our radar and we’ll keep improving.
• But what’s next?

What’s next?
Mina Naguib is the Director of Site Reliability Engineering at Samsung ADS

Let’s make it (more) BORING!

Two major workload types
Analytics (OLAP)
▪ minutes, hours, days
▪ TB / PB of data per operation
▪ throughput oriented
▪ high parallelism
Real-time (OLTP)
▪ microseconds, milliseconds
▪ kB of data per operation
▪ latency oriented
▪ low/moderate parallelism

OLTP-optimized doing OLAP?
or
OLAP-optimized doing OLTP?

The role of money
Things that money can buy
▪ Food
▪ Clothes
▪ A house where I am from
▪ Throughput
Things that money cannot buy
▪ Love
▪ Happiness
▪ A house in the Bay Area
▪ Latencies

Shared clusters: the tuning conundrum
▪ Tune for latencies: throughput suffers
▪ Tune for throughput: latency suffers
▪ Patterns are seasonal. Which one to use as a tuning base?

Classical Solution
[Diagram: two separate databases, one in a Real Time Data Center and one in an Analytics Data Center]

Cost/year for 150TB of replicated data
(price based on AWS i3.metal)
                      Hardware          Estimated waste %   Estimated waste $
1 DC (10 instances)   USD 278,560.00    40%                 USD 167,136.00
2 DC (20 instances)   USD 557,120.00    40% + 40%           USD 334,272.00
Total now is 20 instances.
Plus increased maintenance costs on admin and tuning!
Example:
Capacity per instance: 15TB
Minimum number of instances: 10
Assumptions:
The real time workload is latency sensitive, so it only uses 60% of resources.
Analytics doesn’t run constantly, so it also only uses 60% of resources.
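The arithmetic behind the table can be checked with a short sketch. All inputs are taken from the slide; the function and variable names are illustrative only. One thing worth noting: the slide's dollar figures (USD 167,136 and USD 334,272) equal 60% of the hardware cost, which is the utilized share under the stated assumptions, while 40% of the same totals comes to USD 111,424 and USD 222,848. The sketch prints both so the reader can compare.

```python
# Worked arithmetic for the cost table. Inputs come from the slide;
# names and structure are illustrative assumptions, not part of the talk.

HARDWARE_COST_1DC = 278_560.00  # USD/year for 10 x i3.metal (from the slide)
UTILIZATION = 0.60              # each workload uses about 60% of its cluster


def yearly_costs(datacenters: int) -> dict:
    """Split the yearly hardware bill into utilized and idle portions."""
    hardware = HARDWARE_COST_1DC * datacenters
    used = hardware * UTILIZATION
    idle = hardware - used
    return {
        "hardware": round(hardware, 2),
        "used": round(used, 2),
        "idle": round(idle, 2),
    }


one_dc = yearly_costs(1)
two_dc = yearly_costs(2)
per_instance = round(HARDWARE_COST_1DC / 10, 2)

print("per instance:", per_instance)
print("1 DC:", one_dc)
print("2 DC:", two_dc)
```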

What is your database running?
▪ Foreground, user-generated workload
• user queries, user updates
▪ Background, maintenance operations
• Some are proportional to user workload (compactions)
• Some are maintenance generated (repair)

I/O Scheduling
[Diagram: Query, Commitlog and Compaction each have their own queue feeding the userspace I/O scheduler, which sits in front of the disk. Labels: "Max useful disk concurrency", "I/O queued in FS/device", "No queues".]

CPU Scheduling
[Diagram: two CPU timelines interleaving read, write, Compaction and SSTable write tasks.]

Which tasks to run?
100 shares
100 shares

Which tasks to run?
100 shares
50 shares
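As a rough illustration of what the share values on these two slides mean, the sketch below computes the fraction of a contended resource each task group would receive under a generic proportional-share scheduler. The group names `queries` and `compaction` are assumptions added for illustration (the slides only show the share values), and this is not Scylla's scheduler code.

```python
# Minimal proportional-share sketch. Group names are hypothetical;
# only the share values (100/100 and 100/50) come from the slides.

def resource_fractions(shares: dict[str, int]) -> dict[str, float]:
    """Return each group's fraction of a fully contended resource."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive share count")
    return {name: count / total for name, count in shares.items()}


equal = resource_fractions({"queries": 100, "compaction": 100})
skewed = resource_fractions({"queries": 100, "compaction": 50})

print(equal)
print(skewed)
```

With equal shares both groups get half of the resource; halving one group's shares drops it to one third while the other rises to two thirds. In a typical share-based scheduler these ratios only matter while both groups have work queued.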

Controlled processes
▪ Strong mathematical foundation in control theory
▪ Automatically adjust to any incoming workload
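To make the control-theory idea concrete, here is a toy simulation of a proportional controller that sets a background task's shares from its measured backlog. It is a sketch under stated assumptions, not Scylla's actual controller: the gain, the share limits, the drain rate per share and the function names are all hypothetical.

```python
# Toy proportional controller. All constants and names are assumptions
# chosen for illustration; they do not come from the talk.

def controller_shares(backlog: float, gain: float = 10.0,
                      min_shares: int = 50, max_shares: int = 1000) -> int:
    """Proportional control: shares grow linearly with the backlog, clamped."""
    return int(max(min_shares, min(max_shares, gain * backlog)))


def simulate(write_rate: float, steps: int = 200) -> tuple[float, int]:
    """Run the feedback loop and return the final (backlog, shares)."""
    backlog = 0.0
    shares = controller_shares(backlog)
    for _ in range(steps):
        backlog += write_rate                        # new work arrives
        shares = controller_shares(backlog)          # controller reacts
        backlog = max(0.0, backlog - shares * 0.01)  # work done this tick
    return backlog, shares


light_backlog, light_shares = simulate(write_rate=1.0)
heavy_backlog, heavy_shares = simulate(write_rate=5.0)

print("light load:", light_backlog, light_shares)
print("heavy load:", heavy_backlog, heavy_shares)
```

In this toy model the loop settles where the work done per tick matches the incoming rate, so a heavier write load ends up with more shares without anyone retuning the system by hand.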

Real time vs Analytics in the same DC
▪ Scylla controllers: background has limited impact.
▪ Workloads affect each other - but user has control
▪ Careful restriction of parallelism:
• Run a single DC today.
Don’t miss the Kiwi.com talk and see this in practice

Real time vs Analytics
1.5TB of data, 1 node.
200k/s random queries, 0% cache hit rate.
Average latency: 750us
p95 latency: 1.9ms
p99 latency: 3.3ms

Real time vs Analytics
Analytics runs together with real time queries
average: 3.7ms
p95: 13.4ms
p99: 60.2ms
p99: 28.7ms

Real time vs Analytics
With the node at 100%, real time throughput suffers
Not able to sustain 200k/s continuously

Real time vs Analytics
Analytics runs together with real time queries
Impact can be reduced by carefully tuning parallelism of analytics
Analytics parallelism greatly reduced:
average: 2ms
p95: 5.3ms
p99: 14.5ms
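One common way to restrict analytics parallelism on the client side is to cap the number of requests in flight. The sketch below does this with an `asyncio.Semaphore`. It is only an illustration of the technique: `fake_scan` is a hypothetical stand-in for a real driver call, and the limit of 4 is an arbitrary example value, not a figure from the talk.

```python
import asyncio


async def run_limited(jobs, max_in_flight: int):
    """Run coroutine factories with at most max_in_flight running at once."""
    sem = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def guarded(make_coro):
        nonlocal in_flight, peak
        async with sem:  # blocks while max_in_flight jobs are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await make_coro()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_scan(token_range: int) -> int:
    """Hypothetical stand-in for one analytics query against the database."""
    await asyncio.sleep(0.001)
    return token_range


# 20 analytics scans, but never more than 4 in flight at a time.
jobs = [lambda i=i: fake_scan(i) for i in range(20)]
results, peak = asyncio.run(run_limited(jobs, max_in_flight=4))
print(len(results), "scans finished, peak concurrency", peak)
```

Lowering `max_in_flight` trades analytics throughput for less interference with real time queries, which is the manual tuning effort this slide describes.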

p99 Visual Comparison
Analytics runs together with real time queries
original parallelism (30 ms)
fine tuned parallelism (10 ms)

How we do better
▪ User knows the expected priorities. We just have to be told.
▪ Any query executed under role analytics will be constrained
by its share of the system’s resources
CREATE ROLE analytics
WITH LOGIN = true
AND SERVICE_LEVEL = { 'shares': 200 };

Real time vs Analytics
Analytics are ISOLATED and run together with real time queries
Analytics parallelism is set to a high number.
average: 2ms
p95: 4ms
p99: 6.7ms

p99 Visual comparison
non-isolated (30ms)
isolated (6.7 ms)

Time spent tuning:
zero femtoseconds.

Summary
▪ Scylla is a great choice for Real Time + Analytics
▪ ScyllaDB delivers, today, a very compelling and flexible solution
▪ We will improve on our solid foundations built on latency
guarantees to make this use case even more compelling.
▪ Scylla is fast, but...

Performance is
yesterday’s news

Thank You
Any questions?
Please stay in touch
glauber@scylladb.com
@glcst
