OLTP and Analytics are very different. One is characterized by many concurrent small requests, with a high sensitivity to latency, while the other typically processes large streams of data with more emphasis on throughput.
The talk will cover:
- the different requirements of the two workloads
- how ScyllaDB optimizes for both
- performance isolation of different workloads within ScyllaDB
- how ScyllaDB supports concurrent OLTP and Analytics without sacrificing either latency or throughput
- measurements
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
1. OLAP or OLTP
Why not both?
Glauber Costa
VP Field Engineering, ScyllaDB
2. Presenter bio
Glauber Costa is VP of Field Engineering at ScyllaDB. He splits
his time between the engineering department, working on
upcoming Scylla features, and helping customers succeed.
Before ScyllaDB, Glauber worked on virtualization in the Linux
kernel for 10 years, with contributions to the Xen and KVM
hypervisors and all sorts of guest functionality and containers.
3. The road ahead
▪ Scylla celebrates its 4th birthday.
• Performance leadership solidified, TPC design spreading.
▪ Performance is always on our radar and we’ll keep improving.
• But what’s next?
7. Two major workload types
Analytics (OLAP)
▪ minutes, hours, days
▪ TB / PB of data per operation
▪ throughput oriented
▪ high parallelism
Real-time (OLTP)
▪ microseconds, milliseconds
▪ kB of data per operation
▪ latency oriented
▪ low/moderate parallelism
10. The role of money
Things that money can buy
▪ Food
▪ Clothes
▪ A house where I am from
▪ Throughput
Things that money cannot buy
▪ Love
▪ Happiness
▪ A house in the Bay Area
▪ Latencies
11. Shared clusters: the tuning conundrum
▪ Tune for latencies: throughput suffers
▪ Tune for throughput: latency suffers
▪ Patterns are seasonal. Which one to use as a tuning base?
13. Cost/year for 150TB of replicated data
(price based on AWS i3.metal)
Example:
▪ Capacity per instance: 15TB
▪ Minimum number of instances: 10
Assumptions:
▪ The real-time workload is latency sensitive, so it uses only 60% of resources.
▪ Analytics does not run constantly, so it also uses only 60% of resources.

                      Hardware          Estimated waste %   Estimated waste $
1 DC (10 instances)   USD 278,560.00    40%                 USD 167,136.00
2 DC (20 instances)   USD 557,120.00    40% + 40%           USD 334,272.00

With two DCs the total is now 20 instances, plus increased maintenance costs for admin and tuning!
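The arithmetic behind this example can be sketched as follows. The per-instance yearly price is not stated on the slide; it is derived here by dividing the slide's USD 278,560 by 10 instances, and all function and variable names are illustrative. Note that 40% of the hardware cost comes to USD 111,424 per DC, while the slide's dollar column equals 60% of the hardware cost (the utilized share under the stated assumptions), so the sketch reports both figures.

```python
import math

DATA_TB = 150                      # replicated data set size (from the slide)
CAPACITY_PER_INSTANCE_TB = 15      # capacity per instance (from the slide)
YEARLY_COST_10_INSTANCES = 278_560.00  # USD, slide figure for one DC
UTILIZATION = 0.60                 # share of resources in use (slide assumption)


def cluster_cost(num_dcs):
    """Return instance count and yearly cost split for num_dcs data centers."""
    instances_per_dc = math.ceil(DATA_TB / CAPACITY_PER_INSTANCE_TB)
    # Assumption: price per instance derived from the slide's 10-instance total.
    per_instance = YEARLY_COST_10_INSTANCES / 10
    hardware = num_dcs * instances_per_dc * per_instance
    used = round(hardware * UTILIZATION, 2)
    idle = round(hardware - used, 2)
    return {
        "instances": num_dcs * instances_per_dc,
        "hardware": hardware,
        "used": used,
        "idle": idle,
    }


for dcs in (1, 2):
    print(dcs, "DC:", cluster_cost(dcs))
```

Running the real-time and analytics workloads in separate DCs doubles both the instance count and the idle capacity, which is the cost the rest of the talk tries to avoid.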
16. What is your database running?
▪ Foreground, user-generated workload
• user queries, user updates
▪ Background, maintenance operations
• Some are proportional to user workload (compactions)
• Some are maintenance generated (repair)
21. Controlled processes
▪ Strong mathematical foundation on control theory
▪ Automatically adjust to any incoming workload
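To make the control-theory idea concrete, here is a minimal sketch of a proportional controller that gives background work (for example, compaction) more scheduler shares as its backlog grows. This is a generic textbook controller, not Scylla's actual implementation; the gain, the share limits, the drain rate, and all names are assumptions chosen for illustration.

```python
def control_step(backlog, gain=50.0, min_shares=10.0, max_shares=1000.0):
    """Proportional control: more backlog means more shares for background work."""
    return max(min_shares, min(max_shares, gain * backlog))


def simulate(incoming_per_tick, ticks=200, drain_per_share=0.01):
    """Run the feedback loop and return the final (backlog, shares)."""
    backlog = 0.0
    shares = 0.0
    for _ in range(ticks):
        shares = control_step(backlog)
        # Backlog grows with incoming work and shrinks with the granted shares.
        drained = shares * drain_per_share
        backlog = max(0.0, backlog + incoming_per_tick - drained)
    return backlog, shares


for load in (2.0, 4.0):
    backlog, shares = simulate(load)
    print(f"incoming={load}: backlog={backlog:.2f}, shares={shares:.1f}")
```

In this toy model the loop settles at the point where the granted shares drain exactly as much work as arrives, so doubling the incoming load doubles the steady-state shares without any manual retuning.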
23. Real time vs Analytics in the same DC
▪ Scylla controllers: background has limited impact.
▪ Workloads affect each other - but user has control
▪ Careful restriction of parallelism:
• Run a single DC today.
Don’t miss the Kiwi.com talk and see this in practice
27. Real time vs Analytics
Average latency: 750us
p95 latency: 1.9ms
p99 latency: 3.3ms
1.5TB of Data, 1 Node.
200k/s Random queries, 0% cache hit rate.
28. Real time vs Analytics Analytics runs together with real time queries
29. Real time vs Analytics
average: 3.7ms
Analytics runs together with real time queries
30. Real time vs Analytics
p95: 13.4ms
Analytics runs together with real time queries
31. Real time vs Analytics
p99: 60.2ms
p99: 28.7ms
Analytics runs together with real time queries
33. Real time vs Analytics
With the node at 100%, real-time throughput suffers
▪ Not able to sustain 200k/s continuously
35. Real time vs Analytics
Analytics runs together with real time queries
Impact can be reduced by carefully tuning parallelism of analytics
Analytics parallelism greatly reduced:
average: 2ms
p95: 5.3ms
p99: 14.5ms
36. p99 Visual Comparison
▪ original parallelism (30 ms)
▪ fine-tuned parallelism (10 ms)
Analytics runs together with real time queries
Impact can be reduced by carefully tuning parallelism of analytics
Analytics parallelism greatly reduced
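The manual tuning described on these slides amounts to capping how many analytics requests are in flight at once. A minimal client-side sketch, assuming the analytics job is a full scan split into independent ranges: `fake_scan_range` is a hypothetical stand-in for a real range query, and the limit of 4 is an arbitrary illustrative value, not one taken from the talk.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightTracker:
    """Records the peak number of scans running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self.current -= 1
        return False


def run_scan(token_ranges, scan_range, max_parallelism):
    """Scan every range with at most max_parallelism scans in flight."""
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        return list(pool.map(scan_range, token_ranges))


tracker = InFlightTracker()


def fake_scan_range(token_range):
    # Hypothetical stand-in for a real range query against the cluster.
    with tracker:
        time.sleep(0.01)
        return token_range


results = run_scan(range(16), fake_scan_range, max_parallelism=4)
print(len(results), "ranges scanned, peak in flight:", tracker.peak)
```

The drawback is the one the talk points out: the right limit depends on what the real-time workload is doing at the moment, so a fixed value has to be retuned as load patterns change.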
39. How we do better
▪ User knows the expected priorities. We just have to be told.
▪ Any query executed under role analytics will be constrained by its share of the system’s resources
CREATE ROLE analytics
WITH LOGIN = true
AND SERVICE_LEVEL = { 'shares': 200 };
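One way to read the shares value is as a relative weight that only matters under contention. The sketch below assumes a simple proportional split and a real-time role holding 1000 shares; both are assumptions for illustration, since the slide specifies only the 200 shares for the analytics role and does not describe the scheduler.

```python
def resource_fractions(shares_by_role):
    """Split a contended resource in proportion to each role's shares."""
    total = sum(shares_by_role.values())
    return {role: shares / total for role, shares in shares_by_role.items()}


# 'analytics' matches the slide; the 'realtime' value is an assumed example.
fractions = resource_fractions({"realtime": 1000, "analytics": 200})
for role, fraction in fractions.items():
    print(f"{role}: {fraction:.1%} of contended resources")
```

Under this model analytics is limited to one sixth of the resources when the real-time workload is busy, and it is free to use idle capacity otherwise, which is why its own parallelism no longer has to be throttled by hand.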
41. Real time vs Analytics
average: 2ms
Analytics are ISOLATED and run together
with real time queries
Analytics Parallelism is set to a high number.
42. Real time vs Analytics
p95: 4ms
Analytics are ISOLATED and run together
with real time queries
Analytics Parallelism is set to a high number.
43. Real time vs Analytics
p99: 6.7ms
Analytics are ISOLATED and run together
with real time queries
Analytics Parallelism is set to a high number.
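The measurements reported across these slides can be put side by side as slowdown factors relative to the real-time-only baseline. The figures below are copied from the slides; where the shared run shows two p99 readings (60.2ms and 28.7ms), the sketch uses 28.7ms, which is an assumption based on the roughly 30 ms shown in the p99 visual comparison. The scenario labels are illustrative.

```python
# Latencies in milliseconds, taken from the slides.
BASELINE = {"average": 0.75, "p95": 1.9, "p99": 3.3}

SCENARIOS = {
    "shared, original parallelism": {"average": 3.7, "p95": 13.4, "p99": 28.7},
    "shared, reduced parallelism": {"average": 2.0, "p95": 5.3, "p99": 14.5},
    "isolated, high parallelism": {"average": 2.0, "p95": 4.0, "p99": 6.7},
}


def slowdown(latencies):
    """Return each metric as a multiple of the real-time-only baseline."""
    return {
        metric: round(value / BASELINE[metric], 1)
        for metric, value in latencies.items()
    }


for name, latencies in SCENARIOS.items():
    print(name, slowdown(latencies))
```

With isolation, the p99 stays at about twice the baseline even though analytics parallelism is set high, compared with more than four times the baseline for the hand-tuned run.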
46. Summary
▪ Scylla is a great choice for Real Time + Analytics
▪ ScyllaDB delivers, today, a very compelling and flexible solution
▪ We will improve on our solid foundations built on latency
guarantees to make this use case even more compelling.
▪ Scylla is fast, but...