Scylla Summit 2018: OLAP or OLTP? Why Not Both?

OLTP and Analytics are very different. One is characterized by many concurrent small requests, with a high sensitivity to latency, while the other typically processes large streams of data with more emphasis on throughput.
The talk will cover:
- the different requirements of the two workloads
- how ScyllaDB optimizes for both
- performance isolation of different workloads within ScyllaDB
- how ScyllaDB supports concurrent OLTP and Analytics without sacrificing either latency or throughput
- measurements

Published in: Software

  1. 1. OLAP or OLTP? Why not both? Glauber Costa, VP Field Engineering, ScyllaDB
  2. 2. Presenter bio Glauber Costa is VP of Field Engineering at ScyllaDB. He splits his time between the engineering department, working on upcoming Scylla features, and helping customers succeed. Before ScyllaDB, Glauber worked with Virtualization in the Linux Kernel for 10 years, with contributions to the Xen and KVM Hypervisors and all sorts of guest functionality and containers.
  3. 3. The road ahead ▪ Scylla celebrates its 4th birthday. • Performance leadership solidified, TPC design spreading. ▪ Performance is always on our radar and we’ll keep improving. • But what’s next?
  4. 4. What’s next? Mina Naguib is the Director of Site Reliability Engineering at Samsung ADS
  5. 5. Let’s make it (more) BORING!
  6. 6. The two major workload types Analytics (OLAP) ▪ minutes, hours, days ▪ TB / PB of data per operation ▪ throughput oriented ▪ high parallelism
  7. 7. Two major workload types Analytics (OLAP) ▪ minutes, hours, days ▪ TB / PB of data per operation ▪ throughput oriented ▪ high parallelism Real-time (OLTP) ▪ microseconds, milliseconds ▪ kB of data per operation ▪ latency oriented ▪ low/moderate parallelism
  8. 8. OLTP-optimized doing OLAP? or OLAP-optimized doing OLTP?
  9. 9. The role of money Things that money can buy ▪ Food ▪ Clothes ▪ A house where I am from ▪ Throughput
  10. 10. The role of money Things that money can buy ▪ Food ▪ Clothes ▪ A house where I am from ▪ Throughput Things that money cannot buy ▪ Love ▪ Happiness ▪ A house in the Bay Area ▪ Latencies
  11. 11. Shared clusters: the tuning conundrum ▪ Tune for latencies: throughput suffers ▪ Tune for throughput: latency suffers ▪ Patterns are seasonal. Which one to use as a tuning base?
  12. 12. Classical Solution [Diagram: Real Time Data Center and Analytics Data Center, each with its own DATABASE]
  13. 13. Cost/year for 150TB of replicated data (price based on AWS i3.metal)
      Setup | Hardware | Estimated waste % | Estimated waste $
      1 DC (10 instances) | USD 278,560.00 | 40% | USD 167,136.00
      2 DC (20 instances) | USD 557,120.00 | 40% + 40% | USD 334,272.00
      Plus increased maintenance costs on admin and tuning! Total now is 20 instances.
      Example: Capacity per instance: 15TB. Minimum amount of instances: 10.
      Assumptions: Real time workload is latency sensitive and only uses 60% of resources. Analytics don’t run constantly, and therefore only use 60% of resources.
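The arithmetic behind this slide can be sketched as follows. The inputs (150TB, 15TB per instance, USD 278,560.00 per data center per year, 60% utilization) come from the slide; the function names are illustrative. Note that 40% of USD 278,560.00 is USD 111,424.00, while the USD 167,136.00 printed on the slide equals 60% of the hardware cost, so the sketch reports both the used and the idle share rather than picking one reading.

```python
import math

# Inputs taken from the slide; names are illustrative, not from the talk.
DATA_TB = 150
CAPACITY_PER_INSTANCE_TB = 15
COST_ONE_DC_USD = 278_560.00   # yearly cost of one 10-instance data center
UTILIZATION = 0.60             # each workload uses 60% of its cluster


def min_instances(data_tb, capacity_per_instance_tb):
    """Smallest number of instances that can hold the data set."""
    return math.ceil(data_tb / capacity_per_instance_tb)


def split_cost(hardware_usd, utilization):
    """Return (used, idle) yearly cost in USD for a given utilization."""
    used = round(hardware_usd * utilization, 2)
    idle = round(hardware_usd - used, 2)
    return used, idle


print("instances per DC:", min_instances(DATA_TB, CAPACITY_PER_INSTANCE_TB))
for dcs in (1, 2):
    hardware = dcs * COST_ONE_DC_USD
    used, idle = split_cost(hardware, UTILIZATION)
    print(f"{dcs} DC: hardware={hardware:,.2f} used={used:,.2f} idle={idle:,.2f}")
```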
  14. 14. How can Scylla help you now?
  15. 15. What is your database running? ▪ Foreground, user-generated workload • user queries, user updates ▪ Background, maintenance operations • Some are proportional to user workload (compactions) • Some are maintenance generated (repair)
  16. 16. I/O Scheduling [Diagram labels: Query, Commitlog, and Compaction queues; Userspace I/O Scheduler; Disk; max useful disk concurrency; I/O queued in FS/device; no queues]
  17. 17. CPU Scheduling [Diagram: read, write, Compaction, and SSTable write tasks on two CPUs]
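One way to read the I/O scheduling slide: requests wait in per-class queues in userspace, and only as many as the disk can usefully handle are dispatched at once. The sketch below illustrates that idea under stated assumptions. The class name `UserspaceIOScheduler`, its methods, and the round-robin policy are hypothetical simplifications, not Scylla's actual implementation.

```python
from collections import deque


class UserspaceIOScheduler:
    """Toy scheduler: per-class queues, bounded number of in-flight requests."""

    def __init__(self, max_disk_concurrency):
        self.max_disk_concurrency = max_disk_concurrency
        self.queues = {}      # class name -> deque of pending requests
        self.in_flight = 0    # requests currently handed to the disk

    def submit(self, io_class, request):
        self.queues.setdefault(io_class, deque()).append(request)

    def dispatch(self):
        """Hand requests to the disk, round-robin across classes,
        until the concurrency limit is reached or all queues are empty."""
        sent = []
        while self.in_flight < self.max_disk_concurrency:
            progressed = False
            for io_class, queue in self.queues.items():
                if queue and self.in_flight < self.max_disk_concurrency:
                    sent.append((io_class, queue.popleft()))
                    self.in_flight += 1
                    progressed = True
            if not progressed:
                break
        return sent

    def complete(self, count=1):
        """Record that `count` requests finished on the disk."""
        self.in_flight = max(0, self.in_flight - count)
```

Because excess requests stay in the userspace queues, a burst of compaction I/O does not sit ahead of later query I/O: each dispatch round still takes from every non-empty class.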
  18. 18. Which tasks to run? 100 shares 100 shares
  19. 19. Which tasks to run? 100 shares 50 shares
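The share counts on these two slides can be read as weights in a proportional-share scheduler. The sketch below uses a stride-style policy to show how 100 vs 100 and 100 vs 50 shares would translate into CPU time. It is an illustration only: the class names `query` and `compaction` are assumptions (the slides show only the share counts), and this is not presented as Scylla's scheduler.

```python
import heapq
from fractions import Fraction


def run_by_shares(shares, ticks):
    """Give each tick to the class with the smallest virtual runtime.

    A class's virtual runtime advances by 1/shares per tick it receives,
    so a class with twice the shares gets twice the ticks. Ties are
    broken by class name.
    """
    heap = [(Fraction(0), name) for name in shares]
    heapq.heapify(heap)
    counts = {name: 0 for name in shares}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(heap)
        counts[name] += 1
        heapq.heappush(heap, (vruntime + Fraction(1, shares[name]), name))
    return counts


print(run_by_shares({"query": 100, "compaction": 100}, ticks=200))
print(run_by_shares({"query": 100, "compaction": 50}, ticks=300))
```

Exact fractions are used instead of floats so that ties are resolved deterministically.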
  20. 20. Controlled processes ▪ Strong mathematical foundation in control theory ▪ Automatically adjust to any incoming workload
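To make the control-theory idea concrete, here is a minimal sketch of a proportional controller that raises a background task's shares as its backlog grows. Every constant and name here (`gain`, `work_per_share`, the share limits, the backlog model) is a hypothetical choice for illustration; the slides do not specify how Scylla's controllers are parameterized.

```python
def proportional_shares(backlog, gain, min_shares=10.0, max_shares=1000.0):
    """Shares grow linearly with backlog, clamped to a fixed range."""
    return max(min_shares, min(max_shares, gain * backlog))


def simulate(ingest_per_step, steps, gain=2.0, work_per_share=0.05):
    """Toy model: each step adds backlog, then the background task
    removes an amount of backlog proportional to its current shares."""
    backlog = 0.0
    history = []
    for _ in range(steps):
        backlog += ingest_per_step
        shares = proportional_shares(backlog, gain)
        backlog = max(0.0, backlog - shares * work_per_share)
        history.append((backlog, shares))
    return history


# A heavier incoming workload settles at proportionally more shares,
# without anyone retuning the controller.
for ingest in (10.0, 20.0):
    backlog, shares = simulate(ingest, steps=200)[-1]
    print(f"ingest={ingest} -> shares~{shares:.1f}, backlog~{backlog:.1f}")
```

In this toy model the shares settle where the work done per step equals the ingest per step, which is the sense in which the controller adjusts to the incoming workload.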
  21. 21. Real time vs Analytics in the same DC ▪ Scylla controllers: background has limited impact. ▪ Workloads affect each other - but user has control ▪ Careful restriction of parallelism: • Run a single DC today.
  22. 22. Real time vs Analytics in the same DC ▪ Scylla controllers: background has limited impact. ▪ Workloads affect each other - but user has control ▪ Careful restriction of parallelism: • Run a single DC today. Don’t miss the Kiwi.com talk and see this in practice
  23. 23. Real time vs Analytics 1.5TB of Data, 1 Node. 200k/s Random queries, 0% cache hit rate.
  24. 24. Real time vs Analytics 1.5TB of Data, 1 Node. 200k/s Random queries, 0% cache hit rate. Average latency: 750us
  25. 25. Real time vs Analytics 1.5TB of Data, 1 Node. 200k/s Random queries, 0% cache hit rate. Average latency: 750us p95 latency: 1.9ms
  26. 26. Real time vs Analytics Average latency: 750us p95 latency: 1.9ms p99 latency: 3.3ms 1.5TB of Data, 1 Node. 200k/s Random queries, 0% cache hit rate.
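For readers who want to reproduce this kind of summary from their own measurements, the sketch below computes an average, p95, and p99 from a list of latency samples using the nearest-rank method. How the talk's figures were actually computed is not stated, so treat this as one common convention rather than the method behind the slides.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(samples)
    # Ceiling of pct * n / 100, done in integers to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(latencies_us):
    """Average, p95, and p99 of a non-empty list of latencies in microseconds."""
    return {
        "average": sum(latencies_us) / len(latencies_us),
        "p95": percentile(latencies_us, 95),
        "p99": percentile(latencies_us, 99),
    }


# Synthetic samples, only to show the call shape.
print(summarize(list(range(1, 101))))
```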
  27. 27. Real time vs Analytics Analytics runs together with real time queries
  28. 28. Real time vs Analytics average: 3.7ms Analytics runs together with real time queries
  29. 29. Real time vs Analytics p95: 13.4ms Analytics runs together with real time queries
  30. 30. Real time vs Analytics p99: 60.2ms p99: 28.7ms Analytics runs together with real time queries
  31. 31. Real time vs Analytics With the node at 100%, real time throughput suffers
  32. 32. Real time vs Analytics Not able to sustain 200k/s continuously With the node at 100%, real time throughput suffers
  33. 33. Real time vs Analytics Analytics runs together with real time queries Impact can be reduced by carefully tuning parallelism of analytics Analytics parallelism greatly reduced:
  34. 34. Real time vs Analytics p99: 14.5ms p95: 5.3ms average: 2ms Analytics runs together with real time queries Impact can be reduced by carefully tuning parallelism of analytics Analytics parallelism greatly reduced:
  35. 35. p99 Visual Comparison original parallelism (30 ms) fine tuned parallelism (10 ms) Analytics runs together with real time queries Impact can be reduced by carefully tuning parallelism of analytics Analytics parallelism greatly reduced:
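Tuning analytics parallelism by hand usually means capping how many scan requests the analytics client keeps in flight. A minimal client-side sketch, assuming a hypothetical `scan_range` coroutine that reads one chunk of the data set (it stands in for a real driver call and is not a Scylla API):

```python
import asyncio


async def run_scan(limit, num_ranges, scan_range):
    """Scan all ranges with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(i):
        async with sem:
            return await scan_range(i)

    return await asyncio.gather(*(bounded(i) for i in range(num_ranges)))


async def demo(limit=4, num_ranges=32):
    """Run the scan against a fake backend and record peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_scan_range(i):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)   # stands in for the database round trip
        in_flight -= 1
        return i

    results = await run_scan(limit, num_ranges, fake_scan_range)
    return results, peak


results, peak = asyncio.run(demo())
print(f"scanned {len(results)} ranges, peak concurrency {peak}")
```

The right value of `limit` depends on the cluster and on the real time load at that moment, which is why this kind of tuning has to be revisited as patterns change.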
  36. 36. We can do better.
  37. 37. How we do better
  38. 38. How we do better ▪ User knows the expected priorities. We just have to be told. ▪ Any query executed under role analytics will be constrained by its share of the system’s resources.
      CREATE ROLE analytics WITH LOGIN = true AND SERVICE_LEVEL = { 'shares': 200 };
  39. 39. Real time vs Analytics Analytics are ISOLATED and run together with real time queries Analytics Parallelism is set to a high number.
  40. 40. Real time vs Analytics average: 2ms Analytics are ISOLATED and run together with real time queries Analytics Parallelism is set to a high number.
  41. 41. Real time vs Analytics p95: 4ms Analytics are ISOLATED and run together with real time queries Analytics Parallelism is set to a high number.
  42. 42. Real time vs Analytics p99: 6.7ms Analytics are ISOLATED and run together with real time queries Analytics Parallelism is set to a high number.
  43. 43. p99 Visual comparison non-isolated (30ms) isolated (6.7 ms)
  44. 44. Time spent tuning: zero femtoseconds.
  45. 45. Summary ▪ Scylla is a great choice for Real Time + Analytics ▪ ScyllaDB delivers, today, a very compelling and flexible solution ▪ We will improve on our solid foundations built on latency guarantees to make this use case even more compelling. ▪ Scylla is fast, but...
  46. 46. Performance is yesterday’s news
  47. 47. Let’s make it boring.
  48. 48. Thank You. Any Questions? Please stay in touch: glauber@scylladb.com @glcst