Database Performance at Scale Masterclass: Workload Characteristics by Felipe Mendes

ScyllaDB
Workload Characteristics
Felipe Mendes
Felipe Cardeneti Mendes
● Solution Architect at ScyllaDB
● Linux and Open Source enthusiast
● Yes, I am Brazilian… :-)
Agenda
● What makes distributed databases so hard?
● What do we mean by "Performance at Scale"?
● What should I know?
● What should I do?
What Makes Distributed Databases so Hard?
Several options to choose from:
■ Distinct architectures
■ Different deployment models
■ Independent characteristics
What Makes Distributed Databases so Hard?
Confusing Service Tiers even under a single solution:
■ Good luck figuring out the AWS DynamoDB model
■ To serverless or not?
■ Which features do I actually need?
What Makes Distributed Databases so Hard?
Replication Strategies:
■ Single leader:
○ Aka active/passive
○ Split-brain risk
○ Asynchronous vs synchronous replication (sketched below)
○ Leader election process
Designing Data Intensive Applications. Kleppmann, Martin. Chapter 5 – Replication, p.153
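To make the asynchronous vs synchronous trade-off concrete, here is a minimal, purely illustrative Python sketch (the Leader/Follower classes are assumptions, not any real database's code): a synchronous leader acknowledges only after followers confirm, while an asynchronous one acknowledges immediately and can lose acked writes on failover.

```python
# Purely illustrative single-leader replication, in memory only.
class Follower:
    def __init__(self):
        self.log = []

    def replicate(self, entry) -> bool:
        self.log.append(entry)
        return True

class Leader:
    def __init__(self, followers):
        self.log = []
        self.followers = followers

    def write(self, entry, synchronous: bool) -> bool:
        self.log.append(entry)
        if synchronous:
            # Acknowledge only once every follower confirms (simplified;
            # real systems often wait for one follower or a quorum).
            return all(f.replicate(entry) for f in self.followers)
        for f in self.followers:
            f.replicate(entry)  # fire-and-forget: replication lag is invisible
        return True             # acked before any follower has the write

leader = Leader([Follower(), Follower()])
leader.write("x=1", synchronous=True)   # durable on all replicas when acked
leader.write("x=2", synchronous=False)  # fast ack; may be lost on failover
```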
What Makes Distributed Databases so Hard?
Replication Strategies:
■ Multi leader:
○ Increased complexity
○ Conflict handling (a last-write-wins sketch below)
○ Asynchronous vs synchronous replication
○ Leader election process
Designing Data Intensive Applications. Kleppmann, Martin. Chapter 5 – Replication, p.169
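One common way multi-leader systems resolve conflicting writes is last-write-wins (LWW). A minimal sketch, assuming caller-supplied timestamps (real deployments also have to contend with clock skew):

```python
# Each version is (timestamp, value); the newer timestamp wins.
def lww_merge(a: tuple[float, str], b: tuple[float, str]) -> tuple[float, str]:
    return a if a[0] >= b[0] else b

# Two leaders accepted different writes for the same key:
version_a = (1700000000.120, "blue")
version_b = (1700000000.450, "green")
print(lww_merge(version_a, version_b))  # -> (1700000000.45, 'green')
```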
What Makes Distributed Databases so Hard?
Replication Strategies:
■ Leaderless:
○ Clients write to and read from any replica
○ Most highly available approach
○ Different approaches to conflict resolution (see how ScyllaDB handles it)
○ Tunable consistency, anti-entropy, hinted handoff, … (quorum example below)
Designing Data Intensive Applications. Kleppmann, Martin. Chapter 5 – Replication, p.180
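Tunable consistency in practice: a minimal example with the Python cassandra-driver against a hypothetical `users` table (the contact point and keyspace are placeholders). With RF=3, QUORUM writes and QUORUM reads overlap on at least one replica; dropping reads to ONE trades consistency for latency.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("demo")  # placeholder endpoint/keyspace

write = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,  # 2 of 3 replicas must ack
)
session.execute(write, (1, "felipe"))

read = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,  # fastest replica answers
)
print(session.execute(read, (1,)).one().name)
```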
What Makes Distributed Databases so Hard?
Different storage organization:
■ Row oriented
■ Column oriented
Not to mention the many data models (key-value, document, wide-column, graph, …)
What Makes Distributed Databases so Hard?
All of this… plus the fact that databases always get the blame when things go wrong…
No wonder we decided to write a book. :-)
What do we mean by "Performance at Scale"?
What do we mean by "Performance at Scale"?
Holistic view:
■ Context is fundamental for performance
■ It is not JUST about being "faster"
■ It is about wisely picking your battles
"The different GCP disk types each meet these requirements in different ways. It would be all too convenient if we could combine both disk types into one super-disk. Since our primary focus for disk performance was low-latency reads, we would love to read from GCP's Local SSDs (low latency) while still writing to Persistent Disks (snapshotting, redundancy via replication). But is there a way to create such a super-disk at the software level?"
– How Discord Supercharges Network Disks for Extreme Low Latency
What do we mean by "Performance at Scale"?
Economics come into play:
■ Be realistic
■ Ready for the future
■ WHY are writes on DynamoDB 5x more expensive than reads? (a back-of-envelope check below)
○ Writes are always done to a majority of replicas
○ B-trees require more work on the write path
○ Plus keep up with a replication log
AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321)
Pricing for Provisioned Capacity
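A back-of-envelope check of that 5x gap, using illustrative provisioned-capacity rates (assumed us-east-1 numbers; verify against current AWS pricing before relying on them):

```python
WCU_HOURLY = 0.00065  # $ per write capacity unit-hour (assumed rate)
RCU_HOURLY = 0.00013  # $ per read capacity unit-hour (assumed rate)

print(f"write/read price ratio: {WCU_HOURLY / RCU_HOURLY:.0f}x")  # -> 5x

# A steady 1,000 writes/s + 5,000 reads/s over a 730-hour month:
# despite 5x more reads, both sides cost the same.
writes = 1_000 * WCU_HOURLY * 730  # 1 WCU ~ one 1 KB write per second
reads = 5_000 * RCU_HOURLY * 730   # 1 RCU ~ one strongly consistent 4 KB read/s
print(f"writes: ${writes:,.2f}/mo, reads: ${reads:,.2f}/mo")
```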
What do we mean by "Performance at Scale"?
Understand your limits!
[Chart: throughput (kops/s, peaking near 100) vs. load, split into a linear region, a saturation region, and a retrograde region, aka the "road to hell".]
How Optimizely (Safely) Maximizes Database Concurrency
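One common way to stay out of the retrograde region is to cap in-flight requests. A minimal asyncio sketch, sizing the cap from Little's law (concurrency ≈ throughput × latency); the numbers and the `db.execute` client are assumptions:

```python
import asyncio

TARGET_OPS_PER_SEC = 80_000
TYPICAL_LATENCY_S = 0.002  # 2 ms per operation
MAX_IN_FLIGHT = int(TARGET_OPS_PER_SEC * TYPICAL_LATENCY_S)  # ~160

limiter = asyncio.Semaphore(MAX_IN_FLIGHT)

async def query(db, statement):
    # Excess requests queue here instead of piling onto the database,
    # which is what pushes it past saturation into the retrograde region.
    async with limiter:
        return await db.execute(statement)  # hypothetical async client
```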
What do we mean by "Performance at Scale"?
And when to stop worrying ;-)
Interaction Latency: Square’s User-Centric Mobile Performance Metric
What do we mean by "Performance at Scale"?
It must work.
■ Efficient hardware resource use
■ Resilient
■ Highly available
■ Developer friendly
■ Integrates with your existing ecosystem
What should I know?
What should I know?
One size doesn't fit all.
■ A previous success does not guarantee another:
○ Databases are different
○ Workloads are different
○ Patterns and requirements often differ
What should I know?
Understand your workload
What should I know?
… and your Access Patterns!
■ Are ACID Transactions a MUST?
■ What level of querying flexibility (ad-hoc) is needed?
■ What's the ratio of reads vs writes?
■ How much traffic fluctuation exists? (a quick check sketched below)
○ POLEMIC MODE ON: Does auto-scaling even exist?
The problems with DynamoDB Auto Scaling and how it might be improved – HackerNoon
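Those last two questions can be answered from basic counters. A quick, hypothetical helper (the thresholds are assumptions, not rules):

```python
def characterize(reads: int, writes: int, ops_per_minute: list[int]) -> str:
    ratio = reads / max(writes, 1)
    peak_to_avg = max(ops_per_minute) / (sum(ops_per_minute) / len(ops_per_minute))
    shape = "read-heavy" if ratio > 3 else "write-heavy" if ratio < 1 / 3 else "mixed"
    burst = "spiky" if peak_to_avg > 2 else "steady"
    return f"{shape}, {burst} (r/w={ratio:.1f}, peak/avg={peak_to_avg:.1f})"

print(characterize(reads=9_000_000, writes=1_000_000,
                   ops_per_minute=[100, 120, 110, 480, 130]))
# -> read-heavy, spiky (r/w=9.0, peak/avg=2.6)
```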
What should I know?
Prepare for the worst.
■ … And your budget:
○ Single zone vs Multi zone vs Multi region
■ Simulate failure scenarios:
○ What if a node goes down?
○ … and a full zone?
○ … and different zones?
■ And meanwhile… OBSERVE
Understanding and Mitigating Distributed Database Network Costs with ScyllaDB
What should I do?
What should I do?
Write-heavy workloads:
■ Shine primarily under LSM trees (a toy sketch follows this list):
○ Append-only data structure
○ Resulting files are sorted and immutable
○ Compactions take care of merging data over time
■ ScyllaDB, Cassandra, BigTable, Apache HBase
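A toy sketch of the LSM idea (illustrative only, not any real engine's code): writes go to an in-memory memtable, full memtables flush as sorted immutable files, and reads check newest-first, which is why compaction matters.

```python
class ToyLSM:
    def __init__(self, memtable_limit: int = 4):
        self.memtable: dict[str, str] = {}
        self.sstables: list[dict[str, str]] = []  # immutable once flushed
        self.limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value  # writes never touch existing files
        if len(self.memtable) >= self.limit:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}      # flushed file is sorted and immutable

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest file wins
            if key in table:
                return table[key]
        return None
```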
What should I do?
Delete-heavy workloads:
■ Interestingly enough, deletes are writes! :-)
■ Even more so under LSM tree engines:
○ As resulting files are immutable, tombstones get written
○ Accumulated tombstones can easily degrade the read path
○ Compaction optimizations exist (such as ICS & LCS)
○ ScyllaDB specific (exclusive?) optimization: empty replica pages
■ Interesting problems emerge when using your database as a queue
■ And also interesting solutions :-) (a tombstone sketch follows)
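Extending the ToyLSM sketch above (loosening its `str` type hints for brevity): a delete cannot modify immutable files, so it writes a tombstone that shadows older values until compaction purges them.

```python
TOMBSTONE = object()  # sentinel marker written in place of a value

def delete(lsm, key):
    lsm.put(key, TOMBSTONE)  # a delete IS a write

def read(lsm, key):
    value = lsm.get(key)     # newest entry wins, tombstones included
    return None if value is TOMBSTONE else value

def compact(sstables):
    # Keep only each key's newest entry, then drop tombstones entirely;
    # this is the merge work strategies like ICS and LCS try to optimize.
    merged = {}
    for table in reversed(sstables):  # newest first
        for k, v in table.items():
            merged.setdefault(k, v)
    live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return [dict(sorted(live.items()))]
```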
What should I do?
Read-heavy workloads:
■ B+Tree engines (such as InnoDB) shine when reads have to go to disk
■ Conversely, LSM Trees may require touching multiple files
■ Evaluate (a quick sizing check follows the list):
○ Cache hit rate vs Cache misses
○ Data set size and projected growth
■ Most relational DBs, DynamoDB, Couchbase, MongoDB
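A quick sizing check for those two bullets, with made-up numbers: the effective hit rate from counters, and how long before a growing data set outgrows the cache.

```python
def hit_rate(hits: int, misses: int) -> float:
    return hits / max(hits + misses, 1)

def months_of_cache_headroom(dataset_gb: float, cache_gb: float,
                             monthly_growth_gb: float) -> float:
    return max((cache_gb - dataset_gb) / monthly_growth_gb, 0.0)

print(f"hit rate: {hit_rate(9_400_000, 600_000):.1%}")                   # -> 94.0%
print(f"headroom: {months_of_cache_headroom(180, 256, 20):.1f} months")  # -> 3.8
```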
What should I do?
Mixed workloads (or, when things start to get REALLY interesting):
■ From the perspective of the database:
○ 2 competing workloads
○ No secret sauce
○ While writes are expensive on B-trees, reads suffer on LSM trees
■ Keep an eye on disk performance:
○ Often, reading cold data can easily introduce a bottleneck
○ Similarly, retrieving large datasets may render the cache ineffective
○ Consider strategies such as BYPASS CACHE (example after the list)
■ Consider prioritizing each workload to meet your SLAs.
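BYPASS CACHE is ScyllaDB's CQL clause for reads that should not evict hot rows, such as a one-off analytical scan inside a mixed workload. A minimal sketch with the Python cassandra-driver (contact point, keyspace, and table are hypothetical):

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("analytics")  # placeholder

# The rare full scan skips the row cache so hot OLTP rows stay resident.
for row in session.execute("SELECT * FROM events BYPASS CACHE"):
    print(row)  # stand-in for real processing
```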
Keep in touch!
Felipe Mendes
Solution Architect
ScyllaDB
felipemendes@scylladb.com
LinkedIn