4. ● Consulting
● Support
● Remote DBA
● Engineering
● Conferences & Training
● Percona Server
● Percona XtraBackup
● Percona XtraDB Cluster
● Percona Toolkit
● Many More
5. Today's Agenda
● Benchmarks
● Aggregation and Distributions
● Performance, Capacity & Utilization
● Rules of Thumb
● Queueing Theory and Scalability
14. Clear Benchmark Goals
● Validating hardware configuration
● Comparing two systems
● Checking for regressions
● Capacity planning
● Reproducing bad behavior to solve it
● Stress-testing to find bottlenecks
15. Hardware and Software
● Specs for CPU, disk, memory, network
● Software versions (OS, SUT, benchmark)
● Filesystem, RAID controller
● Disk queue scheduler
16. Presenting Results
● Ideally, make raw results available
● Include metrics from OS (CPU, RAM, IO,
network)
● Generate some plots to summarize
● This is where the rubber meets the road!
17. Better Aggregate Measures
● Average
● Percentiles
● 95th
● 99th
● Maximum
● Observation Duration
● Question: how bad can 95th percentile be?
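As a minimal sketch of the aggregate measures listed on this slide, the Python below computes average, 95th percentile, 99th percentile, and maximum from a list of response times. The function names, the nearest-rank percentile method, and the sample data are assumptions for illustration, not something the slides specify.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the aggregate measures named on the slide."""
    return {
        "average": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


# Hypothetical data: 95 fast requests (1 ms) and 5 very slow ones (1000 ms).
response_times_ms = [1.0] * 95 + [1000.0] * 5
summary = summarize(response_times_ms)
```

With this made-up data the 95th percentile is still 1 ms even though one request in twenty takes a full second, which is one way to approach the question on the slide: the 95th percentile says nothing about how bad the worst 5% are.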
23. Performance
● What is Performance?
● Two Metrics
● Response Time (time per task)
● Throughput (tasks per time)
● They're not reciprocals
● More on this later
25. Performance
● I often focus on response time
● It represents user experience
● Throughput indicates capacity rather than
performance
● For benchmarking, throughput is primary
26. Utilization
● The portion of time during which the
resource is busy
● i.e. there is at least one thing in progress
27. Utilization is Confusing
● Be very careful with tools that report
utilization
● From the Linux iostat man page:
● “%util: Percentage of CPU time during which
I/O requests were issued to the device
(bandwidth utilization for the device). Device
saturation occurs when this value is close to
100%.”
● Can you parse that? Is it true?
30. Capacity – My Definition
Capacity is the maximum throughput
... at achievable concurrency
... with acceptable performance
... as defined by response time
... meeting specified constraints
... over specified observation intervals.
31. Capacity Example
● What is the capacity of the system at a
concurrency of 32 with 10-second 95th-
percentile response time not to exceed
2ms over a 60-minute duration?
● To determine this, we need goal-seeking
benchmark software
● Most benchmark software can't do this
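One way to picture "goal-seeking" is a search over the offered load for the highest rate that still meets the response-time limit. The sketch below is only an illustration: `measure_p95` is a hypothetical callable standing in for a full benchmark run (at fixed concurrency and duration), and the bisection assumes the 95th percentile gets worse as load increases.

```python
def find_capacity(measure_p95, low, high, limit, steps=20):
    """Bisect for the highest request rate whose measured 95th-percentile
    response time stays at or below `limit`.

    measure_p95(rate) is a hypothetical function that runs one benchmark
    at the given rate and returns the observed 95th percentile in seconds.
    Returns None if even the lowest rate violates the limit.
    """
    if measure_p95(low) > limit:
        return None
    for _ in range(steps):
        mid = (low + high) / 2
        if measure_p95(mid) <= limit:
            low = mid    # constraint met: capacity is at least mid
        else:
            high = mid   # constraint violated: capacity is below mid
    return low


def fake_p95(rate):
    """Made-up stand-in for a real benchmark run, for demonstration only."""
    return 0.001 / (1 - rate / 1000)


capacity = find_capacity(fake_p95, low=0, high=999, limit=0.002)
```

In a real setting each call to `measure_p95` would be a long benchmark run (60 minutes in the slide's example), so the number of search steps matters.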
32. Benchmarks, etc. Recap
● Most benchmarks reveal very little
● Benchmark reports reveal even less
● It's good to go beyond the surface
33. Amdahl's Law
● “The speedup of a program using multiple
processors in parallel computing is limited
by the time needed for the sequential
fraction of the program.” - Wikipedia
● It's basically a law of diminishing returns.
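The diminishing returns can be made concrete with the usual form of Amdahl's Law, speedup = 1 / (s + (1 - s) / n), where s is the sequential fraction and n the number of processors. The function name and the 5% figure below are illustrative assumptions.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's Law for a program whose
    `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# Hypothetical program with 5% sequential work: each doubling of
# processors buys less, and the speedup can never reach 1 / 0.05 = 20.
speedups = {n: amdahl_speedup(0.05, n) for n in (1, 2, 4, 8, 16, 32, 64)}
```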
34. Should I Defragment My Disk?
● Method 1: Google “defragment”
● Method 2: Try it and see
● Method 3: Measure if the disk is a
bottleneck
38. Little's Law
● N = XR
● That is,
● Concurrency = Throughput * Response Time
● This holds regardless of queueing, arrival
rate distribution, response time
distribution, etc.
39. Little's Law Example
● If disk IOs average 4ms...
● And there are 280 IOs per second...
● Then the disk's average concurrency is:
● N = 280 * .004
● N = 1.12
● Do you believe this?
● When might it not be true?
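The arithmetic on this slide can be written as a tiny helper. This is just a sketch; the function name is an assumption, and the units must be consistent (seconds with per-second rates).

```python
def littles_law_concurrency(throughput_per_sec, response_time_sec):
    """Little's Law: N = X * R (average number of requests in the system)."""
    return throughput_per_sec * response_time_sec


# The slide's numbers: 280 IOs per second at an average of 4 ms each.
n = littles_law_concurrency(280, 0.004)  # about 1.12
```

Note that the result is an average over the observation interval; it does not say how many IOs were in flight at any particular instant.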
40. Little's Law Example #2
● If disk utilization is 98%
● And there are 280 IOs per second
● What do we know?
41. Utilization Law
● U = SX
● Also independent of distributions, etc...
● That is,
● Utilization = Service Time * Throughput
● Utilization = 98% and Throughput = 280
● S = U/X
● Service Time = .98 / 280 = .0035 seconds (3.5ms)
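The same calculation as a short sketch, with the law in both directions. The function names are assumptions; utilization is expressed as a fraction (0.98 rather than 98).

```python
def utilization(service_time_sec, throughput_per_sec):
    """Utilization Law: U = S * X."""
    return service_time_sec * throughput_per_sec


def service_time(utilization_fraction, throughput_per_sec):
    """Utilization Law rearranged: S = U / X."""
    return utilization_fraction / throughput_per_sec


# The slide's numbers: 98% utilization at 280 IOs per second.
s = service_time(0.98, 280)  # about 0.0035 seconds per IO
```

This gives the service time only. The response time, which also includes time spent waiting in the queue, cannot be derived from utilization and throughput alone.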
42. Queueing Theory
● How can we predict the amount of
queueing in a system?
● How can we predict its response times?
● How can we predict capacity?
43. Erlang Queueing
● Erlang's formulas model the probability of
queueing for a given arrival rate, service
time, and number of servers.
● A “server” is anything capable of serving
a request.
● CPUs
● Disks
44. CPU -vs- Disk Queueing
● Scenario: 4-CPU, 4-disk (RAID0) server
● Thought experiment:
● How do processes queue for CPU?
● How do I/O requests queue on disks?
45. Notation
● Typically see something like M/M/1
● Each letter is a placeholder in A/S/n
● A = Arrival distribution
● S = Service-time distribution
● n = Number of servers
● A and S can be one of:
● Markovian (M)
● Deterministic (D)
● General (G)
49. Erlang C Function
● M/M/n queueing is modeled by Erlang C
● See http://en.wikipedia.org/wiki/Erlang_(unit)
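For readers who want to experiment, here is a minimal sketch of the Erlang C function for an M/M/n queue. It uses the standard numerically stable route through the Erlang B recurrence rather than factorials. The function names are assumptions, and `offered_load` is the arrival rate multiplied by the mean service time (in erlangs).

```python
def erlang_c(servers, offered_load):
    """Probability that an arriving request has to queue in an M/M/n system.

    servers      -- number of servers (n)
    offered_load -- arrival rate * mean service time, in erlangs
    """
    if offered_load >= servers:
        raise ValueError("offered load must be below the number of servers")
    # Erlang B via the recurrence B(k) = a*B(k-1) / (k + a*B(k-1)), B(0) = 1.
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / servers  # per-server utilization
    # Convert Erlang B to Erlang C.
    return b / (1.0 - rho * (1.0 - b))


def mean_response_time(servers, arrival_rate, service_time):
    """Mean response time (queue wait plus service) for an M/M/n queue."""
    offered_load = arrival_rate * service_time
    wait = erlang_c(servers, offered_load) * service_time / (servers - offered_load)
    return service_time + wait
```

As a sanity check, with one server the function reduces to the utilization itself, so `erlang_c(1, 0.5)` is 0.5. The model only applies if arrivals and service times really are Markovian, which has to be validated against measurements.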
50. What's Wrong With Erlang C?
● You must validate your arrival times.
● You must validate your service times.
● The equation is hard to work with.
● In practice, it's hard to use Erlang C.
51. Scalability
● Queueing causes non-linear scaling.
● But first, let's talk about linearity.
58. Scalability Limitations
● Locks
● Synchronization points
● Shared resources
● Duplicated data to be kept in sync
● Weakest-link problems
59. RAID10 On EBS
● Which is faster?
● RAID 10 over 10 EBS volumes
● RAID 10 over 20 EBS volumes
● Hint: http://goo.gl/Xm92Y
● Also, http://goo.gl/fAEIL
60. Debunking “Linear”
● Ask to see the actual numbers.
● They shouldn't be rounded off suspiciously.
● They must be truly linear.
● They must intersect the point (0, 0).
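A simple way to apply these checks to published numbers: if scaling is truly linear through (0, 0), throughput divided by the scaling dimension (nodes, threads, volumes) is the same at every measured point. The sketch below tests that; the function name, the 5% tolerance, and the data are assumptions for illustration.

```python
def is_linear_through_origin(points, tolerance=0.05):
    """Return True if throughput per unit is constant across all points.

    points    -- list of (units, throughput) pairs, units > 0
    tolerance -- allowed relative deviation from the first point's ratio
    """
    ratios = [throughput / units for units, throughput in points]
    baseline = ratios[0]
    return all(abs(r - baseline) <= tolerance * baseline for r in ratios)


# Hypothetical results: the first set is linear, the second only looks
# linear at a glance (per-unit throughput drops from 100 to 85).
claimed_linear = [(1, 100), (2, 200), (4, 400)]
actually_sublinear = [(1, 100), (2, 190), (4, 340)]
```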