
(PFC302) Performance Benchmarking on AWS | AWS re:Invent 2014

In this session, we explain how to measure the key performance-impacting metrics in a cloud-based application and describe best practices for a reliable benchmarking process. Measuring application performance correctly can be challenging, and there are many tools available to measure and track performance. This session provides specific examples of good and bad tests. We show how to get reliable measurements and how to map benchmark results to your application. We also cover the importance of selecting tests wisely, repeating tests, and measuring variability. In addition, a customer provides real-life examples of how they developed their application testing stack, how they use it for repeatable testing, and how they identify bottlenecks.

Published in: Technology

  1. 1.
     • The best benchmark
     • Absolute vs. relative measures
     • Fixed time or fixed work
     • What’s different?
     • Use a good AMI
     [Chart: average CPU result (axis 0.00 to 30.00) and coefficient of variance (axis 0% to 60%) across AMIs (Ubuntu 12.4, AWS, and several CentOS 5.4 images); AMI IDs are truncated in the source]
  2. 2.
     • Application runs on premises
     • Primary requirement is integer CPU performance
     • Application is complex to set up, no benchmark tests exist, limited time
     • What instance would work best?
     1. Choose a synthetic benchmark
     2. Baseline: Build, configure, tune, and run it on premises
     3. Run the same test (or tests) on a set of instance types
     4. Use results from the instance tests to choose the best match
  3. 3.
     • Integer: AES, Twofish, SHA1, SHA2, BZip2 compress, BZip2 decompress, JPEG compress, JPEG decompress, PNG compress, PNG decompress, Sobel, LUA, Dijkstra
     • Floating Point: Black-Scholes, Mandelbrot, Sharpen image, Blur image, SGEMM, DGEMM, SFFT, DFFT, N-Body, Ray trace
     • Memory: STREAM copy, STREAM scale, STREAM add, STREAM triad
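  4. 4.
     # Look up this instance's ID and type from the EC2 instance metadata service
     ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
     TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
     # Run Geekbench without uploading results; save the output to $GBTXT
     ./geekbench_x86_64 --no-upload >$GBTXT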
  5. 5.
     Geekbench     1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
     m3.xlarge     0.93         1.04%    2.04         2.31%    2.06
     m3.2xlarge    0.93         1.40%    3.80         1.46%    2.08
     m2.xlarge     0.80         2.84%    1.54         4.06%    1.99
     m2.2xlarge    0.80         1.34%    2.82         1.21%    2.04
     m2.4xlarge    0.76         2.28%    5.11         1.71%    2.01
     c3.large      1.13         0.93%    1.32         0.71%    1.76
     c3.xlarge     1.13         0.39%    2.51         1.81%    1.74
     c3.2xlarge    1.13         0.19%    4.88         0.25%    1.70
     cc2.8xlarge   1.00         0.71%    15.46        1.93%    2.21
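
In the table on slide 5 the 1CPU ratio for cc2.8xlarge reads 1.00, which suggests the scores were normalized to one reference run. The sketch below shows one way such ratios could be derived from raw scores. The function name `ratios` and the "name score" input format are assumptions made for illustration, not part of the session material.

```shell
#!/bin/sh
# ratios BASELINE
# Reads "name score" lines on stdin and prints each name with its score
# divided by the score of the line whose name equals BASELINE.
# Hypothetical helper: the name and input format are illustrative only.
ratios() {
    awk -v base="$1" '
        NF >= 2 {
            name[++n] = $1
            score[n] = $2
            if ($1 == base) ref = $2
        }
        END {
            if (ref == "" || ref == 0) {
                print "baseline not found" > "/dev/stderr"
                exit 1
            }
            for (i = 1; i <= n; i++)
                printf "%s %.2f\n", name[i], score[i] / ref
        }'
}

# Example with made-up raw scores:
#   printf 'big 300\nsmall 150\nref 100\n' | ratios ref
```

Dividing by a single reference keeps the results relative, so they can be compared across instance types without relying on the benchmark's absolute units.
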
  6. 6.
     geekbench (m3.xlarge)   1CPU ratio   C.O.V.
     instance-1              0.93         0.31%
     instance-2              0.97         0.23%
     instance-3              0.94         0.17%
     instance-4              0.94         0.10%
     instance-5              0.94         0.32%
     instance-6              0.94         0.10%
     instance-7              0.93         0.25%
     instance-8              0.93         0.38%
     instance-9              0.94         0.11%
     instance-10             0.94         0.09%
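
The C.O.V. columns report variability across repeated runs. As a minimal sketch, the coefficient of variation can be computed as the standard deviation divided by the mean. The helper name `cov` is hypothetical, and the use of the sample standard deviation (n - 1) is an assumption, since the slides do not say which form was used.

```shell
#!/bin/sh
# cov
# Reads one number per line on stdin and prints the coefficient of
# variation (sample standard deviation / mean) as a percentage.
# Prints "NA" when fewer than two values are given or the mean is zero.
# Assumption: sample standard deviation (n - 1 in the denominator).
cov() {
    awk '
        NF { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n < 2 || sum == 0) { print "NA"; exit }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0    # guard against rounding error
            printf "%.2f%%\n", 100 * sqrt(var) / mean
        }'
}

# Example: spread of the ten per-instance 1CPU ratios from slide 6
#   printf '%s\n' 0.93 0.97 0.94 0.94 0.94 0.94 0.93 0.93 0.94 0.94 | cov
```

Applied to the ten per-instance ratios, this measures instance-to-instance spread, which is a different quantity from the per-instance C.O.V. values listed on the slide.
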
  7. 7.
     gb-integer    1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
     c3.large      1.12         0.50%    1.37         0.43%    NA
     c3.xlarge     1.13         0.38%    2.72         0.41%    NA
     c3.2xlarge    1.12         0.38%    5.35         0.51%    NA
     cc2.8xlarge   1.00         0.20%    17.88        3.31%    NA

     geekbench     1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
     c3.large      1.13         0.93%    1.32         0.71%    1.76
     c3.xlarge     1.13         0.39%    2.51         1.81%    1.74
     c3.2xlarge    1.13         0.19%    4.88         0.25%    1.70
     cc2.8xlarge   1.00         0.71%    15.46        1.93%    2.21
  8. 8. 11
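  9. 9.
     # Look up this instance's ID and type from the EC2 instance metadata service
     ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
     TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
     # Run UnixBench with 1 copy and then with $COPIES copies; save the output to $FN
     ./Run -c 1 -c $COPIES >$FN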
  10. 10.
      UnixBench     1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
      m3.xlarge     1.38         1.90%    2.49         1.36%    28.25
      m3.2xlarge    1.42         1.85%    4.21         1.99%    28.29
      m2.xlarge     0.40         5.82%    0.76         1.28%    28.30
      m2.2xlarge    0.42         1.71%    1.23         1.75%    28.32
      m2.4xlarge    0.48         3.31%    2.02         1.71%    28.34
      c3.large      1.10         1.33%    1.91         1.54%    28.17
      c3.xlarge     1.06         1.48%    2.85         1.26%    28.21
      c3.2xlarge    1.10         0.54%    4.50         1.02%    28.96
      cc2.8xlarge   1.00         2.97%    6.44         2.65%    30.20
  11. 11.
      UB-Integer    1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
      c3.large      1.05         0.24%    1.10         0.30%    0.17
      c3.xlarge     1.05         0.27%    2.20         0.28%    0.17
      c3.2xlarge    1.05         0.07%    4.34         0.23%    0.17
      cc2.8xlarge   1.00         0.10%    15.54        0.95%    0.17

      UnixBench     1CPU ratio   C.O.V.   NCPU ratio   C.O.V.   RT (min)
      c3.large      1.10         1.33%    1.91         1.54%    28.17
      c3.xlarge     1.06         1.48%    2.85         1.26%    28.21
      c3.2xlarge    1.10         0.54%    4.50         1.02%    28.96
      cc2.8xlarge   1.00         2.97%    6.44         2.65%    30.20
  12. 12. www.spec.org
  13. 13.
      Benchmark        Language   Category
      400.perlbench    C          Programming language
      401.bzip2        C          Compression
      403.gcc          C          C compiler
      429.mcf          C          Combinatorial optimization
      445.gobmk        C          Artificial intelligence
      456.hmmer        C          Search gene sequence
      458.sjeng        C          Artificial intelligence
      462.libquantum   C          Physics / quantum computing
      464.h264ref      C          Video compression
      471.omnetpp      C++        Discrete event simulation
      473.astar        C++        Path-finding algorithms
      483.xalancbmk    C++        XML processing
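  14. 14.
      # Look up this instance's ID and type from the EC2 instance metadata service
      ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
      TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
      # Non-reportable base-tuned rate run: $COPIES copies, one iteration, selected benchmarks only
      runspec --noreportable --tune=base --size=ref --rate=$COPIES --iterations=1 \
        400 403 445 456 458 462 464 471 473 483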
  15. 15.
      Est. SPECint   1CPU ratio   C.O.V.   RT (min)   NCPU ratio   C.O.V.   RT (min)
      m3.xlarge      1.01         1.06%    54.39      2.24         1.15%    104.18
      m3.2xlarge     1.01         1.67%    54.49      4.25         1.63%    109.22
      m2.xlarge      0.76         1.97%    70.83      1.39         2.45%    85.37
      m2.2xlarge     0.79         0.94%    68.85      2.76         1.24%    85.42
      m2.4xlarge     0.78         0.16%    68.73      5.21         1.26%    89.91
      c3.large       1.11         1.95%    50.00      1.25         1.47%    94.22
      c3.xlarge      1.10         1.96%    50.29      2.39         1.28%    97.66
      c3.2xlarge     1.08         0.87%    50.87      4.67         0.25%    100.22
      cc2.8xlarge    1.00         0.29%    54.92      14.92        0.52%    125.74
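  16. 16.
      # Look up this instance's ID and type from the EC2 instance metadata service
      ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
      TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
      # CPU test with $TDS threads; save the output to $FN
      sysbench --num-threads=$TDS --max-requests=30000 --test=cpu \
        --cpu-max-prime=100000 run > $FN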
  17. 17.
      sysbench      Default   C.O.V.   RT (min)   Tuned ratio   C.O.V.   RT (min)
      m3.xlarge     3.21      1.44%    0.06       1.69          1.29%    3.86
      m3.2xlarge    6.41      1.38%    0.03       3.38          1.41%    1.93
      m2.xlarge     1.59      0.75%    0.11       0.80          0.23%    8.16
      m2.2xlarge    3.19      0.64%    0.06       1.60          0.76%    4.07
      m2.4xlarge    8.83      0.62%    0.02       4.71          0.20%    1.38
      c3.large      1.78      0.26%    0.10       0.91          0.09%    7.13
      c3.xlarge     3.55      0.53%    0.05       1.83          0.02%    3.57
      c3.2xlarge    6.55      8.45%    0.03       3.54          3.31%    1.85
      cc2.8xlarge   25.34     2.30%    0.01       13.69         1.10%    0.48
  18. 18.
      Instance      GB      GB Int   UB     UB Int   Est. SPECInt   sysbench default   sysbench tuned
      m3.xlarge     2.04    2.01     2.49   1.88     2.24           3.21               1.69
      m3.2xlarge    3.80    3.96     4.21   3.77     4.25           6.41               3.38
      m2.xlarge     1.54    1.52     0.76   1.59     1.38           1.59               0.80
      m2.2xlarge    2.82    3.02     1.23   3.19     2.76           3.19               1.60
      m2.4xlarge    5.11    5.54     2.02   6.48     5.21           8.83               4.71
      c3.large      1.32    1.37     1.91   1.10     1.25           1.78               0.91
      c3.xlarge     2.51    2.72     2.85   2.20     2.39           3.55               1.83
      c3.2xlarge    4.88    5.35     4.50   4.34     4.67           6.55               3.54
      cc2.8xlarge   15.46   17.88    6.44   15.54    14.92          25.34              13.69
  19. 19.
      • Application runs on premises
      • Primary requirement: memory throughput of 20K MB/sec
      • What instance would work best?
      1. Choose a synthetic benchmark
      2. Baseline: Build, configure, tune, and run it on premises
      3. Run the same test (or tests) on a set of instance types
      4. Use results from the instance tests to choose the best match
  20. 20.
      www.cs.virginia.edu/stream/top20/Bandwidth.html
      https://github.com/gregs1104/stream-scaling
      name     kernel                 bytes/iter   FLOPS/iter
      COPY:    a(i) = b(i)            16           0
      SCALE:   a(i) = q*b(i)          16           1
      SUM:     a(i) = b(i) + c(i)     24           1
      TRIAD:   a(i) = b(i) + q*c(i)   24           2
      * McCalpin, John D.: "STREAM: Sustainable Memory Bandwidth in High Performance Computers",
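
The bytes-per-iteration column is what turns a kernel's run time into a bandwidth figure: bytes per iteration times the number of array elements, divided by the elapsed time. A minimal sketch of that arithmetic follows. The function name `stream_mbps` and its argument order are hypothetical, and MB is taken as 10^6 bytes, which is how the STREAM source code reports rates.

```shell
#!/bin/sh
# stream_mbps BYTES_PER_ITER ARRAY_ELEMENTS SECONDS
# Prints the bandwidth in MB/s (1 MB = 10^6 bytes) for one pass of a
# STREAM kernel over arrays of ARRAY_ELEMENTS elements.
# Hypothetical helper for illustration; not part of the STREAM package.
stream_mbps() {
    awk -v b="$1" -v n="$2" -v t="$3" 'BEGIN {
        if (t <= 0) { print "NA"; exit }
        printf "%.1f\n", b * n / t / 1e6
    }'
}

# Example: TRIAD moves 24 bytes per iteration; with 10,000,000 elements
# finishing in 0.012 s:
#   stream_mbps 24 10000000 0.012
```

With these example inputs the result lands at the 20K MB/sec requirement from slide 19, which makes it easy to see how array size and run time relate to the target.
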
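  21. 21.
      # Look up this instance's ID and type from the EC2 instance metadata service
      ID="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`"
      TYPE="`wget -q -O - http://169.254.169.254/latest/meta-data/instance-type`"
      # STREAM: keep only the thread count, the Triad result, and validation lines
      ./stream | egrep "Number of Threads requested|Function|Triad|Failed|Expected|Observed" > $FN
      # sysbench memory test with $TDS threads
      ./sysbench --num-threads=$TDS --test=memory run >$FN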
  22. 22.
      Instance      Stream-Triad   Geekbench Memory-Triad   sysbench (default)
      m3.xlarge     23640.56       15375.64                 302.95
      m3.2xlarge    26046.17       14999.27                 603.40
      m2.xlarge     18766.58       17365.76                 528.16
      m2.2xlarge    22421.91       17600.00                 1019.08
      m2.4xlarge    19634.50       14405.82                 1576.30
      c3.large      11434.83       9967.96                  2116.84
      c3.xlarge     21141.30       13972.65                 2643.33
      c3.2xlarge    30235.78       20657.49                 2944.91
      cc2.8xlarge   55200.86       37067.32                 1195.90

      sysbench memory defaults:
      --memory-block-size [1K]
      --memory-total-size [100G]
      --memory-scope {global,local} [global]
      --memory-hugetlb [off]
      --memory-oper {read, write, none} [write]
      --memory-access-mode {seq,rnd} [seq]
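
Step 4 of the process on slide 19 is to use the instance results to choose the best match. One simple way to do that is to filter the results against the stated requirement, as sketched below. The function name `meets_requirement` and the "instance value" input format are assumptions for illustration.

```shell
#!/bin/sh
# meets_requirement MIN_VALUE
# Reads "instance value" lines on stdin and prints the instances whose
# value is at least MIN_VALUE.
# Hypothetical helper; the name and input format are illustrative only.
meets_requirement() {
    awk -v min="$1" 'NF >= 2 && $2 + 0 >= min + 0 { print $1 }'
}

# Example with three Stream-Triad results from slide 22 and the
# 20K MB/sec requirement from slide 19:
#   printf 'm2.xlarge 18766.58\nc3.xlarge 21141.30\nc3.large 11434.83\n' \
#       | meets_requirement 20000
```

The answer depends on which benchmark column is fed in. The three columns on slide 22 disagree substantially, so the choice of test needs to reflect how the application actually uses memory.
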
  23. 23.
      • I/O metrics
        – IOPs
        – Throughput
        – Latency
      • Test parameters:
        – Read %
        – Write %
        – Sequential
        – Random
        – Queue depth
      • Storage configuration
        – Volume(s)
        – RAID
        – LVM
  24. 24. [Chart: PIOPs 2K Queue Depth. Latency (usec, axis 0 to 1200) by workload: Seq. Read, Seq. Write, Mixed Seq Read, Mixed Seq Write, Rand Read, Rand Write, Mixed Rand Read, Mixed Rand Write. Series: 1D PIOPS 2K, 1D PIOPS 2K QD2, 2D PIOPS 2K, 2D PIOPS 2K QD2]
  25. 25.
      • disk copy
        • cp file1 /disk1/file1
      • dd
        • dd if=/dev/zero of=/data1/testfile1 bs=1048 count=1024000
      • fio – flexible io tester
        • fio simple.cfg
  26. 26.
      Command                                          Seconds   MB/sec
      cp f1 f2                                         17.248    59.37
      rm -rf f2; cp f1 f2                              0.853     1200.47
      cp f1 f3                                         0.880     1164.96
      dd if=/dev/zero bs=1048 count=1024000 of=d1      0.722     1419.01
      dd if=/dev/urandom bs=1048 count=1024000 of=d2   79.710    12.84
      fio simple.cfg                                   NA        61.55
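
The MB/sec column on slide 26 is the amount of data moved divided by the elapsed time. The first two rows are consistent with a test file of about 1024 MB (17.248 s × 59.37 MB/sec ≈ 1024 MB); that size is inferred here and is not stated on the slide. A minimal sketch of the calculation, using the hypothetical helper name `mb_per_sec`:

```shell
#!/bin/sh
# mb_per_sec SIZE_MB SECONDS
# Prints throughput in MB/sec with two decimals, or "NA" if SECONDS is
# not positive.
# Hypothetical helper for illustration only.
mb_per_sec() {
    awk -v s="$1" -v t="$2" 'BEGIN {
        if (t <= 0) { print "NA"; exit }
        printf "%.2f\n", s / t
    }'
}

# Example: the two cp timings from slide 26, assuming a 1024 MB file
#   mb_per_sec 1024 17.248
#   mb_per_sec 1024 0.853
```

The same copy appears roughly twenty times faster on the second attempt, which is the kind of result that makes repeated runs and a clear description of the test conditions necessary before drawing conclusions about the storage itself.
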
  27. 27.
      Random 1M I/O   PIOPs 16disk MBps
      read            1006.73
      write           904.03
      r70w30          1005.91
  28. 28. If benchmarking your application is not practical, synthetic benchmarks can be used if you are careful.
      • Choose the best benchmark that represents your application
      • Analysis – what does “best” mean?
      • Run enough tests to quantify variability
      • Baseline – what is a “good result”?
      • Samples – keep all of your results – more is better!
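
The advice on slide 28 to run enough tests and keep all results can be sketched as a small wrapper that repeats a command and saves every run to its own file. The function name `repeat_runs`, the `run-N.txt` file naming, and the output directory are assumptions for illustration, not the presenters' tooling.

```shell
#!/bin/sh
# repeat_runs COUNT OUTDIR COMMAND [ARGS...]
# Runs COMMAND COUNT times and stores the output of each run in
# OUTDIR/run-1.txt, OUTDIR/run-2.txt, and so on. Nothing is overwritten
# between runs, so every sample is kept for later analysis.
# Hypothetical helper; names and layout are illustrative only.
repeat_runs() {
    rr_count="$1"
    rr_outdir="$2"
    shift 2
    mkdir -p "$rr_outdir" || return 1
    rr_i=1
    while [ "$rr_i" -le "$rr_count" ]; do
        # Keep stderr with stdout so failures are recorded with the run
        "$@" > "$rr_outdir/run-$rr_i.txt" 2>&1 || echo "run $rr_i failed" >&2
        rr_i=$((rr_i + 1))
    done
}

# Example (assumes the Geekbench binary from slide 4 is in the current directory):
#   repeat_runs 10 results/m3.xlarge ./geekbench_x86_64 --no-upload
```

Keeping one file per run makes it straightforward to compute a mean and a coefficient of variation afterwards, and to go back to the raw output when a result looks suspicious.
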
  29. 29. tech.just-eat.com @justeat_tech
  30. 30. https://loadtestingtool.com
  31. 31. https://github.com/etsy/statsd https://graphite.readthedocs.org
  32. 32. Please give us your feedback on this session. Complete session evaluations and earn re:Invent swag. http://bit.ly/awsevals
