Benchmarking and Performance on AWS - AWS India Summit 2012

Benchmarking and Performance on AWS from the AWS Summit in India Corporate Track

Slide notes

  • There are two versions of the origin of the term: a surveyor's mark, or the cobbler's bench. The term "benchmarking" was first used by cobblers to measure people's feet for shoes: they would place someone's foot on a "bench" and mark it out to make the pattern for the shoes. Benchmarking is used to measure performance using a specific indicator (cost per unit of measure, productivity per unit of measure, cycle time per unit of measure, or defects per unit of measure), resulting in a metric of performance that is then compared to others. There should always be a goal or reason to benchmark: you measure in order to prove something works, or to determine whether it can.
  • Consumers are often faced with the challenge of choosing between multiple similar offerings when shopping for goods or services. There is rarely a single measure, such as cost or size, that makes selecting the best offering simple. For example, when shopping for a car, many people use gas mileage as one of the selection criteria to narrow the set of cars to consider for purchase. In the United States, the Environmental Protection Agency (EPA) dictates precisely how an automobile manufacturer must test and report gas mileage. Defining a useful measurement to fairly compare competing products or services requires careful planning and can be quite complex to define and execute. Continuing with the EPA mileage example: the 2007 document detailing updates to the gas mileage test and reporting methodology was 19 pages long, and the technical support document detailing testing and reporting was 179 pages. Why so much detail? Being very prescriptive about how to measure and how to report fuel mileage helps ensure that comparisons between any two vehicles end up being "apples to apples" comparisons, but it entails excruciating levels of detail.
  • The importance of benchmarking (decision making): the cost of fixing performance problems increases with the stage of development. The later in the software lifecycle you attempt to fix a problem, the more it will cost to fix.
  • Benchmarks require running multiple experiments to get reliable results. With the cloud, you can run multiple experiments in parallel and significantly reduce the time it takes to get results. Deploying new configurations can be fully automated and done in minutes, and when you are done you can save results to S3 and tear everything down. The beauty of the cloud is that you pay for only what you use. Running a benchmark to validate your use case is not only cost effective but also quick, since you don't have to wait months to procure, assemble, and configure test resources. Typically, it is possible to run benchmark tests that last a few hours and cost a few dollars. See how Netflix was able to run a benchmark involving 96 EC2 instances in each of 3 availability zones (3.3 million writes per second) that cost them a few hundred dollars and a couple of hours. Moreover, unlike traditional datacenter or on-premises benchmarking, you don't have to wait long for systems to be configured or ask for permission to execute these tests. You can run as many tests as you like, as many times as you like, any time you like. You have the flexibility to decide the scale of your tests and are not limited to a small number of fixed resources. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
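Not part of the original deck: a minimal Python (boto3) sketch of what "launch several configurations in parallel, then tear everything down" can look like. The AMI ID, key pair, and instance types below are placeholders, not values from the talk.

```python
# Hypothetical sketch: launch one EC2 instance per candidate configuration in parallel.
# AMI_ID, KEY_NAME, and the instance types are placeholders, not values from the talk.
import boto3

AMI_ID = "ami-xxxxxxxx"      # replace with your benchmark image
KEY_NAME = "benchmark-key"   # replace with your key pair
CANDIDATE_TYPES = ["m1.small", "m1.large", "m1.xlarge"]

ec2 = boto3.resource("ec2")

def launch_fleet():
    """Start one instance per candidate type, tagged so the fleet is easy to find and tear down."""
    instances = []
    for itype in CANDIDATE_TYPES:
        launched = ec2.create_instances(
            ImageId=AMI_ID,
            InstanceType=itype,
            KeyName=KEY_NAME,
            MinCount=1,
            MaxCount=1,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "purpose", "Value": "benchmark"},
                         {"Key": "config", "Value": itype}],
            }],
        )
        instances.extend(launched)
    return instances

def terminate_fleet(instances):
    """Tear everything down once results are saved, so you stop paying."""
    for instance in instances:
        instance.terminate()

if __name__ == "__main__":
    fleet = launch_fleet()
    print("Launched:", [i.id for i in fleet])
```

Tagging every instance with the same purpose tag is one simple way to make the whole fleet findable and terminable in one sweep once the results are safely stored.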
  • You can deploy and tear down configurations rapidly and you only pay for what you use. Generating load can be done with many small instances or a handful of very large instances.
  • You can rapidly grow and shrink the scale of benchmarks and pay only for what you use. The cloud is highly cost-effective because you can turn resources off, and stop paying for them, when you don't need them or your users are not accessing them. Build websites that sleep at night.
  • AWS provides APIs so the entire benchmark lifecycle can be automated.
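Not from the presentation itself: a hedged sketch of automating one pass of the lifecycle (launch, run, archive results, terminate) with the EC2 and S3 APIs via boto3. The bucket name, the AMI, and the fixed sleep standing in for the actual test run are assumptions for illustration.

```python
# Hypothetical end-to-end sketch: launch a test runner, wait for it, archive results, tear down.
# RESULTS_BUCKET and the sleep below are placeholders, not details from the presentation.
import time
import boto3

ec2 = boto3.resource("ec2")
s3 = boto3.client("s3")

RESULTS_BUCKET = "my-benchmark-results"   # assumed bucket name

def run_benchmark_cycle(ami_id, instance_type, results_file):
    # 1. Launch the test instance.
    instance = ec2.create_instances(
        ImageId=ami_id, InstanceType=instance_type,
        MinCount=1, MaxCount=1,
    )[0]
    instance.wait_until_running()

    # 2. In a real harness, user-data or an SSH step would run the benchmark here and
    #    write results to a local file; this sketch just waits for a fixed interval.
    time.sleep(60)

    # 3. Save results to S3 so nothing is lost when the instance goes away.
    s3.upload_file(results_file, RESULTS_BUCKET,
                   f"{instance.id}/{instance_type}/results.csv")

    # 4. Tear down; from this point on you stop paying for the instance.
    instance.terminate()
    instance.wait_until_terminated()
```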
  • When you use your own application, you have experiments and results that require the least extrapolation to get reasonable answers. When you run standardized benchmarks, you have to figure out how the test design and configuration relate to your application, but you still control how the test is run and how the results are analyzed. When you use published benchmark results, you have to figure out how the test design, configuration, and execution relate to your application; in some cases, published numbers have to conform to strict reporting standards so that analysis is possible. A disciplined process: start with a goal; use a thoroughly defined scenario; run a series of controlled experiments; take careful notes; carefully control changes; always measure with your goal in mind; stop when you meet your goal, and look for bottlenecks when you don't.
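The following is an illustrative sketch (not from the deck) of that disciplined process: a series of controlled experiments, careful notes, and a stop condition tied to the goal. The latency goal, the configurations, and run_experiment() are hypothetical placeholders.

```python
# Illustrative only: a tiny harness for a series of controlled experiments.
# run_experiment() is a stand-in for whatever actually drives load and returns a metric.
import json
import random
import time

GOAL_P95_MS = 200.0           # assumed goal: 95th-percentile latency under 200 ms
CONFIGS = [                   # change exactly one variable between experiments
    {"instance_type": "m1.large", "threads": 16},
    {"instance_type": "m1.large", "threads": 32},
    {"instance_type": "m1.xlarge", "threads": 32},
]

def run_experiment(config):
    """Placeholder: deploy the config, generate load, and return measured p95 latency (ms).
    Simulated here so the harness runs end to end."""
    return random.uniform(100, 400)

def benchmark_series():
    notes = []
    for config in CONFIGS:
        p95 = run_experiment(config)
        notes.append({"config": config, "p95_ms": p95, "ts": time.time()})
        print(f"{config} -> p95 {p95:.1f} ms")
        if p95 <= GOAL_P95_MS:
            print("Goal met; stop here.")
            break
    # Careful notes: keep every result, not just the winning one.
    with open("benchmark_notes.json", "w") as fh:
        json.dump(notes, fh, indent=2)

if __name__ == "__main__":
    benchmark_series()
```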
  • Anecdote about a major web site failing and using benchmarking to figure out why and how to fix it. After the team took over engineering for a major customer-facing web portal, the site started failing under annual peak load. The team had not tested performance for the previous two releases (no workloads defined). There was no dedicated performance test configuration and no spare hardware available for testing. There were no test programs to generate load. The application did not have enough instrumentation to understand what was failing. After trial and error (and patching to add instrumentation), the team built up test capability and began testing. It would have been great to use AWS to spin up a test cluster quickly to reproduce failures and test proposed fixes…
  • GitHub repo for YCSB: https://github.com/brianfrankcooper/YCSB; wiki: https://github.com/brianfrankcooper/YCSB/wiki; tarball: https://github.com/downloads/brianfrankcooper/YCSB/ycsb-0.1.4.tar.gz. Before DynamoDB launched, we wanted to make sure we had the scalability we promised. We built a DynamoDB plug-in for YCSB to test scale up to 100,000 requests per second and ran many experiments in parallel to get results quickly. We found a number of areas to improve in the AWS (client) toolkit, including a too-verbose logging level, session cache improvements, and session token throttling:
    – Session token throttling: a conflict between the YCSB framework's thread/connection model and optimal DynamoDB connection management. Customer impact: multi-threaded clients receive throttle messages well below provisioned DynamoDB throughput levels. DynamoDB is one of the first services to use STS, and this issue can happen for any service using STS, i.e. any service that does not have a concept of provisioned throughput would also receive this throttling message. The SDK has released a fix for this problem.
    – Default SDK logging level: the default logging level for DynamoDB was "INFO", which included output for every request and response. Customer impact: the default verbose logging level is a performance bottleneck for multi-threaded clients at scale. Before the fix, maximum throughput for a single JVM was 7K reads/second; after the fix, maximum throughput for a JVM was over 15K reads/second. Resolution: the SDK made request logging "DEBUG" level for DynamoDB.
    – SDK HTTP connection recycling: the SDK contains code that periodically harvests unused HTTP connections. Customer impact: since HTTP connections include authentication for DynamoDB, new connections are expensive, and the cost of finding and killing connections (while locking the connection pool) affects scalability. A prototype of the SDK in which connections were not killed improved performance by 20 to 25% at scale (some tests demonstrated over 2.5x improvement in throughput with this change).
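Not part of the original material: the fixes above were found in the Java-based client stack (YCSB and the AWS SDK used with the JVM), but the same class of client-side tuning, keeping per-request logging out of the hot path and reusing connections across many threads, can be sketched with boto3/botocore as a rough analogue. The pool size and retry settings are assumptions, not the values used in these tests.

```python
# Rough analogue only: not the Java SDK fix described above, but the same ideas
# (quiet per-request logging, connection reuse, throttle-aware retries) in boto3/botocore.
import logging
import boto3
from botocore.config import Config

# Keep per-request SDK logging out of the hot path for multi-threaded load tests.
logging.getLogger("botocore").setLevel(logging.WARNING)
logging.getLogger("boto3").setLevel(logging.WARNING)

# Reuse connections across many worker threads instead of constantly opening new ones.
dynamodb = boto3.client(
    "dynamodb",
    config=Config(
        max_pool_connections=100,                          # assumed value; size to your thread count
        retries={"max_attempts": 10, "mode": "adaptive"},  # back off when throttled
    ),
)
```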

Transcript

  • 1. Understanding Benchmarking in the Cloud – Robert Barnes
  • 2. What is benchmarking? 2
  • 3. A benchmark many people may know 3
  • 4. Why benchmark? How long will the current configuration be adequate? Will this platform provide adequate performance, now and in the future? For a specific workload, how does one platform compare to another? What configuration (infrastructure and application) will it take to meet current needs? What size instance will provide the best cost/performance for my application? How will the application running in my datacenter perform in the cloud? Are the changes being made to a system going to have the intended impact on the system? 4
  • 5. Why can’t these questions be answered? • How many users does Drupal support? • How much memory does MySQL require? • What is the overhead of using Flash? • How many requests per second can Apache handle? • What instance type will it take to support 1000 unique users on AWS running Drupal? *without clarification 5
  • 6. Benchmarking is not easy on-premises. It takes time to obtain and build test configurations 6
  • 7. Benchmarking is not easy… Buying the latest equipment each time gets expensive 7
  • 8. Benchmarking is not easy… Generating large-scale load requires huge temporary spikes in capacity 8
  • 9. Benchmarking is not easy… Building up and tearing down test configurations can be very labor intensive 9
  • 10. Benchmarking in AWS is fast… Benchmarking in AWS is fast with parallel execution 10
  • 11. Benchmarking in AWS is affordable (pay as you go…) 11
  • 12. Benchmarking in AWS is scalable (elastic and supports multi-node tests) 12
  • 13. Benchmarking in AWS can be fully automated 13
  • 14. AWS is a great place to benchmark 14
  • 15. The Benchmark Lifecycle: Start with a goal – Define your workload – Test design – Test configuration – Test execution – Test analysis – Measure against goal – Report. Generate load; run a series of controlled experiments; carefully control changes 15
  • 16. 3 ways to use benchmarks: 1. Design and run a benchmark from your existing application and workloads 2. Run a standard benchmark 3. Use published benchmark results 16
  • 17. 1. Benchmark your application • Choose which parts of the application to test and in what combinations (workloads) • Determine how to generate load and how much of it • Decide how to measure and what metrics to collect • Design how reports get generated and what the report contents are 17
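As an illustration of "determine how to generate load and decide what to measure" (not something shown in the deck), here is a toy multi-threaded load generator in Python that records per-request latency and reports the median and 95th percentile. The target URL, thread count, and request count are placeholders.

```python
# Toy load generator for illustration only: N worker threads hitting one URL and
# recording per-request latency. TARGET_URL, THREADS, and REQUESTS_PER_THREAD are placeholders.
import statistics
import threading
import time
import urllib.request

TARGET_URL = "http://example.com/"   # replace with the endpoint under test
THREADS = 8
REQUESTS_PER_THREAD = 50

latencies = []
lock = threading.Lock()

def worker():
    for _ in range(REQUESTS_PER_THREAD):
        start = time.time()
        try:
            urllib.request.urlopen(TARGET_URL, timeout=10).read()
        except Exception:
            continue                      # only successful requests are measured here
        elapsed_ms = (time.time() - start) * 1000
        with lock:
            latencies.append(elapsed_ms)

threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

if latencies:
    latencies.sort()
    print(f"requests: {len(latencies)}")
    print(f"median:  {statistics.median(latencies):.1f} ms")
    print(f"p95:     {latencies[int(0.95 * len(latencies)) - 1]:.1f} ms")
```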
  • 18. 1. Benchmark your application: example – Emergency benchmarking 18
  • 19. 2. Run a standard benchmark • Lots of work already done: workloads defined, load generation defined, measurement defined, reports defined. Some tuning needs to be done to build and run. Run controlled tests and automate for repetition 19
  • 20. 2. Run a standard benchmark: Is the test relevant to your requirements? How does the test map to your application? 20
  • 21. 2. Standard benchmark: example – Testing DynamoDB – Before shipping DynamoDB, benchmarks were run to verify latency and scale – Short window for testing; selected the Yahoo Cloud Serving Benchmark to run scaling tests • Multiple parallel tests set up to find the optimal test configuration • Multiple DynamoDB databases provisioned and tests run in parallel • DynamoDB server scaling and latency validated • A number of client-side issues found and fixed 21
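Not the harness used for the DynamoDB tests: a small, hypothetical Python driver for YCSB's load and run phases via subprocess, of the kind you might use to automate repeated standard-benchmark runs. It uses YCSB's built-in "basic" binding as a stand-in; the install path, record counts, and thread count are assumptions, and the YCSB wiki linked in the notes above covers the DynamoDB binding and its required properties.

```python
# Hypothetical driver: run YCSB's load and run phases and keep the raw output as notes.
# YCSB_HOME, BINDING, and the property values are assumptions for illustration.
import subprocess
from pathlib import Path

YCSB_HOME = Path("/opt/ycsb-0.1.4")   # assumed install location of the tarball above
BINDING = "basic"                     # stand-in binding; swap in the DynamoDB binding as needed
WORKLOAD = YCSB_HOME / "workloads" / "workloada"

def ycsb(phase, extra_props=None):
    """Invoke one YCSB phase ('load' or 'run') and return its combined output."""
    cmd = [str(YCSB_HOME / "bin" / "ycsb"), phase, BINDING, "-P", str(WORKLOAD)]
    for key, value in (extra_props or {}).items():
        cmd += ["-p", f"{key}={value}"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout + result.stderr

if __name__ == "__main__":
    load_out = ycsb("load", {"recordcount": 10000})
    run_out = ycsb("run", {"operationcount": 100000, "threadcount": 16})
    Path("ycsb_results.txt").write_text(load_out + "\n" + run_out)
```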
  • 22. 2. Standard benchmark: example – Testing DynamoDB 22
  • 23. 3. Use published benchmark results: similar to running standard benchmarks but more … Picture source: http://www.nzei.org.nz/ 23
  • 24. 3. Reading and interpreting a benchmark report: 1. What is being measured? 2. Why is it being measured? 3. How is it being measured? 4. How closely does this benchmark resemble my results? 5. How accurate are the reports and citations? 6. Are the results repeatable? 24
  • 25. Not all benchmarks are fair… 25
  • 26. Cloud Tip: The 4 Rs – Relevant: the best test is based on your application – Recent: out-of-date results are rarely useful – Repeatable: is there enough information to repeat the test (cold fusion, anyone?) – Reliable: do you trust the tools, the publisher, and the results? 26
  • 27. Example: dissecting a benchmark report 27
  • 28. Example: dissecting a benchmark report • Mistakes in test design – CPU tests with vastly different instance types – The “5X” claim comes from comparing Y.Instance5 against X.Instance1. (Slide table, Instance / Cores: X.Instance1 1, X.Instance2 2, X.Instance3 2, X.Instance4 4, X.Instance5 2, X.Instance6 8, X.Instance7 4, X.Instance8 8; Y.Instance1 4, Y.Instance2 4, Y.Instance3 4, Y.Instance4 4, Y.Instance5 4.) 28
  • 29. Example: dissecting a benchmark report • Mistakes in test configuration – Tests for vendor Y were run on Ubuntu 10.4 – Tests for vendor X were run on CentOS 5.4 29
  • 30. Example: dissecting a benchmark report • Mistakes in test analysis – Report spreadsheet contained several critical errors 30
  • 31. Example: dissecting a benchmark report • Mistakes in test analysis – The spreadsheet containing the data used to produce reports contained several critical errors (corrected version shown on slide) 31
  • 32. Example: dissecting a benchmark report • What the data should have looked like: – CPU performance (higher is better) – X.Instance7 is 1.9 times better than Y.Instance5 32
  • 33. Example: dissecting a benchmark report • What the report should have looked like: – Cost/performance (lower is better) – X.Instance7 is 2.13 times better than Y.Instance5 33
  • 34. Interesting Reads: Questions to Ask About Benchmark Studies 1. What is the claim? 2. What is the claimed measurement? 3. What is the actual measurement? 4. Is it an apples-to-apples comparison? 5. Is the playing field level? 6. Was the data reported accurately? 7. Does it matter to you? Source: http://blog.cloudharmony.com/2011/11/many-are-skeptical-of-claims-that.html 34
  • 35. Not all benchmark reports are bad… Benchmarking High Performance I/O with SSD for Cassandra on AWS: http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html – Benchmarking Cassandra Scalability on AWS – Over a million writes per second: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html 35
  • 36. Benchmarking in the Cloud – Summary: 1. Benchmarking on premises is hard 2. AWS is a great place to benchmark 3. The best benchmark is your application 4. Run standard benchmarks with controlled and repeatable tests 5. Be a careful consumer of published benchmark reports. Of course, everything on the internet is true…. 36
  • 37. Thank you! Robert Barnes – rabarnes@amazon.com 37