ICSM 2008
1. Automated Identification of Load Testing Problems
Zhen Ming (Jack) Jiang, Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL), Queen’s University, Canada
Gilbert Hamann, Parminder Flora
Enterprise Performance Engineering, Research In Motion (RIM), Canada
2. What Is a Load Test?
■ A load test
– Mimics multiple users performing tasks at the same time (field simulation)
– Lasts for several hours or a few days
– For example, load test an online bookstore to see if the site can handle 1,000 users
3. Load Testing Challenges
■ No documented behavior
■ Time pressure
– Lasts for several hours or longer
– Final step in the development cycle
■ Monitoring overhead
– Profiling or extra instrumentation is not recommended
■ Large volume of data
4. Current Practice
■ Crash check
– Restart, crash, hung?
■ Performance check
– Memory, disk, CPU, network usage
– Is there a memory leak?
■ Basic error check
– Grep for keywords like “failure” or “error”, etc.
Not sufficient!
5. Problems with Current Practice
■ Labour intensive and time consuming
– Large volumes of generated data
■ Not all “error” or “fail” is important
– “Failure to locate item in the cache”
■ Not all errors contain the term “error” or “fail”
– “Message buffer limit is reached”
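Both failure modes of keyword grepping can be seen in a minimal sketch. The log lines below are the two examples from the slides plus one hypothetical benign line; the filter is the naive current practice.

```python
import re

# Two log lines from the slides plus one hypothetical benign line.
logs = [
    "Failure to locate item in the cache",   # benign, but contains "fail"
    "Message buffer limit is reached",       # real problem, but no keyword
    "Request handled in 12 ms",
]

# Current practice: grep for keywords like "error" or "fail".
keyword = re.compile(r"error|fail", re.IGNORECASE)
flagged = [line for line in logs if keyword.search(line)]

print(flagged)
# Only the benign cache message is flagged; the real
# buffer problem is missed entirely.
```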
6. Our Approach
■ Intuition: Load testing involves repeated execution of the same operations a large number of times
■ Most large enterprise applications have logging enabled for:
– Remote issue resolution
– Compliance with legal acts like the Sarbanes-Oxley Act
■ Our approach: Automatically discover runtime anomalies by mining the execution logs
7. Anomaly Detection for a Load Test
■ (E2, E3) always follow each other:
– (acquire_lock, release_lock)
– (open_inbox, close_inbox)
■ If we see (E2, E6) this might be a problem
E1 E2 E3 E4
E1 E2 E3 E4
E1 E2 E3 E4
E1 E2 E6 E4   (deviating run: E6 follows E2 instead of E3)
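The intuition above can be mined mechanically. A minimal Python sketch (event names and sequences taken from the slide; the real system works on abstracted log lines) counts, for each event, which event directly follows it, and flags successors that deviate from the dominant one:

```python
from collections import Counter, defaultdict

# The four event sequences from the slide: three normal runs
# and one deviating run.
sequences = [
    ["E1", "E2", "E3", "E4"],
    ["E1", "E2", "E3", "E4"],
    ["E1", "E2", "E3", "E4"],
    ["E1", "E2", "E6", "E4"],  # E6 follows E2 instead of E3
]

# Count, for each event, which event directly follows it.
followers = defaultdict(Counter)
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        followers[a][b] += 1

# Pairs that deviate from the dominant successor are anomaly candidates.
for event, counts in followers.items():
    dominant, dom_n = counts.most_common(1)[0]
    for succ, n in counts.items():
        if succ != dominant:
            print(f"possible anomaly: ({event}, {succ}) seen {n}x "
                  f"vs dominant ({event}, {dominant}) seen {dom_n}x")
```

Running this flags only (E2, E6), matching the slide's example.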
12. Step 3. Dominant Behavior Identification
■ Execute-After relation, (E1, *)
– (E1, E2) holds when E1 and E2 belong to the same group, and
– E2 is the next event that directly follows E1
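Assuming a hypothetical raw-log format of `<group-id> <event>` (the actual log-abstraction and grouping steps are more involved than this), the execute-after pairs can be extracted like so:

```python
from collections import defaultdict

# Hypothetical raw log lines: "<group-id> <event>" (format assumed
# for illustration only).
log_lines = [
    "sess-1 E1", "sess-2 E1", "sess-1 E2",
    "sess-2 E2", "sess-1 E3", "sess-2 E6",
]

# Collect each group's events in log order.
groups = defaultdict(list)
for line in log_lines:
    gid, event = line.split()
    groups[gid].append(event)

# Execute-After relation: (A, B) where A and B belong to the same
# group and B directly follows A.
pairs = [(a, b)
         for events in groups.values()
         for a, b in zip(events, events[1:])]

print(sorted(pairs))
```

Note that pairs are only formed within a group, never across groups, so interleaved sessions in the raw log do not produce spurious pairs.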
13. Step 4. Anomaly Detection
■ The z-statistic highlights the difference between the dominant behavior and the deviated behavior
■ The higher the z-statistic, the greater the contrast, and the more likely the deviation is statistically significant
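As an illustration of the idea only (this is not the paper's exact formulation), a one-proportion z-statistic can quantify how strongly a dominant pair stands out; the baseline proportion `p0 = 0.5` here is an assumption chosen for the sketch:

```python
import math

def z_stat(dominant, total, p0=0.5):
    """One-proportion z-statistic: how far the observed proportion of the
    dominant pair lies from an assumed baseline p0. The choice of p0 is
    an illustrative assumption, not the paper's exact formulation."""
    p_hat = dominant / total
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / total)

# E.g. the dominant pair (E2, E3) occurred 3 times out of 4 (E2, *) pairs:
z = z_stat(dominant=3, total=4)
print(z)  # a larger z means a stronger contrast with the deviated behavior
```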
15. Load Testing Problems
■ Bugs in the application under test
■ Problems with the load environment
– Mis-configuration
– Hardware failures
– Software Interactions
■ Problems with the load generation
– Incorrect use of load generation tools
– Buggy load generators
18. App 1 Anomaly Example
■ Application Problems:
– 54 out of 33,000 (<0.2%) event pairs indicate an application problem with items being dropped from the queue; the error message did not contain the word “error”
■ Environment Problems:
– 4 out of 33,000 (≈0.01%) event pairs indicate an environment problem
19. Discussions and Limitations
■ Is the dominant behavior really the correct behavior?
– E.g., during a hardware failure
■ Logs for the whole load test are processed all at once
■ False positives
– Due to the nature of the load
– Thread switches
20. Conclusions
Challenges of Load Testing
■ No documented behavior
■ Time pressure
– Lasts for several hours or longer
– Last step in the development cycle
■ Monitoring overhead
– Profiling or extra instrumentation is not recommended
■ Large volumes of data
Problems with Current Practice
■ Labour intensive and time consuming
– Large volumes of generated data
■ Not all “error” or “fail” is important
– “Failure to locate item in the cache”
■ Not all errors contain the term “error” or “fail”
– “Message buffer limit is reached”
Case Studies