Adaptive Testing, Oracle Generation, and Test Case Ranking for Web Services Wei-Tek Tsai Software Research Laboratory Computer Science & Engineering Department Arizona State University [email_address]
Table of Contents <ul><li>Background </li></ul><ul><li>Existing Dilemmas for SOA </li></ul><ul><li>Introduction to WebStrar </li></ul><ul><li>Difference between Blood and WS Group Testing </li></ul><ul><li>Testing Process </li></ul><ul><li>BBS Case Study </li></ul><ul><li>Impact of Training Sizes and Target Sizes </li></ul><ul><li>Impacts of Training on Test Case Ranking </li></ul><ul><li>Conclusions and Future Work </li></ul>
Background <ul><li>Software development is shifting away from the product-oriented paradigm to the service-oriented paradigm. </li></ul><ul><li>Service-Oriented Architecture (SOA) and its implementation, Web Services (WS), have received significant attention as major computer companies adopt this new approach to developing software and systems. </li></ul><ul><li>However, trustworthiness becomes a serious problem, and appropriate tradeoffs must be made during the WS testing phase. </li></ul>
Verification of Web Services <ul><li>Collaborative Testing : Cooperation and collaboration among different testing activities and stakeholders, including service providers, service consumers, and service brokers. </li></ul><ul><li>Specification-Based Testing : SOA proposes a fully specification-based process. WS define an XML-based protocol stack to facilitate service inter-communication and inter-operation. Specifications such as WSDL, OWL-S, and WSFL describe the service features. Hence, test cases need to be generated based on these specifications. </li></ul>
Existing Dilemmas for SOA (continued) <ul><li>Run-time Testing: Most WS activities, such as service publishing, discovery, matching, composition, binding, execution, and monitoring, are done at runtime. Thus </li></ul><ul><ul><li>Verification and testing, including test case generation, test execution, test evaluation, and model checking, must be done at run-time. </li></ul></ul><ul><li>Different implementations of the same specification: For the same specification of a service requirement, many alternative implementations may be available online. Effective algorithms are needed to rank and select the best WS. </li></ul>
Introduction to WebStrar <ul><li>WebStrar: Infrastructure for Web Services Testing, Reliability Assessment, and Ranking. It is an infrastructure that facilitates the development of Web services, trustworthy Web services, and their applications. It provides </li></ul><ul><ul><li>the public (service providers, brokers, requestors, researchers, and regulators) on-line access to the tools and databases that enable describing (specifying), finding, scripting (composing complex services from existing services), testing, verification, validation, experimentation, and reliability evaluation of Web services. </li></ul></ul><ul><ul><li>WebStrar uses WS group testing to rank services belonging to the same specification. </li></ul></ul>
[Diagram: two architectures compared. Current Web Service Model — service providers publish to a UDDI service broker (registry); clients find services there and invoke them via SOAP calls. Trustworthy Web Service Model based on Testing — a Test Master with a database of test scripts mediates between providers and clients: providers check services in through an acceptance interface, acceptance testing is performed, and clients bind to approved services through a service binder (check-out interface), exchanging SOAP calls, data, and results.]
[Diagram: WebStrar infrastructure. Service providers submit WS together with test cases and test case oracles; service requestors/clients access reliability data and ranks. Core components: a WS test master, test case generators, a test case database, test case validation, WS ranking, test case ranking, oracle updates, a trustworthy WS repository, a reliability database, a WS directory, reliability models, model checking against WSDL/OWL-S/DAML-S, and service composition for composite WS. Researchers and developers can dynamically replace the testing method and the reliability model.]
Difference between blood and WS group testing <table>
<tr><th>Compared features</th><th>Blood Group Testing (BGT)</th><th>Web Services Group Testing (WSGT)</th></tr>
<tr><td>Testing goals</td><td>Find bad samples from a large pool of blood samples.</td><td>Rank WS in a large pool of WS with the same specification; rank the fault-detection capacity of test scripts; determine the oracle of each test script; and identify faults.</td></tr>
<tr><td>Optimization objectives</td><td>Minimize the number of tests needed.</td><td>Minimize the number of tests and votes needed.</td></tr>
<tr><td>Sample mix</td><td>Arbitrary and physical mix.</td><td>Interoperability is constrained by WSDL, DAML-S, OWL-S, and composition semantics such as ontology.</td></tr>
<tr><td>Testing methods</td><td>Bio/chemical tests.</td><td>WS unit, integration, and interoperability testing using adaptive, progressive, and concurrent testing.</td></tr>
<tr><td>Testing location</td><td>Centralized testing.</td><td>Distributed and remote testing by agents and voters.</td></tr>
<tr><td>Verification</td><td>Contamination analysis.</td><td>Oracle comparison and/or majority voting.</td></tr>
<tr><td>Test coverage</td><td>One test for each mix.</td><td>Many tests are needed for each group of WS to verify a variety of aspects.</td></tr>
<tr><td>Reliability evaluation</td><td>Reliability of the testing process.</td><td>Reliability of the WS under test and of the testing process.</td></tr>
<tr><td>Reliability of tests</td><td>Tests can be reliable or unreliable; most BGT assumes tests are reliable.</td><td>The voting mechanism may be unreliable, and the number of faulty WS may exceed the number of correct WS, misleading the voter.</td></tr>
</table>
Testing Process <ul><li>Test a large number of WS at both the unit and integration levels. </li></ul><ul><li>At each level, the testing process has two phases: </li></ul><ul><ul><li>Training Phase and </li></ul></ul><ul><ul><li>Volume Testing Phase . </li></ul></ul>
Phase 1: Training Phase <ul><li>Select a subset of WS randomly from the set of all WS to be tested. The size of the subset, called the “Training Size”, can be decided experimentally. </li></ul><ul><li>Apply each test case in the given set of test cases to test all the WS in the selected subset. </li></ul><ul><li>Voting: For each test input, the outputs from the WS under test are voted on by a stochastic voting mechanism based on majority and deviation voting principles. </li></ul>
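The voting step above can be sketched as a strict-majority vote over the outputs collected for one test input. This is an illustrative sketch, not the paper's stochastic voting algorithm; the function name, threshold, and example outputs are assumptions.

```python
from collections import Counter

def majority_vote(outputs, threshold=0.5):
    """Return (majority_output, support_ratio) for one test input's outputs,
    or (None, support_ratio) when no output wins a strict majority.
    Hypothetical helper; the paper's mechanism also uses deviation voting."""
    counts = Counter(outputs)
    candidate, votes = counts.most_common(1)[0]
    ratio = votes / len(outputs)
    return (candidate, ratio) if ratio > threshold else (None, ratio)

# Five WS in the training subset answer the same test input:
winner, support = majority_vote(["42", "42", "42", "41", "42"])
# winner == "42" with 80% support; with ["a", "b"] no majority is found.
```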
Phase 1: Training Phase (Cont’d) <ul><li>Oracle establishment: If a clear majority output is found, the output is used to form the oracle of the test case that generated it. A confidence level is defined based on the extent of the majority; this confidence level is also dynamically adjusted in Phase 2. </li></ul><ul><li>Test case ranking: Test cases are ranked according to their fault-detection capacity, which is proportional to the number of failures the test cases detect. In Phase 2, the higher-ranked test cases are applied first to eliminate the WS that fail them. </li></ul>
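The test case ranking described above (order test cases by the number of failures they exposed during training) can be sketched as follows; the function and test-case names are illustrative assumptions, not the paper's API.

```python
def rank_test_cases(failures_detected):
    """Rank test-case ids by fault-detection capacity, estimated in the
    training phase as the number of failures each test case exposed.
    Highest-ranked (most potent) test cases are applied first in Phase 2."""
    return sorted(failures_detected, key=failures_detected.get, reverse=True)

# Hypothetical training-phase counts: t2 exposed the most failures.
order = rank_test_cases({"t1": 2, "t2": 7, "t3": 0})
# order == ["t2", "t1", "t3"]
```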
Phase 1: Training Phase (Cont’d) <ul><li>WS ranking: The stochastic voting mechanism not only finds a majority output, but also ranks the WS under group testing according to their average deviation from the majority output. </li></ul><ul><li>By the end of the training phase, we have </li></ul><ul><ul><li>Tested and ranked the selected WS; </li></ul></ul><ul><ul><li>Ranked the potency of the test cases; </li></ul></ul><ul><ul><li>Established the oracle for the test cases and their confidence levels. </li></ul></ul>
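For numeric outputs, the deviation-based WS ranking above can be sketched as sorting WS by their average absolute deviation from the voted majority (oracle) value on each test. All names and data here are illustrative assumptions.

```python
def rank_ws(ws_outputs, oracles):
    """Rank WS ids by the average absolute deviation of their outputs from
    the oracle value of each test; smaller average deviation ranks higher.
    Sketch only: assumes numeric outputs and an oracle per test."""
    def avg_deviation(ws):
        devs = [abs(out - oracles[t]) for t, out in ws_outputs[ws].items()]
        return sum(devs) / len(devs)
    return sorted(ws_outputs, key=avg_deviation)

ranking = rank_ws(
    {"ws_a": {"t1": 10.0, "t2": 20.0},   # matches the oracle on both tests
     "ws_b": {"t1": 12.0, "t2": 25.0}},  # average deviation 3.5
    oracles={"t1": 10.0, "t2": 20.0},
)
# ranking == ["ws_a", "ws_b"]
```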
[Diagram: Training Phase data flow. The outputs of the WS under test are voted on; the process then (1) generates the test script ranking, (2) finds the oracle and its confidence level for each test script, and (3) detects failures and computes WS reliability, attaching a rank, a confidence level, and an oracle to each test script.]
Phase 2: Volume Testing Phase <ul><li>This phase continues testing the remaining WS and any newly arrived WS, based on the profiles and history (test case effectiveness, oracles, and WS ranking) obtained in the training phase. </li></ul><ul><li>By the end of Phase 2: </li></ul><ul><ul><li>all available WS have been tested; </li></ul></ul><ul><ul><li>a short ranked list of WS has been produced; </li></ul></ul><ul><ul><li>test cases have been updated and ranked; and </li></ul></ul><ul><ul><li>oracles and their confidence levels have been updated. </li></ul></ul>
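The volume-phase elimination strategy (apply the most potent test cases first so failing WS drop out before the bulk of the suite runs) can be sketched as follows; the pass/fail predicate and all names are assumptions for illustration.

```python
def volume_test(ws_list, ranked_tests, passes):
    """Phase-2 sketch: run test cases in rank order and eliminate a WS as
    soon as it fails one, so later tests run only on surviving WS."""
    survivors = list(ws_list)
    for test in ranked_tests:
        survivors = [ws for ws in survivors if passes(ws, test)]
    return survivors

# Hypothetical pass/fail predicate: "bad_ws" fails the top-ranked test.
def passes(ws, test):
    return not (ws == "bad_ws" and test == "t_hi")

survivors = volume_test(["good_ws", "bad_ws"], ["t_hi", "t_lo"], passes)
# survivors == ["good_ws"]; "bad_ws" never reaches the lower-ranked test.
```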
Best Buy Stock WS Specification <table>
<tr><th>Event</th><th>Specification</th></tr>
<tr><td>(1) A client queries a stock’s price</td><td>A client can query any stock’s price. If the queried stock name is not empty and the requested stock information is available, the server WS sends the requested stock price to the requesting client.</td></tr>
<tr><td>(2) 20 minutes have passed since the last stock price check</td><td>The service automatically checks stock prices every 20 minutes. If the prices of some stocks increase >= 5% within the past 20 minutes, it sends messages to the stock owners, reminding them to sell the stocks whose prices increased >= 5%, or to buy those stocks to sell at a higher price. If the prices of some stocks decrease >= 10% within the past 20 minutes, the server WS sends messages to the stock owners, reminding them to buy the stocks whose prices decreased >= 10% or to sell them to stop further losses. If the advancing or declining volume of some stocks increases >= 100% in the past 20 minutes compared with the same period yesterday, it sends messages to alert the stock owners.</td></tr>
</table>
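The threshold rules for event (2) can be sketched as a single check over one 20-minute window; the function name, parameters, and alert strings are assumptions, not part of the specification.

```python
def stock_alerts(prev_price, cur_price, prev_volume, cur_volume):
    """Apply the Best Buy Stock WS thresholds for one 20-minute check:
    price up >= 5% -> sell reminder; price down >= 10% -> buy/stop-loss
    reminder; volume up >= 100% vs. yesterday's period -> volume alert."""
    alerts = []
    if cur_price >= prev_price * 1.05:
        alerts.append("price up >= 5%: consider selling")
    if cur_price <= prev_price * 0.90:
        alerts.append("price down >= 10%: buy or stop loss")
    if cur_volume >= prev_volume * 2:
        alerts.append("volume up >= 100%: alert owners")
    return alerts

# A 6% rise with flat volume triggers only the sell reminder.
```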
Impacts of Target Size on Testing Cost <ul><li>The smaller the target size, the lower the cost, because more WS can be eliminated sooner. </li></ul><ul><li>The differences between curves 1 to 12 are small, while a large gap exists between curves 12 and 13. The reason is that 12 of the WS under test are fault-free, so no failures are detected from them. As long as these fault-free WS fill the current target set, any WS from which a single failure is detected can be eliminated. </li></ul><ul><li>When the target size moves from 12 to 13 or higher, the testing cost increases sharply, because the algorithm must find the better WS among a set of imperfect WS. </li></ul>
Impacts of Training Size on Testing Cost <ul><li>The smaller the training size, the lower the cost. </li></ul><ul><li>When the training size is less than or equal to the target size, increasing the training size does not increase the cost (the initial part of the curves is flat). When the training size exceeds the target size, the cost increases as the training size increases. </li></ul><ul><li>When the training size equals the total number of WS under test, it becomes exhaustive testing and no test runs can be saved. </li></ul>
Oracle Establishment and Confidence <ul><li>Note that the oracle is established by majority voting. </li></ul><ul><li>If the training size is small, the confidence decreases, and it is even possible for an incorrect answer to win the majority vote. </li></ul><ul><li>Also, an incorrect WS does not always produce an incorrect answer; it often produces an incorrect answer only some of the time. </li></ul>
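The risk of a wrong majority at small training sizes can be illustrated with a Monte Carlo sketch. The per-WS failure probability, the worst-case assumption that all wrong answers coincide, and the function name are assumptions chosen for illustration only.

```python
import random

def oracle_error_rate(training_size, p_wrong=0.3, trials=10000, seed=1):
    """Estimate how often a strict-majority vote adopts a wrong oracle when
    each WS in the training subset independently answers wrongly with
    probability p_wrong (worst case: all wrong answers coincide)."""
    rng = random.Random(seed)
    wrong_majorities = 0
    for _ in range(trials):
        wrong = sum(rng.random() < p_wrong for _ in range(training_size))
        if wrong > training_size - wrong:  # wrong answers hold the majority
            wrong_majorities += 1
    return wrong_majorities / trials

# Under these assumptions, a training size of 1 adopts a wrong oracle far
# more often than a training size of 5.
```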
The Impacts of Training Size on Oracle <table>
<tr><th>Training Size</th><th>Prob. of correct oracle</th><th>Prob. of no oracle</th><th>Prob. of incorrect oracle</th></tr>
<tr><td>1</td><td>76.9%</td><td>0.0%</td><td>23.1%</td></tr>
<tr><td>2</td><td>83.8%</td><td>16.3%</td><td>0.0%</td></tr>
<tr><td>3</td><td>98.1%</td><td>1.9%</td><td>0.0%</td></tr>
<tr><td>4</td><td>98.1%</td><td>0.6%</td><td>1.3%</td></tr>
<tr><td>5</td><td>98.1%</td><td>1.9%</td><td>0.0%</td></tr>
</table>
Impacts of Training on Test Case Ranking <table>
<tr><th>Training Size</th><th>Prob. of correct ranking</th><th>Prob. of no decision</th><th>Prob. of incorrect ranking</th></tr>
<tr><td>1</td><td>60.0%</td><td>0.0%</td><td>40.0%</td></tr>
<tr><td>2</td><td>35.6%</td><td>59.4%</td><td>5.0%</td></tr>
<tr><td>3</td><td>65.1%</td><td>21.5%</td><td>13.4%</td></tr>
<tr><td>4</td><td>71.5%</td><td>16.3%</td><td>12.2%</td></tr>
<tr><td>5</td><td>75.8%</td><td>15.2%</td><td>9.0%</td></tr>
</table>
Conclusions <ul><li>This paper proposed an efficient process to test a large number of web services designed based on the same specification. </li></ul><ul><li>The experimental results reveal that the smaller the training size, the lower the cost. </li></ul><ul><li>However, a small training size can lead to an incorrect oracle, which in turn leads to incorrect WS ranking. </li></ul><ul><li>A small training size can also lead to incorrect test case ranking, resulting in a higher testing cost in Phase 2. </li></ul><ul><li>Therefore, it is critical to select a reasonably sized training set in WS group testing. </li></ul>
Future Work <ul><li>Need to address the impact of the age of the test cases. Need to have an adaptive window to address this. </li></ul><ul><li>Also, we need a stochastic algorithm to perform the majority voting automatically for complex outputs. </li></ul>