REMI: Defect Prediction for Efficient API Testing
ESEC/FSE 2015, Industrial track
September 3, 2015

Mijung Kim*, Jaechang Nam*, Jaehyuk Yeon+, Soonhwang Choi+, and Sunghun Kim*
*Department of Computer Science and Engineering, HKUST
+Software R&D Center, Samsung Electronics Co., Ltd.
2
...
Motivation
• Reliability of APIs
• Cost-intensive software quality assurance (QA) tasks at Samsung
– Creating test cases for APIs
– Testing APIs
3
Testing Phase at Samsung
[Workflow figure: APIs → Test Cases (TCs) → Run TCs → Passing Tests / Failing Tests → Fix APIs; the test cases undergo Modification and Enhancement]
4
• Test Cases
  – 1 or 2 test cases for each API
    • Uniformly distributed across all APIs
    • Tedious task and not efficient
Goal
• Apply software defect prediction for efficient API testing.
5
Proposed Approach: REMI
Risk Evaluation Method for Interface testing
6
[Workflow figure: the same testing phase as before (APIs → Test Cases (TCs) → Run TCs → Passing Tests / Failing Tests → Fix APIs, with Modification and Enhancement of the test cases), now with REMI (API Rank) added]
Software Defect Prediction
7
[Figure: instances of API Project A, each represented by metric values; buggy-labeled and clean-labeled instances are used to train a model, which then predicts the unlabeled ("?") instances]
Related Work: Munson@TSE`92, Basili@TSE`95, Menzies@TSE`07, Hassan@ICSE`09, Bird@FSE`11, D’Ambros@EMSE`12, Lee@FSE`11, ...
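As a rough illustration of the prediction setup sketched above (not the authors' actual implementation), the snippet below trains a classifier on metric vectors labeled buggy/clean and scores the unlabeled instances. A Random Forest is used because that is the model named later in the experimental setup; all data values here are hypothetical.

```python
# Minimal sketch of the defect prediction setup shown above (hypothetical data,
# not the authors' pipeline). Each instance is a vector of metric values with a
# buggy(1)/clean(0) label; the trained model scores the unlabeled "?" instances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # model named on the setup slide

X_labeled = np.array([[120, 4, 0.20],   # rows: APIs, columns: aggregated metric values
                      [300, 9, 0.05],
                      [ 80, 2, 0.30]])
y_labeled = np.array([0, 1, 0])         # 1 = buggy-labeled, 0 = clean-labeled

X_unlabeled = np.array([[250, 7, 0.10],  # the "?" instances to be predicted
                        [ 60, 1, 0.40]])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_labeled, y_labeled)                 # training
print(model.predict_proba(X_unlabeled)[:, 1])   # predicted probability of "buggy"
```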
Approach
REMI: Risk Evaluation Method for Interface testing
8
[Pipeline figure: the SW Repository and Defect History feed the steps Collect Metrics → Aggregate Metrics → Label Buggy/Clean APIs → Build Prediction Model → Rank APIs, producing API Ranks]
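For the "Label Buggy/Clean APIs" step, the speaker notes say the bug history comes from fix commits or test results. A minimal sketch under that assumption (the API names and inputs are hypothetical):

```python
# Sketch of labeling APIs from the defect history (assumption based on the speaker
# notes: an API is buggy if it appears in a fix commit or in a failing test result).
def label_apis(all_apis, apis_in_fix_commits, apis_with_failing_tests):
    """Return {api_name: 1 (buggy) or 0 (clean)}."""
    buggy = set(apis_in_fix_commits) | set(apis_with_failing_tests)
    return {api: int(api in buggy) for api in all_apis}

print(label_apis(["api_a", "api_b", "api_c"],
                 apis_in_fix_commits=["api_a"],
                 apis_with_failing_tests=["api_c"]))
# {'api_a': 1, 'api_b': 0, 'api_c': 1}
```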
Function-level metrics
• Code Metrics (28): AltCountLineBlank, AltCountLineCode, AltCountLineComment, CountInput, CountLine, CountLineBlank, CountLineCode, CountLineCodeDecl, CountLineCodeExe, CountLineComment, CountLineInactive, CountLinePreprocessor, CountOutput, CountPath, CountSemicolon, CountStmt, CountStmtDecl, CountStmtEmpty, CountStmtExe, Cyclomatic, CyclomaticModified, CyclomaticStrict, Essential, Knots, MaxEssentialKnots, MaxNesting, MinEssentialKnots, RatioCommentToCode
• Process Metrics (12) (Moser@ICSE`08): numCommits, totalLocDeleted, avgLocDeleted, maxLocDeleted, totalLocAdded, avgLocAdded, maxLocAdded, totalLocChurn, avgLocChurn, ...
Rank APIs in Package A
  Rank  API    Probability
  1     api_a  98%
  2     api_b  90%
  3     api_c  79%
  ...   ...    ...
  25    api_y  40%
  26    api_z  30%
(Figure annotation: Test Failures)
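The "Rank APIs" step then orders a package's APIs by the model's predicted defect probability, as in the table above. A minimal sketch, using the table's illustrative probabilities as stand-ins for model output:

```python
# Sketch of the "Rank APIs" step: sort a package's APIs by predicted defect
# probability (values below are the illustrative ones from the table).
def rank_apis(api_probabilities):
    """api_probabilities: {api_name: predicted probability of being buggy}."""
    return sorted(api_probabilities.items(), key=lambda kv: kv[1], reverse=True)

probs = {"api_b": 0.90, "api_z": 0.30, "api_a": 0.98, "api_y": 0.40, "api_c": 0.79}
for rank, (name, p) in enumerate(rank_apis(probs), start=1):
    print(f"{rank:>2}. {name:<6} {p:.0%}")
```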
[Call-graph figure for metric aggregation: my_api() at Depth 0 (LoC=3); func_1() and func_2() at Depth 1 (LoC=7); func_3(), func_4(), and func_5() at Depth 2 (LoC=10)]
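The "Aggregate Metrics" step is described in the speaker notes: function-level metric values are summed over an API's call graph up to a chosen depth. The sketch below follows that description; the call-graph edges mirror the figure, and the per-function LoC values are hypothetical, chosen only so the depth totals match the figure (3, 7, 10).

```python
# Sketch of API-level metric aggregation: sum a function-level metric over all
# functions reachable from the API entry point within a given call-graph depth.
from collections import deque

call_graph = {                      # edges mirror the call-graph figure
    "my_api": ["func_1", "func_2"],
    "func_1": ["func_3", "func_4"],
    "func_2": ["func_5"],
    "func_3": [], "func_4": [], "func_5": [],
}
loc = {"my_api": 3, "func_1": 2, "func_2": 2,   # hypothetical per-function LoC,
       "func_3": 1, "func_4": 1, "func_5": 1}   # chosen to give totals 3, 7, 10

def aggregate_metric(entry, metric, graph, max_depth):
    """Breadth-first sum of `metric` over functions within `max_depth` calls of `entry`."""
    total, seen, queue = 0, {entry}, deque([(entry, 0)])
    while queue:
        func, depth = queue.popleft()
        total += metric[func]
        if depth < max_depth:
            for callee in graph[func]:
                if callee not in seen:
                    seen.add(callee)
                    queue.append((callee, depth + 1))
    return total

for d in (0, 1, 2):
    print(f"depth {d}: LoC = {aggregate_metric('my_api', loc, call_graph, d)}")
# depth 0: LoC = 3, depth 1: LoC = 7, depth 2: LoC = 10 (as in the figure)
```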
Research Questions
• Test Development Phase
  – Can writing test cases based on the REMI ranking identify additional/distinct defects?
• Test Execution Phase
  – Can test execution with REMI identify defects more quickly than test execution without REMI?
9
Experimental Setup
• Random Forest
• Subject
  – Tizen-wearable APIs
    • Applied REMI to 36 functional packages with about 1,100 APIs
  – Release Candidates (RC)
    • RC2 to RC3
10
[Figure: build the prediction model on RCn-1, then use it to rank the APIs of RCn]
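The "Build / Rank" figure suggests the model is built on the previous release candidate and used to rank the APIs of the next one. A minimal sketch of that split, with hypothetical toy data (the real feature matrices would come from the metric collection and aggregation steps):

```python
# Sketch of the cross-release application (assumption based on the RCn-1 -> RCn figure):
# fit on APIs labeled in the previous release candidate, rank the APIs of the current one.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_next_release(X_prev, y_prev, X_curr, api_names_curr):
    """Train on RC(n-1) labeled APIs; return RC(n) APIs sorted by predicted risk."""
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_prev, y_prev)
    probs = model.predict_proba(X_curr)[:, 1]
    return sorted(zip(api_names_curr, probs), key=lambda t: t[1], reverse=True)

# Hypothetical toy data for illustration only.
X_prev = np.array([[120, 4], [300, 9], [80, 2], [210, 6]])
y_prev = np.array([0, 1, 0, 1])
X_curr = np.array([[260, 7], [95, 3]])
print(rank_next_release(X_prev, y_prev, X_curr, ["api_a", "api_b"]))
```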
RESULT
11
Results for Test Development Phase
Detect additional defects after applying REMI
12
[Bar chart: the number of additional defects found by modifying and enhancing test cases, RC2 before REMI vs. RC2 after REMI]
Results for Test Development Phase
Detect additional defects after applying REMI
13
[Bar chart: the number of additional defects found by modifying and enhancing test cases, RC3 before REMI vs. RC3 after REMI]
Results for Test Execution Phase
Quickly detect defects with the same number of test cases
14

  Version  REMI          Test Runs  Defects Detected  Detection Rate
  RC2      without REMI      873          6.5              0.74%
  RC2      with REMI         873         18                2.06%
  RC3      without REMI      845          8.1              0.96%
  RC3      with REMI         845          9                1.07%
  (Test Runs, Defects Detected, and Detection Rate are grouped as "Bug Detection Ability" in the original table.)

Test cases from the selected 30% of APIs.
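The detection rate in the table appears to be the number of defects detected divided by the number of test runs, e.g. for RC2:

\[ \text{Detection Rate} = \frac{\text{Defects Detected}}{\text{Test Runs}}, \qquad \frac{18}{873} \approx 2.06\%, \qquad \frac{6.5}{873} \approx 0.74\% \]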
Developer Feedback
• “The list of risky APIs provided before conducting QA activities is helpful for testers to allocate their testing effort efficiently, especially with tight time constraints.”
• “In the process of applying REMI, overheads arise during the tool configuration and executions (approximately 1 to 1.5 hours).”
• “It is difficult to collect the defect information to label buggy/clean APIs without noise.”
15
Conclusion
• REMI
  – Defect prediction is helpful in industry
• Test Case Development
  – Could identify additional defects by developing new test cases for risky APIs
• Test Execution Phase
  – Identifies defects more quickly and efficiently under tight time constraints
16

Editor's Notes

  1. Explain Tizen.
  2. Reliability.
  3. EFL Public: GUI framework of Tizen. The average number of TCs for each API is 2.35 (max: 26). Modifications include specification changes, fixing TC errors, and improving test coverage; target APIs are selected by developers' experience.
  4. Explain the role of REMI.
  5. Keep the explanation of defect prediction (DP) short. Explain the API concept. Focus on the API level with an example on a new slide.
  6. Aggregate function-level metrics to API-level metrics by using function call graphs for the source code of APIs and summing metric values up to a specific depth of the call graphs. Bug history: fix commits or test results.
  7. There are 56 functional packages with about 3,000 APIs, but REMI was applied to 36 packages with about 1,100 APIs in which at least one bug exists or no build error occurs while releasing a release candidate (RC). Ghotra et al.: a single best classifier for all prediction models does not exist.
  8. RC2 before REMI: 70 test cases were modified and enhanced. RC2 after REMI: 158 test cases were newly created.
  9. RC3 before REMI: 47 test cases were modified and enhanced. RC3 after REMI: 26 test cases were newly created.
  10. TC execution for all packages: 3 man-days. Why 6.5? It is the average number of bugs detected by random selection for the without-REMI case. It takes an hour to execute 400 test cases. (Developers at Samsung are advised to execute 100 test cases in 15 minutes.)