An overview of automated test suites and defect density in Android
1. An Overview of Automated Test Suites
and Defect Density in Android
Team Leader of the Software Quality
Engineering Research Group (SoftQual)
University of Calgary, Canada
(on leave)
www.ucalgary.ca/~vgarousi
vgarousi@ucalgary.ca
Visiting Associate Professor
Graduate School of Informatics
Middle East Technical University
Ankara, Türkiye
www.metu.edu.tr/~vahid
vahid@metu.edu.tr
Vahid Garousi, PhD, PEng
Android Developer Days
Ankara, Türkiye
June 15, 2013
Turkish, English, Azerbaijani Turkish
2. • An Overview of the Android…
• Software architecture
• Code-base and size/complexity metrics
• and its Automated Test Suites
• Design of the Case Study
• Results of the Case Study
• Q/A
Outline
*
5. Android Code-base and Size/Complexity
metrics
*
Total LOC: 9,344,603
Number of files: 54,164

LOC by language    LOC          Percent of total
C/C++              5,925,096    63.41%
Java               2,090,904    22.38%
SQL                  124,094     1.33%
XML                  632,456     6.77%
Web files            572,053     6.12%
Total              9,344,603   100.00%

Average method LOC: 8.78
Average method McCabe complexity: 2.54
Maximum method LOC: 2,857
Maximum method McCabe complexity: 284
McCabe complexity metric: the number of linearly independent control flow paths through a method
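As a rough illustration of the metric, McCabe complexity of a method can be approximated as the number of decision points plus one. The sketch below is only a keyword count over a source string; the class name CyclomaticSketch and the chosen keyword list are assumptions made for this example, and it is not the tool used to produce the figures above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough estimate: McCabe complexity = number of decision points + 1.
// This is a keyword count for illustration only, not a real Java parser,
// so it can miscount (e.g. keywords inside string literals or comments).
class CyclomaticSketch {
    // Branching keywords and short-circuit/ternary operators treated as decisions.
    private static final Pattern DECISION =
        Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    static int estimate(String methodSource) {
        Matcher m = DECISION.matcher(methodSource);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1;
    }

    public static void main(String[] args) {
        // One "if", one "&&" and one "for": 3 decision points, so complexity 4.
        String src = "int f(int x) { if (x > 0 && x < 10) { return 1; } "
                   + "for (int i = 0; i < x; i++) { } return 0; }";
        System.out.println(estimate(src)); // prints 4
    }
}
```

A straight-line method with no branches scores 1, which is why an average of 2.54 suggests that most methods contain only one or two decisions.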
6. An Overview of the Android Code-base and
Size/Complexity metrics
*
McCabe complexity metric: the number of linearly independent control flow paths through a method
7. • An Overview of the Android…
• Software architecture
• Code-base and size/complexity metrics
• and its Automated Test Suites
• Design of the Case Study
• Results of the Case Study
• Q/A
Outline
*
8. • What is an Automated Test Suite?
• Software written to test our production software
• There are many test frameworks and tools
Background: What is Automated Testing?
*
Example test method in JUnit
9. Android’s Automated Test Suites:
An example Unit test checking whether the emergency number (911) is
properly set
*
10. Android’s Automated Test Suites:
An example GUI test checking correct display of the last button on a
scrolling list of buttons
*
11. Android’s Automated Test Suites
*
Number of test methods per language
C/C++                    32
Java                 13,374
Total                13,406

Number of asserts per language
C/C++                   772
Java                 33,566
Total                34,338

LOC per test type
Unit Tests          213,834
GUI Tests            71,179

Test LOC per language    LOC        % of Android source
C/C++ (CPPUnit)            5,624     0.09%
Java (JUnit)             357,933    17.12%
XML                       12,610     1.99%
Web files                    788     0.14%
Total                    376,955     4.03%
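The percentages and ratios in these tables are simple quotients, and can be re-derived as in the following sketch. The input numbers are taken from the tables in this talk; the class and method names are hypothetical and chosen only for this example.

```java
import java.util.Locale;

// Re-derives two summary figures from the numbers reported in the tables.
class TestSuiteRatios {
    // Share of "part" in "whole", as a percentage.
    static double percentOf(long part, long whole) {
        return 100.0 * part / whole;
    }

    // Average number of assert statements per test method.
    static double assertsPerTestMethod(long asserts, long testMethods) {
        return (double) asserts / testMethods;
    }

    public static void main(String[] args) {
        // 376,955 test LOC out of 9,344,603 total LOC.
        System.out.println(String.format(Locale.ROOT,
            "Test LOC as %% of source: %.2f", percentOf(376_955L, 9_344_603L)));
        // 34,338 asserts spread over 13,406 test methods.
        System.out.println(String.format(Locale.ROOT,
            "Asserts per test method: %.2f", assertsPerTestMethod(34_338L, 13_406L)));
    }
}
```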
12. • An Overview of the Android…
• Design of the Case Study
• Goal
• Results of the Case Study
• Q/A
Outline
*
13. • Following the Goal, Question, Metric (GQM) approach
• The goal of this case study is to assess
• test coverage (efforts/cost) – an example next
• fault detection effectiveness (benefit)
• test cost-effectiveness (ROI: cost vs. benefit)
• and defect density
• in the code-base of the Android 2.1 platform
• in the context of software comprehension, testing and
maintenance activities
Design of the Case Study - Goal
*
14. What is code (test) coverage?
*
• How much of the code-base has been tested?
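A minimal sketch of line coverage, assuming we already know which executable lines were run by at least one test. Real coverage tools gather this by instrumenting the code; the boolean array and the class name below are hypothetical stand-ins for that data.

```java
// Line coverage = executed executable lines / all executable lines, as a percentage.
class LineCoverageSketch {
    // executed[i] is true if executable line i was run by at least one test.
    static double lineCoveragePercent(boolean[] executed) {
        if (executed.length == 0) {
            return 0.0; // nothing to cover; reported as 0% in this sketch
        }
        int covered = 0;
        for (boolean hit : executed) {
            if (hit) {
                covered++;
            }
        }
        return 100.0 * covered / executed.length;
    }

    public static void main(String[] args) {
        // Hypothetical class with 5 executable lines, 3 of them exercised by tests.
        boolean[] executed = {true, true, false, true, false};
        System.out.println(lineCoveragePercent(executed)); // prints 60.0
    }
}
```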
15. • An Overview of the Android…
• Design of the Case Study
• Results
• Code Coverage (RQ 1)
• Fault Detection Effectiveness of the Test Suites (RQ 2)
• Code Coverage versus Number of Actual Reported Defects (RQ 3)
• Correlation between Code Coverage and Fault Detection Effectiveness (RQ 4)
• Defect Density (RQ 5)
• Q/A
Outline
*
Only some of these results will be discussed in this talk, due to time
constraints
16. • An Overview of the Android…
• Design of the Case Study
• Results
• Code Coverage
• Defect Density
• Q/A
Outline
*
17. Code Coverage
*
• The relative amount of effort spent developing test cases clearly differs
across packages
• We hypothesized the following reasons for such an imbalance in test
strength and completeness across various packages (should be studied in
future studies):
• (1) implication of code reuse on testing, e.g., Calendar package from the Google
Calendar project
[Figure: log-scale scatter plot. x-axis: # of lines (total), 1 to 1,000,000; y-axis: # of lines (covered), 1 to 100,000. Labels: Browser, Calendar, Email, Framework, GlobalSearch]
18. • Design of the Case Study
• An Overview of the Android…
• Results
• Code Coverage
• Defect Density
• Defined as the average number of defects per thousand lines of
code
• Q/A
Outline
*
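Following the definition above, defect density can be computed as in this short sketch. The defect count and package size in the example are hypothetical values, not figures from the case study, and the class name is made up for illustration.

```java
// Defect density (DD) = number of defects per thousand lines of code (KLOC).
class DefectDensitySketch {
    static double defectsPerKloc(int defects, int linesOfCode) {
        if (linesOfCode <= 0) {
            throw new IllegalArgumentException("linesOfCode must be positive");
        }
        return defects / (linesOfCode / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical package: 45 reported defects in 15,000 LOC.
        System.out.println(defectsPerKloc(45, 15_000)); // prints 3.0
    }
}
```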
19. • Question: How did we measure the number of actual reported defects?
• We mined the Android bug repository for the number of reported actual defects for each
package
• An example:
Defect Density
*
20. • The package with the highest defect density (DD) is Music (DD=200)
• The package with the lowest DD value is Contacts (DD=10)
Defect Density
*
[Figure: scatter plot. x-axis: LOC, 0 to 20,000; y-axis: Num. of defects, 0 to 1,200. Labels: GlobalSearch, CalendarProvider, DownloadProvider, GoogleContactsProvider, Music, Mms, Email, Contacts, Camera, Calendar, Browser]
21. • Design of the Case Study
• An Overview of the Android…
• Results
• Two announcements:
• Android Application Development course in ODTU in Fall 2013 (IS 525-Mobile
Software Engineering)
• Türkiye’deki Yazılım Mühendisliği uygulamaları anketi (Survey of Software
Engineering practices in Turkey)
• Q/A
Outline
*
22. Syllabus:
• An overview of mobile platforms (iOS, Android, Windows Phone, and BlackBerry)
• Software Engineering Issues for Mobile Application Development
• Development processes
• Tools
• User interface design
• Application portability
• Quality
• Security
• Testing
• App Development in Android
• App Basics
• UI Overview, Activities, Application Lifecycle
• Intents, Intent Filters, Broadcasts, BroadcastReceivers
• Shared Preferences, Files, SQLite DB, Content Provider
• Automated testing, Test-Driven Development
• Google Maps, MapView, MapActivity
• Threads, Services, Status Bar Notifications
Android Application Development course in ODTU
Fall 2013. Course number: IS 525 (Informatics Institute)
*
23. • Similar surveys in the past:
Türkiye’deki Yazılım Mühendisliği uygulamaları anketi
Survey of Software Engineering practices in Turkey
*
https://www.surveymonkey.com/s/turkiye_yazilim
24. • Many thanks for your time
and attention
• For more information:
Q/A…
*
Vahid Garousi, Riley Kotchorek, Mike Smith
Test Cost-Effectiveness and Defect Density:
A Case Study on the Android Platform
Book chapter, Advances in Computers
vol. 89, pp. 163-206
Editor: Atif Memon
May 2013
26. • Implications of RQ 2 (Fault Detection Effectiveness of the Test
Suites):
• Mutation testing results of RQ 2 indicated the need for further
work by researchers and practitioners on improving the
power (fault detection effectiveness) of Android test suites.
Summary of Results and Implications
*
27. • Implications of RQ 4 (Correlation between Code Coverage
and Fault Detection Effectiveness):
• The replicated (re-confirmed) results from RQ 4 (for correlation
between code coverage and mutation score) are valuable in
the area of evidence-based software testing
Summary of Results and Implications
*
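One common way to quantify such a correlation is the Pearson coefficient, sketched below. Whether the case study used Pearson or another coefficient is not stated in these slides, so this is an assumption; the coverage and mutation-score values in main are hypothetical.

```java
// Pearson correlation coefficient between two equally long samples.
class PearsonSketch {
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // Returns NaN if either sample is constant (zero variance).
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical per-package values, not data from the study.
        double[] lineCoverage = {5.0, 20.0, 35.0, 60.0};
        double[] mutationScore = {2.0, 25.0, 30.0, 70.0};
        System.out.println(pearson(lineCoverage, mutationScore));
    }
}
```

A value near +1 would mean packages with higher coverage also tend to have higher mutation scores; a value near 0 would mean no linear relationship.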
28. Fault Detection Effectiveness of the Test Suites (RQ 2)
*
Package            Representative Java class selected for mutation            Mutation score
Browser            com.android.browser/BrowserActivity.java                   2.4%
Calendar           com.android.calendar/AlertActivity.java                    1.0%
CalendarProvider   com.android.providers.calendar/CalendarProvider.java       35.4%
Camera             com.android.camera/Camera.java                             25.3%
Contacts           com.android.contacts/ContactsUtils.java                    23.7%
ContactsProvider   com.android.providers.contacts/ContactsProvider.java       15.2%
Email              com.android.email/Account.java                             45.5%
Mms                com.google.android.mms/ContentType.java                    24.6%
Music              com.android.music/MediaPlaybackActivity.java               25.4%
Framework          frameworks/base/core/java/android/app/Activity.java        71.2%
DownloadProvider   com.android.providers.downloads/DownloadProvider.java      0.0%
GlobalSearch       com.android.providers.contacts/GlobalSearchSupport.java    50.3%
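For readers new to mutation testing: a mutation score is conventionally the percentage of seeded artificial faults (mutants) that the test suite detects ("kills"). The sketch below shows that calculation with hypothetical counts; the mutant counts behind the table above are not given in these slides, and the class name is made up for this example.

```java
// Mutation score = killed mutants / total mutants, as a percentage.
class MutationScoreSketch {
    static double mutationScorePercent(int killed, int totalMutants) {
        if (totalMutants <= 0) {
            throw new IllegalArgumentException("totalMutants must be positive");
        }
        if (killed < 0 || killed > totalMutants) {
            throw new IllegalArgumentException("killed must be between 0 and totalMutants");
        }
        return 100.0 * killed / totalMutants;
    }

    public static void main(String[] args) {
        // Hypothetical class: 200 mutants generated, 50 detected by the tests.
        System.out.println(mutationScorePercent(50, 200)); // prints 25.0
    }
}
```

A score of 0.0% therefore means that none of the mutants generated for that class caused any test to fail.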
29. Correlation between Code Coverage and Fault Detection
Effectiveness (RQ 4)
*
[Figure: scatter plot. x-axis: Line coverage %, 0% to 70%; y-axis: Mutation score, 0% to 80%. Labels: GlobalSearch, DownloadProvider, Framework, Music, Mms, Email, ContactsProvider, Contacts, Camera, CalendarProvider, Calendar, Browser]
30. • Research grants, cumulative amount (2006-2013):
• $1.3 million of R&D funding
• Some are joint with colleagues
• Share of my group: $760K
• NSERC: Discovery, CRD (x2), ENGAGE (x3)
• Natural Sciences and Engineering Research Council
of Canada
• The counterpart of TÜBİTAK in Türkiye and the
NSF in the USA
• New Faculty Award, by the Alberta Innovates
• Several internal University of Calgary funds
• Outcomes:
• Publications (so far): 23 journal papers, 38
conference papers
• One software product is now being commercialized
(optimization software for oil pipeline networks)
Software Quality Engineering Research
Group (SoftQual)
*
32. Visualization of Android Code-base
*
• Using a free tool called CodeCity
• CodeCity is an integrated environment for software analysis, in which
software systems are visualized as interactive, navigable 3D cities.
• Purpose:
• For program comprehension (architecture)
• Localizing design problems with disharmony maps
• For refactoring
[Figure: CodeCity visualization of the Android code-base. Legend labels: number of methods (NOM); number of attributes (NOA) in a class]
34. • RQ 1-What amount of test coverage is achieved by Android’s test suites?
• RQ 2-How effective are the Android test suites at detecting faults? In other
words, how effective are the test suites at detecting artificial faults (using
mutation testing)?
• Test cost-effectiveness:
• RQ 3-Do code coverage values correlate with the number of real
reported defects? We would expect that, for packages with larger
coverage values (meaning more rigorous testing), fewer defects are
reported by the users.
• RQ 4-As a follow-up to RQ 3: Do code coverage values correlate with the
ratio of detected artificial faults (mutation score)? i.e., how does test
coverage (as a notion of test cost) relate to mutation score (as a notion
of effectiveness)?
• RQ 5-What is the defect density of different packages? How do size
metrics (LOC) of packages correlate with the number of defects reported for
them?
Design of the Case Study - Research Questions (RQs)
*
35. • Design of the Case Study
• An Overview of the Android…
• Results of the Case Study
• Code Coverage (RQ 1)
• Fault Detection Effectiveness of the Test Suites (RQ 2)
• Code Coverage versus Number of Actual Reported Defects (RQ 3)
• Correlation between Code Coverage and Fault Detection Effectiveness (RQ 4)
• Defect Density (RQ 5)
• Summary of Results and Implications
Outline
*
36. Code Coverage versus Number of Actual Reported
Defects (RQ 3)
*
• As we can observe, contrary to what one would expect, packages with larger
coverage values (meaning more rigorous testing) do not necessarily have fewer
defects reported by the users after release.
• Also, it is not necessarily true that components with low coverage have more
defects.
[Figure: scatter plot. x-axis: Line coverage %, 0% to 70%; y-axis: Num. of defects, 0 to 1,200. Labels: GlobalSearch, CalendarProvider, DownloadProvider, GoogleContactsProvider, Framework, Music, Mms, Email, Contacts, Camera, Calendar, Browser]
37. • Design of the Case Study
• An Overview of the Android…
• Results of the Case Study
• Summary of Results and Implications
Outline
*
38. • Implications of RQ 1 (Code Coverage):
• Results of RQ 1 demonstrated the variance in coverage
measures across different packages and different classes
of each package.
• As implications, those results call for further research and
investigation into the root cause of spending varying amounts of
test effort on different parts of the system, and whether either of
the following two example factors has been the case in the
context of Android test suite development:
• (1) implication of code reuse on testing
• (2) risk-based testing
Summary of Results and Implications
*
39. • Implications of RQ 3 (Code Coverage versus Number of
Reported Actual Defects):
• As we observed and analyzed, contrary to what one would
expect, packages with larger coverage values (meaning
more rigorous testing) do not necessarily have fewer
defects reported by the users after release.
Summary of Results and Implications
*
[Figure: scatter plot. x-axis: Line coverage %, 0% to 70%; y-axis: Num. of defects, 0 to 1,200. Labels: GlobalSearch, CalendarProvider, DownloadProvider, GoogleContactsProvider, Framework, Music, Mms, Email, Contacts, Camera, Calendar, Browser]
40. • Implications of RQ 5 (Defect Density):
• We assessed how size metrics (LOC) of different packages
correlated with the number of defects reported for them.
• The package with the highest defect density (DD) was Music
(DD=0.19 per 1 KLOC)
• and the package with the lowest DD value was ContactsProvider
(DD=0.0003).
• Future work: studying the root cause(s) of the variation in DD values, e.g.,
code complexity
Summary of Results and Implications
*
41. Fine-grained (Class-level) Coverage Measurement
*
[Figure: scatter plot. x-axis: Code size of each Java class (SLOC), 0 to 3,000; y-axis: Line coverage %, 0 to 100]