Revisiting Assert Use in GitHub Projects

Pavneet Singh Kochhar and David Lo
Singapore Management University
21st International Conference on
Evaluation and Assessment in Software Engineering (EASE)

Assertions
• To test the assumptions about a piece of code
• Contains a boolean expression
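[Diagram: the expression (Expr.) is evaluated; if it is true, execution continues with the next statements; if it is false, an AssertionError is raised]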

Assertions Example
• Checking null condition
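The code shown on this slide did not survive extraction. As a stand-in, here is a minimal sketch of a null-condition assert in Java. The class `OrderPrinter` and its method are hypothetical names chosen for illustration, not taken from the study.

```java
import java.util.List;

final class OrderPrinter {
    // Hypothetical helper: formats a list of item names.
    static String describe(List<String> items) {
        // Assumption under test: callers never pass null.
        // The check only runs when assertions are enabled (java -ea).
        assert items != null : "items must not be null";
        return items.size() + " item(s): " + String.join(", ", items);
    }
}
```

Java assertions are disabled by default, so the check costs nothing in production unless the JVM is started with `-ea`.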

Use of Assertions
• Enforce
• Preconditions
• Postconditions
• Invariants
• Effective way to detect and correct bugs earlier
• Similar to unit test directly embedded in the code
• Tests the program on real data
• Also serve as documentation to improve readability and
maintainability
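The three kinds of checks listed above can be sketched in one small class. This is an illustrative example under stated assumptions: `Account`, `withdraw`, and `invariantHolds` are hypothetical names, and the methods are package-private because asserts are generally not recommended for validating arguments of public APIs.

```java
final class Account {
    private long balanceCents;

    Account(long openingCents) {
        // Precondition: the opening balance is non-negative.
        assert openingCents >= 0 : "opening balance must be non-negative";
        this.balanceCents = openingCents;
        assert invariantHolds() : "invariant violated after construction";
    }

    // Invariant: the balance is never negative.
    private boolean invariantHolds() {
        return balanceCents >= 0;
    }

    long withdraw(long amountCents) {
        // Preconditions: a positive amount that does not exceed the balance.
        assert amountCents > 0 : "amount must be positive";
        assert amountCents <= balanceCents : "insufficient funds";

        long before = balanceCents;
        balanceCents -= amountCents;

        // Postcondition: the balance dropped by exactly the requested amount.
        assert balanceCents == before - amountCents : "balance not updated correctly";
        assert invariantHolds() : "invariant violated after withdraw";
        return balanceCents;
    }
}
```

Each assert also documents the contract of the method, which is the readability benefit the slide mentions.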

Original Study
Casalnuovo et al. – “Assert Use in GitHub Projects”
• Goal – To understand impact of asserts on defect occurrence
• 69 C and C++ projects from GitHub
• Metrics – LOC added, No. of developers, No. of bug-fix
commits, No. of asserts
• Bug-fix commits – identified by the keywords “bug”, “error”, “defect”, “flaw”, “issue”
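A keyword heuristic of this kind could look like the sketch below. The five keywords come from the slide; matching them as case-insensitive substrings of the commit message is an assumption, and `BugFixClassifier` is a hypothetical name.

```java
import java.util.Locale;

final class BugFixClassifier {
    // Keywords listed on the slide.
    private static final String[] KEYWORDS = {"bug", "error", "defect", "flaw", "issue"};

    // Assumption: a commit counts as a bug fix if its message contains
    // any keyword, ignoring case.
    static boolean isBugFix(String commitMessage) {
        String lower = commitMessage.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

Substring matching is simple but can over-count, for example a message that mentions "error handling" in a new feature.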

Original Study
RQ1: How does assertion use relate to defect occurrence?
Asserts have a negative and significant relationship with
defect occurrence.
RQ2: How does assertion use relate to the collaborative/human
aspects of software engineering, such as ownership and
experience?
Developers who have added asserts have higher
ownership and experience.

Original Study
RQ3: What aspects of network position of a method in a call-graph
are associated with assertion placement?
No conclusive results were found for other network
measures such as authority, in-degree, out-degree and
betweenness.
RQ4: Does the domain of application of a project relate to
assertion use?
Application domain has no impact on the number of
assertions added.

Our Study
Partial replication of Casalnuovo et al.
RQ1: How does assertion use relate to defect occurrence?
RQ2: How does assertion use relate to developer characteristics
such as code ownership and experience?
RQ3: How are asserts used by developers?

Original Study vs. Our Study
                     Original                                Our
Number of Projects   69                                      185
Language             C, C++                                  Java
Research Questions   1. Assert vs. Defect Occurrence         1. Assert vs. Defect Occurrence
                     2. Assert vs. Ownership & Experience    2. Assert vs. Ownership & Experience
                     3. Assert vs. Network metrics           3. Assert Usage
                     4. Assert vs. Domain of application

Data Collection
• 342 projects collected
• Filter: more than 10 asserts added
• 185 projects retained
• Popular projects include Apache Hadoop, HttpClient, and Maven

Dataset
Project Details
Number of Projects                185
Number of Developers              2,791
KLOC                              20,033
Number of Files                   201,600
Number of Methods                 1,993,828
Assert Methods                    30,253
Total Period                      12/1998 – 04/2016
# All Commits (Total)             4,852,069
# All Commits (With Asserts)      7,540
# Bug-fix Commits (Total)         29,867
# Bug-fix Commits (With Asserts)  741

Statistical Method
• Hurdle regression model
  - Hurdle component
  - Count component
• Dependent variable: number of bug-fixing commits
• Independent variable: number of asserts
• Control variables: lines changed, number of developers
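For readers unfamiliar with hurdle models, the two components listed above are usually written as follows. This is the generic textbook form, not the exact specification used in the study: the symbols $p_i$, $\mu_i$, $\boldsymbol{\gamma}$, $\boldsymbol{\beta}$, the logit link, and the unspecified count density $f$ are all assumptions. Here $Y_i$ is the number of bug-fixing commits for observation $i$, and $\mathbf{x}_i$ collects the number of asserts plus the control variables.

```latex
% Hurdle component: is there at least one bug-fixing commit?
\Pr(Y_i > 0) = p_i, \qquad
\operatorname{logit}(p_i) = \mathbf{x}_i^{\top}\boldsymbol{\gamma}

% Count component: how many, given at least one
% (f is a count density such as Poisson or negative binomial, truncated at zero)
\Pr(Y_i = y \mid Y_i > 0) = \frac{f(y;\,\mu_i)}{1 - f(0;\,\mu_i)},
\quad y = 1, 2, \dots, \qquad
\log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}
```

Separating the two parts lets the model handle the many observations with zero bug-fixing commits without distorting the estimates for the positive counts.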

Research Questions
RQ1:
How does assertion use relate to
defect occurrence?

RQ1: Assertion & Defect Occurrence

RQ1 Findings
Adding asserts is associated with lower defect occurrence.
Asserts added to methods with many developers have a larger effect.

Research Questions
RQ2:
How does assertion use relate to
developer characteristics such as code
ownership and experience?

Ownership & Experience
• Ownership: percentage of changes made to a method by a developer
• Experience: total number of commits made by the developer to a method
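Both metrics can be computed from the list of authors who committed to a method. The sketch below assumes that each commit touching the method counts as one change; `MethodHistory` and its method names are hypothetical.

```java
import java.util.List;

final class MethodHistory {
    // Authors of the commits that changed one method (hypothetical input).
    private final List<String> commitAuthors;

    MethodHistory(List<String> commitAuthors) {
        this.commitAuthors = commitAuthors;
    }

    // Experience: number of commits this developer made to the method.
    long experience(String developer) {
        return commitAuthors.stream().filter(developer::equals).count();
    }

    // Ownership: percentage of the method's changes made by this developer.
    // Assumption: one commit equals one change.
    double ownershipPercent(String developer) {
        if (commitAuthors.isEmpty()) {
            return 0.0;
        }
        return 100.0 * experience(developer) / commitAuthors.size();
    }
}
```

For a method changed four times, three of them by the same developer, that developer has experience 3 and ownership 75%.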

RQ2: Assertion & Ownership
Mann-Whitney-Wilcoxon (MWW) test: p-value < 2.2e-16
Effect size (Cohen’s d): medium

RQ2: Assertion & Experience
Mann-Whitney-Wilcoxon (MWW) test: p-value < 2.2e-16
Effect size (Cohen’s d): small
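The effect sizes on the two RQ2 slides are reported as Cohen's d. A minimal sketch of the standard pooled-standard-deviation formula is shown below; the class name `EffectSize` is hypothetical, and the slides do not say which variant of d the study used. By Cohen's conventional thresholds, d around 0.2 is small, 0.5 medium, and 0.8 large.

```java
final class EffectSize {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (xs.length - 1);
    }

    // Cohen's d with a pooled standard deviation.
    // Each sample needs at least two values.
    static double cohensD(double[] a, double[] b) {
        double pooledVariance =
                ((a.length - 1) * sampleVariance(a) + (b.length - 1) * sampleVariance(b))
                        / (a.length + b.length - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooledVariance);
    }
}
```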

RQ2 Findings
Developers who added asserts have higher ownership and experience.

Research Questions
RQ3:
How are asserts used by developers?

Types of Asserts
• Null Condition Check
• Process State Check
• Initialization Check
• Resource Check
• Resource Lock Check
• Min and Max Value Constraint Check
• Collection Data and Length Check
• Implausible Condition Check

RQ3: Assert Usage
Null Condition check

RQ3: Assert Usage
Process State check
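The code from this slide was lost in extraction. A hypothetical illustration of a process state check, with invented names (`Worker`, `State`), might look like this:

```java
final class Worker {
    enum State { NEW, RUNNING, STOPPED }

    private State state = State.NEW;

    void start() {
        // State check: start() is only legal on a fresh worker.
        assert state == State.NEW : "start() expects NEW but was " + state;
        state = State.RUNNING;
    }

    void stop() {
        // State check: only a running worker can be stopped.
        assert state == State.RUNNING : "stop() expects RUNNING but was " + state;
        state = State.STOPPED;
    }

    State state() {
        return state;
    }
}
```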

RQ3: Assert Usage
Initialization check
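The original example is not recoverable. As a hedged sketch, an initialization check could guard a getter that must not be called before setup. `ConfigHolder` and `init` are hypothetical names.

```java
final class ConfigHolder {
    private String endpoint;      // set by init()
    private boolean initialized;  // flipped once init() has run

    void init(String endpoint) {
        this.endpoint = endpoint;
        this.initialized = true;
    }

    String endpoint() {
        // Initialization check: init() must have been called first.
        assert initialized : "init() must be called before endpoint()";
        return endpoint;
    }
}
```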

RQ3: Assert Usage
Resource check
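The slide's code is missing, and the slides do not define "resource" precisely. One plausible reading, sketched here with a hypothetical `DirectoryStats` class, is asserting that a file-system resource is available before using it.

```java
import java.io.File;

final class DirectoryStats {
    static int countEntries(File dir) {
        // Resource checks: the directory must exist and be a directory.
        assert dir.exists() : "missing resource: " + dir;
        assert dir.isDirectory() : "not a directory: " + dir;

        // File.list() returns null when the path cannot be listed.
        String[] names = dir.list();
        return names == null ? 0 : names.length;
    }
}
```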

RQ3: Assert Usage
Resource Lock check
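The example from the slide is not available. A common Java idiom for this category uses `Thread.holdsLock` to assert that the caller already owns a monitor. The class `LockedCounter` is a hypothetical illustration.

```java
final class LockedCounter {
    private final Object lock = new Object();
    private int value;

    int increment() {
        synchronized (lock) {
            return incrementLocked();
        }
    }

    // Helper that must only be called while holding the lock.
    private int incrementLocked() {
        // Lock check: the current thread owns the monitor of 'lock'.
        assert Thread.holdsLock(lock) : "caller must hold the lock";
        return ++value;
    }
}
```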

RQ3: Assert Usage
Min & Max Value Constraint check
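The slide's code did not survive. A hypothetical range check, with invented names (`ProgressBar`, `toBarWidth`), could be written as:

```java
final class ProgressBar {
    // Converts a percentage into a bar width in characters.
    static int toBarWidth(int percent, int maxWidth) {
        // Min and max constraint: percent must lie in [0, 100].
        assert percent >= 0 && percent <= 100 : "percent out of range: " + percent;
        // Min constraint: the bar needs a positive width.
        assert maxWidth > 0 : "maxWidth must be positive";
        return percent * maxWidth / 100;
    }
}
```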

RQ3: Assert Usage
Collection Data & Length check
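The original example is lost. As an illustration only, asserts in this category might check that a collection is non-empty and that two collections have matching lengths. `Vectors` and `dot` are hypothetical names.

```java
import java.util.List;

final class Vectors {
    static int dot(List<Integer> xs, List<Integer> ys) {
        // Data check: there must be something to multiply.
        assert !xs.isEmpty() : "xs must not be empty";
        // Length check: both vectors need the same number of elements.
        assert xs.size() == ys.size()
                : "length mismatch: " + xs.size() + " vs " + ys.size();

        int sum = 0;
        for (int i = 0; i < xs.size(); i++) {
            sum += xs.get(i) * ys.get(i);
        }
        return sum;
    }
}
```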

RQ3: Assert Usage
Implausible Condition check
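The code for this slide is missing. A typical pattern in this category is `assert false` on a branch the developer believes is unreachable. The sketch below uses a hypothetical `Seasons` class; the fallback value `-1` is an arbitrary choice for when assertions are disabled.

```java
final class Seasons {
    enum Season { WINTER, SPRING, SUMMER, AUTUMN }

    // Returns the month number in which each season starts.
    static int firstMonth(Season season) {
        switch (season) {
            case WINTER: return 12;
            case SPRING: return 3;
            case SUMMER: return 6;
            case AUTUMN: return 9;
            default:
                // Implausible condition: every enum constant is handled above,
                // so reaching this branch signals a programming error.
                assert false : "unexpected season " + season;
                return -1; // only reached when assertions are disabled
        }
    }
}
```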

RQ3 Findings
Asserts are used for several purposes, such as null checks and
resource lock checks.

Key Takeaways
Adding asserts to a method has a small yet
significant relationship with defect occurrence.
Developers who added asserts have higher
ownership and experience.
Assert usage: null condition, initialization, process
state, resource lock, implausible condition, etc.

Thank You!
www.kochharps.wix.com/pavneet
kochharps.2012@smu.edu.sg
