Automated Identification of On-hold Self-admitted Technical Debt

Rungroj Maipradit¹, Bin Lin², Csaba Nagy², Gabriele Bavota², Michele Lanza², Hideaki Hata¹, Kenichi Matsumoto¹
¹ Nara Institute of Science and Technology
² Università della Svizzera italiana

Self-Admitted Technical Debt (SATD)
SATD
https://github.com/apache/hadoop/blob/e346e3638c595a512cd582739ff51fb64c3b4950/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java#L512

On-hold SATD
On-hold SATD [1]
https://github.com/apache/hadoop/blob/e346e3638c595a512cd582739ff51fb64c3b4950/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java#L512

On-hold SATD with References to Issues
Once the waiting condition has been fulfilled, a comment that still marks the code as SATD becomes a form of “wrong documentation”.
[Figure labels: Issue id (twice), Status & Resolution]
https://issues.apache.org/jira/browse/HADOOP-6223

Automated Identification of On-hold SATD
RQ1: What is the accuracy of our approach in identifying On-hold SATD?
RQ2: How does On-hold SATD evolve in open source projects?
RQ3: To what extent can our approach identify “ready-to-be-removed” On-hold SATD?

Dataset
• 10 projects
• 3 issue tracking systems
• 1,530 comments: 133 On-hold / 1,397 Cross-ref

RQ1: What is the accuracy of our approach in identifying On-hold SATD?
Investigates the performance of our classifier in identifying On-hold SATD.
RQ2: How does On-hold SATD evolve in open source projects?
Inspects how long On-hold SATD exists, and how long it takes to address SATD after the issue is resolved.
RQ3: To what extent can our approach identify “ready-to-be-removed” On-hold SATD?
Evaluates the reliability of identifying On-hold SATD that should be removed.

RQ1 Methodology
Data preprocessing
• Term abstraction
• Lemmatization
• Special character removal
Feature extraction
• Extract n-grams by applying N-gram IDF
Classification selection
• Auto-sklearn (automated machine learning)

RQ1 Methodology: Term abstraction (data preprocessing)
Before: // TODO: CAMEL-1475 should fix this
After: // TODO: abstractissueid should fix this
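The term abstraction step can be sketched as below. The regular expression is an assumption: it matches JIRA-style ids such as CAMEL-1475, and the function name `abstract_terms` is hypothetical. Other issue trackers would likely need other patterns.

```python
import re

# Assumption: issue ids look like PROJECTKEY-123 (JIRA style), as in the
# CAMEL-1475 example; other trackers would need other patterns.
ISSUE_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def abstract_terms(comment: str) -> str:
    """Replace every issue id with the placeholder token 'abstractissueid'."""
    return ISSUE_ID.sub("abstractissueid", comment)


print(abstract_terms("// TODO: CAMEL-1475 should fix this"))
# prints: // TODO: abstractissueid should fix this
```

Abstracting the id keeps the classifier from learning project-specific issue numbers instead of the wording around them.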

RQ1 Methodology: N-gram IDF (feature extraction)
Extract n-grams by applying N-gram IDF [2].
Input: 2004-01-27 Username tomcat successfully authenticated
Extracted n-grams: username, tomcat, successfully authenticated
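To illustrate the idea of weighting n-grams, here is a simplified sketch. It is not the N-gram IDF scheme of [2]: it swaps in plain IDF, log(|D| / df(g)), applied to word unigrams and bigrams. The mini-corpus and the names `ngrams` and `idf_table` are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, max_n=2):
    """Return the set of word n-grams (n = 1..max_n) in a lowercased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def idf_table(docs, max_n=2):
    """Plain IDF, log(|D| / df(g)), for every n-gram g in the corpus."""
    df = Counter(g for doc in docs for g in ngrams(doc, max_n))
    return {g: math.log(len(docs) / count) for g, count in df.items()}


# Hypothetical mini-corpus built around the example line above.
docs = [
    "2004-01-27 Username tomcat successfully authenticated",
    "2004-01-28 Username admin successfully authenticated",
    "2004-01-29 Connection closed",
]
weights = idf_table(docs)
```

In this toy corpus "successfully authenticated" appears in two of three documents and gets a lower weight than "tomcat", which appears in one.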

RQ1 Methodology: Auto-sklearn (classification selection)
Auto-sklearn [3] searches over:
• Data preprocessing
• 14 feature preprocessing methods
• 15 classifiers
• Hyperparameters

Result (RQ1)
Variant                  Approach                               Precision  Recall  F1-score  AUC
Original approach        N-gram + Auto-sklearn                  0.79       0.70    0.73      0.97
BOW as feature           BOW + Auto-sklearn                     0.69       0.68    0.67      0.94
With oversampling        N-gram + Oversampling + Auto-sklearn   0.38       0.48    0.41      0.87
Different ML algorithms  N-gram + Naive Bayes                   0.64       0.56    0.59      0.81
Different ML algorithms  N-gram + SVM                           0.87       0.38    0.51      0.95
Different ML algorithms  N-gram + KNN                           0.88       0.15    0.25      0.76
In 10-fold cross-validation, our original approach achieves the best F1-score and AUC.
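For reference, precision, recall and F1 for the On-hold class can be computed from confusion counts as sketched below. The counts are hypothetical and not taken from the study. Since the reported scores are presumably averaged over the 10 folds, the F1-score in the results need not equal the harmonic mean of the reported precision and recall.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 for the positive (On-hold) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical confusion counts for a single fold (not taken from the study).
p, r, f1 = precision_recall_f1(tp=9, fp=3, fn=4)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=0.69 f1=0.72
```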

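RQ1: What is the accuracy of our approach in identifying On-hold SATD?
Our original approach achieves the best performance on F1-score and AUC.
RQ2: How does On-hold SATD evolve in open source projects?
RQ3: To what extent can our approach identify “ready-to-be-removed” On-hold SATD?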

RQ2: Distribution of life spans of removed issue-referring comments
The median life span of On-hold SATD comments is 42 days, while it is 119.5 days for cross-reference comments.
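A life span of this kind can be computed as the number of days between the commit that introduced a comment and the commit that removed it. The sketch below assumes that definition; the dates are hypothetical and only illustrate the median calculation.

```python
from datetime import date
from statistics import median


def life_span_days(introduced: date, removed: date) -> int:
    """Days between the commit adding a comment and the commit removing it."""
    return (removed - introduced).days


# Hypothetical (introduced, removed) dates; not data from the study.
comments = [
    (date(2019, 1, 1), date(2019, 1, 11)),   # 10 days
    (date(2019, 2, 1), date(2019, 3, 3)),    # 30 days
    (date(2019, 3, 1), date(2019, 8, 28)),   # 180 days
]
spans = [life_span_days(a, b) for a, b in comments]
print(median(spans))  # prints: 30
```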

RQ2: Distribution of days needed to address SATD comments after issues were resolved
Around 53% of On-hold SATD comments were removed on the same day the issue was resolved. However, 13% took longer than one year to remove.
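The two shares above can be derived from pairs of issue-resolution and comment-removal dates. This is a minimal sketch with hypothetical dates; it assumes every comment was removed on or after the day its issue was resolved, and that "one year" means 365 days.

```python
from datetime import date


def removal_delay_shares(pairs):
    """Return (share removed on the resolution day, share removed > 365 days later).

    Each pair is (issue_resolved, comment_removed). Assumes the comment was
    removed on or after the day the issue was resolved.
    """
    delays = [(removed - resolved).days for resolved, removed in pairs]
    same_day = sum(d == 0 for d in delays) / len(delays)
    over_year = sum(d > 365 for d in delays) / len(delays)
    return same_day, over_year


# Hypothetical dates, for illustration only.
pairs = [
    (date(2018, 5, 1), date(2018, 5, 1)),    # same day
    (date(2018, 5, 1), date(2018, 5, 1)),    # same day
    (date(2018, 5, 1), date(2018, 6, 1)),    # 31 days later
    (date(2018, 5, 1), date(2019, 9, 1)),    # more than a year later
]
same_day, over_year = removal_delay_shares(pairs)
```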

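RQ1: What is the accuracy of our approach in identifying On-hold SATD?
Our original approach achieves the best performance on F1-score and AUC.
RQ2: How does On-hold SATD evolve in open source projects?
On-hold SATD has a shorter life span than Cross-ref comments, and some On-hold SATD takes longer than a year to be removed.
RQ3: To what extent can our approach identify “ready-to-be-removed” On-hold SATD?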

RQ3 Methodology
Ready-to-be-removed On-hold SATD -> Report to developer -> Feedback
Feedback: “I think this is correct finding. Would you like to put a patch for this”
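One way to flag "ready-to-be-removed" candidates is sketched below. It assumes an On-hold comment is ready once every issue it references is resolved; that rule, the `Comment` class, the "Resolved" status string and the DEMO-* ids are all hypothetical. A real tool would query the issue tracker rather than a dict.

```python
import re
from dataclasses import dataclass

ISSUE_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # assumed JIRA-style ids


@dataclass
class Comment:
    text: str
    is_on_hold: bool  # label predicted by the classifier


def ready_to_be_removed(comments, issue_status):
    """Return On-hold comments whose referenced issues are all resolved.

    issue_status maps an issue id to a status string; a real tool would
    fetch this from the issue tracker instead of a dict.
    """
    ready = []
    for c in comments:
        ids = ISSUE_ID.findall(c.text)
        if c.is_on_hold and ids and all(
            issue_status.get(i) == "Resolved" for i in ids
        ):
            ready.append(c)
    return ready


# Hypothetical comments and issue statuses.
comments = [
    Comment("// TODO: remove workaround once DEMO-1 is fixed", True),
    Comment("// TODO: enable after DEMO-2 is fixed", True),
    Comment("// see DEMO-1 for background", False),
]
status = {"DEMO-1": "Resolved", "DEMO-2": "Open"}
candidates = ready_to_be_removed(comments, status)
```

Only the first comment is returned here: the second waits on an open issue and the third is not classified as On-hold.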

RQ3 Methodology
• 6 On-hold SATD comments were reported
• 2 responses were received from developers
Overall, the two cases for which we have already received feedback indicate the practical value of our approach for On-hold SATD identification and removal.

RQ1: What is the accuracy of our approach in identifying On-hold SATD?
Our original approach achieves the best performance on F1-score and AUC.
RQ2: How does On-hold SATD evolve in open source projects?
On-hold SATD has a shorter life span than Cross-ref comments, and some On-hold SATD takes longer than a year to be removed.
RQ3: To what extent can our approach identify “ready-to-be-removed” On-hold SATD?
Feedback indicates the practical value of our approach for On-hold SATD identification and removal.

Questions
• One of our findings is that 13% of comments were removed more than one year after the issue had been resolved. Does this problem exist only in OSS, or does it also happen in industry?
• If two On-hold SATD comments reference the same issue and one of them has already been removed, is it possible to suggest a code modification for the other one?
