On the “Naturalness” of Buggy Code
Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane,
Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu.
published in ICSE 2016
Jinhan Kim
2018.2.9
Naturalness
• Real software tends to be natural, like speech or natural
language.
• It tends to be highly repetitive and predictable.
Naturalness of Software1
[1] A. Hindle, E. Barr, M. Gabel, Z. Su, and P. Devanbu. On the naturalness of software. In ICSE, pages 837–847,
2012.
What does it mean when code is considered “unnatural”?
Research Questions
• Are buggy lines less “natural” than non-buggy lines?
• Are buggy lines less “natural” than bug-fix lines?
• Is “naturalness” a good way to direct inspection effort?
Background
Language Model
• A language model assigns a probability to every sequence of
words.
• Given a code token sequence S = t1 t2 … tN, the model estimates p(S).
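A minimal sketch (not from the paper) of what assigning a probability to a token sequence means via the chain rule; the uniform model below is a stand-in for a trained one:

```python
import math

def sequence_log_prob(tokens, cond_prob):
    """Chain rule: log2 p(t1..tN) = sum_i log2 p(t_i | t_1..t_{i-1})."""
    logp = 0.0
    for i, tok in enumerate(tokens):
        logp += math.log2(cond_prob(tok, tokens[:i]))
    return logp

# Toy stand-in model: every token gets probability 0.25 (illustrative only).
uniform = lambda tok, history: 0.25
lp = sequence_log_prob(["if", "(", "x", ")"], uniform)  # 4 * log2(0.25) = -8 bits
```

Real models condition each token on its history, which is where the ngram approximation below comes in.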
ngram Language Model
• Approximates the full history h = t1 t2 … ti−1 using only the
preceding n − 1 tokens.
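A toy maximum-likelihood ngram estimator, illustrating "only the preceding n − 1 tokens"; the corpus and counts here are illustrative, not the paper's:

```python
from collections import Counter

def ngram_probs(corpus, n=3):
    """MLE ngram model: p(t_i | last n-1 tokens) = count(ngram) / count(prefix)."""
    ngrams, prefixes = Counter(), Counter()
    for i in range(len(corpus) - n + 1):
        gram = tuple(corpus[i:i + n])
        ngrams[gram] += 1
        prefixes[gram[:-1]] += 1
    def p(token, history):
        prefix = tuple(history[-(n - 1):])  # keep only the preceding n-1 tokens
        return ngrams[prefix + (token,)] / prefixes[prefix] if prefixes[prefix] else 0.0
    return p

# Bigram model over a tiny, repetitive "for loop header" corpus.
corpus = ["for", "(", "i", "=", "0", ";", "i", "<", "n", ";"]
p = ngram_probs(corpus, n=2)
```

For example `p("(", ["for"])` is 1.0 because `(` always follows `for` in this corpus, while `p("<", ["i"])` is 0.5 because `i` is followed by `=` and `<` equally often.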
$gram Language Model2
• Improves the ngram model with an additional cache of ngrams
extracted from the local context, to capture local regularities.
[2] Z. Tu, Z. Su, and P. Devanbu. On the localness of software. In SIGSOFT FSE, pages 269–280, 2014.
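The cache idea can be sketched as a simple interpolation; the 0.5 weight and the two component models are made-up placeholders ($gram itself builds the cache from the local context and tunes the mix):

```python
def cache_mix(p_global, p_cache, lam=0.5):
    """$gram-style interpolation (sketch): blend the global ngram estimate
    with a cache estimate built from ngrams seen in the local context.
    lam = 0.5 is illustrative; the real model tunes this weight."""
    def p(token, history):
        return lam * p_cache(token, history) + (1 - lam) * p_global(token, history)
    return p

# Locally the code reuses "i <" often, so the (made-up) cache is confident
# where the (made-up) global model is not.
p = cache_mix(lambda t, h: 0.1, lambda t, h: 0.9)
```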
Study
Study Subject
Phase-1 (during active development)
• They analyzed each project over the one-year period that
contained the most bug fixes in that project’s history.
• Then, extract snapshots at 1-month intervals.
Data Collection
Phase-2 (after release)
Entropy Measurement
• $gram
• Line and file entropies are computed by averaging over all
tokens belonging to a line and over all lines of a file,
respectively.
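The averaging scheme can be sketched directly; the per-token probabilities are assumed inputs from a trained model:

```python
import math

def token_entropy(p):
    # Per-token entropy: -log2 of the model's probability for the actual token.
    return -math.log2(p)

def line_entropy(token_probs):
    # Line entropy: mean entropy of the line's tokens.
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)

def file_entropy(lines):
    # File entropy: mean of its line entropies.
    return sum(line_entropy(line) for line in lines) / len(lines)

h = line_entropy([0.5, 0.25])  # (1 + 2) / 2 = 1.5 bits
```

Higher entropy means the model found the tokens more surprising, i.e. "less natural".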
Entropy Measurement
• Package, class and method declarations
• previously unseen identifiers – higher entropy scores
• For-loop statements and catch clauses
• being often repetitive – lower entropy scores
Abstract-syntax-based line-types
and computing a syntax-sensitive
entropy score
Syntax-sensitive Entropy Score
• Matching between line and AST node.
• Then, compute how much a line’s entropy deviates from the
mean entropy of its line-type.
• => $gram+type
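One simple reading of the $gram+type adjustment, shown as each line's entropy minus its line-type mean (the paper's exact normalization may differ); line-types and entropy values below are illustrative:

```python
from collections import defaultdict

def type_adjusted_entropy(lines):
    """Sketch of a $gram+type-style score: each line's entropy relative to
    the mean entropy of its AST line-type (e.g. 'for', 'catch', 'decl'),
    so naturally high-entropy line-types are not unfairly flagged."""
    by_type = defaultdict(list)
    for line_type, h in lines:
        by_type[line_type].append(h)
    means = {t: sum(v) / len(v) for t, v in by_type.items()}
    return [h - means[t] for t, h in lines]

# The second "for" line is unusually entropic *for its type*; the "catch"
# line is exactly average for its type.
scores = type_adjusted_entropy([("for", 2.0), ("for", 4.0), ("catch", 1.0)])
```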
Relative bug-proneness
• => $gram+wType
Evaluation
RQ1: Are buggy lines less “natural” than
non-buggy lines?
Are buggy lines less “natural” than non-buggy lines?
Bug Duration
Bug Duration
Bugs that stay longer in a repository tend to have lower entropy than
short-lived bugs.
RQ2: Are buggy lines less “natural” than
bug-fix lines?
Are buggy lines less “natural” than bug-fix lines?
Example 1
Example 2
Example 3
Counterexample
RQ3: Is “naturalness” a good way to direct
inspection effort?
DP: Defect Prediction
• Two classifiers
• Logistic Regression (LR)
• Random Forest (RF)
• Process metrics
• # of developers
• # of file commits
• Code churn
• Previous bug history
SBF: Static Bug Finder
• SBF uses syntactic and semantic properties of source code.
• For this study, PMD and FindBugs are used.
• NBF: Naturalness Bug Finder
• AUCEC: Area Under the Cost-Effectiveness Curve
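AUCEC can be sketched with the trapezoid rule at line granularity: x is the fraction of lines inspected in ranked order, y is the fraction of buggy lines found so far. This sketch assumes at least one buggy line; the inspection budget cap is a common convention:

```python
def aucec(ranked_is_buggy, budget=1.0):
    """Area under the cost-effectiveness curve (trapezoid sketch).
    ranked_is_buggy: 1/0 flags for lines, highest-scored first.
    budget: cap on the fraction of lines inspected."""
    n = len(ranked_is_buggy)
    total_bugs = sum(ranked_is_buggy)  # assumed > 0
    area, found = 0.0, 0
    for i in range(int(n * budget)):
        prev = found / total_bugs
        found += ranked_is_buggy[i]
        area += (prev + found / total_bugs) / 2 / n  # trapezoid of width 1/n
    return area

# A perfect ranking of 1 buggy line among 4 (illustrative data).
a = aucec([1, 0, 0, 0])
```

A random ranking yields an AUCEC near 0.5 over the full budget, so rankings well above that indicate a cost-effective ordering of inspection effort.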
Detecting Buggy Files
Detecting Buggy Lines
Result
• Buggy lines, on average, have higher entropies, i.e., are “less
natural”, than non-buggy lines.
• Entropy of buggy lines drops after bug fixes, with statistical
significance.
• Entropy can be used to guide bug-finding efforts at both the
file-level and the line-level.
Appendix
