Review: On the Naturalness of Buggy Code

Jinhan Kim

On the “Naturalness” of Buggy Code
Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane,
Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu.
published in ICSE 2016
Jinhan Kim
2018.2.9

Naturalness
• Real software tends to be natural, like speech or natural language.
• It tends to be highly repetitive and predictable.
Naturalness of Software [1]
[1] A. Hindle, E. Barr, M. Gabel, Z. Su, and P. Devanbu. On the naturalness of software. In ICSE, pages 837–847, 2012.

What does it mean when code is considered “unnatural”?

Research Questions
• Are buggy lines less “natural” than non-buggy lines?
• Are buggy lines less “natural” than bug-fix lines?
• Is “naturalness” a good way to direct inspection effort?

Language Model
• A language model assigns a probability to every sequence of words.
• Given a code token sequence $S = t_1 t_2 \ldots t_N$, the model estimates how likely that sequence is.
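
A common way to write the probability that a language model assigns to $S$ is the chain-rule factorization. The form below is the standard textbook one, given as a sketch and not as a quotation from the slides or the paper:

```latex
P(S) = P(t_1) \prod_{i=2}^{N} P(t_i \mid t_1, \ldots, t_{i-1})
```

Each factor is the probability of the next token given everything that came before it.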

n-gram Language Model
• An n-gram model estimates the probability of each token using only the preceding n − 1 tokens, instead of the full history $h = t_1 t_2 \ldots t_{i-1}$.
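
The truncation to the preceding n − 1 tokens can be sketched as a toy count-based model. This is a minimal illustration and not the model used in the paper: the class name `NgramModel`, the `<s>` padding symbol and the add-one smoothing are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


class NgramModel:
    """Toy n-gram model with add-one smoothing (hypothetical, for illustration)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.ngrams: Counter = Counter()    # counts of (context + token)
        self.contexts: Counter = Counter()  # counts of the (n-1)-token context
        self.vocab: set = set()

    def _pad(self, tokens: Sequence[str]) -> List[str]:
        # Pad the start so the first tokens also have n-1 predecessors.
        return ["<s>"] * (self.n - 1) + list(tokens)

    def train(self, tokens: Sequence[str]) -> None:
        padded = self._pad(tokens)
        self.vocab.update(tokens)
        for i in range(self.n - 1, len(padded)):
            context = tuple(padded[i - self.n + 1:i])
            self.ngrams[context + (padded[i],)] += 1
            self.contexts[context] += 1

    def prob(self, token: str, history: Sequence[str]) -> float:
        # Only the last n-1 tokens of the history are used.
        padded = self._pad(history)
        context = tuple(padded[len(padded) - (self.n - 1):]) if self.n > 1 else ()
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        return (self.ngrams[context + (token,)] + 1) / (self.contexts[context] + v)


model = NgramModel(3)
model.train("for ( int i = 0 ; i < n ; i ++ )".split())
p_seen = model.prob("i", ["for", "(", "int"])
p_unseen = model.prob("n", ["for", "(", "int"])
```

Here `p_seen` is larger than `p_unseen` because `( int i` occurred in the training tokens, while `( int n` did not.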

$gram Language Model [2]
• $gram improves the n-gram language model by adding a cache-list of n-grams extracted from the local context, to capture local regularities.
[2] Z. Tu, Z. Su, and P. Devanbu. On the localness of software. In SIGSOFT FSE, pages 269–280, 2014.
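
The cache idea can be sketched as mixing a global n-gram probability with an estimate taken from n-grams counted in the local context. This is a simplified illustration: the function names `build_cache` and `cached_prob` and the fixed mixing `weight` are assumptions for the example, and the weighting scheme in [2] may differ.

```python
from collections import Counter
from typing import Sequence, Tuple

Cache = Tuple[Counter, Counter]


def build_cache(local_tokens: Sequence[str], n: int) -> Cache:
    """Count n-grams and their (n-1)-token contexts in the local context."""
    ngrams: Counter = Counter()
    contexts: Counter = Counter()
    for i in range(n - 1, len(local_tokens)):
        ctx = tuple(local_tokens[i - n + 1:i])
        ngrams[ctx + (local_tokens[i],)] += 1
        contexts[ctx] += 1
    return ngrams, contexts


def cached_prob(token: str, context: Sequence[str], global_prob: float,
                cache: Cache, weight: float = 0.5) -> float:
    """Interpolate a global probability with the local cache estimate.

    `weight` is a fixed mixing factor chosen for illustration only.
    """
    ngrams, contexts = cache
    ctx = tuple(context)
    if contexts[ctx] == 0:
        return global_prob  # the cache has nothing to say about this context
    cache_p = ngrams[ctx + (token,)] / contexts[ctx]
    return (1 - weight) * global_prob + weight * cache_p


cache = build_cache("a = b ; a = b ; a =".split(), n=3)
boosted = cached_prob("b", ("a", "="), 0.1, cache)
lowered = cached_prob("c", ("a", "="), 0.1, cache)
unchanged = cached_prob("b", ("x", "y"), 0.1, cache)
```

A token that repeats locally (`b` after `a =`) gets a higher probability than the global model alone would give it, which is the local regularity the cache is meant to capture.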

Phase-1 (during active development)
• For each project, the authors analyzed the one-year period that contained the most bug fixes in that project’s history.
• They then extracted snapshots at 1-month intervals.

Entropy Measurement
• Entropy is measured with the $gram model.
• Line entropy is the average over all tokens belonging to a line; file entropy is the average over all lines of a file.
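
The averaging described above can be sketched as follows, assuming each token already has a probability from some language model and that a token's entropy is taken as its negative log2-probability. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def line_entropy(token_probs: Sequence[float]) -> float:
    """Average negative log2-probability of the tokens on one line."""
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)


def file_entropy(lines: Sequence[Sequence[float]]) -> float:
    """Average of the line entropies, skipping lines without tokens."""
    entropies: List[float] = [line_entropy(line) for line in lines if line]
    return sum(entropies) / len(entropies)


predictable_line = line_entropy([0.5, 0.25])         # 1.5 bits per token
surprising_line = line_entropy([0.125])              # 3.0 bits per token
whole_file = file_entropy([[0.5, 0.25], [0.125]])    # mean of the two lines
```

Lower token probabilities give higher entropy, which is what "less natural" means in this setting.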

Entropy Measurement
• Package, class and method declarations
  • often contain previously unseen identifiers, so they get higher entropy scores
• For-loop statements and catch clauses
  • are often repetitive, so they get lower entropy scores
• To account for this, the authors introduce abstract-syntax-based line-types and compute a syntax-sensitive entropy score.

Syntax-sensitive Entropy Score
• Each line is matched to an AST node, which determines its line-type.
• Then compute how much the line’s entropy deviates from the mean entropy of its line-type.
• => $gram+type
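
One way to express "deviation from the mean entropy of its line-type" is a per-type z-score. The sketch below assumes that standardization (mean and population standard deviation per line-type); the function name and the line-type labels are made up for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def type_normalized(lines: Sequence[Tuple[str, float]]) -> List[float]:
    """Return, for each (line_type, entropy) pair, its z-score within that type."""
    by_type: Dict[str, List[float]] = defaultdict(list)
    for line_type, entropy in lines:
        by_type[line_type].append(entropy)
    stats = {t: (mean(v), pstdev(v)) for t, v in by_type.items()}
    scores = []
    for line_type, entropy in lines:
        mu, sd = stats[line_type]
        # A type whose lines all share one entropy has no spread; score it 0.
        scores.append((entropy - mu) / sd if sd > 0 else 0.0)
    return scores


sample = [("declaration", 6.0), ("declaration", 8.0),
          ("for", 2.0), ("for", 4.0), ("for", 3.0)]
z = type_normalized(sample)
```

In this sample the for-loop line with raw entropy 4.0 ends up with a higher score than the declaration with raw entropy 8.0, because it is more unusual for its own line-type.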

RQ1: Are buggy lines less “natural” than non-buggy lines?

Are buggy lines less “natural” than non-buggy lines?

Bug Duration
Bugs that stay longer in a repository tend to have lower entropy than short-lived bugs.

RQ2: Are buggy lines less “natural” than bug-fix lines?

Are buggy lines less “natural” than bug-fix lines?

RQ3: Is “naturalness” a good way to direct inspection effort?

DP: Defect Prediction
• Two classifiers
  • Logistic Regression (LR)
  • Random Forest (RF)
• Process metrics
  • # of developers
  • # of file commits
  • Code churn
  • Previous bug history

SBF: Static Bug Finder
• SBF uses syntactic and semantic properties of source code.
• For this study, PMD and FindBugs are used.
• NBF: Naturalness Bug Finder
• AUCEC: Area Under the Cost-Effectiveness Curve
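
AUCEC can be sketched as the area under a curve that plots the fraction of bugs found against the fraction of lines inspected, with lines inspected in descending score order. The version below is an illustration under assumptions: every line costs the same to inspect, the area is computed with the trapezoid rule, and the `budget` parameter (the fraction of lines one is willing to inspect) is a hypothetical knob.

```python
from typing import Sequence


def aucec(scores: Sequence[float], is_buggy: Sequence[int],
          budget: float = 1.0) -> float:
    """Area under the cost-effectiveness curve up to `budget` of the lines."""
    n = len(scores)
    total_bugs = sum(is_buggy)
    if n == 0 or total_bugs == 0:
        return 0.0
    # Inspect the highest-scored (e.g. highest-entropy) lines first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    area, found, prev_y = 0.0, 0, 0.0
    for rank, i in enumerate(order, start=1):
        if rank / n > budget + 1e-12:
            break
        found += is_buggy[i]
        y = found / total_bugs
        area += (prev_y + y) / 2 * (1 / n)  # trapezoid of width 1/n
        prev_y = y
    return area


labels = [1, 1, 0, 0]
good_ranking = aucec([0.9, 0.8, 0.1, 0.2], labels)   # buggy lines ranked first
bad_ranking = aucec([0.1, 0.2, 0.9, 0.8], labels)    # buggy lines ranked last
half_budget = aucec([0.9, 0.8, 0.1, 0.2], labels, budget=0.5)
```

A ranking that puts buggy lines first yields a larger area, so a higher AUCEC means less inspection effort per bug found.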

Result
• Buggy lines, on average, have higher entropies, i.e. are “less natural”, than non-buggy lines.
• Entropy of the buggy lines drops after bug-fixes with statistical significance.
• Entropy can be used to guide bug-finding efforts at both the file-level and the line-level.

5. Background

9. Study

10. Study Subject

12. Data Collection

13. Phase-2 (after release)

17. Relative bug-proneness • => $gram+wType

18. Evaluation

21. Bug Duration

25. Example 1

26. Example 2

27. Example 3

28. Counterexample

32. Detecting Buggy Files

33. Detecting Buggy Lines

35. Appendix
