SlideShare a Scribd company logo
1 of 35
On the “Naturalness” of Buggy Code
Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane,
Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu.
published in ICSE 2016
Jinhan Kim
2018.2.9
Naturalness
• Real software tends to be natural, like speech or natural
language.
• It tends to be highly repetitive and predictable.
Naturalness of Software1
[1] A. Hindle, E. Barr, M. Gabel, Z. Su, and P. Devanbu. On the naturalness of software. In ICSE, pages 837–847,
2012.
What does it mean when a code is considered
“unnatural”?
Research Questions
• Are buggy lines less “natural” than non-buggy lines?
• Are buggy lines less “natural" than bug-fix lines?
• Is “naturalness" a good way to direct inspection effort?
Background
Language Model
• Language model assign a probability to every sequence of
words.
• Given a code token sequence, 𝑆 = 𝑡1 𝑡2 … 𝑡 𝑁
ngram Language Model
• Using only the preceding n - 1 tokens.
• ℎ = 𝑡1 𝑡2 … 𝑡𝑖−1
$gram Language Model2
• Improving language model by deploying an additional cache-
list of ngrams extracted from the local context, to capture the
local regularities.
[2] Z. Tu, Z. Su, and P. Devanbu. On the localness of software. In SIGSOFT FSE, pages 269–280, 2014.
Study
Study Subject
Phase-1 (during active development)
• They chose to analyze each project for the period of one-year
which contained the most bug fixes in that project’s history.
• Then, extract snapshots at 1-month intervals.
Data Collection
Phase-2 (after release)
Entropy Measurement
• $gram
• The line and file entropies are computed by averaging over all
the tokens belong to a line and all lines corresponding to a file
respectively.
Entropy Measurement
• Package, class and method declarations
• previously unseen identifiers – higher entropy scores
• For-loop statements and catch clauses
• being often repetitive – lower entropy scores
Abstract-syntax-based line-types
and computing a syntax-sensitive
entropy score
Syntax-sensitive Entropy Score
• Matching between line and AST node.
• Then, compute how much a line’s entropy deviated from the
mean entropy of its line-type.
• => $gram+type
Relative bug-proneness
• => $gram+wType
Evaluation
RQ1: Are buggy lines less “natural" than
non-buggy lines?
Are buggy lines less “natural" than non-buggy lines?
Bug Duration
Bug Duration
Bugs that stay longer in a repository tend to have lower entropy than the
short-lived bugs
RQ2: Are buggy lines less “natural" than
bug-fix lines?
Are buggy lines less “natural" than bug-fix lines?
Example 1
Example 2
Example 3
Counterexample
RQ3: Is “naturalness" a good way to direct
inspection effort?
DP: Defect Prediction
• Two classifier
• Logistic Regression(LR)
• Random Forest(RF)
• Process metrics
• # of developers
• # of file-commit
• Code churn
• Previous bug history
SBF: Static Bug Finder
• SBF uses syntactic and semantic properties of source code.
• For this study, PMD and FindBugs are used.
• NBF: Naturalness Bug Finder
• AUCEC: Area Under the Cost-Effectiveness Curve
Detecting Buggy Files
Detecting Buggy Lines
Result
• Buggy lines, on average, have higher entropies, i.e. are “less
natural”, than non-buggy lines.
• Entropy of the buggy lines drops after bug-fixes with statistical
significance.
• Entropy can be used to guide bug-finding efforts at both the file-
level and the line-level.
Appendix

More Related Content

What's hot

Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Natural language processing
Natural language processing Natural language processing
Natural language processing Md.Sumon Sarder
 
Lecture 8 dynamic programming
Lecture 8 dynamic programmingLecture 8 dynamic programming
Lecture 8 dynamic programmingOye Tu
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Rajnish Raj
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AISaurav Shrestha
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: ParsingRushdi Shams
 
AI_Session 10 Local search in continious space.pptx
AI_Session 10 Local search in continious space.pptxAI_Session 10 Local search in continious space.pptx
AI_Session 10 Local search in continious space.pptxAsst.prof M.Gokilavani
 
Regular expression to NFA (Nondeterministic Finite Automata)
Regular expression to NFA (Nondeterministic Finite Automata)Regular expression to NFA (Nondeterministic Finite Automata)
Regular expression to NFA (Nondeterministic Finite Automata)Niloy Biswas
 
Lecture 3 RE NFA DFA
Lecture 3   RE NFA DFA Lecture 3   RE NFA DFA
Lecture 3 RE NFA DFA Rebaz Najeeb
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler DesignKuppusamy P
 
Resume Parsing with Named Entity Clustering Algorithm
Resume Parsing with Named Entity Clustering AlgorithmResume Parsing with Named Entity Clustering Algorithm
Resume Parsing with Named Entity Clustering AlgorithmSwapnil Sonar
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Programming Languages / Translators
Programming Languages / TranslatorsProgramming Languages / Translators
Programming Languages / TranslatorsProject Student
 

What's hot (20)

Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Natural language processing
Natural language processing Natural language processing
Natural language processing
 
Nlp ambiguity presentation
Nlp ambiguity presentationNlp ambiguity presentation
Nlp ambiguity presentation
 
NLP
NLPNLP
NLP
 
Lecture 8 dynamic programming
Lecture 8 dynamic programmingLecture 8 dynamic programming
Lecture 8 dynamic programming
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...
 
Natural Language Processing in AI
Natural Language Processing in AINatural Language Processing in AI
Natural Language Processing in AI
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: Parsing
 
AI_Session 10 Local search in continious space.pptx
AI_Session 10 Local search in continious space.pptxAI_Session 10 Local search in continious space.pptx
AI_Session 10 Local search in continious space.pptx
 
Regular expression to NFA (Nondeterministic Finite Automata)
Regular expression to NFA (Nondeterministic Finite Automata)Regular expression to NFA (Nondeterministic Finite Automata)
Regular expression to NFA (Nondeterministic Finite Automata)
 
Nlp
NlpNlp
Nlp
 
Lecture 3 RE NFA DFA
Lecture 3   RE NFA DFA Lecture 3   RE NFA DFA
Lecture 3 RE NFA DFA
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
Asymptotic Notation
Asymptotic NotationAsymptotic Notation
Asymptotic Notation
 
Macro
MacroMacro
Macro
 
Resume Parsing with Named Entity Clustering Algorithm
Resume Parsing with Named Entity Clustering AlgorithmResume Parsing with Named Entity Clustering Algorithm
Resume Parsing with Named Entity Clustering Algorithm
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Programming Languages / Translators
Programming Languages / TranslatorsProgramming Languages / Translators
Programming Languages / Translators
 
Chapter 5 Syntax Directed Translation
Chapter 5   Syntax Directed TranslationChapter 5   Syntax Directed Translation
Chapter 5 Syntax Directed Translation
 

Similar to Review: On the Naturalness of Buggy Code

Do characters abuse more than words?
Do characters abuse more than words?Do characters abuse more than words?
Do characters abuse more than words?Tharushi Ruwandika
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiIIIT Hyderabad
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCRdbpublications
 
PacMin @ AMPLab All-Hands
PacMin @ AMPLab All-HandsPacMin @ AMPLab All-Hands
PacMin @ AMPLab All-Handsfnothaft
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsshrey bhate
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
Cross-Language Information Retrieval
Cross-Language Information RetrievalCross-Language Information Retrieval
Cross-Language Information RetrievalSumin Byeon
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big DataSameer Wadkar
 
Authorship attribution
Authorship attributionAuthorship attribution
Authorship attributionReza Ramezani
 
Noun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsNoun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsTomoyuki Kajiwara
 
BibleTech2013.pptx
BibleTech2013.pptxBibleTech2013.pptx
BibleTech2013.pptxAndi Wu
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)Sumit Raj
 
Natural language processing
Natural language processingNatural language processing
Natural language processingBasha Chand
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
2013 siam-cse-big-data
2013 siam-cse-big-data2013 siam-cse-big-data
2013 siam-cse-big-datac.titus.brown
 

Similar to Review: On the Naturalness of Buggy Code (20)

Do characters abuse more than words?
Do characters abuse more than words?Do characters abuse more than words?
Do characters abuse more than words?
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging hai
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCR
 
Language models
Language modelsLanguage models
Language models
 
PacMin @ AMPLab All-Hands
PacMin @ AMPLab All-HandsPacMin @ AMPLab All-Hands
PacMin @ AMPLab All-Hands
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
Cross-Language Information Retrieval
Cross-Language Information RetrievalCross-Language Information Retrieval
Cross-Language Information Retrieval
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big Data
 
Authorship attribution
Authorship attributionAuthorship attribution
Authorship attribution
 
Noun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsNoun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of Contexts
 
BibleTech2013.pptx
BibleTech2013.pptxBibleTech2013.pptx
BibleTech2013.pptx
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
2013 siam-cse-big-data
2013 siam-cse-big-data2013 siam-cse-big-data
2013 siam-cse-big-data
 
Esa act
Esa actEsa act
Esa act
 

Recently uploaded

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

Review: On the Naturalness of Buggy Code

  • 1. On the “Naturalness” of Buggy Code Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu. published in ICSE 2016 Jinhan Kim 2018.2.9
  • 2. Naturalness • Real software tends to be natural, like speech or natural language. • It tends to be highly repetitive and predictable. Naturalness of Software1 [1] A. Hindle, E. Barr, M. Gabel, Z. Su, and P. Devanbu. On the naturalness of software. In ICSE, pages 837–847, 2012.
  • 3. What does it mean when a code is considered “unnatural”?
  • 4. Research Questions • Are buggy lines less “natural” than non-buggy lines? • Are buggy lines less “natural" than bug-fix lines? • Is “naturalness" a good way to direct inspection effort?
  • 6. Language Model • Language model assign a probability to every sequence of words. • Given a code token sequence, 𝑆 = 𝑡1 𝑡2 … 𝑡 𝑁
  • 7. ngram Language Model • Using only the preceding n - 1 tokens. • ℎ = 𝑡1 𝑡2 … 𝑡𝑖−1
  • 8. $gram Language Model2 • Improving language model by deploying an additional cache- list of ngrams extracted from the local context, to capture the local regularities. [2] Z. Tu, Z. Su, and P. Devanbu. On the localness of software. In SIGSOFT FSE, pages 269–280, 2014.
  • 11. Phase-1 (during active development) • They chose to analyze each project for the period of one-year which contained the most bug fixes in that project’s history. • Then, extract snapshots at 1-month intervals.
  • 14. Entropy Measurement • $gram • The line and file entropies are computed by averaging over all the tokens belong to a line and all lines corresponding to a file respectively.
  • 15. Entropy Measurement • Package, class and method declarations • previously unseen identifiers – higher entropy scores • For-loop statements and catch clauses • being often repetitive – lower entropy scores Abstract-syntax-based line-types and computing a syntax-sensitive entropy score
  • 16. Syntax-sensitive Entropy Score • Matching between line and AST node. • Then, compute how much a line’s entropy deviated from the mean entropy of its line-type. • => $gram+type
  • 19. RQ1: Are buggy lines less “natural" than non-buggy lines?
  • 20. Are buggy lines less “natural" than non-buggy lines?
  • 22. Bug Duration Bugs that stay longer in a repository tend to have lower entropy than the short-lived bugs
  • 23. RQ2: Are buggy lines less “natural" than bug-fix lines?
  • 24. Are buggy lines less “natural" than bug-fix lines?
  • 29. RQ3: Is “naturalness" a good way to direct inspection effort?
  • 30. DP: Defect Prediction • Two classifier • Logistic Regression(LR) • Random Forest(RF) • Process metrics • # of developers • # of file-commit • Code churn • Previous bug history
  • 31. SBF: Static Bug Finder • SBF uses syntactic and semantic properties of source code. • For this study, PMD and FindBugs are used. • NBF: Naturalness Bug Finder • AUCEC: Area Under the Cost-Effectiveness Curve
  • 34. Result • Buggy lines, on average, have higher entropies, i.e. are “less natural”, than non-buggy lines. • Entropy of the buggy lines drops after bug-fixes with statistical significance. • Entropy can be used to guide bug-finding efforts at both the file- level and the line-level.