SlideShare a Scribd company logo
On the “Naturalness” of Buggy Code
Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane,
Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu.
published in ICSE 2016
Jinhan Kim
2018.2.9
Naturalness
• Real software tends to be natural, like speech or natural
language.
• It tends to be highly repetitive and predictable.
Naturalness of Software1
[1] A. Hindle, E. Barr, M. Gabel, Z. Su, and P. Devanbu. On the naturalness of software. In ICSE, pages 837–847,
2012.
What does it mean when a code is considered
“unnatural”?
Research Questions
• Are buggy lines less “natural” than non-buggy lines?
• Are buggy lines less “natural" than bug-fix lines?
• Is “naturalness" a good way to direct inspection effort?
Background
Language Model
• Language model assign a probability to every sequence of
words.
• Given a code token sequence, 𝑆 = 𝑡1 𝑡2 … 𝑡 𝑁
ngram Language Model
• Using only the preceding n - 1 tokens.
• ℎ = 𝑡1 𝑡2 … 𝑡𝑖−1
$gram Language Model2
• Improving language model by deploying an additional cache-
list of ngrams extracted from the local context, to capture the
local regularities.
[2] Z. Tu, Z. Su, and P. Devanbu. On the localness of software. In SIGSOFT FSE, pages 269–280, 2014.
Study
Study Subject
Phase-1 (during active development)
• They chose to analyze each project for the period of one-year
which contained the most bug fixes in that project’s history.
• Then, extract snapshots at 1-month intervals.
Data Collection
Phase-2 (after release)
Entropy Measurement
• $gram
• The line and file entropies are computed by averaging over all
the tokens belong to a line and all lines corresponding to a file
respectively.
Entropy Measurement
• Package, class and method declarations
• previously unseen identifiers – higher entropy scores
• For-loop statements and catch clauses
• being often repetitive – lower entropy scores
Abstract-syntax-based line-types
and computing a syntax-sensitive
entropy score
Syntax-sensitive Entropy Score
• Matching between line and AST node.
• Then, compute how much a line’s entropy deviated from the
mean entropy of its line-type.
• => $gram+type
Relative bug-proneness
• => $gram+wType
Evaluation
RQ1: Are buggy lines less “natural" than
non-buggy lines?
Are buggy lines less “natural" than non-buggy lines?
Bug Duration
Bug Duration
Bugs that stay longer in a repository tend to have lower entropy than the
short-lived bugs
RQ2: Are buggy lines less “natural" than
bug-fix lines?
Are buggy lines less “natural" than bug-fix lines?
Example 1
Example 2
Example 3
Counterexample
RQ3: Is “naturalness" a good way to direct
inspection effort?
DP: Defect Prediction
• Two classifier
• Logistic Regression(LR)
• Random Forest(RF)
• Process metrics
• # of developers
• # of file-commit
• Code churn
• Previous bug history
SBF: Static Bug Finder
• SBF uses syntactic and semantic properties of source code.
• For this study, PMD and FindBugs are used.
• NBF: Naturalness Bug Finder
• AUCEC: Area Under the Cost-Effectiveness Curve
Detecting Buggy Files
Detecting Buggy Lines
Result
• Buggy lines, on average, have higher entropies, i.e. are “less
natural”, than non-buggy lines.
• Entropy of the buggy lines drops after bug-fixes with statistical
significance.
• Entropy can be used to guide bug-finding efforts at both the file-
level and the line-level.
Appendix

More Related Content

What's hot

[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−
[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−
[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−
Deep Learning JP
 
[DL輪読会]Graph R-CNN for Scene Graph Generation
[DL輪読会]Graph R-CNN for Scene Graph Generation[DL輪読会]Graph R-CNN for Scene Graph Generation
[DL輪読会]Graph R-CNN for Scene Graph Generation
Deep Learning JP
 
[第2版] Python機械学習プログラミング 第4章
[第2版] Python機械学習プログラミング 第4章[第2版] Python機械学習プログラミング 第4章
[第2版] Python機械学習プログラミング 第4章
Haruki Eguchi
 
continual learning survey
continual learning surveycontinual learning survey
continual learning survey
ぱんいち すみもと
 
NIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder NetworksNIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder Networks
Eiichi Matsumoto
 
モンテカルロ法と情報量
モンテカルロ法と情報量モンテカルロ法と情報量
モンテカルロ法と情報量
Shohei Miyashita
 
失敗から学ぶ機械学習応用
失敗から学ぶ機械学習応用失敗から学ぶ機械学習応用
失敗から学ぶ機械学習応用
Hiroyuki Masuda
 
[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
Deep Learning JP
 
特徴選択のためのLasso解列挙
特徴選択のためのLasso解列挙特徴選択のためのLasso解列挙
特徴選択のためのLasso解列挙
Satoshi Hara
 
GUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNNGUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNN
Abhishek Tiwari
 
Contrastive learning 20200607
Contrastive learning 20200607Contrastive learning 20200607
Contrastive learning 20200607
ぱんいち すみもと
 
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for VisionMLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision
harmonylab
 
Grad-CAMの始まりのお話
Grad-CAMの始まりのお話Grad-CAMの始まりのお話
Grad-CAMの始まりのお話
Shintaro Yoshida
 
【論文読み会】Universal Language Model Fine-tuning for Text Classification
【論文読み会】Universal Language Model Fine-tuning for Text Classification【論文読み会】Universal Language Model Fine-tuning for Text Classification
【論文読み会】Universal Language Model Fine-tuning for Text Classification
ARISE analytics
 
Survey of Attention mechanism
Survey of Attention mechanismSurvey of Attention mechanism
Survey of Attention mechanism
SwatiNarkhede1
 
パターン認識と機械学習 §8.3.4 有向グラフとの関係
パターン認識と機械学習 §8.3.4 有向グラフとの関係パターン認識と機械学習 §8.3.4 有向グラフとの関係
パターン認識と機械学習 §8.3.4 有向グラフとの関係
Prunus 1350
 
[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives
[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives
[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives
Deep Learning JP
 
SGD+α: 確率的勾配降下法の現在と未来
SGD+α: 確率的勾配降下法の現在と未来SGD+α: 確率的勾配降下法の現在と未来
SGD+α: 確率的勾配降下法の現在と未来
Hidekazu Oiwa
 
Digit recognition
Digit recognitionDigit recognition
Digit recognition
btandale
 

What's hot (20)

[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−
[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−
[DL輪読会]The Neural Process Family−Neural Processes関連の実装を読んで動かしてみる−
 
[DL輪読会]Graph R-CNN for Scene Graph Generation
[DL輪読会]Graph R-CNN for Scene Graph Generation[DL輪読会]Graph R-CNN for Scene Graph Generation
[DL輪読会]Graph R-CNN for Scene Graph Generation
 
[第2版] Python機械学習プログラミング 第4章
[第2版] Python機械学習プログラミング 第4章[第2版] Python機械学習プログラミング 第4章
[第2版] Python機械学習プログラミング 第4章
 
continual learning survey
continual learning surveycontinual learning survey
continual learning survey
 
NIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder NetworksNIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder Networks
 
モンテカルロ法と情報量
モンテカルロ法と情報量モンテカルロ法と情報量
モンテカルロ法と情報量
 
失敗から学ぶ機械学習応用
失敗から学ぶ機械学習応用失敗から学ぶ機械学習応用
失敗から学ぶ機械学習応用
 
[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
[DL輪読会]Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
 
特徴選択のためのLasso解列挙
特徴選択のためのLasso解列挙特徴選択のためのLasso解列挙
特徴選択のためのLasso解列挙
 
GUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNNGUI based handwritten digit recognition using CNN
GUI based handwritten digit recognition using CNN
 
Jokyokai
JokyokaiJokyokai
Jokyokai
 
Contrastive learning 20200607
Contrastive learning 20200607Contrastive learning 20200607
Contrastive learning 20200607
 
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for VisionMLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision
 
Grad-CAMの始まりのお話
Grad-CAMの始まりのお話Grad-CAMの始まりのお話
Grad-CAMの始まりのお話
 
【論文読み会】Universal Language Model Fine-tuning for Text Classification
【論文読み会】Universal Language Model Fine-tuning for Text Classification【論文読み会】Universal Language Model Fine-tuning for Text Classification
【論文読み会】Universal Language Model Fine-tuning for Text Classification
 
Survey of Attention mechanism
Survey of Attention mechanismSurvey of Attention mechanism
Survey of Attention mechanism
 
パターン認識と機械学習 §8.3.4 有向グラフとの関係
パターン認識と機械学習 §8.3.4 有向グラフとの関係パターン認識と機械学習 §8.3.4 有向グラフとの関係
パターン認識と機械学習 §8.3.4 有向グラフとの関係
 
[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives
[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives
[DL輪読会]Deep Learning based Recommender System: A Survey and New Perspectives
 
SGD+α: 確率的勾配降下法の現在と未来
SGD+α: 確率的勾配降下法の現在と未来SGD+α: 確率的勾配降下法の現在と未来
SGD+α: 確率的勾配降下法の現在と未来
 
Digit recognition
Digit recognitionDigit recognition
Digit recognition
 

Similar to Review: On the Naturalness of Buggy Code

Do characters abuse more than words?
Do characters abuse more than words?Do characters abuse more than words?
Do characters abuse more than words?
Tharushi Ruwandika
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
Ted Xiao
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Varunjeet Singh Rekhi
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging hai
IIIT Hyderabad
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Toine Bogers
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCR
dbpublications
 
Language models
Language modelsLanguage models
Language models
Maryam Khordad
 
PacMin @ AMPLab All-Hands
PacMin @ AMPLab All-HandsPacMin @ AMPLab All-Hands
PacMin @ AMPLab All-Hands
fnothaft
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
shrey bhate
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
DigiGurukul
 
Cross-Language Information Retrieval
Cross-Language Information RetrievalCross-Language Information Retrieval
Cross-Language Information Retrieval
Sumin Byeon
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big Data
Sameer Wadkar
 
Data Mining Email SPam Detection PPT WITH Algorithms
Data Mining Email SPam Detection PPT WITH AlgorithmsData Mining Email SPam Detection PPT WITH Algorithms
Data Mining Email SPam Detection PPT WITH Algorithms
deepika90811
 
Authorship attribution
Authorship attributionAuthorship attribution
Authorship attribution
Reza Ramezani
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
Hemantha Kulathilake
 
Noun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsNoun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of Contexts
Tomoyuki Kajiwara
 
BibleTech2013.pptx
BibleTech2013.pptxBibleTech2013.pptx
BibleTech2013.pptx
Andi Wu
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)
Sumit Raj
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
Basha Chand
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
VenkateshMurugadas
 

Similar to Review: On the Naturalness of Buggy Code (20)

Do characters abuse more than words?
Do characters abuse more than words?Do characters abuse more than words?
Do characters abuse more than words?
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging hai
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCR
 
Language models
Language modelsLanguage models
Language models
 
PacMin @ AMPLab All-Hands
PacMin @ AMPLab All-HandsPacMin @ AMPLab All-Hands
PacMin @ AMPLab All-Hands
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
Cross-Language Information Retrieval
Cross-Language Information RetrievalCross-Language Information Retrieval
Cross-Language Information Retrieval
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big Data
 
Data Mining Email SPam Detection PPT WITH Algorithms
Data Mining Email SPam Detection PPT WITH AlgorithmsData Mining Email SPam Detection PPT WITH Algorithms
Data Mining Email SPam Detection PPT WITH Algorithms
 
Authorship attribution
Authorship attributionAuthorship attribution
Authorship attribution
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
Noun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsNoun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of Contexts
 
BibleTech2013.pptx
BibleTech2013.pptxBibleTech2013.pptx
BibleTech2013.pptx
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 

Recently uploaded

Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
shahdabdulbaset
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
Madan Karki
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
Mahmoud Morsy
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
TaghreedAltamimi
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
LAXMAREDDY22
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
Madan Karki
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 

Recently uploaded (20)

Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
john krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptxjohn krisinger-the science and history of the alcoholic beverage.pptx
john krisinger-the science and history of the alcoholic beverage.pptx
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 

Review: On the Naturalness of Buggy Code

  • 1. On the “Naturalness” of Buggy Code Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu. published in ICSE 2016 Jinhan Kim 2018.2.9
  • 2. Naturalness • Real software tends to be natural, like speech or natural language. • It tends to be highly repetitive and predictable. Naturalness of Software1 [1] A. Hindle, E. Barr, M. Gabel, Z. Su, and P. Devanbu. On the naturalness of software. In ICSE, pages 837–847, 2012.
  • 3. What does it mean when a code is considered “unnatural”?
  • 4. Research Questions • Are buggy lines less “natural” than non-buggy lines? • Are buggy lines less “natural" than bug-fix lines? • Is “naturalness" a good way to direct inspection effort?
  • 6. Language Model • Language model assign a probability to every sequence of words. • Given a code token sequence, 𝑆 = 𝑡1 𝑡2 … 𝑡 𝑁
  • 7. ngram Language Model • Using only the preceding n - 1 tokens. • ℎ = 𝑡1 𝑡2 … 𝑡𝑖−1
  • 8. $gram Language Model2 • Improving language model by deploying an additional cache- list of ngrams extracted from the local context, to capture the local regularities. [2] Z. Tu, Z. Su, and P. Devanbu. On the localness of software. In SIGSOFT FSE, pages 269–280, 2014.
  • 11. Phase-1 (during active development) • They chose to analyze each project for the period of one-year which contained the most bug fixes in that project’s history. • Then, extract snapshots at 1-month intervals.
  • 14. Entropy Measurement • $gram • The line and file entropies are computed by averaging over all the tokens belong to a line and all lines corresponding to a file respectively.
  • 15. Entropy Measurement • Package, class and method declarations • previously unseen identifiers – higher entropy scores • For-loop statements and catch clauses • being often repetitive – lower entropy scores Abstract-syntax-based line-types and computing a syntax-sensitive entropy score
  • 16. Syntax-sensitive Entropy Score • Matching between line and AST node. • Then, compute how much a line’s entropy deviated from the mean entropy of its line-type. • => $gram+type
  • 19. RQ1: Are buggy lines less “natural" than non-buggy lines?
  • 20. Are buggy lines less “natural" than non-buggy lines?
  • 22. Bug Duration Bugs that stay longer in a repository tend to have lower entropy than the short-lived bugs
  • 23. RQ2: Are buggy lines less “natural" than bug-fix lines?
  • 24. Are buggy lines less “natural" than bug-fix lines?
  • 29. RQ3: Is “naturalness" a good way to direct inspection effort?
  • 30. DP: Defect Prediction • Two classifier • Logistic Regression(LR) • Random Forest(RF) • Process metrics • # of developers • # of file-commit • Code churn • Previous bug history
  • 31. SBF: Static Bug Finder • SBF uses syntactic and semantic properties of source code. • For this study, PMD and FindBugs are used. • NBF: Naturalness Bug Finder • AUCEC: Area Under the Cost-Effectiveness Curve
  • 34. Result • Buggy lines, on average, have higher entropies, i.e. are “less natural”, than non-buggy lines. • Entropy of the buggy lines drops after bug-fixes with statistical significance. • Entropy can be used to guide bug-finding efforts at both the file- level and the line-level.