SlideShare a Scribd company logo
A Large Scale Study of
Multiple Programming Languages
and Code Quality
Pavneet S. Kochhar, Dinusha Wijedasa, and David Lo
Singapore Management University
{kochharps.2012,dwijedasa,davidlo}@smu.edu.sg
International Conference on Software Analysis, Evolution and
Reengineering (SANER 2016)
Motivation
• Over 700 languages1 available
• Developers often use multiple languages
– To improve performance, productivity and agility
– Linux OS kernel mainly written in C
Utilities & app. written in C++, Perl, Python
• Concerns
– Increase in complexity
– Proper interfaces between languages
1. https://en.wikipedia.org/wiki/List_of_programming_languages
2
Study Goal
3
What is the impact of the usage of
multiple languages
on bug proneness?
Outline
• Study Goal
• Empirical Study Methodology
– Data Collection and Basic Statistics
– Categorizing Bugs
– Statistical Methods
• Research Questions
• Findings
• Conclusion and Future Work
4
Data Collection
• We collect projects hosted on
• Most popular projects in 17 languages
– C, C++, C#, Objective-C, Go, Java, JavaScript,
CoffeeScript, TypeScript, Ruby, Php, Python, Perl,
Clojure, Erlang, Haskell, Scala
• In total, we have 628 popular projects
– Linux, Redis, Php-src, Elasticsearch
– 85M SLOC, 134K developers, 3.17M commits
5
Basic Statistics
6
Distribution of GitHub Projects w.r.t Number of
Language Usage
Language Usage Number of
Projects
% of Projects
Written in a single
language
297 47.29%
Written in multiple
languages
331 52.71%
Total – 628 projects
Basic Statistics
7
Project Details
Primary
Language
Number
of
Projects
Number of
Associated
Languages
Number of
Developers
SLOC
(KLOC)
Period
C 104 9 25,203 17,350 4/1999 - 10/2015
C++ 94 12 6,855 14,179 7/2000 - 10/2015
C# 46 8 4,818 8,848 6/2001 - 10/2015
Objective-C 38 4 2,319 378 7/2008 - 10/2015
Go 41 10 6,297 4,013 3/2008 - 10/2015
Basic Statistics
8
Evolution History Details
Primary
Language
Total Commits BugFix Commits
Number of
Commits
Code Churn
(KLOC)
Bug Fixes Code Churn
(KLOC)
C 780,153 229,411 258,822 42,736
C++ 284,331 250,238 112,591 63,881
C# 235,048 138,719 69,974 15,786
Objective-C 30,048 16,617 7,561 1,031
Go 117,282 33,134 32,371 4,486
Categorizing Bugs
• Categorize each bug fix commit – cause
and impact
• Keyword matching
– 10% of bug fix commits (926,789)
– Semi-automated ways (Keywords & manual)
• Supervised classification
– To classify remaining 90% of the commits
– Randomly selected 270 commits & manually
classified to check precision & recall.
9
Statistical Method
• Negative Binomial Regression (NBR)
• To handle multi-collinearity, we compute
variance inflation factors (VIFs)
• Dependent variable - Number of bug fix
commits
• Control variables - age, size (LOC), number
of developers and total number of commits.
10
Outline
• Study Goal
• Empirical Study Methodology
–Data Collection and Basic Statistics
–Categorizing Bugs
–Statistical Methods
• Research Questions
• Findings
• Conclusion and Future Work
11
Research Questions
12
RQ1: Does the use of more languages correlate with
higher bug proneness?
RQ2: Are some languages more bug prone when
they are used with other languages?
RQ3: What are the relationships between multi-
language usage and bugs of various
categories?
Findings
13
RQ1:
Does the use of more languages correlate
with higher bug proneness?
RQ1: More Languages More Bugs
14
Coeff. Std. Error z-val Pr(> |z|)
(Intercept) 4.718 0.089 53.19 0.000 ***
age 2.939e-04 3.794e-05 7.75 0.000 ***
size 3.390e-07 1.310e-07 2.59 0.010 **
developers 9.807e-04 1.107e-04 8.86 0.000 ***
commits 6.048e-05 5.177e-06 11.68 0.000 ***
num-langs 0.273 0.029 9.50 0.000 ***
***p < 0.001, **p < 0.01,*p < 0.05
RQ1: More Languages More Bugs
15
Number of languages has a
significant impact on increasing
bug proneness.
Findings
16
RQ2:
Are some languages more bug prone when
they are used with other languages?
17
Coeff. Std. Error z-val Pr(> |z|)
(Intercept) 5.208 0.074 70.864 < 2e-16 ***
age 3.091e-04 3.828e-05 8.075 6.77e-16 ***
size 4.143e-07 1.462e-07 2.833 0.005 **
developers 7.672e-04 1.068e-04 7.181 6.92e-13 ***
commits 6:077e-05 5.286e-06 11.498 < 2e-16 ***
C++S -0.550 0.275 -2.005 0.045 *
C++M 0.609 0.138 4.403 0.000 ***
JavaS -0.568 0.164 -3.473 0.001 ***
JavaM 0.530 0.192 2.767 0.006 **
TypeScriptS -3.664 0.915 -4.005 0.000 ***
TypeScriptM 0.542 0.131 4.139 0.000 ***
***p < 0.001, **p < 0.01, *p < 0.05; S=Single-language, M=Multi-language
RQ2: Multi-Language Setting
18
Six languages C++, Objective-C,
Java, TypeScript, Clojure, and Scala
are more defect prone when they are
used with other languages.
RQ2: Multi-Language Setting
Findings
19
RQ3:
Does the correlation between number of
languages and bug proneness exist for
various bug categories?
RQ3: Bug Categories
20
Memory Bugs Concurrency Bugs
Coeff. Std. Error Sig. Coeff. Std. Error Sig.
(Intercept) 0.297 0.148 * 0.265 0.160
age 4.078e-04 6.230e-05 *** 6.296e-05 6.750e-05
size 8.519e-07 2.107e-07 *** 1.978e-07 2.283e-07
developers 4.875e-04 1.782e-04 ** 4.272e-04 1.899e-04 *
commits 4.017e-05 8.335e-06 *** 5.308e-05 8.942e-06 ***
num-langs 0.421 0.047 *** 0.396 0.050 ***
***p < 0.001, **p < 0.01,*p < 0.05
21
Number of languages has a
positive and significant impact on
the number of bug fixing commits
for all categories of bugs – highest
for memory, concurrency and
algorithm bugs.
RQ3: Bug Categories
Threats to Validity
22
• Construct Validity
• Bug fix commits as a proxy of bug proneness
instead of issue reports.
• Semi-automated way to classify bug into its
cause & impact categories.
• Internal Validity
• Use of CLOC tool to compute lines of code.
• Don’t consider complexity of applications.
• External Validity
• Only projects on GitHub.
Conclusion
23
• Increasing the number of languages to implement a
project, significantly increases bug proneness.
• Six languages C++, Objective-C, Java,
TypeScript, Clojure, and Scala are significantly
defect prone when used with other languages.
• The impact of the number of languages to cause
higher bug proneness is significantly observed in all
bug categories.
• Higher impact in memory, concurrency and
algorithm categories.
Future Work
• Identify language pairs that are less
compatible and more bug prone.
• Identify the types of bugs that affect multi-
language programs.
• Expand the study beyond 17 programming
languages.
24
Thank you!
Questions? Comments? Advice?
{kochharps.2012,dwijedasa,davidlo}
@smu.edu.sg
Appendix (Basic Statistics)
26
Other Languages Used with each Primary
Language
Primary
Language
Languages Used with the Primary Language
C Java, C++, Ruby, PHP, Python, Javascript, Perl, Objective-C, TypeScript
C++ Javascript, C, Java, Objective-C, Python, Ruby, CoffeeScript, PHP, Perl, Haskell,
C#, TypeScript
C# C++, C, Javascript, Ruby, Perl, Python, Objective-C, Java
Objective-C Python, Javascript, C, Ruby
Go C++, TypeScript, Javascript, Ruby, C, Python, Perl, Java, PHP, Objective-C
Appendix
27
Project Details
Primary
Language
#Projects #Associated
Languages
#Developers SLOC
(KLOC)
Period
C 104 9 25,203 17,350 4/1999 to 10/2015
C++ 94 12 6,855 14,179 7/2000 to 10/2015
C# 46 8 4,818 8,848 6/2001 to 10/2015
Objective-C 38 4 2,319 378 7/2008 to 10/2015
Go 41 10 6,297 4,013 3/2008 to 10/2015
Java 84 10 5,456 5,004 3/2006 to 10/2015
CoffeeScript 44 3 3,930 564 12/2009 to 10/2015
JavaScript 245 13 10,536 3,596 3/2006 to 10/2015
TypeScript 42 8 4,941 16,792 5/2002 to 10/2015
Ruby 87 8 24,184 2,991 1/1998 to 10/2015
Php 59 4 12,145 2,959 4/2003 to 10/2015
Python 152 5 11,397 1,864 7/2005 to 10/2015
Perl 38 5 1,482 297 1/2004 to 10/2015
Clojure 46 5 2,171 432 4/2008 to 10/2015
Erlang 36 7 2,429 2,670 5/2001 to 10/2015
Haskell 37 5 4,158 1,053 4/2004 to 10/2015
Scala 42 8 5,698 2,319 2/2003 to 10/2015
Summary 628 134,019 85,309 1/1998 to 10/2015
Appendix
28
Evolution History Details
Primary
Language
Total Commits BugFix Commits
#Commits Code Churn
(KLOC)
Bug Fixes Code Churn
(KLOC)
C 780,153 229,411 258,822 42,736
C++ 284,331 250,238 112,591 63,881
C# 235,048 138,719 69,974 15,786
Objective-C 30,048 16,617 7,561 1,031
Go 117,282 33,134 32,371 4,486
Java 156,153 105,063 58,922 21,412
CoffeeScript 80,613 25,016 14,267 2,630
JavaScript 123,500 45,256 33,498 6,254
TypeScript 188,698 167,050 51,970 18,531
Ruby 337,457 46,014 67,619 6,577
Php 218,751 77,268 73,700 10,221
Python 199,974 37,691 59,670 6,332
Perl 38,231 7,944 7,649 418
Clojure 41,588 5,650 6,994 511
Erlang 77,411 24,425 15,579 1,977
Haskell 128,149 21,834 24,558 4,612
Scala 133,754 70,807 31,044 24,287
Summary 3,171,141 1,302,137 926,789 231,682
Appendix
29
Other Languages Used with each Primary
Language
Primary Language Languages Used with the Primary Language
C Java, C++, Ruby, PHP, Python, Javascript, Perl, Objective-C, TypeScript
C++ Javascript, C, Java, Objective-C, Python, Ruby, CoffeeScript, PHP, Perl, Haskell, C#,
TypeScript
C# C++, C, Javascript, Ruby, Perl, Python, Objective-C, Java
Objective-C Python, Javascript, C, Ruby
Go C++, TypeScript, Javascript, Ruby, C, Python, Perl, Java, PHP, Objective-C
Java C++, C, Python, Javascript, Clojure, Perl, Ruby, Objective-C, C#, Scala
CoffeeScript Javascript, Ruby, Python
JavaScript C#, PHP, C++, Clojure, CoffeeScript, Ruby, Erlang, Python, Haskell, Java, Go, TypeScript,
Scala
TypeScript C++, Java, Python, Javascript, CoffeeScript, C, Perl, PHP
Ruby CoffeeScript, Javascript, Python, C++, C, Java, Perl, PHP
Php Javascript, Python, Perl, C
Python Javascript, C, C++, CoffeeScript, PHP
Perl Javascript, Java, Python, C++, C
Clojure Javascript, Java, C, Objective-C, Ruby
Erlang C, Python, Javascript, PHP, C++, Java, Clojure
Haskell C, JavaScript, Python, Perl, Java
Scala Java, Ruby, Javascript, Perl, C#, Python, CoffeeScript, Clojure

More Related Content

Similar to A Large Scale Study of Multiple Programming Languages and Code Quality

Sahil_vig_Developer
Sahil_vig_DeveloperSahil_vig_Developer
Sahil_vig_Developer
Sahil Vig
 
CV_NgoQuocVuong
CV_NgoQuocVuongCV_NgoQuocVuong
CV_NgoQuocVuong
Vuong Ngo
 
gngillis_std_20160818
gngillis_std_20160818gngillis_std_20160818
gngillis_std_20160818
Greg Gillis
 
gngillis_std_20160818
gngillis_std_20160818gngillis_std_20160818
gngillis_std_20160818
Greg Gillis
 
CV_FranciscoJonathanG_20160126
CV_FranciscoJonathanG_20160126CV_FranciscoJonathanG_20160126
CV_FranciscoJonathanG_20160126
Jonathan Francisco
 

Similar to A Large Scale Study of Multiple Programming Languages and Code Quality (20)

week1.ppt
week1.pptweek1.ppt
week1.ppt
 
Bikram kishor rout
Bikram kishor routBikram kishor rout
Bikram kishor rout
 
Bikram kishor rout
Bikram kishor routBikram kishor rout
Bikram kishor rout
 
Neha_Maggu
Neha_MagguNeha_Maggu
Neha_Maggu
 
What are your Programming Language's Energy-Delay Implications?
What are your Programming Language's Energy-Delay Implications?What are your Programming Language's Energy-Delay Implications?
What are your Programming Language's Energy-Delay Implications?
 
An overview of automated test suites and defect density in Android
An overview of automated test suites and defect density in AndroidAn overview of automated test suites and defect density in Android
An overview of automated test suites and defect density in Android
 
Sundeep 2 years
Sundeep 2 yearsSundeep 2 years
Sundeep 2 years
 
Sahil_vig_Developer
Sahil_vig_DeveloperSahil_vig_Developer
Sahil_vig_Developer
 
CV_NgoQuocVuong
CV_NgoQuocVuongCV_NgoQuocVuong
CV_NgoQuocVuong
 
Chandra_CV 3 8Yr Exp
Chandra_CV 3 8Yr Exp Chandra_CV 3 8Yr Exp
Chandra_CV 3 8Yr Exp
 
gngillis_std_20160818
gngillis_std_20160818gngillis_std_20160818
gngillis_std_20160818
 
gngillis_std_20160818
gngillis_std_20160818gngillis_std_20160818
gngillis_std_20160818
 
Rashi jain resume test engineer
Rashi jain resume test engineerRashi jain resume test engineer
Rashi jain resume test engineer
 
Hello World - Introduction to coding.pptx
Hello World - Introduction to coding.pptxHello World - Introduction to coding.pptx
Hello World - Introduction to coding.pptx
 
Morphis Technologies Overview
Morphis Technologies OverviewMorphis Technologies Overview
Morphis Technologies Overview
 
CV_FranciscoJonathanG_20160126
CV_FranciscoJonathanG_20160126CV_FranciscoJonathanG_20160126
CV_FranciscoJonathanG_20160126
 
Enhancing Developer Productivity with Code Forensics
Enhancing Developer Productivity with Code ForensicsEnhancing Developer Productivity with Code Forensics
Enhancing Developer Productivity with Code Forensics
 
Improving software economics
Improving software economicsImproving software economics
Improving software economics
 
Mkkailashbio
MkkailashbioMkkailashbio
Mkkailashbio
 
why to do BCA course?
why to do BCA course?why to do BCA course?
why to do BCA course?
 

More from Pavneet Singh Kochhar

Revisiting Assert Use in GitHub Projects
Revisiting Assert Use in GitHub ProjectsRevisiting Assert Use in GitHub Projects
Revisiting Assert Use in GitHub Projects
Pavneet Singh Kochhar
 
Understanding the Test Automation Culture of App Developers
Understanding the Test Automation Culture of App DevelopersUnderstanding the Test Automation Culture of App Developers
Understanding the Test Automation Culture of App Developers
Pavneet Singh Kochhar
 

More from Pavneet Singh Kochhar (12)

Mining Testing Questions on Stack Overflow
Mining Testing Questions on Stack OverflowMining Testing Questions on Stack Overflow
Mining Testing Questions on Stack Overflow
 
Cataloging GitHub Repositories
Cataloging GitHub RepositoriesCataloging GitHub Repositories
Cataloging GitHub Repositories
 
An Exploratory Study of Functionality and Learning Resources of WebAPIs on Pr...
An Exploratory Study of Functionality and Learning Resources of WebAPIs on Pr...An Exploratory Study of Functionality and Learning Resources of WebAPIs on Pr...
An Exploratory Study of Functionality and Learning Resources of WebAPIs on Pr...
 
Revisiting Assert Use in GitHub Projects
Revisiting Assert Use in GitHub ProjectsRevisiting Assert Use in GitHub Projects
Revisiting Assert Use in GitHub Projects
 
Practitioners’ Expectations on Automated Fault Localization
Practitioners’ Expectations on Automated Fault LocalizationPractitioners’ Expectations on Automated Fault Localization
Practitioners’ Expectations on Automated Fault Localization
 
Understanding the Test Automation Culture of App Developers
Understanding the Test Automation Culture of App DevelopersUnderstanding the Test Automation Culture of App Developers
Understanding the Test Automation Culture of App Developers
 
Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in...
Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in...Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in...
Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in...
 
An Empirical Study on the Adequacy of Testing in Open Source Projects
An Empirical Study on the Adequacy of Testing in Open Source ProjectsAn Empirical Study on the Adequacy of Testing in Open Source Projects
An Empirical Study on the Adequacy of Testing in Open Source Projects
 
Potential Biases in Bug Localization: Do They Matter?
Potential Biases in Bug Localization: Do They Matter?Potential Biases in Bug Localization: Do They Matter?
Potential Biases in Bug Localization: Do They Matter?
 
It’s Not a Bug, It’s a Feature: Does Misclassification Affect Bug Localization?
It’s Not a Bug, It’s a Feature:Does Misclassification Affect Bug Localization?It’s Not a Bug, It’s a Feature:Does Misclassification Affect Bug Localization?
It’s Not a Bug, It’s a Feature: Does Misclassification Affect Bug Localization?
 
Automatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report ReclassificationAutomatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report Reclassification
 
An Empirical Study of Adoption of Software Testing in Open Source Projects
An Empirical Study of Adoption of Software Testing in Open Source ProjectsAn Empirical Study of Adoption of Software Testing in Open Source Projects
An Empirical Study of Adoption of Software Testing in Open Source Projects
 

Recently uploaded

How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 

Recently uploaded (20)

Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 

A Large Scale Study of Multiple Programming Languages and Code Quality

  • 1. A Large Scale Study of Multiple Programming Languages and Code Quality Pavneet S. Kochhar, Dinusha Wijedasa, and David Lo Singapore Management University {kochharps.2012,dwijedasa,davidlo}@smu.edu.sg International Conference on Software Analysis, Evolution and Reengineering (SANER 2016)
  • 2. Motivation • Over 700 languages1 available • Developers often use multiple languages – To improve performance, productivity and agility – Linux OS kernel mainly written in C Utilities & app. written in C++, Perl, Python • Concerns – Increase in complexity – Proper interfaces between languages 1. https://en.wikipedia.org/wiki/List_of_programming_languages 2
  • 3. Study Goal 3 What is the impact of the usage of multiple languages on bug proneness?
  • 4. Outline • Study Goal • Empirical Study Methodology – Data Collection and Basic Statistics – Categorizing Bugs – Statistical Methods • Research Questions • Findings • Conclusion and Future Work 4
  • 5. Data Collection • We collect projects hosted on • Most popular projects in 17 languages – C, C++, C#, Objective-C, Go, Java, JavaScript, CoffeeScript, TypeScript, Ruby, Php, Python, Perl, Clojure, Erlang, Haskell, Scala • In total, we have 628 popular projects – Linux, Redis, Php-src, Elasticsearch – 85M SLOC, 134K developers, 3.17M commits 5
  • 6. Basic Statistics 6 Distribution of GitHub Projects w.r.t Number of Language Usage Language Usage Number of Projects % of Projects Written in a single language 297 47.29% Written in multiple languages 331 52.71% Total – 628 projects
  • 7. Basic Statistics 7 Project Details Primary Language Number of Projects Number of Associated Languages Number of Developers SLOC (KLOC) Period C 104 9 25,203 17,350 4/1999 - 10/2015 C++ 94 12 6,855 14,179 7/2000 - 10/2015 C# 46 8 4,818 8,848 6/2001 - 10/2015 Objective-C 38 4 2,319 378 7/2008 - 10/2015 Go 41 10 6,297 4,013 3/2008 - 10/2015
  • 8. Basic Statistics 8 Evolution History Details Primary Language Total Commits BugFix Commits Number of Commits Code Churn (KLOC) Bug Fixes Code Churn (KLOC) C 780,153 229,411 258,822 42,736 C++ 284,331 250,238 112,591 63,881 C# 235,048 138,719 69,974 15,786 Objective-C 30,048 16,617 7,561 1,031 Go 117,282 33,134 32,371 4,486
  • 9. Categorizing Bugs • Categorize each bug fix commit – cause and impact • Keyword matching – 10% of bug fix commits (926,789) – Semi-automated ways (Keywords & manual) • Supervised classification – To classify remaining 90% of the commits – Randomly selected 270 commits & manually classified to check precision & recall. 9
  • 10. Statistical Method • Negative Binomial Regression (NBR) • To handle multi-collinearity, we compute variance inflation factors (VIFs) • Dependent variable - Number of bug fix commits • Control variables - age, size (LOC), number of developers and total number of commits. 10
  • 11. Outline • Study Goal • Empirical Study Methodology –Data Collection and Basic Statistics –Categorizing Bugs –Statistical Methods • Research Questions • Findings • Conclusion and Future Work 11
  • 12. Research Questions 12 RQ1: Does the use of more languages correlate with higher bug proneness? RQ2: Are some languages more bug prone when they are used with other languages? RQ3: What are the relationships between multi- language usage and bugs of various categories?
  • 13. Findings 13 RQ1: Does the use of more languages correlate with higher bug proneness?
  • 14. RQ1: More Languages More Bugs 14 Coeff. Std. Error z-val Pr(> |z|) (Intercept) 4.718 0.089 53.19 0.000 *** age 2.939e-04 3.794e-05 7.75 0.000 *** size 3.390e-07 1.310e-07 2.59 0.010 ** developers 9.807e-04 1.107e-04 8.86 0.000 *** commits 6.048e-05 5.177e-06 11.68 0.000 *** num-langs 0.273 0.029 9.50 0.000 *** ***p < 0.001, **p < 0.01,*p < 0.05
  • 15. RQ1: More Languages More Bugs 15 Number of languages has a significant impact on increasing bug proneness.
  • 16. Findings 16 RQ2: Are some languages more bug prone when they are used with other languages?
  • 17. 17 Coeff. Std. Error z-val Pr(> |z|) (Intercept) 5.208 0.074 70.864 < 2e-16 *** age 3.091e-04 3.828e-05 8.075 6.77e-16 *** size 4.143e-07 1.462e-07 2.833 0.005 ** developers 7.672e-04 1.068e-04 7.181 6.92e-13 *** commits 6:077e-05 5.286e-06 11.498 < 2e-16 *** C++S -0.550 0.275 -2.005 0.045 * C++M 0.609 0.138 4.403 0.000 *** JavaS -0.568 0.164 -3.473 0.001 *** JavaM 0.530 0.192 2.767 0.006 ** TypeScriptS -3.664 0.915 -4.005 0.000 *** TypeScriptM 0.542 0.131 4.139 0.000 *** ***p < 0.001, **p < 0.01, *p < 0.05; S=Single-language, M=Multi-language RQ2: Multi-Language Setting
  • 18. 18 Six languages C++, Objective-C, Java, TypeScript, Clojure, and Scala are more defect prone when they are used with other languages. RQ2: Multi-Language Setting
  • 19. Findings 19 RQ3: Does the correlation between number of languages and bug proneness exist for various bug categories?
  • 20. RQ3: Bug Categories 20 Memory Bugs Concurrency Bugs Coeff. Std. Error Sig. Coeff. Std. Error Sig. (Intercept) 0.297 0.148 * 0.265 0.160 age 4.078e-04 6.230e-05 *** 6.296e-05 6.750e-05 size 8.519e-07 2.107e-07 *** 1.978e-07 2.283e-07 developers 4.875e-04 1.782e-04 ** 4.272e-04 1.899e-04 * commits 4.017e-05 8.335e-06 *** 5.308e-05 8.942e-06 *** num-langs 0.421 0.047 *** 0.396 0.050 *** ***p < 0.001, **p < 0.01,*p < 0.05
  • 21. 21 Number of languages has a positive and significant impact on the number of bug fixing commits for all categories of bugs – highest for memory, concurrency and algorithm bugs. RQ3: Bug Categories
  • 22. Threats to Validity 22 • Construct Validity • Bug fix commits as a proxy of bug proneness instead of issue reports. • Semi-automated way to classify bug into its cause & impact categories. • Internal Validity • Use of CLOC tool to compute lines of code. • Don’t consider complexity of applications. • External Validity • Only projects on GitHub.
  • 23. Conclusion 23 • Increasing the number of languages to implement a project, significantly increases bug proneness. • Six languages C++, Objective-C, Java, TypeScript, Clojure, and Scala are significantly defect prone when used with other languages. • The impact of the number of languages to cause higher bug proneness is significantly observed in all bug categories. • Higher impact in memory, concurrency and algorithm categories.
  • 24. Future Work • Identify language pairs that are less compatible and more bug prone. • Identify the types of bugs that affect multi- language programs. • Expand the study beyond 17 programming languages. 24
  • 25. Thank you! Questions? Comments? Advice? {kochharps.2012,dwijedasa,davidlo} @smu.edu.sg
  • 26. Appendix (Basic Statistics) 26 Other Languages Used with each Primary Language Primary Language Languages Used with the Primary Language C Java, C++, Ruby, PHP, Python, Javascript, Perl, Objective-C, TypeScript C++ Javascript, C, Java, Objective-C, Python, Ruby, CoffeeScript, PHP, Perl, Haskell, C#, TypeScript C# C++, C, Javascript, Ruby, Perl, Python, Objective-C, Java Objective-C Python, Javascript, C, Ruby Go C++, TypeScript, Javascript, Ruby, C, Python, Perl, Java, PHP, Objective-C
  • 27. Appendix 27 Project Details Primary Language #Projects #Associated Languages #Developers SLOC (KLOC) Period C 104 9 25,203 17,350 4/1999 to 10/2015 C++ 94 12 6,855 14,179 7/2000 to 10/2015 C# 46 8 4,818 8,848 6/2001 to 10/2015 Objective-C 38 4 2,319 378 7/2008 to 10/2015 Go 41 10 6,297 4,013 3/2008 to 10/2015 Java 84 10 5,456 5,004 3/2006 to 10/2015 CoffeeScript 44 3 3,930 564 12/2009 to 10/2015 JavaScript 245 13 10,536 3,596 3/2006 to 10/2015 TypeScript 42 8 4,941 16,792 5/2002 to 10/2015 Ruby 87 8 24,184 2,991 1/1998 to 10/2015 Php 59 4 12,145 2,959 4/2003 to 10/2015 Python 152 5 11,397 1,864 7/2005 to 10/2015 Perl 38 5 1,482 297 1/2004 to 10/2015 Clojure 46 5 2,171 432 4/2008 to 10/2015 Erlang 36 7 2,429 2,670 5/2001 to 10/2015 Haskell 37 5 4,158 1,053 4/2004 to 10/2015 Scala 42 8 5,698 2,319 2/2003 to 10/2015 Summary 628 134,019 85,309 1/1998 to 10/2015
  • 28. Appendix 28 Evolution History Details Primary Language Total Commits BugFix Commits #Commits Code Churn (KLOC) Bug Fixes Code Churn (KLOC) C 780,153 229,411 258,822 42,736 C++ 284,331 250,238 112,591 63,881 C# 235,048 138,719 69,974 15,786 Objective-C 30,048 16,617 7,561 1,031 Go 117,282 33,134 32,371 4,486 Java 156,153 105,063 58,922 21,412 CoffeeScript 80,613 25,016 14,267 2,630 JavaScript 123,500 45,256 33,498 6,254 TypeScript 188,698 167,050 51,970 18,531 Ruby 337,457 46,014 67,619 6,577 Php 218,751 77,268 73,700 10,221 Python 199,974 37,691 59,670 6,332 Perl 38,231 7,944 7,649 418 Clojure 41,588 5,650 6,994 511 Erlang 77,411 24,425 15,579 1,977 Haskell 128,149 21,834 24,558 4,612 Scala 133,754 70,807 31,044 24,287 Summary 3,171,141 1,302,137 926,789 231,682
  • 29. Appendix 29 Other Languages Used with each Primary Language Primary Language Languages Used with the Primary Language C Java, C++, Ruby, PHP, Python, Javascript, Perl, Objective-C, TypeScript C++ Javascript, C, Java, Objective-C, Python, Ruby, CoffeeScript, PHP, Perl, Haskell, C#, TypeScript C# C++, C, Javascript, Ruby, Perl, Python, Objective-C, Java Objective-C Python, Javascript, C, Ruby Go C++, TypeScript, Javascript, Ruby, C, Python, Perl, Java, PHP, Objective-C Java C++, C, Python, Javascript, Clojure, Perl, Ruby, Objective-C, C#, Scala CoffeeScript Javascript, Ruby, Python JavaScript C#, PHP, C++, Clojure, CoffeeScript, Ruby, Erlang, Python, Haskell, Java, Go, TypeScript, Scala TypeScript C++, Java, Python, Javascript, CoffeeScript, C, Perl, PHP Ruby CoffeeScript, Javascript, Python, C++, C, Java, Perl, PHP Php Javascript, Python, Perl, C Python Javascript, C, C++, CoffeeScript, PHP Perl Javascript, Java, Python, C++, C Clojure Javascript, Java, C, Objective-C, Ruby Erlang C, Python, Javascript, PHP, C++, Java, Clojure Haskell C, JavaScript, Python, Perl, Java Scala Java, Ruby, Javascript, Perl, C#, Python, CoffeeScript, Clojure