Background: The International Software Benchmarking Standards Group (ISBSG) dataset makes it possible to estimate a project’s size, effort, duration, and cost.
Aim: The aim was to analyze the ISBSG variables that have been used by researchers for software effort estimation from 2000, when the first papers were published, until the end of 2013.
Method: A systematic mapping review was applied to the 167 papers obtained after the filtering process. Of these, 133 papers produce effort estimates and only 107 list the independent variables used in the effort estimation models.
Results: Seventy-one out of 118 ISBSG variables have been used at least once. There is a group of 20 variables that appear in more than 50% of the papers and include Functional Size (62%), Development Type (58%), Language Type (53%), and Development Platform (52%) following ISBSG recommendations. Sizing and Size attributes together represent the most relevant group, along with Project attributes, which include 24 technical features of the project and the development platform. All in all, variables that have more missing values are used less frequently.
Conclusions: This work presents a snapshot of the existing usage of ISBSG variables in software development estimation. Moreover, some insights are provided to guide future studies.
166 - ISBSG variables most frequently used for software effort estimation: A mapping review
1. ISBSG variables most frequently used for software effort estimation: A mapping review
Fernando González-Ladrón-de-Guevara
Marta Fernández-Diego
2. Introduction
•The International Software Benchmarking Standards Group (ISBSG) is a non-profit organization that designed and maintains an international public repository.
•This dataset makes it possible to estimate a project’s size, effort, duration, and cost.
•It is important that ISBSG users have a sound knowledge of the ISBSG data prior to analyzing or using it.
3. Aim
•Analyze the ISBSG variables that have been used by researchers for software effort estimation
–from 2000, when the first papers were published
–until the end of 2013
4. Research questions
•RQ1: What are the most used variables to generate effort estimation models?
•RQ2: What are the most relevant features of these variables?
7. Data collection
•A systematic mapping study was applied to the 167 papers obtained after the filtering process.
•133 out of 167 papers produce effort estimates.
•Only 107 list the independent variables used in the model.
8. RQ1
•RQ1: What are the most used variables to generate effort estimation models?
–ISBSG includes 118 variables but only 71 (60.2%) of them have been used in the set of papers analyzed.
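A minimal sketch of how such usage figures could be tallied, assuming each paper is recorded as the set of ISBSG variables it lists as model inputs. The `papers` dictionary and the helper names `usage_share` and `coverage` are hypothetical and only illustrate the counting; the toy entries are not taken from the reviewed papers.

```python
from collections import Counter

# Hypothetical toy data: paper id -> ISBSG variables listed as model inputs.
# These entries are illustrative only, not taken from the reviewed papers.
papers = {
    "P1": {"Functional Size", "Development Type", "Language Type"},
    "P2": {"Functional Size", "Development Platform"},
    "P3": {"Functional Size", "Development Type"},
    "P4": {"Language Type"},
}


def usage_share(papers):
    """Return {variable: fraction of papers that use it}, most used first."""
    counts = Counter(var for used in papers.values() for var in used)
    total = len(papers)
    # Sort by descending count, then alphabetically to keep ties stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {var: n / total for var, n in ranked}


def coverage(used_variables, all_variables):
    """Share of the available variables that appear at least once."""
    return used_variables / all_variables


shares = usage_share(papers)
for var, share in shares.items():
    print(f"{var}: {share:.0%}")

# Figures reported on the slide: 71 of 118 variables used.
print(f"{coverage(71, 118):.1%}")
```

The last line reproduces the 60.2% quoted above from the reported counts of 71 and 118.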
11. RQ1
•RQ1: What are the most used variables to generate effort estimation models?
–Each ISBSG variable belongs to a group of related data fields, as defined by the ISBSG criteria.
13. RQ2
•RQ2: What are the most relevant features of these variables?
–This work has also synthesized and described the most used ISBSG variables, the concept they represent and their relationships.
–This paper also discusses the nature of the variables, with particular emphasis on their properties, especially the issue of missing data.
15. RQ2 (nominal variables)
•RQ2: What are the most relevant features of these variables?
–Within each group, the more missing values a variable has, the less often it is used compared with the other variables in that group.
–LT is usually used more than PPL, except where more detail about the specific programming language is required.
–The variables OT, AT, and BAT present many different discrete values.
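One way to check the relationship between missing values and usage within a group is a rank correlation. The sketch below is an assumption about how this could be done, not the method used in the paper: it implements Spearman's coefficient with the standard library, and the `missing_fraction` and `papers_using` lists are made-up values for a hypothetical group of four variables.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical group of four variables (values invented for illustration).
missing_fraction = [0.10, 0.35, 0.60, 0.88]
papers_using = [40, 25, 12, 3]

rho = spearman(missing_fraction, papers_using)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient close to -1 would be consistent with the statement that variables with more missing values are used less within their group; a value near 0 would not support it.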
17. RQ2 (continuous variables)
•RQ2: What are the most relevant features of these variables?
–FS is preferred to AFP.
–The group Size attributes includes five fields that break down the FS into inputs, outputs, enquiries, files, and interfaces. The percentage of missing values is around 65% in all cases.
–ATS may be used less because it has a high percentage of missing values (88%).
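The missing-value percentages quoted above are simple ratios of empty entries to total projects. As an illustrative sketch, assuming each project is a dictionary in which an absent field is stored as `None`, the figure for one field could be computed as follows. The `records` list and the function name `missing_percentage` are hypothetical, and the toy data does not reproduce the ISBSG figures.

```python
def missing_percentage(records, field):
    """Percentage of records in which `field` is absent or None."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for record in records if record.get(field) is None)
    return 100.0 * missing / len(records)


# Hypothetical toy projects; only one of eight reports a team size.
records = [
    {"Functional Size": 120, "Average Team Size": 5},
    {"Functional Size": 340, "Average Team Size": None},
    {"Functional Size": 75, "Average Team Size": None},
    {"Functional Size": None, "Average Team Size": None},
    {"Functional Size": 510, "Average Team Size": None},
    {"Functional Size": 260, "Average Team Size": None},
    {"Functional Size": 95, "Average Team Size": None},
    {"Functional Size": 180},  # field absent altogether
]

print(missing_percentage(records, "Average Team Size"))
print(missing_percentage(records, "Functional Size"))
```

Treating an absent key and an explicit `None` the same way is a design choice for this sketch; a real extract may encode missing values differently (for example as empty strings), which would need its own check.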
18. Results
•A collection of 107 selected references
•A matrix that maps the 20 top-ranked most used variables to the estimation models of these references
•The 71 ISBSG variables that have been used to construct effort estimation models
•A description of the 20 most used variables, arranged by group, along with their relationships and some underlying dependencies
19. Conclusions
•This work presents the results of a systematic mapping study about the usage of ISBSG variables until 2013.
•The analysis is a first approximation to how and to what extent ISBSG variables and groups of variables have been used in software engineering to build effort estimation models.
•New RQ: How is the level of usage of the most frequent variables influenced by the type of estimation methods used in the papers?
–The paper under review even suggests a prospective guide for selecting the variables to be used in effort estimation models.