This document analyzes different techniques for aggregating software metrics from low-level artifacts, such as classes and methods, to higher levels of a system. It finds that the econometric inequality indices Gini, Theil, Hoover and Atkinson agree with one another, whereas Kolm does not; Kolm instead correlates strongly with the mean, and the sum does not correlate with any other aggregation technique. The nature of the relation between techniques varies with their sensitivity to high values and with how they treat deviations from uniformity. The paper concludes that different techniques may be better suited to specific cases and research questions.
Metrics - You can't control the unfamiliar
ICSM 2011
Paper: You Can't Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics
Authors: Bogdan Vasilescu, Alexander Serebrenik and Mark van den Brand
Session: Research Track 11 - Metrics
Paper:
Vasilescu B, Serebrenik A and van den Brand MGJ (2011), "You can't control the unfamiliar: A study on the relations between aggregation techniques for software metrics", In Proceedings of the 27th IEEE International Conference on Software Maintenance, pp. 313-322. IEEE.
EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites
Alexander Serebrenik
Software engineers share experiences with modern technologies by means of software information sites, such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged.
To improve the quality of tags in software information sites, we propose EnTagRec, an automatic tag recommender based on historical tag assignments to software objects and we evaluate its performance on four software information sites, StackOverflow, AskUbuntu, AskDifferent and FreeCode.
We observe that EnTagRec achieves Recall@5 scores of 0.805, 0.815, 0.88 and 0.64, and Recall@10 scores of 0.868, 0.876, 0.944 and 0.753, on StackOverflow, AskUbuntu, AskDifferent and FreeCode, respectively. In terms of Recall@5 and Recall@10, averaging across the 4 datasets, EnTagRec improves on TagCombine, the state-of-the-art approach, by 27.3% and 12.9%, respectively.
Software engineering is inherently a collaborative venture, involving many stakeholders who coordinate their efforts to produce large software systems. While the importance of human aspects in software engineering was recognised as early as the 1970s, the emergence of open source software (late 1990s) and of platforms such as Stack Overflow and GitHub (late 2000s) enabled the application of empirical methods to the study of human aspects of software engineering.
In the first part of the talk we present a selection of recent results pertaining to two main questions: who are the software developers, and what kinds of activities do they engage in? The second part of the talk focuses on the tools and techniques that have been used to obtain these results.
Four young people from the Netherlands will spend a week shadowing members of the Westboro Baptist Church in Topeka, Kansas, to make a documentary. The Westboro Baptist Church is known as the most hated family in America. In the documentary we follow the church's spokeswoman, Shirley Phelps, and let her tell her story; we also follow the four young people, who talk about how they feel and what they do.
This presentation was given on 23 January and sets out the plan of approach.
Software evolution research is a thriving area of software engineering research. Recent years have seen a growing interest in variety of evolution topics, as witnessed by the growing number of publications dedicated to the subject. Without attempting to be complete, in this talk we provide an overview of emerging trends in software evolution research, such as extension of the traditional boundaries of software, growing attention for social and socio-technical aspects of software development processes, and interdisciplinary research applying research techniques from other research areas to study software evolution, and software evolution research techniques to other research areas. As a large body of software evolution research is empirical in nature, we are confronted by important challenges pertaining to reproducibility of the research, and its generalizability.
Stop by Riverbend Market tomorrow for the arts and crafts expo. We have hand-knitted hats and gloves by Sue Carlson, children's books by Agi Trifontaine, maple drying hangers by Rachel Steger and many other great gifts for the holidays!
An overview on data mining designed for imbalanced datasets
eSAT Journals
Abstract: In imbalanced datasets, the classification categories are not approximately equally represented. The problem arises in classification when the number of instances of one class is much smaller than the number of instances of the other classes. In recent years, awareness of the problem has grown as machine learning methods are applied to complex real-world tasks, many of which involve imbalanced data. Imbalanced datasets have become a critical problem in machine learning and occur in many applications, such as detection of fraudulent calls, biomedicine, engineering, remote sensing, computer society and manufacturing industries. Several approaches have been proposed to overcome the problem. This paper studies the imbalanced dataset problem and examines the sampling methods used to evaluate such datasets, as well as the interpretation methods that are better suited to mining imbalanced datasets. Keywords: Imbalance Problems, Imbalanced datasets, sampling strategies, Machine Learning.
Presentation given at the Workshop on Recommendation Utility Evaluation: Beyond RMSE, held in conjunction with the ACM Conference on Recommender Systems, on September 9, 2012.
Towards Continuous Performance Assessment of Java Applications With PerfBot
Alexander Serebrenik
Bots for continuous performance assessment are gaining use as a productivity tool. We discuss how and why open source projects use them and present an in-depth case study of the Nanosoldier bot used by the team behind the Julia programming language. Based on analysing the history of bot usage and interviews with developers we identify lack of a shared platform for performance measurement as an obstacle to wider adoption of performance measurement bots. To address this, we propose a prototype implementation of such a platform called PerfBot.
Joint work with Florian Markusse and Philipp Leitner, presented at the 5th International Workshop on Bots in Software Engineering, co-located with ICSE 2023, Melbourne, Australia.
“STILL AROUND”: Experiences and Survival Strategies of Veteran Women Software...
Alexander Serebrenik
The intersection of ageism and sexism can create a hostile environment for veteran software developers belonging to marginalized genders. In this study, we conducted 14 interviews to examine the experiences of people at this intersection, primarily women, in order to discover the strategies they employed to successfully remain in the field. We identified 283 codes, which fell into three main categories: Strategies, Experiences, and Perception. Several strategies we identified, such as (Deliberately) Not Trying to Look Younger, were not previously described in the software engineering literature. We found that, in some companies, older women developers are recognized as having particular value, further strengthening the known benefits of diversity in the workforce. Based on the experiences and strategies, we suggest that organizations employing software developers consider the benefits of hiring veteran women software developers. For example, companies can draw upon the life experiences of older women developers in order to better understand the needs of customers from a similar demographic. While we recognize that many of the strategies employed by our study participants are a response to systemic issues, we still consider that, in the short term, there is benefit in describing these strategies for developers who are experiencing such issues today.
This paper is joint work with Sterre van Breukelen, Ann Barcomb and Sebastian Baltes.
Preprint: https://arxiv.org/abs/2302.03723
A Qualitative Study of Developers’ Discussions of Their Problems and Joys Dur...
Alexander Serebrenik
Many software developers started to work from home on short notice during the early periods of COVID-19. A number of previous papers have studied the wellbeing and productivity of software developers during COVID-19. These studies mainly use surveys based on predefined questionnaires. In this paper, we investigate the problems and joys that software developers experienced during the early months of COVID-19 by analyzing their discussions in the online forum devRant, where discussions can be open and are not bound by predefined survey questionnaires. The devRant platform is designed for developers to share their joys and frustrations of life. We manually analyze 825 devRant posts, created between January and April 12, 2020, in which developers discuss their situation during COVID-19. The WHO declared COVID-19 a pandemic on March 11, 2020. As such, our data offers insights into the early months of COVID-19. We manually label each post along two dimensions: the topics of the discussion and the expressed sentiment polarity (positive, negative, neutral). We observed 19 topics that we group into six categories: Workplace & Professional aspects, Personal & Family well-being, Technical Aspects, Lockdown preparedness, Financial concerns, and Societal and Educational concerns. Around 49% of the discussions are negative and 26% are positive. We find evidence of developers’ struggles with lack of documentation to work remotely and with their loneliness while working from home. We find stories of job loss with little or no savings to fall back on. The analysis of developer discussions in the early months of a pandemic will help various stakeholders (e.g., software companies) make important decisions early to alleviate developer problems if such a pandemic or a similar emergency occurs in the near future. Software engineering research can make further efforts to develop automated tools for remote work (e.g., automated documentation).
Empirical Software Engineering 27(5): 117 (2022), presented at ICSE 2023 as part of the Journal First program.
Software developers are known to experience a wide range of emotions while performing development tasks. Emotions expressed in developer communication might reflect openness of the ecosystem to newcomers, presence of conflicts, problems in the software development process or source code itself. In this talk, based on a recent work with Nicole Novielli, I present an overview of the state-of-the-art research on analysis of emotions in software engineering focusing on the studies of emotion in context of software ecosystems. To encourage further applications of emotion analysis in the industry and research we also discuss currently available emotion analysis tools and datasets as well as outline directions for future research.
This is a keynote talk given at the 11th International Workshop on Software Engineering for Systems-of-Systems and Software Ecosystems (SESoS 2023), collocated with ICSE 2023 in Melbourne, Australia.
Investigating the Resolution of Vulnerable Dependencies with Dependabot Secur...
Alexander Serebrenik
Modern software development practices increasingly rely on third-party libraries due to the inherent benefits of reuse. However, libraries may contain security vulnerabilities that can propagate to the dependent applications. To counter this, maintainers of dependent projects should monitor their dependencies and security reports to ensure that only patched releases of the upstream applications are in use. As manual maintenance of dependencies has been shown to be ineffective, several automated tools (aka bots) have been proposed to assist developers in rapidly identifying and resolving vulnerable dependencies.
In this work, we focus on Dependabot, a popular bot providing security and version updates, and study developers' receptivity to its security updates in engineered and actively maintained JavaScript projects. Moreover, we carry out a fine-grained analysis of the lifecycle of every vulnerability to show how vulnerabilities are dealt with in the presence of Dependabot.
Our findings show that the task of fixing vulnerable dependencies is, to a large extent, delegated to Dependabot and that developers merge the majority of security updates within several days. On the other hand, when developers do not merge a security update, they usually address the identified vulnerability manually. This approach, however, often takes up to several months which in turn could expose the projects to security issues.
This paper won the ACM Distinguished Paper Award at MSR 2023.
An Empirical Assessment on Merging and Repositioning of Static Analysis Alarms
Alexander Serebrenik
Static analysis tools generate a large number of alarms that require manual inspection. In prior work, repositioning of alarms was proposed to (1) merge multiple similar alarms together and replace them with fewer alarms, and (2) report alarms as close as possible to the causes for their generation. The premise is that the proposed merging and repositioning of alarms will reduce the manual inspection effort. To evaluate the premise, this paper presents an empirical study with 249 developers on the proposed merging and repositioning of static alarms. The study is conducted using static analysis alarms generated on C programs, where the alarms are representative of the merging vs. non-merging and repositioning vs. non-repositioning situations in real-life code. Developers were asked to manually inspect and determine whether assertions added corresponding to alarms in C code hold. Additionally, two spatial cognitive tests were administered to determine their relationship to performance. The empirical evaluation results indicate that, in contrast to expectations, there was no evidence that merging and repositioning of alarms reduces manual inspection effort or improves the inspection accuracy (at times a negative impact was found). Results on cognitive abilities correlated with comprehension and alarm inspection accuracy.
Static analysis tools help to detect common programming errors but generate a large number of false positives. Moreover, when applied to evolving software systems, around 95% of alarms generated on a version are repeated, i.e., they have also been generated on the previous version. Version-aware static analysis techniques (VSATs) have been proposed to suppress the repeated alarms that are not impacted by the code changes between the two versions. The alarms reported by VSATs after the suppression, called delta alarms, still constitute 63% of the tool-generated alarms.
We observe that delta alarms can be further postprocessed using their corresponding code changes: the code changes due to which VSATs identify them as delta alarms. However, none of the existing VSATs or alarms postprocessing techniques postprocesses delta alarms using the corresponding code changes. Based on this observation, we use the code changes to classify delta alarms into six classes that have different priorities assigned to them. The assignment of priorities is based on the type of code changes and their likelihood of actually impacting the delta alarms. The ranking of alarms, obtained by prioritizing the classes, can help suppress alarms that are ranked lower, when resources to inspect all the tool-generated alarms are limited.
We performed an empirical evaluation using 9789 alarms generated on 59 versions of seven open source C applications. The evaluation results indicate that the proposed classification and ranking of delta alarms help to identify, on average, 53% of delta alarms as more likely to be false positives than the others.
What Is an AI Engineer? An Empirical Analysis of Job Ads in The Netherlands
Alexander Serebrenik
Recently, the job market for Artificial Intelligence (AI) engineers has exploded. Since the role of AI engineer is relatively new, limited research has been done on the requirements as set by the industry. Moreover, the definition of an AI engineer is less established than for a data scientist or a software engineer. In this study we explore, based on job ads, the requirements from the job market for the position of AI engineer in The Netherlands. We retrieved job ad data between April 2018 and April 2021 from a large job ad database, Jobfeed from TextKernel. The job ads were selected with a process similar to the selection of primary studies in a literature review. We characterize the 367 resulting job ads based on meta-data such as publication date, industry/sector, educational background and job titles. To answer our research questions we have further coded 125 job ads manually.
The job tasks of AI engineers are concentrated in five categories: business understanding, data engineering, modeling, software development and operations engineering. Companies ask for AI engineers with different profiles: 1) data science engineer with focus on modeling, 2) AI software engineer with focus on software development, 3) generalist AI engineer with focus on both models and software. Furthermore, we present the tools and technologies mentioned in the selected job ads, and the soft skills.
Our research helps to understand the expectations companies have for professionals building AI-enabled systems. Understanding these expectations is crucial both for prospective AI engineers and educational institutions in charge of training those prospective engineers. Our research also helps to better define the profession of AI engineering. We do this by proposing an extended AI engineering life-cycle that includes a business understanding phase.
Joint work with Marcel Meesters and Petra Heck.
Community smells are patterns indicating suboptimal organization and communication of software development teams that have been shown to be related to suboptimal organisation of the source code. Given a long-standing association of women and communication mediation, we have conducted a series of studies relating gender diversity to community smells, as well as comparing the results of the data analysis with developers' perception. To get further insights into the relation between gender and community smells, we replicate our study focusing on Brazilian software teams; indeed, culture-specific expectations on the behavior of people of different genders might have affected the perception of the importance of gender diversity and refactoring strategies when mitigating community smells. Finally, we extend the prediction model by including variables related to national diversity and see how the interplay between national diversity and gender diversity influences the presence of community smells.
This talk is based on a series of papers published in 2019-2022 and co-authored with Gemma Catolino, Filomena Ferrucci, Stefano Lambiase, Tiago Massoni, Fabio Palomba, Camila Sarmento, and Damian Andrew Tamburri.
Overview of a series of papers published in 2019-2021 on community smells, and their relation to code smells and gender, as well as resolution strategies.
Women in Dutch Computer Science: Best Practices for Recruitment, Onboarding a...
Alexander Serebrenik
Women are underrepresented at all levels in computer science (CS) faculties of Dutch universities. In this report we focus on experiences related to hiring and promoting women as assistant, associate and full professors (or equivalent at NWO-I CWI).
In 2003, Dave et al. coined the term “opinion mining” to refer to “processing a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good)”. Nine years later, in 2012, Brooks and Swigger applied sentiment analysis in the context of software engineering. Today another nine years have passed and it is time to look back: what have we achieved as a research community and where should we go next?
To answer this question we conducted a systematic literature review involving 185 papers. Based on the literature review we present 1) well-defined categories of opinion mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4) concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques. The results of our study serve as references to choose suitable opinion mining tools for SE tasks, and provide critical insights for the further development of opinion mining techniques in the SE domain.
This work has been done together with Bin Lin, Gabriele Bavota and Michele Lanza from Università della Svizzera italiana, Switzerland, Nathan Cassee from Eindhoven University of Technology, The Netherlands and Nicole Novielli from University of Bari, Italy.
In this talk I will present results obtained on removing self-admitted technical debt. Self-admitted technical debt is an indication in the source code, usually in the source code comments, that the code is not in the right shape yet. Joint work with Emad Shihab, Everton Maldonado, Rabe Abdelkareem, Fiorella Zampetti, Massimiliano Di Penta and Gianmarco Fucci.
Gender Diversity and Inclusion and Software Engineering
ICSM 2011: You can't control the unfamiliar
1. Metrics are usually computed at a low level: classes, methods, …
/ W&I / MDSE 3-11-2012 PAGE 0
2. Multitude of data values obscures a general picture of the system maintainability
3. That we are actually interested in!
4. You Can't Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics
Bogdan Vasilescu
Alexander Serebrenik
Mark van den Brand
5. Two kinds of aggregation
• Same metrics, different artifacts
• Same artifact, different metrics
6. Various techniques can be found in the literature
Same metrics, different artifacts:
• Traditional: mean, median, sum, …
• Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
7. Various techniques can be found in the literature
Same metrics, different artifacts:
• Traditional: mean, median, sum, …
• Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
Which aggregation technique should we use?
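To make the techniques listed on this slide concrete, the following Python sketch computes the five inequality indices for a list of metric values (for example, SLOC per class within one package). The formulas are the common textbook definitions and are not necessarily the exact variants used in the paper; the function names and the parameter defaults `eps=0.5` (Atkinson) and `beta=1.0` (Kolm) are illustrative assumptions.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def gini(xs):
    # Mean absolute difference between all pairs, normalised by twice the mean.
    n, mu = len(xs), _mean(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)


def theil(xs):
    # Theil T index; a zero value contributes 0 (the limit of r*ln(r) as r -> 0).
    mu = _mean(xs)
    return _mean([(x / mu) * math.log(x / mu) if x != 0 else 0.0 for x in xs])


def hoover(xs):
    # Share of the total that would have to be redistributed to reach uniformity.
    mu = _mean(xs)
    return sum(abs(x - mu) for x in xs) / (2 * sum(xs))


def atkinson(xs, eps=0.5):
    # Assumption: inequality-aversion parameter restricted to 0 < eps < 1.
    if not 0 < eps < 1:
        raise ValueError("this sketch only supports 0 < eps < 1")
    mu = _mean(xs)
    equal_equiv = _mean([x ** (1 - eps) for x in xs]) ** (1 / (1 - eps))
    return 1 - equal_equiv / mu


def kolm(xs, beta=1.0):
    # (1/beta) * ln(mean(exp(beta * (mu - x)))), computed with the
    # log-sum-exp trick so that large metric values do not overflow.
    mu = _mean(xs)
    exponents = [beta * (mu - x) for x in xs]
    m = max(exponents)
    return (m + math.log(_mean([math.exp(e - m) for e in exponents]))) / beta


# Hypothetical class-level SLOC values for one package.
sloc = [0, 0, 0, 4]
for index in (gini, theil, hoover, atkinson, kolm):
    print(index.__name__, round(index(sloc), 3))
```

All five functions return 0 for a perfectly uniform input. Under these definitions Gini, Theil, Hoover and Atkinson do not change when every value is multiplied by the same positive constant, whereas Kolm does not change when the same constant is added to every value.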
8. Questions
1. Which aggregation techniques agree, and to what extent?
2. What is the nature of the relation between the various aggregation techniques?
3. How does the correlation coefficient change as the systems evolve?
9. Qualitas Corpus 20101126
• Qualitas Corpus 20101126r, 106 systems
• FitJava v1.1, 2 packages, 2240 SLOC
• NetBeans v6.9.1, 3373 packages, 1890536 SLOC
10. 1) Agreement between different techniques
• Agreement:
• Aggregation: SLOC from class level to package level
• Techniques agree if they rank the packages similarly
We use a rank-based correlation coefficient: Kendall’s τ
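The agreement check described on this slide can be sketched as follows: aggregate class-level SLOC per package with two techniques, then compare how the two techniques rank the packages using Kendall's rank correlation. The package names and SLOC values are made up for illustration, and the tie-corrected τ-b variant is an assumption, since the slide does not say which variant was used.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equally long sequences of scores."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both: counted in neither term
        elif dx == 0:
            ties_x += 1           # tied only in the first sequence
        elif dy == 0:
            ties_y += 1           # tied only in the second sequence
        elif dx * dy > 0:
            concordant += 1       # both techniques order the pair the same way
        else:
            discordant += 1       # the techniques disagree on the pair
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau is undefined when one sequence is constant")
    return (concordant - discordant) / denom


# Hypothetical packages with class-level SLOC values.
packages = {
    "a": [10, 20, 30],
    "b": [5, 5, 200],
    "c": [50, 60],
    "d": [1, 2, 3, 4],
}
by_mean = [mean(v) for v in packages.values()]
by_median = [median(v) for v in packages.values()]
tau = kendall_tau_b(by_mean, by_median)
print(round(tau, 3))
```

A value near 1 means the two aggregation techniques rank the packages almost identically, a value near -1 means they rank them in opposite order, and a value near 0 means the rankings are unrelated.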
11. 1) Agreement: different inequality indices?
• Gini, Theil, Hoover, Atkinson – agree
• aggregates obtained convey the same information
• Kolm does not!
12. 1) Agreement: traditional and inequality indices?
• mean
− Kolm: strong (0.8) and statistically significant (92%)
• median, standard deviation, and variance
• sum
− does not correlate with any other aggregation technique
13. 2) Nature of the relation: Typical patterns
• Theil is known to be more sensitive to the rich
• Theil increases faster when Gini increases
• Linear relation with a “fat” head
14. Which aggregation technique? (1)
• Theil, Hoover, Gini and Atkinson agree
• Any can be chosen from the correlation point of view
• Some might be “better” in each specific case
• easy to interpret: Gini [0,1]
• provide additional insights: Theil (explanation)
• negative values: Gini, Hoover
− affects the domain!
• sensitive for high values: Theil, Atkinson
• deviations from uniformity: Gini, Hoover
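The remark that negative values affect the domain can be illustrated with a small sketch. Using the common textbook formulas (an assumption, as are the function names and the made-up input values), Gini and Hoover remain computable when some values are negative, but they may leave the [0,1] interval, whereas the Theil index is undefined because it takes the logarithm of a negative ratio.

```python
import math


def gini(xs):
    n, mu = len(xs), sum(xs) / len(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mu)


def hoover(xs):
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / (2 * sum(xs))


def theil(xs):
    mu = sum(xs) / len(xs)
    # math.log raises ValueError as soon as x / mu is negative.
    terms = [(x / mu) * math.log(x / mu) if x != 0 else 0.0 for x in xs]
    return sum(terms) / len(xs)


# Hypothetical metric values with one negative entry; the mean is still positive.
values = [-3, 1, 5]

print("Gini:", gini(values))      # larger than 1 for this input
print("Hoover:", hoover(values))  # larger than 1 for this input
try:
    theil(values)
except ValueError as err:
    print("Theil is undefined here:", err)
```

For metrics that can take negative values, a Gini or Hoover result therefore cannot be read on the usual 0-to-1 scale without first checking the input range.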
15. Which aggregation technique? (2)
• Kolm and mean agree
• Kolm is reliable for skewed distributions
− better alternative (“by no means”)
• Not in the paper:
− agreement observed for NOC
− but not for DIT!