Msr2021 tutorial-di penta




  2. 2. MY MSR EXPERIENCE • 19 papers published (17 full research papers) • Program committee member in 9 editions • Program co-chair in 2012 and 2013 • General chair in 2015 • Steering committee member 2011-2018
  3. 3. GOALS OF THIS TUTORIAL • Explain different ways of contributing to MSR research • Go over the paper evaluation criteria and how to satisfy them
  4. 4. NOTES • I will refer to some exemplar papers • Those are just examples, but some of them are quite representative • All are MSR-related papers, not only from the MSR conference
  5. 5. ANALYSIS OF MSR REPORTING • I’m studying this with Davide Falessi and Alexander Serebrenik • We are interested in hearing your opinion, especially if you are a senior member of the community (SurveyHero, takes 15 min.)
  9. 9. METHODOLOGICAL PAPERS Providing techniques that will hopefully help future mining research
  10. 10. FIX INDUCING CHANGES (SZZ ALGORITHM) When Do Changes Induce Fixes? (On Fridays.) Jacek Śliwerski International Max Planck Research School Max Planck Institute for Computer Science Saarbrücken, Germany Thomas Zimmermann Andreas Zeller Department of Computer Science Saarland University Saarbrücken, Germany {tz, zeller} ABSTRACT As a software system evolves, programmers make changes that sometimes cause problems. We analyze CVS archives for fix-inducing changes: changes that lead to problems, indicated by fixes. We show how to automatically locate fix-inducing changes by linking a version archive (such as CVS) to a bug database (such as BUGZILLA). In a first investigation of the MOZILLA and ECLIPSE history, it turns out that fix-inducing changes show distinct patterns with respect to their size and the day of week they were applied. Categories and Subject Descriptors D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement: corrections, version control; D.2.8 [Metrics]: Complexity measures General Terms Management, Measurement 1. INTRODUCTION Which change properties may lead to problems? We can investigate which properties of a change correlate with inducing fixes, for instance, changes made on a specific day or by a specific group of developers. How error-prone is my product? We can assign a metric to the product: on average, how likely is it that a change induces a later fix? How can I filter out problematic changes? When extracting the architecture via co-changes from a version archive, there is no need to consider fix-inducing changes, as they get undone later. Can I improve guidance along related changes? When using co-changes to guide programmers along related changes, we would like to avoid fix-inducing changes in our suggestions. This paper describes our first experiences with fix-inducing changes. We discuss how to extract data from version and bug archives (Section 2), and how we link bug reports to changes (Section 3). 
In Section 4, we describe how to identify and locate fix-inducing changes. Section 5 shows the results of our investigation of the
  11. 11. LINKING ISSUES TO COMMITS “fix 367920 setting pop3 Messages as junk/not junk ignored when Message quarantining turned on sr=mscott” Solution: Regular expression matching e.g. $l=~/BR (\d+)/ || $l=~/fix\s+(\d+)/i || $l=~/PR\s+(\d+)/ || $l=~/Bugzilla\s+(\d+)/i || $l=~/Bug\s+(\d+)/i || $l=~/^#(\d+)/i
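The regular-expression matching on this slide can be sketched in Python as follows. This is a minimal illustration, not the original tooling: the pattern list mirrors the alternatives shown on the slide, while the function name `candidate_issue_ids` and the de-duplication step are assumptions added here.

```python
import re

# One pattern per alternative shown on the slide. Which patterns make sense
# depends on the commit conventions of the project being mined.
ISSUE_PATTERNS = [
    re.compile(r"BR (\d+)"),
    re.compile(r"fix\s+(\d+)", re.IGNORECASE),
    re.compile(r"PR\s+(\d+)"),
    re.compile(r"Bugzilla\s+(\d+)", re.IGNORECASE),
    re.compile(r"Bug\s+(\d+)", re.IGNORECASE),
    re.compile(r"^#(\d+)"),
]


def candidate_issue_ids(message):
    """Return the issue numbers a commit message appears to reference.

    The result is only a list of candidates: a message such as
    "fix 2 typos" would yield 2, so each candidate still has to be
    checked against the issue tracker.
    """
    ids = []
    for pattern in ISSUE_PATTERNS:
        for match in pattern.finditer(message):
            issue_id = int(match.group(1))
            if issue_id not in ids:  # keep first-seen order, drop repeats
                ids.append(issue_id)
    return ids
```

Applied to the commit message quoted on the slide, the sketch returns the single candidate 367920. Purely syntactic matching can produce false links, so a check that the number exists in the tracker (and refers to a fixed bug) is a sensible follow-up.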
  12. 12. IDENTIFYING FIX INDUCING CHANGES [Figure: source code lines of a file across changes ci, cj, ck; the lines affected by the bug-fixing change cn are traced back, in the version before the bug fix, to the fix-inducing changes that last touched them]
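The idea in the figure can be sketched on a toy, in-memory history. This is a simplified illustration, not the authors' implementation: the file history is assumed to be a list of `(commit_id, lines)` pairs, line origins are recovered with Python's `difflib` instead of a version control system's annotate command, and the function names are hypothetical. The additional filtering steps described in the SZZ paper are omitted.

```python
from difflib import SequenceMatcher


def blame(history):
    """For every version, record which commit last touched each line.

    `history` is a chronological list of (commit_id, lines) pairs for
    a single file.
    """
    origins = []
    prev_lines, prev_origin = [], []
    for commit_id, lines in history:
        origin = []
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the commit that introduced them.
                origin.extend(prev_origin[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines are attributed to this commit.
                origin.extend([commit_id] * (j2 - j1))
            # "delete" contributes no lines to the new version.
        origins.append(origin)
        prev_lines, prev_origin = lines, origin
    return origins


def fix_inducing_candidates(history, fix_index):
    """Commits that last touched the lines changed by the fix at `fix_index`."""
    origins = blame(history)
    before_lines = history[fix_index - 1][1]
    after_lines = history[fix_index][1]
    before_origin = origins[fix_index - 1]
    matcher = SequenceMatcher(a=before_lines, b=after_lines, autojunk=False)
    candidates = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            # Lines the fix removed or rewrote, looked up in the pre-fix blame.
            candidates.update(before_origin[i1:i2])
    return candidates
```

With a three-commit history in which the second commit adds a faulty line and the third rewrites it, the second commit is returned as the only candidate. A fix that only adds lines yields no candidates under this sketch, which is one of the known blind spots of the approach.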
  14. 14. INFRASTRUCTURE • Setting up tools or data for other researchers • Sometimes a consequence of a methodological contribution
  15. 15. SRCML An XML-Based Lightweight C++ Fact Extractor Michael L. Collard, Huzefa H. Kagdi, Jonathan I. Maletic Department of Computer Science Kent State University Kent Ohio 44242 330 672 9039 Abstract A lightweight fact extractor is presented that utilizes XML tools, such as XPath and XSLT, to extract static information from C++ source code programs. The source code is first converted into an XML representation, srcML, to facilitate the use of a wide variety of XML tools. The method is deemed lightweight because only a partial parsing of the source is done. Additionally, the technique is quite robust and can be applied to incomplete and non-compile-able source code. The trade off to this approach is that queries on some low level details cannot be directly addressed. This approach is applied to a fact extractor benchmark as comparison with other, albeit heavier weight, fact extractors. Fact extractors are widely used to support understanding tasks associated with maintenance, reverse engineering and various other software engineering tasks. a lightweight, robust, and tolerant C++ fact extractor. We use the term lightweight to highlight the fact that only lightweight parsing is done and a number of very low-level type facts can not be directly derived from the data source (i.e., srcML markup of the C++ source). Our method allows the extraction of high-level entities such as functions, classes, namespaces, and templates, as well as middle-level entities such as individual statements (if, while, etc.), declarations and expressions. Lower-level entities such as variables and function calls can also be queried. Additionally, it allows the extraction of entities that are typically discarded during pre-processing such as comments, pre-processor directives, and macros. The entities are extracted with full lexical information such as white space and all original source code information. 
The following section will address some of the problems encountered during fact extraction and address the related work in the field of fact extraction. We then describe srcML and our C++ to srcML translator.
  16. 16. SRCML • Parses source code and produces the output in XML • Multi-language • Also supports transformations, lightweight slicing/data flow analysis
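To illustrate why an XML representation of source code is convenient, the sketch below queries function names with an XPath-style expression using Python's standard library. The XML fragment is hand-written and simplified for this example, in the spirit of srcML markup; it is not actual output of the srcML tool, and the namespace URI and element names are assumptions that should be checked against real output.

```python
import xml.etree.ElementTree as ET

# Assumed namespace and element names; verify against real srcML output.
SRC_NS = {"src": "http://www.srcML.org/srcML/src"}

# Hand-written, simplified fragment for a two-function C++ file.
SAMPLE = """<unit xmlns="http://www.srcML.org/srcML/src" language="C++" filename="example.cpp">
<function><type><name>int</name></type> <name>add</name><parameter_list>(int a, int b)</parameter_list> <block>{ return a + b; }</block></function>
<function><type><name>int</name></type> <name>main</name><parameter_list>()</parameter_list> <block>{ return add(1, 2); }</block></function>
</unit>"""


def function_names(xml_text):
    """Return the names of all functions found in the marked-up source."""
    root = ET.fromstring(xml_text)
    names = []
    for function in root.iterfind(".//src:function", SRC_NS):
        # The direct <name> child is the function name; the return type
        # has its own <name> nested inside <type>.
        names.append(function.find("src:name", SRC_NS).text)
    return names
```

Because the facts are plain XML, the same query style extends to statements, declarations, or comments without writing a parser for the programming language itself.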
  17. 17. PERCEVAL Perceval: Software Project Data at Your Will Santiago Dueñas Bitergia Valerio Cosentino Bitergia Gregorio Robles Universidad Rey Juan Carlos Jesus M. Gonzalez-Barahona Universidad Rey Juan Carlos ABSTRACT Software development projects, in particular open source ones, heavily rely on the use of tools to support, coordinate and promote development activities. Despite their paramount value, they contribute to fragment the project data, thus challenging practitioners and researchers willing to derive insightful analytics about software projects. In this demo we present Perceval, a loyal helper able to perform automatic and incremental data gathering from almost any tool related with contributing to open source development, among others, source code management, issue tracking systems, mailing lists, forums, and social media. Perceval is an industry strong free software tool that has been widely used in Bitergia, a company devoted to offer commercial software analytics of software projects. It hides the technical complexities related to data acquisition and eases the definition of analytics. A video showcasing the main features of Perceval can be found at KEYWORDS Software mining, empirical software engineering, open source soft- However, accessing and gathering this data is often a time-consuming and an error-prone task, that entails many considerations and technical expertise [1, 12, 16]. 
It may require to understand how to obtain an OAuth [11] token (e.g., StackExchange, GitHub) or prepare storage to download the data (e.g., Git repositories, mailing list archives); when dealing with development support tools that expose their data via APIs, special attention has to be paid to the terms of service (e.g., an excessive number of requests could lead to temporary or permanent bans); recovery solutions to tackle connection problems when fetching remote data should also taken into account; storing the data already received and retrying failed API calls may speed up the overall gathering process and reduce the risk of corrupted data. Nonetheless, even if these problems are known, many scholars and practitioners tend to re-invent the wheel by retrieving the data themselves with ad-hoc scripts. In this paper, we present Perceval, a tool that simplifies the collection of project data by covering more than 20 well-known tools and platforms related to contributing to open source development, thus enabling the definition of software analytics. It rebuilts and 2018 ACM/IEEE 40th International Conference on Software Engineering: Companion Proceedings
  18. 18. PERCEVAL • Gathers data from a wide range of software repositories • git, GitHub, issue trackers, Slack, Gerrit, Docker Hub, and many others
  19. 19. PYDRILLER PyDriller: Python Framework for Mining Software Repositories Davide Spadini Delft University of Technology Software Improvement Group Delft, The Netherlands Maurício Aniche Delft University of Technology Delft, The Netherlands Alberto Bacchelli University of Zurich Zurich, Switzerland bacchelli@ifi ABSTRACT Software repositories contain historical and valuable information about the overall development of software systems. Mining software repositories (MSR) is nowadays considered one of the most interesting growing fields within software engineering. MSR focuses on extracting and analyzing data available in software repositories to uncover interesting, useful, and actionable information about the system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repository. In this paper, we present PyDriller, a Python Framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python Framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% less LOC and significantly lower complexity. URL:, Materials:, Pre-print: CCS CONCEPTS • Software and its engineering; actionable insights for software engineering, such as understanding the impact of code smells [13–15], exploring how developers are doing code reviews [2, 4, 10, 21] and which testing practices they follow [20], predicting classes that are more prone to change/defects [3, 6, 16, 17], and identifying the core developers of a software team to transfer knowledge [12]. Among the different sources of information researchers can use, version control systems, such as Git, are among the most used ones. 
Indeed, version control systems provide researchers with precise information about the source code, its evolution, the developers of the software, and the commit messages (which explain the reasons for changing). Nevertheless, extracting information from Git repositories is not trivial. Indeed, many frameworks can be used to interact with Git (depending on the preferred programming language), such as GitPython [1] for Python, or JGit for Java [8]. However, these tools are often difficult to use. One of the main reasons for such difficulty is that they encapsulate all the features from Git, hence, developers are forced to write long and complex implementations to extract even simple data from a Git repository. In this paper, we present PyDriller, a Python framework that helps developers to mine software repositories. PyDriller provides
  20. 20. PYDRILLER • Python-based mining framework • Changed files, diffs, metrics • Watch the recording of this morning’s tutorial by Mauricio Aniche and Alberto Bacchelli
  21. 21. GHTORRENT
  26. 26. PERSPECTIVE PAPERS Provide insights on how (not to) mine certain repositories. Lessons learned, things to avoid.
  27. 27. ON MINING GIT… The Promises and Perils of Mining Git Christian Bird*, Peter C. Rigby†, Earl T. Barr*, David J. Hamilton*, Daniel M. German†, Prem Devanbu* *University of California, Davis, USA †University of Victoria, Canada {bird,barr,hamiltod,devanbu} {pcr,dmg} Abstract We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded con- [Figure: number of projects (axis 500 to 3000) using Subversion, Git, Bazaar, CVS, Darcs, and Hg]
  28. 28. … AND GITHUB The Promises and Perils of Mining GitHub Eirini Kalliamvakou University of Victoria Georgios Gousios Delft University of Technology Kelly Blincoe University of Victoria Leif Singer University of Victoria Daniel M. German* University of Victoria Daniela Damian University of Victoria ABSTRACT With over 10 million git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub’s event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories in GitHub and how users take advantage of GitHub’s main features, namely commits, pull requests, and issues. Our results indicate that, while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. We show, for example, that the majority of the projects are personal and inactive; that GitHub is also being used for free storage and as a Web hosting service; and that almost 40% of all pull requests do not appear as merged, even though they were. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub. Categories and Subject Descriptors D.2.8 [Software Engineering]: Management: Software con- “fork & pull” model in which developers create their own copy of a repository and submit a pull request when they want the project maintainer to pull their changes into the main branch. In addition to code hosting, collaborative code review, and integrated issue tracking, GitHub has integrated social features. 
Users are able to subscribe to information by “watching” projects and “following” users, resulting in a feed of information on those projects and users of interest. Users also have profiles that can be populated with identifying information and contain their recent activity within the site. With over 10.6 million repositories hosted as of January 2014, GitHub is currently the largest code hosting site in the world. Its popularity, the integrated social features, and the availability of metadata through an accessible API have made GitHub very attractive for software engineering researchers. Existing research has been both qualitative [4, 7, 16, 17, 19] and quantitative [10, 24, 25, 26]. Qualitative studies have focused on how developers use GitHub’s social features to form impressions and draw conclusions on other developers’ and projects’ activity to assess success, performance, and possible collaboration opportunities. Quantitative studies have aimed to systematically archive GitHub’s publicly available data and use that to investigate development practices and network structure in the GitHub environment. As part of our research on collaboration on GitHub [15],
  30. 30. EMPIRICAL
  31. 31. ABOUT EMPIRICAL RESEARCH • Quantitative, qualitative, or both • Observing patterns in a project • Finding correlations between variables
  32. 32. QUANTITATIVE STUDY An Empirical Analysis of the Docker Container Ecosystem on GitHub Jürgen Cito∗, Gerald Schermann∗, John Erik Wittern†, Philipp Leitner∗, Sali Zumberi∗, Harald C. Gall∗ ∗ Software Evolution and Architecture Lab University of Zurich, Switzerland {lastname} † IBM T. J. Watson Research Center Yorktown Heights, NY, USA Abstract: Docker allows packaging an application with its dependencies into a standardized, self-contained unit (a so-called container), which can be used for software development and to run the application on any system. Dockerfiles are declarative definitions of an environment that aim to enable reproducible builds of the container. They can often be found in source code repositories and enable the hosted software to come to life in its execution environment. We conduct an exploratory empirical study with the goal of characterizing the Docker ecosystem, prevalent quality issues, and the evolution of Dockerfiles. We base our study on a data set of over 70000 Dockerfiles, and contrast this general population with samplings that contain the Top-100 and Top-1000 most popular Docker-using projects. We find that most quality issues (28.6%) arise from missing version pinning (i.e., specifying a concrete version for dependencies). Further, we were not able to build 34% of Dockerfiles from a representative sample of 560 projects. Integrating quality checks, e.g., to issue version pinning warnings, into the container build process could result into more reproducible builds. The most popular projects change more often than the rest of the Docker population, with 5.81 revisions per year and 5 lines of code changed on average. ity [4], we study the Docker ecosystem with respect to quality of Dockerfiles and their change and evolution behavior within software repositories. We developed a tool chain that transforms Dockerfiles and their evolution in Git repositories into a relational database model. 
We mined the entire population of Dockerfiles on GitHub as of October 2016, and summarize our findings on the ecosystem in general, quality aspects, and evolution behavior. The results of our study can inform standard bodies around containers and tool developers to develop better support to improve quality and drive ecosystem change. We make the following contributions through our exploratory study: Ecosystem Overview. We characterize the ecosystem of Docker containers on GitHub by analyzing the distribution of projects using Docker, broken down by primary programming language, project size, and the base infrastructure (base image) 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)
  33. 33. QUALITATIVE STUDY (ONE PROJECT) Communication in Open Source Software Development Mailing Lists Anja Guzzi1, Alberto Bacchelli2, Michele Lanza2, Martin Pinzger3, Arie van Deursen1 1: Department of Software and Computer Technology - Delft University of Technology, The Netherlands 2: REVEAL @ Faculty of Informatics - University of Lugano, Switzerland 3: Institute for Informatics Systems - University of Klagenfurt, Austria Abstract: Open source software (OSS) development teams use electronic means, such as emails, instant messaging, or forums, to conduct open and public discussions. Researchers investigated mailing lists considering them as a hub for project communication. Prior work focused on specific aspects of emails, for example the handling of patches, traceability concerns, or social networks. This led to insights pertaining to the investigated aspects, but not to a comprehensive view of what developers communicate about. Our objective is to increase the understanding of development mailing lists communication. We quantitatively and qualitatively analyzed a sample of 506 email threads from the development mailing list of a major OSS project, Lucene. Our investigation reveals that implementation details are discussed only in about 35% of the threads, and that a range of other topics is discussed. Moreover, core developers participate in less than 75% of the threads. We observed that the development mailing list is not the main player in OSS project communication, as it also includes other channels such as the issue repository. I. Introduction Open source software (OSS) development teams use electronic means, such as emails, instant messaging, or forums, Nevertheless, there is no clear, updated, and well-rounded picture of the communication taking place in open source development mailing lists that supports these assumptions. 
In fact, at our disposal, we only have either abstract and outdated knowledge (e.g., obtained as a side effect of the analysis of the Linux project), which does not consider the recent shift of interest to new social platforms (e.g., GitHub and Jira), or a very specialized understanding (e.g., regarding specific information, such as the process of code review [25]), which does not take into account all the information that can be distilled from development emails. Our goal is to increase our understanding of development mailing lists communication: What do participants talk about? How much do they discuss each topic? What is the role of the development mailing lists for OSS project communication? Answering these questions can confirm or cast doubts on the previous assumptions, and it can provide insights for future research on mining developers’ communication and for building tools to help project teams communicate effectively. To answer these questions, we conducted an in-depth analysis of the communication taking place in the development mailing
  35. 35. TECHNOLOGICAL • Those should be the icing on the cake • Consequence of all previous research • Exploiting software repositories to help developers
  36. 36. RECOMMENDING RELEVANT STACKOVERFLOW DISCUSSIONS Mining StackOverflow to Turn the IDE into a Self-Confident Programming Prompter Luca Ponzanelli1, Gabriele Bavota2, Massimiliano Di Penta2, Rocco Oliveto3, Michele Lanza1 1: REVEAL @ Faculty of Informatics – University of Lugano, Switzerland 2: University of Sannio, Benevento, Italy 3: University of Molise, Pesche (IS), Italy ABSTRACT Developers often require knowledge beyond the one they possess, which often boils down to consulting sources of information like Application Programming Interfaces (API) documentation, forums, Q&A websites, etc. Knowing what to search for and how is non-trivial, and developers spend time and energy to formulate their problems as queries and to peruse and process the results. We propose a novel approach that, given a context in the IDE, automatically retrieves pertinent discussions from Stack Overflow, evaluates their relevance, and, if a given confidence threshold is surpassed, notifies the developer about the available help. We have implemented our approach in Prompter, an Eclipse plug-in. Prompter has been evaluated through two studies. The first was aimed at evaluating the devised ranking model, while the second was conducted to evaluate the usefulness of Prompter. problems, the main one being the absence of automation: Every time developers need to look for information, they interrupt their work flow, leave the IDE, and use a Web browser to perform and refine searches, and assess the results. Finally, they transfer the obtained knowledge to the problem context in the IDE. The information is retrieved from different sources, such as forums, mailing lists [2], blogs, Q&A websites, bug trackers [1], etc. A prominent example is Stack Overflow, popular among developers as a venue for sharing programming knowledge. Stack Overflow is vast: In 2010 it already had 300k users, and millions of questions, answers, and comments [23]. 
This makes finding the right piece of information cumbersome and challenging. Recommender systems [33] represent a possible solution to this problem. A recommender system gathers and analyzes data, identifies useful artifacts, and suggests them to the developer. Seminal
  37. 37. APPROACH [Figure: Prompter architecture. Components: Eclipse Prompter plug-in, Query Generation Service, Search Service, Search Engine Proxy, Search Engines (Google, Bing, Blekko), Stack Overflow API Service, Ranking Model. Eight numbered data flows, labeled Code Context, Query & Triggering Info, Query & Code Context, Query, Results, Discussion IDs, Documents, and Ranked Results]
  38. 38. TOOL (PROMPTER) [Screenshot with two numbered callouts]
  39. 39. EVALUATION • User study • Developers performing the task with and without the tool [Figure: completeness (axis 20 to 100) by treatment, NP vs. P]
  41. 41. RECAP There are different ways you can contribute to MSR research, beyond empirical studies
  43. 43. ANSWER - YOU CAN’T There is always a chance reviewers won’t like your paper
  44. 44. This is an opportunity to make our work more convincing. Don’t despair! In the end we will thank the reviewers.
  47. 47. FROM MSR 2021 CALL FOR PAPERS • Soundness of approach • Relevance to software engineering • Clarity of relation with related work • Quality of presentation • Quality of evaluation [for long papers] • Ability to replicate [for long papers] • Novelty
  48. 48. RELEVANCE
  49. 49. RELEVANCE OK, my paper is about software engineering, so it’s fine…
  50. 50. QUESTIONS TO ASK • Does the paper solve a problem relevant for any stakeholder? • Does the phenomenon investigated by the study occur frequently and impact real projects? • Is the achieved improvement tangible for the interested stakeholder?
  51. 51. RELEVANCE: EXAMPLES OF WEAK CONTRIBUTION The investigated code bad smell occurs in 1% of the studied projects
  52. 52. RELEVANCE: EXAMPLES OF WEAK CONTRIBUTION We improve defect prediction precision from 30% to 40%
  53. 53. THAT BEING SAID… Sometimes very small improvements pave the road towards tangible, significant ones!
  54. 54. MSR RESEARCHER TEMPTATION Here’s a new dataset… let’s try to do something with that!
  55. 55. PROBLEM-DRIVEN VS DATA-DRIVEN RESEARCH How would my study (help to) solve a problem developers have?
  56. 56. NOVELTY
  57. 57. EXAMPLES OF NOVEL CONTRIBUTIONS • Novel approach: propose an approach improving the state-of-the-art • New empirical results: New, possibly unexpected, empirical evidence • Negative result: Shows that something does not work • Replication: Confirms (in a different context) previous results
  59. 59. VERSIONING MINING Details to describe and justify: • History range • Branches • Commit ordering • On excluding merge commits
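One way to make these choices explicit, and therefore reportable in a paper, is to encode them as parameters of the commit selection step. The sketch below is a minimal illustration on an assumed in-memory representation: each commit is a dictionary with hypothetical keys `hash`, `parents`, `date`, and `branches`, and ordering by timestamp is only one possible choice.

```python
from datetime import datetime, timezone


def select_commits(commits, since, until, branch, skip_merges=True):
    """Apply the selection choices that a study should describe and justify.

    commits: iterable of dicts with the (hypothetical) keys
             'hash', 'parents', 'date', and 'branches'.
    since/until: half-open history range [since, until).
    branch: only commits reachable from this branch are kept.
    skip_merges: drop commits with more than one parent.
    """
    selected = []
    for commit in commits:
        if not (since <= commit["date"] < until):
            continue
        if branch not in commit["branches"]:
            continue
        if skip_merges and len(commit["parents"]) > 1:
            continue
        selected.append(commit)
    # Explicit, deterministic ordering: by timestamp, then by hash.
    return sorted(selected, key=lambda c: (c["date"], c["hash"]))
```

Sorting by timestamp is a design choice with its own threat: recorded dates can be unreliable or out of order with respect to the commit graph, so a study that relies on ordering may prefer an order derived from parent links and should say which one it used.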
  60. 60. THREATS TO DISCUSS • History can be rewritten • When mining repositories, there’s little you can do • At least, discuss the threats
  61. 61. NOT ALL BUG-RELATED ISSUES ARE BUGS [Figure: bar chart of issues classified as Bugs, Non bugs, and Others for Mozilla, Eclipse, and JBoss; axis 0 to 600; bar values as extracted: 156, 24, 121, 99, 382, 209, 345, 194, 270] Giuliano Antoniol, Kamel Ayari, Massimiliano Di Penta, Foutse Khomh, Yann-Gaël Guéhéneuc: Is it a bug or an enhancement?: a text-based approach to classify change requests. CASCON 2008: 23
  62. 62. It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction Kim Herzig Saarland University Saarbrücken, Germany Sascha Just Saarland University Saarbrücken, Germany Andreas Zeller Saarland University Saarbrücken, Germany Abstract: In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified, that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We estimate the impact of this misclassification on earlier studies and recommend manual data validation for future studies. Index Terms: mining software repositories; bug reports; data quality; noise; bias I. INTRODUCTION In empirical software engineering, it has become commonplace to mine data from change and bug databases to detect where bugs have occurred in the past, or to predict where they will occur in the future. The accuracy of such measurements and predictions depends on the quality of the data. Therefore, TABLE I. PROJECT DETAILS (project, maintainer, tracker type, # reports): HTTPClient, APACHE, Jira, 746; Jackrabbit, APACHE, Jira, 2,402; Lucene-Java, APACHE, Jira, 2,443; Rhino, MOZILLA, Bugzilla, 1,226; Tomcat5, APACHE, Bugzilla, 584. These are the questions we address in this paper. From five open source projects (Section II), we manually classified more than 7,000 issue reports into a fixed set of issue report categories clearly distinguishing the kind of maintenance work required to resolve the task (Section III). Our findings indicate substantial data quality issues: Issue report classifications are unreliable. In the five bug databases investigated, more than 40% of issue reports
  63. 63. THREAT: MISSING LINKS “nmbd_incomingdgrams.c: Fix bug with Syntax 5.1 servers reported by SGI where they do host announcements to LOCAL_MASTER_BROWSER_NAME<00 rather than WORKGROUP<1d” “Quieten level 0 debug when probing for modules. We shouldn't display so loud an error when a smb_probe_module() fails. Also tidy up debugs a bit. Bug 375.”
  64. 64. MISSING LINKS The Missing Links: Bugs and Bug-fix Commits Adrian Bachmann1, Christian Bird2, Foyzur Rahman2, Premkumar Devanbu2 and Abraham Bernstein1 1 Department of Informatics, University of Zurich, Switzerland 2 Computer Science Department, University of California, Davis, USA {bachmann,bernstein} {cabird,mfrahman,ptdevanbu} ABSTRACT Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypotheses-testing based on linked data could well be affected by bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and try to develop testable theories and models of the bias. To do this, we must establish ground truth: manually analyze a complete version history corpus, and nail down those commits that fix defects, and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and fi- 1. INTRODUCTION Software process data, especially bug reports and commit logs, are widely used in software engineering research. The integration of these two provides valuable information on the history and evolution of a software project. It is used, e.g., to predict the number and locale of bugs in future software releases (e.g., [27, 31, 17, 6]). The two data sources are normally integrated by scanning through the version control log messages for potential bug report numbers; conscientious developers enter this information when they check-in bug fixes (e.g., see [14]). 
We used similar techniques in our previous work, and, in fact, improved current practice by adding heuristics to check the results [3, 4]. Even so, the links (between program code commits and bug reports) thus extracted cannot be guaranteed to be correct, as they are reliant on voluntary developer annotations in commit logs. In prior work, we have shown that such data sets are
  65. 65. ON THE USE OF TOOLS • You are not reinventing the wheel • The MSR community is contributing great tools • Consider reusing them
  66. 66. IS THE TOOL WORKING? • Run a minimal validation to check whether a tool works correctly • We gave up on using a popular tool because its results were wrong
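A minimal validation of this kind can be as simple as manually inspecting a random sample of items and comparing the tool's verdicts against the manual ones. The sketch below assumes a tool that flags items (for example, commits it considers bug fixes); the function name and the set-based representation are hypothetical.

```python
def validate_tool(sample, tool_flagged, manually_confirmed):
    """Estimate precision and recall of a tool on a manually inspected sample.

    sample: ids of the items that were manually inspected.
    tool_flagged: ids the tool reports as positive (may exceed the sample).
    manually_confirmed: ids judged positive by the human inspectors.
    Returns (precision, recall); a value is None when it is undefined.
    """
    flagged = tool_flagged & sample          # only judge inspected items
    confirmed = manually_confirmed & sample
    true_positives = len(flagged & confirmed)
    precision = true_positives / len(flagged) if flagged else None
    recall = true_positives / len(confirmed) if confirmed else None
    return precision, recall
```

With ten inspected items, four of them flagged by the tool and three confirmed by hand, two in common give a precision of 0.5 and a recall of about 0.67. How large the sample must be, and how disagreements between inspectors are resolved, are separate design decisions that the paper should report.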
  67. 67. MACHINE LEARNING RETRAINING/TUNING Machine learning-based tools may need to be retrained/tuned if applied in a completely different context
  69. 69. EMPIRICAL EVALUATION SOUNDNESS • This topic would require a separate tutorial (and there are many) • Suitable design, appropriate use of statistical procedures, threats to validity discussed/mitigated, … • We will focus on project selection
  70. 70. HOW BIG? - 20 YEARS AGO The evaluation is very small… only one project is analyzed
  71. 71. HOW BIG? - 10 YEARS AGO The evaluation is very small… only five projects are analyzed
  72. 72. HOW BIG? - TODAY The evaluation is very small… only 100 projects are analyzed
  73. 73. JOKES APART… I very rarely use this argument against (or in favor of) a paper
  74. 74. ONE SIZE DOES NOT FIT ALL The size and type of the dataset depend on • the goals of the paper • the research method being used • depth vs. breadth
  75. 75. CHOICE OF DATASETS • Existing datasets: are they appropriate for your research? Are they obsolete? • Mining your own dataset: define clear selection criteria
  76. 76. ON PROJECT SELECTION • Toy projects, tutorials • Forked projects • Inactive projects
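Explicit selection criteria for the three categories above can be written down as a small filter that also records why each project was excluded, so the selection can be reported in the paper. This is only a sketch: the `Project` fields and the thresholds (`min_commits`, `min_contributors`, `max_idle_days`) are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    name: str
    is_fork: bool
    commits: int
    contributors: int
    last_commit: date


def select_projects(projects, today, min_commits=100,
                    min_contributors=2, max_idle_days=365):
    """Apply explicit selection criteria and keep the exclusion reasons.

    All thresholds are hypothetical defaults chosen for illustration.
    Returns (kept_names, excluded) where excluded maps name -> reason.
    """
    kept, excluded = [], {}
    for p in projects:
        if p.is_fork:
            excluded[p.name] = "fork"
        elif p.commits < min_commits or p.contributors < min_contributors:
            excluded[p.name] = "too small (possible toy project or tutorial)"
        elif (today - p.last_commit).days > max_idle_days:
            excluded[p.name] = "inactive"
        else:
            kept.append(p.name)
    return kept, excluded
```

Keeping the reasons makes it straightforward to report how many projects each criterion removed.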
  77. 77. STARS MAY NOT BE THE BEST THING… The Journal of Systems and Software 146 (2018) 112–129. Controversy Corner. What’s in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform. Hudson Borges, Marco Tulio Valente, Department of Computer Science, UFMG, Brazil. Article history: Received 4 September 2017; Revised 27 August 2018; Accepted 7 September 2018; Available online 10 September 2018. ABSTRACT Besides a git-based version control system, GitHub integrates several social coding features. Particularly, GitHub users can star a repository, presumably to manifest interest or satisfaction with an open source project. However, the real and practical meaning of starring a project was never the subject of an in-depth and well-founded empirical investigation. Therefore, we provide in this paper a throughout study on the meaning, characteristics, and dynamic growth of GitHub stars. First, by surveying 791 developers, we report that three out of four developers consider the number of stars before using or contributing
  78. 78. DIVERSITY (WHEN NEEDED) Diversity in Software Engineering Research. Meiyappan Nagappan, Software Analysis and Intelligence Lab, Queen’s University, Kingston, Canada; Thomas Zimmermann, Microsoft Research, Redmond, WA, USA; Christian Bird, Microsoft Research, Redmond, WA, USA. ABSTRACT One of the goals of software engineering research is to achieve generality: Are the phenomena found in a few projects reflective of others? Will a technique perform as well on projects other than the projects it is evaluated on? While it is common sense to select a sample that is representative of a population, the importance of diversity is often overlooked, yet as important. In this paper, we combine ideas from representativeness and diversity and introduce a measure called sample coverage, defined as the percentage of projects in a population that are similar to the given sample. We introduce algorithms to compute the sample coverage for a given set of projects and to select the projects that increase the coverage the most. We demonstrate our technique on research presented over the span of two years at ICSE and FSE with respect to a population of 20,000 active open source projects monitored by […]. Knowing the coverage of a sample enhances our ability to reason about the findings of a study. Furthermore, we propose reporting guidelines for research: in addition to coverage scores, papers should discuss the target population of the research (universe) and dimensions that potentially can influence the outcomes of a research (space). Categories and Subject Descriptors D.2.6 [Software Engineering]: Metrics et al. [2] examined 1,000 projects. Another example is the study by Gabel and Su that examined 6,000 projects [3]. But if care isn’t taken when selecting which projects to analyze, then increasing the sample size does not actually contribute to the goal of increased generality. More is not necessarily better.
As an example, consider a researcher who wants to investigate a hypothesis about say distributed development on a large number of projects in an effort to demonstrate generality. The researcher goes to the website and randomly selects twenty projects, all of them JSON parsers. Because of the narrow range of functionality of the projects in the sample, any findings will not be very representative; we would learn about JSON parsers, but little about other types of software. While this is an extreme and contrived example, it shows the importance of systematically selecting projects for empirical research rather than selecting projects that are convenient. With this paper we provide techniques to (1) assess the quality of a sample, and to (2) identify projects that could be added to further improve the quality of the sample. Other fields such as medicine and sociology have published and accepted methodological guidelines for subject selection [2] [4]. While it is common sense to select a sample that is representative of a population, the importance of diversity is often overlooked yet as important [5]. As stated by the Research Governance Framework
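The abstract defines sample coverage as the percentage of projects in a population that are similar to the given sample. A simplified sketch of that idea follows. The similarity rule used here is an assumption made for illustration (numeric dimensions within a multiplicative `factor`, categorical dimensions equal) and is not claimed to be the paper's exact definition; the function and parameter names are hypothetical.

```python
def similar(a, b, numeric_dims, categorical_dims, factor=2.0):
    """Assumed similarity rule: every numeric dimension differs by at most
    `factor` (multiplicatively) and every categorical dimension is equal."""
    for dim in numeric_dims:
        low, high = sorted((a[dim], b[dim]))
        if high > low * factor:
            return False
    return all(a[dim] == b[dim] for dim in categorical_dims)


def sample_coverage(universe, sample, numeric_dims, categorical_dims, factor=2.0):
    """Fraction of universe projects similar to at least one sampled project."""
    if not universe:
        return 0.0
    covered = sum(
        1 for project in universe
        if any(similar(project, chosen, numeric_dims, categorical_dims, factor)
               for chosen in sample)
    )
    return covered / len(universe)
```

In the JSON-parser example from the paper, twenty projects that are all similar to each other would cover roughly the same slice of the universe as one of them, which is why adding more of the same kind does not improve generality.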
  81. 81. REPRODUCIBILITY • Not just about replication packages • Include details in your paper, which should be self-contained
  82. 82. SUPPORTING TECHNOLOGY Zenodo, Jupyter notebooks, Docker containers, Virtual Machines
  83. 83. REPOSITORIES ARE VOLATILE! • Q&A posts get deleted • GitHub repositories become private, archived, or get deleted • The same may happen to any content available on the Internet
  84. 84. 78% OF PROMPTER’S RECOMMENDATIONS CHANGED AFTER ONE YEAR Empir Software Eng, DOI 10.1007/s10664-015-9397-1. Prompter: Turning the IDE into a self-confident programming assistant. Luca Ponzanelli, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, Michele Lanza. © Springer Science+Business Media New York 2015. Abstract Developers often require knowledge beyond the one they possess, which boils down to asking co-workers for help or consulting additional sources of information, such as Application Programming Interfaces (API) documentation, forums, and Q&A websites. However, it requires time and energy to formulate one’s problem, peruse and process the
  85. 85. IN CONCLUSION… A study you run today may not be reproducible from scratch tomorrow unless you have archived all the data
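One possible way to act on this is to store every mined artifact locally together with a small provenance record: where it came from, when it was retrieved, and a checksum of the stored bytes. The sketch below uses only the Python standard library; the record fields and the function names `snapshot_record` and `matches_snapshot` are assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def snapshot_record(source_url, content, retrieved_at=None):
    """Describe one archived artifact (content is the raw bytes you stored)."""
    retrieved_at = retrieved_at or datetime.now(timezone.utc)
    return {
        "source": source_url,
        "retrieved_at": retrieved_at.isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }


def matches_snapshot(record, content):
    """Check whether some bytes are identical to what was archived."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]
```

A list of such records can be shipped with the replication package, so that readers can tell whether a post or repository has changed since the study was run.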
  87. 87. PRESENTATION QUALITY • I rarely reject a paper because of that • It is not just a matter of getting your paper in, but rather of letting others better understand your work
  88. 88. FOLLOWING A TEMPLATE • There are recurring templates for papers belonging to different categories • Such templates may help the reader know where to find what
  89. 89. EMPIRICAL PAPER • Introduction • Study design (include the data extraction process) • Study results • Threats to validity • Related work • Conclusion
  90. 90. EMPIRICAL STUDY DESIGN • Definition • Research Questions / Hypotheses • Context Selection • Data extraction methodology • Data analysis methodology
  91. 91. TECHNOLOGICAL PAPER • Introduction • Background (if any) • Approach • Empirical evaluation (may be split) • Related Work • Conclusions
  92. 92. NOTE • You do not have to stick to those templates • There can be good reasons to deviate from them
  93. 93. FOR EXAMPLE YOU MAY HAVE • First study needed to understand the problem • Approach definition (based on first study) • Approach evaluation
  94. 94. GAME SMELL PAPER (MSR 2020) Detecting Video Game-Specific Bad Smells in Unity Projects. Antonio Borrelli, Vittoria Nardone, Giuseppe A. Di Lucca, Gerardo Canfora, Massimiliano Di Penta (University of Sannio, Benevento, Italy). ABSTRACT The growth of the video game market, the large proportion of games targeting mobile devices or streaming services, and the increasing complexity of video games trigger the availability of video game-specific tools to assess performance and maintainability problems. This paper proposes UnityLinter, a static analysis tool that supports Unity video game developers to detect seven types of bad smells we have identified as relevant in video game development. Such smell types pertain to performance, maintainability and incorrect behavior problems. After having defined the smells by analyzing the existing literature and discussion forums, we have assessed their relevance with a survey involving 68 participants. Then, we have analyzed the occurrence of the studied smells in 100 open-source Unity projects, and also assessed UnityLinter’s accuracy. Results of our empirical investigation indicate that developers well-received performance- and behavior-related issues, while some maintainability issues are more controversial. UnityLinter is, in general, accurate enough in detecting smells (86%-100% precision and 50%-100% recall), and our study shows that the studied smell types occur in 39%-97% of the analyzed projects. 1 INTRODUCTION Video games represent a conspicuous and increasing share of the software development market. In 2018, the video game industry has generated 134.9 billion dollars, with over 10% increase over 2017 [25]. Such a market is changing continuously also in terms of platforms on which video games are deployed.
In the past, video games mainly targeted consoles and desktop computers; nowadays mobile devices account for nearly half of the market [24], and the current trend is the streaming of video game contents. While the video game market is increasing, development skills in this area still represent a niche. Just to give an idea, Stack Overflow features over 1.5M discussions tagged [java] and 1.2M tagged Android, while only 50k are about Unity3D. It is therefore clear how in this context developers may need suitable support while creating their video games, helping them to avoid introducing performance bottlenecks, or making the game difficult to maintain and evolve. Static code analysis tools (SCAT) are a typical support developers have while coding. Such tools, known also as “linters” (from the first tool developed by Johnson for the C language [28]) analyze the source code or the compiled (e.g., bytecode) program to highlight
  95. 95. YET… One reviewer complained that research questions weren’t all addressed in a single place
  96. 96. RELEASE NOTE GENERATION (TSE 2017) ARENA: An Approach for the Automated Generation of Release Notes. Laura Moreno, Member, IEEE, Gabriele Bavota, Member, IEEE, Massimiliano Di Penta, Member, IEEE, Rocco Oliveto, Member, IEEE, Andrian Marcus, Member, IEEE, and Gerardo Canfora. Abstract: Release notes document corrections, enhancements, and, in general, changes that were implemented in a new release of a software project. They are usually created manually and may include hundreds of different items, such as descriptions of new features, bug fixes, structural changes, new or deprecated APIs, and changes to software licenses. Thus, producing them can be a time-consuming and daunting task. This paper describes ARENA (Automatic RElease Notes generAtor), an approach for the automatic generation of release notes. ARENA extracts changes from the source code, summarizes them, and integrates them with information from versioning systems and issue trackers. ARENA was designed based on the manual analysis of 990 existing release notes. In order to evaluate the quality of the release notes automatically generated by ARENA, we performed four empirical studies involving a total of 56 participants (48 professional developers and eight students). The obtained results indicate that the generated release notes are very good approximations of the ones manually produced by developers and often include important information that is missing in the manually created release notes. Index Terms: Release notes, software documentation, software evolution. 1 INTRODUCTION Release notes summarize the main changes that occurred in a software system since its previous release, such as, the addition of new features, bug fixes, changes to licenses this task by generating simplified release notes (e.g., the Atlassian OnDemand release note generator), yet such notes are limited to list closed issues that developers have manually. IEEE Transactions on Software Engineering, Vol. 43, No. 2, February 2017
  97. 97. CONCLUSION
  99. 99. FROM MSR 2021 CALL FOR PAPERS • Soundness of approach • Relevance to software engineering • Clarity of relation with related work • Quality of presentation • Quality of evaluation [for long papers] • Ability to replicate [for long papers] • Novelty METHODOLOGICAL INFRASTRUCTURE PERSPECTIVE EMPIRICAL TECHNOLOGICAL
  100. 100. TAKEAWAYS • Different types of contributions to MSR, beyond studies, are highly needed • Dataset size and type depend on the study goals and research method • The mining process must be documented and justified in detail
  101. 101. @mdipenta