Build systems orchestrate how human-readable source code is translated into executable programs. In a software project, source code changes can induce changes in the build system (i.e., build co-changes). Due to the complexity of build systems, it is difficult for developers to identify when build co-changes are necessary. Predicting build co-changes works well when there is sufficient training data to build a model. In practice, however, new projects have only a limited number of changes. Using training data from other projects to predict build co-changes in a new project can improve prediction performance. We refer to this problem as cross-project build co-change prediction.
In this paper, we propose CroBuild, a novel cross-project build co-change prediction approach that iteratively learns new classifiers. CroBuild constructs an ensemble of classifiers by iteratively building classifiers and assigning each a weight according to its prediction error rate. Given that only a small proportion of code changes are build co-changing, we also propose an imbalance-aware approach that, in each iteration, learns a threshold boundary between code changes that are build co-changing and those that are not. To examine the benefits of CroBuild, we perform experiments on 4 large datasets (Mozilla, Eclipse-core, Lucene, and Jazz) comprising a total of 50,884 changes. Across the 4 datasets, CroBuild achieves an average F1-score of 0.408. We also compare CroBuild with other approaches: a basic model, AdaBoost proposed by Freund et al., and TrAdaBoost proposed by Dai et al. On average across the 4 datasets, CroBuild yields an improvement in F1-score of 41.54%, 36.63%, and 36.97% over the basic model, AdaBoost, and TrAdaBoost, respectively.
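The weighting scheme described above follows the familiar boosting recipe: each round's classifier receives a vote based on its weighted error rate, and misclassified changes gain weight before the next round. A minimal sketch of that recipe over one-feature threshold stumps (this is not the authors' implementation; CroBuild's imbalance-aware threshold learning and cross-project transfer are omitted, and all names are illustrative):

```python
import math

def train_ensemble(X, y, rounds=5):
    """Train a weighted ensemble of one-feature threshold stumps.

    Each round fits the stump that minimises weighted error, then
    receives a vote weight derived from its error rate, echoing the
    AdaBoost-style weighting that CroBuild builds on."""
    n = len(X)
    w = [1.0 / n] * n            # instance weights
    ensemble = []                # (feature, threshold, vote weight)
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                preds = [1 if row[f] >= t else 0 for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, preds)
        err, f, t, preds = best
        err = min(max(err, 1e-6), 1 - 1e-6)          # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight from error rate
        ensemble.append((f, t, alpha))
        # re-weight instances: misclassified changes gain weight
        w = [wi * math.exp(alpha if p != yi else -alpha)
             for wi, p, yi in zip(w, preds, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stumps; positive score means build co-change."""
    score = sum(a if row[f] >= t else -a for f, t, a in ensemble)
    return 1 if score >= 0 else 0
```

Replacing the fixed decision threshold of 0 with a learned boundary is where an imbalance-aware variant would differ.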
Tracing Software Build Processes to Uncover License Compliance Inconsistencies - Shane McIntosh
Open Source Software (OSS) components form the basis for many software systems. While the use of OSS components accelerates development, client systems must comply with the license terms of the OSS components that they use. Failure to do so exposes client system distributors to possible litigation from copyright holders. Yet despite the importance of license compliance, tool support for license compliance assessment is lacking. In this paper, we propose an approach to extract and analyze the Concrete Build Dependency Graph (CBDG) of a software system by tracing system calls that occur at build-time. Through a case study of seven open source systems, we show that the extracted CBDGs: (1) accurately classify sources as included in or excluded from deliverables with 88%-100% precision and 98%-100% recall, and (2) can uncover license compliance inconsistencies in real software systems - two of which prompted code fixes in the CUPS and FFmpeg systems.
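The core extraction step can be illustrated with a toy trace parser: files written by a build process depend on the files that process read. This is a simplified sketch, not the paper's tooling; it assumes a made-up trace format of `<pid> <syscall>("<path>", ...)` lines, whereas the real approach traces actual build-time system calls (e.g., via strace):

```python
import re
from collections import defaultdict

# Assumed trace line format: '<pid> <syscall>("<path>", ...)'.
TRACE_LINE = re.compile(r'(\d+)\s+(read|write)\("([^"]+)"')

def build_cbdg(trace_lines):
    """Build a concrete build dependency graph: each written file
    depends on every file the same process read."""
    reads, writes = defaultdict(set), defaultdict(set)
    for line in trace_lines:
        m = TRACE_LINE.match(line)
        if not m:
            continue
        pid, call, path = m.groups()
        (reads if call == "read" else writes)[pid].add(path)
    graph = {}  # deliverable/object -> files it was derived from
    for pid, outs in writes.items():
        for out in outs:
            graph.setdefault(out, set()).update(reads[pid] - {out})
    return graph

def sources_of(graph, deliverable):
    """Transitively collect the sources a deliverable includes --
    the basis for the included/excluded classification."""
    seen, stack = set(), [deliverable]
    while stack:
        node = stack.pop()
        for src in graph.get(node, ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen
```

Annotating each node in `graph` with its license would then expose deliverables that transitively include incompatibly licensed sources.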
The document discusses build system maintenance. It notes that build code is complex and requires 12% of a developer's time on average. Build bugs can affect end users. Four dimensions of build maintenance are discussed: size, evolution, coupling, and people involvement. Studies found that build churn is greater than source code churn, and that build maintenance is often dispersed across many team members rather than concentrated in a small team. Tool support is needed to help with build maintenance tasks.
Mining Co-Change Information to Understand when Build Changes are Necessary - Shane McIntosh
As a software project ages, its source code is modified to add new features, restructure existing ones, and fix defects. These source code changes often induce changes in the build system, i.e., the system that specifies how source code is translated into deliverables. However, since developers are often not familiar with the complex and occasionally archaic technologies used to specify build systems, they may not be able to identify when their source code changes require accompanying build system changes. This can cause build breakages that slow development progress and impact other developers, testers, or even users. In this paper, we mine the source and test code changes that required accompanying build changes in order to better understand this co-change relationship. We build random forest classifiers using language-agnostic and language-specific code change characteristics to explain when code-accompanying build changes are necessary based on historical trends. Case studies of the Mozilla C++ system, the Lucene and Eclipse open source Java systems, and the IBM Jazz proprietary Java system indicate that our classifiers can accurately explain when build co-changes are necessary with an AUC of 0.60-0.88. Unsurprisingly, our highly accurate C++ classifiers (AUC of 0.88) derive much of their explanatory power from indicators of structural change (e.g., was a new source file added?). On the other hand, our Java classifiers are less accurate (AUC of 0.60-0.78) because roughly 75% of Java build co-changes do not coincide with changes to the structure of a system, but rather are instigated by concerns related to release engineering, quality assurance, and general build maintenance.
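The classifiers in this setting consume per-change feature vectors. A hedged sketch of the kind of language-agnostic and language-specific change characteristics involved (feature names here are illustrative, not the paper's; the resulting dicts could feed any off-the-shelf random forest implementation, such as scikit-learn's RandomForestClassifier):

```python
# Sketch of change features like those the classifiers use.
def change_features(changed_files):
    """changed_files: list of (path, status) pairs,
    with status in {"A": added, "M": modified, "D": deleted}."""
    added = [p for p, s in changed_files if s == "A"]
    return {
        # language-agnostic indicators
        "files_touched": len(changed_files),
        "files_added": len(added),
        "files_deleted": sum(1 for _, s in changed_files if s == "D"),
        # language-specific structural indicators
        # (e.g., "was a new source file added?")
        "new_cpp_source": any(p.endswith((".cpp", ".cc", ".h")) for p in added),
        "new_java_source": any(p.endswith(".java") for p in added),
    }
```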
Will My Patch Make It? And How Fast? - Yujuan Jiang (Ptidej Team)
The document discusses the Linux kernel patch submission and integration process. It begins by outlining three research questions: how many patches are merged, what type of patch is more likely to be merged, and what type of patch is accepted faster. It then describes the setup of a case study to analyze patch data, including information on patch emails, commits, and metrics. Charts are presented showing the percentage of accepted vs rejected patches each year and that reviewing time has sped up while integration time has slowed down. Decision trees are discussed for classifying patch outcomes.
The document discusses the evolution of ANT build systems over time. It examines two research questions: (1) Do source code and ANT build systems co-evolve? The analysis found that growth and reduction periods in build system size often coincide with those in the source code, suggesting they co-evolve. (2) Does perceived build complexity evolve? Build length and depth were found to increase over time, indicating growing complexity. Other findings include correlations between build size and complexity metrics. Threats to validity and conclusions are also presented.
This document summarizes the results of a large-scale empirical study on the relationship between build systems and build maintenance activity. The study analyzed over 800,000 open source projects to compare how different build technologies (e.g. Make, Autotools, Maven) affect build churn, source code coupling, and authorship over time. The key findings are that framework-based build systems tend to have higher build churn, tighter source code coupling, and decreasing authorship as projects migrate to more advanced technologies.
This document discusses Unicode and its importance for character sets like Arabic, Sinhala, Tamil, and Chinese. It covers Unicode encoding and font faces, explaining that Unicode fonts allow all languages to be displayed together while non-Unicode fonts only support one language. The document also notes some implications of using Unicode, such as improved internationalization and interoperability across systems.
SOCIAL METRICS INCLUDED IN PREDICTION MODELS ON SOFTWARE ENGINEERING: A MAPPING STUDY - Igor Wiese
This document discusses a mapping study on the use of social metrics in software engineering prediction models. It aims to identify which social metrics have been used and whether they had a positive effect. The study found that social metrics were often classified under other dimensions and that terminology was inconsistent. It identified papers reporting on various social metrics and grouped them into categories and sub-categories. The results showed that most papers reported a positive effect of social metrics in prediction models, while some reported negative or neutral effects. The conclusions note that more research is needed on social metrics in different contexts and with larger datasets.
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments - Shane McIntosh
Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
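Clone detection over build specifications can be approximated with a sliding-window line index: identical windows of lines appearing in more than one build file are clone candidates. A crude textual sketch under assumed inputs (not the clone detection tooling used in the paper):

```python
from collections import defaultdict

def clone_groups(build_files, window=3):
    """build_files: dict mapping file name -> list of lines.
    Returns windows of `window` consecutive (whitespace-stripped)
    lines that occur in more than one build file."""
    index = defaultdict(set)
    for name, lines in build_files.items():
        stripped = [ln.strip() for ln in lines if ln.strip()]
        for k in range(len(stripped) - window + 1):
            index[tuple(stripped[k:k + window])].add(name)
    # keep only chunks shared across files, i.e., cross-file clones
    return {chunk: files for chunk, files in index.items() if len(files) > 1}
```

Duplicated configuration details, such as repeated dependency declarations, would surface as shared chunks across many files.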
Identifying Hotspots in Software Build Processes - Shane McIntosh
The document discusses identifying hotspots, or files that frequently change and are costly to rebuild, in software build processes. It presents an approach that constructs a build dependency graph from the build system, analyzes the graph to determine file change frequency and rebuild costs, and detects hotspots using a quadrant plot that highlights files that change often and have high rebuild costs. Case studies on two open source projects found a small number of hotspot files accounted for a majority of rebuild time. Focusing refactoring on hotspots could significantly improve build performance.
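The quadrant idea reduces to two medians: a file is a hotspot when it sits above the median on both change frequency and rebuild cost, i.e., in the upper-right quadrant of the plot. A small sketch, with the input format assumed:

```python
from statistics import median

def find_hotspots(files):
    """files: dict mapping path -> (change_count, rebuild_cost_seconds).
    Hotspots are files above the median on BOTH axes, i.e., the
    upper-right quadrant of the change-frequency/rebuild-cost plot."""
    freqs = [c for c, _ in files.values()]
    costs = [r for _, r in files.values()]
    f_med, c_med = median(freqs), median(costs)
    return sorted(p for p, (c, r) in files.items()
                  if c > f_med and r > c_med)
```

In practice the change counts come from version control history and the rebuild costs from timing a traversal of the build dependency graph.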
Orchestrating Change: An Artistic Representation of Software Evolution - Shane McIntosh
Several visualization tools have been proposed to highlight interesting software evolution phenomena. These tools help practitioners to navigate large and complex software systems, and also support researchers in studying software evolution. However, little work has explored the use of sound in the context of software evolution. In this paper, we propose the use of musical interpretation to support exploration of software evolution data. In order to generate music inspired by software evolution, we use parameter-based sonification, i.e., a mapping of dataset characteristics to sound. Our approach yields musical scores that can be played synthetically or by a symphony orchestra. In designing our approach, we address three challenges: (1) the generated music must be aesthetically pleasing, (2) the generated music must accurately reflect the changes that have occurred, and (3) a small group of musicians must be able to impersonate a large development team. We assess the feasibility of our approach using historical data from Eclipse, which yields promising results.
Tracing Software Build Processes to Uncover License Compliance Inconsistencies - Shane McIntosh
This document discusses using build systems to uncover license compliance inconsistencies. It describes how build systems can be traced to construct dependency graphs and annotate components with license information. An empirical study found that the approach accurately discovered inconsistencies with 88%-100% precision and 98%-100% recall. The technique prompted code changes in two systems within a few days to resolve the license issues it uncovered.
The document discusses the module system of Standard ML. It begins with an introduction to modularity and why it is important for writing large programs. It then provides an overview of Standard ML, including its history, features like higher-order functions and polymorphism. The document focuses on how Standard ML implements modularity using modules. It explains how modules allow hiding implementation details while still allowing controlled sharing between modules. It provides examples of defining modules for stacks and lists to demonstrate abstract types.
AN EMPIRICAL STUDY OF THE USE OF COMMUNICATION TO CHARACTERIZE THE OCCURRENCE OF CHANGE DEPENDENCIES - Igor Wiese
An empirical study analyzed communication among developers and code history metrics to characterize change dependencies in the Ruby on Rails project. The results showed that machine learning models can characterize strong and weak dependencies with high precision using these metrics. The most relevant metrics were comment density, developer centrality, and metrics that measure the structure of the communication network.
Identifying Hotspots in the PostgreSQL Build Process - Shane McIntosh
Software developers rely on a fast and correct build system to compile their source code changes and produce modified deliverables for testing and deployment. The scale and complexity of the PostgreSQL build process makes build performance an important topic to discuss and address.
In this talk, we will introduce a new build performance analysis technique that identifies "build hotspots", i.e., files that are slow to rebuild (by analyzing a build dependency graph), yet change often (by analyzing version control history). We will discuss the identified hotspots in the 9.2.4 release of PostgreSQL.
USING STRUCTURAL HOLES METRICS FROM COMMUNICATION NETWORKS TO PREDICT CHANGE DEPENDENCIES - Igor Wiese
This document examines using structural hole metrics (SHM) from communication networks to predict change dependencies between software artifacts. It finds that SHM can predict change dependencies with an area under the curve over 0.7. Constraint and hierarchy SHM were most important for one project, while commits and updates were most important when including process metrics. The study provides initial evidence that SHM obtained from communication networks can predict change dependencies as suggested by Conway's Law. Future work could explore additional projects, metrics, and comparisons to other software aspects.
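Burt's network constraint, one of the structural hole metrics mentioned, can be computed directly from an adjacency structure: a developer whose contacts are all connected to each other (few structural holes) has high constraint, while a broker bridging otherwise disconnected contacts has low constraint. A self-contained sketch for unweighted, undirected communication networks (a simplification of the general weighted formula):

```python
def proportions(adj, i):
    """p_ij: share of i's ties invested in neighbour j
    (uniform for an unweighted, undirected network)."""
    nbrs = adj[i]
    return {j: 1.0 / len(nbrs) for j in nbrs}

def constraint(adj, i):
    """Burt's network constraint for node i:
    sum over neighbours j of (p_ij + sum_q p_iq * p_qj)^2.
    High values mean i's contacts are redundant (few structural holes)."""
    p_i = proportions(adj, i)
    total = 0.0
    for j in adj[i]:
        indirect = sum(p_i[q] * proportions(adj, q).get(j, 0.0)
                       for q in adj[i] if q != j)
        total += (p_i[j] + indirect) ** 2
    return total
```

Computed over a network of who discusses changes with whom, such scores become per-artifact or per-developer features for the dependency prediction models.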
The Impact of Code Review Coverage and Participation on Software Quality - Shane McIntosh
Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools.
SOCIAL METRICS INCLUDED IN PREDICTION MODELS ON SOFTWARE ENGINEERING: A MAPPI...Igor Wiese
This document discusses a mapping study on the use of social metrics in software engineering prediction models. It aims to identify which social metrics have been used and whether they had a positive effect. The study found that social metrics were often classified under other dimensions and there was inconsistent terminology. It identified papers reporting on various social metrics and grouped them into categories and sub-categories. The results showed that most papers reported social metrics had a positive effect in prediction models, while some reported negative or neutral effects. The conclusions note more research is needed on social metrics in different contexts and using larger datasets.
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Qualit...Shane McIntosh
Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
Identifying Hotspots in Software Build ProcessesShane McIntosh
The document discusses identifying hotspots, or files that frequently change and are costly to rebuild, in software build processes. It presents an approach that constructs a build dependency graph from the build system, analyzes the graph to determine file change frequency and rebuild costs, and detects hotspots using a quadrant plot that highlights files that change often and have high rebuild costs. Case studies on two open source projects found a small number of hotspot files accounted for a majority of rebuild time. Focusing refactoring on hotspots could significantly improve build performance.
Orchestrating Change: An Artistic Representation of Software EvolutionShane McIntosh
Several visualization tools have been proposed to highlight interesting software evolution phenomena. These tools help practitioners to navigate large and complex software systems, and also support researchers in studying software evolution. However, little work has explored the use of sound in the context of software evolution. In this paper, we propose the use of musical interpretation to support exploration of software evolution data. In order to generate music inspired by software evolution, we use parameter-based sonification, i.e., a mapping of dataset characteristics to sound. Our approach yields musical scores that can be played synthetically or by a symphony orchestra. In designing our approach, we address three challenges: (1) the generated music must be aesthetically pleasing, (2) the generated music must accurately reflect the changes that have occurred, and (3) a small group of musicians must be able to impersonate a large development team. We assess the feasibility of our approach using historical data from Eclipse, which yields promising results.
Tracing Software Build Processes to Uncover License Compliance Inconsistencie...Shane McIntosh
This document discusses using build systems to uncover license compliance inconsistencies. It describes how build systems can be traced to construct dependency graphs and annotate components with license information. An empirical study found the approach accurately discovered inconsistencies with 88-100% precision and 98-100% recall. The technique prompted code changes in two systems within a few days to resolve license issues uncovered.
The document discusses the module system of Standard ML. It begins with an introduction to modularity and why it is important for writing large programs. It then provides an overview of Standard ML, including its history, features like higher-order functions and polymorphism. The document focuses on how Standard ML implements modularity using modules. It explains how modules allow hiding implementation details while still allowing controlled sharing between modules. It provides examples of defining modules for stacks and lists to demonstrate abstract types.
UM ESTUDO EMPÍRICO DO USO DA COMUNICAÇÃO PARA CARACTERIZAR A OCORRÊNCIA DE DE...Igor Wiese
[1] Um estudo empírico analisou a comunicação entre desenvolvedores e métricas de histórico de código para caracterizar dependências de mudança no projeto Ruby on Rails. [2] Os resultados mostraram que modelos de aprendizado de máquina podem caracterizar dependências fortes e fracas com alta precisão usando essas métricas. [3] As métricas mais relevantes foram densidade de comentários, centralidade dos desenvolvedores e métricas que medem a estrutura da rede de comunicação.
Identifying Hotspots in the PostgreSQL Build Process (Shane McIntosh)
Software developers rely on a fast and correct build system to compile their source code changes and produce modified deliverables for testing and deployment. The scale and complexity of the PostgreSQL build process make build performance an important topic to discuss and address.
In this talk, we will introduce a new build performance analysis technique that identifies "build hotspots", i.e., files that are slow to rebuild (by analyzing a build dependency graph), yet change often (by analyzing version control history). We will discuss the identified hotspots in the 9.2.4 release of PostgreSQL.
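The hotspot idea described above can be sketched as a simple filter over per-file data; the thresholds, field layout, and file names below are illustrative assumptions:

```python
# Rough sketch of the hotspot criterion: a file is a hotspot if it is both
# costly to rebuild (from the build dependency graph) and frequently
# changed (from version control history).

def find_hotspots(files, cost_threshold, change_threshold):
    """files: mapping name -> (rebuild_cost_seconds, change_count).
    Returns names whose rebuild cost AND change frequency both exceed
    their thresholds."""
    return sorted(
        name for name, (cost, changes) in files.items()
        if cost >= cost_threshold and changes >= change_threshold
    )

hotspots = find_hotspots(
    {"parser.c": (120, 40),   # slow to rebuild AND changes often
     "util.c": (2, 55),       # changes often but rebuilds fast
     "core.h": (300, 1)},     # slow but stable
    cost_threshold=60, change_threshold=10,
)
```

Only files that fail on both dimensions at once are worth refactoring for build speed, which is why neither metric alone suffices.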
USING STRUCTURAL HOLES METRICS FROM COMMUNICATION NETWORKS TO PREDICT CHANGE DEPENDENCIES (Igor Wiese)
This document examines using structural hole metrics (SHM) from communication networks to predict change dependencies between software artifacts. It finds that SHM can predict change dependencies with an area under the curve over 0.7. Constraint and hierarchy SHM were most important for one project, while commits and updates were most important when including process metrics. The study provides initial evidence that SHM obtained from communication networks can predict change dependencies as suggested by Conway's Law. Future work could explore additional projects, metrics, and comparisons to other software aspects.
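As a toy illustration of one SHM, a minimal computation of Burt's constraint over a small communication network might look like this. The four-person network and names are invented, and real studies would typically use a library such as networkx:

```python
# Burt's constraint for an unweighted graph: a node that bridges
# otherwise-disconnected contacts (a "structural hole") is less
# constrained than a node whose contacts all depend on each other.

def constraint(adj, i):
    """adj maps each node to the set of its neighbours."""
    nbrs = adj[i]
    p = 1 / len(nbrs)  # proportion of i's ties invested in each contact
    total = 0.0
    for j in nbrs:
        # indirect investment in j via shared contacts q
        indirect = sum(p * (1 / len(adj[q])) for q in nbrs
                       if q != j and j in adj[q])
        total += (p + indirect) ** 2
    return total

# alice talks to everyone; dave talks only to alice
adj = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}
```

Here alice spans a structural hole (she is dave's only bridge), so her constraint is lower than dave's, whose single tie makes him maximally constrained.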
The Impact of Code Review Coverage and Participation on Software Quality (Shane McIntosh)
Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools.
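The two review metrics can be sketched directly from their definitions. Treating participation as the mean number of reviewers per reviewed change is a simplifying assumption of this sketch; the paper measures reviewer involvement in richer ways:

```python
# Illustrative computation of code review coverage and participation.
# The change records below are invented examples.

def review_coverage(changes):
    """Proportion of changes that received at least one code review."""
    reviewed = sum(1 for c in changes if c["reviewers"])
    return reviewed / len(changes)

def review_participation(changes):
    """Mean number of reviewers per reviewed change (simplified proxy
    for the degree of reviewer involvement)."""
    reviewed = [c for c in changes if c["reviewers"]]
    return sum(len(c["reviewers"]) for c in reviewed) / len(reviewed)

changes = [
    {"id": 1, "reviewers": ["ana", "bo"]},
    {"id": 2, "reviewers": []},        # integrated without review
    {"id": 3, "reviewers": ["ana"]},
]
```

Components whose changes score low on either metric are the ones the study links to additional post-release defects.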
Editor's Notes
Cars have features like AC and power windows... Software has features... Linux -> CPU freq scaling, pluggable module support. You need tools to build a car, and you also need tools... compilers and linkers...
Devs constantly have to rebuild artifacts to test changes... the build system incrementally updates builds.
Firefox 3.0 was built and delivered incorrectly: for users in a networked environment, the address/search bar was broken due to an incorrect version of the SQLite library being linked during the build process. The fix was delivered 4 months late in a service pack (3.0.1).
Based on survey results...
Studied 10 projects of different:
- programming language (C, C++, Java)
- domain (web communications, compilers, RDBMS, IDE...)
- development methodology (open, proprietary)
We find that the build system accounts for 9% of the files in a project (median).
- Small!
1) Churn of the build system is close to that of the source code (normalized).
2) Prior research => high churn may produce bugs!
- measures strength of implication...
Counterintuitive and does not match prior survey results (<12%).
- Mozilla is scary high
- Eclipse-core and Jazz are considerably lower
- Use of PDE build
- devs may not commit all changes in one revision
- Jazz source devs are often responsible for build dev
- Git and Linux are less so
- Jazz distributes build work...
Build maintenance is a nuisance for developers; they need tools to help them cope. E.g., Jazz has tools and lowered build coupling to 4%.
Build maintenance is a nuisance for developers! They need tools to help them cope!