Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
Build systems orchestrate how human-readable source code is translated into executable programs. In a software project, source code changes can induce changes in the build system (a.k.a. build co-changes). Due to the complexity of build systems, it is difficult for developers to identify when build co-changes are necessary. Prediction of build co-changes works well when there is a sufficient amount of training data to build a model. In practice, however, new projects have only a limited history of changes. Using training data from other projects to predict the build co-changes in a new project can help improve the performance of build co-change prediction. We refer to this problem as cross-project build co-change prediction.
In this paper, we propose CroBuild, a novel cross-project build co-change prediction approach that iteratively learns new classifiers. CroBuild constructs an ensemble of classifiers by iteratively building classifiers and assigning them weights according to their prediction error rates. Given that only a small proportion of code changes are build co-changing, we also propose an imbalance-aware approach that, in each iteration, learns a threshold boundary between code changes that are build co-changing and those that are not. To examine the benefits of CroBuild, we perform experiments on 4 large datasets including Mozilla, Eclipse-core, Lucene, and Jazz, comprising a total of 50,884 changes. On average across the 4 datasets, CroBuild achieves an F1-score of up to 0.408. We also compare CroBuild with other approaches such as a basic model, AdaBoost proposed by Freund et al., and TrAdaBoost proposed by Dai et al. On average across the 4 datasets, CroBuild yields an improvement in F1-scores of 41.54%, 36.63%, and 36.97% over the basic model, AdaBoost, and TrAdaBoost, respectively.
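CroBuild's implementation is not reproduced in the abstract; as a rough sketch of the weighted-ensemble idea it describes (iteratively built classifiers, each weighted by its error rate), assuming decision stumps and bootstrap sampling as illustrative base learners — not CroBuild's actual learners:

```python
import math
import random

def train_stump(X, y):
    """Fit a one-feature threshold classifier that minimizes training error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            err = sum((row[f] >= t) != bool(label)
                      for row, label in zip(X, y)) / len(y)
            if best is None or err < best[2]:
                best = (f, t, err)
    f, t, _ = best
    return lambda row: 1 if row[f] >= t else 0

def build_ensemble(X, y, rounds=5, seed=0):
    """Iteratively build classifiers and weight each by its error rate on the
    full training set (lower error -> higher weight), AdaBoost-style."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(rounds):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap
        clf = train_stump([X[i] for i in sample], [y[i] for i in sample])
        err = sum(clf(row) != label for row, label in zip(X, y)) / len(y)
        err = min(max(err, 1e-6), 1 - 1e-6)  # keep the log finite
        ensemble.append((clf, math.log((1 - err) / err)))
    return ensemble

def predict(ensemble, row):
    """Weighted vote over the ensemble."""
    score = sum(w if clf(row) == 1 else -w for clf, w in ensemble)
    return 1 if score > 0 else 0
```

The imbalance-aware threshold learning that CroBuild adds on top is not modeled here; this only illustrates the error-weighted voting.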
Mining Co-Change Information to Understand when Build Changes are Necessary - Shane McIntosh
As a software project ages, its source code is modified to add new features, restructure existing ones, and fix defects. These source code changes often induce changes in the build system, i.e., the system that specifies how source code is translated into deliverables. However, since developers are often not familiar with the complex and occasionally archaic technologies used to specify build systems, they may not be able to identify when their source code changes require accompanying build system changes. This can cause build breakages that slow development progress and impact other developers, testers, or even users. In this paper, we mine the source and test code changes that required accompanying build changes in order to better understand this co-change relationship. We build random forest classifiers using language-agnostic and language-specific code change characteristics to explain when code-accompanying build changes are necessary based on historical trends. Case studies of the Mozilla C++ system, the Lucene and Eclipse open source Java systems, and the IBM Jazz proprietary Java system indicate that our classifiers can accurately explain when build co-changes are necessary with an AUC of 0.60-0.88. Unsurprisingly, our highly accurate C++ classifiers (AUC of 0.88) derive much of their explanatory power from indicators of structural change (e.g., was a new source file added?). On the other hand, our Java classifiers are less accurate (AUC of 0.60-0.78) because roughly 75% of Java build co-changes do not coincide with changes to the structure of a system, but rather are instigated by concerns related to release engineering, quality assurance, and general build maintenance.
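The paper's exact feature set is not listed in the abstract; a minimal sketch of extracting language-agnostic and language-specific change characteristics of this kind from a commit's file list (the feature names and the (path, status) record shape are illustrative assumptions):

```python
def change_features(changed_files):
    """Characteristics of one code change, of the kind fed to a build
    co-change classifier. `changed_files` holds (path, status) pairs with
    status 'A'dded, 'M'odified, or 'D'eleted; the feature set is illustrative."""
    source_ext = {"c", "cc", "cpp", "java", "py"}

    def ext(path):
        return path.rsplit(".", 1)[-1].lower() if "." in path else ""

    return {
        # language-agnostic indicators of structural change
        "files_changed": len(changed_files),
        "files_added": sum(1 for _, s in changed_files if s == "A"),
        "files_deleted": sum(1 for _, s in changed_files if s == "D"),
        "source_files_added": sum(
            1 for p, s in changed_files if s == "A" and ext(p) in source_ext),
        # language-specific: C/C++ header churn often ripples into the build
        "header_files_touched": sum(
            1 for p, _ in changed_files if ext(p) in {"h", "hpp"}),
    }
```

Vectors like these, labeled with whether each commit also touched build files, would then train a classifier such as a random forest.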
Tracing Software Build Processes to Uncover License Compliance Inconsistencies - Shane McIntosh
This document discusses using build systems to uncover license compliance inconsistencies. It describes how build systems can be traced to construct dependency graphs and annotate components with license information. An empirical study found the approach accurately discovered inconsistencies with 88-100% precision and 98-100% recall. The technique prompted code changes in two systems within a few days to resolve the license issues uncovered.
The Impact of Code Review Coverage and Participation on Software Quality - Shane McIntosh
Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools.
Identifying Hotspots in the PostgreSQL Build Process - Shane McIntosh
Software developers rely on a fast and correct build system to compile their source code changes and produce modified deliverables for testing and deployment. The scale and complexity of the PostgreSQL build process make build performance an important topic to discuss and address.
In this talk, we will introduce a new build performance analysis technique that identifies "build hotspots", i.e., files that are slow to rebuild (by analyzing a build dependency graph), yet change often (by analyzing version control history). We will discuss the identified hotspots in the 9.2.4 release of PostgreSQL.
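A minimal sketch of the hotspot idea, combining the two signals described above: rebuild cost from a build dependency graph and change frequency from version control. The graph shape, build times, and cutoffs are hypothetical, not PostgreSQL's actual data:

```python
def rebuild_cost(targets_of, build_time, start):
    """Sum build time over everything that transitively depends on `start`
    in the build dependency graph (file -> targets rebuilt when it changes)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(targets_of.get(node, ()))
    return sum(build_time.get(node, 0) for node in seen)

def hotspots(targets_of, build_time, change_count, cost_cut, churn_cut):
    """Files in the costly-to-rebuild AND frequently-changed quadrant."""
    return sorted(
        f for f, churn in change_count.items()
        if churn >= churn_cut
        and rebuild_cost(targets_of, build_time, f) >= cost_cut
    )
```

A file that changes often but rebuilds cheaply (say, documentation) is not a hotspot, and neither is a slow-to-rebuild file that never changes; only the intersection is flagged.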
Tracing Software Build Processes to Uncover License Compliance Inconsistencies - Shane McIntosh
Open Source Software (OSS) components form the basis for many software systems. While the use of OSS components accelerates development, client systems must comply with the license terms of the OSS components that they use. Failure to do so exposes client system distributors to possible litigation from copyright holders. Yet despite the importance of license compliance, tool support for license compliance assessment is lacking. In this paper, we propose an approach to extract and analyze the Concrete Build Dependency Graph (CBDG) of a software system by tracing system calls that occur at build-time. Through a case study of seven open source systems, we show that the extracted CBDGs: (1) accurately classify sources as included in or excluded from deliverables with 88%-100% precision and 98%-100% recall, and (2) can uncover license compliance inconsistencies in real software systems - two of which prompted code fixes in the CUPS and FFmpeg systems.
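The CBDG extraction itself relies on tracing system calls at build time; a toy sketch of the downstream graph analysis, assuming already-parsed trace records of the form (process, syscall, path) with illustrative syscall labels — not the paper's actual trace format:

```python
def build_cbdg(trace):
    """Build a concrete dependency graph from build-time trace records.
    A process that reads `src` and writes `out` induces the edge src -> out."""
    reads, writes = {}, {}
    for proc, call, path in trace:
        if call == "open_rd":
            reads.setdefault(proc, set()).add(path)
        elif call == "open_wr":
            writes.setdefault(proc, set()).add(path)
    edges = {}
    for proc in set(reads) | set(writes):
        for src in reads.get(proc, ()):
            edges.setdefault(src, set()).update(writes.get(proc, ()))
    return edges

def included_in(edges, source, deliverable):
    """Is `source` transitively built into `deliverable`?"""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node == deliverable:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False
```

Sources reachable from a deliverable's ancestry are classified as included; everything else is excluded, which is the classification the precision/recall figures above evaluate.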
The Bash Dashboard (Or: How to Use Bash for Data Analysis) - Bram Adams
Bash can be used for data analytics tasks like preparing and exploring data. The document demonstrates various Bash commands for working with CSV files containing app data. These include commands for viewing headers, counting rows, filtering, sorting, joining files, and aggregating data. Bash allows string manipulation and piping output between commands to programmatically analyze datasets from the command line.
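As an illustration of the kind of one-liners the talk covers (the file name, columns, and data below are hypothetical, not the talk's actual dataset):

```shell
# Hypothetical sample dataset: apps.csv with columns name,category,downloads
printf 'name,category,downloads\nchess,games,50\nsudoku,games,30\nnotes,tools,20\n' > apps.csv

head -n 1 apps.csv                      # view the header row
tail -n +2 apps.csv | wc -l             # count data rows (excluding the header)
awk -F, '$2 == "games"' apps.csv        # filter rows by category
sort -t, -k3,3nr apps.csv | head -n 1   # row with the most downloads
awk -F, 'NR > 1 { sum[$2] += $3 } END { for (c in sum) print c, sum[c] }' apps.csv  # aggregate downloads per category
```

Each step is an ordinary filter, so the pieces compose freely through pipes — the point of the talk's "dashboard" framing.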
Becoming a Plumber: Building Deployment Pipelines (All Day DevOps) - Daniel Barker
A core part of our IT transformation program is the implementation of deployment pipelines for every application. Attendees will learn how to build abstract pipelines that will allow multiple types of applications to fit the same basic pipeline structure. This has been a big win for injecting change and updating legacy applications.
BenchFlow: A Platform for End-to-end Automation of Performance Testing and An... - Vincenzo Ferme
BenchFlow is an open-source expert system providing a complete platform for automating performance tests and performance analysis. Not all developers are performance experts, but in today's agile environments they need to deal with performance testing and analysis every day. In BenchFlow, users define objective-driven performance tests using an expressive, SUT-aware DSL implemented in YAML. BenchFlow then automates the end-to-end process of executing the performance tests and providing performance insights: deploying the system under test using Docker technologies, distributing the simulated user load across different servers, handling errors, collecting performance data, and computing performance metrics and insights.
My talk for SPEC Research Group DevOps (https://research.spec.org/devopswg) about BenchFlow. Discover BenchFlow: https://github.com/benchflow
Nobody Knows What It’s Like To Be the Bad Man: The Development Process for th... - Work-Bench
Delivered by Max Kuhn, Pfizer Global R&D, and Zachary Deane-Mayer, Cognius, at the inaugural New York R Conference in New York City at Work-Bench on Friday, April 24th, and Saturday, April 25th.
Towards Holistic Continuous Software Performance Assessment - Vincenzo Ferme
In agile, fast, and continuous development lifecycles, software performance analysis is fundamental to confidently release continuously improved software versions. Researchers and industry practitioners have identified the importance of integrating performance testing into agile development processes in a timely and efficient way. However, existing techniques are fragmented and not integrated: they do not account for the heterogeneous skills of users developing polyglot distributed software, nor for their need to automate performance practices throughout the whole lifecycle without breaking its intrinsic velocity. In this paper we present our vision for holistic continuous software performance assessment, which is being implemented in the BenchFlow tool. BenchFlow enables performance testing and analysis practices to be pervasively integrated into continuous development lifecycle activities. Users can specify performance activities (e.g., standard performance tests) by relying on an expressive Domain Specific Language for objective-driven performance analysis. Collected performance knowledge can thus be reused to speed up performance activities throughout the entire process.
My talk from The International Workshop on Quality-aware DevOps (QUDOS 2017). Cite us: http://dl.acm.org/citation.cfm?id=3053636
Becoming a Plumber: Building Deployment Pipelines (RevConf) - Daniel Barker
A core part of our IT transformation program is the implementation of deployment pipelines for every application. Attendees will learn how to build abstract pipelines that will allow multiple types of applications to fit the same basic pipeline structure. This has been a big win for injecting change and updating legacy applications.
.Net Hijacking to Defend PowerShell (BSidesSF 2017) - Amanda Rousseau
With the rise of attacks implementing PowerShell in recent months, there hasn’t been a solid solution for monitoring or prevention. Microsoft recently released the AMSI solution for PowerShell v5; however, it can also be bypassed. This talk focuses on various stealthy runtime .NET hijacking techniques that blue teamers can implement to defend against PowerShell attacks. The talk starts with a light intro to .NET and PowerShell, then gives a deeper explanation of various attacker techniques, explained from the blue teamer's perspective. Techniques include assembly modification, class and method injection, compiler profiling, and C-based function hooking.
The Impact of Task Granularity on Co-evolution Analyses - SAIL_QU
The document discusses how task granularity at different levels (e.g., commits, pull requests, work items) can impact analyses of co-evolution in software projects. It finds that commit-level analysis can overlook relationships between tasks that span multiple commits. Work-item-level analysis is recommended for a more complete view of co-evolution: a median of 29% of work items consist of multiple commits, and analyzing at the commit level would miss 24% of co-changed files and fail to group 83% of related commits.
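A minimal sketch of the granularity difference, assuming (work_item_id, commit_id, changed_files) records (the record shape is illustrative, not the study's actual data model):

```python
from collections import defaultdict

def cochange_sets(commits, granularity="work_item"):
    """Group changed files per task. At work-item granularity, files touched
    in different commits of the same work item still count as co-changed;
    at commit granularity they never do."""
    key = 0 if granularity == "work_item" else 1
    grouped = defaultdict(set)
    for record in commits:
        grouped[record[key]].update(record[2])
    return dict(grouped)
```

If a work item fixes a defect in one commit and updates the build file in a second, only the work-item view reveals that the source file and the build file co-changed.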
This webinar by Oleksandr Navka (Lead Software Engineer, Consultant, GlobalLogic) was delivered at Java Community Webinar #2 on September 17, 2020.
Webinar agenda:
- tools for testing,
- features of creating a context for testing Spring applications,
- context caching to speed up integration testing
More details and presentation: https://www.globallogic.com/ua/about/events/java-community-webinar-2/
Malware Unicorn gives a presentation on reverse engineering (RE) and the common patterns seen in malware. She discusses how RE is the foundation for vulnerability research, malware analysis, exploit development, and more. The talk covers common malware techniques like packing, evasion, cryptography, and shellcode. For each technique, Malware Unicorn explains what to look for in disassembly and provides tips on using debuggers and static analysis to analyze malware that uses these techniques. The overall presentation provides an introduction to RE and guides attendees on identifying and understanding common malware routines through disassembly.
Building a Video Encoding Pipeline at The New York Times - Flávio Ribeiro
These slides were presented at the Streaming Media West conference in 2016. This talk is also a reference for the blog post "Using Microservices to Encode and Publish Videos at The New York Times" on The New York Times Open blog.
- Streaming Media West 2016: http://streamingmedia.com/Conferences/West2016/
- Open Blog: http://open.blogs.nytimes.com/2016/11/01/using-microservices-to-encode-and-publish-videos-at-the-new-york-times/
Identifying Hotspots in Software Build Processes - Shane McIntosh
The document discusses identifying hotspots, or files that frequently change and are costly to rebuild, in software build processes. It presents an approach that constructs a build dependency graph from the build system, analyzes the graph to determine file change frequency and rebuild costs, and detects hotspots using a quadrant plot that highlights files that change often and have high rebuild costs. Case studies on two open source projects found a small number of hotspot files accounted for a majority of rebuild time. Focusing refactoring on hotspots could significantly improve build performance.
Orchestrating Change: An Artistic Representation of Software Evolution - Shane McIntosh
Several visualization tools have been proposed to highlight interesting software evolution phenomena. These tools help practitioners to navigate large and complex software systems, and also support researchers in studying software evolution. However, little work has explored the use of sound in the context of software evolution. In this paper, we propose the use of musical interpretation to support exploration of software evolution data. In order to generate music inspired by software evolution, we use parameter-based sonification, i.e., a mapping of dataset characteristics to sound. Our approach yields musical scores that can be played synthetically or by a symphony orchestra. In designing our approach, we address three challenges: (1) the generated music must be aesthetically pleasing, (2) the generated music must accurately reflect the changes that have occurred, and (3) a small group of musicians must be able to impersonate a large development team. We assess the feasibility of our approach using historical data from Eclipse, which yields promising results.
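A minimal sketch of parameter-based sonification as described above, mapping a series of metric values (e.g., per-release churn) linearly onto MIDI note numbers; the note range and rounding are illustrative choices, not the paper's actual mapping:

```python
def sonify(values, low_note=48, high_note=84):
    """Parameter-based sonification: linearly map each data point onto a
    MIDI note number in [low_note, high_note] (C3..C6 by default)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for constant series
    return [round(low_note + (v - lo) * (high_note - low_note) / span)
            for v in values]
```

A real mapping would drive more parameters than pitch (duration, dynamics, instrumentation) to satisfy the aesthetic and fidelity constraints the paper identifies.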
This document discusses Unicode and its importance for character sets like Arabic, Sinhala, Tamil, and Chinese. It covers Unicode encoding and font faces, explaining that Unicode fonts allow all languages to be displayed together while non-Unicode fonts only support one language. The document also notes some implications of using Unicode, such as improved internationalization and interoperability across systems.
AN EMPIRICAL STUDY OF THE USE OF COMMUNICATION TO CHARACTERIZE THE OCCURRENCE OF... - Igor Wiese
An empirical study analyzed communication among developers and code history metrics to characterize change dependencies in the Ruby on Rails project. The results showed that machine learning models can characterize strong and weak dependencies with high precision using these metrics. The most relevant metrics were comment density, developer centrality, and metrics that measure the structure of the communication network.
SOCIAL METRICS INCLUDED IN PREDICTION MODELS ON SOFTWARE ENGINEERING: A MAPPI... - Igor Wiese
This document discusses a mapping study on the use of social metrics in software engineering prediction models. It aims to identify which social metrics have been used and whether they had a positive effect. The study found that social metrics were often classified under other dimensions and there was inconsistent terminology. It identified papers reporting on various social metrics and grouped them into categories and sub-categories. The results showed that most papers reported social metrics had a positive effect in prediction models, while some reported negative or neutral effects. The conclusions note more research is needed on social metrics in different contexts and using larger datasets.
This document summarizes the results of a large-scale empirical study on the relationship between build systems and build maintenance activity. The study analyzed over 800,000 open source projects to compare how different build technologies (e.g. Make, Autotools, Maven) affect build churn, source code coupling, and authorship over time. The key findings are that framework-based build systems tend to have higher build churn, tighter source code coupling, and decreasing authorship as projects migrate to more advanced technologies.
The document discusses the module system of Standard ML. It begins with an introduction to modularity and why it is important for writing large programs. It then provides an overview of Standard ML, including its history, features like higher-order functions and polymorphism. The document focuses on how Standard ML implements modularity using modules. It explains how modules allow hiding implementation details while still allowing controlled sharing between modules. It provides examples of defining modules for stacks and lists to demonstrate abstract types.
USING STRUCTURAL HOLES METRICS FROM COMMUNICATION NETWORKS TO PREDICT CHANGE ... - Igor Wiese
This document examines using structural hole metrics (SHM) from communication networks to predict change dependencies between software artifacts. It finds that SHM can predict change dependencies with an area under the curve over 0.7. Constraint and hierarchy SHM were most important for one project, while commits and updates were most important when including process metrics. The study provides initial evidence that SHM obtained from communication networks can predict change dependencies as suggested by Conway's Law. Future work could explore additional projects, metrics, and comparisons to other software aspects.
The document discusses build system maintenance. It notes that build code is complex and requires 12% of a developer's time on average. Build bugs can affect end users. Four dimensions of build maintenance are discussed: size, evolution, coupling, and people involvement. Studies found that build churn is greater than source code churn, and that build maintenance is often dispersed across many team members rather than concentrated in a small team. Tool support is needed to help with build maintenance tasks.
The document discusses the Qt framework, including its characteristics, history, advantages, and available modules. Qt enables the development of cross-platform applications and provides a ready-made structure for starting new projects.
BenchFlow: A Platform for End-to-end Automation of Performance Testing and An...Vincenzo Ferme
BenchFlow is an open-source expert system providing a complete platform for automating performance tests and performance analysis. We know that not all the developers are performance experts, but in nowadays agile environment, they need to deal with performance testing and performance analysis every day. In BenchFlow, the users define objective-driven performance testing using an expressive and SUT-aware DSL implemented in YAML. Then BenchFlow automates the end-to-end process of executing the performance tests and providing performance insights, dealing with system under test deployment relying on Docker technologies, distributing simulated users load on different server, error handling, performance data collection and performance metrics and insights computation.
My talk for SPEC Research Group DevOps (https://research.spec.org/devopswg) about BenchFlow. Discover BenchFlow: https://github.com/benchflow
Nobody Knows What It’s Like To Be the Bad Man: The Development Process for th...Work-Bench
Delivered by Max Kuhn, Pfizer Global R&D, and Zachary Deane–Mayer, Cognius, at the inaugural New York R Conference in New York City at Work-Bench on Friday, April 44th, and Saturday, April 25th.
Towards Holistic Continuous Software Performance AssessmentVincenzo Ferme
In agile, fast and continuous development lifecycles, software performance analysis is fundamental to confidently release continuously improved software versions. Researchers and industry practitioners have identified the importance of integrating performance testing in agile development processes in a timely and efficient way. However, existing techniques are fragmented and not integrated taking into account the heterogeneous skills of the users developing polyglot distributed software, and their need to automate performance practices as they are integrated in the whole lifecycle without breaking its intrinsic velocity. In this paper we present our vision for holistic continuous software performance assessment, which is being implemented in the BenchFlow tool. BenchFlow enables performance testing and analysis practices to be pervasively integrated in continuous development lifecycle activities. Users can specify performance activities (e.g., standard performance tests) by relying on an expressive Domain Specific Language for objective-driven performance analysis. Collected performance knowledge can be thus reused to speed up performance activities throughout the entire process.
My talk from The International Workshop on Quality-aware DevOps (QUDOS 2017). Cite us: http://dl.acm.org/citation.cfm?id=3053636
Becoming a Plumber: Building Deployment Pipelines - RevConf – Daniel Barker
A core part of our IT transformation program is the implementation of deployment pipelines for every application. Attendees will learn how to build abstract pipelines that will allow multiple types of applications to fit the same basic pipeline structure. This has been a big win for injecting change and updating legacy applications.
.NET Hijacking to Defend PowerShell (BSidesSF 2017) – Amanda Rousseau
With the rise of attacks implementing PowerShell in recent months, there has not been a solid solution for monitoring or prevention. Microsoft recently released the AMSI solution for PowerShell v5; however, it can also be bypassed. This talk focuses on various stealthy runtime .NET hijacking techniques implemented as blue-team defenses against PowerShell attacks. The paper starts with a light introduction to .NET and PowerShell, then gives a deeper explanation of various attacker techniques, explained from the blue teamer's perspective. Techniques include assembly modification, class and method injection, compiler profiling, and C-based function hooking.
The Impact of Task Granularity on Co-evolution Analyses – SAIL_QU
The document discusses how task granularity at different levels (e.g., commits, pull requests, work items) can impact analyses of co-evolution in software projects. It finds that commit-level analysis can overlook relationships between tasks that span multiple commits. Work-item-level analysis is recommended to provide a more complete view of co-evolution: a median of 29% of work items consist of multiple commits, and analyzing at the commit level would miss 24% of co-changed files and fail to group 83% of related commits.
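The grouping described above can be sketched with a few lines of Python. This is an illustrative sketch, not the study's actual tooling; the `work_item`/`files` record structure and issue keys are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def group_by_work_item(commits):
    """Group commits by the work-item id each commit references."""
    items = defaultdict(list)
    for c in commits:
        items[c["work_item"]].append(c)
    return items

def cochange_pairs(file_sets):
    """All unordered pairs of files that change together."""
    pairs = set()
    for files in file_sets:
        pairs.update(combinations(sorted(files), 2))
    return pairs

def missed_at_commit_level(commits):
    """Co-change pairs visible at the work-item level but invisible
    when each commit is analyzed in isolation."""
    commit_pairs = cochange_pairs(c["files"] for c in commits)
    items = group_by_work_item(commits)
    item_pairs = cochange_pairs(
        set().union(*(c["files"] for c in cs)) for cs in items.values())
    return item_pairs - commit_pairs

# A work item split across two commits hides the a.c/b.c relationship
# from a commit-level analysis:
commits = [
    {"work_item": "ISSUE-1", "files": {"a.c"}},
    {"work_item": "ISSUE-1", "files": {"b.c"}},
    {"work_item": "ISSUE-2", "files": {"c.c", "d.c"}},
]
print(missed_at_commit_level(commits))  # {('a.c', 'b.c')}
```

The toy data shows the paper's point in miniature: the two files of ISSUE-1 never appear in the same commit, so only the work-item view reveals that they co-evolve.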
This webinar by Oleksandr Navka (Lead Software Engineer, Consultant, GlobalLogic) was delivered at Java Community Webinar #2 on September 17, 2020.
Webinar agenda:
- tools for testing,
- features of creating a context for testing Spring-applications,
- context caching to speed up integration testing
More details and presentation: https://www.globallogic.com/ua/about/events/java-community-webinar-2/
These slides were presented at the Streaming Media West conference in 2016. This talk is also a reference to the blog post "Using Microservices to Encode and Publish Videos at The New York Times" at The New York Times Open blog.
- Streaming Media West 2016: http://streamingmedia.com/Conferences/West2016/
- Open Blog:
http://open.blogs.nytimes.com/2016/11/01/using-microservices-to-encode-and-publish-videos-at-the-new-york-times/
Malware Unicorn gives a presentation on reverse engineering (RE) and the common patterns seen in malware. She discusses how RE is the foundation for vulnerability research, malware analysis, exploit development, and more. The talk covers common malware techniques like packing, evasion, cryptography, and shellcode. For each technique, Malware Unicorn explains what to look for in disassembly and provides tips on using debuggers and static analysis to analyze malware that uses these techniques. The overall presentation provides an introduction to RE and guides attendees on identifying and understanding common malware routines through disassembly.
Building a Video Encoding Pipeline at The New York Times – Flávio Ribeiro
Identifying Hotspots in Software Build Processes – Shane McIntosh
The document discusses identifying hotspots, or files that frequently change and are costly to rebuild, in software build processes. It presents an approach that constructs a build dependency graph from the build system, analyzes the graph to determine file change frequency and rebuild costs, and detects hotspots using a quadrant plot that highlights files that change often and have high rebuild costs. Case studies on two open source projects found a small number of hotspot files accounted for a majority of rebuild time. Focusing refactoring on hotspots could significantly improve build performance.
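The quadrant idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the use of median thresholds and the field names are assumptions:

```python
from statistics import median

def find_hotspots(files):
    """Quadrant detection: flag files whose change frequency AND
    rebuild cost are both above the project medians."""
    f_med = median(f["changes"] for f in files)
    c_med = median(f["rebuild_cost"] for f in files)
    return [f["name"] for f in files
            if f["changes"] > f_med and f["rebuild_cost"] > c_med]

# Hypothetical measurements: change counts and rebuild costs (seconds)
files = [
    {"name": "config.h", "changes": 50, "rebuild_cost": 120.0},
    {"name": "util.c",   "changes": 5,  "rebuild_cost": 2.0},
    {"name": "core.c",   "changes": 40, "rebuild_cost": 90.0},
    {"name": "gen.c",    "changes": 3,  "rebuild_cost": 150.0},
]
print(find_hotspots(files))  # ['config.h']
```

Note how `gen.c` is expensive but rarely changes, and `core.c` changes often but rebuilds cheaply relative to the median; only `config.h` lands in the hotspot quadrant where refactoring effort pays off most.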
Orchestrating Change: An Artistic Representation of Software Evolution – Shane McIntosh
Several visualization tools have been proposed to highlight interesting software evolution phenomena. These tools help practitioners to navigate large and complex software systems, and also support researchers in studying software evolution. However, little work has explored the use of sound in the context of software evolution. In this paper, we propose the use of musical interpretation to support exploration of software evolution data. In order to generate music inspired by software evolution, we use parameter-based sonification, i.e., a mapping of dataset characteristics to sound. Our approach yields musical scores that can be played synthetically or by a symphony orchestra. In designing our approach, we address three challenges: (1) the generated music must be aesthetically pleasing, (2) the generated music must accurately reflect the changes that have occurred, and (3) a small group of musicians must be able to impersonate a large development team. We assess the feasibility of our approach using historical data from Eclipse, which yields promising results.
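Parameter-based sonification, i.e., mapping dataset characteristics to sound, can be illustrated with a minimal sketch. The linear metric-to-MIDI-pitch mapping below is an assumption for illustration, not the paper's actual mapping:

```python
def sonify(values, lo_note=48, hi_note=84):
    """Parameter-based sonification: linearly map each data value
    (e.g., monthly code churn) onto a MIDI pitch between C3 (48)
    and C6 (84)."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1  # avoid division by zero on flat data
    return [round(lo_note + (v - vmin) / span * (hi_note - lo_note))
            for v in values]

# Hypothetical churn values for three releases -> three pitches
print(sonify([0, 50, 100]))  # [48, 66, 84]
```

A real pipeline would also map other characteristics (e.g., number of authors, defect counts) onto duration, loudness, or instrument choice, which is where the paper's aesthetic constraints come in.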
This document discusses Unicode and its importance for character sets like Arabic, Sinhala, Tamil, and Chinese. It covers Unicode encoding and font faces, explaining that Unicode fonts allow all languages to be displayed together while non-Unicode fonts only support one language. The document also notes some implications of using Unicode, such as improved internationalization and interoperability across systems.
An Empirical Study of the Use of Communication to Characterize the Occurrence of... – Igor Wiese
An empirical study analyzed communication among developers and code-history metrics to characterize change dependencies in the Ruby on Rails project. The results showed that machine learning models can characterize strong and weak dependencies with high precision using these metrics. The most relevant metrics were comment density, developer centrality, and metrics that measure the structure of the communication network.
SOCIAL METRICS INCLUDED IN PREDICTION MODELS ON SOFTWARE ENGINEERING: A MAPPI... – Igor Wiese
This document discusses a mapping study on the use of social metrics in software engineering prediction models. It aims to identify which social metrics have been used and whether they had a positive effect. The study found that social metrics were often classified under other dimensions and there was inconsistent terminology. It identified papers reporting on various social metrics and grouped them into categories and sub-categories. The results showed that most papers reported social metrics had a positive effect in prediction models, while some reported negative or neutral effects. The conclusions note more research is needed on social metrics in different contexts and using larger datasets.
This document summarizes the results of a large-scale empirical study on the relationship between build systems and build maintenance activity. The study analyzed over 800,000 open source projects to compare how different build technologies (e.g. Make, Autotools, Maven) affect build churn, source code coupling, and authorship over time. The key findings are that framework-based build systems tend to have higher build churn, tighter source code coupling, and decreasing authorship as projects migrate to more advanced technologies.
The document discusses the module system of Standard ML. It begins with an introduction to modularity and why it is important for writing large programs. It then provides an overview of Standard ML, including its history, features like higher-order functions and polymorphism. The document focuses on how Standard ML implements modularity using modules. It explains how modules allow hiding implementation details while still allowing controlled sharing between modules. It provides examples of defining modules for stacks and lists to demonstrate abstract types.
USING STRUCTURAL HOLES METRICS FROM COMMUNICATION NETWORKS TO PREDICT CHANGE ... – Igor Wiese
This document examines using structural hole metrics (SHM) from communication networks to predict change dependencies between software artifacts. It finds that SHM can predict change dependencies with an area under the curve over 0.7. Constraint and hierarchy SHM were most important for one project, while commits and updates were most important when including process metrics. The study provides initial evidence that SHM obtained from communication networks can predict change dependencies as suggested by Conway's Law. Future work could explore additional projects, metrics, and comparisons to other software aspects.
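As an illustration of one of the structural hole metrics named above, here is a simplified sketch of Burt's network constraint on an unweighted, undirected communication graph. The adjacency-list representation is an assumption, and real SHM tooling uses the weighted form of the formula:

```python
def proportions(graph):
    """p[i][j]: the share of i's ties invested in neighbor j
    (undirected, unweighted adjacency lists)."""
    return {i: {j: 1 / len(nbrs) for j in nbrs}
            for i, nbrs in graph.items()}

def constraint(graph, i):
    """Burt's network constraint for node i: high values mean i's
    contacts are redundant, i.e., i spans few structural holes."""
    p = proportions(graph)
    total = 0.0
    for j in graph[i]:
        # indirect investment in j through shared contacts q
        indirect = sum(p[i][q] * p[q][j]
                       for q in graph[i] if q != j and j in graph[q])
        total += (p[i][j] + indirect) ** 2
    return total

open_triad = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
closed_triad = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
# a is less constrained when its contacts are NOT connected to each other:
print(constraint(open_triad, "a"))    # 0.5
print(constraint(closed_triad, "a"))  # 1.125
```

The two triads show the intuition behind Conway's-Law-style prediction: a developer whose contacts also talk to each other (the closed triad) brokers less novel information, and that structural position is what the SHM features feed to the predictor.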
The document discusses build system maintenance. It notes that build code is complex and requires 12% of a developer's time on average. Build bugs can affect end users. Four dimensions of build maintenance are discussed: size, evolution, coupling, and people involvement. Studies found that build churn is greater than source code churn, and that build maintenance is often dispersed across many team members rather than concentrated in a small team. Tool support is needed to help with build maintenance tasks.
The document discusses the Qt framework, including its characteristics, history, advantages, and available modules. Qt enables cross-platform application development and provides a ready-made structure for starting new projects.
What is BIG DATA and how it can influence our lives – Elaine Naomi
The document discusses Big Data, defined as datasets that are difficult to capture, store, analyze, and visualize with current technologies. It presents statistics on the amount of data generated daily and how companies such as Google, Facebook, and Netflix use Big Data analytics. It also covers potential applications in medical diagnosis, education, and partner matching, as well as risks to privacy.
Go Reactive: Building Responsive, Resilient, Elastic & Message-Driven Systems – Jonas Bonér
Abstract:
The demands and expectations for applications have changed dramatically in recent years. Applications today are deployed on a wide range of infrastructure; from mobile devices up to thousands of nodes running in the cloud—all powered by multi-core processors. They need to be rich and collaborative, have a real-time feel with millisecond response time and should never stop running. Additionally, modern applications are a mashup of external services that need to be consumed and composed to provide the features at hand.
We are seeing a new type of applications emerging to address these new challenges—these are being called Reactive Applications. In this talk we will discuss four key traits of Reactive; Responsive, Resilient, Elastic and Message-Driven—how they impact application design, how they interact, their supporting technologies and techniques, how to think when designing and building them—all to make it easier for you and your team to Go Reactive.
Intended Audience:
Programmers, architects, CIO/CTOs and everyone with a desire to challenge the status quo and expand their horizons on how to tackle the current and future challenges in the computing industry.
The document discusses common mistakes made when prioritizing speed over quality, such as de-emphasizing testing, releases, operations, insights, security, and knowledge. It recommends focusing on system design, configurations, limits, growth, processes, resources, and building resilience through redundancies and documentation. Testing the full system, having playbooks, and minding assumptions and dependencies are emphasized.
This study analyzed 10 large open source projects to understand build system maintenance effort. It found that build systems accounted for around 9% of total files on average. Build code evolved at a similar rate to source code, with some projects experiencing higher build churn. Changes to build and source code were often logically coupled, with some work items affecting both. Responsibility for build maintenance was usually distributed across developers rather than concentrated in a small team. The findings suggest build systems require significant effort to maintain and that tool support could help address this.
Empirical Evaluations in Software Engineering Research: A Personal Perspective – SAIL_QU
1. The document discusses various pitfalls and issues with empirical studies in software engineering research, such as including correlated metrics, not handling imbalanced data properly, and not using appropriate validation techniques.
2. It notes that many early studies did not address these issues adequately. Proper techniques like resampling imbalanced data and using out-of-sample validation are important to avoid inaccurate results.
3. The presentation argues that researchers should focus more on deeper analysis and partnering with practitioners, rather than trying to generalize results or follow industry trends without proper evaluation. Trailblazing research and transparency should be encouraged.
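One of the remedies mentioned above, resampling imbalanced data, can be sketched in a few lines. This is a minimal random-oversampling example under assumed row/label structures; real studies often use more sophisticated schemes such as SMOTE:

```python
import random

def oversample(rows, label_key="label", seed=0):
    """Random oversampling: duplicate minority-class rows until both
    classes have equal size -- one simple remedy for class imbalance."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((pos, neg), key=len)
    boosted = minority + [rng.choice(minority)
                          for _ in range(len(majority) - len(minority))]
    return majority + boosted

# Hypothetical defect dataset: 1 defective module vs. 9 clean ones
rows = [{"label": 1, "loc": 10}] + [{"label": 0, "loc": 5}] * 9
balanced = oversample(rows)
print(len(balanced), sum(r["label"] for r in balanced))  # 18 9
```

Crucially, per the presentation's warning about validation, resampling like this must be applied only to the training split, never before the train/test split, or the out-of-sample estimate becomes optimistic.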
The document provides an overview of a course on machine learning, including defining machine learning and artificial intelligence, discussing different applications of machine learning such as speech recognition, robotics, and computer vision, and outlining the topics that will be covered in the course such as classifiers, regression, neural networks, and learning theory. The course aims to provide students with the tools and foundations of machine learning including optimization, statistics, and computer science to solve problems in areas like natural language processing, computer vision, robotics, and medicine.
This document summarizes Tim Sheiner's presentation on how a UX designer can apply their design process to understand DevOps teams and help solve their problems. The UX designer uses personas, mental models, analogies and prototypes to communicate effectively with operations teams. They learn about the teams' perspectives through interviews and observation. Prototypes help establish a shared understanding of problems and iterate on solutions collaboratively and at low risk. The goal is for the UX designer to become a modeler who can effectively communicate and work with operations teams using models.
This document provides an introduction to microservices, including definitions of microservices and where they came from. It discusses core principles like independent services modeled around business domains. It covers reasons for using microservices like faster development cycles and team autonomy. Challenges, good practices, and deployment considerations for microservices are also outlined.
Efficient Query Processing Using Machine Learning – Databricks
This document discusses using machine learning models for efficient and reliable query processing over unstructured data. It presents challenges with directly using ML models for queries due to models being unreliable and expensive to run. The author's work addresses these challenges with two key ideas: (1) using proxy models to generate cheap approximations to reduce oracle model calls, and (2) sampling techniques to provide statistical guarantees on query accuracy while minimizing costs. The techniques are applied to different query types like selection, aggregation, and limit queries. Evaluation shows the methods outperform baselines in achieving accuracy targets with fewer oracle model evaluations. The work also aims to improve ML models by allowing users to specify when errors may be occurring.
The document discusses the challenges of adopting autonomic computing capabilities in existing large-scale systems. Some key challenges discussed include understanding the runtime behavior of complex systems, minimizing the footprint of changes, and gaining developer acceptance through proofs of concept and transparency. The authors describe their approach to addressing these challenges in a large, mission-critical system with millions of lines of code.
The Paved PaaS to Microservices at Netflix (IAS2017 Nanjing) – Yunong Xiao
Traditionally, a tug of war has existed between service reliability and engineering velocity. Increasing speed to fuel product innovation has meant making tradeoffs in reliability.
Netflix standardizes common functionality, like service discovery, configuration, metrics, logging, and RPC across services. This frees teams to focus on the unique business value of their service. It also enables us to evolve and maintain platform components independently from individual services.
Even with a standard set of components, service owners still need to combine these disparate elements into a coherent platform. We reduce this friction by providing a preassembled platform where teams only need to provide their business logic, and not worry about assembling the service from scratch.
We can further streamline the service lifecycle by providing automation and tooling for development, testing, deployment and operations. We provide "one click" solutions to automatically generate the associated pipelines, machinery, and infrastructure that's required to run their service reliably in production.
These patterns, while described in a Netflix context, can be broadly applicable to increase both reliability and velocity of your microservices architecture.
(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument – Amazon Web Services
Today, AdRoll runs its infrastructure by instrumentation: constantly asking empirical questions, analyzing data for answers, and designing new features with instrumentation in mind to understand how functionality will work upon release. AdRoll’s development methodology did not start out this way, however. It took a cultural shift and many new tools and processes to adopt this approach. In this session, AdRoll and Datadog will discuss how to evolve your organization from a state of “flying blind” to a culture focused on monitoring and data-based decisions. Session sponsored by Datadog.
Bypassing Secure Boot using Fault Injection – Riscure
The Fault Injection attack surface of Secure Boot implementations is determined by the specifics of their design and implementation. Using a generic Secure Boot design we detail multiple vulnerabilities (~10) using examples in source code, disassembly and hardware. We will determine what the impact is of the target's design on its Fault Injection attack surface: from high-level architecture to low-level implementation details. Research originally presented in November 2016 at BlackHat Europe.
This document summarizes Chaos Engineering techniques used at T-Mobile for their Cloud Foundry platform. It introduces the tools Monarch and Turbulence++ that were developed to inject failures at the infrastructure and application levels. Examples of chaos attacks demonstrated include killing VMs, blocking network traffic, and crashing application instances. The tools help test the resiliency of the platform and applications deployed on it. Limitations and potential improvements discussed include merging the two tools and supporting multiple clusters.
The importance of model fairness and interpretability in AI systems – Francesca Lazzeri, PhD
Machine learning model fairness and interpretability are critical for data scientists, researchers and developers to explain their models and understand the value and accuracy of their findings. Interpretability is also important to debug machine learning models and make informed decisions about how to improve them.
In this session, Francesca will go over a few methods and tools that enable you to “unpack” machine learning models, gain insights into how and why they produce specific results, assess your AI systems' fairness, and mitigate any observed fairness issues.
Using open-source fairness and interpretability packages, attendees will learn how to:
- Explain model prediction by generating feature importance values for the entire model and/or individual data points.
- Achieve model interpretability on real-world datasets at scale, during training and inference.
- Use an interactive visualization dashboard to discover patterns in data and explanations at training time.
- Leverage additional interactive visualizations to assess which groups of users might be negatively impacted by a model and compare multiple models in terms of their fairness and performance.
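The session relies on open-source interpretability packages; as a rough illustration of how model-agnostic feature importance can be computed, here is a minimal permutation-importance sketch. It is not the API of any specific package, and the toy model and data are hypothetical:

```python
import random

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, metric=accuracy, seed=0):
    """Model-agnostic feature importance: the drop in the metric when
    one feature column is shuffled, severing its link to the target."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:]
                  for row, v in zip(X, shuffled)]
        importances.append(baseline - metric(y, [predict(r) for r in X_perm]))
    return importances

# Toy model that only looks at feature 0; feature 1 should score 0.
X = [[1, 0], [-1, 1], [2, 1], [-2, 0]] * 5
y = [1, 0, 1, 0] * 5
imps = permutation_importance(lambda row: int(row[0] > 0), X, y)
print(imps[1])  # 0.0
```

The same drop-in-metric idea underpins fairness assessment too: evaluating the metric separately per user group reveals which groups a model serves worse.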
This document summarizes Bram Adams' PhD dissertation on understanding the co-evolution of source code and build systems. It introduces the research topic, outlines tools developed to analyze build systems, studies the evolution of the Linux kernel build system over time, presents conceptual reasons for why source code and build systems co-evolve, discusses lessons learned from the PhD research process, and concludes by asking for questions.
Green Custard Friday Talk 19: Chaos Engineering – Green Custard
In Green Custard's 19th Friday talk, Zoltan explores the subject of Chaos Engineering
Topics covered:
- What is chaos engineering?
- Why would anyone do this?
- Availability
- Chaos engineering in practice
- The four golden signals
- Chaos engineering in practice
- Chaos Monkey
- The Simian Army
Green Custard is a custom software development consultancy. To discover more about their work and the team visit www.green-custard.com.
Walls, Pillars and Beams: A 3D Decomposition of Quality Anomalies (vissoft2016) – Yuriy Tymchuk
Quality rules are used to capture important implementation and design decisions embedded in a software system’s architecture. They can automatically analyze software and assign quality grades to its components. To provide a meaningful evaluation of quality, rules have to stay up to date with the continuously evolving system that they describe. However, one encounters unexpected anomalies during a historical overview, because the notion of quality is always changing, while qualitative evolution analysis requires it to remain constant.
To understand the anomalies in a quality history of a real-world software system we use an immersive visualization that lays out the quality fluctuations in three dimensions based on two co-evolving properties: quality rules and source code. This helps us to identify and separate the impact caused by the changes of each property, and allows us to detect significant mistakes that happened during the development process.
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v... – Kim Hammar
1) The document describes a framework for using reinforcement learning and simulation to automatically learn near-optimal intrusion responses for large-scale IT infrastructures.
2) A key challenge is the high sample and computational complexity of scaling reinforcement learning to large infrastructures.
3) The framework addresses this by decomposing the infrastructure into additive subgames and exploiting the optimal substructure property to learn intrusion responses through scalable decomposition methods.
2010-03-31 - VU Amsterdam - Experiences testing safety critical systems – Jaap van Ekris
1) Testing safety critical systems is challenging because software often contains errors and failures can have catastrophic consequences, so systems must be designed and tested to extremely high standards of reliability.
2) The document discusses standards like IEC 61508 that provide requirements for safety integrity levels and risk management in developing safety critical systems.
3) Rigorous verification techniques are needed including reviews, static analysis, unit testing with high code coverage, integration testing of components, system testing of full environments, and acceptance testing of real systems.
Similar to Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments (20)
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris – Neo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest Neo4j innovations, including the newest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
Workshop - Innovating with Generative AI and Knowledge Graphs – Neo4j
Go beyond the AI hype and discover practical techniques for using AI responsibly across your organization's data. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships with LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, with practical, coded examples to get started in minutes.
Using Query Store in Azure PostgreSQL to Understand Query Performance – Grant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
GraphSummit Paris - The art of the possible with Graph Technology – Neo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
What is Augmented Reality Image Tracking? – pavan998932
Augmented Reality (AR) Image Tracking is a technology that enables AR applications to recognize and track images in the real world, overlaying digital content onto them. This enhances the user's interaction with their environment by providing additional information and interactive elements directly tied to physical images.
E-commerce Application Development Company.pdf – Hornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code – Aftab Hussain
Understanding variable roles in code has been found to help students learn programming -- could variable roles also help deep neural models perform coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows a seven-step method for delivering its services to customers, called the software development life cycle (SDLC) process.
Requirement — Collecting the requirements is the first phase in the SDLC process.
Feasibility Study — after completing the requirement process they move to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application is deployed to the live server and launched!
Maintenance — after completing the software development, customers start using the software.
UI5con 2024 - Keynote: Latest News about UI5 and its Ecosystem – Peter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
WhatsApp offers simple, reliable, and private messaging and calling services for free worldwide. With end-to-end encryption, your personal messages and calls are secure, ensuring only you and the recipient can access them. Enjoy voice and video calls to stay connected with loved ones or colleagues. Express yourself using stickers, GIFs, or by sharing moments on Status. WhatsApp Business enables global customer outreach, facilitating sales growth and relationship building through showcasing products and services. Stay connected effortlessly with group chats for planning outings with friends or staying updated on family conversations.
SMS API Integration in Saudi Arabia | Best SMS API Service – Yara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
8 Best Automated Android App Testing Tool and Framework in 2024.pdf – kalichargn70th171
Regarding mobile operating systems, two major players dominate: Android and iOS. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Graspan: A Big Data System for Big Code Analysis – Aftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
1. Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Shane McIntosh (@shane_mcintosh, shanemcintosh@acm.org)
Bram Adams, Ahmed E. Hassan, Martin Poehlmann, Elmar Juergens, Audris Mockus, Brigitte Haupt, Christian Wagner
21. The Build “Tax”
“...nothing can be said to be certain, except death and taxes” - Benjamin Franklin
An Empirical Study of Build Maintenance Effort
S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, A. E. Hassan [ICSE 2011]
Up to 27% of source changes require build changes, too!
How do practitioners cope with build maintenance?
28. Excessive cloning makes build maintenance painful
30 custom business applications
One monolithic build system
1.1 million lines of build logic
Clone coverage of 94%-99%
Inflation due to cloning of 10x-23x
Build changes manually duplicated 30 times
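The two clone metrics quoted above (clone coverage and inflation) can be computed in several ways; this is a minimal sketch, assuming clones are reported as inclusive line ranges. The function names and inputs are illustrative, not the tooling used in the study.

```python
def clone_coverage(total_lines, clone_regions):
    """Proportion of lines that fall inside at least one clone region.

    clone_regions: list of (start, end) inclusive line ranges that a
    clone detector flagged as cloned; overlaps are counted only once.
    """
    cloned = set()
    for start, end in clone_regions:
        cloned.update(range(start, end + 1))
    return len(cloned) / total_lines


def clone_inflation(total_lines, redundancy_free_lines):
    """How many times larger a system is than a redundancy-free version
    of itself (the industrial system above measured 10x-23x)."""
    return total_lines / redundancy_free_lines


# Toy example: 60 of 100 lines covered by two overlapping clone regions.
print(clone_coverage(100, [(1, 40), (30, 60)]))  # 0.6
print(clone_inflation(1_100_000, 100_000))       # 11.0
```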
39. Roadmap:
How much cloning is typical? (Measured vs. Typical)
What type of logic is being cloned? (Configuration details; Packaging specifications)
What can be done to mitigate cloning?
45. Large collection of open source: 3,872 projects
2,597 C/C++ projects (Autotools, CMake)
1,275 Java projects (Ant, Maven)
55. Roadmap update: how much cloning is typical?
Measured: >50% clone coverage is common.
Typical: <30% clone coverage is common.
64. Manual analysis of a statistically representative sample of clones

                  Ant     Maven   Autotools  CMake   Total
All clones        56,521  71,543  23,723     3,746   155,533
Sample (95%±5%)   382     382     378        349     1,491

Config.           32%     79%     22%        40%
Const.            64%     17%     56%        66%
Cert.             12%      4%     13%        11%
Pkg.              25%     21%     21%         2%
Depl.             11%      1%      9%         7%

Cloning shifts from construction to configuration.
Construction is the most heavily cloned C/C++ build phase.
Packaging is rarely cloned in CMake, thanks to CPack.
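The per-technology sample sizes are consistent with a standard 95%±5% sample-size calculation (Cochran's formula with a finite-population correction). The deck does not show the exact sampling procedure, so the following is a reconstruction that happens to reproduce the numbers:

```python
def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite-population correction.

    z=1.96 gives 95% confidence; margin=0.05 is the +/-5% error bound;
    p=0.5 is the most conservative proportion estimate.
    """
    n0 = z**2 * p * (1 - p) / margin**2           # infinite-population size
    return round(n0 / (1 + (n0 - 1) / population))


for technology, clones in [("Ant", 56_521), ("Maven", 71_543),
                           ("Autotools", 23_723), ("CMake", 3_746)]:
    print(technology, sample_size(clones))
# Ant 382, Maven 382, Autotools 378, CMake 349 -- matching the table
```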
67. Roadmap update: what type of logic is being cloned?
Java (Ant/Maven): cloning shifts from construction to configuration details.
C/C++ (Autotools/CMake): mostly construction clones.
71. Teams often migrate from one technology to another (Autotools to CMake; Ant to Maven).
Could technology migration help to reduce cloning?
72. Migration is not a silver bullet
[Figure: distribution of clone coverage (y-axis) over the proportion of systems (x-axis) for Ant, Autotools, CMake, and Maven, with technology-specific abnormality bands ranging from "Very low" to "Very high".]
More cloning in Maven than Ant.
The "high" thresholds are very similar across technologies.
Q: How are they avoiding build cloning?
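The abnormality bands in the plot amount to a seven-way classification of a system's clone coverage against six technology-specific boundaries. A hypothetical sketch follows; the threshold values are one of the six-value sets visible in the plot residue, and their assignment to a technology (and the low-to-high band ordering) are assumptions, not taken from the paper:

```python
# Seven abnormality bands, ordered from lowest to highest clone coverage.
LABELS = ["Very low", "Low", "Moderately low", "Normal",
          "Moderately high", "High", "Very high"]


def abnormality(coverage, thresholds):
    """Classify a clone coverage value against six ascending boundaries."""
    for label, bound in zip(LABELS, thresholds):
        if coverage <= bound:
            return label
    return LABELS[-1]


# Illustrative thresholds only (one six-value set from the slide).
example = [0.39, 0.47, 0.52, 0.70, 0.75, 0.82]
print(abnormality(0.30, example))  # Very low
print(abnormality(0.95, example))  # Very high
```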
76. Using abstraction mechanisms not provided by the build technologies

<!-- Define references to files containing common targets -->
<!DOCTYPE project [
  <!ENTITY modules-common SYSTEM "../modules-common.ent">
]>
...
<project name="bea" default="all">
  <!-- Include the file containing common targets. -->
  &modules-common;
</project>

Listing 1: Using XML entity expansion to reuse common build code in the Keel system.
Store an external block of XML in a macro, then expand the macro where it is needed.
79. Summary:
How much cloning is typical? Measured: >50% clone coverage is common; typical: <30% clone coverage is common.
What type of logic is being cloned? Cloning shifts from construction to configuration (Ant/Maven); mostly construction clones (Autotools/CMake).
What can be done to mitigate cloning? Ant/Maven migration may reduce cloning; use of “creative” abstraction mechanisms.