167 - Productivity for proof engineering



ESEM 2014

This document summarizes a study of proof productivity in formal verification projects. It analyzes data from 9 projects (three specifications and six proofs) from L4.verified, the formal verification of the seL4 microkernel. The study found that total project effort is strongly correlated with final proof size, and that individual contributors' effort is also strongly correlated with the size of their proof output. There was weak evidence that schedule pressure, overall difficulty and maximum team size affect effort, but the sample was too small for these effects to be statistically significant. The study aims to improve understanding of cost-effectiveness in formal proof engineering.


NICTA Copyright 2014
Productivity for Proof Engineering
M. Staples, R. Jeffery, J. Andronick, T. Murray, G. Klein, R. Kolanski

History
• seL4 concluded successfully by end 2007
• 10,000 lines of C code
• 2.2 person years of development effort
• L4.verified (the verification of seL4): > 20 person years
• For cost-effective proof engineering, a key
consideration is proof productivity.

This study - Specs
• Retrospective study of 9 projects from L4.verified.
• All used Isabelle theorem prover.
• Three formal specifications of seL4 –
– Exec – models an executable representation of
seL4’s design
– Abstract – complete functional specification
– CapDL – capabilities (access rights) between
components

This study - Proofs
• Six proofs in total:
– Three refinement proofs: Code-to-exec, Exec-to-abstract, Abstract-to-CapDL.
– Two security proofs: Info.flow and Integrity.
– One CapDL policy proof.

Measures
• Effort – in person weeks
• Output – lines of proof
• Other variables – maximum team size, schedule
pressure, overall difficulty, and years of experience with
Isabelle, with formal methods or theorem proving, and with the
domain (operating systems).

The data
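| Project | Final size (kilo lines of proof) | Total effort (person weeks) | Sched. pressure | Overall difficulty | Max team (headcount) |
|---|---|---|---|---|---|
| CapDL Spec | 2.14 | 27.5 | AV | LO | 5 |
| CapDL-policy proof | 0.85 | 11.3 | LO | AV | 1 |
| Abstract-to-CapDL Refinement | 20.4 | 66 | AV | AV | 5 |
| Integrity | 7.05 | 28.5 | V.HI | HI | 4 |
| Info.Flow | 27.1 | 75.9 | V.HI | V.HI | 8 |
| Exec-to-Abstract Refinement | 96.6 | 368 | HI | V.HI | 6 |
| Code-to-Exec Refinement | 53.34 | 138 | V.HI | HI | 6 |
| Exec Spec Haskell | 6.01 | 92 | AV | HI | 1 |
| Abstract Spec | 4.9 | 15.3 | AV | AV | 3 |

Ratings: LO = low, AV = average, HI = high, V.HI = very high.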
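The deck relates effort to size by regression. A simpler reading of the same table is an average output rate per project. The Python sketch below is an illustration only and is not part of the study: it divides final size by total effort to get lines of proof per person week, using the table's rounded values. Like the table, it treats specification lines and proof lines alike. The names `PROJECTS`, `lines_per_person_week` and `productivity_table` are hypothetical.

```python
# (final size in kilo lines of proof, total effort in person weeks),
# transcribed from the data table.
PROJECTS = {
    "CapDL Spec": (2.14, 27.5),
    "CapDL-policy proof": (0.85, 11.3),
    "Abstract-to-CapDL Refinement": (20.4, 66.0),
    "Integrity": (7.05, 28.5),
    "Info.Flow": (27.1, 75.9),
    "Exec-to-Abstract Refinement": (96.6, 368.0),
    "Code-to-Exec Refinement": (53.34, 138.0),
    "Exec Spec Haskell": (6.01, 92.0),
    "Abstract Spec": (4.9, 15.3),
}


def lines_per_person_week(size_klop: float, effort_pw: float) -> float:
    """Average output rate: lines of proof produced per person week."""
    return size_klop * 1000.0 / effort_pw


def productivity_table(projects):
    """Map each project name to its average lines per person week."""
    return {name: lines_per_person_week(size, effort)
            for name, (size, effort) in projects.items()}


if __name__ == "__main__":
    rates = productivity_table(PROJECTS)
    # Print projects from lowest to highest average rate.
    for name, rate in sorted(rates.items(), key=lambda kv: kv[1]):
        print(f"{name:30s} {rate:6.1f} lines/person-week")
    # Pooled rate: total size over total effort across all nine projects.
    total_size = sum(s for s, _ in PROJECTS.values())
    total_effort = sum(e for _, e in PROJECTS.values())
    print(f"{'Pooled':30s} {lines_per_person_week(total_size, total_effort):6.1f}")
```

On these values the rate ranges from about 65 lines per person week (Exec Spec Haskell) to about 387 (Code-to-Exec Refinement), with a pooled rate of about 265. A ratio like this hides the fixed-cost term that a regression with an intercept captures, which is one reason to prefer the regression.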

Effort – Size Plot for projects

Project relationships
• Total Project Effort = 9.98 + 3.35 × Final Size
(effort in person weeks, final size in kilo lines of proof);
R² = 0.914, p < 0.001
• Possible outliers – the large Exec-to-Abstract refinement and
the executable spec (Exec Spec Haskell).
• Weak evidence that schedule pressure is
associated with decreased effort, and that overall
difficulty and maximum team size are associated with increased
effort. However, the sample is small and these effects are not
significant at the 0.05 level. Experience was not significant.
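The project-level regression can be re-checked from the data table. The Python sketch below fits ordinary least squares to the nine (final size, total effort) pairs and then, as one assumed way of screening for outliers (the deck does not say which method was used), ranks projects by internally studentized residual. All function names are hypothetical. On the rounded table values, the slope (about 3.35) and R² (about 0.914) match the reported figures, while the intercept comes out near 10.2 rather than 9.98, plausibly because the published table values are rounded.

```python
import math

# (final size in kilo lines of proof, total effort in person weeks),
# transcribed from the data table.
PROJECTS = {
    "CapDL Spec": (2.14, 27.5),
    "CapDL-policy proof": (0.85, 11.3),
    "Abstract-to-CapDL Refinement": (20.4, 66.0),
    "Integrity": (7.05, 28.5),
    "Info.Flow": (27.1, 75.9),
    "Exec-to-Abstract Refinement": (96.6, 368.0),
    "Code-to-Exec Refinement": (53.34, 138.0),
    "Exec Spec Haskell": (6.01, 92.0),
    "Abstract Spec": (4.9, 15.3),
}


def ols_fit(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy * sxy / (sxx * syy)


def standardized_residuals(xs, ys, a, b):
    """Internally studentized residuals e_i / sqrt(MSE * (1 - h_i))."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - 2)  # two fitted parameters
    out = []
    for x, e in zip(xs, residuals):
        leverage = 1.0 / n + (x - mx) ** 2 / sxx  # hat value for simple OLS
        out.append(e / math.sqrt(mse * (1.0 - leverage)))
    return out


if __name__ == "__main__":
    names = list(PROJECTS)
    xs = [PROJECTS[k][0] for k in names]
    ys = [PROJECTS[k][1] for k in names]
    a, b, r2 = ols_fit(xs, ys)
    print(f"effort ~= {a:.2f} + {b:.2f} * size   (R^2 = {r2:.3f})")
    # List projects from largest to smallest absolute standardized residual.
    r = standardized_residuals(xs, ys, a, b)
    for name, ri in sorted(zip(names, r), key=lambda t: -abs(t[1])):
        print(f"{name:30s} {ri:+.2f}")
```

Under this screen, the two largest standardized residuals belong to Exec-to-Abstract Refinement (which also has by far the highest leverage, being the largest project) and Exec Spec Haskell, consistent with the possible outliers noted above. With only nine points, such a screen is indicative at best.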

Effort – Size plot for individuals

Individual relationships
• 24 individual contributions to five projects
• Individual effort vs. output size: R² = 0.93, p < 0.001

Threats
• Construct validity
– Possible limitations of lines of proof as a size measure
– Subjective measures were carefully defined
• External validity
– Data from seL4 only, which limits external validity but aids
internal validity
– Generalizability to other projects is not known
• Internal validity
– Wherever possible, measures were carefully defined
and reviewed by multiple people
– Possible influence of factors that were not measured

Conclusions
• Proof engineering can bring the benefits of
formal verification to more software engineering
projects, but its cost-effectiveness is not yet
well understood.
• We find that proof size and effort are strongly related,
both for projects and for individuals, in L4.verified.
• There is significant opportunity for the empirical
community to help understand rework, tools and
techniques, proof patterns, reuse and other aspects of
proof engineering.

More from ESEM 2014

This presentation will challenge the application of metrics and other software engineering practices in commercial companies that do not have to comply with safety/ regulatory standards and thus can chose the SDLC approach that they feel more appropriate and cost effective for the intended purpose. Which are the practices that are really applied to make things happen under the tight constraints of time to market and profitability? A snapshot of the "hands-on" situation from the perspective of a large consulting company engaged with many customers in various markets and domains. Bio: Gualtiero Bazzana is Chairman of ITA-STQB, Head of Marketing WG for ISTQB and Managing Director of Alten Italia. He has been working in the IT domain since 20 years with a long lasting experience in the areas of testing, process improvement, quality. He has authored 50+ papers at international conferences on such subjects

Keynote 2 - The 20% of software engineering practices that contribute to 80% ...

ESEM 2014

Performing quantitative software analytics studies can be an immensely rewarding activity for scientists performing empirical research. However, such studies often pose numerous engineering challenges. The researcher must hunt down appropriate data sets, devise bespoke collection and processing tools, and optimise performance to match the size of the collected data. I will discuss principles and strategies that can be used to deal with these problems, and present examples of associated tools and techniques. Some particularly effective strategies associated with data set construction involve recursion, web searching, synthesis, probing, instrumentation, and the nurturing of alliances. On the processing front approaches include the opportunistic scavenging of tool front-ends, the exploratory development of pipelines, as well as the exploitation of tool interoperability, scripting languages, and their rich libraries. The required performance can be obtained through parallelism, stream processing, the judicious use of low-level facilities, and the choice of appropriate samples. I will finish the presentation with an overview of open problems and challenges in software analytics in vertical domains, data analysis, and under-represented stakeholders.

Keynote 1 - Engineering Software Analytics Studies

ESEM 2014

Context: General knowledge transfer is often considered a valuable effect or side-effect of pair programming, but even more important is its role for the success of the pair programming session itself: The partners often need to explain an idea to carry the process forward. Goal: Understand the mechanisms at work when knowledge is transferred during a pair programming session; provide practical advice for constructive behavior. Method: Qualitative data analysis of recordings of actual industrial pair programming sessions. Results: Some pairs are much more efficient in their knowledge transfer than others. These pairs manage to (1) not attempt to explain multiple things at once, (2) not lose sight of a topic, (3) clarify dicult points in stages. Conclusions: Pair programming requires skill beyond software development skill. To be able to identify knowledge needs and then push such knowledge to or pull it from the partner successfully is one aspect of such skill. We characterize a number of its elements.

33 - On Knowledge Transfer Skill in Pair Programming

ESEM 2014

Context: We investigate class grime, a form of design pattern decay, wherein classes of the pattern realization have extraneous attributes or methods, which obfuscate the intended design of a pattern. Goal: To expand the taxonomy of class grime using properties of class cohesion. Using this expanded taxonomy we explore the effect that forms of class grime have on pattern realization understandability. Method: A pilot study utilizing a formal experiment to explore the effects of class grime on design pattern understandability. The experiments used simulated injection of 8 types of class grime into design pattern realizations randomly selected from 16 design pattern types from a set of 6541 realizations from 520 distinct software systems. Results: We found that for each of the 8 identified class grime forms, understandability was negatively affected. Conclusion: This work serves as early communication of research for the validation of the extended taxonomy as well as the method of grime injection used in the experiment.

222 - Design Pattern Decay: The Case for Class Grime

ESEM 2014

Context: Since human power is an essential resource, the number of contributors in a software development community is one of the health indicators of an open source software (OSS) project. For maintaining and increasing the populations in software development communities, both attract- ing new contributors and retaining existing contributors are important. Goal: Our goal is understanding the current status of projects’ population, especially the different experienced contributors’ composition of the projects. Method: We propose software population pyramids, a graphical illustration of the distribution of various experience groups in a software development community. Results: From the study with OSS projects in GitHub, we found that the shapes of software population pyramids varies depending on the current status of OSS development communities. Conclusions: This paper present a software population pyramid of the distribution of various experience groups in a software community population. Our results can be considered as predictors of the near future of a project.

210 - Software Population Pyramids: The Current and the Future of OSS Develop...

ESEM 2014

Background: Particularly during and after research projects, technology transfer into practice plays an important role for academia to get technologies into use and for industry to improve their development. Objective: Our goal was to gain more and current knowledge about how technology transfer from software engineering (SE) research into industrial practice is accomplished best and how to measure the effectiveness of this transfer. Method: We conducted a study in the context of two German research projects, covering many different organizations from industry and academia. Results: This paper presents the design of the study and the survey performed. After introducing the concept of technology transfer we used and adapted, we present preliminary results. Conclusions: We observed that traditional means such as meetings or workshops are still the most widely used mediums for technology transfer in SE. We also discovered that, even though the duration of transfer depends on the object being transferred, the average duration is three years, which is far less than previously published (~18 years).

169 - Bridging the Gap: SE Technology Transfer into Practice - Study Design a...

ESEM 2014

Context: In the context of the research and development project ARAMiS, multiple partners from research and industry are collaborating in the development of new methods and technologies in the field of multicore systems. Goal: We designed and executed studies for evaluating the results of the ARAMiS sub-project responsible for requirements engineering: an artifact-based requirements engineering approach, its tooling, and a cross-domain scenario. Method: This evaluation was performed along with the dissemination of the results in the project. The evaluation included two studies aimed at collecting the opinions of the project participants regarding the requirements engineering results from the viewpoints of industry and research. Results: The mainly positive results showed us that the different parts of the requirements engineering approach in this project are being accepted. Conclusions: Nonetheless, especially the results for the scenario revealed some weaknesses, such as the so-called “ARAMiS gap”, i.e., a gap between the high-level requirements engineering artifacts and the detailed engineering artifacts.

196 - Evaluation in Practice: Artifact-based Requirements Engineering and Sc...

ESEM 2014

Context: Security requirements for software systems can be challenging to identify and are often overlooked during the requirements engineering process. Existing functional requirements of a system can imply the need for security requirements. Systems having similar security objectives (e.g., confidentiality) often also share security requirements that can be captured in the form of reusable templates and instantiated in the context of a system to specify security requirements. Goal: We seek to improve the security requirements elicitation process by automatically suggesting appropriate security requirement templates implied by existing functional requirements. Method: We conducted a controlled experiment involving 50 graduate students enrolled in a software security course to evaluate the use of automatically-suggested templates in eliciting implied security requirements. Participants were divided into treatment (automatically-suggested templates) and control groups (no templates provided). Results: Participants using our templates identified 42% of all the implied security requirements in the oracle as compared to the control group, which identified only 16% of the implied security requirements. Template usage increased the efficiency of security requirements identified per unit of time. Conclusion: Automatically-suggested templates helped participants (security non-experts) think about security implications for the software system and consider more security requirements than they would have otherwise. We found that participants need more incentive than just a participatory grade when completing the task. Further, we recommend to ensure task completeness, participants either need a step-driven (i.e., wizard) approach or progress indicators to identify remaining work.

42- Using Templates to Elicit Implied Security Requirements from Functional R...

ESEM 2014

Background: The International Software Benchmarking Standards Group (ISBSG) dataset makes it possible to estimate a project’s size, effort, duration, and cost. Aim: The aim was to analyze the ISBSG variables that have been used by researchers for software effort estimation from 2000, when the first papers were published, until the end of 2013. Method: A systematic mapping review was applied to over 167 papers obtained after the filtering process. From these, it was found that 133 papers produce effort estimation and only 107 list the independent variables used in the effort estimation models. Results: Seventy-one out of 118 ISBSG variables have been used at least once. There is a group of 20 variables that appear in more than 50% of the papers and include Functional Size (62%), Development Type (58%), Language Type (53%), and Development Platform (52%) following ISBSG recommendations. Sizing and Size attributes altogether represent the most relevant group along with Project attributes that includes 24 technical features of the project and the development platform. All in all, variables that have more missing values are used less frequently. Conclusions: This work presents a snapshot of the existing usage of ISBSG variables in software development estimation. Moreover, some insights are provided to guide future studies.

166 - ISBSG variables most frequently used for software effort estimation: A ...

ESEM 2014

Context: Onboarding is a process that helps newcomers become integrated members of their organisation. Successful onboarding programs can result in increased performance in conventional organisations, but there is little guidance on how to onboard new developers in Open Source Software (OSS) projects. Goal: In this study, we examine how mentoring and project characteristics influence the effectiveness and efficiency of the onboarding process. We study a collaboration program involving a total of nine Open Source Software projects and more than 120 students from dfferent universities around the world as part of Facebook’s Education Modernization Program. Method: We use quantitative measurements of source code repositories, issue tracking systems, and discussion fora to examine how newcomers become contributing members of their OSS projects. Results: We found that developers receiving deliberate onboarding support through mentoring were more active at an earlier stage than developers entering projects through conventional means. Also, we found that project size and lifetime influenced onboarding. Conclusion: Empirical decision support can contribute to a more effective onboarding process in OSS projects. Mentor support in critical stages can accelerate the process, but project maturity is also a significant factor that increases the effect of onboarding.

112 - The Role of Mentoring and Project Characteristics for Onboarding in Ope...

ESEM 2014

Context: Software release teams try to reduce the time needed for the transit of features or bug fixes from the development environment to the production, crossing all the quality gates. However, little is known about the factors that influence the time-to-production and how they might be controlled in order to speed up the release cycles. Goal:This paper examines step by step the release process of an industrial software organization aiming to identify factors that have a significant impact on the lead time and outcomes of the software releases. Method:Over 14 months of release data have been analyzed (246 releases from the isolated source code branches to the production environment). Results:We discuss three dimensions under which a series of factors could be addressed: technical, organizational, and interactional. We present our findings in terms of implications for release process improvements. Conclusions: Our analyzes reveal that testing is the most time consuming activities (86%) along with the need for more congruence among teams, especially in the context of parallel development.

224 - Factors Impacting Rapid Releases: An Industrial Case Study

ESEM 2014

Context: The low quality and small size of samples in empirical studies in software engineering hamper the interpretation and generalization of their results. Therefore, enlarging sample sizes and improving their quality represent an important research challenge. Goal: We aim to define a conceptual framework, including requirements for establishing adequate sources for sampling subjects in software engineering surveys. Method: We use previous experience on applying systematic sampling strategies combined with contemporary web technologies in previously executed surveys, to organize the conceptual framework. We analyze its application to different sources of sampling. Results: The framework was observed to be feasible after its application to nine different large-scale sources of sampling. Conclusions: The analyzed crowdsourcing tools do not support essential requirements to be considered sources of sampling, while free- lancing tools and professional social networks do.

215 - Towards a Framework to Support Large Scale Sampling in Software Engineeri...

ESEM 2014

Context: Small and non-probabilistic samples are a relevant concern when discussing the external validity of empirical studies in Software Engineering. Goal: To investigate alternatives for improving the quality of samples (size, heterogeneity, and level of confidence). Method: To replicate a survey on characteristics of agility in software processes by applying a systematic recruitment strategy over a professional social network. Results: This produced a sampling frame composed of 19 groups, stratified according to two perspectives: the sharing of members among groups and the main software engineering skills reported by the subjects. In total, 7,745 subjects were randomly recruited, resulting in 291 contributions. Conclusions: This sample was significantly larger and more heterogeneous than the samples of previous trials, and some of its strata have higher confidence levels.

214 - Sampling Improvement in Software Engineering Surveys

ESEM 2014

Context – A common problem with systematic reviews in software engineering is that they provide very limited syntheses. Goal – In search of effective methods for synthesizing empirical evidence, in this paper we explore the Qualitative Metasummary method, a quantitatively oriented aggregation of mixed research findings. Method – We describe the use of qualitative metasummary through an example using 15 studies that address antecedents of the performance of software development teams. Qualitative metasummary involves extracting and grouping findings, and calculating frequency and intensity effect sizes. Results – The instance described in this paper produced a 10-factor model that effectively summarizes current empirical knowledge on the performance of software development teams. We then assessed the method in terms of ease of use, usefulness, and reliability of results. Conclusion – The Qualitative Metasummary method offers rich indexes of the experiences and events under investigation, focusing on the effects of one variable on another, which is consistent with the central interest of systematic reviews. However, its main limitations are (i) the difficulty of comparing and integrating primary studies, (ii) the loss of detailed contextual information, and (iii) the great deal of effort needed to synthesize larger sets of papers.

201 - Using Qualitative Metasummary to Synthesize Empirical Findings in Liter...

ESEM 2014
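As a rough illustration of the frequency and intensity effect sizes mentioned in the metasummary entry above, the sketch below assumes the definitions commonly used in qualitative metasummary (the abstract itself does not spell them out): the frequency effect size of a finding is the share of reports that contain it, and the intensity effect size of a report is the share of all distinct findings that the report contains. The study IDs and finding labels are hypothetical.

```python
from typing import Dict, Set

# Hypothetical input: study ID -> set of (already grouped) findings in that study.
Reports = Dict[str, Set[str]]


def frequency_effect_sizes(reports: Reports) -> Dict[str, float]:
    """Assumed definition: share of reports that contain each finding."""
    if not reports:
        return {}
    all_findings = set().union(*reports.values())
    return {
        finding: sum(finding in found for found in reports.values()) / len(reports)
        for finding in sorted(all_findings)
    }


def intensity_effect_sizes(reports: Reports) -> Dict[str, float]:
    """Assumed definition: share of all distinct findings contained in each report."""
    all_findings = set().union(*reports.values())
    if not all_findings:
        return {study: 0.0 for study in reports}
    return {study: len(found) / len(all_findings) for study, found in reports.items()}


if __name__ == "__main__":
    # Hypothetical studies and findings, for illustration only.
    reports = {
        "S1": {"cohesion", "leadership"},
        "S2": {"cohesion"},
        "S3": {"cohesion", "autonomy"},
    }
    print(frequency_effect_sizes(reports))  # "cohesion" appears in all 3 reports -> 1.0
    print(intensity_effect_sizes(reports))  # S2 holds 1 of the 3 distinct findings
```

Frequency answers "how widely is this finding replicated?", while intensity answers "how much does this one study contribute?", which is why the two are reported side by side.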

Context – Motivation and job satisfaction are not the same thing, and although business organisation research recognised this long ago, Software Engineering research has not. As a result, thirty years of research on motivation in software engineering has produced knowledge on what makes software engineers generally happier, but not on how to increase their motivation. Goal – In this article, we aim to identify visible signs of a software engineer who is motivated to work. Method – We describe a field study in which 62 practitioners in Brazil reported their view of "motivation" in the context of their practical work. Data was collected through audio-recorded semi-structured interviews, and a thematic analysis was applied to identify the most relevant descriptors of motivation. Results – Our data reveal that (1) motivated software engineers are engaged, focused, and collaborative; and (2) the term "motivation" is used as an umbrella term covering several distinct organizational behaviours that are not necessarily related to the individual's desire to work. Conclusions – Without a clear picture of the difference between these two concepts, work-based motivation programs may not be designed effectively to address either turnover or performance issues. Overall, this work indicates the need for a more effective conceptual system to investigate and encourage both job satisfaction and work motivation in software engineering research and practice.

130 - Motivated software engineers are engaged and focused, while satisfied o...

ESEM 2014

Context: Duplicate detection is a fundamental part of issue management. Systems that can predict whether a new defect report will be closed as a duplicate may decrease costs by limiting rework and collecting related pieces of information. Goal: Our work explores using Apache Lucene for large-scale duplicate detection based on textual content. We also evaluate the previous claim that results improve if the title is weighted as more important than the description. Method: We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using Lucene to search the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects accuracy. Results: We show that Lucene obtains the best results when the defect report title is weighted three times higher than the description, a larger difference than previously acknowledged. Conclusions: Our work shows the potential of Lucene as a scalable solution for duplicate detection.

178 - A replicated study on duplicate detection: Using Apache Lucene to searc...

ESEM 2014
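The effect of weighting the title more heavily than the description, as in the duplicate-detection entry above, can be sketched without Lucene. This is a minimal sketch under stated assumptions: each field is compared with plain term-frequency cosine similarity and the two field scores are combined as a weighted sum. It does not reproduce Lucene's actual scoring or the study's configuration; the function names, report IDs, and texts are hypothetical, and only the 3:1 default ratio comes from the abstract.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (lowercased whitespace tokens)."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * tb[term] for term, count in ta.items())
    norm = math.sqrt(sum(c * c for c in ta.values())) * math.sqrt(sum(c * c for c in tb.values()))
    return dot / norm if norm else 0.0


def rank_candidates(
    new_title: str,
    new_description: str,
    existing: Sequence[Tuple[str, str, str]],
    title_weight: float = 3.0,
    description_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank existing (id, title, description) reports as possible duplicates.

    Score = weighted sum of per-field similarities. The 3:1 default mirrors the
    ratio reported in the abstract; the scoring itself is NOT Lucene's.
    """
    scored = [
        (
            report_id,
            title_weight * cosine(new_title, title)
            + description_weight * cosine(new_description, description),
        )
        for report_id, title, description in existing
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical defect reports, for illustration only.
    existing = [
        ("A-1", "camera crash at startup", "opening camera closes app"),
        ("A-2", "wifi drops on sleep", "camera works but wifi disconnects"),
    ]
    ranked = rank_candidates("camera crashes on startup", "app closes when opening camera", existing)
    print(ranked[0][0])  # A-1
```

Raising `title_weight` relative to `description_weight` shifts the ranking toward reports whose titles match, which is the knob the replication varied.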

Context: Gaining an identity and building a good reputation are important motivations for Open Source Software (OSS) developers, but it is unclear whether these motivations have any actual impact on OSS project success. Goal: To identify how an OSS developer's reputation affects the outcome of his/her code review requests. Method: We conducted a social network analysis (SNA) of the code review data from eight popular OSS projects. Working on the assumption that core developers have a better reputation than peripheral developers, we developed an approach, Core Identification using K-means (CIK), that divides OSS developers into core and periphery groups based on six SNA centrality measures. We then compared the outcome of the code review process for members of the two groups. Results: The results suggest that core developers receive quicker first feedback on their review requests, complete the review process in less time, and are more likely to have their code changes accepted into the project codebase. Peripheral developers may have to wait 2 to 19 times (or 12 to 96 hours) longer than core developers for the review of their code to complete. Conclusion: We recommend that projects allocate resources or create tool support to triage code review requests, so that quick feedback motivates prospective developers.

124 - Impact of Developer Reputation on Code Review Outcomes in OSS Projects:...

ESEM 2014
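A minimal sketch of the core/periphery split described in the code review entry above, assuming each developer is represented by a vector of six centrality scores and that plain 2-means clustering is an acceptable stand-in for CIK. The abstract does not name the six measures or give the initialization, normalization, or labeling rules, so those choices below, together with the developer names and scores, are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def _squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Sequence[float]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def split_core_periphery(
    centrality: Dict[str, Sequence[float]], max_iterations: int = 100
) -> Set[str]:
    """Return the developers placed in the 'core' cluster by 2-means clustering.

    Assumptions (not from the study): features are already on comparable scales,
    centroids start at the developers with the lowest and highest total centrality,
    and the cluster whose centroid has the larger total centrality is 'core'.
    """
    names = list(centrality)
    lowest = min(names, key=lambda n: sum(centrality[n]))
    highest = max(names, key=lambda n: sum(centrality[n]))
    centroids = [list(centrality[lowest]), list(centrality[highest])]
    assignment: Dict[str, int] = {}
    for _ in range(max_iterations):
        # Assignment step: each developer joins the nearest centroid.
        new_assignment = {
            n: min((0, 1), key=lambda k: _squared_distance(centrality[n], centroids[k]))
            for n in names
        }
        if new_assignment == assignment:
            break  # converged: no developer changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [centrality[n] for n in names if assignment[n] == k]
            if members:
                centroids[k] = _mean(members)
    core = max((0, 1), key=lambda k: sum(centroids[k]))
    return {n for n in names if assignment[n] == core}


if __name__ == "__main__":
    # Hypothetical developers, each with six hypothetical centrality scores.
    developers = {
        "alice": [0.9, 0.8, 0.7, 0.9, 0.8, 0.9],
        "bob": [0.8, 0.9, 0.8, 0.7, 0.9, 0.8],
        "carol": [0.1, 0.2, 0.1, 0.0, 0.1, 0.2],
        "dave": [0.2, 0.1, 0.0, 0.1, 0.2, 0.1],
        "erin": [0.1, 0.0, 0.1, 0.2, 0.1, 0.0],
    }
    print(sorted(split_core_periphery(developers)))  # ['alice', 'bob']
```

Seeding the centroids at the extremes keeps the result deterministic; a library k-means with random restarts would be the more usual choice on real data.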

Context: One limitation of empirical studies of test-driven development (TDD) is knowing whether the developers followed the advocated test-code-refactor cycle. Research has dealt with the issue of process conformance only in terms of internal validity, while investigating the role of other confounding variables that might explain the controversial effects of TDD. None of this research included process conformance as a fundamental part of the analysis. Goal: We aim to examine the impact of process conformance on the claimed effects of TDD on external quality, developers' productivity, and test quality. Method: We used data collected during a previous study to create regression models in which the level of process conformance was used to predict external quality, productivity, and test thoroughness. Result: Based on our analysis of the available data (n = 22), we observe that neither quality (p value = 0.21), productivity (p value = 0.80), number of tests (p value = 0.39), nor coverage (p value = 0.09) was correlated with the level of TDD process conformance. Conclusion: While our results are based on a small sample, they raise concerns about how TDD is interpreted. We also question whether the cost of strictly following TDD will pay off in terms of external quality, productivity, and test thoroughness.

18 - Impact of Process Conformance on the Effects of Test-driven Development

ESEM 2014

Context: Real-time speech translation technology is available today, but there is still no complete understanding of how such technology may affect communication in global software projects. Goal: To investigate the adoption of combined speech recognition and machine translation to overcome language barriers among stakeholders who are remotely negotiating software requirements. Method: We performed an empirical simulation-based study involving the Google Web Speech API and the Google Translate service, two groups of four subjects speaking Italian and Brazilian Portuguese, and a test set of 60 technical and non-technical utterances. Results: Our findings revealed that, overall, (i) satisfactory speech recognition accuracy was achieved, although it was significantly affected by speaker and utterance differences; and (ii) adequate translations tend to follow accurate transcripts, meaning that speech recognition is the most critical part of speech translation technology. Conclusions: The results provide positive, albeit initial, evidence that speech translation technologies could help globally distributed team members communicate in their native languages.

65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...

ESEM 2014

Context: Software testing is a crucial step in most software development processes. Testing is a key means of managing and assessing the risk of shipping quality products to customers. But testing is also expensive, and changes to the system need to be tested thoroughly, which may take time. Thus, the quality of a software product depends on the quality of its underlying testing process and on the effectiveness and reliability of individual test cases. Goal: In this paper, we investigate the impact of the organizational structure of test owners on the reliability and effectiveness of the corresponding test cases. Prior empirical research on organizational structure has focused only on developer activity; we expand the scope of empirical knowledge by assessing the impact of organizational structure on testing activities. Method: We performed an empirical study on the Windows build verification test (BVT) suites, relating the effectiveness and reliability measures of each test run to the complexity and size of the organizational sub-structure that encloses all owners of the test cases executed. Results: Our results show that organizational structure impacts both test effectiveness and test execution reliability. We are also able to predict effectiveness and reliability with fairly high precision and recall. Conclusion: We suggest reviewing test suites with respect to their organizational composition. As the results of this study indicate, doing so would increase effectiveness and reliability, development speed, and developer satisfaction.

52 - The Impact of Test Ownership and Team Structure on the Reliability and E...

ESEM 2014
