Continuous Integration (CI) environments cope with the repeated integration of source code changes and provide rapid feedback about the status of a software project. However, as the integration cycles become shorter, the amount of data increases, and the effort to find information in CI environments becomes substantial. In modern CI environments, the selection of measurements (e.g., build status, quality metrics) listed in a dashboard changes only with the intervention of a stakeholder (e.g., a project manager). In this paper, we address the shortcoming of static views with so-called Software Quality Assessment (SQA) profiles. SQA-Profiles are defined as rulesets and enable a dynamic composition of CI dashboards based on stakeholder activities in tools of a CI environment (e.g., version control system). We present a set of SQA-Profiles for project management committee (PMC) members: Bandleader, Integrator, Gatekeeper, and Onlooker. For this, we mined the commit and issue management activities of PMC members from 20 Apache projects. We implemented a framework to evaluate the performance of our rule-based SQA-Profiles in comparison to a machine learning approach. The results showed that project-independent SQA-Profiles can be used to automatically extract the profiles of PMC members with a precision of 0.92 and a recall of 0.78.
1. Software Quality Assessment (SQA) Profiles
Rule-Based Activity Profiles for Continuous Integration Environments
Martin Brandtner, Sebastian Müller, Philipp Leitner, and Harald C. Gall
Department of Informatics, University of Zurich, Switzerland
{brandtner, smueller, leitner, gall}@ifi.uzh.ch
IEEE SANER 2015, Montréal, Canada
4. Continuous Integration Environment
[Figure: CI environment. Components: issue tracker, version control system, CI platform, status dashboard. Labels: activity data, stakeholder profiles, data recommendation.]
5. Stakeholder Roles ≠ Stakeholder Profiles
Stakeholder Roles
• are defined by the project management
• may not reflect the actual field of activity
• do not change during a project
Stakeholder Profiles
• can change over time
• are based on actual activity data
6. Research Question 1
Can activity data mined from the version
control system and issue tracking platform
be used for the extraction of profiles
within the Project Management Committee
role?
7. Research Question 2
What profiles of PMC members can be
extracted from the activity data, and how
can these profiles be described in a rule-based
model?
8. Approach
1) Extraction of profiles from 20 projects by
clustering
2) Definition of a rule-based model to describe
the extracted profiles (SQA-Profiles)
3) Evaluation of the rule-based profile model
9. Profile Extraction by Clustering
[Figure: VCS data and issue data from 20 Apache projects are combined per stakeholder and clustered into 4 profiles.]
Activity data:
# Commits
# Merges
# Issue state changes
# Issue comments
# Issue assignee changes
# Issue priority changes
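The clustering step can be illustrated with a small k-means sketch in plain Python. The speaker notes name k-means as the clustering method. Everything else here is an assumption made for illustration: the two-dimensional vectors (commits, merges), the sample counts, the squared Euclidean distance, and the deterministic start from the first k points. It is not the authors' setup.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two activity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points: List[Vector], k: int, iterations: int = 100) -> List[int]:
    """Return one cluster label per point (plain Lloyd iteration)."""
    # Assumption: start from the first k points so the run is deterministic.
    centroids = list(points[:k])
    labels: List[int] = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move every centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels


# Hypothetical (commits, merges) counts for four stakeholders.
activity = [(1.0, 0.0), (2.0, 1.0), (40.0, 30.0), (42.0, 28.0)]
print(k_means(activity, k=2))
```

With six activity attributes per stakeholder, the vectors would simply have six components, and k would be set to the number of profiles looked for.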
10. Profile Extraction by Rule Inference
Goal:
Rule-based and project-independent description of
activity profiles
Approach:
Attributes: commits, merges, issue state changes, etc.
Nominal scale for each attribute and project
Profiles defined based on attribute values
(e.g. commits: MEDIUM, merges: HIGH => Profile A)
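A per-project nominal scale could be sketched as follows. The slides do not say how the LOW/MEDIUM/HIGH boundaries are chosen, so the cut points used here (one third and two thirds of the highest count in the project) are an assumption. The function name `to_nominal` and the sample counts are hypothetical.

```python
from typing import Dict


def to_nominal(counts: Dict[str, int]) -> Dict[str, str]:
    """Map the raw counts of one attribute in one project to LOW/MEDIUM/HIGH.

    Assumption: cut points at 1/3 and 2/3 of the project maximum.
    """
    top = max(counts.values(), default=0)
    levels = {}
    for stakeholder, count in counts.items():
        if top == 0 or count < top / 3:
            levels[stakeholder] = "LOW"
        elif count < 2 * top / 3:
            levels[stakeholder] = "MEDIUM"
        else:
            levels[stakeholder] = "HIGH"
    return levels


# Hypothetical commit counts of three stakeholders in one project.
commits = {"ann": 90, "dave": 40, "eve": 5}
print(to_nominal(commits))
# {'ann': 'HIGH', 'dave': 'MEDIUM', 'eve': 'LOW'}
```

Because the scale is computed per project and per attribute, the same raw count can map to different levels in different projects, which is what keeps the profile rules project-independent.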
IEEE SANER 2015, Montréal, Canada 9

11. Extracted SQA-Profiles – Integrator
Integration of source code contributions
High merging activity
At least one other attribute with medium activity
HH = At least one attribute with high activity / HM = At least one attribute with medium activity
SH = Set of all stakeholders / A = Set of all attributes
12. Extracted SQA-Profiles
Bandleader
Keeps the show running
High activity in each attribute
Gatekeeper
Decides when the status of an issue changes
High status change activity and moderate activity in
assignee changes or commits
Onlooker
Limited contributions (VCS and issue tracking)
At least one attribute with medium activity and at least
two attributes with low activity
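The four profile descriptions can be read as a small rule set over nominal attribute levels. The sketch below is one possible encoding, not the authors' implementation. Three things are assumptions: "medium" and "moderate" are read as "MEDIUM or higher" for Integrator and Gatekeeper, the rules are checked in the order Bandleader, Integrator, Gatekeeper, Onlooker, and the attribute keys are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical attribute keys for the six activity counts.
ATTRIBUTES = (
    "commits", "merges", "state_changes",
    "comments", "assignee_changes", "priority_changes",
)
RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def at_least(level: str, floor: str) -> bool:
    """True if `level` is `floor` or higher on the nominal scale."""
    return RANK[level] >= RANK[floor]


def classify(levels: Dict[str, str]) -> Optional[str]:
    """Return the first matching profile, or None if no rule applies."""
    # Bandleader: high activity in each attribute.
    if all(levels[a] == "HIGH" for a in ATTRIBUTES):
        return "Bandleader"
    # Integrator: high merging activity plus one other attribute at MEDIUM or higher.
    others = [a for a in ATTRIBUTES if a != "merges"]
    if levels["merges"] == "HIGH" and any(at_least(levels[a], "MEDIUM") for a in others):
        return "Integrator"
    # Gatekeeper: high status change activity plus moderate assignee changes or commits.
    if levels["state_changes"] == "HIGH" and (
        at_least(levels["assignee_changes"], "MEDIUM")
        or at_least(levels["commits"], "MEDIUM")
    ):
        return "Gatekeeper"
    # Onlooker: at least one MEDIUM attribute and at least two LOW attributes.
    medium = sum(levels[a] == "MEDIUM" for a in ATTRIBUTES)
    low = sum(levels[a] == "LOW" for a in ATTRIBUTES)
    if medium >= 1 and low >= 2:
        return "Onlooker"
    return None


all_low = dict.fromkeys(ATTRIBUTES, "LOW")
print(classify({**all_low, "merges": "HIGH", "commits": "MEDIUM"}))  # Integrator
```

A stakeholder that matches no rule stays unclassified (`None`) rather than being forced into a profile.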
15. Evaluation – Results
Number of projects in which each profile was found:
Profile      PMC Member   Non-PMC Member
Bandleader   4            0
Integrator   9            4
Gatekeeper   11           9
Onlooker     20           20
Even non-PMC members perform PMC
member activities
16. Summary and Outlook
RQ1: Can activity data be mined to extract profiles?
RQ2: What kind of profiles can be described?
http://goo.gl/Jk01KR
Editor's Notes
Rule based activity profile definition for continuous integration environments.
Continuous Integration Platforms are widely used
Consists of a version control system and an issue tracking system
Important data source in the lifecycle of a software project.
Accessing data differs between stakeholders
Apache projects are led by PMC
Stakeholders with a PMC role manage the project and the community
For each of these activities, a PMC member looks at different data / views
Picture
Ann is interested in one part
Dave performs a different activity and is interested in another part
Each stakeholder is interested only in part of the data
Stakeholders would benefit from tailoring views and data
But tedious and time-consuming
Picture
Activity data mining to overcome this shortcoming
At the moment focus on the mining of activities
Data recommendation is future work
Why profiles and not roles?
Roles have limitations
Need a way to define what a stakeholder does
Stakeholder profiles
steps in detail.
20 Apache projects between September 2013 and September 2014
Activity data: project name, stakeholder, and activities (commits, merges, issue state changes, issue comments, issue assignee changes, issue priority changes). From RW
130 stakeholders with PMC roles and 542 activity data entries
PICTURE
Set of profiles based on clustering: k-means
Goal: rule-based and project-independent description
PICTURE
- Approach: Nominal scale for each project and each attribute
This is the first profile Integrator
Integration of source code contributions
High merging activity and at least one other attribute with high activity; change in the corresponding issue
9 members in 9 different projects
Three more profiles
Bandleader: 3 in 3 projects
Gatekeeper: 12 in 9 projects
Onlooker: 106 in all projects
TP: any stakeholder profile association that is in accordance with the classification of the baseline dataset
FP: any stakeholder profile association that is not part of the baseline dataset
101 classified correctly, 9 with a wrong profile, and 20 unclassified: 92% precision and 78% recall
- Gatekeeper low precision: broad definition because of different strategies
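The reported precision and recall can be reproduced from the three counts in these notes. This is one reading that matches the figures, assuming recall is measured against all 130 stakeholders in the baseline, so that both wrong and unclassified stakeholders count as misses.

```python
# Counts taken from the notes above.
true_positives = 101   # correct profile assigned
false_positives = 9    # wrong profile assigned
unclassified = 20      # no profile assigned

# Assumption: the baseline contains every PMC stakeholder (130 in total).
baseline_size = true_positives + false_positives + unclassified

precision = true_positives / (true_positives + false_positives)
recall = true_positives / baseline_size

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# precision = 0.92, recall = 0.78
```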
Number of projects in which a certain profile was found, categorized by whether the stakeholder is a PMC member or not
Indicator that roles do not always reflect the actual activity
Indicator that information tailoring cannot be done based on role descriptions alone
- Profiles are a first step towards data tailoring and view composition
[show picture]
An actual prototype of this dynamic composition of views can be found at this URL.