NLBSE’22: Tool Competition
@NLBSE_workshop · nlbse2022.github.io/tools

Oscar Chaparro (College of William & Mary, USA)
Rafael Kallis (Rafael Kallis Consulting, Switzerland)
The competition at a glance
Goal: develop more accurate models for issue classification
Baseline model: TicketTagger
Dataset: 800k+ issue reports from 127k+ GitHub projects
Competitors: 5 teams
Issue report classification

[Diagram: an issue report is fed to a classification model, which labels it as Bug, Enhancement, or Question.]

• An important task in issue management and prioritization
• Extensive research in the field applies NLP/ML techniques
Baseline model: TicketTagger

• Rafael Kallis et al., "Ticket Tagger: Machine Learning Driven Issue Classification", ICSME’19
• Rafael Kallis et al., "Predicting Issue Types on GitHub", Science of Computer Programming, 2021

[Diagram: the issue title & description are fed to TicketTagger, which predicts Bug, Enhancement, or Question.]
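TicketTagger's underlying model is a fastText supervised classifier trained on the issue title and description. A minimal sketch of that setup, with illustrative file paths and hyperparameters rather than the published configuration:

```python
import fasttext

# fastText expects one example per line, label first, e.g.:
#   __label__bug App crashes when opening the settings page ...
# train.txt / test.txt are hypothetical files built from issue
# titles and descriptions.
model = fasttext.train_supervised(
    input="train.txt",
    lr=0.5,          # illustrative hyperparameters,
    epoch=25,        # not the published configuration
    wordNgrams=2,
)

# Classify a new issue (title + description concatenated).
labels, probs = model.predict("Crash when clicking the save button")
print(labels[0], probs[0])

# Evaluate: returns (#samples, precision@1, recall@1).
print(model.test("test.txt"))
```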
Benchmark dataset

800k+ issues from 127k+ GitHub projects
Closed issues carrying any of the three labels, collected via Google BigQuery (query sketch below)

Each issue record includes:
• Label (aka issue type)
• Title and description
• URL (issue and repository), timestamp
• Author type (owner, contributor, etc.)
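A hypothetical sketch of such a BigQuery collection step using the Python client; the project, table, and schema below are stand-ins, not the exact query behind the benchmark:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes GCP credentials are configured

# Hypothetical table and schema illustrating the selection criteria:
# closed issues carrying one of the three labels of interest.
query = """
SELECT title, body, label, issue_url, repo_url, created_at, author_type
FROM `my-project.github_data.closed_issues`
WHERE label IN ('bug', 'enhancement', 'question')
"""

for row in client.query(query).result():
    print(row.label, row.title)
```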
Benchmark dataset

[Chart: label distribution — Bug 50.0%, Enhancement 41.4%, Question 8.6% of issues.]

Training set: 90% (723k issues)
Testing set: 10% (80.5k issues)
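The official split is fixed and shipped with the competition; for intuition, a stratified 90/10 split that preserves the label proportions could be sketched like this (the CSV file and column names are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# issues.csv is a hypothetical export with 'text' and 'label' columns.
df = pd.read_csv("issues.csv")

# 90% training / 10% testing, stratified so both splits keep the
# Bug/Enhancement/Question proportions of the full dataset.
train_df, test_df = train_test_split(
    df, test_size=0.10, stratify=df["label"], random_state=42
)
print(len(train_df), len(test_df))
```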
Infrastructure
Competition rules

• Training and fine-tuning must use only the training set
• Models are compared by classification accuracy on the testing set
• Preprocessing and manipulation of the training set are allowed: feature engineering, data balancing, holding out a validation set, etc. (balancing sketch below)
• No balancing or modification of the testing set
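For instance, balancing the training set could be done with random oversampling, as in this illustrative sketch (a common technique, not necessarily what any team used):

```python
import pandas as pd

# Toy training frame; in practice this is the 723k-issue training set.
train_df = pd.DataFrame({
    "text": ["crash on save", "add dark mode", "how do I install?", "NPE in parser"],
    "label": ["bug", "enhancement", "question", "bug"],
})

# Randomly oversample each minority class up to the majority class size.
max_n = train_df["label"].value_counts().max()
balanced = (
    pd.concat(g.sample(max_n, replace=True, random_state=42)
              for _, g in train_df.groupby("label"))
    .sample(frac=1, random_state=42)  # shuffle
)
print(balanced["label"].value_counts())  # all three labels equally frequent

# The testing set is never touched by this step.
```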
Metrics

• Precision, recall, and F1-score for each label
• Micro-averaged F1-score to declare the winner (sketch below)
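Both can be computed with scikit-learn; a minimal sketch with toy predictions:

```python
from sklearn.metrics import classification_report, f1_score

labels = ["bug", "enhancement", "question"]
y_true = ["bug", "bug", "enhancement", "question", "enhancement"]
y_pred = ["bug", "enhancement", "enhancement", "question", "enhancement"]

# Per-label precision, recall, and F1-score.
print(classification_report(y_true, y_pred, labels=labels))

# Micro-averaged F1-score: the single number that decides the winner.
print(f1_score(y_true, y_pred, average="micro"))
```

Note that for single-label multi-class classification, the micro-averaged F1-score coincides with overall accuracy.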
Competitors

• Team 1 (Siddiq & Santos)
• Team 2 (Bharadwaj & Kadam)
• Team 3 (Trautsch & Herbold)
• Team 4 (Colavito et al.)
• Team 5 (Izadi)
Submitted tools at a glance

• Models: BERT*, XLNet, MLP, logistic regression
• Features: title, description, repository, timestamp, author
• Preprocessing: text normalization, duplicate removal, …
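Several of these submissions fine-tune a pre-trained transformer on the issue text. A minimal Hugging Face sketch of that general recipe; the model choice, toy data, and hyperparameters are illustrative, not any team's exact setup:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["bug", "enhancement", "question"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)

# Toy examples; real submissions train on the 723k-issue training set,
# typically concatenating issue title and description.
ds = Dataset.from_dict({
    "text": ["App crashes when saving a file", "Please add a dark mode"],
    "label": [0, 1],  # indices into `labels`
}).map(lambda ex: tokenizer(ex["text"], truncation=True,
                            padding="max_length", max_length=128))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=ds,
)
trainer.train()
```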
Classification results

[Chart: micro avg. F1-score — Team A: 0.872, Team B: 0.865, Team C: 0.859, Team D: 0.858, Team E: 0.857, Baseline: 0.818.]
Results for the best model

[Chart: per-label precision, recall, and F1-score. Bug and Enhancement score between 0.84 and 0.897 on all three metrics; Question is the hardest label, with 0.72 precision, 0.664 recall, and 0.691 F1-score.]
Competition ranking

Places 3–5: Team 4 (Colavito et al.), Team 3 (Trautsch & Herbold), Team 1 (Siddiq & Santos)
Competition ranking

1st place: Team 5 (Izadi)
2nd place: Team 2 (Bharadwaj & Kadam)
Tool presentations

• Motivation for choosing the models
• Challenges during model training and evaluation
• Features that contributed most to performance
• Preprocessing pipelines and their effect on performance
• Examples of successful and failed predictions
• Ideas for a customized model to improve performance
Discussion panel

• Additional Q&A
• Feedback on the competition
• Ideas for the next edition
