Software developers make programming mistakes that cause serious bugs for their customers. Existing work on detecting problematic software focuses mainly on post hoc identification of correlations between bug fixes and code. We propose a new approach to this problem: detect when software developers are experiencing difficulty while they work on their programming tasks, and stop them before they can introduce bugs into the code.
In this paper, we investigate a novel approach to classifying the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye tracker, an electrodermal activity (EDA) sensor, and an electroencephalography (EEG) sensor could be used to predict whether developers would find a task difficult. We can predict nominal task difficulty (easy/difficult) for a new developer with 64.99% precision and 64.58% recall, and for a new task with 84.38% precision and 69.79% recall. We can further improve the Naive Bayes classifier's performance by training it on just the eye-tracking data over the entire dataset, or by using a sliding-window data collection scheme with a 55-second time window. Our work brings the community closer to a viable and reliable measure of task difficulty that could power the next generation of programming support tools.
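The sliding-window scheme mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the sample format, the 5-second step size, and the mean/std features are assumptions for demonstration; only the 55-second window length comes from the abstract.

```python
import math

# Hypothetical sensor samples: (timestamp_seconds, reading).
# Real data would come from an eye tracker, EDA, or EEG sensor.
def sliding_windows(samples, window=55.0, step=5.0):
    """Yield per-window feature dicts over a sliding time window.

    Each window covers [t, t + window) seconds; the window start
    advances by `step` seconds. O(n^2) scan is fine for a sketch.
    """
    if not samples:
        return
    start = samples[0][0]
    end = samples[-1][0]
    t = start
    while t + window <= end:
        vals = [v for (ts, v) in samples if t <= ts < t + window]
        if vals:
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            yield {"t_start": t, "mean": mean, "std": math.sqrt(var)}
        t += step

# Usage: 120 seconds of made-up pupil-size readings, one per second.
stream = [(float(t), 3.0 + 0.01 * t) for t in range(121)]
features = list(sliding_windows(stream))
```

The per-window features would then be fed to a classifier (Naive Bayes in the study) rather than using one feature vector per whole task.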
Using psycho-physiological sensors to assess task difficulty in software development
1. Thomas Fritz*, Andrew Begel°, Sebastian C. Müller*, Serap Yigit-Elliott†, and Manuela Züger*
* University of Zurich, Switzerland
° Microsoft Research, USA
† Exponent, USA
2.
using Graphics;
namespace Study
{
    public class Drawing
    {
        public static void Main(string[] args)
        {
            Circle c = new Circle();
            Triangle t1 = new Triangle();
            Square s = new Square();
            Triangle t2 = new Triangle();
            Graphics.draw(t2);
            Graphics.draw(t1);
            Graphics.draw(c);
            Graphics.draw(s);
        }
    }
}
using Graphics;
namespace Study
{
    public class Drawing
    {
        public static void Main(string[] args)
        {
            Object objectA = new Circle();
            Object objectK = new Circle();
            Object objectX = new Square();
            Object objectB = new Triangle();
            Graphics.draw(objectX);
            Graphics.draw(objectA);
            Graphics.draw(objectB);
            Graphics.draw(objectK);
        }
    }
}
3. • Several research areas tackle this question:
   • CS Education
   • Psychology of Programming
   • Program Comprehension
• And its implications:
   • Testing and Automatic Verification
   • Code Reviews
   • Mining Software Repositories
4. • Inspired by Lee et al., "Micro interaction metrics for defect prediction," FSE 2011.
• Programmers' cognitive and emotional states are affected by their code and work environment, which ultimately affects their software.
   • Some typical emotions: frustrated, surprised, proud.
• Some signals of your body's internal states:
   • Nervous system: brainwaves, sweat
   • Eyes: pupil size, blink rate
   • Muscles: heart rate variability, typing pressure, grip on mouse
   • Affect: facial recognition
5. 1. Can we correlate developers' cognitive and emotional states with their perception of task difficulty?
2. How well do these states predict long-term effects on software (e.g., bugs, productivity)?

When we detect that a developer is in the zone, we could signal their teammates to delay non-critical interruptions. We could refactor the cognitively difficult parts of the codebase where developers lose the most productivity. Armed with a task difficulty classifier, we could help stop developers from making mistakes!
6. 1. Can readings from psycho-physiological sensors (eye tracking, EDA, EEG) accurately predict whether a task is perceived to be difficult or easy?
2. Which combination of sensors and features best predicts perceived difficulty?
3. Can we use these measures to predict perceived difficulty even as the developer works on the task?
7.
15 professional software developers
8 tasks with various levels of difficulty
3 psycho-physiological sensors
8 task ratings and 1 ranking of all tasks
8. • Recruited from a pool of professional developers in the greater Seattle area
• 2+ years of professional SE experience
• Recently programmed in C#
• 14 male, 1 female
• 27 to 60 years old
9. 8 tasks (2 types):
• 2 overlap tasks
• 6 drawing order tasks

Variations:
• Variable names (mnemonic vs. obfuscated)
• Loops with various complexity
• Nested ?: operator
• Randomly-ordered field assignments

Cognitive abilities:
• Working memory
• Spatial relations
• Math and logic
10.
using Graphics;
namespace Study
{
    class Drawing
    {
        public static void Main(string[] args)
        {
            Rectangle t = new Rectangle();
            t.leftBottom = new Point(2,2);
            t.leftTop = new Point(2,6);
            t.rightTop = new Point(6,6);
            t.rightBottom = new Point(6,2);
            Graphics.draw(t);
            Rectangle s = new Rectangle();
            s.leftTop = new Point(11,5);
            s.leftBottom = new Point(5,5);
            s.rightBottom = new Point(5,9);
            s.rightTop = new Point(11,9);
            Graphics.draw(s);
        }
    }
}

Do these rectangles overlap?
11.
using Graphics;
namespace Study
{
    class Drawing
    {
        public static void Main(string[] args)
        {
            Rectangle t = new Rectangle();
            t.leftBottom = new Point(2,2);
            t.leftTop = new Point(2,6);
            t.rightTop = new Point(6,6);
            t.rightBottom = new Point(6,2);
            Graphics.draw(t);
            Rectangle s = new Rectangle();
            s.leftTop = new Point(11,5);
            s.leftBottom = new Point(5,5);
            s.rightBottom = new Point(5,9);
            s.rightTop = new Point(11,9);
            Graphics.draw(s);
        }
    }
}

using Graphics;
namespace Study
{
    class Drawing
    {
        public static void Main(string[] args)
        {
            Rectangle v = new Rectangle();
            v.leftTop = new Point(1,8);
            Rectangle x = new Rectangle();
            x.rightBottom = new Point(13,3);
            x.rightTop = new Point(13,10);
            x.leftBottom = new Point(7,3);
            v.rightTop = new Point(3,8);
            x.leftTop = new Point(7,10);
            v.rightBottom = new Point(3,5);
            Graphics.draw(x);
            v.leftBottom = new Point(1,5);
            Graphics.draw(v);
        }
    }
}

Do these rectangles overlap?
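The `Graphics`/`Rectangle`/`Point` classes above are study stimuli, not a real library. As a plain sketch of the geometry the participants had to reason about, two axis-aligned rectangles overlap exactly when both their x-extents and their y-extents intersect. The extent tuples below are reduced by hand from the corner coordinates on the slides.

```python
def overlaps(a, b):
    """a, b are (x_min, y_min, x_max, y_max) axis-aligned rectangles.

    Strict inequalities: rectangles that merely touch at an edge
    do not count as overlapping here (an assumption of this sketch).
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

# Extents derived from the slides' corner coordinates:
t = (2, 2, 6, 6)      # rectangle t (both slides)
s = (5, 5, 11, 9)     # rectangle s (both slides)
v = (1, 5, 3, 8)      # rectangle v (interleaved-assignment slide)
x = (7, 3, 13, 10)    # rectangle x (interleaved-assignment slide)
```

With these extents, `t` and `s` share the region x in (5, 6), y in (5, 6), while `v` and `x` have disjoint x-extents.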
14. 1. Recorded participants' task completion times.
2. After each task, the participant filled out a NASA Task Load Index (TLX) survey.
3. At the end of the study, the participant ranked the tasks by relative difficulty (1–8).
17. • Task difficulty metrics were highly correlated.
   • NASA TLX vs. task difficulty ranking: Spearman r[116] = 0.587, p < 0.01
   • Task difficulty ranking vs. task completion time: Spearman r[116] = 0.724, p < 0.01
• We created simplified metrics by nominalizing NASA TLX and task difficulty ranking into Boolean easy/difficult.
   • Boolean NASA TLX score vs. Boolean task difficulty: χ²(1, 116) = 57.954, p < 0.01 (accuracy 85%)
• Triangulation between metrics helps validate our results.
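The two statistics above can be sketched from first principles. The rating values and the 2×2 contingency table below are made up for illustration; only the test statistics themselves (Spearman rank correlation, Pearson chi-squared on a 2×2 table) mirror the slide.

```python
def ranks(xs):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 contingency table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical per-task TLX scores vs. difficulty ranks:
tlx = [20, 35, 40, 55, 60, 70, 75, 90]
rank = [1, 3, 2, 4, 6, 5, 7, 8]
rho = spearman(tlx, rank)

# Hypothetical 2x2 table of Boolean TLX vs. Boolean difficulty labels
# (rows: TLX easy/difficult; columns: ranking easy/difficult):
stat = chi2_2x2(50, 8, 9, 49)
```

In practice one would reach for a statistics library rather than hand-rolling these, but the hand-rolled versions make the computation behind the slide's numbers explicit.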
24. 1. Can readings from psycho-physiological sensors (eye tracking, EDA, EEG) accurately predict whether a task is perceived to be difficult or easy?
2. Which combination of sensors and features best predicts perceived difficulty?
3. Can we use these measures to predict perceived difficulty even as the developer works on the task?