Community-Assisted
Software Engineering
Decision Making
Gregory Gay and Mats Heimdahl
University of Minnesota
AI in SE: A Success Story
Large, active field, with:
● Growing research community
● Numerous conferences and workshops,
such as MSR, PROMISE, RAISE
● Large data repositories
● History of collaboration between industry
and academia
We're already good at drawing useful
conclusions. We expect further algorithmic
improvements.
But...
We need to improve our data!
Problem 1:
We don't know what data we need.
We try to solve complex problems by guessing
what to measure, then collecting data.
The result: missing attributes and added noise.
Problem 2:
The data we have is often weak.
Solution quality depends on data quality.
Some commonly-used data sets are infamous for
missing values, unhelpful attributes, and poor
recording standards.
We should improve data standards, but...
We need to use the data we have.
Synergy of human feedback and AI to turn
static data models into dynamic models.
Bring a Wikipedia model to data sets.
Inspiration: Recommender Systems
Enhanced Feedback Loop
An example exchange in the feedback loop:
● Recommendation: MC/DC
● Helpful? Yes
● New values for existing attributes:
Num. Boolean Expressions: 219
Num. Numeric Calculations: 73
● New attributes to collect (and values):
Ratio of Boolean to Numeric Calculations: 3:1
● Data to delete: Projects 1, 3, 7
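The exchange above could be sketched as a minimal dynamic data model. This is only an illustration of the idea: the class, method names, and the project identifier used in the usage example are hypothetical, not part of the original proposal.

```python
from dataclasses import dataclass, field

@dataclass
class DynamicDataModel:
    """A data set that evolves through user feedback (hypothetical sketch)."""
    records: dict = field(default_factory=dict)  # project id -> {attribute: value}
    votes: dict = field(default_factory=dict)    # recommendation -> list of votes

    def record_feedback(self, recommendation, helpful):
        # Track whether users found a recommendation (e.g., "MC/DC") helpful.
        self.votes.setdefault(recommendation, []).append(helpful)

    def update_attributes(self, project, values):
        # Accept new values for existing attributes, or entirely new attributes.
        self.records.setdefault(project, {}).update(values)

    def delete_projects(self, projects):
        # Phase out data the community flags as low quality.
        for p in projects:
            self.records.pop(p, None)

# The example exchange from the slide, applied to a hypothetical project 5:
model = DynamicDataModel()
model.record_feedback("MC/DC", helpful=True)
model.update_attributes(5, {"Num. Boolean Expressions": 219,
                            "Num. Numeric Calculations": 73,
                            "Ratio of Boolean to Numeric Calculations": "3:1"})
model.delete_projects([1, 3, 7])
```

In a real system, automated AI techniques would sit alongside these user-driven updates, deciding when un-updated records should be aged out rather than waiting for explicit deletion requests.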
Why should we enhance our data?
These dynamic data models allow:
● Low start-up costs.
● A body of evidence that grows over time.
● Mitigation of data quality issues.
● Human-in-the-loop feedback.
Challenge 1:
How do we collect feedback?
Challenge 2:
How do we use feedback?
Fundamental trade-off between human curation
and automated AI learning.
When should attributes be filtered? When should
stale data be phased out? When should new
data be added?
Challenge 3:
Motivating Users
How do we motivate users to:
● Provide feedback.
● Add new data.
● Update old data.
Motivation requires:
1. Incentive.
2. Ease of use/contribution.
3. Utility from and trust in the model.
We propose feedback-driven dynamic
data models maintained by a synergy of
user feedback and automated AI techniques.
Such dynamic data will allow for
low start-up costs, a stronger body of
evidence over time, and adaptation to
changing industrial conditions.
For discussion...
1. Is this even a good idea?
2. What can we do to solve data quality
issues? (other than just the idea suggested
here)
3. What kind of data would benefit from
dynamic adaptation?
4. How do we motivate users to provide
feedback, add new data, and update old data?