Implications of Ceiling Effects in Defect Predictors - PROMISE 2008
    Presentation Transcript

    • Outline
      • Approach
      • Use More Data
      • Use Less Data
      • Use Even Less Data
      • Discussions
      • Examples
      • Conclusions
    • Approach
      • Other Research: Try changing data miners
        • Various data miners: no ground-breaking improvements
      • This Research: Try changing training data
        • Sub-sampling: Over/Under/Micro sampling
        • Hypothesis: Static code attributes have limited information content
        • Predictions:
          • Simple learners can extract limited information content
          • No need for more complex learners
          • Further progress needs increasing the information content in data
    • State-of-the-art Defect Predictor
      • Naive Bayes with simple log-filtering
      • Probability of detection (pd): 75%
      • Probability of false alarms (pf): 21%
      • Other data miners failed to achieve such performance:
        • Logistic regression
        • J48
        • OneR
        • Complex variants of Bayes
        • Various others available in WEKA...
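      As a rough illustration of the predictor described above (the original work used WEKA; the sketch below uses scikit-learn, and the file name and column names are illustrative placeholders, not from the paper), the following fits Naive Bayes to log-filtered static code attributes and reports pd and pf:

```python
# Minimal sketch of a log-filtered Naive Bayes defect predictor.
# Assumes a CSV of numeric module metrics with a binary "defective" column;
# the file name and column names are illustrative placeholders.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

data = pd.read_csv("mdp_metrics.csv")              # hypothetical MDP-style dataset
X = np.log(data.drop(columns="defective") + 1e-6)  # "log-filtering" of each attribute
y = data["defective"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1,
                                          stratify=y, random_state=1)
pred = GaussianNB().fit(X_tr, y_tr).predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("pd =", tp / (tp + fn))                      # probability of detection
print("pf =", fp / (fp + tn))                      # probability of false alarm
```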
    • How Much Data: Use more...
      • Experimental Rig:
        • Stratify
        • |Test|=100 samples
        • N={100, 200, 300,...}
        • |Training|=N*90% samples
        • Randomize and repeat 20 times
      • Plots of N vs. balance
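      A sketch of this rig, reusing the `data` DataFrame assumed in the previous sketch; the balance score is the usual normalised distance from the ideal point (pd = 1, pf = 0), and stratification and log-filtering are omitted here for brevity:

```python
# Sketch of the "use more data" rig: hold out 100 test rows, train on 90% of
# N rows for growing N, repeat 20 times, and track mean balance.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

def balance(pd_rate, pf_rate):
    # normalised distance from the ideal point (pd = 1, pf = 0)
    return 1 - np.sqrt(pf_rate**2 + (1 - pd_rate)**2) / np.sqrt(2)

curve = {}
for n in range(100, len(data) - 100, 100):
    scores = []
    for rep in range(20):                          # randomize and repeat 20 times
        shuffled = data.sample(frac=1, random_state=rep)
        test, rest = shuffled.iloc[:100], shuffled.iloc[100:]
        train = rest.iloc[:int(n * 0.9)]
        pred = GaussianNB().fit(train.drop(columns="defective"),
                                train["defective"]).predict(
                                    test.drop(columns="defective"))
        tn, fp, fn, tp = confusion_matrix(test["defective"], pred).ravel()
        scores.append(balance(tp / (tp + fn), fp / (fp + tn)))
    curve[n] = np.mean(scores)                     # plot n vs. mean balance
```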
    • Over/ Under Sampling: Use Less...
      • Software Datasets are not balanced
        • ~10% Defective
      • Target Class: Defective (modules)
      • Under Sampling:
        • Use all target class instances, say N
        • Pick N from other class
        • Learn theories on 2N instances
      • Over Sampling:
        • Use all instances from the other class, say M (M > N)
        • Replicate the N target-class instances until there are M
        • Learn theories on 2M instances
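      A minimal sketch of the two schemes above, assuming a DataFrame `data` with a binary "defective" column (1 marks the target class):

```python
# Under- and over-sampling sketch for an imbalanced defect dataset.
import pandas as pd

defects = data[data["defective"] == 1]             # N target-class rows
others  = data[data["defective"] == 0]             # M rows, with M > N

# Under-sampling: all N defects plus N randomly chosen non-defects -> 2N rows
under = pd.concat([defects,
                   others.sample(n=len(defects), random_state=1)])

# Over-sampling: all M non-defects plus defects replicated up to M -> 2M rows
over = pd.concat([others,
                  defects.sample(n=len(others), replace=True, random_state=1)])
```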
    • Over/ Under Sampling: Use Less...
      • NB/none is still among the best
      • Sampling with J48 does not outperform NB
      • NB/none is equivalent to NB/under
      • Under-sampling does not harm classifier performance.
      • Theories can be learned from a very small sample of available data
    • Micro Sampling: Use Even Less...
      • Given N defective modules:
        • M = {25, 50, 75, ...} <= N
        • Select M defective and M defect-free modules.
        • Learn theories on 2M instances
      • Under-sampling corresponds to M = N
      • 8/12 datasets -> M = 25
      • 1/12 datasets -> M = 75
      • 3/12 datasets -> M = {200, 575, 1025}
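      Micro-sampling is then just under-sampling with M forced below N; a sketch, reusing the `data` DataFrame assumed above:

```python
# Micro-sampling sketch: train on only M defective and M defect-free modules,
# for M well below the number of available defective modules N.
import pandas as pd

def micro_sample(data, m, seed=1):
    defects = data[data["defective"] == 1].sample(n=m, random_state=seed)
    others  = data[data["defective"] == 0].sample(n=m, random_state=seed)
    return pd.concat([defects, others])            # 2M training rows

train = micro_sample(data, m=25)                   # e.g. M = 25
```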
    • Discussions
      • Incremental Case-Based Reasoning (CBR)
      • Automatic Data Miners (ADM)
      • When is CBR preferable to ADM?
        • CBR is impractical when a large number of cases must be examined
      • Our results suggest 50 samples are adequate.
      • CBR can perform as well as ADM.
      • One step further: CBR can perform better than ADM.
    • Example 1: Requirement Metrics
      • Does not mean “Use Requirement Docs” all the time!
      • Combine features from whatever sources available.
      • Explore approaches that are not black boxes.
      • Consistent with prior research
      • SE should make use of domain specific knowledge!
      From: Text Mining To: NLP Subject: Semantics
    • Example 2: Simple Weighting
      • Combine features wisely!
      • Black-box Feature Selection -> NP-hard.
      • Information provided by black-box approach is not necessarily meaningful to humans.
      • Information provided by humans is meaningful for black-boxes.
      Check the validity of NB assumptions!
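      One way to hand such human knowledge to Naive Bayes (an illustrative assumption on my part, not the talk's exact scheme) is to give each attribute an expert-chosen integer weight and repeat its column that many times, which raises its likelihood term to that power:

```python
# Hypothetical human-supplied feature weighting for Naive Bayes.
# NB multiplies per-feature likelihoods, so repeating a column k times
# effectively raises that feature's likelihood to the k-th power.
from sklearn.naive_bayes import GaussianNB

weights = {"loc": 1, "cyclomatic_complexity": 3, "halstead_effort": 1}  # expert-chosen

cols = [c for c, k in weights.items() for _ in range(k)]
X_weighted = X[cols]                               # X, y as in the first sketch
model = GaussianNB().fit(X_weighted, y)
```

      Note that this deliberately bends the independence assumption, which is exactly why the reminder above to check the validity of NB assumptions matters.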
    • Example 3: WHICH Rule Learner
      • Current practice:
        • Learn predictors with criteria P
        • Assess predictors with criteria Q
        • In general: P≠Q
      • WHICH supports defining P≈Q
        • Learn what you will assess later.
      • micro20 means only 20+20 samples.
      • WHICH initially creates a sorted stack of all attribute ranges in isolation.
      • It then, based on score, randomly selects two rules from the stack, combines them, and places the new rule in the stack in sorted order.
      • It continues to do this until a stopping criterion is met.
      • WHICH supports both conjunction and disjunctions.
      • If the two selected rules contain different ranges of the same attribute, those ranges are OR'd together instead of AND'd.
      Example: combining "outlook=sunny AND rain=true" with "outlook=overcast" yields "outlook = [sunny OR overcast] AND rain=true".
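      A simplified sketch of that search loop (the rule representation and scoring function here are placeholders, not the paper's exact criterion): a rule maps attributes to sets of allowed ranges, ranges of the same attribute are OR'd, and different attributes are AND'd.

```python
# Simplified sketch of a WHICH-style stochastic rule search.
import random

def combine(rule_a, rule_b):
    new = {attr: set(ranges) for attr, ranges in rule_a.items()}
    for attr, ranges in rule_b.items():
        new[attr] = new.get(attr, set()) | set(ranges)   # same attribute -> OR
    return new                                           # distinct attributes -> AND

def which(initial_rules, score, rounds=200, seed=1):
    rng = random.Random(seed)
    stack = sorted(initial_rules, key=score, reverse=True)  # best rules first
    for _ in range(rounds):                  # stopping criterion: fixed budget
        a, b = rng.choices(stack[:20], k=2)  # pick two rules, biased to the top
        stack.append(combine(a, b))
        stack.sort(key=score, reverse=True)
    return stack[0]                          # highest-scoring rule found
```

      For instance, combining {"outlook": {"sunny"}, "rain": {True}} with {"outlook": {"overcast"}} yields {"outlook": {"sunny", "overcast"}, "rain": {True}}, matching the outlook example above.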
    • Example 4: NN-Sampling
      • Within vs. Cross Company Data
        • Substantial increase in pd...
        • ...with the cost of substantial increase in pf.
        • CC data should only be used for mission-critical projects
        • Companies should strive to collect local (WC) data
      • Why?
        • CC data contains a larger space of samples...
        • ...but it also includes irrelevancies.
      • How to decrease pf?
        • Remove irrelevancies by nearest-neighbor sampling from the CC data.
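      A sketch of one such filter (the value of k and the GaussianNB learner are assumptions): keep only the cross-company rows that are nearest neighbors of the local modules, then train on that relevant subset.

```python
# Nearest-neighbor filtering of cross-company (CC) data: keep only the CC
# rows closest to the local within-company (WC) modules before training.
# Assumes numeric DataFrames: wc_X (local features) and cc (CC features
# plus a binary "defective" column).
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.naive_bayes import GaussianNB

k = 10                                             # neighbors per WC module (assumed)
nn = NearestNeighbors(n_neighbors=k).fit(cc.drop(columns="defective"))
_, idx = nn.kneighbors(wc_X)                       # CC rows relevant to WC data
relevant = cc.iloc[np.unique(idx.ravel())]         # drop the irrelevant CC modules

model = GaussianNB().fit(relevant.drop(columns="defective"),
                         relevant["defective"])
```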
    • Example 4: NN-Sampling
      • Same patterns in:
        • NASA MDP and
        • Turkish washing machines
    • Conclusions
      • Defect predictors are practical tools
      • Limited information content hypothesis
          • Simple learners can extract limited information content
          • No need for more complex learners
          • Further progress needs increasing the information content in data
      • Current research paradigm has reached its limits
      • Black-box methods lack the business knowledge
      • Human-in-the-loop CBR tools should take their place
        • Practical: Small samples to examine
        • Instantaneous: ADM will run fast
        • Direction: Increase information content
      PROMISE data: OK. What about PROMISE tools? Tools to increase information content? To build predictors aligned with business goals?
    • Future Work
      • Benchmark Human-in-the-loop CBR against ADM.
      • Instead of asking which learner, ask which data.
      • Better sampling strategies?
    • Thanks...
      • Questions?