Computer Science
Learning to Recognize Actionable Static Code Warnings (is Intrinsically Easy) – Journal First
Xueqi Yang, Jianfeng Chen, Rahul Yedida, Zhe Yu and Tim Menzies
ICSE ’22, 2022, Pittsburgh, PA, USA
Introduction
What is static analysis?
• Debugging without actually executing programs
• Checking code against coding guidelines or rules
• Performed early in development, before software testing
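The idea above can be made concrete with a tiny sketch. This is not any tool discussed in this talk; it is an illustrative checker built on Python's `ast` module that flags comparisons against `None` written with `==` instead of `is`, purely by inspecting the parsed source, never running it:

```python
import ast

def none_eq_warnings(source: str) -> list[int]:
    """Return line numbers where `== None` or `!= None` appears.

    A minimal static check: the code is parsed, never executed.
    """
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            # Suspicious operator: == or != (idiomatic Python uses `is None`)
            suspicious = any(isinstance(op, (ast.Eq, ast.NotEq))
                             for op in node.ops)
            # Right-hand side is the literal None
            compares_none = any(
                isinstance(c, ast.Constant) and c.value is None
                for c in node.comparators)
            if suspicious and compares_none:
                warnings.append(node.lineno)
    return warnings

code = "x = f()\nif x == None:\n    pass\n"
print(none_eq_warnings(code))  # → [2]
```

Even this toy check shows the trade-off the next slides discuss: it fires on every syntactic match, with no idea whether the developer will ever act on the warning.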
Introduction
What is a static analysis (SA) tool?
• Manual checking is time-consuming & expensive
• SA tools inspect programs for occurrences of known bug patterns
Challenges
• High false positive rate
Why so high?
- SA tools are over-cautious
- 35%–91% of generated warnings cannot be acted on [Heckman’ 2011]
Empirical Study
FindBugs:
- One of the most commonly used SA tools
- An open-source SA tool for Java programs
- Downloaded over one million times
Empirical Study
FindBugs:
Analyzes Java code with 424 bug patterns grouped into 9 types:
- bad practice
- correctness
- experimental
- internationalization
- malicious code vulnerability
- multithreaded correctness
- performance
- security
- dodgy code
Roadmap
● Wang et al. (EMSE’18)
○ Data and feature collection
○ Golden features
● Yang et al. (ESWA’21 and EMSE’21)
○ Incremental active learning
○ Deep learning and data simplicity
● Kang et al. (ICSE’22)
○ Data leakage (features and instances)
○ Data refactoring
● Yedida et al. (targeted at TSE; preprint available upon request)
○ Boundary engineering, label engineering, learner engineering, and instance engineering
Dataset
Ground truth:
- Label: actionable & unactionable [Liang’ 2010]
- Mined from the version control system & issue tracking system
Dataset
Feature extraction [Wang’ 2018]:
- Features sliced from 9 SE projects with a Java tool
- For each feature: name, category, meaning, extraction method
Dataset
8 categories of the 23 golden features (count per category):
Warning combination (6)
Code characteristics (5)
Warning characteristics (4)
File history (3)
Code analysis (2)
Code history (2)
Warning history (1)
File characteristics (0)
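The title's claim that this task is "intrinsically easy" rests on the data having low intrinsic dimensionality: a handful of golden features nearly separate actionable from unactionable warnings. The toy sketch below uses synthetic numbers (not the study's data) to show how well-separated two-feature data yields to even a nearest-centroid rule:

```python
def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_label(x, centroids):
    """Return the label whose class centroid is closest (squared Euclidean)."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda lbl: dist2(x, centroids[lbl]))

# Synthetic 2-feature vectors (e.g. normalized warning lifetime, defect density):
actionable   = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
unactionable = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
cents = {"actionable": centroid(actionable),
         "unactionable": centroid(unactionable)}
print(nearest_centroid_label([0.12, 0.88], cents))  # actionable
```

When the classes separate this cleanly, elaborate learners add little; that is the sense in which the recognition problem is easy once the right features are in hand.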
My refuted EMSE ’21 paper
What happens when you write a paper
and it gets refuted by an ICSE paper?
- Response 1: go home and cry
- Response 2: sit down with your
critics to work out what to do next
Roadmap
● Wang et al. (EMSE’18)
○ Data and feature collection
○ Golden features
● Yang et al. (ESWA’21 and EMSE’21)
○ Incremental active learning
○ Deep learning and data simplicity
● Kang et al. (ICSE’22)
○ Data leakage (features and instances)
○ Data refactoring
● Yedida et al. (targeted at TSE; preprint available upon request)
○ Boundary, label, learner, and instance engineering
Data refactoring
Kang et al.:
● Instance leakage
○ Manually relabeled 1,357 warnings; 768 remained
● Feature leakage
○ 5 leaking features (warning context in method, warning context in file, warning context for warning type, defect likelihood, discretization of defect likelihood)
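Feature leakage here means a feature whose value is computed using labels the model is supposed to predict. A minimal sketch with hypothetical data: "defect likelihood" for a warning type is legitimate when aggregated over training labels only, and leaky when the aggregation also sees test labels:

```python
def defect_likelihood(warning_type, labeled_warnings):
    """Fraction of warnings of this type labeled actionable,
    computed only from the (type, label) pairs passed in."""
    same = [lbl for (wt, lbl) in labeled_warnings if wt == warning_type]
    return sum(same) / len(same) if same else 0.0

# (warning_type, label) pairs; label 1 = actionable, 0 = unactionable
train = [("NP_NULL", 1), ("NP_NULL", 1), ("SE_BAD", 0)]
test  = [("NP_NULL", 0), ("SE_BAD", 0)]

# Correct: the feature is computed from training labels only.
print(defect_likelihood("NP_NULL", train))         # 1.0
# Leaky: the feature also sees the test labels it should predict,
# quietly encoding the answer into the input.
print(defect_likelihood("NP_NULL", train + test))  # about 0.67
```

A leaky variant of such a feature inflates measured performance, which is why Kang et al. flag these five features.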
Data refactoring
Kang et al. applied an off-the-shelf SVM model:
● Data leakage removed
● Open issue: still can’t get good predictors
What happened when all these people worked together?
Collaboration in open science:
● RAISE Lab & SOAR Group
- What’s the problem?
- What does Kang et al. say about Yang et al.?
- What new results did we come up with?
Collaboration in open science
Collaboration of two labs:
● RAISE Lab & SOAR Group
● Under submission to TSE
○ Boundary engineering
(GHOSTing)
○ Label engineering
(SMOOTHing)
○ Learner engineering
○ Instance engineering
(SMOTEing)
Collaboration in open science
Preliminary results of the ablation study:
References
[Kang’ 2022] Kang, Hong Jin, Khai Loong Aw, and David Lo. "Detecting False Alarms from Automatic Static Analysis Tools: How Far Are We?" arXiv preprint arXiv:2202.05982 (2022).
[Yang’ 2021a] Yang, Xueqi, et al. "Learning to recognize actionable static code warnings (is intrinsically easy)." Empirical Software Engineering 26.3 (2021): 1-24.
[Yang’ 2021b] Yang, Xueqi, et al. "Understanding static code warnings: An incremental AI approach." Expert Systems with Applications 167 (2021): 114134.
[Wang’ 2018] Wang, Junjie, Song Wang, and Qing Wang. "Is there a 'golden' feature set for static warning identification? An experimental evaluation." Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 2018.
[Heckman’ 2011] Heckman, Sarah, and Laurie Williams. "A systematic literature review of actionable alert identification techniques for automated static code analysis." Information and Software Technology 53.4 (2011): 363-387.
[Levina’ 2004] Levina, Elizaveta, and Peter Bickel. "Maximum likelihood estimation of intrinsic dimension." Advances in Neural Information Processing Systems 17 (2004).