Presentation of:
[Position Paper] Toshihiro Kamiya, Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods, Proc. 10th International Workshop on Software Clones (IWSC 2016), pp. 19-20, 2016.
Notice: re-uploaded on March 16, 2016. (Fix "IWSC05's" -> "IWSC15's" on page 5)
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
1. Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Toshihiro Kamiya
Interdisciplinary Graduate School of Sci. & Eng., Shimane Univ.
kamiya@cis.shimane-u.ac.jp
10th Int'l Workshop on Software Clones, Osaka
2. March 15, 2016, 10th Int'l Workshop on Software Clones, Osaka
Outline
● What is a dynamic code-clone analysis?
  – Detection
  – Visualization
  – Samples
● Parameter sensitivity
  – Possible alternative techniques
3.
Dynamic code-clone analysis
● Definition:
  – Use dynamic information:
    ● To detect code clones
    ● To visualize such code clones
● Aims/applications:
  – Detect code clones between a code fragment and its restructured (refactored) one
    ● Observe evolution of code clones in clone management
  – Find code clones w/ similarity in deep semantics (or behavior)
4.
Detection method
● Detection steps:
  1. Collect execution trace(s) by running target program(s)
  2. Find sub-sequences of similar method invocations
  3. Map such sub-sequences onto code fragments
The details are described in: Toshihiro Kamiya, "An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis," IWSC 2015, pp. 1-7 (Mar. 6, 2015).
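Step 1, collecting an execution trace, can be sketched in Python with the standard `sys.settrace` hook. This is a minimal illustration, not the paper's tool; `collect_trace`, `helper`, and `target` are hypothetical names:

```python
import sys

def collect_trace(func, *args, **kwargs):
    """Run `func` and record a flat execution trace of
    (event, function name) pairs for each Python-level call and return."""
    trace = []

    def tracer(frame, event, arg):
        if event == "call":
            trace.append(("call", frame.f_code.co_name))
        elif event == "return":
            trace.append(("return", frame.f_code.co_name))
        return tracer  # keep tracing inside nested frames

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return result, trace

# Toy target program for demonstration.
def helper(x):
    return x * 2

def target(x):
    return helper(x) + helper(x + 1)
```

For example, `collect_trace(target, 3)` yields the return value together with a trace containing two `("call", "helper")` entries, one per invocation; such traces are the raw input to the clone-detection steps that follow.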
5.
Detection method (implementation)
An implementation of step "2. Find sub-sequences of similar method invocations"
● Just AN implementation; other data structures/algorithms could be used.
2-1. Generate a call tree from the execution trace.
2-2. For each node of the call tree, generate an SB data structure.
  – A string balloon (SB) includes:
    ● The target node
    ● Context (location): the path from the root to the target node
    ● Contents: the set of nodes called by the target (both directly and indirectly)
2-3. Find sets of SBs having similar contents.
  ● With a frequent item-set mining algorithm (hyper-cubic decomposition [Uno03])
[Uno03] T. Uno, et al., An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases, Discovery Science, LNCS 3245, pp. 16-31, 2003.
(Revised from the IWSC15 version.)
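Steps 2-1 through 2-3 can be sketched as follows. As a simplification, SBs are grouped by *identical* content sets rather than by the closed frequent item-set mining ([Uno03]) the method actually uses, and the call tree is encoded as a hypothetical nested dict (name -> children):

```python
from collections import defaultdict

def make_sbs(call_tree, path=()):
    """Yield one SB (string balloon) per call-tree node:
    (target node, context path from root, contents set)."""
    for name, children in call_tree.items():
        # Contents: every node called by `name`, directly or indirectly.
        contents = set()
        stack = [children]
        while stack:
            sub = stack.pop()
            for child, grandchildren in sub.items():
                contents.add(child)
                stack.append(grandchildren)
        yield (name, path, frozenset(contents))
        yield from make_sbs(children, path + (name,))

def group_similar(sbs):
    """Group SBs with identical, non-empty content sets
    (a stand-in for finding SBs with *similar* contents)."""
    groups = defaultdict(list)
    for sb in sbs:
        groups[sb[2]].append(sb)
    return {k: v for k, v in groups.items() if k and len(v) >= 2}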
6.
Visualization method
● Code fragments (of a clone class) → "root" nodes of sub-graphs in the call graph
● Similarity → methods called commonly in the sub-graphs
● Differences → methods called solely in one sub-graph
[Call-graph figure with nodes: main(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), get_extensions(), print(), map(), lambda() at line 8, os.path.splitext()]
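The commonly- and solely-called methods above can be computed with plain set operations over the sub-graphs' reachable node sets. A hypothetical sketch, with the adjacency-dict encoding and function names being assumptions of mine:

```python
def reachable(call_graph, root):
    """All methods reachable from `root` in a call graph
    given as an adjacency dict: caller -> list of callees."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for callee in call_graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

def compare_fragments(call_graph, root_a, root_b):
    """Similarity = methods called in both sub-graphs;
    differences = methods called solely in one of them."""
    a, b = reachable(call_graph, root_a), reachable(call_graph, root_b)
    return {"common": a & b, "only_a": a - b, "only_b": b - a}
```

On a toy graph shaped like the slide's figure, the two `print_extensions_*` fragments share `get_extensions()` and `os.path.splitext()`, while `map()` and the lambda appear only in the map-based variant.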
7.
A sample code clone – code fragments
Applied to two CLI HTTP-client tools:
– prog 1: https://github.com/chrislongo/HttpShell
– prog 2: https://pypi.python.org/pypi/httpie
Each inputs a URL and outputs HTML text.
8.
A sample code clone – code fragments
9.
A sample code clone – code fragments
Calling the same function:
pygments.highlight()
10.
A sample code clone – code fragments
Similar?
- Yes.
But why?
11.
A sample code clone – call graph
[Call-graph figure with nodes incl.: 2../ColorFormatter/get_lexer, pygments.util//get_bool_opt, 1.pygments.formatters.terminal/TerminalFormatter/__init__, StringIO/StringIO/write, pygments.lexer//streamer, pygments.lexers//_load_lexers, pygments.lexer//__call__, 1.pygments.lexers//guess_lexer, re//_compile, 1../AnsiLogger/print_data, pygments//highlight, 2../ColorFormatter/format_body, pygments//format, pygments//lex]
Because these method calls of guess_lexer() and get_lexer() ... have common contents.
12.
● But this example is the best one in an experiment.
● Not always so lucky in general practice ...
16.
A bad example from detection result
● Code fragments calling utility functions are sometimes detected as a code clone ☹
  – Code fragments of the clone class:
    ● cli.py (an entry point) from prog 2
    ● _get_proxy_info() from prog 1
    ● should_bypass_proxy() from prog 2
  – Calling functions of regular expressions and associative arrays, i.e. utility functions
  – Results in a false positive: cli.py and the others
    (true positive: _get_proxy_info() and should_bypass_proxy())
17.
An idea: Parameter sensitivity
● The execution trace also includes the argument values of each method invocation.
● Add argument value(s) to node labels
  – re//_compile.'[^A-Za-z0-9.]+' or
  – re//_compile.'[^-]+' in place of re//_compile,
  to distinguish these calls of utility functions.
● Need to introduce value semantics (may be challenging):
  – '[0-9]' == '\d' (when interpreted as regular expressions)
  – 0xff == 255
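A minimal sketch of the labeling idea, assuming labels of the form module//function and Python's repr for argument values (the function `node_label` and its signature are my assumptions). Note that literal equivalence such as 0xff == 255 comes for free, since the evaluated value is used rather than its source text, while regular-expression equivalence such as '[0-9]' vs. '\d' would still need dedicated value semantics:

```python
def node_label(module, func, args=(), sensitive=False):
    """Build a call-tree node label; with parameter sensitivity on,
    the repr of each argument value becomes part of the label, so calls
    to the same utility function with different arguments stay distinct."""
    base = "%s//%s" % (module, func)
    if not sensitive or not args:
        return base
    return base + "." + ",".join(repr(a) for a in args)

# Without sensitivity, the two regex compilations collide ...
plain_a = node_label("re", "_compile", ("[^A-Za-z0-9.]+",))
plain_b = node_label("re", "_compile", ("[^-]+",))
# ... with it, they are distinguished.
sens_a = node_label("re", "_compile", ("[^A-Za-z0-9.]+",), sensitive=True)
sens_b = node_label("re", "_compile", ("[^-]+",), sensitive=True)
```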
18.
Alternative techniques
● A threshold on the ratio of shared nodes
  – Yet another parameter for clone detection ☹
    ● Depends on stack depth?
● Pre-defined, manual classification of "utility" functions ☹
  – What about target code including new (unknown) libraries?
● Considering the order of method invocations
  – Such as the Smith-Waterman algorithm (applied to static clone detection in [Murakami13])
  – Yet another parameter of the tool ☹
    ● Depends on the length of code fragments?
[Murakami13] H. Murakami, K. Hotta, Y. Higo, H. Igaki, Gapped Code Clone Detection with Lightweight Source Code Analysis, ICPC 2013, pp. 93-102, 2013.
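The first alternative, a threshold on the ratio of shared nodes, amounts to a Jaccard-style filter over two SBs' content sets. A sketch under that assumption; the threshold value here is arbitrary and is exactly the extra tool parameter the slide warns about:

```python
def shared_ratio(contents_a, contents_b):
    """Jaccard index of two content sets: shared nodes / all nodes."""
    if not contents_a and not contents_b:
        return 1.0
    return len(contents_a & contents_b) / len(contents_a | contents_b)

def is_clone_pair(contents_a, contents_b, threshold=0.8):
    # `threshold` must be tuned per target, which is the drawback noted above.
    return shared_ratio(contents_a, contents_b) >= threshold
```

With such a filter, two fragments that share only a couple of utility calls (e.g. re._compile) out of many fall below the threshold and are rejected, while fragments whose contents largely overlap survive.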
19.
Summary
● A dynamic code-clone detection
  – Based on frequent item-set mining of method invocations
● Utility functions (methods) cause false positives.
● Possible solutions / open questions:
  – Parameter sensitivity
  – A threshold on the ratio of shared nodes
  – Manual classification of "utility" functions
  – Order of method invocations
20.

21.
Another bad example