Presentation of:
[Position Paper] Toshihiro Kamiya, Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods, Proc. 10th International Workshop on Software Clones (IWSC 2016), pp. 19-20, 2016.
Notice: re-uploaded on March 16, 2016. (Fix "IWSC05's" -> "IWSC15's" on page 5)
Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
1. Introducing Parameter Sensitivity to Dynamic Code-Clone Analysis Methods
Toshihiro Kamiya
Interdisciplinary Graduate School of Sci. & Eng., Shimane Univ.
kamiya@cis.shimane-u.ac.jp
10th Int'l Workshop on Software Clones, Osaka
2. March 15, 2016, 10th Int'l Workshop on Software Clones, Osaka
Outline
● What is a dynamic code-clone analysis?
  – Detection
  – Visualization
  – Samples
● Parameter sensitivity
  – Possible alternative techniques
3.
Dynamic code-clone analysis
● Definition:
  – Use dynamic information:
    ● To detect code clones
    ● To visualize such code clones
● Aims/applications:
  – Detect code clones between a code fragment and its restructured (refactored) one
    ● Observe evolution of code clones in clone management
  – Find code clones w/ similarity in deep semantics (or behavior)
4.
Detection method
● Detection steps:
  1. Collect execution trace(s) by running target program(s)
  2. Find sub-sequences of similar method invocations
  3. Map such sub-sequences onto code fragments
The details are described in: Toshihiro Kamiya, "An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis," IWSC 2015, pp. 1-7 (Mar. 6, 2015).
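Step 1, collecting an execution trace, can be sketched in Python with the standard `sys.settrace` hook. This is a minimal illustration, not the paper's tool; `collect_trace`, `helper`, and `target` are hypothetical names:

```python
import sys

def collect_trace(func, *args, **kwargs):
    """Run `func` and record a flat execution trace of
    (event, function name) pairs for each Python-level call and return."""
    trace = []

    def tracer(frame, event, arg):
        if event == "call":
            trace.append(("call", frame.f_code.co_name))
        elif event == "return":
            trace.append(("return", frame.f_code.co_name))
        return tracer  # keep tracing inside nested frames

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return result, trace

# Toy target program for demonstration.
def helper(x):
    return x * 2

def target(x):
    return helper(x) + helper(x + 1)
```

For example, `collect_trace(target, 3)` yields the return value together with a trace containing two `("call", "helper")` entries, one per invocation; such traces are the raw input to the clone-detection steps that follow.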
5.
Detection method (implementation)
An implementation of step "2. Find sub-sequences of similar method invocations"
● Just AN implementation; other data structures/algorithms could be used.
2-1. Generate a call tree from the execution trace.
2-2. For each node of the call tree, generate an SB data structure.
  – A string balloon (SB) includes:
    ● The target node
    ● Context (location): the path from the root to the target node
    ● Contents: the set of nodes called by the target (both directly and indirectly)
2-3. Find sets of SBs having similar contents.
  ● With a frequent item-set mining algorithm (hyper-cubic decomposition [Uno03])
[Uno03] T. Uno, et al., An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases, Discovery Science, LNCS 3245, pp. 16-31, 2003.
(Revised from the IWSC15 version.)
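Steps 2-1 through 2-3 can be sketched as follows. As a simplification, SBs are grouped by *identical* content sets rather than by the closed frequent item-set mining ([Uno03]) the method actually uses, and the call tree is encoded as a hypothetical nested dict (name -> children):

```python
from collections import defaultdict

def make_sbs(call_tree, path=()):
    """Yield one SB (string balloon) per call-tree node:
    (target node, context path from root, contents set)."""
    for name, children in call_tree.items():
        # Contents: every node called by `name`, directly or indirectly.
        contents = set()
        stack = [children]
        while stack:
            sub = stack.pop()
            for child, grandchildren in sub.items():
                contents.add(child)
                stack.append(grandchildren)
        yield (name, path, frozenset(contents))
        yield from make_sbs(children, path + (name,))

def group_similar(sbs):
    """Group SBs with identical, non-empty content sets
    (a stand-in for finding SBs with *similar* contents)."""
    groups = defaultdict(list)
    for sb in sbs:
        groups[sb[2]].append(sb)
    return {k: v for k, v in groups.items() if k and len(v) >= 2}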
6.
Visualization method
● Code fragments (of a clone class) → "root" nodes of sub-graphs in the call graph
● Similarity → methods called commonly in the sub-graphs
● Differences → methods called solely in one sub-graph
[Call-graph figure with nodes: main(), print_extensions_w_for_stmt(), print_extensions_w_map_func(), get_extensions(), print(), map(), lambda() at line 8, os.path.splitext()]
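The commonly- and solely-called methods above can be computed with plain set operations over the sub-graphs' reachable node sets. A hypothetical sketch, with the adjacency-dict encoding and function names being assumptions of mine:

```python
def reachable(call_graph, root):
    """All methods reachable from `root` in a call graph
    given as an adjacency dict: caller -> list of callees."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for callee in call_graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

def compare_fragments(call_graph, root_a, root_b):
    """Similarity = methods called in both sub-graphs;
    differences = methods called solely in one of them."""
    a, b = reachable(call_graph, root_a), reachable(call_graph, root_b)
    return {"common": a & b, "only_a": a - b, "only_b": b - a}
```

On a toy graph shaped like the slide's figure, the two `print_extensions_*` fragments share `get_extensions()` and `os.path.splitext()`, while `map()` and the lambda appear only in the map-based variant.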
7.
A sample code clone – code fragments
Applied to two CLI HTTP-client tools:
– prog 1: https://github.com/chrislongo/HttpShell
– prog 2: https://pypi.python.org/pypi/httpie
Each inputs a URL and outputs HTML text.
8.
A sample code clone – code fragments
9.
A sample code clone – code fragments
Calling the same function:
pygments.highlight()
10.
A sample code clone – code fragments
Similar?
- Yes.
But why?
11.
A sample code clone – call graph
[Call-graph figure with nodes incl.: 2../ColorFormatter/get_lexer, pygments.util//get_bool_opt, 1.pygments.formatters.terminal/TerminalFormatter/__init__, StringIO/StringIO/write, pygments.lexer//streamer, pygments.lexers//_load_lexers, pygments.lexer//__call__, 1.pygments.lexers//guess_lexer, re//_compile, 1../AnsiLogger/print_data, pygments//highlight, 2../ColorFormatter/format_body, pygments//format, pygments//lex]
Because these method calls of guess_lexer() and get_lexer() ... have common contents.
12.
● But this example is the best one in an experiment.
● Not always so lucky in general practice ...
16.
A bad example from detection result
● Code fragments calling utility functions are sometimes detected as a code clone ☹
  – Code fragments of the clone class:
    ● cli.py (an entry point) from prog 2
    ● _get_proxy_info() from prog 1
    ● should_bypass_proxy() from prog 2
  – Calling functions of regular expressions and associative arrays, i.e. utility functions
  – Results in a false positive: cli.py and the others
    (true positive: _get_proxy_info() and should_bypass_proxy())
17.
An idea: Parameter sensitivity
● The execution trace also includes the argument values of each method invocation.
● Add argument value(s) to node labels
  – re//_compile.'[^A-Za-z0-9.]+' or
  – re//_compile.'[^-]+' in place of re//_compile,
  to distinguish these calls of utility functions.
● Need to introduce value semantics (may be challenging):
  – '[0-9]' == '\d' (when interpreted as regular expressions)
  – 0xff == 255
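A minimal sketch of the labeling idea, assuming labels of the form module//function and Python's repr for argument values (the function `node_label` and its signature are my assumptions). Note that literal equivalence such as 0xff == 255 comes for free, since the evaluated value is used rather than its source text, while regular-expression equivalence such as '[0-9]' vs. '\d' would still need dedicated value semantics:

```python
def node_label(module, func, args=(), sensitive=False):
    """Build a call-tree node label; with parameter sensitivity on,
    the repr of each argument value becomes part of the label, so calls
    to the same utility function with different arguments stay distinct."""
    base = "%s//%s" % (module, func)
    if not sensitive or not args:
        return base
    return base + "." + ",".join(repr(a) for a in args)

# Without sensitivity, the two regex compilations collide ...
plain_a = node_label("re", "_compile", ("[^A-Za-z0-9.]+",))
plain_b = node_label("re", "_compile", ("[^-]+",))
# ... with it, they are distinguished.
sens_a = node_label("re", "_compile", ("[^A-Za-z0-9.]+",), sensitive=True)
sens_b = node_label("re", "_compile", ("[^-]+",), sensitive=True)
```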
18.
Alternative techniques
● A threshold on the ratio of shared nodes
  – Yet another parameter for clone detection ☹
    ● Depends on stack depth?
● Pre-defined, manual classification of "utility" functions ☹
  – What about target code including new (unknown) libraries?
● Considering the order of method invocations
  – Such as the Smith-Waterman algorithm (applied to static clone detection in [Murakami13])
  – Yet another parameter of the tool ☹
    ● Depends on the length of code fragments?
[Murakami13] H. Murakami, K. Hotta, Y. Higo, H. Igaki, Gapped Code Clone Detection with Lightweight Source Code Analysis, ICPC 2013, pp. 93-102, 2013.
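The first alternative, a threshold on the ratio of shared nodes, amounts to a Jaccard-style filter over two SBs' content sets. A sketch under that assumption; the threshold value here is arbitrary and is exactly the extra tool parameter the slide warns about:

```python
def shared_ratio(contents_a, contents_b):
    """Jaccard index of two content sets: shared nodes / all nodes."""
    if not contents_a and not contents_b:
        return 1.0
    return len(contents_a & contents_b) / len(contents_a | contents_b)

def is_clone_pair(contents_a, contents_b, threshold=0.8):
    # `threshold` must be tuned per target, which is the drawback noted above.
    return shared_ratio(contents_a, contents_b) >= threshold
```

With such a filter, two fragments that share only a couple of utility calls (e.g. re._compile) out of many fall below the threshold and are rejected, while fragments whose contents largely overlap survive.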
19.
Summary
● A dynamic code-clone detection
  – Based on frequent item-set mining of method invocations
● Utility functions (methods) cause false positives.
● Possible solutions / open questions:
  – Parameter sensitivity
  – A threshold on the ratio of shared nodes
  – Manual classification of "utility" functions
  – Order of method invocations
20.

21.
Another bad example