An Algorithm for Keyword Search on an Execution Path
1. An Algorithm for Keyword Search on an Execution Path
Toshihiro Kamiya
Future University Hakodate
kamiya@fun.ac.jp
2. Background #1: Code searching
Developers do search!
➤ To find reusable components for a function of a product
➤ To find similar code fragments before modifying code
➤ To find code samples showing the usage of a given class or component
CSMR-WCRE-2014 Era Track
3. Background #2: Emerging fine-grained module technologies
More and more fine-grained modules are used.
● Object/Closure
  – extracts data and its manipulation
● Aspect
  – extracts interests (concerns): a set of code invoked by a specific condition or event
● Dependency Injection
  – splits code at each dependency
4. Problem: Searching on fine-grained modules
Code search becomes difficult with fine-grained modules.
(Old days) The search result was contained in a single file.
↓
(Now) The search result is a set of several parts of several files.
[Figure: search result in the old days vs. now]
This affects code-search methods in both
● Algorithm
  – “how to find”
● Displaying/Visualizing
  – “how to show search results”
5. Solution: Keyword Search on an Execution Path
● Static analysis
● Find the execution paths that include given keywords
  – From all possible execution paths of a target program
● Idea: a compact data structure (And/Or/Call graph) of execution paths + a search algorithm on it
● A prototype implementation
  – applied to up to 183k lines of Java source code
● Related work
  – Prospector [8]
  – PARSEWeb [9]
6. And/Or/Call Graph
● A DAG that contains all execution paths in a compact form
● It is generated from source code by the following translation rules
  – Sequence structure ➡ And node
    s1; s2; s3;  ➡  And node with children s1, s2, s3
  – Selection structure ➡ Or node
    if (...) { st; } else { se; }  ➡  Or node with children st, se
  – Repetitive structure ➡ Selection among sequences of 0-time repetition, 1-time repetition, 2-times repetition, ... ➡ Or node having And nodes as children
  – Method call ➡ Call node
    Dynamic dispatching:
      interface I { void m(); }
      class B implements I { public void m() {...} }
      class C implements I { public void m() {...} }
      I i; ... i.m();  ➡  B//m, C//m
[Figure: each rule shown as source code next to its graphical form]
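The translation rules above can be sketched in Python. This is a minimal illustration, not the prototype's code: the `Node` class, the helpers `And`, `Or`, `Stmt`, `Call`, `loop`, and `paths`, the bound on loop unrolling, and the encoding of dynamic dispatch as an Or node over the candidate methods are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    kind: str                      # "and", "or", "call", or "stmt"
    name: str = ""
    children: list = field(default_factory=list)


def And(*children):
    """Sequence structure: all children run, in order."""
    return Node("and", children=list(children))


def Or(*children):
    """Selection structure: exactly one child runs."""
    return Node("or", children=list(children))


def Stmt(name):
    """A plain statement (leaf)."""
    return Node("stmt", name)


def Call(name, body=None):
    """Method call; body is the graph of the callee, if known."""
    return Node("call", name, [body] if body is not None else [])


def loop(body, max_times):
    """Repetition as a choice among 0, 1, ..., max_times copies of body.
    The bound max_times is an assumption that keeps the graph finite."""
    return Or(*(And(*([body] * n)) for n in range(max_times + 1)))


def paths(node):
    """Enumerate every execution path as a list of names."""
    if node.kind == "or":
        return [p for c in node.children for p in paths(c)]
    if node.kind == "and":
        combos = product(*(paths(c) for c in node.children))
        return [[name for part in combo for name in part] for combo in combos]
    # "call" or "stmt": the node's own name, then the callee's paths (if any)
    tails = paths(node.children[0]) if node.children else [[]]
    return [[node.name] + tail for tail in tails]


# s1; if (...) { st; } else { se; } i.m();
graph = And(
    Stmt("s1"),
    Or(Stmt("st"), Stmt("se")),
    Or(Call("B//m"), Call("C//m")),   # dynamic dispatch (assumed encoding)
)

for path in paths(graph):
    print(", ".join(path))
```

Enumerating paths this way grows exponentially with the number of Or nodes; the sketch only shows what the compact graph encodes.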
10. Search Algorithm
● Input: Keywords to identify nodes
● Output: Connected sub-graphs including the nodes identified with the keywords
  – “connected sub-graph” → continuous execution path
● Heuristics
  – Find the deepest nodes
    ← Assumption: a small operation is easy to understand
  – Extract the shallowest sub-graph (treecut)
    ← Assumption: a deep method-invocation chain is difficult to understand
11. Label and Summary
Labels and summaries are the “index” data of the search algorithm.
● Label
  – A set of names put on a node
  – Keywords in a query are matched against labels
● Summary
  – A node n’s summary S(n) is the set of names of the (child and) descendant nodes of n.
  Properties
  – For any node n and any child node c of n, S(n) ⊇ S(c).
  – A root node has a summary of local maximum.
[Figure: call tree with nodes main, getDay, getToday, getDayOfWeek, split, parseInt (×3), Calendar//getInstance (×2), Calendar//set, Calendar//get, printf]
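The definition of S(n) can be sketched as a bottom-up computation. The `Node` class, the `summary` function, and the call tree below are hypothetical. The tree is only loosely modelled on the figure, and each node carries a single name rather than a set of names, as a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                     # one name per node (simplification)
    children: list = field(default_factory=list)


def summary(node, cache=None):
    """S(node): the names of all child and descendant nodes of node."""
    if cache is None:
        cache = {}
    if id(node) not in cache:
        names = set()
        for child in node.children:
            names.add(child.label)           # the child's own name
            names |= summary(child, cache)   # plus everything below it
        cache[id(node)] = frozenset(names)
    return cache[id(node)]


# Hypothetical call tree, loosely modelled on the figure.
get_day = Node("getDay", [
    Node("split"), Node("parseInt"), Node("parseInt"), Node("parseInt"),
    Node("Calendar//getInstance"), Node("Calendar//set"),
])
get_day_of_week = Node("getDayOfWeek", [Node("Calendar//get")])
main = Node("main", [get_day, get_day_of_week, Node("printf")])

print(sorted(summary(get_day)))
# Property from the slide: S(n) is a superset of S(c) for every child c.
assert summary(main).issuperset(summary(get_day))
```

Because a child's summary is folded into its parent's, the superset property holds by construction, and a node whose summary lacks a query keyword can be skipped together with its whole sub-tree.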
13. Label and Summary
Example summary highlighted in the figure:
{ “Calendar//getInstance”, “Calendar//set”, “split”, “parseInt” }
14. Label and Summary
Example summary highlighted in the figure:
{ “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }
15. Steps of search algorithm
(S1) Finds query-fulfilling sub-trees of the (local) maximum depths
  – by comparing the summary of each node with the query
(S2) Makes the shallowest treecut
  – by removing deeper leaf nodes until the treecut would no longer fulfill the query
(S3) Removes uncontributing leaf nodes
  – Uncontributing = its label does not match any of the query keywords
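The three steps can be sketched on a plain call tree. This is an illustration under stated assumptions, not the prototype's implementation: all names below (`Node`, `names`, `deepest_fulfilling`, `truncate`, `shallowest_treecut`, `prune`, `search`, `render`) are hypothetical, keywords match labels by exact equality, a node's own label counts toward fulfilling the query, the treecut is a uniform depth cut, and the tree is a simplified version of the slides' example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def names(node):
    """The node's own label plus the names of all its descendants."""
    found = {node.label}
    for child in node.children:
        found |= names(child)
    return found


def deepest_fulfilling(node, query):
    """(S1) Sub-trees that cover the query while none of their children does."""
    if not query.issubset(names(node)):
        return []
    deeper = [hit for c in node.children for hit in deepest_fulfilling(c, query)]
    return deeper or [node]


def truncate(node, depth):
    """Copy of the sub-tree, cut off below the given depth."""
    kids = [truncate(c, depth - 1) for c in node.children] if depth > 0 else []
    return Node(node.label, kids)


def shallowest_treecut(root, query):
    """(S2) Smallest uniform depth at which the cut still covers the query."""
    depth = 0
    while not query.issubset(names(truncate(root, depth))):
        depth += 1
    return truncate(root, depth)


def prune(node, query):
    """(S3) Drop leaves whose label matches no keyword, repeatedly."""
    kids = [k for k in (prune(c, query) for c in node.children) if k is not None]
    if not kids and node.label not in query:
        return None
    return Node(node.label, kids)


def search(root, query):
    cuts = (shallowest_treecut(hit, query) for hit in deepest_fulfilling(root, query))
    return [r for r in (prune(cut, query) for cut in cuts) if r is not None]


def render(node, indent=0):
    """Format a result as nested text lines."""
    pad = "  " * indent
    if not node.children:
        return [pad + node.label]
    lines = [pad + node.label + " {"]
    for child in node.children:
        lines += render(child, indent + 1)
    return lines + [pad + "}"]


# Hypothetical call tree, simplified from the example figure.
main = Node("main", [
    Node("getDay", [
        Node("split"), Node("parseInt"), Node("parseInt"), Node("parseInt"),
        Node("Calendar//getInstance"), Node("Calendar//set"),
    ]),
    Node("getDayOfWeek", [Node("Calendar//get")]),
    Node("printf"),
])

for result in search(main, {"Calendar//get", "Calendar//set"}):
    print("\n".join(render(result)))
```

The sketch grows the cut depth from zero instead of removing leaves from the full sub-tree; both directions stop at the same smallest depth when the cut is uniform.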
16. Example
Query: { “Calendar//get”, “Calendar//set” }
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut
(S3) removes uncontributing leaf nodes
[Figure: call tree with nodes main, getDay, getToday, getDayOfWeek, split, parseInt (×3), Calendar//getInstance (×2), Calendar//set, Calendar//get, printf]
17. Example
Query: { “Calendar//get”, “Calendar//set” }
Summary highlighted in the figure:
{ “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }
19. Example
Query: { “Calendar//get”, “Calendar//set” }
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut in each of the sub-trees
(S3) removes uncontributing leaf nodes
[Figure: remaining nodes main, getDay, Calendar//set, getDayOfWeek, Calendar//get]
Search result
main {
  getDay {
    Calendar//set
  }
  getDayOfWeek {
    Calendar//get
  }
}
20. Prototype tool
Implementation
● Target: Java source code
  – Analysis of Java's dynamic dispatch
● Written in 8k lines of Python
● Applied to a product of up to 183 kloc (jEdit)
Limitations
● Keywords
  – Names of class or method
  – Text in string literal
● Exception handling
  – Does not search in the execution paths that throw exceptions
● Entry points
  – main() and static initializers
  – Does not search for entry points such as @Test
21. Java class files (bytecode)
[Figure: architecture of the prototype tool, from Java class files (bytecode) to the search result]
● Dynamic-dispatch analysis
  – Type hierarchy, method signature, dynamic-dispatch resolver
● Method-body analysis
  – Method calls, control flow, And/Or/Call graph of method body, node label
● Indexing
  – Whole-program graph building, node summary building
  – Data: And/Or/Call graph, node summary, line number table
● Searching
  – Keyword-query search: query, sub-graph / execution path
● Formatting
  – Search result
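The role of a dynamic-dispatch resolver in such a pipeline can be sketched as follows. This is an assumption-laden illustration: the function `resolve` and the dictionaries `subtypes` and `declares` are hypothetical, and the sketch ignores methods inherited without being overridden, which a real resolver would need to handle.

```python
def resolve(receiver_type, method, subtypes, declares):
    """Return 'Class//method' labels for every type at or below
    receiver_type that declares a body for the method."""
    targets, stack, seen = [], [receiver_type], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if method in declares.get(current, ()):
            targets.append(f"{current}//{method}")
        stack.extend(subtypes.get(current, ()))   # walk down the hierarchy
    return sorted(targets)


# interface I { void m(); }  class B implements I  class C implements I
subtypes = {"I": ["B", "C"]}             # type hierarchy (direct subtypes)
declares = {"B": {"m"}, "C": {"m"}}      # types that provide a method body

print(resolve("I", "m", subtypes, declares))
```

Each returned label is a candidate callee of the call `i.m()`, which a graph builder could then attach as alternatives at the call site.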
22. Applied to jEdit
● H/W
  – CPU Xeon E5520 2.27GHz
  – 32GiB mem.
● Indexing
  – 48.8 sec. in elapsed time
  – 644 MiB peak mem.
● Searching
  – 3.09 ∼ 72.2 (ave. 5.71) sec. in elapsed time
  – up to 1412 MiB peak mem.
23. Summary
● Background
  – #1: Code searching
  – #2: Emerging fine-grained module technologies
● Problem: Searching on fine-grained modules
● Solution: Keyword search on an execution path
  – And/Or/Call graph, Label/summary
  – Search algorithm
● Prototype implementation
  – Applied to jEdit
● GitHub
  – https://github.com/tos-kamiya/agoat/