An Algorithm for Keyword Search on an Execution Path
Toshihiro Kamiya
Future University Hakodate
kamiya@fun.ac.jp

Toshihiro Kamiya, "An Algorithm for Keyword Search on an Execution Path", In Proc. CSMR-WCRE 2014, pp. 328-332, 2014-02-06.


  1. An Algorithm for Keyword Search on an Execution Path
     Toshihiro Kamiya, Future University Hakodate, kamiya@fun.ac.jp
  2. Background #1: Code searching
     Developers do search!
     ➤ To find reusable components for a function of a product
     ➤ To find similar code fragments before modifying code
     ➤ To find code samples showing the usage of a given class or component
     CSMR-WCRE-2014 Era Track
  3. Background #2: Emerging fine-grained module technologies
     More and more fine-grained modules are used.
     ● Object/Closure: extracts data and its manipulation
     ● Aspect: extracts concerns, that is, a set of code invoked by a specific condition or event
     ● Dependency Injection: splits code at each dependency
  4. Problem: Searching on fine-grained modules
     Fine-grained modules make code search difficult.
     (Old days) The search result was contained in a file.
     ↓
     (Now) The search result is a set of several parts of several files.
     This affects code-search methods in both:
     ● Algorithm
       – “how to find”
     ● Displaying/Visualizing
       – “how to show search results”
  5. Solution: Keyword Search on an Execution Path
     ● Static analysis
     ● Find the execution paths that include given keywords
       – From all possible execution paths of a target program
     ● Idea: a compact data structure (And/Or/Call graph) of execution paths + a search algorithm on it
     ● A prototype implementation
       – Applied to up to 183k lines of Java source code
     ● Related work
       – Prospector [8]
       – PARSEWeb [9]
  6. And/Or/Call Graph
     ● A DAG that contains all execution paths in a compact form
     ● Generated from source code by the following translation rules (the slide shows each rule in graphical form):
       – Sequence structure ➡ And node
         Example: s1; s2; s3;
       – Selection structure ➡ Or node
         Example: if (...) { st; } else { se; }
       – Repetitive structure ➡ Selection among sequences of 0-time repetition, 1-time repetition, 2-times repetition, ... ➡ Or node having And nodes as children
       – Method call ➡ Call node
       – Dynamic dispatching: the call i.m() may reach B//m or C//m
         Example: interface I { m(); } class B implements I { m() {...} } class C implements I { m() {...} } I i; ... i.m();
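The translation rules on the And/Or/Call Graph slide can be sketched in Python, the language of the prototype. This is only an illustration under stated assumptions, not the tool's actual data structure: the `Node` class and the helpers `call`, `seq`, `choice`, `loop`, and `paths` are hypothetical names, dynamic dispatch is modeled as an Or node over the candidate Call nodes, and loops are unrolled up to an assumed bound of two repetitions so that the graph stays acyclic.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class Node:
    kind: str                      # "and", "or", or "call"
    name: str = ""                 # method name, used by call nodes only
    children: List["Node"] = field(default_factory=list)


def call(name, *body):
    """Method call => Call node (the body is what the callee executes)."""
    return Node("call", name, list(body))


def seq(*items):
    """Sequence structure => And node."""
    return Node("and", children=list(items))


def choice(*items):
    """Selection structure or dynamic dispatch => Or node."""
    return Node("or", children=list(items))


def loop(body, max_times=2):
    """Repetition => Or node over And nodes of 0, 1, ..., max_times copies.

    The bound max_times is an assumption of this sketch.
    """
    return choice(*[seq(*([body] * k)) for k in range(max_times + 1)])


def paths(node):
    """Enumerate execution paths as tuples of called method names."""
    if node.kind == "call":
        # A call contributes its own name, then the paths of its body.
        return [(node.name,) + p for p in paths(seq(*node.children))]
    if node.kind == "and":
        # Concatenate one path from each child, in order.
        child_paths = [paths(c) for c in node.children]
        return [sum(combo, ()) for combo in product(*child_paths)]
    if node.kind == "or":
        # Exactly one child is executed.
        return [p for c in node.children for p in paths(c)]
    raise ValueError(node.kind)


# main() { s1(); i.m(); s3(); } where i.m() may dispatch to B//m or C//m
program = call("main", call("s1"), choice(call("B//m"), call("C//m")), call("s3"))
for p in paths(program):
    print(" -> ".join(p))
```

Enumerating paths explicitly is only practical for tiny graphs; the point of the compact graph is that the search can work on the nodes without expanding every path.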
  7. Example (slides 7 to 9 repeat the same figure)
     Figure: example graph with the nodes main, getDay, getToday, getDayOfWeek, split, parseInt (three times), Calendar//getInstance, Calendar//set, Calendar//get, and printf.
  10. Search Algorithm
      ● Input: Keywords to identify nodes
      ● Output: Connected sub-graphs including the nodes identified with the keywords
        – “connected sub-graph” → continuous execution path
      ● Heuristics
        – Find the deepest nodes ← Assumption: a small operation is easy to understand
        – Extract the shallowest sub-graph (treecut) ← Assumption: a deep method-invocation chain is difficult to understand
  11. Label and Summary (slides 11 to 14 build up the same content)
      Labels and summaries are the “index” data of the search algorithm.
      ● Label
        – A set of names put on a node
        – Matched against the keywords in a query
      ● Summary
        – A node n’s summary S(n) is the set of names of the (child and) descendant nodes of n.
      ● Properties
        – For any node n and any child node c of n, S(n) ⊇ S(c).
        – A root node has a summary of local maximum.
      Figure: the example graph (main, getDay, getToday, getDayOfWeek, split, parseInt, Calendar//getInstance, Calendar//set, Calendar//get, printf), with two summaries highlighted in turn:
      { “Calendar//getInstance”, “Calendar//set”, “split”, “parseInt” }
      { “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }
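The summary definition above can be sketched as a bottom-up computation. The sketch below is a simplification under assumptions: it uses a plain call table instead of the And/Or/Call graph, the `CALLS` table is a hypothetical reconstruction chosen so that its summaries agree with the two sets shown on the slides (it leaves out getToday), and `summary` is an illustrative name, not the prototype's API.

```python
from functools import lru_cache

# Hypothetical call table, loosely modeled on the slide's example.
CALLS = {
    "main": ("getDay", "getDayOfWeek", "printf"),
    "getDay": ("split", "parseInt", "Calendar//getInstance", "Calendar//set"),
    "getDayOfWeek": ("Calendar//get",),
}


@lru_cache(maxsize=None)
def summary(name):
    """S(n): names of all child and descendant nodes of n (n itself excluded)."""
    result = set()
    for child in CALLS.get(name, ()):
        result.add(child)          # the child's own label
        result |= summary(child)   # plus everything below the child
    return frozenset(result)


print(sorted(summary("getDay")))
print(sorted(summary("main")))

# Property from the slide: S(n) is a superset of S(c) for every child c.
for parent, children in CALLS.items():
    for child in children:
        assert summary(parent) >= summary(child)
```

Because each summary is cached, every node is processed once, which is what makes summaries usable as precomputed index data.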
  15. Steps of the search algorithm
      (S1) Finds query-fulfilling sub-trees of the (local) maximum depths
        – by comparing the summary of each node with the query
      (S2) Makes the shallowest treecut
        – by removing deeper leaf nodes until removing more would make the treecut no longer fulfill the query
      (S3) Removes uncontributing leaf nodes
        – Uncontributing = the node’s label does not match any of the query keywords
  16. Example (slides 16 to 19 step through the same figure)
      Query: { “Calendar//get”, “Calendar//set” }
      (S1) Finds query-fulfilling sub-trees of the (local) maximum depths. Summary shown for the matching node:
           { “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }
      (S2) Makes the shallowest treecut in each of the sub-trees
      (S3) Removes uncontributing leaf nodes
      Search result:
      main {
        getDay { Calendar//set }
        getDayOfWeek { Calendar//get }
      }
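Steps (S1) to (S3) can be traced on this example with a short Python sketch. It is one possible reading of the slides, not the prototype's implementation: trees are plain `(label, children)` tuples rather than And/Or/Call graphs, keywords match labels by exact equality, a node's own label counts toward fulfilling the query, (S2) is read as a depth-limited cut, (S3) is applied repeatedly, and the function names and the `EXAMPLE` tree (which leaves out getToday) are hypothetical.

```python
def labels(tree):
    """All labels in a tree given as (label, [children])."""
    label, children = tree
    out = {label}
    for child in children:
        out |= labels(child)
    return out


def height(tree):
    return 1 + max((height(c) for c in tree[1]), default=-1)


def truncate(tree, depth):
    """Keep only the nodes at most `depth` edges below the root."""
    label, children = tree
    if depth == 0:
        return (label, [])
    return (label, [truncate(c, depth - 1) for c in children])


def s1_deepest_fulfilling(tree, query):
    """(S1) Sub-trees that fulfill the query while none of their children do."""
    if not query <= labels(tree):
        return []
    deeper = [hit for c in tree[1] for hit in s1_deepest_fulfilling(c, query)]
    return deeper or [tree]


def s2_shallowest_treecut(tree, query):
    """(S2) Smallest depth limit whose cut still contains every keyword."""
    for depth in range(height(tree) + 1):
        cut = truncate(tree, depth)
        if query <= labels(cut):
            return cut
    raise ValueError("tree does not fulfill the query")


def s3_prune(tree, query):
    """(S3) Drop leaves that match no keyword; None if nothing remains."""
    label, children = tree
    kept = [p for p in (s3_prune(c, query) for c in children) if p is not None]
    if not kept and label not in query:
        return None
    return (label, kept)


def search(tree, query):
    return [s3_prune(s2_shallowest_treecut(t, query), query)
            for t in s1_deepest_fulfilling(tree, query)]


# Hypothetical tree, loosely modeled on the slide's example.
EXAMPLE = ("main", [
    ("getDay", [("split", []), ("parseInt", []),
                ("Calendar//getInstance", []), ("Calendar//set", [])]),
    ("getDayOfWeek", [("Calendar//get", [])]),
    ("printf", []),
])

print(search(EXAMPLE, {"Calendar//get", "Calendar//set"}))
```

On this tree the only sub-tree returned by (S1) is the one rooted at main, because neither getDay nor getDayOfWeek alone covers both keywords; after (S2) and (S3) the remaining tree has the same shape as the search result on the slide. Recomputing `labels` at every node is wasteful and is done here only for brevity; the slides use precomputed summaries for this comparison.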
  20. Prototype tool
      Implementation
      ● Target: Java source code
        – Analysis of Java's dynamic dispatch
      ● Written in 8k lines of Python
      ● Applied to a product of up to 183 kLOC (jEdit)
      Limitations
      ● Keywords
        – Names of class or method
        – Text in string literal
      ● Exception handling
        – Does not search in the execution paths that throw
      ● Entry points
        – main() and static initializers
        – Does not search for entry points such as @Test
  21. Figure: processing pipeline of the prototype tool. Elements in reading order:
      ● Input: Java class files (bytecode)
      ● Dynamic-dispatch analysis, Type hierarchy
      ● Method-body analysis, Method calls, Control flow
      ● Indexing: Method signature, Dynamic-dispatch resolver, And/Or/Call graph of method body, Node label, Whole-program graph building, Node summary building, And/Or/Call graph, Node summary, Line number table
      ● Searching: Query, Keyword-query search, Sub-graph / Execution path, Formatting, Search result
  22. Applied to jEdit
      ● H/W
        – CPU: Xeon E5520 2.27 GHz
        – Memory: 32 GiB
      ● Indexing
        – 48.8 sec. elapsed time
        – 644 MiB peak memory
      ● Searching
        – 3.09 to 72.2 sec. elapsed time (average 5.71 sec.)
        – Up to 1412 MiB peak memory
  23. Summary
      ● Background
        – #1: Code searching
        – #2: Emerging fine-grained module technologies
      ● Problem: Searching on fine-grained modules
      ● Solution: Keyword search on an execution path
        – And/Or/Call graph, Label/Summary
        – Search algorithm
      ● Prototype implementation
        – Applied to jEdit
      ● GitHub
        – https://github.com/tos-kamiya/agoat/