Graph-based Detection of
Library API Imitations

Chengnian Sun, Siau-Cheng Khoo, Shao Jie Zhang
National University of Singapore

October 6, 2011

Motivation – Software Libraries
 Common practice to employ 3rd-party software libraries
   Providing certain functionalities / hiding implementation details
     → Improving productivity
   Well tested
     → Enhancing program quality

 Application Programming Interfaces (APIs)
   Exported by libraries
   Ways for programmers to interact with libraries

Motivation – Problem
 APIs are not always effectively used by programmers
 Imitation: client code re-implements the behavior of library APIs

 Reasons
 Unfamiliarity with the library
 Library evolution

 Cost
 Wastes resources, time and energy
 Error-prone; creates software maintenance issues

Motivation – Example from JBoss
 Imitation (1): method.getInterceptors() == null ||
method.getInterceptors().length < 1
 API hasAdvices(): return (interceptors != null && interceptors.length > 0)
 Refactor to: !method.hasAdvices()
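
The equivalence behind this refactoring follows from De Morgan's law. The sketch below transliterates the JBoss snippet into Python to check it on a few inputs; the `Method` class and both helper functions are hypothetical stand-ins, not JBoss code.

```python
class Method:
    """Hypothetical stand-in for the JBoss class that owns the interceptors."""

    def __init__(self, interceptors):
        self._interceptors = interceptors

    def get_interceptors(self):
        return self._interceptors

    def has_advices(self):
        # Mirrors the API body: interceptors != null && interceptors.length > 0
        return self._interceptors is not None and len(self._interceptors) > 0


def is_unadvised_imitation(method):
    # Mirrors the client-side imitation from the slide.
    return method.get_interceptors() is None or len(method.get_interceptors()) < 1


def is_unadvised_refactored(method):
    # Mirrors the suggested refactoring: !method.hasAdvices()
    return not method.has_advices()


for interceptors in (None, [], ["logging"]):
    m = Method(interceptors)
    print(interceptors, is_unadvised_imitation(m), is_unadvised_refactored(m))
```

Both functions agree on every input, which is why the client condition can be replaced by a single API call.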

Motivation
 A library API imitation can be
 Not exactly the same
 Inter-procedural
 Goal: to accurately detect such imitations

Detection of Library API Imitations
 Motivation
 Definitions
 Data Dependency Graph
 Trace & Subtrace
 Trace Subsumption
 Potential Imitation
 Algorithms
 Pre- & Post-processing
 Case Studies
 Conclusion

Definitions – Overview
 Employing Data Dependency Graphs (DDGs) to represent code
 Semantic representation
 Capturing data flows within a method
 Carrying a portion of control flow information

 A library DDG is trace-subsumed by a client DDG ⇒ potential API imitation
 Relaxation of sub-graph isomorphism
 More efficient
 Minor-difference tolerant

Definitions – Data Dependency Graph
 DDG – a graphical representation of a method

 Vertices: basic statements (three-address form)

 Edges v → u: direction represents data dependency
 Vertex u is data dependent on vertex v if, for some variable var,
 var is defined at v,
 var is used at u, and
 there is an execution path P from v to u along which var is not redefined.
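
A minimal sketch of the edge rule, assuming straight-line code only (no branches or loops), so that the single execution path makes "not redefined along P" equivalent to "most recent definition". The `build_ddg` function and the tuple encoding of statements are illustrative assumptions, not the authors' implementation.

```python
def build_ddg(statements):
    """Build DDG edges for straight-line three-address code.

    statements: list of (defined_var_or_None, [used_vars]) in execution order.
    Returns a set of edges (v, u): statement u is data dependent on statement v.
    """
    last_def = {}  # variable name -> index of its most recent definition
    edges = set()
    for u, (defined, used) in enumerate(statements):
        for var in used:
            if var in last_def:  # a definition reaches this use
                edges.add((last_def[var], u))
        if defined is not None:
            last_def[defined] = u  # later uses see this definition only
    return edges


code = [
    ("a", []),          # 0: a = ...
    ("b", ["a"]),       # 1: b = f(a)
    ("a", []),          # 2: a = ...   (redefines a)
    ("c", ["a", "b"]),  # 3: c = g(a, b)
]
print(sorted(build_ddg(code)))
```

Statement 3 depends on statements 1 and 2 but not on statement 0, because `a` is redefined at statement 2 before it is used.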

Definitions – Trace & Subtrace
 A trace in a data dependency graph
 A path of vertices, <v1, v2, …, vm>
 The first vertex is an entry of the graph

 Given two traces T1 = <v_1, v_2, …, v_m> and T2 = <u_1, u_2, …, u_n>,
T1 is a subtrace of T2 (T1 ≤ T2) if there exists an integer i such that
 0 ≤ i ≤ n – m
 match(v_1, u_{1+i}), match(v_2, u_{2+i}), …, match(v_m, u_{m+i})

 Example: T1 = <C, D, E> is a subtrace of T2 = <A, B, C, D, E, F> with i = 2

 Subtrace is a generalization of the substring relation.
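
The subtrace relation can be sketched as a sliding-window comparison. The function name `find_subtrace_offset` is hypothetical, and the default `match` (plain equality of labels) is an assumption; the slides leave the match predicate abstract.

```python
def find_subtrace_offset(t1, t2, match=lambda v, u: v == u):
    """Return the smallest offset i with match(t1[k], t2[k + i]) for all k,
    or None if t1 is not a subtrace of t2."""
    m, n = len(t1), len(t2)
    for i in range(n - m + 1):  # 0 <= i <= n - m
        if all(match(t1[k], t2[k + i]) for k in range(m)):
            return i
    return None


print(find_subtrace_offset(list("CDE"), list("ABCDEF")))  # offset of the slide's example
print(find_subtrace_offset(list("CE"), list("ABCDEF")))   # not contiguous, so no offset
```

Passing a looser `match` (for example, one that ignores variable names) is what makes this a generalization of the substring relation.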

Definitions – Trace Subsumption
 A data dependency graph Glib (library)
 A data dependency graph Gclt (client)
 Gclt trace subsumes Glib if and only if
 for each trace Tlib in Glib there exists at least one trace Tclt in Gclt
such that Tlib is a subtrace of Tclt (Tlib ≤ Tclt)
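
A direct, deliberately naive reading of this definition is sketched below. It assumes both graphs are acyclic and represents a graph as a pair (entry vertices, successor map); it enumerates only maximal traces, which suffices because any prefix of a matched trace is matched as well. All names are illustrative.

```python
def maximal_traces(entries, succ):
    """Enumerate maximal paths starting at an entry vertex (acyclic graph assumed)."""
    traces = []

    def walk(path):
        successors = succ.get(path[-1], [])
        if not successors:
            traces.append(path)
        for v in successors:
            walk(path + [v])

    for entry in entries:
        walk([entry])
    return traces


def is_subtrace(t1, t2, match):
    m, n = len(t1), len(t2)
    return any(all(match(t1[k], t2[k + i]) for k in range(m))
               for i in range(n - m + 1))


def trace_subsumes(clt, lib, match=lambda v, u: v == u):
    """True if every library trace is a subtrace of at least one client trace."""
    clt_traces = maximal_traces(*clt)
    return all(any(is_subtrace(t_lib, t_clt, match) for t_clt in clt_traces)
               for t_lib in maximal_traces(*lib))


lib = (["a"], {"a": ["b"]})
clt = (["x"], {"x": ["a"], "a": ["b", "c"]})
print(trace_subsumes(clt, lib))  # client contains the library trace <a, b>
print(trace_subsumes(lib, clt))  # the relation is not symmetric
```

Enumerating traces is exponential in the worst case, so this sketch only illustrates the definition, not an efficient check.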

Definitions – Potential Imitation
 A client method Clt potentially imitates a library method Lib if there exist

 a DDG Gclt of Clt, resulting from inlining zero or more method calls into Clt, and

 a DDG Glib of Lib, resulting from inlining zero or more method calls into Lib,

 such that Gclt trace subsumes Glib

Algorithms – Overall Algorithm
 Input
 A library API Lib
 A client method Clt
 A set S of all method calls in both Lib and Clt
 Output true if Clt potentially imitates Lib
 Body
for each subset s of S {
    Lib' = a copy of Lib with calls in s inlined
    Clt' = a copy of Clt with calls in s inlined
    if the DDG of Clt' trace subsumes the DDG of Lib'
        return true
}
return false
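
The loop over subsets can be sketched as follows. The inlining step and the subsumption check are passed in as callables because the slides do not specify them; the toy `inline` and `contains_run` below are hypothetical stand-ins (methods are flat statement lists, and "subsumes" is simplified to "contains as a contiguous run").

```python
from itertools import chain, combinations


def all_subsets(calls):
    """Yield every subset of calls, starting with the empty set."""
    calls = list(calls)
    return chain.from_iterable(combinations(calls, k) for k in range(len(calls) + 1))


def potentially_imitates(lib, clt, calls, inline, subsumes):
    for s in all_subsets(calls):
        if subsumes(inline(clt, s), inline(lib, s)):
            return True
    return False


# Toy stand-ins, for illustration only.
BODIES = {"h": ["p", "q"]}  # hypothetical callee bodies


def inline(method, calls):
    out = []
    for stmt in method:
        out.extend(BODIES[stmt] if stmt in calls else [stmt])
    return out


def contains_run(clt, lib):
    n, m = len(clt), len(lib)
    return any(clt[i:i + m] == lib for i in range(n - m + 1))


lib = ["x", "h"]            # library method calls helper h
clt = ["x", "p", "q", "y"]  # client re-implements h inline
print(potentially_imitates(lib, clt, BODIES.keys(), inline, contains_run))
```

The imitation is found only after `h` is inlined into the library method, which is why the loop tries every subset. The number of subsets grows as 2^|S|, so the sketch is practical only for small call sets.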

Algorithms – Trace Subsumption
 Input
 A DDG of a library API Glib
 A DDG of a client method Gclt

 Output
 true if Gclt trace subsumes Glib

 Depth-first search with step-by-step checking

Algorithms – An Example
(Graph figures omitted. Each stack entry pairs a library vertex with the set of client vertices matched to it.)
 Start. Stack: empty. Current: none.
 Locating all vertices in client matching each entry of the library. Stack: (A, {A, A}). Current: none.
 Locating client vertices matching library A's successor D. Stack: empty. Current: (A, {A, A}).
 Match for D pushed. Stack: (D, {D}).
 Locating client vertices matching library A's successor B. Stack: (D, {D}).
 Match for B pushed. Stack: (B, {B}), (D, {D}).
 Locating client vertices matching B's successor {} in library (B has no successors). Stack: (D, {D}). Current: (B, {B}).
 Locating client vertices matching library D's successor M. Stack: empty. Current: (D, {D}).
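
The stack-based walk above can be sketched as follows. This is an assumed reconstruction from the stack states on the slides, not the authors' code: the client graph, the vertex ids, and the label-equality `match` are hypothetical, and the sketch assumes every client vertex is reachable from some client entry.

```python
def dfs_trace_subsumes(lib_entries, lib_succ, clt_vertices, clt_succ, match):
    """Depth-first check that every library trace has a matching client subtrace."""
    stack, seen = [], set()
    clt_vertices = list(clt_vertices)
    for entry in lib_entries:
        matched = frozenset(c for c in clt_vertices if match(entry, c))
        if not matched:
            return False
        stack.append((entry, matched))
    while stack:
        lib_v, clt_set = stack.pop()
        if (lib_v, clt_set) in seen:  # same state already explored
            continue
        seen.add((lib_v, clt_set))
        for lib_next in lib_succ.get(lib_v, ()):
            # Client successors of the current matches that match the next library vertex.
            matched = frozenset(c2 for c in clt_set
                                for c2 in clt_succ.get(c, ())
                                if match(lib_next, c2))
            if not matched:
                return False  # some library trace cannot be extended in the client
            stack.append((lib_next, matched))
    return True


# Library graph from the slides: A -> D, A -> B, D -> M.
lib_succ = {"A": ["D", "B"], "D": ["M"]}
# Hypothetical client graph with two vertices labelled A.
clt_label = {"x": "X", "a1": "A", "a2": "A", "d1": "D", "b1": "B", "m1": "M"}
clt_succ = {"x": ["a1"], "a1": ["d1"], "a2": ["b1"], "d1": ["m1"]}


def match(lib_v, clt_v):
    return clt_label[clt_v] == lib_v


print(dfs_trace_subsumes(["A"], lib_succ, clt_label, clt_succ, match))
```

With this client graph the stack evolves as on the slides: (A, {a1, a2}) is popped, (D, {d1}) and (B, {b1}) are pushed, B is popped first, then D. Carrying a set of client vertices per library vertex avoids enumerating client traces one by one.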

Pre-processing Libraries
 Remove nullness checks
if (a == null) {
    return Constant;
} else {
    a.XXX();
}
 Remove assertions
if (…)
    throw new Exception();
…
 Remove exception handlers
try {
    …
} catch (…) {}

Post-validating Reported Imitations
 Reject the following two cases
 Unmatched Inlined Vertices in Client

 Matching All References to Library Locals

Case Studies
 Evaluation measure

 Subjects – 10 open-source Java projects

 Testbed:
 Intel Core 2 Quad CPU 3.00GHz and 8GB memory

Case Studies – Two Experiments
 Detecting Imitations of Imported Libraries
 Testing all method pairs (lib, clt), where the declaring class of
lib is already imported in the client class
 Precision = 313 / 383 = 82%
 Runtime = 314 seconds

 Detecting Imitations of Static Libraries
 Testing all method pairs (lib, clt), where lib is a public static
method
 Precision = 116 / 155 = 75%
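
The two precision figures can be reproduced with the arithmetic below. It assumes the usual reading of precision (confirmed imitations divided by reported imitations); the slide's own evaluation-measure formula did not survive extraction.

```python
def precision(confirmed, reported):
    """Fraction of reported imitations that were confirmed as real."""
    return confirmed / reported


imported = precision(313, 383)  # imitations of imported libraries
static = precision(116, 155)    # imitations of public static methods
print(f"imported: {imported:.1%}, static: {static:.1%}")
```

Rounded to whole percentages these give the 82% and 75% quoted on the slides.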

Case Studies – Example of Static API

Conclusion
 Employing 3rd-party software libraries is common practice

 Client code sometimes re-implements the behavior of existing APIs

 An algorithm based on data dependency graphs to detect complex imitations

 Average precision: 82% (imported libraries) and 75% (static libraries)

Thank you.

Q&A
