Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Improving the quality of large scale database-centric software systems by analyzing database access code
1. Improving the Quality of Large-Scale
Database-Centric Software Systems by
Analyzing Database Access Code
1
Tse-Hsun(Peter) Chen
Supervised by Ahmed E. Hassan
3. Inefficient data access
3
for each userId{
…
executeQuery(“select … where
u.id = userId”);
}
Code
SQL select … from user where u.id = 1
select … from user where u.id = 2
…
4. Common performance problems in
database-access code
4
DBMSCode
• Inefficient data access
• Unneeded data access
8. Cannot find problems by only looking
at the DBMS side
8
DBMS
select … from user where u.id = 1
select … from user where u.id = 2
…for each userId{
…
executeQuery(“
select … where u.id = userId”);
}
9. Database accesses are abstracted
9
DBMSCode
Abstraction
Layers
Problems become more complex and
frequent after adding abstraction layers
10. Accessing data using ORM incorrectly
10
@Entity
Class User{
…
}
for each userId{
User u = findUserById(userId);
u.getName();
}
A database
entity class
Objects
SQL
select … from user where u.id = 1
select … from user where u.id = 2
…
11. Research statement
There are common problems in how developers write
database-access code, and these problems are hard to
diagnose by only looking at the DBMS. In addition, due
to the use of different database abstraction layers, these
problems become even harder to find.
By finding problems in the database access code and
configuring the abstraction layers correctly, we can
significantly improve the performance of database-
centric systems.
11
12. 12
Detecting inefficient
data access code
Overview of the thesis
Detecting unneeded
data access
Finding overly-strict
isolation level
Finished work Future work
Under
submission
13. Detecting unneeded
data access
13
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
14. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
14
15. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
15
Focus on one type of inefficient
data access: one-by-one processing
16. Detecting one-by-one processing
using static analysis
16
First find all the functions that
read/write from/to DBMS
Class User{
getUserById()…
getUserAddress()…
}
Identify the positions of all loops
for each userId{
foo(userId)
}
Check if the loop contains any
database-accessing function
foo (userId){
getUserById(userId)
}
17. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
17
18. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
18
19. Assessing inefficient data access
impact by fixing the problem
Execution
Response time
for u in users{
update u
}
Code with
inefficient data access
users.updateAll()
Code without
inefficient data access
19
Execute test
suite 30 times
Response time
after the fix
Avg. %
improvement
Execution
Execute test
suite 30 times
20. Inefficient data access causes
large performance impacts
0%
20%
40%
60%
80%
100%
One-by-one processing
20
%improvementinresponsetime
21. Detecting unneeded
data access
21
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
22. 22
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Detecting unneeded
data access
23. Intercepting system executions using
byte code instrumentation
ORM
DBMS
23
Intercept called
functions in the code
Intercept ORM
generated SQLs
Needed
data
Requested
data
diff
24. Mapping data access to called
functions
24
@Entity
@Table(name = “user”)
public class User{
…
@Column(name=“name”)
String userName;
@OneToMany
List<Team> teams;
public String getUserName(){
return userName;
}
public void addTeam(Team t){
teams.add(t);
}
User.java
We apply static analysis to
find which database
column a function is
reading/modifying
24
25. Identifying unneeded data access
from intercepted data
Only user name is needed in the
application logic
Needed user name in code
Requested id, name,
address, phone number
25
26. 26
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Detecting unneeded
data access
27. Detecting unneeded
data access
27
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
28. A real life example of transaction
abstraction problem
28
All functions in foo will be
executed in transactions
@Transactional
public class foo{
int getFooVal(){
return foo.val;
}
…
}
This function does not need to be
executed in transactions
Where to put the transaction may affect
system behavior and performance
29. Plans for finding transaction problems
related abstraction
29
Bug
Reports
Categories of
transaction-
related bugs
We plan to empirically study bug reports
to find root causes and implement a
detection framework