Due to the emergence of cloud computing and big
data applications, modern software systems are becoming more dependent on the underlying database management systems (DBMSs) for data integrity and management. Since DBMSs are very complex and each technology has some implementation-specific differences, DBMSs are usually used as black boxes by software developers, which allow better adaption and abstraction of different database technologies. For example, Object-Relational Mapping (ORM) is one of the most popular database abstraction approaches that developers use. Using ORM, objects in Object-Oriented languages are mapped to records in the DBMS, and object manipulations are automatically translated to SQL queries. Despite ORM’s convenience, there exists impedance mismatches between the Object-Oriented paradigm and the
relational DBMSs. Such impedance mismatches may result in
developers writing inefficiently and/or incorrectly database access code. Thus, this thesis proposes several approaches to improve the quality of database-centric software systems by looking at the application source code. We focus on troubleshooting and detecting inefficient (i.e., performance problems) and incorrect (i.e., functional problems) database accesses in the source code, and we prioritize the detected problems based on severity. Through case studies on large commercial and open source systems, we plan to demonstrate the value of improving the quality of database-centric software systems from a new perspective – helping developers access the database more efficiently and accurately.
Unleash Your Potential - Namagunga Girls Coding Club
ICDE2015PhD - Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code
1. Improving the Quality of Large-Scale
Database-Centric Software Systems by
Analyzing Database Access Code
1
Tse-Hsun(Peter) Chen
Supervised by Ahmed E. Hassan
3. Inefficient data access
3
for each userId{
…
executeQuery(“select … where
u.id = userId”);
}
Code
SQL select … from user where u.id = 1
select … from user where u.id = 2
…
4. Common performance problems in
database-access code
4
DBMSCode
• Inefficient data access
• Unneeded data access
8. Cannot find problems by only looking
at the DBMS side
8
DBMS
select … from user where u.id = 1
select … from user where u.id = 2
…for each userId{
…
executeQuery(“
select … where u.id = userId”);
}
9. Database accesses are abstracted
9
DBMSCode
Abstraction
Layers
Problems become more complex and
frequent after adding abstraction layers
10. Accessing data using ORM incorrectly
10
@Entity
Class User{
…
}
for each userId{
User u = findUserById(userId);
u.getName();
}
A database
entity class
Objects
SQL
select … from user where u.id = 1
select … from user where u.id = 2
…
11. Research statement
There are common problems in how developers write
database-access code, and these problems are hard to
diagnose by only looking at the DBMS. In addition, due
to the use of different database abstraction layers, these
problems become even harder to find.
By finding problems in the database access code and
configuring the abstraction layers correctly, we can
significantly improve the performance of database-
centric systems.
11
12. 12
Detecting inefficient
data access code
Overview of the thesis
Detecting unneeded
data access
Finding overly-strict
isolation level
Finished work Future work
Under
submission
13. Detecting unneeded
data access
13
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
14. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
14
15. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
15
Focus on one type of inefficient
data access: one-by-one processing
16. Detecting one-by-one processing
using static analysis
16
First find all the functions that
read/write from/to DBMS
Class User{
getUserById()…
getUserAddress()…
}
Identify the positions of all loops
for each userId{
foo(userId)
}
Check if the loop contains any
database-accessing function
foo (userId){
getUserById(userId)
}
17. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
17
18. Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
18
19. Assessing inefficient data access
impact by fixing the problem
Execution
Response time
for u in users{
update u
}
Code with
inefficient data access
users.updateAll()
Code without
inefficient data access
19
Execute test
suite 30 times
Response time
after the fix
Avg. %
improvement
Execution
Execute test
suite 30 times
20. Inefficient data access causes
large performance impacts
0%
20%
40%
60%
80%
100%
One-by-one processing
20
%improvementinresponsetime
21. Detecting unneeded
data access
21
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
22. 22
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Detecting unneeded
data access
23. Intercepting system executions using
byte code instrumentation
ORM
DBMS
23
Intercept called
functions in the code
Intercept ORM
generated SQLs
Needed
data
Requested
data
diff
24. Mapping data access to called
functions
24
@Entity
@Table(name = “user”)
public class User{
…
@Column(name=“name”)
String userName;
@OneToMany
List<Team> teams;
public String getUserName(){
return userName;
}
public void addTeam(Team t){
teams.add(t);
}
User.java
We apply static analysis to
find which database
column a function is
reading/modifying
24
25. Identifying unneeded data access
from intercepted data
Only user name is needed in the
application logic
Needed user name in code
Requested id, name,
address, phone number
25
26. 26
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Detecting unneeded
data access
27. Detecting unneeded
data access
27
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
28. A real life example of transaction
abstraction problem
28
All functions in foo will be
executed in transactions
@Transactional
public class foo{
int getFooVal(){
return foo.val;
}
…
}
This function does not need to be
executed in transactions
Where to put the transaction may affect
system behavior and performance
29. Plans for finding transaction problems
related abstraction
29
Bug
Reports
Categories of
transaction-
related bugs
We plan to empirically study bug reports
to find root causes and implement a
detection framework