SlideShare a Scribd company logo
1 of 38
Improving the Quality of Large-Scale
Database-Centric Software Systems by
Analyzing Database Access Code
1
Tse-Hsun(Peter) Chen
Supervised by Ahmed E. Hassan
Common performance problems in
database-access code
2
DBMSCode
• Inefficient data access
Inefficient data access
3
for each userId{
…
executeQuery(“select … where
u.id = userId”);
}
Code
SQL select … from user where u.id = 1
select … from user where u.id = 2
…
Common performance problems in
database-access code
4
DBMSCode
• Inefficient data access
• Unneeded data access
Unneeded data access
5
DBMSCode
Require
user data in
the code
Actual request sent
Team TableUser Table
join
Common performance problems in
database-access code
6
DBMSCode
• Inefficient data access
• Unneeded data access
• Overly-strict isolation level
Overly-strict isolation level
7
DBMSCode
Reading only
user name
set read-write transaction
select … from user
Cannot find problems by only looking
at the DBMS side
8
DBMS
select … from user where u.id = 1
select … from user where u.id = 2
…for each userId{
…
executeQuery(“
select … where u.id = userId”);
}
Database accesses are abstracted
9
DBMSCode
Abstraction
Layers
Problems become more complex and
frequent after adding abstraction layers
Accessing data using ORM incorrectly
10
@Entity
Class User{
…
}
for each userId{
User u = findUserById(userId);
u.getName();
}
A database
entity class
Objects
SQL
select … from user where u.id = 1
select … from user where u.id = 2
…
Research statement
There are common problems in how developers write
database-access code, and these problems are hard to
diagnose by only looking at the DBMS. In addition, due
to the use of different database abstraction layers, these
problems become even harder to find.
By finding problems in the database access code and
configuring the abstraction layers correctly, we can
significantly improve the performance of database-
centric systems.
11
12
Detecting inefficient
data access code
Overview of the thesis
Detecting unneeded
data access
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Detecting unneeded
data access
13
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
14
Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
15
Focus on one type of inefficient
data access: one-by-one processing
Detecting one-by-one processing
using static analysis
16
First find all the functions that
read/write from/to DBMS
Class User{
getUserById()…
getUserAddress()…
}
Identify the positions of all loops
for each userId{
foo(userId)
}
Check if the loop contains any
database-accessing function
foo (userId){
getUserById(userId)
}
Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
17
Inefficient data access detection
framework
Inefficient data access detection and
ranking framework
Ranked according to
performance impact
Ranked
inefficient
data access
code
Source
Code
detection
ranking
18
Assessing inefficient data access
impact by fixing the problem
Execution
Response time
for u in users{
update u
}
Code with
inefficient data access
users.updateAll()
Code without
inefficient data access
19
Execute test
suite 30 times
Response time
after the fix
Avg. %
improvement
Execution
Execute test
suite 30 times
Inefficient data access causes
large performance impacts
0%
20%
40%
60%
80%
100%
One-by-one processing
20
%improvementinresponsetime
Detecting unneeded
data access
21
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
22
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Detecting unneeded
data access
Intercepting system executions using
byte code instrumentation
ORM
DBMS
23
Intercept called
functions in the code
Intercept ORM
generated SQLs
Needed
data
Requested
data
diff
Mapping data access to called
functions
24
@Entity
@Table(name = “user”)
public class User{
…
@Column(name=“name”)
String userName;
@OneToMany
List<Team> teams;
public String getUserName(){
return userName;
}
public void addTeam(Team t){
teams.add(t);
}
User.java
We apply static analysis to
find which database
column a function is
reading/modifying
24
Identifying unneeded data access
from intercepted data
Only user name is needed in the
application logic
Needed user name in code
Requested id, name,
address, phone number
25
26
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
Detecting unneeded
data access
Detecting unneeded
data access
27
Detecting inefficient
data access code
Overview of the thesis
Finding overly-strict
isolation level
Finished work Future work
Under
submission
A real life example of transaction
abstraction problem
28
All functions in foo will be
executed in transactions
@Transactional
public class foo{
int getFooVal(){
return foo.val;
}
…
}
This function does not need to be
executed in transactions
Where to put the transaction may affect
system behavior and performance
Plans for finding transaction problems
related abstraction
29
Bug
Reports
Categories of
transaction-
related bugs
We plan to empirically study bug reports
to find root causes and implement a
detection framework
30
31
32
33
34
35
36
37
38
http://petertsehsun.github.io

More Related Content

Viewers also liked

Estudio # 15 nuevo libro 2013
Estudio # 15 nuevo libro 2013Estudio # 15 nuevo libro 2013
Estudio # 15 nuevo libro 2013
jo_ceballos
 
แบบสำรวจและประวัติของ
แบบสำรวจและประวัติของแบบสำรวจและประวัติของ
แบบสำรวจและประวัติของ
Kunlanan Parakon
 

Viewers also liked (16)

Clauido sanches y erick guaman
Clauido sanches y erick guamanClauido sanches y erick guaman
Clauido sanches y erick guaman
 
Format_ApplicationForm_1
Format_ApplicationForm_1Format_ApplicationForm_1
Format_ApplicationForm_1
 
Hướng dẫn drift xe côn tay
Hướng dẫn drift xe côn tayHướng dẫn drift xe côn tay
Hướng dẫn drift xe côn tay
 
White Genocide In South Africa - Here Are The Names
White Genocide In South Africa - Here Are The NamesWhite Genocide In South Africa - Here Are The Names
White Genocide In South Africa - Here Are The Names
 
Bowel Cancer in NSW: The facts
Bowel Cancer in NSW: The factsBowel Cancer in NSW: The facts
Bowel Cancer in NSW: The facts
 
交通工具跟動物
交通工具跟動物交通工具跟動物
交通工具跟動物
 
Como melhoramos a performance dos testes automatizados com py.test e factoryboy
Como melhoramos a performance dos testes automatizados com py.test e factoryboyComo melhoramos a performance dos testes automatizados com py.test e factoryboy
Como melhoramos a performance dos testes automatizados com py.test e factoryboy
 
Estudio # 15 nuevo libro 2013
Estudio # 15 nuevo libro 2013Estudio # 15 nuevo libro 2013
Estudio # 15 nuevo libro 2013
 
Joshua Kevin, CEO and Founder of Talenta, Introducing Indonesian Market to Ko...
Joshua Kevin, CEO and Founder of Talenta, Introducing Indonesian Market to Ko...Joshua Kevin, CEO and Founder of Talenta, Introducing Indonesian Market to Ko...
Joshua Kevin, CEO and Founder of Talenta, Introducing Indonesian Market to Ko...
 
Créer son site web avec agence - 6 juin 2013 Lembeye
Créer son site web avec agence - 6 juin 2013 LembeyeCréer son site web avec agence - 6 juin 2013 Lembeye
Créer son site web avec agence - 6 juin 2013 Lembeye
 
Open data and innovation in my community
Open data and innovation in my communityOpen data and innovation in my community
Open data and innovation in my community
 
แบบสำรวจและประวัติของ
แบบสำรวจและประวัติของแบบสำรวจและประวัติของ
แบบสำรวจและประวัติของ
 
Informe de mi practica docente
Informe de mi practica docenteInforme de mi practica docente
Informe de mi practica docente
 
Visão computacional em embarcados
Visão computacional em embarcadosVisão computacional em embarcados
Visão computacional em embarcados
 
Sarita
SaritaSarita
Sarita
 
Mantra of Healthy lifespan by PR Shetty
Mantra of Healthy lifespan by PR ShettyMantra of Healthy lifespan by PR Shetty
Mantra of Healthy lifespan by PR Shetty
 

Similar to Improving the quality of large scale database-centric software systems by analyzing database access code

Icse2014 v3
Icse2014 v3Icse2014 v3
Icse2014 v3
SAIL_QU
 
Improving the Performance of Database-Centric Applications Through Program An...
Improving the Performance of Database-Centric Applications Through Program An...Improving the Performance of Database-Centric Applications Through Program An...
Improving the Performance of Database-Centric Applications Through Program An...
Concordia University
 

Similar to Improving the quality of large scale database-centric software systems by analyzing database access code (20)

TSE 2016 - Finding and Evaluating the Performance Impact of Redundant Data Ac...
TSE 2016 - Finding and Evaluating the Performance Impact of Redundant Data Ac...TSE 2016 - Finding and Evaluating the Performance Impact of Redundant Data Ac...
TSE 2016 - Finding and Evaluating the Performance Impact of Redundant Data Ac...
 
Mechanisms for Database Intrusion Detection and Response
Mechanisms for Database Intrusion Detection and ResponseMechanisms for Database Intrusion Detection and Response
Mechanisms for Database Intrusion Detection and Response
 
Lecture 15-16.pdf
Lecture 15-16.pdfLecture 15-16.pdf
Lecture 15-16.pdf
 
ICSE2014 - Detecting Performance Anti-patterns for Applications Developed usi...
ICSE2014 - Detecting Performance Anti-patterns for Applications Developed usi...ICSE2014 - Detecting Performance Anti-patterns for Applications Developed usi...
ICSE2014 - Detecting Performance Anti-patterns for Applications Developed usi...
 
Icse2014 v3
Icse2014 v3Icse2014 v3
Icse2014 v3
 
Introduction to DBMS.pptx
Introduction to DBMS.pptxIntroduction to DBMS.pptx
Introduction to DBMS.pptx
 
Ista presentation-dga
Ista presentation-dgaIsta presentation-dga
Ista presentation-dga
 
Database security and security in networks
Database security and security in networksDatabase security and security in networks
Database security and security in networks
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Database Management system
Database Management systemDatabase Management system
Database Management system
 
DBMS PPT.pptx
DBMS PPT.pptxDBMS PPT.pptx
DBMS PPT.pptx
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
 
CH05-CompSec4e.pptx
CH05-CompSec4e.pptxCH05-CompSec4e.pptx
CH05-CompSec4e.pptx
 
Susi
Susi Susi
Susi
 
Improving the Performance of Database-Centric Applications Through Program An...
Improving the Performance of Database-Centric Applications Through Program An...Improving the Performance of Database-Centric Applications Through Program An...
Improving the Performance of Database-Centric Applications Through Program An...
 
Creating a Multi-Layered Secured Postgres Database
Creating a Multi-Layered Secured Postgres DatabaseCreating a Multi-Layered Secured Postgres Database
Creating a Multi-Layered Secured Postgres Database
 
Data Tracking: On the Hunt for Information about Your Database
Data Tracking: On the Hunt for Information about Your DatabaseData Tracking: On the Hunt for Information about Your Database
Data Tracking: On the Hunt for Information about Your Database
 
DBMS
DBMSDBMS
DBMS
 
JAM819 - Native API Deep Dive: Data Storage and Retrieval
JAM819 - Native API Deep Dive: Data Storage and RetrievalJAM819 - Native API Deep Dive: Data Storage and Retrieval
JAM819 - Native API Deep Dive: Data Storage and Retrieval
 
unit 1.pdf
unit 1.pdfunit 1.pdf
unit 1.pdf
 

More from SAIL_QU

Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...
SAIL_QU
 
Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...
SAIL_QU
 
Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...
SAIL_QU
 
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
SAIL_QU
 
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
SAIL_QU
 
Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...
SAIL_QU
 

More from SAIL_QU (20)

Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...Studying the Integration Practices and the Evolution of Ad Libraries in the G...
Studying the Integration Practices and the Evolution of Ad Libraries in the G...
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
 
Improving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load testsImproving the testing efficiency of selenium-based load tests
Improving the testing efficiency of selenium-based load tests
 
Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...Studying User-Developer Interactions Through the Distribution and Reviewing M...
Studying User-Developer Interactions Through the Distribution and Reviewing M...
 
Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...Studying online distribution platforms for games through the mining of data f...
Studying online distribution platforms for games through the mining of data f...
 
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi...
 
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
Investigating the Challenges in Selenium Usage and Improving the Testing Effi...
 
Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...Mining Development Knowledge to Understand and Support Software Logging Pract...
Mining Development Knowledge to Understand and Support Software Logging Pract...
 
Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?Which Log Level Should Developers Choose For a New Logging Statement?
Which Log Level Should Developers Choose For a New Logging Statement?
 
Towards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log ChangesTowards Just-in-Time Suggestions for Log Changes
Towards Just-in-Time Suggestions for Log Changes
 
The Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution AnalysesThe Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution Analyses
 
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bu...
 
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
How are Discussions Associated with Bug Reworking? An Empirical Study on Open...
 
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q...
 
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C...
 
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...Studying the Dialogue Between Users and Developers of Free Apps in the Google...
Studying the Dialogue Between Users and Developers of Free Apps in the Google...
 
What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?What Do Programmers Know about Software Energy Consumption?
What Do Programmers Know about Software Energy Consumption?
 
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
Threshold for Size and Complexity Metrics: A Case Study from the Perspective ...
 
Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...Revisiting the Experimental Design Choices for Approaches for the Automated R...
Revisiting the Experimental Design Choices for Approaches for the Automated R...
 
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with ProfessionalsMeasuring Program Comprehension: A Large-Scale Field Study with Professionals
Measuring Program Comprehension: A Large-Scale Field Study with Professionals
 

Improving the quality of large scale database-centric software systems by analyzing database access code

  • 1. Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code 1 Tse-Hsun(Peter) Chen Supervised by Ahmed E. Hassan
  • 2. Common performance problems in database-access code 2 DBMSCode • Inefficient data access
  • 3. Inefficient data access 3 for each userId{ … executeQuery(“select … where u.id = userId”); } Code SQL select … from user where u.id = 1 select … from user where u.id = 2 …
  • 4. Common performance problems in database-access code 4 DBMSCode • Inefficient data access • Unneeded data access
  • 5. Unneeded data access 5 DBMSCode Require user data in the code Actual request sent Team TableUser Table join
  • 6. Common performance problems in database-access code 6 DBMSCode • Inefficient data access • Unneeded data access • Overly-strict isolation level
  • 7. Overly-strict isolation level 7 DBMSCode Reading only user name set read-write transaction select … from user
  • 8. Cannot find problems by only looking at the DBMS side 8 DBMS select … from user where u.id = 1 select … from user where u.id = 2 …for each userId{ … executeQuery(“ select … where u.id = userId”); }
  • 9. Database accesses are abstracted 9 DBMSCode Abstraction Layers Problems become more complex and frequent after adding abstraction layers
  • 10. Accessing data using ORM incorrectly 10 @Entity Class User{ … } for each userId{ User u = findUserById(userId); u.getName(); } A database entity class Objects SQL select … from user where u.id = 1 select … from user where u.id = 2 …
  • 11. Research statement There are common problems in how developers write database-access code, and these problems are hard to diagnose by only looking at the DBMS. In addition, due to the use of different database abstraction layers, these problems become even harder to find. By finding problems in the database access code and configuring the abstraction layers correctly, we can significantly improve the performance of database- centric systems. 11
  • 12. 12 Detecting inefficient data access code Overview of the thesis Detecting unneeded data access Finding overly-strict isolation level Finished work Future work Under submission
  • 13. Detecting unneeded data access 13 Detecting inefficient data access code Overview of the thesis Finding overly-strict isolation level Finished work Future work Under submission
  • 14. Inefficient data access detection framework Inefficient data access detection and ranking framework Ranked according to performance impact Ranked inefficient data access code Source Code detection ranking 14
  • 15. Inefficient data access detection framework Inefficient data access detection and ranking framework Ranked according to performance impact Ranked inefficient data access code Source Code detection ranking 15 Focus on one type of inefficient data access: one-by-one processing
  • 16. Detecting one-by-one processing using static analysis 16 First find all the functions that read/write from/to DBMS Class User{ getUserById()… getUserAddress()… } Identify the positions of all loops for each userId{ foo(userId) } Check if the loop contains any database-accessing function foo (userId){ getUserById(userId) }
  • 17. Inefficient data access detection framework Inefficient data access detection and ranking framework Ranked according to performance impact Ranked inefficient data access code Source Code detection ranking 17
  • 18. Inefficient data access detection framework Inefficient data access detection and ranking framework Ranked according to performance impact Ranked inefficient data access code Source Code detection ranking 18
  • 19. Assessing inefficient data access impact by fixing the problem Execution Response time for u in users{ update u } Code with inefficient data access users.updateAll() Code without inefficient data access 19 Execute test suite 30 times Response time after the fix Avg. % improvement Execution Execute test suite 30 times
  • 20. Inefficient data access causes large performance impacts 0% 20% 40% 60% 80% 100% One-by-one processing 20 %improvementinresponsetime
  • 21. Detecting unneeded data access 21 Detecting inefficient data access code Overview of the thesis Finding overly-strict isolation level Finished work Future work Under submission
  • 22. 22 Detecting inefficient data access code Overview of the thesis Finding overly-strict isolation level Finished work Future work Under submission Detecting unneeded data access
  • 23. Intercepting system executions using byte code instrumentation ORM DBMS 23 Intercept called functions in the code Intercept ORM generated SQLs Needed data Requested data diff
  • 24. Mapping data access to called functions 24 @Entity @Table(name = “user”) public class User{ … @Column(name=“name”) String userName; @OneToMany List<Team> teams; public String getUserName(){ return userName; } public void addTeam(Team t){ teams.add(t); } User.java We apply static analysis to find which database column a function is reading/modifying 24
  • 25. Identifying unneeded data access from intercepted data Only user name is needed in the application logic Needed user name in code Requested id, name, address, phone number 25
  • 26. 26 Detecting inefficient data access code Overview of the thesis Finding overly-strict isolation level Finished work Future work Under submission Detecting unneeded data access
  • 27. Detecting unneeded data access 27 Detecting inefficient data access code Overview of the thesis Finding overly-strict isolation level Finished work Future work Under submission
  • 28. A real life example of transaction abstraction problem 28 All functions in foo will be executed in transactions @Transactional public class foo{ int getFooVal(){ return foo.val; } … } This function does not need to be executed in transactions Where to put the transaction may affect system behavior and performance
  • 29. Plans for finding transaction problems related abstraction 29 Bug Reports Categories of transaction- related bugs We plan to empirically study bug reports to find root causes and implement a detection framework
  • 30. 30
  • 31. 31
  • 32. 32
  • 33. 33
  • 34. 34
  • 35. 35
  • 36. 36
  • 37. 37