Detecting Performance Anti-patterns for Applications Developed Using Object-Relational Mapping
Tse-Hsun (Peter) Chen, Weiyi Shang, Zhen Ming Jiang, Ahmed E. Hassan, Mohamed Nasser, Parminder Flora
tsehsun@cs.queensu.ca	
  

ICSE 2014

  • 1.
    Detecting Performance Anti-patterns for Applications Developed Using Object-Relational Mapping
    Tse-Hsun (Peter) Chen, Weiyi Shang, Zhen Ming Jiang, Ahmed E. Hassan, Mohamed Nasser, Parminder Flora
  • 2.
    Databases are essential in large-scale software systems
  • 3.
    Application developers work with objects
    It is more intuitive if we can map objects directly to the DB.
  • 4.
    Object-Relational Mapping eliminates the gap between objects and SQL
    Problems of using raw SQL between object classes and the database:
      • Lots of boilerplate code
      • Object-DB translations must be managed manually
    With ORM: much less code and shorter development time.
  • 5.
    ORM is widely used in practice
      • Java Hibernate has more than 8 million downloads
      • In 2013, 15% of the 17,000 Java developer jobs required ORM experience (dice.com)
    Different ORM technologies are available.
  • 6.
    An example class with ORM code (User.java)
        import java.util.List;
        import javax.persistence.*;

        @Entity
        @Table(name = "user")          // the User class is mapped to the "user" table in the DB
        public class User {
            @Column(name = "id")       // id is mapped to the column "id" in the user table
            private int id;

            @Column(name = "name")
            String userName;

            // A user can belong to multiple teams. EAGER means the associated
            // teams are retrieved whenever a user object is retrieved.
            @OneToMany(fetch = FetchType.EAGER)
            List<Team> teams;

            public void setName(String n) {
                userName = n;
            }
            // … other getter and setter methods
        }
  • 7.
    Accessing the database using ORM
    The ORM translates operations on objects into SQL statements sent to the database:
        User u = findUserByID(1);   // SQL: select * from user u where u.id = 1;
        u.setName("Peter");         // SQL: update user set name = 'Peter' where user.id = 1;
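The object-to-SQL translation on this slide can be sketched with a toy mapper. This is only an illustration under stated assumptions: `ToyOrmSketch`, `selectById`, and `updateById` are hypothetical names, and a real ORM such as Hibernate uses prepared statements and mapping metadata rather than string concatenation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Toy illustration of ORM-style SQL generation; not how Hibernate is implemented. */
final class ToyOrmSketch {

    /** Builds the SELECT that a lookup by primary key would translate to. */
    static String selectById(String table, String idColumn, int id) {
        return "select * from " + table + " where " + idColumn + " = " + id;
    }

    /** Builds the UPDATE that a set of changed columns would translate to. */
    static String updateById(String table, String idColumn, int id,
                             Map<String, String> changedColumns) {
        StringJoiner assignments = new StringJoiner(", ");
        for (Map.Entry<String, String> e : changedColumns.entrySet()) {
            // Double any single quote so the generated string literal stays well-formed.
            String value = e.getValue().replace("'", "''");
            assignments.add(e.getKey() + " = '" + value + "'");
        }
        return "update " + table + " set " + assignments
                + " where " + idColumn + " = " + id;
    }

    public static void main(String[] args) {
        // Corresponds to: User u = findUserByID(1);
        System.out.println(selectById("user", "id", 1));

        // Corresponds to: u.setName("Peter");
        Map<String, String> changes = new LinkedHashMap<>();
        changes.put("name", "Peter");
        System.out.println(updateById("user", "id", 1, changes));
    }
}
```

The point of the sketch is that each innocent-looking object call has a database statement behind it, which is easy to overlook when reading only the Java code.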
  • 8.
    Developers may not be aware of database access
    "Wow! I don't need to worry about DB code!"
    ORM code with performance anti-patterns leads to bad system performance.
    The performance difference can be LARGE!
  • 9.
    Performance anti-pattern detection and ranking framework
    Source code → detection → performance anti-patterns → ranking → ranked performance anti-patterns (ranked according to performance impact)
  • 10.
    Performance anti-pattern detection and ranking framework
    Source code → detection → performance anti-patterns → ranking → ranked performance anti-patterns (ranked according to performance impact)
    This presentation discusses only one anti-pattern (more patterns are described in the paper).
  • 11.
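    ORM excessive data anti-pattern
        class User {
            @OneToMany(fetch = FetchType.EAGER)
            List<Team> teams;
        }

        User u = findUserById(1);
        u.getName();
        // end of program: u.teams is never accessed
    Objects → SQL: retrieving the user eagerly retrieves its teams from the DB (a join of the user table and the team table), but the team data is never used!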
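The cost of the excessive data anti-pattern can be sketched with a small in-memory simulation that counts the rows read when only the user name is needed. Everything here is an assumption made for illustration: `FetchSimulation`, the `rowsRead` counter, and the fake loaders stand in for a real database and are not part of any ORM API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Simulates eager versus lazy loading of a user's teams by counting rows read. */
final class FetchSimulation {
    /** Number of simulated database rows read so far. */
    static int rowsRead = 0;

    static String loadUserName(int id) {
        rowsRead += 1;                       // one row from the user table
        return "user" + id;
    }

    static List<String> loadTeams(int userId, int teamCount) {
        List<String> teams = new ArrayList<>();
        for (int i = 0; i < teamCount; i++) {
            teams.add("team" + i);
        }
        rowsRead += teamCount;               // one row per associated team
        return teams;
    }

    static final class User {
        private final String name;
        private final Supplier<List<String>> teamLoader;
        private List<String> teams;          // null until loaded

        User(String name, Supplier<List<String>> teamLoader, boolean eager) {
            this.name = name;
            this.teamLoader = teamLoader;
            if (eager) {
                this.teams = teamLoader.get();   // eager: load teams right away
            }
        }

        String getName() { return name; }

        List<String> getTeams() {
            if (teams == null) {
                teams = teamLoader.get();        // lazy: load on first access
            }
            return teams;
        }
    }

    static User findUserById(int id, int teamCount, boolean eager) {
        String name = loadUserName(id);
        return new User(name, () -> loadTeams(id, teamCount), eager);
    }

    /** Rows read when the caller only ever asks for the user's name. */
    static int rowsReadForNameOnly(int teamCount, boolean eager) {
        rowsRead = 0;
        User u = findUserById(1, teamCount, eager);
        u.getName();
        return rowsRead;
    }

    public static void main(String[] args) {
        System.out.println("eager: " + rowsReadForNameOnly(100, true));   // eager: 101
        System.out.println("lazy:  " + rowsReadForNameOnly(100, false));  // lazy:  1
    }
}
```

With eager fetching the 100 team rows are read even though `getTeams()` is never called, which is exactly the wasted work the slide points at.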
  • 12.
    Detecting excessive data using static analysis
    1. First, find all the objects that eagerly retrieve data from the DB:
        class User {
            @OneToMany(fetch = FetchType.EAGER)
            List<Team> teams;
        }
    2. Identify all the data usages of those objects:
        User user = findUserByID(1);
        user.getName();
    3. Check whether the retrieved data is ever used. Here both user and team data are retrieved, but only user data is used.
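The three steps above can be sketched as a set difference between the fields that are eagerly retrieved and the fields that are actually accessed. This is a simplified model under assumptions, not the framework's implementation: `ExcessiveDataCheck` and its field-name strings are hypothetical, and the sketch takes both sets as already extracted from the source code.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Flags eagerly fetched fields that are never accessed on a given code path. */
final class ExcessiveDataCheck {

    /**
     * @param eagerFields    fields annotated for eager retrieval (step 1)
     * @param accessedFields fields the code path actually reads (step 2)
     * @return eager fields that are never read, i.e. excessive data (step 3)
     */
    static Set<String> unusedEagerFields(Set<String> eagerFields,
                                         List<String> accessedFields) {
        Set<String> unused = new TreeSet<>(eagerFields);
        unused.removeAll(accessedFields);
        return unused;
    }

    public static void main(String[] args) {
        // Step 1: User.teams is fetched eagerly.
        Set<String> eager = Set.of("User.teams");
        // Step 2: the code path only calls user.getName().
        List<String> accessed = List.of("User.name");
        // Step 3: User.teams is retrieved but never used.
        System.out.println(unusedEagerFields(eager, accessed)); // [User.teams]
    }
}
```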
  • 13.
    Performance anti-pattern detection and ranking framework
    Source code → detection → performance anti-patterns → ranking → ranked performance anti-patterns (ranked according to performance impact)
  • 15.
    Performance anti-patterns have different impacts
        User user_in_1_team = findUserByID(1);        // retrieving 1 user and 1 team
        User user_in_100_teams = findUserByID(100);   // retrieving 1 user and 100 teams!
    One can only reveal the performance impact by execution.
  • 16.
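    Measuring the impact using repeated measurements and effect sizes
      • The size of a performance impact is not defined: we use effect sizes (Cohen's d) to measure the performance impact. Effect size: $d = \frac{\bar{x}_1 - \bar{x}_2}{s}$, where $\bar{x}_1$ and $\bar{x}_2$ are the mean response times of the two versions and $s$ is their pooled standard deviation.
      • Performance measurements are unstable: we repeat each test 30 times to obtain stable measurement results.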
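As a minimal sketch of the effect-size computation, the class below calculates Cohen's d as the difference of the two sample means divided by the pooled standard deviation. The class and method names are hypothetical, and the `magnitude` thresholds (0.2, 0.5, 0.8) are Cohen's conventional cut-offs, assumed here rather than taken from the slides.

```java
/** Cohen's d for two independent samples of response times. */
final class EffectSize {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double sumOfSquares = 0.0;
        for (double x : xs) {
            sumOfSquares += (x - m) * (x - m);
        }
        return sumOfSquares / (xs.length - 1);
    }

    /** d = (mean(a) - mean(b)) / pooled standard deviation. */
    static double cohensD(double[] a, double[] b) {
        double pooledVariance =
                ((a.length - 1) * sampleVariance(a) + (b.length - 1) * sampleVariance(b))
                / (a.length + b.length - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooledVariance);
    }

    /** Conventional labels for |d|; the exact boundaries are an assumption. */
    static String magnitude(double d) {
        double abs = Math.abs(d);
        if (abs < 0.2) return "trivial";
        if (abs < 0.5) return "small";
        if (abs < 0.8) return "medium";
        return "large";
    }

    public static void main(String[] args) {
        // Made-up response times, for illustration only.
        double[] withAntiPattern = {2.0, 4.0, 6.0};
        double[] withoutAntiPattern = {1.0, 3.0, 5.0};
        double d = cohensD(withAntiPattern, withoutAntiPattern);
        System.out.println(d + " (" + magnitude(d) + ")");
    }
}
```

In the setting described on the slide, each array would hold the 30 repeated measurements of one version of the code.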
  • 17.
    Studied systems and detection results
    System                                              Files      LOC       Excessive data instances
    Large open-source e-commerce system                 > 1,700    > 206K    482
    Enterprise system                                   > 3,000    > 300K    > 10
    Spring open-source online system for a pet clinic   51         3.3K      10
  • 18.
    Research questions
      • Performance impact
      • Ranks of the anti-patterns at different scales
  • 20.
    Assessing anti-pattern impact by fixing the anti-patterns
      • Code with anti-patterns (user.getName() with eager fetching): execute the test suite 30 times and record the response time.
      • Code without anti-patterns (fetch type set to LAZY, then user.getName()): execute the test suite 30 times and record the response time after fixing the anti-patterns.
      • Compare the two sets of measurements: average % improvement and effect sizes.
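The comparison step can be sketched as the average percentage improvement in response time between the two sets of repeated runs. The names `Improvement` and `percentImprovement` are hypothetical, and the sample numbers are made up for illustration; they are not results from the study.

```java
/** Average % improvement in response time after fixing an anti-pattern. */
final class Improvement {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /**
     * @param before response times of the code with the anti-pattern
     * @param after  response times of the code after the fix
     * @return relative reduction of the mean response time, in percent
     */
    static double percentImprovement(double[] before, double[] after) {
        double meanBefore = mean(before);
        return (meanBefore - mean(after)) / meanBefore * 100.0;
    }

    public static void main(String[] args) {
        // Made-up response times in milliseconds; a real run would have 30 values each.
        double[] before = {200.0, 220.0, 180.0};
        double[] after = {150.0, 160.0, 140.0};
        System.out.println(percentImprovement(before, after) + "%"); // 25.0%
    }
}
```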
  • 21.
    Performance anti-patterns have medium to large effect sizes
    (Bar chart: % improvement in response time, on a 0% to 100% axis, after fixing the excessive data anti-pattern in BL, EA, and PC; two systems show a large effect size and one a medium effect size.)
  • 22.
    Research questions
      • Performance impact: removing the anti-pattern improves response time by ~35%
      • Ranks of the anti-patterns at different scales
  • 24.
    Performance problems usually arise under large load
  • 25.
    Performance problems revealed at small scales may be more serious
    Input scales may have exponential effects on performance. (Chart: performance at different input scales.)
    We should first fix the anti-patterns that have larger effects at smaller scales.
  • 26.
    Comparing ranked anti-patterns at different data scales
    Run detection and ranking once with small-size input and once with large-size input. Do the ranked performance anti-patterns from small data match those from large data?
  • 27.
    Anti-patterns have large effects on performance even at smaller data scales
    (Chart: effect size of the anti-patterns.)
    Effect sizes and the ranks of the anti-patterns are consistent across different data scales.
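The rank comparison can be sketched by sorting anti-patterns by effect size at each data scale and checking whether the two orderings agree. The class name, the pattern names, and the effect-size values are hypothetical placeholders, not data from the study.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Ranks anti-patterns by effect size and compares rankings across data scales. */
final class RankComparison {

    /** Returns the anti-pattern names ordered from largest to smallest effect size. */
    static List<String> rankByEffectSize(Map<String, Double> effectSizes) {
        List<String> names = new ArrayList<>(effectSizes.keySet());
        names.sort((x, y) -> Double.compare(effectSizes.get(y), effectSizes.get(x)));
        return names;
    }

    /** True if both scales produce the same ordering of anti-patterns. */
    static boolean sameRanking(Map<String, Double> smallScale,
                               Map<String, Double> largeScale) {
        return rankByEffectSize(smallScale).equals(rankByEffectSize(largeScale));
    }

    public static void main(String[] args) {
        // Hypothetical effect sizes measured with small and large input data.
        Map<String, Double> small = Map.of("patternA", 3.0, "patternB", 1.0, "patternC", 2.0);
        Map<String, Double> large = Map.of("patternA", 30.0, "patternB", 5.0, "patternC", 12.0);

        System.out.println(rankByEffectSize(small)); // [patternA, patternC, patternB]
        System.out.println(sameRanking(small, large)); // true
    }
}
```

If the rankings agree, as the slide reports, the cheaper small-scale run is enough to decide which anti-patterns to fix first.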
  • 28.
    Research questions
      • Performance impact: removing the anti-pattern improves response time by ~35%
      • Ranks of the anti-patterns at different scales: the ranks of the anti-patterns are consistent across different data scales