Uncovering Performance Problems in Java Applications with Reference Propagation Profiling
Upcoming SlideShare
Loading in...5
×
 

Uncovering Performance Problems in Java Applications with Reference Propagation Profiling

on

  • 220 views

Uncovering Performance Problems in Java Applications with Reference Propagation Profiling, ICSE'2012

Uncovering Performance Problems in Java Applications with Reference Propagation Profiling, ICSE'2012

Statistics

Views

Total Views
220
Views on SlideShare
219
Embed Views
1

Actions

Likes
0
Downloads
5
Comments
0

1 Embed 1

http://www.slashdocs.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Uncovering Performance Problems in Java Applications with Reference Propagation Profiling Uncovering Performance Problems in Java Applications with Reference Propagation Profiling Presentation Transcript

    • Uncovering  Performance  Problems  in  Java  Applica5ons  with  Reference  Propaga5on  Profiling   Dacong  Yan1,  Guoqing  Xu2,  Atanas  Rountev1     1  Ohio  State  University   2  University  of  California,  Irvine     PRESTO:  Program  Analyses  and  So5ware  Tools  Research  Group,  Ohio  State  University  
    • Overview   •  Performance  inefficiencies   –  O5en  exist  in  Java  applicaKons   –  Excessive  memory  usage   –  Long  running  Kmes,  even  for  simple  tasks   •  Challenges   –  Limited  compiler  opKmizaKons   –  Complicated  behavior   –  Large  libraries  and  frameworks   •  SoluKon:  manual  tuning  assisted  with  performance   analysis  tools  2  
    • An  Example   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t3  
    • An  Example   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t4  
    • An  Example   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t5  
    • An  Example   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t6  
    • An  Example   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t7  
    • An  Example   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t8  
    • ImplementaKon   •  Reference  propagaKon  profiling   –  Implemented  in  Jikes  RVM  3.1.1   –  Modify  the  runKme  compiler  for  code  instrumentaKon   –  Create  shadow  loca5ons  to  track  data  dependence   –  Instrument  method  calls  to  track  interprocedural   propaga5on   •  Overheads   –  Space:  2-­‐3×   –  Time:  30-­‐50×    9  
    • Reference  PropagaKon  Profiling   •  Intraprocedural  propagaKon   –  Shadows  for  every  memory  locaKon  (stack  and  heap)  to   record  last  assignment  that  writes  to  it   –  Update  shadows  and  the  graph  accordingly   Code Shadow Graph 6 a = new A; aʹ = RefAssign(6,6) 7 b = a; bʹ = RefAssign(6,7) 8 c = new C; cʹ = RefAssign(8, 8) 9 b.fld = c; b.fldʹ = RefAssign(8, 9)10  
    • Reference  PropagaKon  Profiling   •  Intraprocedural  propagaKon   –  Shadows  for  every  memory  locaKon  (stack  and  heap)  to   record  last  assignment  that  writes  to  it   –  Update  shadows  and  the  graph  accordingly   Code Shadow Graph 6 a = new A; aʹ = RefAssign(6,6) 7 b = a; bʹ = RefAssign(6,7) 8 c = new C; cʹ = RefAssign(8, 8) 9 b.fld = c; b.fldʹ = RefAssign(8, 9)11  
    • Reference  PropagaKon  Profiling   •  Intraprocedural  propagaKon   –  Shadows  for  every  memory  locaKon  (stack  and  heap)  to   record  last  assignment  that  writes  to  it   –  Update  shadows  and  the  graph  accordingly   Code Shadow Graph 6 a = new A; aʹ = RefAssign(6,6) 7 b = a; bʹ = RefAssign(6,7) 8 c = new C; cʹ = RefAssign(8, 8) 9 b.fld = c; b.fldʹ = RefAssign(8, 9)12  
    • Reference  PropagaKon  Profiling   •  Intraprocedural  propagaKon   –  Shadows  for  every  memory  locaKon  (stack  and  heap)  to   record  last  assignment  that  writes  to  it   –  Update  shadows  and  the  graph  accordingly   Code Shadow Graph 6 a = new A; aʹ = RefAssign(6,6) 7 b = a; bʹ = RefAssign(6,7) 8 c = new C; cʹ = RefAssign(8, 8) 9 b.fld = c; b.fldʹ = RefAssign(8, 9)13  
    • Reference  PropagaKon  Profiling   •  Intraprocedural  propagaKon   –  Shadows  for  every  memory  locaKon  (stack  and  heap)  to   record  last  assignment  that  writes  to  it   –  Update  shadows  and  the  graph  accordingly   Code Shadow Graph 6 a = new A; aʹ = RefAssign(6,6) 7 b = a; bʹ = RefAssign(6,7) 8 c = new C; cʹ = RefAssign(8, 8) 9 b.fld = c; b.fldʹ = RefAssign(8, 9)14  
    • Reference  PropagaKon  Profiling   •  Intraprocedural  propagaKon   –  Shadows  for  every  memory  locaKon  (stack  and  heap)  to   record  last  assignment  that  writes  to  it   –  Update  shadows  and  the  graph  accordingly   Code Shadow Graph 6 a = new A; aʹ = RefAssign(6,6) 7 b = a; bʹ = RefAssign(6,7) 8 c = new C; cʹ = RefAssign(8, 8) 9 b.fld = c; b.fldʹ = RefAssign(8, 9) •  Interprocedural  propagaKon   –  Per-­‐thread  scratch  space  save  and  restore  shadows  for  15   parameters  and  return  variables  
    • Client  Analyses  16  
    • Client  Analyses   •  Not-­‐assigned-­‐to-­‐heap  (NATH)  analysis   –  Locate  producer  nodes  that  do  not  reach  heap   propagaKon  nodes  (heap  reads  and  writes)   –  Variant:  mostly-­‐NATH  analysis  17  
    • Client  Analyses   •  Not-­‐assigned-­‐to-­‐heap  (NATH)  analysis   –  Locate  producer  nodes  that  do  not  reach  heap   propagaKon  nodes  (heap  reads  and  writes)   –  Variant:  mostly-­‐NATH  analysis   •  Cost-­‐benefit  imbalance  analysis   –  Detect  imbalance  between  the  cost  of  interesKng   operaKons,  and  the  benefits  they  produce   –  For  example,  analysis  of  write  read  imbalance  18  
    • Client  Analyses   •  Not-­‐assigned-­‐to-­‐heap  (NATH)  analysis   –  Locate  producer  nodes  that  do  not  reach  heap   propagaKon  nodes  (heap  reads  and  writes)   –  Variant:  mostly-­‐NATH  analysis   •  Cost-­‐benefit  imbalance  analysis   –  Detect  imbalance  between  the  cost  of  interesKng   operaKons,  and  the  benefits  they  produce   –  For  example,  analysis  of  write  read  imbalance   •  Analysis  of  never-­‐used  allocaKons   –  IdenKfy  producer  nodes  that  do  not  reach  the   consumer  node   –  Variant:  analysis  of  rarely-­‐used  allocaKons  19  
    • A  Real  Tuning  Session   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t20  
    • A  Real  Tuning  Session   1 class Vec { 2 double x, y; 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]);10 q[i] = t;11 t = in[i+1].sub(a[i-2]);12 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t21  
    • A  Real  Tuning  Session   1 class Vec { 2 double x, y; 1 2 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]); 110 q[i] = t;11 t = in[i+1].sub(a[i-2]); 212 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t22  
    • A  Real  Tuning  Session   1 class Vec { 2 double x, y; 1 2 3 sub(v) { 4 res=new Vec(x-v.x, y-v.y); 5 return res; 6 } 7 } 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]); 110 q[i] = t;11 t = in[i+1].sub(a[i-2]); 212 // use of fields of t13 } ……80 t=q[*];81 // use of fields of t23  
    • A  Real  Tuning  Session   1 class Vec { 1 class Vec { 2 double x, y; 2 double x, y; 3 sub(v) { 3 sub_rev(v, res) { 4 res=new Vec(x-v.x, y-v.y); 4 res.x = x-v.x; 5 return res; 5 res.y = y-v.y; 6 } 6 } 7 } tuning 7 } = new Vec; // reusable nt 8 for (i = 0; i < N; i++) { 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]); 9 t = in[i+2].sub(a[i-1]);10 q[i] = t; 10 q[i] = t;11 t = in[i+1].sub(a[i-2]); 11 in[i+1].sub_rev(a[i-2], nt);12 // use of fields of t 12 // use of fields of nt13 } 13 } …… ……80 t=q[*]; 80 t=q[*];81 // use of fields of t 81 // use of fields of t24  
    • A  Real  Tuning  Session   1 class Vec { 1 class Vec { 2 double x, y; 2 double x, y; 3 sub(v) { 3 sub_rev(v, res) { 4 res=new Vec(x-v.x, y-v.y); 4 res.x = x-v.x; 5 return res; 5 res.y = y-v.y; 6 } 6 } 7 } tuning 7 } = new Vec; // reusable nt 8 for (i = 0; i < N; i++) { 8 for (i = 0; i < N; i++) { 9 t = in[i+2].sub(a[i-1]); 9 t = in[i+2].sub(a[i-1]);10 q[i] = t; 10 q[i] = t;11 t = in[i+1].sub(a[i-2]); 11 in[i+1].sub_rev(a[i-2], nt);12 // use of fields of t 12 // use of fields of nt13 } 13 } …… ……80 t=q[*]; Reductions: 13% in running time and 80 t=q[*];81 // use of fields of t 73% in #allocated objectsof fields of t 81 // use25  
    • Examples  of  Inefficiency  Pa`erns   •  Temporary  objects  for  method  returns   –  ReducKons  for  euler:  13%  in  running  Kme  and  73%  in   #allocated  objects   •  Redundant  data  representaKon   – mst:  63%  and  40%   •  Unnecessary  eager  object  creaKon   – chart:  8%  and  8% – jflex:  3%  and  27%   •  Expensive  specializaKon  for  sanity  checks   – bloat:  10%  and  11%  26  
    • Conclusions •  Reference  propagaKon  profiling  in  Jikes  RVM   •  Understanding  reference  propagaKon  is  a  good   starKng  point  for  performance  tuning   •  Client  analyses  can  uncover  performance   inefficiencies,  and  lead  to  effecKve  tuning  soluKons  27  
    • Thank    you                                    28