Pimp my GC
Supersonic Scala!

@pingtimeout

Scala.IO – 24&25 oct 13
How to tune the Garbage Collector for Scala applications? Is immutability an issue for the GC? How to tune the heap? Well, the answers are here!

Published in: Technology, Business

  1. Pimp my GC: Supersonic Scala!
     @pingtimeout, Scala.IO – 24&25 oct 13
  2. /me
     ● Pierre Laporte
     ● “Java Performance Tuning” Trainer
     ● Perf issues, GC logs (eye-compliant)
     http://www.pingtimeout.fr, @pingtimeout, pierre@pingtimeout.fr
  3. Agenda
     ● 42 minutes of
       – Fun (Theory)
       – Fun (Practice)
       – Fun (Feedback)
       – Fun (Questions/Answers)
       – Fun (Trolls)
     ● Because performance is fun!
  4. Disclaimer
     ● Be critical of the information contained in this talk
     ● JVM tuning is always done on a case-by-case basis. There is no magic, no special set of flags that produces good results on every project.
     ● The resemblance of any opinion, recommendation or comment made during this presentation to performance tuning advice is merely coincidental.
  5. Weak Generational Hypothesis 101
  6. Theory – Weak Generational Hypothesis
  7. Theory – Weak Generational Hypothesis
  8. Theory – Weak Generational Hypothesis
     ● “Most objects die young”
     ● Possible scales:
       – MB, GB, TB
       – Minutes, hours, days
  9. Examples – Weak Generational Hypothesis
     Total: 145 GB, Avg: 48 GB/day (chart: GB over 3 days)
  10. Examples – Weak Generational Hypothesis
     Total: 30 TB, Avg: 3 TB/day (chart: TB over 10 days)
  11. Examples – Weak Generational Hypothesis
     ● 35 GB/day
       – Scala
       – Play 2
       – Akka
     ● 3 TB/day
       – Java
       – Tomcat
       – Jax-RS / Spring / Hibernate...
  12. Examples – Weak Generational Hypothesis
     Don't forget!
       – Be critical
       – Case-by-case analysis
       – Please don't do that ---->
  13. JVM Heap 101
  14. Theory – Memory pools
     ● Java Heap – 2 memory pools (except for G1 GC)
     ● Young Generation for... young objects
     ● Old Generation for... old objects!!! Amazing, right?
  15. Theory – Memory pools
  16. Theory – Memory pools
     ● Young Generation = Eden + Survivors
     ● Every object is created in Eden*
       *: except when it is too big to fit in Eden
       *: except in special cases for G1 GC
  17. Theory – Memory pools
  18. Why memory pools?!
     ● Always 2 GCs per JVM*
       *: except for G1 GC
     ● Young GC
       – Cheap
       – Duration mostly ≈ O(Live data in YG)
     ● Old GC
       – Expensive
       – Duration mostly ≈ O(Live data in OG)
  19. Why memory pools?!
     Common GC name   Young Gen GC   Old Gen GC
     “Parallel GC”    PSYoungGen     ParOldGen
     “CMS”            ParNew         CMS
     “G1 GC”          G1 (one collector for both)
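Which collectors and heap pools a given JVM actually runs with can be checked from inside the process. Below is a minimal sketch using the standard `java.lang.management` beans. The class name `GcInventory` is made up for this illustration, and the names the beans report do not necessarily match the labels printed in GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists what the running JVM reports about its GC setup.
final class GcInventory {

    // Names of the garbage collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Names of the memory pools that belong to the Java heap (Eden, Survivor, Old...).
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Heap pools: " + heapPoolNames());
    }
}
```

Running it with different collector flags should show how the collector pair and the pool layout change.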
  20. GC Duration?! Prove it!
  21. App with small live set
  22. App with big live set
  23. Experiment 1
     ● 1st run (SmallLiveSet)
       – 50 GB heap (-ms50g -mx50g)
       – 49.9 GB Young Gen (-Xmn49900m)
       – GC logs
  24. Experiment 1
     ● 1st run (SmallLiveSet)
       – 50 GB heap
         ● -ms50g -mx50g
       – 49.9 GB Young Gen
         ● -Xmn49900m
       – GC logs
     ● Result:
       – 6 ms YGC pauses to free 38 GB of memory
  25. Experiment 1: Result
     [PSYoungGen: 38329728K->6496K(44710400K)] 38329744K->6512K(46041600K), 0.0067050 secs] //...
     ● 38,329,728K of data in YG before GC, 6,496K after
     ● YG size is 44,710,400K
     ● 38,329,744K of data in heap before GC, 6,512K after
     ● Heap size is 46,041,600K
     ● Total pause time: 6.7 ms
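The reading of the log line above can be automated. The following is a sketch only: the class name `YoungGcLine` and its regular expression are assumptions made for this example, and it handles nothing beyond lines shaped like the PSYoungGen entry on the slide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a single Parallel GC young collection log line.
final class YoungGcLine {
    // [PSYoungGen: <before>K-><after>K(<capacity>K)] ... <seconds> secs]
    private static final Pattern LINE = Pattern.compile(
            "\\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\].*?([0-9.]+) secs\\]");

    final long beforeKb;
    final long afterKb;
    final long capacityKb;
    final double pauseSeconds;

    private YoungGcLine(long beforeKb, long afterKb, long capacityKb, double pauseSeconds) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.pauseSeconds = pauseSeconds;
    }

    static YoungGcLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a PSYoungGen line: " + line);
        }
        return new YoungGcLine(
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Double.parseDouble(m.group(4)));
    }

    // Young Gen memory released by this collection, in KB.
    long freedKb() {
        return beforeKb - afterKb;
    }

    double pauseMillis() {
        return pauseSeconds * 1000.0;
    }

    public static void main(String[] args) {
        YoungGcLine gc = parse("[PSYoungGen: 38329728K->6496K(44710400K)] "
                + "38329744K->6512K(46041600K), 0.0067050 secs]");
        System.out.println("Freed " + gc.freedKb() + "K in " + gc.pauseMillis() + " ms");
    }
}
```

On the slide's line this gives 38,323,232K freed, i.e. the "38 GB in about 6 ms" quoted in the result.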
  26. Experiment 2
     ● 2nd run (SmallLiveSet)
       – 50 GB heap (-ms50g -mx50g)
       – 10 MB Young Gen (-Xmn10m)
       – GC logs
  27. Experiment 2
     ● 2nd run (SmallLiveSet)
       – 50 GB heap
         ● -ms50g -mx50g
       – 10 MB Young Gen
         ● -Xmn10m
       – GC logs
     ● Result:
       – 322 ms Full GC pauses to free 52 GB of memory
  28. Experiment 2: Result
     [Full GC [PSYoungGen: 3072K->0K(7168K)] [ParOldGen: 52418151K->30287K(52418560K)] 52421223K->30287K(52425728K) //... 0.3229410 secs]
     ● 52,418,151K of data in OG before GC, 30,287K after
     ● OG size is 52,418,560K
     ● 52,421,223K of data in heap before GC, 30,287K after
     ● Heap size is 52,425,728K
     ● Total pause time: 322.9 ms
  29. Experiments 1->4, Wrap up
     ● 1st and 2nd runs with BigLiveSet
       – Ran out of time* :-(
         *: Stopped measuring at heap occupancy ≈ 22 GB
     ● GC pauses:
       Live set   1st run (49.9 GB Young Gen)   2nd run (10 MB Young Gen)
       Small      6 millis                      322 millis (Full GC)
       Big        55 secs (Full GC)*            250 secs (Full GC)*
  30. Immutability: is immutability a problem?
  31. Immutability
     ● What does this code do? (From the GC's point of view?)
  32. Immutability
  33. Immutability
     ● What does this code do?
       – Creates more temporary objects that die young
       – Respects the Weak Generational Hypothesis
  34. Immutability
     ● Consequences compared to mutable state
       – GC will run more frequently
       – GC time will be short: O(Live data in YG)
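The code shown on the slides did not survive in this transcript. As a stand-in, here is a hypothetical illustration of the pattern being discussed: every "update" of an immutable value allocates a fresh object, and the previous one becomes garbage almost immediately. `ImmutablePoint` and `walk` are names invented for this sketch.

```java
// Hypothetical immutable value: "modifying" it means allocating a new instance.
final class ImmutablePoint {
    final int x;
    final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Returns a new point; the receiver is left untouched.
    ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);
    }

    // Each step allocates one short-lived object; only the last one stays reachable.
    static ImmutablePoint walk(int steps) {
        ImmutablePoint p = new ImmutablePoint(0, 0);
        for (int i = 0; i < steps; i++) {
            p = p.translate(1, 2);
        }
        return p;
    }

    public static void main(String[] args) {
        ImmutablePoint end = walk(1_000_000);
        System.out.println(end.x + "," + end.y);
    }
}
```

The live set stays at a single point however long the walk is, which is the situation where young collections remain cheap. The JIT may also remove some of these allocations altogether, so treat this as a picture of the pattern rather than a benchmark.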
  35. Tuning for immutability
     ● Reduce YGC frequency (for ParallelGC and CMS)
       – Identify the allocation rate (MB/second)
       – Define the GC interval (seconds between GCs)
       => Set Eden = Allocation rate * GC interval
  36. Tuning for immutability
     ● Reduce YGC frequency (for ParallelGC and CMS)
       – AR = 200 MB/s
       – Desired interval = 1 YGC every 4 seconds
       => Set Eden to 800 MB (Young to 1 GB)
          -Xmn1g
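The sizing rule above is plain arithmetic and can be sketched as follows. The class and method names are invented for this example. The step from Eden to Young assumes an Eden-to-survivor ratio of 8:1 with two survivor spaces (what `-XX:SurvivorRatio=8` describes); the slides do not say how the 1 GB figure was derived, so this is only one way to land near it.

```java
// Hypothetical helper for the "Eden = allocation rate * GC interval" rule.
final class EdenSizing {

    // Eden size (MB) needed to absorb allocations between two young collections.
    static long edenMb(long allocationRateMbPerSec, long gcIntervalSec) {
        return allocationRateMbPerSec * gcIntervalSec;
    }

    // Young = Eden + 2 survivors. With eden:survivor = ratio:1 (assumption),
    // Young = Eden * (ratio + 2) / ratio.
    static long youngMb(long edenMb, int survivorRatio) {
        return edenMb * (survivorRatio + 2) / survivorRatio;
    }

    public static void main(String[] args) {
        long eden = edenMb(200, 4);     // 200 MB/s, one YGC every 4 seconds
        long young = youngMb(eden, 8);  // assumed survivor ratio of 8
        System.out.println("Eden: " + eden + " MB, Young: " + young + " MB");
    }
}
```

With the slide's numbers this gives 800 MB of Eden and about 1000 MB of Young Gen, close to the `-Xmn1g` setting.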
  37. Pony Pause
  38. G1 GC time!
  39. G1 GC
     ● Idea
       – Split the heap into 2048 regions
       – Associate each region with a memory pool on the fly
       – Grow/shrink memory pools at runtime
     http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
  40. G1 GC
     ● Memory pools:
       – Young (Eden, Survivors)
       – Old
       – Humongous
     ● Humongous:
       – Objects >= 50% of a region
     http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
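To get a feel for what counts as humongous, the two rules above (2048 regions, 50% of a region) can be turned into a back-of-the-envelope check. This sketch uses invented names (`G1Regions`, `regionBytes`, `isHumongous`) and a plain division. As far as I know, HotSpot rounds the region size to a power of two within fixed bounds, so the result is an approximation.

```java
// Hypothetical estimate of G1 region size and the humongous threshold.
final class G1Regions {
    static final int REGION_COUNT = 2048; // target region count quoted on the slide

    // Approximate region size: heap divided into 2048 equal regions.
    static long regionBytes(long heapBytes) {
        return heapBytes / REGION_COUNT;
    }

    // An allocation is humongous when it takes at least half a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes * 2 >= regionBytes;
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;  // example: 8 GiB heap
        long region = regionBytes(heap);      // 4 MiB regions
        System.out.println("Region size: " + region + " bytes");
        System.out.println("2 MiB array humongous? " + isHumongous(2L * 1024 * 1024, region));
    }
}
```

On an 8 GiB heap this puts the threshold at 2 MiB, so a large byte array or buffer can easily cross it.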
  41. G1 GC
     ● 1 ½ GC algorithm:
       – Always collect Young Gen
       – Collect Old Gen if possible
         ● Best regions only
         ● Time budget large enough
         ● Preconditions
         ● “mixed” collection
     ● G1 is self-tuning
     http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
  42. G1 GC Tuning
     ● Define the GC time budget
       -XX:MaxGCPauseMillis=<N>
       -XX:GCPauseIntervalMillis=<M>
     ● Set Xms == Xmx
     ● Drop all other GC-related flags
       -Xmn, -XX:MaxTenuringThreshold, -XX:NewRatio, -XX:InitiatingHeapOccupancyPercent, …
     ● Don't try to outsmart the GC
  43. G1 GC Tuning
     ● Enable GC logs
       -Xloggc:gc.log
       -XX:+PrintGCDetails
       -XX:+PrintTenuringDistribution
       -XX:+PrintGCCause
       -XX:+PrintAdaptiveSizePolicy
     ● Wait and see
  44. G1 GC Tuning – Low-hanging fruit
     ● Eliminate Humongous allocations
       – Humongous regions are collected only at Full GC
       – Or when empty
     [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 0 bytes, allocation request: 79012360 bytes, threshold: 47185920 bytes (45.00 %), source: concurrent humongous allocation]
     [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: G1 Humongous Allocation]
  45. G1 GC Tuning – Low-hanging fruit
     ● Eliminate Humongous allocations
       – Humongous regions are collected only at Full GC
       – Or when empty
     2013-10-21T19:23:48.758+0200: [GC pause (G1 Humongous Allocation) (young) (initial-mark) Desired survivor size 1572864 bytes, new threshold 15 (max 15) , 0.0015120 secs]
  46. G1 GC Tuning – Low-hanging fruit
     ● Eliminate Humongous allocations
       – Track your big allocations
       – Kill'em!
     ● Why?
       – They fragment the heap
       – They can cause evacuation failures
  47. G1 GC Tuning – Low-hanging fruit
     ● Get rid of “mixed collections”
       – Increase heap size
       – Set a higher threshold for mixed collections
         -XX:InitiatingHeapOccupancyPercent=<N>
     ● Why?
       – Some phases of G1 are STW (like “baaaaad”)
       – G1 goal: find the best candidates among all old regions
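The threshold reported in the G1Ergonomics line of the first humongous slide (47185920 bytes, 45.00 %) is the heap size multiplied by the occupancy percentage. A small sketch of that arithmetic follows. `IhopThreshold` is an invented name, and the 100 MB heap is an inference from the logged numbers, since the slides do not state the heap size of that run.

```java
// Hypothetical helper: occupancy (in bytes) at which G1 requests a concurrent cycle.
final class IhopThreshold {

    // heapBytes * percent / 100, using integer arithmetic.
    static long thresholdBytes(long heapBytes, int ihopPercent) {
        return heapBytes * ihopPercent / 100;
    }

    public static void main(String[] args) {
        long heap = 100L * 1024 * 1024; // assumed 100 MB heap, inferred from the log
        System.out.println(thresholdBytes(heap, 45) + " bytes"); // threshold at 45 %
        System.out.println(thresholdBytes(heap, 70) + " bytes"); // threshold at a higher percentage
    }
}
```

Raising the percentage moves the trigger point up, which is the lever this slide suggests for delaying concurrent cycles and the mixed collections that follow them.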
  48. G1 GC Tuning – Low-hanging fruit
     ● Eliminate “Evacuation/Allocation failures”
       – They are our good old Full GCs
     [GC pause (G1 Evacuation Pause) (young) //...
     [Full GC (Allocation Failure) 5860M->2690M(7000M), 0.9824032 secs]
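Spotting these Full GCs in a log is easy to script. Below is a rough sketch, assuming log lines shaped like the two above. `FullGcScan` and its regular expression are invented for this example and are not a general GC log parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner that counts Full GC entries and adds up their pause times.
final class FullGcScan {
    // Matches "[Full GC ... <seconds> secs]" and captures the duration.
    private static final Pattern FULL_GC =
            Pattern.compile("\\[Full GC.*?([0-9.]+) secs\\]");

    static int countFullGcs(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (FULL_GC.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    static double totalFullGcSeconds(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = FULL_GC.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "[GC pause (G1 Evacuation Pause) (young) //...",
                "[Full GC (Allocation Failure) 5860M->2690M(7000M), 0.9824032 secs]");
        System.out.println(countFullGcs(log) + " Full GC(s), "
                + totalFullGcSeconds(log) + " s paused");
    }
}
```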
  49. Summary
     ● Performance is fun!
     ● Understand what you do
     ● Immutability is not an issue (by itself)
       – Bad code is.
     ● GC Duration ≈ O(Live data)
     ● G1 is self-tuning
       – Try it :-)
  50. Thank you for listening!
     For more information:
     http://www.pingtimeout.fr, @pingtimeout, pierre@pingtimeout.fr
