Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013

Understanding the impact of garbage collection, both at a single node and a cluster level, is key to developing high-performance, high-availability Solr and Lucene applications. After a brief overview of garbage collection theory, we will review the design and use of the various collectors in the JVM.

At the single-node level, we will explore GC monitoring: how to understand GC logs, how to monitor what percentage of your Solr request time is spent on GC, how to use VisualGC, YourKit, and other tools, and what to log and monitor. We will review GC tuning and how to measure success.

At the cluster level, we will review how to design for partial availability: how to avoid sending requests to a GCing node and how to be resilient to mid-request GC pauses.

For application development, we will review common memory leak scenarios in custom Solr and Lucene application code and how to detect them.

Published in: Technology

Transcript of "Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013"

  1. LIVING WITH GARBAGE: Gregg Donovan, Senior Software Engineer, Etsy.com
  2. 3.5 years Solr & Lucene at Etsy.com; 3 years Solr & Lucene at TheLadders.com
  3. 8+ million members
  4. 20 million items
  5. 800k+ active sellers
  6. 8+ billion pageviews per month
  7. CodeAsCraft.etsy.com
  8. Understanding GC; Monitoring GC; Debugging Memory Leaks; Design for Partial Availability
  9.
      public class BuzzwordDetector {
          static String[] prefixes = { "synergy", "win-win" };
          static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

          public static void main(String[] args) {
              args = myArgs;
              int buzzwords = 0;
              // Count every (argument, prefix) pair where the argument contains the prefix.
              for (int i = 0; i < args.length; i++) {
                  String lc = args[i].toLowerCase();
                  for (int j = 0; j < prefixes.length; j++) {
                      if (lc.contains(prefixes[j])) {
                          buzzwords++;
                      }
                  }
              }
              System.out.println("Found " + buzzwords + " buzzwords");
          }
      }
  10. From Garbage Collection Handbook:
      New():
          ref <- allocate()
          if ref = null              /* Heap is full */
              collect()
              ref <- allocate()
              if ref = null          /* Heap is still full */
                  error "Out of memory"
          return ref

      atomic collect():
          markFromRoots()
          sweep(HeapStart, HeapEnd)
  11. From Garbage Collection Handbook:
      markFromRoots():
          initialise(worklist)
          for each fld in Roots
              ref <- *fld
              if ref != null && not isMarked(ref)
                  setMarked(ref)
                  add(worklist, ref)
                  mark()

      initialise(worklist):
          worklist <- empty

      mark():
          while not isEmpty(worklist)
              ref <- remove(worklist)    /* ref is marked */
              for each fld in Pointers(ref)
                  child <- *fld
                  if child != null && not isMarked(child)
                      setMarked(child)
                      add(worklist, child)
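
The pseudocode on slides 10 and 11 can be tried out on a toy object graph. The following is a minimal Java sketch, not code from the talk: the `Node` and `MarkSweepSketch` names are hypothetical, the "heap" is just a list, and marking runs once after all roots are queued rather than once per root (the set of marked objects ends up the same).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy heap object: a mark bit plus outgoing pointers. */
class Node {
    final String name;
    boolean marked;
    final List<Node> pointers = new ArrayList<>();

    Node(String name) { this.name = name; }
}

class MarkSweepSketch {
    /** Marks every node reachable from the roots, using an explicit worklist. */
    static void markFromRoots(List<Node> roots) {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node ref : roots) {
            if (ref != null && !ref.marked) {
                ref.marked = true;
                worklist.add(ref);
            }
        }
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove(); // ref is already marked
            for (Node child : ref.pointers) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** Drops unmarked nodes from the toy heap, clears marks, returns the number freed. */
    static int sweep(List<Node> heap) {
        int freed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false; // reset for the next collection
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    /** a -> b is reachable from the root; c <-> d is an unreachable cycle. */
    static int demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.pointers.add(b);
        c.pointers.add(d);
        d.pointers.add(c);
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        markFromRoots(Arrays.asList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("freed " + demo() + " nodes");
    }
}
```

The unreachable cycle `c <-> d` is freed even though each node is still pointed to, which is the property that distinguishes tracing collectors from plain reference counting.
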
  12. Trivia: Who invented the first GC and Mark-and-Sweep?
  13. Weak Generational Hypothesis
  14. Where do objects in a common Solr application live? AtomicReaderContext? SolrIndexSearcher? SolrRequest?
  15. GC Terminology: Concurrent vs Parallel
  16. JVM Collectors
  17. Serial
  18. Trivia: How does System.identityHashCode() work?
  19. Throughput
  20. CMS
  21. Garbage First (G1)
  22. Continuously Concurrent Compacting Collector (C4)
  23. IBM, Dalvik, etc.?
  24. Why Throughput?
  25. Monitoring
  26. GC time per Solr request
  27. Available via JMX:
      ...
      import java.lang.management.*;
      ...
      public static long getCollectionTime() {
          long collectionTime = 0;
          // Sum the cumulative collection time (ms) reported by every collector.
          for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
              collectionTime += mbean.getCollectionTime();
          }
          return collectionTime;
      }
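
One way to turn the cumulative counter from slide 27 into "GC time per request" is to sample it before and after each request and log the difference. This is a minimal sketch under assumptions: `GcTimePerRequest`, `Timed`, and `measure` are hypothetical names, and the request is modeled as a plain `Supplier` rather than a real Solr handler.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

class GcTimePerRequest {
    /** Cumulative GC time in milliseconds across all collectors in this JVM. */
    static long getCollectionTime() {
        long total = 0;
        for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = mbean.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Result of one request plus the wall-clock and GC time observed around it. */
    static final class Timed<T> {
        final T result;
        final long wallMillis;
        final long gcMillis;

        Timed(T result, long wallMillis, long gcMillis) {
            this.result = result;
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
        }
    }

    /** Runs the request and records how much GC time accrued while it was in flight. */
    static <T> Timed<T> measure(Supplier<T> request) {
        long gcBefore = getCollectionTime();
        long start = System.nanoTime();
        T result = request.get();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = getCollectionTime() - gcBefore;
        return new Timed<>(result, wallMillis, gcMillis);
    }

    public static void main(String[] args) {
        Timed<Integer> t = measure(() -> new int[1_000_000].length);
        System.out.println("wall=" + t.wallMillis + "ms gc=" + t.gcMillis + "ms");
    }
}
```

The counter is JVM-wide, so the delta covers any collection that happened during the request window, including ones triggered by other threads. It is best read as "GC time this request was exposed to", not "GC time this request caused".
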
  28. Visual GC
  29.
      export GC_DEBUG="-verbose:gc \
        -XX:+PrintGCDateStamps \
        -XX:+PrintHeapAtGC \
        -XX:+PrintGCApplicationStoppedTime \
        -XX:+PrintGCApplicationConcurrentTime \
        -XX:+PrintAdaptiveSizePolicy \
        -XX:AdaptiveSizePolicyOutputInterval=1 \
        -XX:+PrintTenuringDistribution \
        -XX:+PrintGCDetails \
        -XX:+PrintCommandLineFlags \
        -XX:+PrintSafepointStatistics \
        -Xloggc:/var/log/search/gc.log"
  30.
      2013-04-08T20:14:00.162+0000: 4197.791: [Full GC
      AdaptiveSizeStart: 4206.559 collection: 213
      PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
      AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
      AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
      PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
      AdaptiveSizeStop: collection: 213
      [PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
      Heap after GC invocations=213 (full 210):
       PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
        eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
        from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
        to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
       ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
        object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
       PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
        object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
      }
  31. GC Log Analyzers? GCHisto, GCViewer, garbagecat
  32. Graphing with Logster: github.com/etsy/logster
  33. GC Dashboard: github.com/etsy/dashboard
  34. YourKit.com
  35. Designing for Partial Availability
  36. JVMTI GC Hook?
  37. How can a client ignore GC-ing hosts?
  38. Server lies to clients about availability: TCP socket receive buffer, TCP write buffer
  39. “Banner” protocol:
      1. Connect via TCP
      2. Wait ~1-10ms
      3. Either receive magic four byte header or try another host
      4. Only send query after receiving header from server
  40. 0xC0DEA5CF
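
The client side of the "banner" protocol from slides 39 and 40 might look roughly like the sketch below. It is an illustration only: the `BannerClient` class, its method names, the big-endian byte order of the header, and the reuse of the banner timeout as the connect timeout are all assumptions, not details from the talk. The `demo()` method uses two local listeners, one that accepts connections in the kernel backlog but never writes (much like a host stuck in a GC pause) and one that sends the header.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

class BannerClient {
    static final int MAGIC = 0xC0DEA5CF; // four byte header from slide 40

    /** Returns a socket to the first host that sends the banner in time, or null. */
    static Socket connectToHealthyHost(List<InetSocketAddress> hosts, int bannerTimeoutMillis) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, bannerTimeoutMillis);
                socket.setSoTimeout(bannerTimeoutMillis); // applies to each blocking read
                DataInputStream in = new DataInputStream(socket.getInputStream());
                if (in.readInt() == MAGIC) {
                    return socket; // only now is it safe to send the query
                }
            } catch (IOException e) {
                // Connect failure or banner timeout: fall through and try the next host.
            }
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
        return null;
    }

    /** Local demo: the first listener never sends a banner, the second one does. */
    static boolean demo() {
        try (ServerSocket silent = new ServerSocket(0);
             ServerSocket healthy = new ServerSocket(0)) {
            Thread server = new Thread(() -> {
                try (Socket s = healthy.accept()) {
                    new DataOutputStream(s.getOutputStream()).writeInt(MAGIC);
                    Thread.sleep(500); // keep the connection open briefly
                } catch (IOException | InterruptedException ignored) {
                    // Demo server: errors simply end the thread.
                }
            });
            server.setDaemon(true);
            server.start();
            List<InetSocketAddress> hosts = Arrays.asList(
                    new InetSocketAddress("127.0.0.1", silent.getLocalPort()),
                    new InetSocketAddress("127.0.0.1", healthy.getLocalPort()));
            try (Socket chosen = connectToHealthyHost(hosts, 200)) {
                return chosen != null && chosen.getPort() == healthy.getLocalPort();
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("picked healthy host: " + demo());
    }
}
```

The silent listener shows why a successful TCP connect alone proves little: the kernel completes the handshake even though no application thread is running, so the client needs an application-level signal before committing its query to that host.
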
  41. What if GC happens mid-request?
  42. Backup requests
  43. Jeff Dean: Achieving Rapid Response Time in Large Online Services
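
A backup request (slide 42) sends the same query to a second replica when the first has not answered within a short delay, then takes whichever response arrives first. The sketch below is one possible shape for that idea using `CompletableFuture`. The `BackupRequests` class, the `replica` helper, and the delay values are hypothetical, and it does not cancel the slower request, which a production version would probably want to do.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class BackupRequests {
    /**
     * Sends the primary request; if it has not completed after backupDelayMillis,
     * also sends the backup request and returns whichever completes first.
     */
    static <T> T withBackup(Supplier<T> primary, Supplier<T> backup,
                            long backupDelayMillis, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        CompletableFuture<T> first = CompletableFuture.supplyAsync(primary, pool);
        try {
            return first.get(backupDelayMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            CompletableFuture<T> second = CompletableFuture.supplyAsync(backup, pool);
            // Completes with the first of the two futures to finish.
            return first.applyToEither(second, r -> r).get();
        }
    }

    /** Simulated replica that answers with its name after a fixed latency. */
    static Supplier<String> replica(String name, long latencyMillis) {
        return () -> {
            try {
                Thread.sleep(latencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name;
        };
    }

    /** The primary "pauses" for 2 s, so the backup sent after 50 ms wins. */
    static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return withBackup(replica("primary", 2000), replica("backup", 10), 50, pool);
        } catch (InterruptedException | ExecutionException e) {
            return "failed: " + e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("answered by: " + demo());
    }
}
```

The delay is the main tuning knob: set near a high latency percentile, it limits the extra load from backups while still cutting off the tail caused by a replica that is mid-collection.
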
  44. Solr sharding? Right now, only as fast as the slowest shard.
  45. “Make a reliable whole out of unreliable parts.”
  46. Memory Leaks
  47. Solr API hooks for custom code: QParserPlugin, SearchComponent, SolrRequestHandler, SolrEventListener, SolrCache, ValueSourceParser, FieldType, etc.
  48. PSA: Are you sure you need custom code?
  49. CoreContainer#getCore(), RefCounted<SolrIndexSearcher>
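
Slide 49 names two Solr APIs that hand out reference-counted resources; custom code that acquires one and never releases it can keep an old searcher, and everything it references, alive. The sketch below illustrates the pattern with a toy class. `ToyRefCounted` and `RefCountDemo` are hypothetical stand-ins and deliberately do not use Solr's own classes (where, as far as this note assumes, the matching release calls are `RefCounted#decref()` and `SolrCore#close()`).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy stand-in for a reference-counted holder; NOT Solr's RefCounted class. */
class ToyRefCounted<T> {
    private final T resource;
    private final AtomicInteger refCount = new AtomicInteger(1); // the owner's reference

    ToyRefCounted(T resource) { this.resource = resource; }

    ToyRefCounted<T> incref() {
        refCount.incrementAndGet();
        return this;
    }

    void decref() { refCount.decrementAndGet(); }

    T get() { return resource; }

    /** True once every holder has released its reference. */
    boolean isReleased() { return refCount.get() == 0; }
}

class RefCountDemo {
    /** Correct usage: every incref is paired with a decref in a finally block. */
    static int carefulRequest(ToyRefCounted<String> searcher) {
        searcher.incref();
        try {
            return searcher.get().length();
        } finally {
            searcher.decref();
        }
    }

    /** Buggy usage: the reference is taken but never given back. */
    static int leakyRequest(ToyRefCounted<String> searcher) {
        searcher.incref();
        return searcher.get().length();
    }

    /** Runs one request, then has the owner drop its reference, as on a searcher swap. */
    static boolean releasedAfter(boolean leaky) {
        ToyRefCounted<String> searcher = new ToyRefCounted<>("searcher-generation-1");
        if (leaky) {
            leakyRequest(searcher);
        } else {
            carefulRequest(searcher);
        }
        searcher.decref(); // owner lets go when a newer searcher replaces this one
        return searcher.isReleased();
    }

    public static void main(String[] args) {
        System.out.println("careful request released: " + releasedAfter(false));
        System.out.println("leaky request released:   " + releasedAfter(true));
    }
}
```

With the leaky path the count never reaches zero, so the old generation is never released even though nothing uses it anymore; the usual remedy is the `try`/`finally` pairing shown in `carefulRequest`.
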
  50. SolrIndexSearcher generation marking with YourKit triggers
  51. Miscellaneous Topics
  52. System.gc()?
  53. -XX:+UseCompressedOops
  54. -XX:+UseNUMA
  55. Paging
  56.
      #!/usr/bin/env bash
      # This script is designed to be run every minute by cron.
      host=$(hostname -s)
      psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
      min_flt=$(echo $psout | awk '{print $1}') # minor page faults
      maj_flt=$(echo $psout | awk '{print $2}') # major page faults
      epoch_s=$(date +%s)
      echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
      echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
  57. Solution 1: Buy more RAM. Ideally enough RAM to: keep index in OS file buffers AND ensure no paging of VM memory AND whatever else happens on the box. ~$5-10/GB
  58. echo "0" > /proc/sys/vm/swappiness
  59. mlock()/mlockall()
  60. echo "-17" > /proc/$PID/oom_adj (Mercy from the OOM Killer)
  61. Huge Pages
  62. -XX:+AlwaysPreTouch
  63. Possible Future Directions
  64. Many small VMs instead of one large VM: microsharding
  65. In-memory Lucene codecs, i.e. custom DirectPostingsFormat
  66. Off-heap memory with sun.misc.Unsafe?
  67. Try G1 again
  68. Try C4 again
  69. Resources
  70. gchandbook.org
  71. Mark Miller’s GC Bootcamp: bit.ly/mmgcb
  72. Gil Tene: Understanding Java Garbage Collection: bit.ly/giltene
  73. Ulrich Drepper: What Every Programmer Should Know About Memory: bit.ly/cpumemory
  74. github.com/pingtimeout/jvm-options
  75. Read the JVM Source (Not as scary as it sounds.): hg.openjdk.java.net/jdk7/jdk7
  76. Mechanical Sympathy Google Group: bit.ly/mechsym
  77. CONTACT: Gregg Donovan, gregg@etsy.com