Designing for garbage collection

Published in: Technology

Transcript

  • 1. Designing for Garbage Collection Gregg Donovan Senior Software Engineer Etsy.com Wednesday, July 31, 13
  • 2. 3.5 years: Search Engineering at Etsy.com. 5 years: Search & Web Engineering at TheLadders.com
  • 4. 25+ million members
  • 5. 20+ million items
  • 6. 900k+ active sellers
  • 7. 60+ million monthly unique visitors
  • 15. CodeAsCraft.etsy.com
  • 17-20. Understanding GC; Monitoring GC; Debugging Memory Leaks; Design for Partial Availability
  • 22.
      public class BuzzwordDetector {
          static String[] prefixes = { "synergy", "win-win" };
          static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

          public static void main(String[] args) {
              args = myArgs;
              int buzzwords = 0;
              for (int i = 0; i < args.length; i++) {
                  // toLowerCase() allocates a new String on every iteration
                  String lc = args[i].toLowerCase();
                  for (int j = 0; j < prefixes.length; j++) {
                      if (lc.contains(prefixes[j])) {
                          buzzwords++;
                      }
                  }
              }
              System.out.println("Found " + buzzwords + " buzzwords");
          }
      }
  • 23. From the Garbage Collection Handbook:
      New():
          ref <- allocate()
          if ref = null                  /* Heap is full */
              collect()
              ref <- allocate()
              if ref = null              /* Heap is still full */
                  error "Out of memory"
          return ref

      atomic collect():
          markFromRoots()
          sweep(HeapStart, HeapEnd)
  • 24. From the Garbage Collection Handbook:
      markFromRoots():
          initialise(worklist)
          for each fld in Roots
              ref <- *fld
              if ref != null && not isMarked(ref)
                  setMarked(ref)
                  add(worklist, ref)
                  mark()

      initialise(worklist):
          worklist <- empty

      mark():
          while not isEmpty(worklist)
              ref <- remove(worklist)    /* ref is marked */
              for each fld in Pointers(ref)
                  child <- *fld
                  if child != null && not isMarked(child)
                      setMarked(child)
                      add(worklist, child)
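The mark-and-sweep pseudocode on slides 23 and 24 can be tried out as a small Java toy. This is only a sketch under stated assumptions: `Node` and `ToyHeap` are hypothetical names that do not appear in the talk, the "heap" is just a list of allocated nodes, and the sweep is a simple filter rather than a walk over real memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy heap object (hypothetical): a mark bit plus outgoing references. */
final class Node {
    final String name;
    final List<Node> fields = new ArrayList<>();
    boolean marked;

    Node(String name) { this.name = name; }
}

/** Toy mark-and-sweep collector over an explicit list of allocated nodes. */
final class ToyHeap {
    final List<Node> allocated = new ArrayList<>();
    final List<Node> roots = new ArrayList<>();

    Node allocate(String name) {
        Node n = new Node(name);
        allocated.add(n);
        return n;
    }

    /** Marks everything reachable from the roots, sweeps the rest, returns the number freed. */
    int collect() {
        // Mark phase: seed the worklist from the roots, then trace children.
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node root : roots) {
            if (root != null && !root.marked) {
                root.marked = true;
                worklist.add(root);
            }
        }
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove(); // ref is already marked
            for (Node child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
        // Sweep phase: drop unmarked nodes, clear mark bits on survivors.
        int before = allocated.size();
        allocated.removeIf(n -> !n.marked);
        for (Node n : allocated) {
            n.marked = false;
        }
        return before - allocated.size();
    }
}
```

Because marking starts at the roots and sets the mark bit before enqueueing, cycles among live objects terminate, and unreachable cycles are swept without any special handling.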
  • 25. Trivia: Who invented the first GC and Mark-and-Sweep?
  • 26. Weak Generational Hypothesis
  • 27. Where do objects in your application live?
  • 28. GC Terminology: Concurrent vs Parallel
  • 29. JVM Collectors
  • 30. Serial
  • 31. Throughput
  • 32. CMS
  • 33. Garbage First (G1)
  • 34. Continuously Concurrent Compacting Collector (C4)
  • 35. IBM, Dalvik, etc.?
  • 36. Why Throughput?
  • 37. Questions so far?
  • 38. Monitoring
  • 39. GC time per request
  • 40. Available via JMX:
      ...
      import java.lang.management.*;
      ...
      // Cumulative GC time in milliseconds, summed over all collectors in this JVM.
      public static long getCollectionTime() {
          long collectionTime = 0;
          for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
              collectionTime += mbean.getCollectionTime();
          }
          return collectionTime;
      }
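One way to turn the cumulative counter from slide 40 into the "GC time per request" metric from slide 39 is to sample it before and after each request and report the difference. The sketch below assumes that approach; `GcTimePerRequest`, `Timed`, and `measure` are hypothetical names, not taken from the talk. Note that the counter is JVM-wide, so with concurrent requests each one is charged for every collection that overlapped it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

final class GcTimePerRequest {
    /** A request's result plus the JVM-wide GC time (ms) accumulated while it ran. */
    static final class Timed<T> {
        final T result;
        final long gcMillis;

        Timed(T result, long gcMillis) {
            this.result = result;
            this.gcMillis = gcMillis;
        }
    }

    /** Sums cumulative collection time across all collectors, in milliseconds. */
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the request and reports how much GC time elapsed in the JVM meanwhile. */
    static <T> Timed<T> measure(Supplier<T> request) {
        long before = totalCollectionTimeMillis();
        T result = request.get();
        long after = totalCollectionTimeMillis();
        return new Timed<>(result, after - before);
    }
}
```

A caller could wrap its request handler in `measure(...)` and log or graph `gcMillis` alongside the request latency, which makes GC-inflated requests easy to spot.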
  • 42. Visual GC
  • 45.
      export GC_DEBUG="-verbose:gc \
        -XX:+PrintGCDateStamps \
        -XX:+PrintHeapAtGC \
        -XX:+PrintGCApplicationStoppedTime \
        -XX:+PrintGCApplicationConcurrentTime \
        -XX:+PrintAdaptiveSizePolicy \
        -XX:AdaptiveSizePolicyOutputInterval=1 \
        -XX:+PrintTenuringDistribution \
        -XX:+PrintGCDetails \
        -XX:+PrintCommandLineFlags \
        -XX:+PrintSafepointStatistics \
        -Xloggc:/var/log/search/gc.log"
  • 47.
      2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
      PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
      AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
      AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
      PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
      AdaptiveSizeStop: collection: 213
      [PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
      Heap after GC invocations=213 (full 210):
       PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
        eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
        from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
        to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
       ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
        object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
       PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
        object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
      }
  • 48. GC Log Analyzers? GCHisto, GCViewer, garbagecat, github.com/Netflix/gcviz
  • 49. Graphing with Logster: github.com/etsy/logster
  • 51. GC Dashboard: github.com/etsy/dashboard
  • 53. YourKit.com
  • 54. Designing for Partial Availability
  • 55. JVMTI GC Hook?
  • 56. How can a client ignore GC-ing hosts?
  • 57. Server lies to clients about availability: TCP socket receive buffer, TCP write buffer
  • 58. “Banner” protocol: 1. Connect via TCP. 2. Wait ~1-10ms. 3. Either receive magic four-byte header or try another host. 4. Only send query after receiving header from server.
  • 59. 0xC0DEA5CF
  • 60.
      public function open() {
        $this->handle_ = @fsockopen($this->host_, $this->port_, $errno, $errstr,
                                    $this->connectTimeout_ / 1000.0);
        // $banner_timeout, $short_hostname and $message are not defined on the slide;
        // presumably they are set elsewhere in the original class.
        try {
          stream_set_timeout($this->handle_, 0, $banner_timeout * 1000);
          $read_start = microtime(true);
          $data = $this->readAll(4);
          $read_time = (microtime(true) - $read_start) * 1000; // seconds to millis
          $arr = unpack('N', $data); // four bytes, big-endian unsigned
          $value = $arr[1];
          if ($value !== 0xC0DEA5CF) {
            StatsD::increment("search.baddata.{$short_hostname}.{$this->getPort()}");
            throw new TTransportException("[$value] does not match banner [0xC0DEA5CF]");
          }
        } catch (Exception $e) {
          $this->close(); // this won't necessarily be closed by clients
          throw new TTransportException($message, self::BANNER_TIMEOUT_CODE);
        }
      }
  • 61.
      private static class BannerSendingTProcessorFactory extends TProcessorFactory {
        private final TProcessor base;

        public BannerSendingTProcessorFactory(TProcessor base) {
          super(base);
          this.base = base;
        }

        @Override
        public TProcessor getProcessor(TTransport trans) {
          return new BannerTProcessor(base, (TSocket) trans);
        }
      }

      private static final class BannerTProcessor implements TProcessor {
        private final TProcessor base;
        private final TSocket tsocket;

        private BannerTProcessor(TProcessor base, TSocket tsocket) {
          this.base = checkNotNull(base);
          this.tsocket = checkNotNull(tsocket);
        }

        @Override
        public boolean process(TProtocol in, TProtocol out) throws TException {
          // Write the four-byte banner before handing the request to the real processor.
          this.tsocket.write(TBannerUtil.BANNER, 0, 4);
          this.tsocket.flush();
          return this.base.process(in, out);
        }
      }
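The client side of the banner protocol (slide 58) is shown in PHP on slide 60. For readers who prefer Java, the same four steps might look roughly like the sketch below. It uses only `java.net.Socket` and is not the Thrift transport from the talk; `BannerClient` and `connectToFirstResponsive` are hypothetical names, and the timeouts are parameters the caller would have to choose.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

final class BannerClient {
    /** The magic four-byte header from slide 59. */
    static final int BANNER = 0xC0DEA5CF;

    /**
     * Tries each host in order and returns a socket to the first one that
     * sends the banner within bannerTimeoutMs. Hosts that accept the TCP
     * connection but stay silent (for example, paused in GC) are skipped.
     */
    static Socket connectToFirstResponsive(List<InetSocketAddress> hosts,
                                           int connectTimeoutMs,
                                           int bannerTimeoutMs) throws IOException {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, connectTimeoutMs);   // 1. connect via TCP
                socket.setSoTimeout(bannerTimeoutMs);     // 2. wait a few ms at most
                // 3. read four bytes, big-endian, and compare with the banner
                int header = new DataInputStream(socket.getInputStream()).readInt();
                if (header == BANNER) {
                    socket.setSoTimeout(0); // caller should set its own query timeout
                    return socket;          // 4. safe to send the query now
                }
            } catch (IOException e) {
                // Connect failure, read timeout, or early close: fall through to next host.
            }
            closeQuietly(socket);
        }
        throw new IOException("no host sent the banner in time");
    }

    private static void closeQuietly(Socket socket) {
        try {
            socket.close();
        } catch (IOException ignored) {
            // Nothing useful to do if close fails.
        }
    }
}
```

The read timeout is what makes this work: the kernel completes the TCP handshake even when the JVM is paused, so a successful connect alone says nothing about whether the server can actually respond.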
  • 62. What if GC happens mid-request?
  • 63. Backup requests
  • 64. Jeff Dean: Achieving Rapid Response Time in Large Online Services
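The slides name backup requests without showing code. As a rough illustration of the general idea (send the request to one replica, and if no answer arrives within a short delay, send it to a second replica and take whichever answers first), here is a minimal sketch. `BackupRequests` and `withBackup` are hypothetical names, the delay is an assumption the caller must tune, and failures are deliberately ignored to keep the example short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BackupRequests {
    /**
     * Starts the primary request immediately. If it has not completed after
     * backupDelayMs, starts the backup request as well. The returned future
     * completes with whichever result arrives first.
     *
     * The executor needs at least two threads so a slow primary cannot block
     * the backup. Exceptions are not propagated in this sketch, so callers
     * should bound their wait with get(timeout, unit).
     */
    static <T> CompletableFuture<T> withBackup(Supplier<T> primary,
                                               Supplier<T> backup,
                                               long backupDelayMs,
                                               ScheduledExecutorService executor) {
        CompletableFuture<T> result = new CompletableFuture<>();
        CompletableFuture.supplyAsync(primary, executor).thenAccept(result::complete);
        executor.schedule(() -> {
            if (!result.isDone()) {
                CompletableFuture.supplyAsync(backup, executor).thenAccept(result::complete);
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);
        return result;
    }
}
```

With a delay set near a high latency percentile, only a small fraction of requests would trigger the backup, so the extra load stays modest while requests stuck behind a GC pause still get an answer from another host.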
  • 65. Sharding? Naive approach: only as fast as the slowest shard.
  • 66. “Make a reliable whole out of unreliable parts.”
  • 67. Memory Leaks
  • 68. SolrIndexSearcher generation marking with YourKit triggers
  • 70. Questions so far?
  • 71. Miscellaneous Topics
  • 72. System.gc()?
  • 73. -XX:+UseCompressedOops
  • 74. -XX:+UseNUMA
  • 75. Paging
  • 76.
      #!/usr/bin/env bash
      # This script is designed to be run every minute by cron.
      host=$(hostname -s)
      psout=$(ps h -p "$(cat /var/run/etsy-search.pid)" -o min_flt,maj_flt 2>/dev/null)
      min_flt=$(echo $psout | awk '{print $1}')  # minor page faults
      maj_flt=$(echo $psout | awk '{print $2}')  # major page faults
      epoch_s=$(date +%s)
      # Graphite plaintext format: <metric> <value> <timestamp>, tab-separated here.
      echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
      echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
  • 77. Solution 1: Buy more RAM. Ideally enough RAM to: keep data in OS file buffers AND ensure no paging of VM memory AND whatever else happens on the box. ~$5-10/GB
  • 78. echo "0" > /proc/sys/vm/swappiness
  • 79. mlock()/mlockall(): github.com/LucidWorks/mlockall-agent
  • 80. Mercy from the OOM Killer: echo "-17" > /proc/$PID/oom_adj
  • 81. Huge Pages
  • 82. -XX:+AlwaysPreTouch
  • 83. Future Directions
  • 84. Many small VMs instead of one large VM: microsharding
  • 85. Off-heap memory with sun.misc.Unsafe?
  • 86. Try G1 again
  • 87. Try C4 again
  • 88. Resources
  • 89. gchandbook.org
  • 91. bit.ly/mmgcb Mark Miller’s GC Bootcamp
  • 92. bit.ly/giltene Gil Tene: Understanding Java Garbage Collection
  • 93. bit.ly/cpumemory Ulrich Drepper: What Every Programmer Should Know About Memory
  • 94. github.com/pingtimeout/jvm-options
  • 95. Read the JVM Source (Not as scary as it sounds.) hg.openjdk.java.net/jdk7/jdk7
  • 96. Mechanical Sympathy Google Group: bit.ly/mechsym
  • 97. Questions? Thanks for coming! Gregg Donovan gregg@etsy.com