Living With Garbage

"Living With Garbage" talk by Gregg Donovan at the NYC Search and Discovery Meetup on 12/12/2013.


Living With Garbage Presentation Transcript

  • 1. LIVING WITH GARBAGE Gregg Donovan, Senior Software Engineer, etsy.com
  • 2. 4 years Solr & Lucene at etsy.com; 3 years Solr & Lucene at TheLadders.com
  • 3. 10+ million members
  • 4. 24+ million items
  • 5. 1+ million active sellers
  • 6. 10+ billion pageviews per month
  • 7. CodeAsCraft.etsy.com
  • 8. Understanding GC; Monitoring GC; Debugging Memory Leaks; Design for Partial Availability
  • 9. public class BuzzwordDetector {
          static String[] prefixes = { "synergy", "win-win" };
          static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

          public static void main(String[] args) {
              args = myArgs;

              int buzzwords = 0;
              for (int i = 0; i < args.length; i++) {
                  String lc = args[i].toLowerCase();
                  for (int j = 0; j < prefixes.length; j++) {
                      if (lc.contains(prefixes[j])) {
                          buzzwords++;
                      }
                  }
              }
              System.out.println("Found " + buzzwords + " buzzwords");
          }
      }
  • 10. New():
            ref <- allocate()
            if ref = null                  /* Heap is full */
                collect()
                ref <- allocate()
                if ref = null              /* Heap is still full */
                    error "Out of memory"
            return ref

        atomic collect():
            markFromRoots()
            sweep(HeapStart, HeapEnd)

        From Garbage Collection Handbook
  • 11. markFromRoots():
            initialise(worklist)
            for each fld in Roots
                ref <- *fld
                if ref != null && not isMarked(ref)
                    setMarked(ref)
                    add(worklist, ref)
                    mark()

        initialise(worklist):
            worklist <- empty

        mark():
            while not isEmpty(worklist)
                ref <- remove(worklist)    /* ref is marked */
                for each fld in Pointers(ref)
                    child <- *fld
                    if child != null && not isMarked(child)
                        setMarked(child)
                        add(worklist, child)

        From Garbage Collection Handbook
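
The mark-and-sweep pseudocode on slides 10 and 11 can be sketched in Java on a toy heap. This is only an illustration, not how HotSpot manages memory: `Node`, `ToyHeap`, and their fields are hypothetical names, the "heap" is an ordinary list, and the sketch seeds the worklist with all roots before draining it rather than calling mark() per root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy heap object (hypothetical): a mark bit plus outgoing references. */
final class Node {
    final String name;
    final List<Node> fields = new ArrayList<>();
    boolean marked;

    Node(String name) { this.name = name; }
}

/** Toy heap (hypothetical) illustrating markFromRoots() and sweep(). */
final class ToyHeap {
    final List<Node> objects = new ArrayList<>(); // every allocated object
    final List<Node> roots = new ArrayList<>();   // stand-in for stacks and statics

    Node allocate(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    /** Marks everything reachable from the roots using an explicit worklist. */
    void markFromRoots() {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node root : roots) {
            if (root != null && !root.marked) {
                root.marked = true;
                worklist.add(root);
            }
        }
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove(); // ref is already marked
            for (Node child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** Frees unmarked objects and clears mark bits; returns the number freed. */
    int sweep() {
        int freed = 0;
        for (Iterator<Node> it = objects.iterator(); it.hasNext();) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false; // reset for the next cycle
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    /** One stop-the-world cycle: mark, then sweep. */
    int collect() {
        markFromRoots();
        return sweep();
    }
}
```

Because marking follows reachability from the roots, two objects that reference only each other are still swept, which reference counting alone would not achieve.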
  • 12. Trivia: Who invented the first GC and Mark-and-Sweep?
  • 13. Weak Generational Hypothesis
  • 14. Where do objects in a common Solr application live? SolrRequest? AtomicReaderContext? SolrIndexSearcher?
  • 15. GC Terminology: Concurrent vs Parallel
  • 16. JVM Collectors
  • 17. Serial
  • 18. Trivia: How does System.identityHashCode() work?
  • 19. Throughput
  • 20. CMS
  • 21. Garbage First (G1)
  • 22. Continuously Concurrent Compacting Collector (C4)
  • 23. IBM, Dalvik, etc.?
  • 24. Why Throughput?
  • 25. Questions so far?
  • 26. Monitoring
  • 27. GC time per Solr request
  • 28. Available via JMX
        ...
        import java.lang.management.*;
        ...

        public static long getCollectionTime() {
            long collectionTime = 0;
            for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
                collectionTime += mbean.getCollectionTime();
            }
            return collectionTime;
        }
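
One way to turn the JMX counter into "GC time per request" (slide 27) is to sample it before and after handling a request and record the difference. The sketch below is an assumption about how that could look, not the code used at Etsy: `GcTimer`, `Timed`, and `measure` are hypothetical names, and the request is modeled as a plain `Supplier`. The counter is JVM-wide, so the delta covers all collections that ran during the request, whichever thread triggered them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/** Hypothetical helper that attributes JVM-wide GC time to one unit of work. */
final class GcTimer {

    /** Sums collection time (ms) across all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = bean.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** A result plus the GC time that accumulated while it was computed. */
    static final class Timed<T> {
        final T result;
        final long gcMillis;

        Timed(T result, long gcMillis) {
            this.result = result;
            this.gcMillis = gcMillis;
        }
    }

    /** Runs the request and reports how much GC time elapsed JVM-wide meanwhile. */
    static <T> Timed<T> measure(Supplier<T> request) {
        long before = totalGcMillis();
        T result = request.get();
        return new Timed<>(result, totalGcMillis() - before);
    }
}
```

A caller could log `gcMillis` next to the request latency, which makes it easy to see whether a slow request coincided with a collection.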
  • 29. Visual GC
  • 30. export GC_DEBUG="-verbose:gc \
        -XX:+PrintGCDateStamps \
        -XX:+PrintHeapAtGC \
        -XX:+PrintGCApplicationStoppedTime \
        -XX:+PrintGCApplicationConcurrentTime \
        -XX:+PrintAdaptiveSizePolicy \
        -XX:AdaptiveSizePolicyOutputInterval=1 \
        -XX:+PrintTenuringDistribution \
        -XX:+PrintGCDetails \
        -XX:+PrintCommandLineFlags \
        -XX:+PrintSafepointStatistics \
        -Xloggc:/var/log/search/gc.log"
  • 31. 2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
        PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
        AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
        AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
        PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
        AdaptiveSizeStop: collection: 213
        [PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
        Heap after GC invocations=213 (full 210):
         PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
          eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
          from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
          to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
         ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
          object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
         PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
          object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
        }
  • 32. GC Log Analyzers? GCHisto, GCViewer, garbagecat
  • 33. Graphing with Logster: github.com/etsy/logster
  • 34. GC Dashboard: github.com/etsy/dashboard
  • 35. YourKit.com
  • 36. Designing for Partial Availability
  • 37. JVMTI GC Hook?
  • 38. How can a client ignore GC-ing hosts?
  • 39. Server lies to clients about availability: TCP socket receive buffer, TCP write buffer
  • 40. “Banner” protocol
        1. Connect via TCP
        2. Wait ~1-10ms
        3. Either receive magic four-byte header or try another host
        4. Only send query after receiving header from server
  • 41. 0xC0DEA5CF
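
The client side of the "banner" protocol from slides 40 and 41 might look roughly like the following. This is a sketch under stated assumptions, not Etsy's implementation: `BannerClient` and `connectToResponsiveHost` are hypothetical names, the header is assumed to arrive in big-endian byte order, and the timeouts are parameters rather than the values used in production.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

/** Hypothetical client for a "banner" handshake; byte order and timeouts are assumptions. */
final class BannerClient {
    static final int MAGIC = 0xC0DEA5CF; // four-byte header shown on the slide

    /**
     * Tries hosts in order and returns the first socket whose server sent the
     * banner within bannerTimeoutMs, or null if no host did.
     */
    static Socket connectToResponsiveHost(List<InetSocketAddress> hosts,
                                          int connectTimeoutMs,
                                          int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                // Step 1: connect. The kernel can complete this even while the JVM is paused.
                socket.connect(host, connectTimeoutMs);
                // Steps 2-3: wait briefly for the banner (the timeout applies per read call).
                socket.setSoTimeout(bannerTimeoutMs);
                int header = new DataInputStream(socket.getInputStream()).readInt();
                if (header == MAGIC) {
                    socket.setSoTimeout(0); // banner seen; caller chooses its own read timeout
                    return socket;          // Step 4: safe to send the query now
                }
            } catch (IOException e) {
                // Timeout, refused connection, or early close: try the next host.
            }
            closeQuietly(socket);
        }
        return null;
    }

    private static void closeQuietly(Socket socket) {
        try {
            socket.close();
        } catch (IOException ignored) {
            // Nothing useful to do if close fails.
        }
    }
}
```

The point of the design is that only application code can write the banner, so receiving it shows the server process is actually running rather than stopped in a collection.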
  • 42. What if GC happens mid-request?
  • 43. Backup requests
  • 44. Jeff Dean: Achieving Rapid Response Time in Large Online Services
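
The backup-request idea from slides 43 and 44 can be sketched with `CompletableFuture`: send the query to one replica, and if no answer has arrived after a short delay, send the same query to a second replica and take whichever finishes first. The names `BackupRequests` and `withBackup`, the two `Supplier` arguments standing in for replica calls, and the delay are all assumptions for illustration. As a simplification, the first completion wins even if it is a failure, and the slower request is not cancelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: hedge a slow primary request with a delayed backup request. */
final class BackupRequests {

    static <T> CompletableFuture<T> withBackup(Supplier<T> primary,
                                               Supplier<T> backup,
                                               long backupDelayMs,
                                               Executor workers,
                                               ScheduledExecutorService timer) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // Start the primary request immediately.
        forward(CompletableFuture.supplyAsync(primary, workers), result);

        // If the primary has not answered after the delay, start the backup too.
        timer.schedule(() -> {
            if (!result.isDone()) {
                forward(CompletableFuture.supplyAsync(backup, workers), result);
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);

        return result;
    }

    /** Copies the outcome of one future into another; later completions are ignored. */
    private static <T> void forward(CompletableFuture<T> from, CompletableFuture<T> to) {
        from.whenComplete((value, error) -> {
            if (error != null) {
                to.completeExceptionally(error);
            } else {
                to.complete(value);
            }
        });
    }
}
```

Choosing the delay is a trade-off: a short delay hides more GC pauses but adds load on the replicas, while a long delay adds little load but helps only with the worst stalls.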
  • 45. Solr sharding? Right now, only as fast as the slowest shard.
  • 46. “Make a reliable whole out of unreliable parts.”
  • 47. Memory Leaks
  • 48. Solr API hooks for custom code: QParserPlugin, SearchComponent, SolrRequestHandler, SolrEventListener, SolrCache, ValueSourceParser, FieldType, etc.
  • 49. PSA: Are you sure you need custom code?
  • 50. RefCounted<SolrIndexSearcher> CoreContainer#getCore()
  • 51. SolrIndexSearcher generation marking with YourKit triggers
  • 52. Questions so far?
  • 53. Miscellaneous Topics
  • 54. System.gc()?
  • 55. -XX:+UseCompressedOops
  • 56. -XX:+UseNUMA
  • 57. Paging
  • 58. #!/usr/bin/env bash

        # This script is designed to be run every minute by cron.

        host=$(hostname -s)

        psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
        min_flt=$(echo $psout | awk '{print $1}') # minor page faults
        maj_flt=$(echo $psout | awk '{print $2}') # major page faults

        epoch_s=$(date +%s)

        echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
        echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
  • 59. Solution 1: Buy more RAM (~$5-10/GB)
        Ideally enough RAM to:
        keep the index in OS file buffers AND
        ensure no paging of VM memory AND
        cover whatever else happens on the box
  • 60. echo "0" > /proc/sys/vm/swappiness
  • 61. mlock()/mlockall() github.com/LucidWorks/mlockall-agent
  • 62. Mercy from the OOM Killer: echo "-17" > /proc/$PID/oom_adj
  • 63. Huge Pages
  • 64. -XX:+AlwaysPreTouch
  • 65. Possible Future Directions
  • 66. Many small VMs instead of one large VM (microsharding)
  • 67. In-memory Lucene codecs, i.e. custom DirectPostingsFormat
  • 68. Off-heap memory with sun.misc.Unsafe?
  • 69. Try G1 again
  • 70. Try C4 again
  • 71. Resources
  • 72. gchandbook.org
  • 73. Mark Miller’s GC Bootcamp: bit.ly/mmgcb
  • 74. Gil Tene, Understanding Java Garbage Collection: bit.ly/giltene
  • 75. Ulrich Drepper, What Every Programmer Should Know About Memory: bit.ly/cpumemory
  • 76. github.com/pingtimeout/jvm-options
  • 77. Read the JVM Source (not as scary as it sounds): hg.openjdk.java.net/jdk7/jdk7
  • 78. Mechanical Sympathy Google Group: bit.ly/mechsym
  • 79. Thanks for coming! Questions? gregg@etsy.com