LIVING WITH
GARBAGE!
Gregg Donovan
Senior Software Engineer

etsy.com
4 Years Solr & Lucene at etsy.com
3 years Solr & Lucene at TheLadders.com
10+ million members
24+ million items
1mm+ active sellers
10+ billion pageviews per month
CodeAsCraft.etsy.com
Understanding GC
Monitoring GC
Debugging Memory Leaks
Design for Partial Availability
public class BuzzwordDetector {
static String[] prefixes = { "synergy", "win-win" };
static String[] myArgs = { "clown syn...
New():
ref <- allocate()
if ref = null
collect()
ref <- allocate()
if ref = null
error "Out of memory"
return ref

/* Heap...
markFromRoots():
initialise(worklist)
for each fld in Roots
ref <- *fld
if ref != null && not isMarked(ref)
setMarked(ref)...
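The mark and sweep phases in the pseudocode above can be tried out on a toy object graph. The following is a minimal sketch, not how HotSpot implements collection: `ToyHeap`, `Node`, and the `roots` list are hypothetical names invented for illustration, and the "heap" is just a Java list of cells.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy heap illustrating mark-and-sweep (hypothetical, for illustration only). */
final class ToyHeap {
    /** A heap cell with outgoing pointers and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> cells = new ArrayList<>();  // every allocated object
    final List<Node> roots = new ArrayList<>();  // stand-in for stacks and statics

    Node allocate(String name) {
        Node n = new Node(name);
        cells.add(n);
        return n;
    }

    /** Mark phase: worklist traversal starting from the roots. */
    void markFromRoots() {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node root : roots) {
            if (!root.marked) {
                root.marked = true;
                worklist.add(root);
            }
        }
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove();          // ref is already marked
            for (Node child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** Sweep phase: drop unmarked cells, clear marks on survivors. Returns the number freed. */
    int sweep() {
        int before = cells.size();
        cells.removeIf(n -> !n.marked);
        for (Node n : cells) {
            n.marked = false;
        }
        return before - cells.size();
    }

    int collect() {
        markFromRoots();
        return sweep();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Node a = heap.allocate("a");
        Node b = heap.allocate("b");
        heap.allocate("c");            // never referenced: garbage
        a.fields.add(b);
        b.fields.add(a);               // a cycle is no problem for a tracing collector
        heap.roots.add(a);
        System.out.println("freed " + heap.collect() + " object(s)");
    }
}
```

Because reachability is decided by tracing from the roots, the `a`/`b` cycle survives while it is rooted and is reclaimed as soon as the root is dropped.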
Trivia: Who invented the first
GC and Mark-and-Sweep?
Weak Generational
Hypothesis
Where do objects in a common Solr
application live?
SolrRequest?
AtomicReaderContext?
SolrIndexSearcher?
GC Terminology:
Concurrent vs Parallel
JVM Collectors
Serial
Trivia: How does
System.identityHashCode() work?
Throughput
CMS
Garbage First (G1)
Continuously Concurrent Compacting Collector (C4)
IBM, Dalvik, etc.?
Why Throughput?
Questions so far?
Monitoring
GC time per Solr request
Available via JMX
...
import java.lang.management.*;
...
public static long getCollectionTime() {
long collectionTim...
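One way to get from the JMX counter to "GC time per request" is to read the counter before and after each request and log the difference. The sketch below assumes that approach; `GcTimer`, `Timed`, and `time` are hypothetical names, and only `GarbageCollectorMXBean` and `ManagementFactory` come from the JDK. The counter is JVM-wide, so under concurrent load the delta is GC time that elapsed during the request, not GC time the request caused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

final class GcTimer {
    /** Accumulated collection time in ms, summed over every collector in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = bean.getCollectionTime();   // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Outcome of one timed request (hypothetical holder type). */
    static final class Timed<T> {
        final T value;
        final long wallMillis;
        final long gcMillis;
        Timed(T value, long wallMillis, long gcMillis) {
            this.value = value;
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
        }
    }

    /** Runs the request and records wall-clock time plus GC time that elapsed meanwhile. */
    static <T> Timed<T> time(Supplier<T> request) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        T value = request.get();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, wallMillis, totalGcMillis() - gcBefore);
    }

    public static void main(String[] args) {
        Timed<Integer> t = time(() -> {
            int n = 0;
            for (int i = 0; i < 2_000_000; i++) {
                n += new byte[64].length;   // allocation churn so a collection may occur
            }
            return n;
        });
        System.out.println("wall=" + t.wallMillis + "ms gc=" + t.gcMillis + "ms");
    }
}
```

Logging both numbers per request makes it possible to tell slow queries apart from queries that merely overlapped a collection.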
Visual GC
export GC_DEBUG="-verbose:gc 
-XX:+PrintGCDateStamps 
-XX:+PrintHeapAtGC 
-XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGC...
2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_...
GC Log Analyzers?
GCHisto
GCViewer
garbagecat
Graphing with Logster
github.com/etsy/logster
GC Dashboard
github.com/etsy/dashboard
YourKit.com
Designing for Partial Availability
JVMTI GC Hook?
How can a client ignore GC-ing hosts?
Server lies to clients about availability
TCP socket receive buffer
TCP write buffer
“Banner” protocol
1. Connect via TCP
2. Wait ~1-10ms
3. Either receive magic four byte header or try another host
4. Only ...
0xC0DEA5CF
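The four steps above can be sketched with plain sockets. This is an illustration under stated assumptions, not Etsy's implementation: the class and method names are hypothetical, the header is assumed to be sent as a big-endian 32-bit integer, and the timeouts are arbitrary. The "paused" host is simulated by a listening socket that never calls `accept()`: the kernel still completes the TCP handshake, which is how a host stalled in GC can look available to a client.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;

final class BannerProtocol {
    static final int MAGIC = 0xC0DEA5CF;   // header value from the slide

    /** Returns a socket to the first host that sends the banner in time, or null if none does. */
    static Socket connect(List<InetSocketAddress> hosts, int connectTimeoutMs, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, connectTimeoutMs);
                socket.setSoTimeout(bannerTimeoutMs);      // bound the wait for the banner
                int header = new DataInputStream(socket.getInputStream()).readInt();
                if (header == MAGIC) {
                    socket.setSoTimeout(0);                // caller chooses its own read timeout
                    return socket;                         // only now is it safe to send the query
                }
            } catch (IOException e) {
                // Timed out, refused, or reset: fall through and try the next host.
            }
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do
            }
        }
        return null;
    }

    /** Local test server. With sendBanner=false it listens but never accepts, like a paused JVM. */
    static ServerSocket startServer(boolean sendBanner) {
        try {
            ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            if (sendBanner) {
                Thread t = new Thread(() -> {
                    try {
                        while (true) {
                            Socket s = server.accept();
                            DataOutputStream out = new DataOutputStream(s.getOutputStream());
                            out.writeInt(MAGIC);           // a real server would then read the query
                            out.flush();
                        }
                    } catch (IOException e) {
                        // server socket closed: let the thread exit
                    }
                });
                t.setDaemon(true);
                t.start();
            }
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static InetSocketAddress addr(ServerSocket server) {
        return new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket paused = startServer(false); ServerSocket healthy = startServer(true)) {
            // 50 ms keeps the demo stable; the slide suggests waiting ~1-10 ms.
            try (Socket s = connect(List.of(addr(paused), addr(healthy)), 100, 50)) {
                System.out.println(s == null ? "no host available" : "using port " + s.getPort());
            }
        }
    }
}
```

The cost of a paused host is bounded by the banner timeout, since the client gives up on it before sending any query bytes.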
What if GC happens
mid-request?
Backup requests
Jeff Dean: Achieving Rapid
Response Time in Large
Online Services
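A backup request can be sketched with `CompletableFuture`: send the query to one replica, and if no answer has arrived after a short delay, send the same query to a second replica and take whichever answer comes first. The sketch below is an assumption-laden illustration, not code from the talk: `BackupRequests`, `withBackup`, and `slow` are hypothetical names, the replicas are simulated with sleeps, and cancellation of the losing request is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class BackupRequests {
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "backup-timer");
            t.setDaemon(true);
            return t;
        });

    /** Completes with the first successful answer; fails only if both requests fail. */
    static <T> CompletableFuture<T> withBackup(Supplier<T> primary, Supplier<T> backup, long backupDelayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();
        launch(primary, result, failures);
        TIMER.schedule(() -> {
            if (!result.isDone()) {                 // primary still outstanding (or failed)
                launch(backup, result, failures);
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    private static <T> void launch(Supplier<T> request, CompletableFuture<T> result, AtomicInteger failures) {
        CompletableFuture.supplyAsync(request).whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);             // no-op if the other request already won
            } else if (failures.incrementAndGet() == 2) {
                result.completeExceptionally(error);
            }
        });
    }

    /** Simulated replica that answers after the given delay. */
    static <T> Supplier<T> slow(long millis, T value) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }

    public static void main(String[] args) {
        // The first replica is "stuck in GC" for 2 s; the backup fires after 20 ms.
        String winner = withBackup(slow(2000, "paused replica"), slow(5, "healthy replica"), 20).join();
        System.out.println(winner);
    }
}
```

The delay trades extra load for tail latency: a longer delay sends fewer duplicate queries, a shorter one hides more of a pause.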
Solr sharding?
Right now, only as fast as the slowest shard.
“Make a reliable whole
out of unreliable parts.”
Memory Leaks
Solr API hooks for
custom code
QParserPlugin

SearchComponent

SolrRequestHandler

SolrEventListener

SolrCache

ValueSour...
PSA: Are you sure you
need custom code?
RefCounted<SolrIndexSearcher>
CoreContainer#getCore()
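Both of these hand custom code a reference-counted resource, and a missed release is a common way for a plugin to keep an old searcher and its caches alive. The sketch below shows the acquire/release discipline on a stand-in class. `ToyRefCounted` only mirrors the general shape of Solr's `RefCounted` (`incref`, `get`, `decref`, `close`); it is not the Solr class, and `FakeSearcher` and `searchSafely` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in that mirrors the shape of a ref-counted holder; not the Solr class. */
abstract class ToyRefCounted<T> {
    private final T resource;
    private final AtomicInteger refcount = new AtomicInteger();

    ToyRefCounted(T resource) { this.resource = resource; }

    ToyRefCounted<T> incref() {
        refcount.incrementAndGet();
        return this;
    }

    T get() { return resource; }

    void decref() {
        if (refcount.decrementAndGet() == 0) {
            close();                       // last reference gone: release the resource
        }
    }

    int getRefcount() { return refcount.get(); }

    protected abstract void close();
}

final class RefCountDemo {
    /** Pretend searcher; in Solr this would hold caches and open index readers. */
    static final class FakeSearcher {
        boolean closed;
        int search(String query) { return query.length(); }
    }

    static ToyRefCounted<FakeSearcher> newHolder(FakeSearcher searcher) {
        return new ToyRefCounted<FakeSearcher>(searcher) {
            @Override
            protected void close() { get().closed = true; }
        };
    }

    /** Every incref is paired with a decref in finally, even if the search throws. */
    static int searchSafely(ToyRefCounted<FakeSearcher> holder, String query) {
        ToyRefCounted<FakeSearcher> ref = holder.incref();
        try {
            return ref.get().search(query);
        } finally {
            ref.decref();                  // skipping this keeps the searcher alive forever
        }
    }

    public static void main(String[] args) {
        FakeSearcher searcher = new FakeSearcher();
        ToyRefCounted<FakeSearcher> holder = newHolder(searcher).incref();  // owner's reference
        searchSafely(holder, "etsy");
        holder.decref();                   // owner swaps in a newer searcher
        System.out.println("closed=" + searcher.closed);
    }
}
```

If `searchSafely` omitted the `finally`, the count would never return to zero after the owner's `decref`, and the old searcher would stay reachable.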
SolrIndexSearcher generation marking with
YourKit triggers
Questions so far?
Miscellaneous Topics
System.gc()?
-XX:+UseCompressedOops
-XX:+UseNUMA
Paging
#!/usr/bin/env bash

# This script is designed to be run every minute by cron.

host=$(hostname -s)

psout=$(ps h -p...
Solution 1: Buy more RAM
~$5-10/GB
Ideally enough RAM to:

Keep index in OS file buffers

AND ensure no paging of VM memo...
echo "0" > /proc/sys/vm/swappiness
mlock()/mlockall()
github.com/LucidWorks/mlockall-agent
Mercy from the OOM Killer
echo "-17" > /proc/$PID/oom_adj
Huge Pages
-XX:+AlwaysPreTouch
Possible Future Directions
Many small VMs instead of one large VM

microsharding
In-memory Lucene codecs

I.e. custom DirectPostingsFormat
Off-heap memory with sun.misc.Unsafe?
Try G1 again
Try C4 again
Resources
gchandbook.org
Mark Miller’s GC Bootcamp
bit.ly/mmgcb
Gil Tene: Understanding Java
Garbage Collection
bit.ly/giltene
Ulrich Drepper: What Every Programmer Should
Know About Memory
bit.ly/cpumemory
github.com/pingtimeout/jvm-options
Read the JVM Source
(Not as scary as it sounds.)

hg.openjdk.java.net/jdk7/jdk7
Mechanical Sympathy Google Group

bit.ly/mechsym
Thanks for coming!
Questions?
gregg@etsy.com
"Living With Garbage" talk by Gregg Donovan at the NYC Search and Discovery Meetup on 12/12/2013.

Living With Garbage

  1. LIVING WITH GARBAGE! Gregg Donovan Senior Software Engineer etsy.com
  2. 4 Years Solr & Lucene at etsy.com 3 years Solr & Lucene at TheLadders.com
  3. 10+ million members
  4. 24+ million items
  5. 1mm+ active sellers
  6. 10+ billion pageviews per month
  7. CodeAsCraft.etsy.com
  8. Understanding GC Monitoring GC Debugging Memory Leaks Design for Partial Availability
  9. public class BuzzwordDetector {
       static String[] prefixes = { "synergy", "win-win" };
       static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };
       public static void main(String[] args) {
         args = myArgs;
         int buzzwords = 0;
         for (int i = 0; i < args.length; i++) {
           String lc = args[i].toLowerCase();
           for (int j = 0; j < prefixes.length; j++) {
             if (lc.contains(prefixes[j])) {
               buzzwords++;
             }
           }
         }
         System.out.println("Found " + buzzwords + " buzzwords");
       }
     }
  10. New():
        ref <- allocate()
        if ref = null                /* Heap is full */
          collect()
          ref <- allocate()
          if ref = null              /* Heap is still full */
            error "Out of memory"
        return ref
      atomic collect():
        markFromRoots()
        sweep(HeapStart, HeapEnd)
      From Garbage Collection Handbook
  11. markFromRoots():
        initialise(worklist)
        for each fld in Roots
          ref <- *fld
          if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()
      initialise(worklist):
        worklist <- empty
      mark():
        while not isEmpty(worklist)
          ref <- remove(worklist)    /* ref is marked */
          for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
              setMarked(child)
              add(worklist, child)
      From Garbage Collection Handbook
  12. Trivia: Who invented the first GC and Mark-and-Sweep?
  13. Weak Generational Hypothesis
  14. Where do objects in a common Solr application live? SolrRequest? AtomicReaderContext? SolrIndexSearcher?
  15. GC Terminology: Concurrent vs Parallel
  16. JVM Collectors
  17. Serial
  18. Trivia: How does System.identityHashCode() work?
  19. Throughput
  20. CMS
  21. Garbage First (G1)
  22. Continuously Concurrent Compacting Collector (C4)
  23. IBM, Dalvik, etc.?
  24. Why Throughput?
  25. Questions so far?
  26. Monitoring
  27. GC time per Solr request
  28. Available via JMX
      ...
      import java.lang.management.*;
      ...
      public static long getCollectionTime() {
        long collectionTime = 0;
        for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
          collectionTime += mbean.getCollectionTime();
        }
        return collectionTime;
      }
  29. Visual GC
  30. export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -XX:+PrintSafepointStatistics -Xloggc:/var/log/search/gc.log"
  31. 2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
      PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
      AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
      AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
      PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
      AdaptiveSizeStop: collection: 213
      [PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
      Heap after GC invocations=213 (full 210):
      PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
      eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
      from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
      to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
      ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
      object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
      PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
      object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
      }
  32. GC Log Analyzers? GCHisto GCViewer garbagecat
  33. Graphing with Logster github.com/etsy/logster
  34. GC Dashboard github.com/etsy/dashboard
  35. YourKit.com
  36. Designing for Partial Availability
  37. JVMTI GC Hook?
  38. How can a client ignore GC-ing hosts?
  39. Server lies to clients about availability TCP socket receive buffer TCP write buffer
  40. “Banner” protocol 1. Connect via TCP 2. Wait ~1-10ms 3. Either receive magic four byte header or try another host 4. Only send query after receiving header from server
  41. 0xC0DEA5CF
  42. What if GC happens mid-request?
  43. Backup requests
  44. Jeff Dean: Achieving Rapid Response Time in Large Online Services
  45. Solr sharding? Right now, only as fast as the slowest shard.
  46. “Make a reliable whole out of unreliable parts.”
  47. Memory Leaks
  48. Solr API hooks for custom code: QParserPlugin, SearchComponent, SolrRequestHandler, SolrEventListener, SolrCache, ValueSourceParser, FieldType, etc.
  49. PSA: Are you sure you need custom code?
  50. RefCounted<SolrIndexSearcher> CoreContainer#getCore()
  51. SolrIndexSearcher generation marking with YourKit triggers
  52. Questions so far?
  53. Miscellaneous Topics
  54. System.gc()?
  55. -XX:+UseCompressedOops
  56. -XX:+UseNUMA
  57. Paging
  58. #!/usr/bin/env bash
      # This script is designed to be run every minute by cron.
      host=$(hostname -s)
      psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
      min_flt=$(echo $psout | awk '{print $1}') # minor page faults
      maj_flt=$(echo $psout | awk '{print $2}') # major page faults
      epoch_s=$(date +%s)
      echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
      echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
  59. Solution 1: Buy more RAM ~$5-10/GB Ideally enough RAM to: Keep index in OS file buffers AND ensure no paging of VM memory AND whatever else happens on the box
  60. echo "0" > /proc/sys/vm/swappiness
  61. mlock()/mlockall() github.com/LucidWorks/mlockall-agent
  62. Mercy from the OOM Killer echo "-17" > /proc/$PID/oom_adj
  63. Huge Pages
  64. -XX:+AlwaysPreTouch
  65. Possible Future Directions
  66. Many small VMs instead of one large VM microsharding
  67. In-memory Lucene codecs I.e. custom DirectPostingsFormat
  68. Off-heap memory with sun.misc.Unsafe?
  69. Try G1 again
  70. Try C4 again
  71. Resources
  72. gchandbook.org
  73. Mark Miller’s GC Bootcamp bit.ly/mmgcb
  74. Gil Tene: Understanding Java Garbage Collection bit.ly/giltene
  75. Ulrich Drepper: What Every Programmer Should Know About Memory bit.ly/cpumemory
  76. github.com/pingtimeout/jvm-options
  77. Read the JVM Source (Not as scary as it sounds.) hg.openjdk.java.net/jdk7/jdk7
  78. Mechanical Sympathy Google Group bit.ly/mechsym
  79. Thanks for coming! Questions? gregg@etsy.com