Living With Garbage
 

"Living With Garbage" talk by Gregg Donovan at the NYC Search and Discovery Meetup on 12/12/2013.

    Living With Garbage Presentation Transcript

    • LIVING WITH GARBAGE! Gregg Donovan, Senior Software Engineer, etsy.com
    • 4 years Solr & Lucene at etsy.com; 3 years Solr & Lucene at TheLadders.com
    • 10+ million members
    • 24+ million items
    • 1+ million active sellers
    • 10+ billion pageviews per month
    • CodeAsCraft.etsy.com
    • Understanding GC; Monitoring GC; Debugging Memory Leaks; Design for Partial Availability
    • public class BuzzwordDetector {
          static String[] prefixes = { "synergy", "win-win" };
          static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

          public static void main(String[] args) {
              args = myArgs;
              int buzzwords = 0;
              for (int i = 0; i < args.length; i++) {
                  String lc = args[i].toLowerCase();
                  for (int j = 0; j < prefixes.length; j++) {
                      if (lc.contains(prefixes[j])) {
                          buzzwords++;
                      }
                  }
              }
              System.out.println("Found " + buzzwords + " buzzwords");
          }
      }
    • New():
          ref <- allocate()
          if ref = null                  /* Heap is full */
              collect()
              ref <- allocate()
              if ref = null              /* Heap is still full */
                  error "Out of memory"
          return ref

      atomic collect():
          markFromRoots()
          sweep(HeapStart, HeapEnd)

      From Garbage Collection Handbook
    • markFromRoots():
          initialise(worklist)
          for each fld in Roots
              ref <- *fld
              if ref != null && not isMarked(ref)
                  setMarked(ref)
                  add(worklist, ref)
                  mark()

      initialise(worklist):
          worklist <- empty

      mark():
          while not isEmpty(worklist)
              ref <- remove(worklist)    /* ref is marked */
              for each fld in Pointers(ref)
                  child <- *fld
                  if child != null && not isMarked(child)
                      setMarked(child)
                      add(worklist, child)

      From Garbage Collection Handbook
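The pseudocode above can be tried out on a simulated heap. The following is a minimal sketch in Java, not how any JVM collector is implemented: `ToyHeap`, `Node`, and the explicit `roots` list are illustrative names invented here, and "freeing" an object just means dropping it from the list of allocated objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-and-sweep over a simulated heap; all names are illustrative. */
class ToyHeap {
    /** A simulated heap object with a mark bit and outgoing pointers. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        boolean marked;

        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // every allocated object
    final List<Node> roots = new ArrayList<>();   // stand-in for stacks and statics

    Node allocate(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    /** markFromRoots() and mark(): trace everything reachable from the roots. */
    void markFromRoots() {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node root : roots) {
            if (root != null && !root.marked) {
                root.marked = true;
                worklist.add(root);
            }
        }
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove(); // ref is already marked
            for (Node child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** sweep(): drop unmarked objects and clear the mark bit on survivors. */
    int sweep() {
        int freed = 0;
        for (Iterator<Node> it = objects.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    /** collect(): one full cycle; returns the number of objects freed. */
    int collect() {
        markFromRoots();
        return sweep();
    }
}
```

Because marking starts only from the roots, a cycle of objects that point at each other but are unreachable from any root is still collected, which is the main advantage of tracing over naive reference counting.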
    • Trivia: Who invented the first GC and Mark-and-Sweep?
    • Weak Generational Hypothesis
    • Where do objects in a common Solr application live? SolrRequest? AtomicReaderContext? SolrIndexSearcher?
    • GC Terminology: Concurrent vs Parallel
    • JVM Collectors
    • Serial
    • Trivia: How does System.identityHashCode() work?
    • Throughput
    • CMS
    • Garbage First (G1)
    • Continuously Concurrent Compacting Collector (C4)
    • IBM, Dalvik, etc.?
    • Why Throughput?
    • Questions so far?
    • Monitoring
    • GC time per Solr request
    • Available via JMX
      ...
      import java.lang.management.*;
      ...
      {
          public static long getCollectionTime() {
              long collectionTime = 0;
              for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
                  collectionTime += mbean.getCollectionTime();
              }
              return collectionTime;
          }
      }
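One way to turn the JMX counter into a per-request figure is to read it before and after handling a request and report the difference. The sketch below assumes that approach; the slides do not say how Etsy wires this into Solr, and `GcTimer`, `Timed`, and `measure` are hypothetical names. Only `ManagementFactory.getGarbageCollectorMXBeans()` and `GarbageCollectorMXBean.getCollectionTime()` come from the JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/** Hypothetical helper: GC time that elapsed while a request was being handled. */
class GcTimer {
    /** Accumulated GC time in milliseconds across all collectors since JVM start. */
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** A request's result plus the JVM-wide GC milliseconds spent during it. */
    static final class Timed<T> {
        final T result;
        final long gcMillis;

        Timed(T result, long gcMillis) {
            this.result = result;
            this.gcMillis = gcMillis;
        }
    }

    /** Runs the request and records the change in the GC counter around it. */
    static <T> Timed<T> measure(Supplier<T> request) {
        long before = totalCollectionTimeMs();
        T result = request.get();
        long after = totalCollectionTimeMs();
        return new Timed<>(result, after - before);
    }
}
```

The counter is JVM-wide, so every request in flight during a collection is charged for it. That is arguably the right accounting for stop-the-world pauses, since each of those requests was in fact delayed.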
    • Visual GC
    • export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -XX:+PrintSafepointStatistics -Xloggc:/var/log/search/gc.log"
    • 2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
      PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
      AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
      AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
      PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
      AdaptiveSizeStop: collection: 213
      [PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
      Heap after GC invocations=213 (full 210):
       PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
        eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
        from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
        to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
       ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
        object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
       PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
        object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
      }
    • GC Log Analyzers? GCHisto, GCViewer, garbagecat
    • Graphing with Logster github.com/etsy/logster
    • GC Dashboard github.com/etsy/dashboard
    • YourKit.com
    • Designing for Partial Availability
    • JVMTI GC Hook?
    • How can a client ignore GC-ing hosts?
    • Server lies to clients about availability: TCP socket receive buffer, TCP write buffer
    • “Banner” protocol
      1. Connect via TCP
      2. Wait ~1-10ms
      3. Either receive magic four-byte header or try another host
      4. Only send query after receiving header from server
    • 0xC0DEA5CF
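The client side of the banner protocol can be sketched as follows. The idea, as the preceding slides suggest, is that the kernel will accept a TCP connection and buffer data even while the JVM is paused, so only a header written by the application proves the server is actually running. This is a sketch under assumptions, not Etsy's client: `BannerClient` and its method names are hypothetical, and the header is assumed to arrive in big-endian (network) byte order.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

/** Hypothetical client for the "banner" handshake described above. */
class BannerClient {
    static final int MAGIC = 0xC0DEA5CF;

    /** Decodes four bytes as a big-endian int (byte order is an assumption). */
    static int decodeBanner(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    /** Blocks until four bytes arrive, the stream ends, or the socket read times out. */
    static int readBanner(InputStream in) throws IOException {
        byte[] header = new byte[4];
        new DataInputStream(in).readFully(header);
        return decodeBanner(header);
    }

    /**
     * Tries each host in order and returns the first socket on which the magic
     * header arrived within bannerTimeoutMs, or null if no host answered in time.
     * The caller should set its own read timeout before sending the query.
     */
    static Socket connectToLiveHost(List<InetSocketAddress> hosts,
                                    int connectTimeoutMs, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, connectTimeoutMs);  // 1. connect via TCP
                socket.setSoTimeout(bannerTimeoutMs);    // 2. wait a few milliseconds
                if (readBanner(socket.getInputStream()) == MAGIC) {
                    return socket;                       // 4. safe to send the query
                }
            } catch (IOException e) {
                // Timeout, refused connection, or short read: 3. try another host.
            }
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
        return null;
    }
}
```

A host in the middle of a long collection still completes the TCP handshake but never writes the header, so the read times out and the client moves on after a few milliseconds instead of waiting out the pause.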
    • What if GC happens mid-request?
    • Backup requests
    • Jeff Dean: Achieving Rapid Response Time in Large Online Services
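A backup request covers the case where a pause hits after the query has been sent: if the first replica has not answered within a short delay, the same query goes to a second replica and whichever answer arrives first is used. The sketch below illustrates that idea with `CompletableFuture`; `BackupRequests.query` and the delay parameter are hypothetical, and cancelling the slower request is left out for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper illustrating backup (hedged) requests. */
class BackupRequests {
    /**
     * Sends the request to the primary replica. If no answer arrives within
     * backupDelayMs, also sends it to the backup replica and returns whichever
     * answer completes first.
     */
    static <T> T query(Supplier<T> primary, Supplier<T> backup,
                       long backupDelayMs, Executor pool) {
        CompletableFuture<T> first = CompletableFuture.supplyAsync(primary, pool);
        try {
            return first.get(backupDelayMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            // Primary is slow (possibly in GC): race it against a second replica.
            CompletableFuture<T> second = CompletableFuture.supplyAsync(backup, pool);
            return first.applyToEither(second, Function.identity()).join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // Primary failed outright before the delay expired.
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The delay is the main tuning knob. Set near a high latency percentile, it sends backups only for the slowest few requests, so the extra load stays small while the tail latency caused by pauses is cut.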
    • Solr sharding? Right now, only as fast as the slowest shard.
    • “Make a reliable whole out of unreliable parts.”
    • Memory Leaks
    • Solr API hooks for custom code: QParserPlugin, SearchComponent, SolrRequestHandler, SolrEventListener, SolrCache, ValueSourceParser, FieldType, etc.
    • PSA: Are you sure you need custom code?
    • RefCounted<SolrIndexSearcher>, CoreContainer#getCore()
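Reference-counted handles leak when custom code takes a reference and an exception or early return skips the release, which keeps the old searcher and everything it holds alive. The sketch below shows the usual try/finally discipline on a simplified stand-in: `ToyRefCounted` and `SearcherUser` are invented for illustration and are not Solr's classes, though the `incref()`/`decref()` naming mirrors the pattern the slide refers to.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified stand-in for a reference-counted resource holder; not Solr's class. */
abstract class ToyRefCounted<T> {
    private final T resource;
    private final AtomicInteger refCount = new AtomicInteger();

    ToyRefCounted(T resource) { this.resource = resource; }

    T get() { return resource; }

    int refCount() { return refCount.get(); }

    ToyRefCounted<T> incref() {
        refCount.incrementAndGet();
        return this;
    }

    /** Releases one reference; the resource is closed when the count reaches zero. */
    void decref() {
        if (refCount.decrementAndGet() == 0) {
            close();
        }
    }

    protected abstract void close();
}

/** Hypothetical custom code that borrows the resource for one operation. */
class SearcherUser {
    static int countDocs(ToyRefCounted<int[]> holder) {
        holder.incref();         // take a reference for the duration of the call
        try {
            return holder.get().length;
        } finally {
            holder.decref();     // always release, even if the body throws
        }
    }
}
```

Without the `finally`, any exception in the body leaves the count permanently above zero, so `close()` never runs and the resource can never be reclaimed.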
    • SolrIndexSearcher generation marking with YourKit triggers
    • Questions so far?
    • Miscellaneous Topics
    • System.gc()?
    • -XX:+UseCompressedOops
    • -XX:+UseNUMA
    • Paging
    • #!/usr/bin/env bash
      # This script is designed to be run every minute by cron.
      host=$(hostname -s)
      psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
      min_flt=$(echo $psout | awk '{print $1}') # minor page faults
      maj_flt=$(echo $psout | awk '{print $2}') # major page faults
      epoch_s=$(date +%s)
      echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
      echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
    • Solution 1: Buy more RAM (~$5-10/GB). Ideally enough RAM to keep the index in OS file buffers AND ensure no paging of VM memory AND handle whatever else happens on the box
    • echo "0" > /proc/sys/vm/swappiness
    • mlock()/mlockall() github.com/LucidWorks/mlockall-agent
    • Mercy from the OOM Killer: echo "-17" > /proc/$PID/oom_adj
    • Huge Pages
    • -XX:+AlwaysPreTouch
    • Possible Future Directions
    • Many small VMs instead of one large VM (microsharding)
    • In-memory Lucene codecs, i.e. custom DirectPostingsFormat
    • Off-heap memory with sun.misc.Unsafe?
    • Try G1 again
    • Try C4 again
    • Resources
    • gchandbook.org
    • Mark Miller’s GC Bootcamp bit.ly/mmgcb
    • Gil Tene: Understanding Java Garbage Collection bit.ly/giltene
    • Ulrich Drepper: What Every Programmer Should Know About Memory bit.ly/cpumemory
    • github.com/pingtimeout/jvm-options
    • Read the JVM Source (Not as scary as it sounds.) hg.openjdk.java.net/jdk7/jdk7
    • Mechanical Sympathy Google Group bit.ly/mechsym
    • Thanks for coming! Questions? gregg@etsy.com