Azul yandexjune010 (Azul Tech Talk at Yandex, June 2010; Gil Tene, Azul Systems)

  • Speaker note (GIL): Here's how we're going to cover this talk: we'll start with some background and definitions, then dive into the interplay between the metrics and what things are sensitive to. We'll then discuss a couple of commercially available GC examples, and for the main takeaways we'll share observations and recommendations about measuring GC and its effects.
  • Transcript

    • 1. Azul Tech Talk, Yandex, June 2010. Gil Tene, CTO & co-founder, Azul Systems. June 26, 2010.
    • 2. Agenda
      - Background
      - Memory stagnation
      - Java & virtualization
      - [Concurrent] garbage collection deep-dive
      - Some [more] Azul Zing Platform details
      - Talk-Talk / Q & A
    • 3. Memory Stagnation. Gil Tene, CTO & co-founder, Azul Systems. June 26, 2010.
    • 4. 2GB: the new 640K. Gil Tene, CTO & co-founder, Azul Systems. June 26, 2010.
    • 5. Memory. How many of you use heap sizes:
      - Larger than ½ GB?
      - Larger than 1 GB?
      - Larger than 2 GB?
      - Larger than 4 GB?
      - Larger than 10 GB?
      - Larger than 20 GB?
      - Larger than 100 GB?
    • 6. Problem statement: memory footprint has stagnated
      - Individual instances with ~1-2 GB of heap were commonplace in 2001
        - Some were still smaller
        - Very few were larger
      - Individual instances with ~1-2 GB dominate in 2010
        - Very few are smaller
        - Relatively few are larger
      - Could it really be that all applications have the same memory size needs?
      - The practical size of an individual Java heap has not moved in ~9 years.
    • 7. Why ~2GB? It's all about GC (and only GC)
      - Seems to be the practical limit for responsive applications
      - A 100 GB heap won't crash. It just periodically "pauses" for many minutes at a time.
      - [Virtually] all current commercial JVMs will exhibit a periodic multi-second pause on a normally utilized 2 GB heap.
        - It's a question of "when", not "if".
        - GC tuning only moves the "when" and the "how often" around.
      - "Compaction is done with the application paused. However, it is a necessary evil, because without it, the heap will be useless…" (JRockit RT tuning guide)
    • 8. Maybe 2GB is simply enough?
      - We hope not
      - Plenty of evidence to support pent-up need for more heap
      - Common use of lateral scale across machines
      - Common use of "lateral scale" within machines
      - Use of "external" memory with growing data sets
        - Databases certainly keep growing
        - External data caches (memcache, JCache, JavaSpaces)
      - Continuous work on the never-ending distribution problem
        - More and more reinvention of NUMA
        - Bring data to compute, bring compute to data
    • 9. Will distributed solutions "solve" the problem?
      - Distributed data solutions are [only] used to solve problems that can't be solved without distribution
        - The extra complexity and loss of efficiency are only justified by necessity
      - It's always because something doesn't fit in one simple symmetric UMA process model
        - When we need more compute power than one node has
        - When we need more memory state than one node can hold
      - Distributed solutions are not used to solve tiny problems
      - "Tiny" is not a matter of opinion, it's a matter of time
        - "Tiny" gets ~100x bigger every decade
    • 10. "Tiny" application history (timeline)
      - 1980: 100 KB apps on a ¼ to ½ MB server
      - 1990: 10 MB apps on a 32-64 MB server
      - 2000: 1 GB apps on a 2-4 GB server
      - 2010: ??? GB apps on a 256 GB server (100 GB Zing VM)
      - Moore's Law: transistor counts grow 2x every 18 months, ~100x every 10 years
    • 11. 2GB is the new 640K. This is getting embarrassing…
      - We do strange things within servers now…
      - Java runtimes "misbehave" above ~2 GB of memory
        - Most people won't tolerate 20-second pauses
      - It takes 50 JVM instances to fill up a ~$10K, ~100 GB server
        - A 256 GB server can now be bought for ~$18K (mid 2010)
      - Using distributed SW solutions within a single commodity server
        - Similar to the EMS tricks Windows 3.1 used to deal with the 640 KB cap
        - Looks a lot like yak shaving
      - The problem is in the software stack
        - Artificial constraints on memory per instance
        - GC pause time is the only limiting factor for instance size
        - Can't just "tune it away"
      - Solve GC, and you've solved the problem
    • 12. The Zing Platform: Virtualization++ for Java. Gil Tene, CTO & co-founder, Azul Systems. June 26, 2010.
    • 13. Java & virtualization
      - How many of you use virtualization? (i.e. KVM, VMWare, Xen, desktop virtualization: Fusion, Parallels, VirtualBox, etc.)
      - How many of you use it for production applications?
      - How many of you think that virtualization will make your application run faster?
    • 14. The virtualization tax
      - Virtualization is universally considered a "tax"
      - Typical focus is on measuring and reducing overhead
      - Everyone hopes to get to "virtually the same as non-virtualized" performance characteristics
      - But we can do so much better…
      - What if virtualization made applications better?
    • 15. Virtualization with application benefits. If you want to improve response times, increase transaction rates, increase concurrent users, forget about GC pauses, eliminate daily restarts, elastically grow during peaks, elastically shrink when idle, or gain production visibility: use Zing & virtualization.
    • 16. Java virtualization: a new foundation for elastic, scalable Java deployments
      - Virtualize the Java runtime…
        - Liberate Java from the OS, optimizing runtime execution
        - Allow highly effective use of available resources
          - 100x better scale, throughput and responsiveness, improving user experience
        - Elastically scale applications
          - Smoothly scale resources up/down based on real-time demand, improving scalability, efficiency and resiliency
        - Simplify deployment configuration
          - Reduce instance count, improve management and visibility
    • 17. Azul Java virtualization: liberating Java from the rigidities of the OS (stack diagram: Java app, Java virtualization, OS layer, x86 server)
    • 18. Zing™ Platform Components
      - Zing Virtual Appliances: elastic, scalable capacity
      - Zing Virtual Machine: transparent virtualization
      - Zing Resource Controller: management and monitoring
      - Zing Vision: built-in app profiling
    • 19. Zing™ Platform Components (same component list as slide 18)
    • 20. Zing elastic deployments: virtualizing Java workloads in the cloud (diagram: a Zing Resource Controller plug-in manages hypervisors hosting Linux, Windows, and Zing Virtual Appliance (ZVA) instances running apps). Resources are utilized by the Zing virtual appliances, not the OSs.
    • 21. Virtues of the Zing Elastic Platform: making virtualization the best environment for Java
      - Optimized runtime platform
        - More effective use of resources (dozens of cores, 100s of GBs)
        - Scales smoothly over a wide range (from 1 GB to 1 TB)
        - Greater stability, resiliency and operating range
      - Record-breaking scalability
        - Completely eliminates GC-related barriers
        - Practical support for 100x larger heaps (e.g. 200-500+ GB)
        - Sustains 100x higher throughput and allocation rates
      - Simplified Java app deployments
        - Better app stability with fewer, more robust JVMs
        - Zero-overhead runtime visibility
        - Application-aware resource control
    • 22. Building on a heritage of elastic runtimes: a proven, vertically integrated execution stack (Azul Vega™ 3 Compute Appliance: hardware, OS kernel, virtualization, Java runtime)
      - Up to 864 cores
      - Heaps up to 640 GB
    • 23. Benefits of the Zing Elastic Platform
      - Business implications
        - Consistently fast Java application response times
        - Improved customer experience and loyalty
        - Faster time to market
        - Greater application availability, even during peaks
        - Room for growth, delivered in a robust and cost-effective manner
        - Lower costs through simplified deployments, virtualization and cloud enablement
      - IT implications
        - 100x improvements in key response time and throughput metrics
        - Accelerated virtualization and cloud adoption
        - Increased resiliency and efficiency for all Java applications through dynamic resource sharing
        - Simplified deployments through instance consolidation
        - Unmatched production-time visibility and management
        - Fast ROI
    • 24. Now you can:
      - Improve response times: with Zing & virtualization!
      - Increase transaction rates: with Zing & virtualization!
      - Increase concurrent users: with Zing & virtualization!
      - Forget about GC pauses: with Zing & virtualization!
      - Eliminate daily restarts: with Zing & virtualization!
      - Elastically grow during peaks: with Zing & virtualization!
      - Elastically shrink when idle: with Zing & virtualization!
      - Gain production visibility: with Zing & virtualization!
      - (Plus the elasticity, resource-use and deployment-simplification points from slide 16)
    • 25. Thank You
    • 26. Performance Considerations in Concurrent Garbage Collected Systems. Gil Tene, CTO, Azul Systems.
    • 27.
      - 1. Understand why concurrent garbage collection is a necessity
      - 2. Gain an understanding of performance considerations specific to concurrent garbage collection
      - 3. Understand what concurrent collectors are sensitive to, and what can cause them to fail
      - 4. Learn how [not] to measure GC in a lab
    • 28. What is a concurrent collector?
      - A concurrent collector performs garbage collection work concurrently with the application's own execution.
      - A parallel collector uses multiple CPUs to perform garbage collection.
    • 29. About the speaker: Gil Tene (CTO), Azul Systems
      - We deal with concurrent GC on a daily basis
      - Azul makes Java scalable through virtualization
        - We make physical (Vega™) and virtual (Zing™) appliances
        - Our appliances power JVMs on Linux, Solaris, AIX, HP-UX, …
        - Production installations ranging from 1 GB to 300+ GB of heap
        - Zing VM instances smoothly scale to 100s of GB, 10s of cores
      - Concurrent GC has always been a must in our space
        - It's now a must in everyone's space; you can't scale without it
      - Focused on concurrent GC for the past 8 years
        - Azul's GPGC is designed for robustness and low sensitivity
    • 30. Why use a concurrent collector? Why not stop-the-world?
      - Because pause times break your SLAs
      - Because you need to grow your heap size
      - Because your application needs to scale
      - Because you can't predict everything exactly
      - Because you live in the real world…
    • 31. Agenda
      - Background
      - Failure & sensitivity
      - Terminology & metrics
      - Detail and inter-relations of key metrics
      - Collector mechanism examples
      - Recommendations for measurements
      - Q & A
    • 32. Why we really need concurrent collectors: software is unable to fill up hardware effectively
      - 2000:
        - A 512 MB-1 GB heap was "large"
        - A 1-2 GB commodity server was "large"
        - A 2-core commodity server was "large"
      - 2010:
        - A 2 GB heap is "large"
        - A 128-256 GB commodity server is "medium"
        - A 24-48 core commodity server is "medium"
      - The gap started opening in the late 1990s
      - The root cause is garbage collection pauses
    • 33. Agenda (recap of slide 31)
    • 34. What constitutes "failure" for a collector? It's not just about correctness any more
      - A stop-the-world collector fails if it gets it wrong…
      - A concurrent collector [also] fails if it stops the application for longer than requirements permit
        - "Occasional pauses" longer than the SLA allows are real failures
          - Even if the application instance or JVM didn't crash
        - Otherwise, you would have used a STW collector to begin with
      - Simple example: clustering
        - Node failover must occur in X seconds or less
        - A GC pause longer than X will trigger failover. It's a fault.
        - (If you don't think so, ask the guy whose pager just went off…)
    • 35. Concurrent collectors can be sensitive: go outside the smooth operating range, and you'll pause
      - Correctness now includes response time
      - Just because it didn't pause under load X doesn't mean it won't pause under load Y
      - Outside of the smooth operating range:
        - More state (with no additional load) can cause a pause
        - More load (with no additional state) can cause a pause
        - Different use patterns can cause a pause
      - Understand/characterize your smooth operating range
    • 36. Agenda (recap of slide 31)
    • 37. Terminology: useful terms for discussing concurrent collection
      - Mutator: your program…
      - Parallel: can use multiple CPUs
      - Concurrent: runs concurrently with the program
      - Pause time: time during which the mutator is not running any code
      - Generational: collects young objects and long-lived objects separately
      - Promotion: allocation into the old generation
      - Marking: finding all live objects
      - Sweeping: locating the dead objects
      - Compaction: defragments the heap; moves objects in memory; remaps all affected references; frees contiguous memory regions
    • 38. Metrics: useful metrics for discussing concurrent collection
      - Heap population (aka live set): how much of your heap is alive
      - Allocation rate: how fast you allocate
      - Mutation rate: how fast your program updates references in memory
      - Heap shape: the shape of the live object graph (hard to quantify as a metric)
      - Object lifetime: how long objects live
      - Cycle time: how long it takes the collector to free up memory
      - Marking time: how long it takes the collector to find all live objects
      - Sweep time: how long it takes to locate dead objects (relevant for mark-sweep)
      - Compaction time: how long it takes to free up memory by relocating objects (relevant for mark-compact)
    • 39. Agenda (recap of slide 31)
    • 40. Cycle time: how long until we can have some more free memory?
      - Heap population (live set) matters
        - The more objects there are to paint, the longer it takes
      - Heap shape matters
        - Affects how well a parallel marker will do
        - One long linked list is the worst case for most markers (see the sketch below)
      - How many passes matters
        - A multi-pass marker revisits references modified in each pass
        - Marking time can therefore vary significantly with load
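      A minimal, illustrative sketch of the "one long linked list" worst case (the class and the 50-million-node length are made-up parameters, not from the slides): marking a deep chain is sequential pointer chasing, so adding marker threads barely helps.

        // Illustrative heap-shape stressor: one deep singly-linked chain that stays live.
        public class DeepChain {
            static final class Node {
                final Node next;
                long payload;                        // gives each node a little weight
                Node(Node next) { this.next = next; }
            }

            static Node root;                        // static root keeps the chain reachable

            public static void main(String[] args) throws InterruptedException {
                int length = 50_000_000;             // assumed size; scale to the heap under test
                for (int i = 0; i < length; i++) {
                    root = new Node(root);           // prepend, building one long chain
                }
                System.out.println("Chain of " + length + " nodes is live; observe GC marking times.");
                Thread.sleep(Long.MAX_VALUE);        // idle while GC cycles run against this heap shape
            }
        }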
    • 41. Heap population (live set): it's not as simple as you might think…
      - In a stop-the-world situation, this is simple
        - Start with the "roots" and paint the world
        - Only things you have actual references to are alive
      - When the mutator runs concurrently with GC:
        - Not a "snapshot" of a single program state
        - Objects allocated during the GC cycle are considered "live"
        - Objects that die after GC starts may be considered "live"
        - Weak references are "strengthened" during GC…
      - So assume: Live_Set >= STW_live_set + (Allocation_Rate * Cycle_time) (worked example below)
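      Worked example of that bound (the numbers are invented for illustration): with a stop-the-world live set of 3 GB, a sustained allocation rate of 200 MB/s, and a 30-second concurrent cycle, the collector has to treat at least 3 GB + 200 MB/s × 30 s = 3 GB + 6 GB = 9 GB as live for that cycle, so the heap must have room for it.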
    • 42. Mutation rate: does your program do any real work?
      - Mutation rate is generally linear in the work performed
        - The higher the load, the higher the mutation rate
      - A multi-pass marker can be sensitive to mutation:
        - It revisits references modified in each pass
        - Higher mutation rate means longer cycle times
        - Can reach a point where the marker cannot keep up with the mutator
        - e.g. one marking thread vs. 15 mutator threads
      - Some common use patterns have high mutation rates
        - e.g. an LRU cache (see the sketch below)
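      A minimal sketch of the LRU point (illustrative only; the class name, sizes and key range are arbitrary) using java.util.LinkedHashMap in access order: every cache hit rewrites the map's internal ordering links, so a busy, long-lived cache mutates old-generation references at roughly the request rate.

        import java.util.LinkedHashMap;
        import java.util.Map;
        import java.util.Random;

        // Illustrative LRU cache: each get() reorders the access-order linked list,
        // i.e. it updates references inside long-lived objects (a high mutation rate).
        public class LruCache<K, V> extends LinkedHashMap<K, V> {
            private final int maxEntries;

            public LruCache(int maxEntries) {
                super(16, 0.75f, true);                 // accessOrder = true
                this.maxEntries = maxEntries;
            }

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;             // evict the least-recently-used entry
            }

            public static void main(String[] args) {
                LruCache<Integer, byte[]> cache = new LruCache<>(1_000_000);
                Random rnd = new Random(42);
                for (long i = 0; i < 1_000_000_000L; i++) {
                    int key = rnd.nextInt(2_000_000);
                    byte[] hit = cache.get(key);                      // reference mutation on every hit
                    if (hit == null) cache.put(key, new byte[256]);   // plus steady allocation and churn
                }
            }
        }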
    • 43. Object lifetime: objects are active in the old generation
      - Most allocated objects do die young
        - So generational collection is an effective filter
      - However, most live objects are old
        - You're not just making all those objects up every cycle…
      - Large heaps tend to see real churn & real mutation
        - e.g. caching is a very common use pattern for large memory
      - The old generation is under constant pressure in the real world
        - Unlike some/most benchmarks (e.g. SPECjbb)
    • 44. The generational assumption: when you are forced to collect live young things
      - A lot of state dies when transactions are completed
        - Transactions typically take some minimum amount of time
      - Load usually shows up as concurrently active state
        - More concurrent users & transactions mean more state
      - Higher load always generates garbage faster
        - NewGen collections happen more often as load grows
      - At some load point NewGen becomes very expensive
        - When NewGen GC cycles faster than transactions complete
        - Pauses get significantly longer if it uses a STW mechanism
    • 45. Major things that happen in a pause: the non-concurrent parts of "mostly concurrent"
      - If the collector does reference processing in a pause:
        - Weak, soft, final reference traversal
        - Pause length depends on the number of references
        - Sensitive to common use cases of weak refs
          - e.g. LRU & multi-index cache patterns (see the sketch below)
      - If the collector marks mutated refs in a pause:
        - Pause length depends on mutation rate
        - Sensitive to load
      - If the collector performs compaction in a pause…
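      To make the weak-reference sensitivity concrete, a sketch (illustrative only; it is not any particular product's cache, and the 2-million entry count is made up) using java.util.WeakHashMap: every entry adds a weak reference, and all of them must be examined during reference processing, which many collectors do inside a pause.

        import java.util.Map;
        import java.util.WeakHashMap;

        // Illustrative weak-reference-heavy cache: each entry contributes one weak
        // reference that the collector must process; with millions of entries,
        // reference-processing time grows accordingly.
        public class WeakRefStress {
            static final Map<Object, byte[]> cache = new WeakHashMap<>();
            static final Object[] strongKeys = new Object[2_000_000];  // hold keys so the entries stay populated

            public static void main(String[] args) throws InterruptedException {
                for (int i = 0; i < strongKeys.length; i++) {
                    strongKeys[i] = new Object();
                    cache.put(strongKeys[i], new byte[64]);
                }
                System.out.println("Live weak references: " + cache.size());
                Thread.sleep(Long.MAX_VALUE);       // idle while observing GC reference-processing behavior
            }
        }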
    • 46. Fragmentation & compaction: you can't delay it forever
      - Fragmentation *will* happen
        - Compaction can be delayed, but not avoided
        - "Compaction is done with the application paused. However, it is a necessary evil, because without it, the heap will be useless…" (JRockit RT tuning guide)
      - If compaction is done as a stop-the-world pause:
        - It will generally be your worst-case pause
        - It is a likely failure of concurrent collection
      - Measurements without compaction are meaningless
        - Unless you can prove that compaction won't happen (good luck with that)
    • 47. More things that may happen in a pause: more "mostly concurrent" secrets
      - When the collector does code & class things in a pause:
        - Class unloading, code cache cleaning, system dictionary, etc.
        - Can depend on class and code churn rates
        - Becomes a real problem if a full collection is required (PermGen)
      - GC/mutator synchronization, safepoints
        - Can depend on time-to-safepoint, which is affected by runtime artifacts:
          - Long-running no-safepoint loops (some optimizers do this)
          - Huge object cloning or allocation (some runtimes won't break it up)
      - Stack scanning (looking for refs in mutator stacks)
        - Can depend on the number of threads and stack depths
    • 48. Agenda (recap of slide 31)
    • 49. Collector mechanism example: HotSpot CMS
      - Stop-the-world compacting new gen (ParNew)
      - Mostly-concurrent, non-compacting old gen (CMS)
        - Mostly-concurrent marking
          - Marks concurrently while the mutator is running
          - Tracks mutations in card marks
          - Revisits mutated cards (repeat as needed)
          - Stops the world to catch up on mutations, reference processing, etc.
        - Concurrent sweeping
        - Does not compact (maintains free lists, does not move objects)
      - Fallback to full collection (stop-the-world, serial)
        - Used for compaction, etc. (typical flags shown below)
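      For reference, a typical HotSpot command line of that era selecting this scheme (a sketch: the heap size and the application jar are placeholders, and defaults vary by JDK release):

        java -Xms4g -Xmx4g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -jar MyApp.jar

      -XX:+UseParNewGC selects the stop-the-world parallel new-generation collector, -XX:+UseConcMarkSweepGC enables the mostly-concurrent old-generation collector, and the occupancy flags make CMS start its cycles at a fixed old-gen occupancy rather than relying on its own pacing heuristics.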
    • 50. Collector mechanism example: Azul GPGC
      - Concurrent, compacting new generation
      - Concurrent, compacting old generation
      - Concurrent, guaranteed-single-pass marker
        - Oblivious to mutation rate
        - Concurrent reference (weak, soft, final) processing
      - Concurrent compactor
        - Objects moved without stopping the mutator
        - Can relocate an entire generation (new, old) in every GC cycle
      - No stop-the-world fallback
        - Always compacts, and does so concurrently
    • 51. Agenda (recap of slide 31)
    • 52. Measurement recommendations: when you are actually interested in the results…
      - Measure your application, not synthetic tests
        - Garbage in, garbage out
      - Avoid the urge to tune GC out of the testing window
        - You're only fooling yourself
        - Your application needs to run for more than 20 minutes, right?
        - Most industry benchmarks are tuned to avoid GC during the test
      - Rule of thumb:
        - You should see 5+ of the "bad" GCs during the test period
        - Otherwise, you simply did not test real behavior
        - Test until you can show it's stable (e.g. what if it trends up?)
        - Believe your application, not -verbosegc (see the sketch below)
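      One way to "believe your application" is to watch stalls from inside the process rather than trusting the GC log. A minimal sketch (illustrative; not a complete measurement tool): a thread repeatedly sleeps for 1 ms and records how much longer the sleep actually took, so any large gap reflects what application threads experienced, whether or not -verbosegc reported a pause.

        // Minimal in-process pause observer: run it as a low-overhead thread next to the workload.
        public class PauseObserver {
            public static void main(String[] args) throws InterruptedException {
                final long intervalMs = 1;
                long worstStallMs = 0;
                while (true) {
                    long start = System.nanoTime();
                    Thread.sleep(intervalMs);
                    long observedMs = (System.nanoTime() - start) / 1_000_000;
                    long stallMs = observedMs - intervalMs;          // time beyond the requested sleep
                    if (stallMs > worstStallMs) {
                        worstStallMs = stallMs;
                        System.out.println("New worst observed stall: " + worstStallMs + " ms");
                    }
                }
            }
        }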
    • 53. Don't ignore "bad" GC. Compaction? What compaction?
    • 54. Measurement techniques: make reality happen
      - Aim for 20-30 minute "stable load" tests
        - If the test is longer, you won't do it enough times to get good data
        - Don't "ramp" load during the test period; it will defeat the purpose
        - We want to see several days' worth of GC in 20-30 minutes
      - Add low-load noise to trigger "real" GC behavior
        - Don't go overboard
        - A moderately churning large LRU cache can often do the trick
        - A gentle heap fragmentation inducer is a sure bet (see the sketch below)
        - Can easily be added orthogonally to application activity
        - See Azul's "Fragger" example (http://e2e.azulsystems.com)
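      Azul's actual Fragger is at the URL above; purely as an illustration of the idea (the class name, block sizes and retention ratio here are invented), a fragmentation inducer allocates blocks of mixed sizes and keeps only a scattered fraction of them alive, so free memory gradually turns into many small holes:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.Random;

        // Illustrative heap fragmentation inducer (not Azul's Fragger): retains roughly
        // 1 in 8 mixed-size blocks, scattered across the heap, and slowly churns them.
        public class FragmentationInducer implements Runnable {
            private final List<byte[]> survivors = new ArrayList<>();
            private final Random rnd = new Random();

            @Override
            public void run() {
                while (true) {
                    byte[] block = new byte[64 + rnd.nextInt(8192)];     // mixed object sizes
                    if (rnd.nextInt(8) == 0) {
                        survivors.add(block);                            // pin a scattered subset
                    }
                    if (survivors.size() > 200_000) {                    // cap retained memory; tune to the heap
                        survivors.remove(rnd.nextInt(survivors.size())); // gentle churn of the survivors
                    }
                }
            }

            public static void main(String[] args) throws InterruptedException {
                Thread inducer = new Thread(new FragmentationInducer(), "frag-inducer");
                inducer.setDaemon(true);       // runs as background noise next to the workload under test
                inducer.start();
                Thread.sleep(Long.MAX_VALUE);  // placeholder for the real application workload
            }
        }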
    • 55. Establish your smooth operating range: know where it works, and know where it doesn't…
      - Test the main metrics for sensitivity
      - Stress heap population, allocation, mutation, etc.
      - Add artificial load-linear stress if needed (see the sketch below)
        - e.g. increase allocation and mutation per transaction
        - e.g. increase state per session, increase static state
        - e.g. increase session length in time
        - Drive load with artificially enhanced GC stress
        - Keep increasing until you find out where GC breaks the SLA in test
        - Then back off and test for stability
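      A minimal sketch of load-linear stress (illustrative; the wrapper, map size and per-transaction byte counts are knobs you would choose, not values from the talk): wrap each transaction so that it allocates some extra short-lived data and updates some extra long-lived shared state, so GC stress grows in step with the applied load.

        import java.util.Random;
        import java.util.concurrent.ConcurrentHashMap;

        // Illustrative load-linear GC stress wrapper: every transaction adds extra
        // allocation (short-lived) and extra mutation/promotion (long-lived state).
        public class GcStressWrapper {
            private static final ConcurrentHashMap<Integer, byte[]> sessionState = new ConcurrentHashMap<>();
            private static final Random rnd = new Random();

            static void transactionWithStress(Runnable realTransaction) {
                byte[] scratch = new byte[16 * 1024];                     // extra short-lived allocation
                scratch[rnd.nextInt(scratch.length)] = 1;                 // touch it so it is really used
                sessionState.put(rnd.nextInt(500_000), new byte[512]);    // extra long-lived state + mutation
                realTransaction.run();                                    // the actual work under test
            }

            public static void main(String[] args) {
                for (int i = 0; i < 10_000_000; i++) {
                    transactionWithStress(() -> { /* placeholder for the real transaction */ });
                }
                System.out.println("Retained session entries: " + sessionState.size());
            }
        }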
    • 56. Summary: know where the cliff is, then stay away from the edge…
      - Sensitivity is key
        - If it fails, it will be without warning
      - Know where you stand on key measurable metrics
        - Application-driven: live set, allocation rate, heap size
        - GC-driven: cycle times, compaction time, pause times
      - Deal with robustness first, and only then with efficiency
        - Efficient and 2% away from failure is not a good thing
      - Establish your envelope
        - Only then will you know how safe (or unsafe) you are
        - http://e2e.azulsystems.com
    • 57. Q & A. Gil Tene, CTO, Azul Systems. Remember: Zing announcement, JavaOne ticket drawing, 13:45 @ "Double" conference room.
    • 58. Thank You. Performance Considerations in Concurrent Garbage Collected Systems. Gil Tene, CTO, Azul Systems.
