Java Middleware Surgery
Andy Overton & Mike Croft
Expert Support Team
© C2B2 Consulting Limited 2013
All Rights Reserved
During this half-hour webinar, C2B2's Expert Middleware Support team reviews real-world customer problems that we see in production JBoss, WebLogic, Tomcat and GlassFish environments. This overview will help you understand how to deal with server slowdowns, out of memory errors, problematic JMS and more.

If you have any questions for our support team, please don't hesitate to contact us: send your questions and feedback to webinar@c2b2.co.uk.

Please note that we can only answer questions related to the following products: http://www.c2b2.co.uk/supported_products

Speaker notes
  • Hi, my name is Andy Overton. I head up the Expert Support team at C2B2, and alongside me is my colleague Mike Croft, who is also a member of the Expert Support team. We've had a number of questions, but many of them have been very specific to a particular application. We've therefore decided to look at a couple of scenarios that will hopefully be more relevant to everybody and will cover a number of the issues people have raised.
  • This is a fairly common error or issue that we see.
  • Catastrophic: we get a rapid rise in memory usage, an OOME occurs and the server keels over. Long running: we see a gradual slowdown over time (days) that eventually causes an OOME. This type can often be more difficult to detect.
  • When problems occur, particularly on production systems, the time taken to resolve them is always key, whether that means the time taken to restore the affected system to a stable, consistent and usable state, or the time taken to carry out root cause analysis and decide which preventative measures to put in place. You need to gather as much information as possible to help diagnose the issue. You then need to analyse that information, and we'll cover a number of tools that we use for doing so. That analysis should give you a better understanding of what is causing the issue, which in turn should enable you to come up with a solution.
  • In an ideal world, we'd have all the information needed when the error occurs, so that we can diagnose the issue and fix it without having to wait for it to happen again. Unfortunately, we find that on most production systems the information needed when this kind of issue occurs is not there. The kind of information we are looking for is: Verbose GC, to look at garbage collection; Heap Dumps, to see what is being stored in memory; Server Logs, to view any errors that have occurred that may contribute to the issue; Stack Traces, to see any long-running threads, stuck threads or other threading issues; and Details of system changes, because problems generally occur shortly after changes have been made to the system.
  • Verbose GC output is a very lightweight process that makes a negligible difference to performance in almost all scenarios. Most of the work to produce the output is done by the garbage collector anyway, so enabling -verbose:gc simply records this data, and -Xloggc saves it to a log file. In theory, memory leaks should not happen in Java because it has garbage collection (GC). However, GC only reclaims objects that are no longer referenced. Therefore, if an object is no longer used but is still referenced, GC does not remove it, which leads to a memory leak. Besides memory leaks, other memory problems you might encounter are memory fragmentation, large objects and tuning problems. In many cases, these memory problems can cause the application server to crash. Many users first notice that application server performance gradually declines, and the server eventually crashes with OutOfMemoryErrors.
  • You need to ensure that a heap dump is taken when an OOME occurs. This is a snapshot of all the objects on the heap at the time of the error. It is not done by default, but it is easy to enable by adding a couple of JVM parameters.
  • You can also take a snapshot manually using jmap, a tool that ships with the JDK.
  • It is best to take a series of snapshots, once per second for at least a minute, while the slowdown is occurring. This allows you to look for trends over time, such as long-running threads, stuck threads or threads that are waiting.
  • Here we see two minor collections followed by one major collection. The numbers before and after the arrow indicate the combined size of live objects before and after garbage collection, respectively. The next number, in parentheses (e.g. 776768K on the first line), is the committed size of the heap: the amount of space usable for Java objects without requesting more memory from the operating system. The last item on the line (e.g. 0.2300771 secs) is the time taken to perform the collection, in this case approximately a quarter of a second. Whilst you can read these logs directly, we would advise using a graphical tool such as GCViewer, as this makes it much easier to see what is happening over time.
  • The most important things to look at in the GCViewer analysis are: Acc Pauses, the accumulated pause time, i.e. the total time the application was stopped for GC (pauses are the times when an application appears unresponsive because garbage collection is occurring); Total Time, the total time the application has been running; Throughput, the percentage of total time not spent in garbage collection, considered over long periods of time (greater than 99% is fantastic); and Footprint, the overall memory consumption, which should ideally be as low as possible. Footprint is the working set of a process, measured in pages and cache lines; on systems with limited physical memory or many processes, it may dictate scalability. In practice it usually reflects the total heap size allocated via -Xms and -Xmx. Probably the two most important lines in the graph are the red and the blue lines: the red line indicates the total heap size, and the blue line shows how much of the heap is actually used.
  • Here we can see that something is clearly amiss! We are seeing repeated full GCs, and yet the heap usage is continuing to grow. This is a sure sign of some form of memory leak and a precursor to heap exhaustion, which will generally be followed by a system crash.
  • A heap dump contains details of all objects on the heap at a given point in time. The files produced are not human-readable, and the best tool we have found for analysing heap dumps is the Eclipse Memory Analyzer Tool (MAT).
  • The landing page of the MAT report, which provides access to multiple types of heap analysis.
  • Select the histogram from the toolbar to list the number of instances per class, along with their shallow and retained sizes. Generally speaking, the shallow heap of an object is its own size on the heap, and the retained size of the same object is the amount of heap memory that would be freed if the object were garbage collected (the object itself plus everything that is reachable only through it).
  • The dominator tree displays the biggest objects in the heap dump, with their fields and references. You can expand each item to see the other objects referenced by it. The next level of the tree lists those objects that would be garbage collected if all incoming references to the parent node were removed. The dominator tree is a powerful tool for investigating which objects keep which other objects alive. This way you can nail down the biggest memory hog, and also the class or object that holds a reference to it, thereby preventing it from being garbage collected.
  • Although the files produced are human-readable, thread dump (stack trace) analysis can be very complex. To make things simpler, we advise using a tool such as Threadlogic, which can save a lot of time otherwise spent searching through multiple files trying to visualise what is going on.
  • This provides a summary of the threads, showing how they are grouped, either by functionality or by ThreadGroup.
  • This shows the advisory map of all the threads. The keyword provides the categorisation of each thread, and we can also see a description of, and advice for, each thread.
  • Here we can see the status, detailed description and advice for the selected thread.
  • We often find that, when asked, people will state that nothing has changed, yet when issues like this occur a change has generally taken place. In fact, the majority of problems that we see are caused by changes being made to the system.
  • You should always keep an audit of all system changes, no matter how minor they may seem. Make sure that logging is in place so that when errors like this do occur you have the data available without having to wait for another system crash. Monitoring tools are invaluable for keeping track of normal behaviour and detecting issues early, particularly as you can set up alerts to warn you of things such as high CPU usage or, in our case, memory usage.
Slide transcript

    1. Java Middleware Surgery
       Andy Overton & Mike Croft, Expert Support Team
    2. Introduction
       • Going to look at 2 scenarios or problems
       • One related to issues with server slowdown and Out Of Memory errors
       • One related to consuming JMS messages from a remote queue and messages disappearing
    3. Scenario 1
       • A customer has to restart their servers regularly as they slow down and become unresponsive, and they see OutOfMemoryErrors in the logs
       • Restarting the server fixes the problem
    4. Out Of Memory Errors
       • Two types
         – Catastrophic: rapid rise in memory usage, OOME occurs and server crashes. Often daily.
         – Long running: gradual slowdown over time (days), eventually causing an OOME.
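The speaker notes explain why the long-running type happens even with a garbage collector: an object that is no longer used but is still referenced cannot be reclaimed. A minimal Java sketch of that pattern, assuming a hypothetical application that caches data in a static collection (`LeakDemo` and its methods are illustrative names, not part of any server API), contrasts an unbounded static list with a size-bounded LRU map built on `LinkedHashMap.removeEldestEntry`:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical leak and one possible fix.
class LeakDemo {

    // Leak pattern: a static collection that only ever grows. Every entry stays
    // reachable from a GC root (the static field), so the collector can never
    // reclaim it, even after the application has finished with it.
    static final List<byte[]> LEAKY_CACHE = new ArrayList<>();

    static void leakyRemember(byte[] payload) {
        LEAKY_CACHE.add(payload); // never removed
    }

    // One common fix: bound the cache so old entries become unreachable.
    // accessOrder=true plus removeEldestEntry gives a simple LRU cache.
    static <K, V> Map<K, V> boundedCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> cache = boundedCache(100);
        for (int i = 0; i < 10_000; i++) {
            leakyRemember(new byte[16]);
            cache.put(i, new byte[16]);
        }
        System.out.println("leaky entries:   " + LEAKY_CACHE.size());   // 10000
        System.out.println("bounded entries: " + cache.size());         // 100
    }
}
```

Bounding the cache is only one possible remedy; the point is that the unbounded list keeps every entry reachable from a static field for the life of the JVM, which is what eventually surfaces as a long-running OOME.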
    5. What to do?
       • Gather information
       • Analyse the information
       • Diagnose issues
       • Resolve the issues
    6. Information Gathering
       • Verbose GC output
       • Heap dumps
       • Server Logs
       • Stack Traces
       • Details of system changes
    7. Gathering verbose GC data
       -verbose:gc
       -Xloggc:path_to_log/gc.log
       -XX:+PrintGCDetails - causes additional information about the collections to be printed
       -XX:+PrintGCTimeStamps - adds a time stamp at the start of each collection. This is useful for seeing how frequently garbage collections occur
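To confirm which of these flags a running JVM was actually started with, one option is to read its input arguments through the standard `RuntimeMXBean`. The sketch below (the `GcFlagCheck` class is a hypothetical name) does a simple prefix match against the flags on this slide. These are JDK 8-era options; from JDK 9 onwards GC logging moved to unified logging (`-Xlog:gc*`), so the list would need adjusting there.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: report which of the slide's GC logging flags are absent from a JVM's arguments.
class GcFlagCheck {

    // Prefixes to look for; "-Xloggc:" is followed by a log file path.
    static final String[] RECOMMENDED = {
        "-verbose:gc", "-Xloggc:", "-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps"
    };

    // Returns the recommended flags that no argument starts with.
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : RECOMMENDED) {
            boolean present = jvmArgs.stream().anyMatch(arg -> arg.startsWith(flag));
            if (!present) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM (excluding the main class and program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missingFlags(jvmArgs);
        if (missing.isEmpty()) {
            System.out.println("All recommended GC logging flags are set.");
        } else {
            System.out.println("Missing GC logging flags: " + missing);
        }
    }
}
```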
    8. Gathering Heap Dump data
       • Make sure the JVM is set to provide a heap dump on OutOfMemory errors
       • This is not a default setting on Sun’s JVM!
       • This can be done by adding the following JVM params:
         -XX:+HeapDumpOnOutOfMemoryError
         -XX:HeapDumpPath=path_to_dump_files/java_pid%p.hprof (%p is replaced by the process ID)
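On a HotSpot-based JVM (Oracle or OpenJDK), the `com.sun.management.HotSpotDiagnosticMXBean` reports the current value of these options at runtime, which is a quick way to verify that the configuration actually took effect. A minimal sketch under that assumption; `HeapDumpConfigCheck` is a hypothetical name:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Sketch for HotSpot-based JVMs: check whether heap dumps on OOME are enabled.
class HeapDumpConfigCheck {

    static VMOption option(String name) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the JVM has no option with this name.
        return diag.getVMOption(name);
    }

    static boolean heapDumpOnOomeEnabled() {
        return Boolean.parseBoolean(option("HeapDumpOnOutOfMemoryError").getValue());
    }

    public static void main(String[] args) {
        VMOption path = option("HeapDumpPath");
        System.out.println("HeapDumpOnOutOfMemoryError = " + heapDumpOnOomeEnabled());
        // Origin shows where the value came from, e.g. DEFAULT or VM_CREATION (command line).
        System.out.println("HeapDumpPath = '" + path.getValue() + "' (origin: " + path.getOrigin() + ")");
        if (!heapDumpOnOomeEnabled()) {
            System.out.println("Warning: add -XX:+HeapDumpOnOutOfMemoryError to the JVM options.");
        }
    }
}
```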
    9. Gathering Heap Dump Data manually
       • Get the process ID of the running server (-l shows the full main class name, -v the JVM arguments):
         jps -lv
       • You should see something similar to this:
         3171 weblogic.Server -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m ...
       • Use jmap to take a snapshot:
         jmap -dump:format=b,file=dump1.bin 3171
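`jmap` attaches to another process. If you can run code inside the server itself (for example from a diagnostic task you deploy, which is an assumption about your environment), the same `HotSpotDiagnosticMXBean` can write an HPROF dump of its own JVM via `dumpHeap`. A sketch, assuming a HotSpot/OpenJDK JVM; the output file must not already exist, and newer JDKs reject file names that do not end in `.hprof`. `InProcessHeapDump` is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class InProcessHeapDump {

    // Writes an HPROF heap dump of *this* JVM to outputFile.
    // live=true dumps only objects reachable from others; live=false dumps
    // everything, which matches the jmap command on the slide.
    static void dump(Path outputFile, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IOException if the file already exists.
        diag.dumpHeap(outputFile.toAbsolutePath().toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdumps");
        Path file = dir.resolve("dump1.hprof");
        dump(file, false);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The resulting `.hprof` file can be opened in Eclipse MAT in the same way as a dump produced by `jmap`.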
    10. Gathering stack trace data
        • Again, retrieve the process ID using jps
        • Basic command for getting a stack trace and outputting it to a file:
          jstack -l <pid> > jstack-output.txt
        • Best to take a series of snapshots, once per second for at least a minute, when slowdown occurs
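The once-per-second series can also be captured from inside the JVM with the standard `ThreadMXBean`, which may help when you cannot run `jstack` against the process. This is a sketch under that assumption, not a replacement for `jstack`: it only sees the JVM it runs in, and its output format is simpler. `ThreadInfo.toString()` truncates long stacks, so the frames are printed explicitly. `PeriodicThreadDumps` and the `thread-dump-N.txt` file names are hypothetical.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PeriodicThreadDumps {

    // One dump of every live thread in this JVM: name, state, lock waited on, full stack.
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                sb.append("\t- waiting on ").append(info.getLockName()).append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Writes `count` dumps, `intervalMillis` apart, as thread-dump-1.txt, thread-dump-2.txt, ...
    static void capture(Path dir, int count, long intervalMillis)
            throws IOException, InterruptedException {
        for (int i = 1; i <= count; i++) {
            Path out = dir.resolve("thread-dump-" + i + ".txt");
            Files.write(out, threadDump().getBytes(StandardCharsets.UTF_8));
            if (i < count) {
                Thread.sleep(intervalMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // 60 dumps, one per second: the "at least a minute" suggested on the slide.
        capture(Paths.get("."), 60, 1000);
    }
}
```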
    11. Analysing the data – GC Logs
        • The GC logs show details of all garbage collections since the server started
        • The files are human-readable
        • Example:
          [GC 325407K->83000K(776768K), 0.2300771 secs]
          [GC 325816K->83372K(776768K), 0.2454258 secs]
          [Full GC 267628K->83769K(776768K), 1.8479984 secs]
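Reading these lines programmatically is straightforward for the basic `-verbose:gc` format shown above. The sketch below (a hypothetical `GcLogParser`) extracts the before and after occupancy, the committed heap and the pause time of each collection. It assumes exactly this simple format: lines written with `-XX:+PrintGCDetails` contain per-generation sections and would not match the regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogParser {

    static final class GcEvent {
        final boolean full;
        final long beforeK;     // occupancy before the collection
        final long afterK;      // occupancy after the collection
        final long committedK;  // committed heap size (the number in parentheses)
        final double pauseSecs;

        GcEvent(boolean full, long beforeK, long afterK, long committedK, double pauseSecs) {
            this.full = full;
            this.beforeK = beforeK;
            this.afterK = afterK;
            this.committedK = committedK;
            this.pauseSecs = pauseSecs;
        }

        long reclaimedK() {
            return beforeK - afterK;
        }
    }

    // Matches e.g. "[Full GC 267628K->83769K(776768K), 1.8479984 secs]"
    private static final Pattern SIMPLE_LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    static List<GcEvent> parse(String log) {
        List<GcEvent> events = new ArrayList<>();
        Matcher m = SIMPLE_LINE.matcher(log);
        while (m.find()) {
            events.add(new GcEvent(
                    m.group(1).equals("Full GC"),
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3)),
                    Long.parseLong(m.group(4)),
                    Double.parseDouble(m.group(5))));
        }
        return events;
    }

    public static void main(String[] args) {
        String sample = "[GC 325407K->83000K(776768K), 0.2300771 secs]\n"
                + "[GC 325816K->83372K(776768K), 0.2454258 secs]\n"
                + "[Full GC 267628K->83769K(776768K), 1.8479984 secs]\n";
        for (GcEvent e : parse(sample)) {
            System.out.printf("%-7s reclaimed %dK, heap now %dK of %dK, pause %.3fs%n",
                    e.full ? "Full GC" : "GC", e.reclaimedK(), e.afterK, e.committedK, e.pauseSecs);
        }
    }
}
```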
    12. GCViewer - Standard Behaviour
        [GCViewer screenshot]
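The GCViewer figures described in the notes for this slide reduce to simple arithmetic: Acc Pauses is the sum of the pause times, and Throughput is the percentage of total run time not spent in those pauses. A sketch using the three pauses from the sample log lines, with a 10-minute run time assumed purely for illustration (GCViewer's own calculation may differ in detail); `GcThroughput` is a hypothetical name:

```java
class GcThroughput {

    // Accumulated pause time: total seconds the application was stopped for GC.
    static double accumulatedPauses(double[] pauseSecs) {
        double total = 0;
        for (double p : pauseSecs) {
            total += p;
        }
        return total;
    }

    // Throughput: percentage of total run time not spent in GC pauses.
    static double throughputPercent(double[] pauseSecs, double totalRunSecs) {
        if (totalRunSecs <= 0) {
            throw new IllegalArgumentException("totalRunSecs must be positive");
        }
        return 100.0 * (totalRunSecs - accumulatedPauses(pauseSecs)) / totalRunSecs;
    }

    public static void main(String[] args) {
        // Pause times from the three sample log lines on the GC log slide.
        double[] pauses = {0.2300771, 0.2454258, 1.8479984};
        double runSecs = 600; // hypothetical: assume these were the only pauses in 10 minutes
        System.out.printf("Acc Pauses: %.3f s%n", accumulatedPauses(pauses));        // 2.324 s
        System.out.printf("Throughput: %.2f%%%n", throughputPercent(pauses, runSecs)); // 99.61%
    }
}
```

Even this short example lands just above the 99% mark the notes describe as fantastic; a leaking application doing back-to-back full GCs would push the figure down quickly.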
    13. GCViewer - Heap Exhaustion
        [GCViewer screenshot]
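The warning sign on this graph, heap usage that keeps climbing even across repeated full GCs, can be expressed as a simple check on the occupancy left after each full collection (the "after" figure from lines such as `[Full GC 267628K->83769K(776768K), ...]`). This is an illustrative heuristic, not GCViewer's algorithm; the `window` size and the sample figures in `main` are hypothetical.

```java
class LeakSuspicion {

    // True if the heap occupancy left after each of the last `window` full GCs
    // rose every time, i.e. live data is accumulating rather than being reclaimed.
    static boolean risingAfterFullGc(long[] afterFullGcK, int window) {
        if (window < 2 || afterFullGcK.length < window) {
            return false; // not enough full GCs to judge
        }
        for (int i = afterFullGcK.length - window + 1; i < afterFullGcK.length; i++) {
            if (afterFullGcK[i] <= afterFullGcK[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] healthy = {83769, 82011, 84102, 83590};   // hypothetical: flat baseline
        long[] leaking = {83769, 120544, 171230, 240987}; // hypothetical: baseline keeps rising
        System.out.println("healthy suspicious? " + risingAfterFullGc(healthy, 3)); // false
        System.out.println("leaking suspicious? " + risingAfterFullGc(leaking, 3)); // true
    }
}
```

A rising baseline alone is not proof of a leak (the application may legitimately be warming a cache), which is why the heap dump analysis on the following slides is still needed.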
    14. Analysing the data – Heap Dump
        • A heap dump contains information about all Java objects alive at a given point in time
        • Not human-readable
        • Eclipse Memory Analyzer Tool
        • Helps in finding memory leaks and discovering which objects are taking up the most memory
    15. Eclipse MAT - Overview
        [MAT screenshot]
    16. MAT – Histogram View
        [MAT screenshot]
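To make the difference between shallow and retained size (defined in the notes for this slide) concrete, the sketch below models a tiny hypothetical object graph and computes an object's retained size as the total shallow size of everything that becomes unreachable from the GC roots once that object is removed. MAT derives the same figure far more efficiently from the dominator tree; this naive version re-walks the graph for each query and is only meant for toy examples. `ToyHeap` and the object names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of a heap: objects with shallow sizes and outgoing references, plus GC roots.
class ToyHeap {
    private final Map<String, Long> shallow = new HashMap<>();
    private final Map<String, List<String>> refs = new HashMap<>();
    private final Set<String> roots = new HashSet<>();

    void object(String id, long shallowBytes, String... references) {
        shallow.put(id, shallowBytes);
        refs.put(id, Arrays.asList(references));
    }

    void root(String id) {
        roots.add(id);
    }

    long shallowSize(String id) {
        return shallow.get(id);
    }

    // Objects reachable from the roots when `excluded` is treated as absent (null = none).
    private Set<String> reachable(String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            String id = stack.pop();
            if (id.equals(excluded) || !seen.add(id)) {
                continue;
            }
            stack.addAll(refs.getOrDefault(id, Collections.emptyList()));
        }
        return seen;
    }

    // Retained size: shallow sizes of everything that becomes unreachable
    // (and could therefore be freed) once `id` is gone, including `id` itself.
    long retainedSize(String id) {
        Set<String> withoutId = reachable(id);
        long total = 0;
        for (String other : reachable(null)) {
            if (!withoutId.contains(other)) {
                total += shallow.get(other);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        heap.root("root");
        heap.object("root", 8, "cache", "b");
        heap.object("cache", 16, "list");
        heap.object("list", 24, "a", "b");
        heap.object("a", 100);
        heap.object("b", 100);
        // "b" is also referenced by "root", so removing "cache" frees only cache, list and a.
        System.out.println("cache: shallow=" + heap.shallowSize("cache")
                + " retained=" + heap.retainedSize("cache")); // shallow=16 retained=140
    }
}
```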
    17. MAT – Dominator View
        [MAT screenshot]
    18. Analysing the data – Stack Trace
        • Threadlogic
        • Quickly understand the health levels and get details about threads
        • Thread groups help in bunching together related threads
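For a quick first look before loading dumps into Threadlogic, the standard `ThreadMXBean` can summarise thread states and report deadlocks for the JVM the code runs in. This is a rough sketch, not how Threadlogic works internally, and `ThreadHealthSummary` is a hypothetical name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class ThreadHealthSummary {

    // Number of live threads in each Thread.State (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // Names of threads deadlocked on monitors or ownable synchronizers; empty if none.
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // defensive: null is returned for threads no longer alive
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Threads by state: " + countByState());
        List<String> deadlocked = deadlockedThreadNames();
        System.out.println(deadlocked.isEmpty() ? "No deadlocks found" : "DEADLOCKED: " + deadlocked);
    }
}
```

A steadily growing BLOCKED or WAITING count across the once-per-second snapshots is the kind of trend worth following up in the full thread dumps.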
    19. Threadlogic – Summary View
        [Threadlogic screenshot]
    20. Threadlogic – Advisory Map
        [Threadlogic screenshot]
    21. Threadlogic – Details View
        [Threadlogic screenshot]
    22. System changes
        • Have you deployed any new applications to the server?
        • Any increased load on the system?
        • Any updates to the system?
        • Are there any fixes or patches related to memory or performance that you are missing?
    23. Prevention
        • Audit all system changes and be prepared to roll back if necessary
        • Ensure you log everything if an OOME occurs
        • Use monitoring tools to monitor system behaviour and set up alerts so you’re forewarned of any anomalous behaviour
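As one example of the alerting recommended here, the JDK's own management API can raise a notification when a heap memory pool is still above a threshold immediately after a garbage collection, which is closer to a leak signal than raw usage. The sketch below assumes an 80% threshold and simply prints alerts; forwarding them to a real monitoring tool is left out, and `MemoryAlert` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

class MemoryAlert {

    // Arms a collection-usage threshold at `fraction` of each heap pool's maximum size and
    // calls `alert` whenever a pool is still above it right after a garbage collection.
    // Returns the names of the pools that were armed.
    static List<String> install(double fraction, Consumer<String> alert) {
        if (fraction <= 0 || fraction > 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        List<String> armed = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() != MemoryType.HEAP || usage == null || usage.getMax() <= 0
                    || !pool.isCollectionUsageThresholdSupported()) {
                continue; // not a heap pool, no defined maximum, or thresholds unsupported
            }
            pool.setCollectionUsageThreshold((long) (usage.getMax() * fraction));
            armed.add(pool.getName());
        }
        // The platform MemoryMXBean emits the threshold notifications.
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                MemoryNotificationInfo info =
                        MemoryNotificationInfo.from((CompositeData) notification.getUserData());
                alert.accept("Pool '" + info.getPoolName() + "' still holds "
                        + info.getUsage().getUsed() + " bytes after GC");
            }
        }, null, null);
        return armed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> pools = install(0.8, msg -> System.err.println("ALERT: " + msg));
        System.out.println("Watching heap pools: " + pools);
        Thread.sleep(Long.MAX_VALUE); // keep running; a real setup would forward alerts
    }
}
```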
    24. Problematic JMS
        • Consuming messages from a remote queue
        • Messages getting lost
        • Network exceptions in logs
    25. Problematic JMS
        • Do you care if messages get lost?
        • Can the remote producer be trusted?
        • How many (physical) network hops?
    26. Problematic JMS
        • Use a message bridge
          – More reliable than you can code yourself
          – Makes adding reliability much easier
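The reason a bridge helps is the store-and-forward idea: a message is held locally and only removed once the remote side has accepted it, so a network exception delays delivery instead of losing the message. The toy simulation below illustrates only that idea with in-memory collections; it does not use the JMS API and is not how the WebLogic or ActiveMQ bridges are implemented. `StoreAndForwardBridge` and the flaky `remoteSend` predicate are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Toy simulation, not JMS: a local store plus a "remote" send that can fail.
class StoreAndForwardBridge {
    private final Deque<String> localStore = new ArrayDeque<>();
    private final Predicate<String> remoteSend; // true if the remote side accepted the message

    StoreAndForwardBridge(Predicate<String> remoteSend) {
        this.remoteSend = remoteSend;
    }

    // The message is stored locally first, so a network outage cannot lose it.
    void submit(String message) {
        localStore.addLast(message);
    }

    // One forwarding pass: a message leaves the local store only after the remote side
    // accepts it. Stops at the first failure to preserve ordering; returns the number forwarded.
    int forwardPending() {
        int forwarded = 0;
        while (!localStore.isEmpty()) {
            String next = localStore.peekFirst();
            if (!remoteSend.test(next)) {
                break; // e.g. a network exception: keep the message for the next pass
            }
            localStore.removeFirst();
            forwarded++;
        }
        return forwarded;
    }

    int pending() {
        return localStore.size();
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        int[] attempts = {0};
        // Hypothetical flaky link: every third send attempt fails.
        StoreAndForwardBridge bridge = new StoreAndForwardBridge(msg -> {
            if (++attempts[0] % 3 == 0) {
                return false;
            }
            delivered.add(msg);
            return true;
        });
        for (int i = 1; i <= 5; i++) {
            bridge.submit("order-" + i);
        }
        while (bridge.pending() > 0) {
            bridge.forwardPending(); // a real bridge would back off between retries
        }
        System.out.println(delivered); // [order-1, order-2, order-3, order-4, order-5]
    }
}
```

Because a message is removed only after a successful send, a failure after the remote side has accepted a message but before it is removed locally would cause a resend, so this style of delivery is at-least-once and consumers should tolerate duplicates.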
    27. Problematic JMS
        • How complex is your scenario?
          – Do you process single units of work over multiple messages?
          – Do you need to load balance JMS across multiple servers?
    28. Problematic JMS
        • Which provider should you use?
          – Apache ActiveMQ
          – Apache Camel
          – WebLogic
