Analyzing a memory leak problem
1. Analyzing a memory leak problem
Finding the relationship between a misconfigured DBAdapter and a memory leak problem
2. Symptoms
An OSB installation crashed suddenly after several hours of performance problems, and this message was found in the log files:
java.lang.OutOfMemoryError: Java heap space
3. Current configuration
• I tuned that OSB installation in 2015 with these parameters:
if "%SERVER_NAME%"=="osb_server1" (
set USER_MEM_ARGS=-Xms6144M -Xmx6144M -Xmn1536M -XX:PermSize=2048M -XX:MaxPermSize=2048M -Xss256K -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:+UseParallelOldGC -XX:+AlwaysPreTouch -XX:+ParallelRefProcEnabled -XX:MaxTenuringThreshold=15 -XX:-UseAdaptiveSizePolicy -XX:+DisableExplicitGC -Dweblogic.threadpool.MinPoolSize=50 -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Xloggc:D:\u01\config\domains\soa_domain\servers\osb_server1\logs\gclog_osb.txt
)
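As a rough cross-check of these settings, the generation sizes they imply can be worked out by hand. The sketch below is only illustrative arithmetic. It assumes the usual HotSpot meaning of -Xmn (size of the Young Generation) and -XX:SurvivorRatio (size of Eden relative to one survivor space), and the function name is hypothetical. Real JVMs align these sizes, so the values are approximate.

```python
def generation_sizes_mb(xmx_mb, xmn_mb, survivor_ratio):
    """Estimate generation sizes in MB for a fixed-size heap."""
    # Old Generation is whatever the Young Generation does not take.
    old_mb = xmx_mb - xmn_mb
    # Young = Eden + 2 survivor spaces, with Eden = survivor_ratio * survivor.
    survivor_mb = xmn_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    return {"old": old_mb, "eden": eden_mb, "survivor": survivor_mb}


# Values taken from the flags above: -Xmx6144M -Xmn1536M -XX:SurvivorRatio=8
sizes = generation_sizes_mb(xmx_mb=6144, xmn_mb=1536, survivor_ratio=8)
print(sizes)
```

The Old Generation works out to 4608 MB, which is the 4.5 GB quoted later when the heap is monitored. PermGen (-XX:MaxPermSize) is allocated outside this heap, so it is not part of the calculation.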
4. Analyzing the root cause
• One of the first steps in solving this kind of problem is to analyze heap dumps and see which classes are consuming the memory.
• In this case, I used two tools, Eclipse Memory Analyzer Tool (MAT) and IBM Heap Analyzer, to analyze two heap dumps.
• The -XX:+HeapDumpOnOutOfMemoryError parameter I configured in 2015 is useful in these cases because it captures the contents of the heap just before the crash.
5. Using MAT on a 3GB heap dump
In this case, Memory Analyzer Tool (MAT) shows that the first leak suspect is the class org.eclipse.persistence.internal.sessions.IsolatedClientSession, which occupied 57.28% of the heap, as can be seen in this figure.
This looks like a memory leak that may be caused by some methods related to TopLink.
6. Using MAT on a 3GB heap dump
MAT also shows the second suspect for the memory leak, the class java.lang.ref.Finalizer, as can be seen in the following figure.
This is interesting because finalization problems are often related to the misuse of resources such as files, data sources and sockets.
7. Using IBM Heap Analyzer on a 3GB heap dump
IBM Heap Analyzer discovered an important relationship between java.lang.ref.Finalizer and TopLink, as can be seen in this text.
At this point we can form a hypothesis: the problem is related to the persistence of objects through Oracle TopLink and the Java Connector Architecture (JCA), for example oracle/tip/adapter/sa/impl/inboundJCABindingActivationAgent.
8. Using MAT on a 6GB heap dump
In this case, MAT shows that the main suspects are related to the finalization of objects and to Oracle TopLink, as can be seen in the following report.
9. Using MAT on a 6GB heap dump
In addition, the Dominator Tree was analysed in detail for this case. It showed that many of the Finalizers are related not only to TopLink but also to a specific driver used to connect to MS-SQL databases.
10. Using MAT on a 6GB heap dump
This means the problem may be caused by the misuse or misconfiguration of TopLink, or perhaps by a bug in the product. However, the known-bugs page for EclipseLink 2.5.2 (the version used on that server) lists no bugs related to memory leaks. This is the list of bugs for EclipseLink 2.5.2.
11. Using MAT on a 6GB heap dump
Nevertheless, the most important fact in this case is the relationship with the many objects used to establish connections to the MS-SQL database.
With this in mind, the following section analyses and tunes two connection pools related to SQL Server.
12. Analysing and tuning connection pools related to MS SQL Server
These are the most used data sources.
Both data sources connect the system to MS-SQL Server, which is consistent with the number of objects shown in the heap. Because many problems with the finalization of objects are related to leaks of resources such as connection pools, files and sockets, one of the goals of the tuning process is to set a timeout for inactive JDBC connections. More details about these problems can be found here:
http://blog.sysco.no/db/locking/jdbc-leak/
The tuning process for both data sources is shown next.
13. Data source one
This data source did not use “test connections on reserve”, which avoids problems with dirty connections, and it had no timeout for inactive connections. Without that timeout, bugs, misconfigurations or programming errors can produce JDBC connection leaks. The configuration was changed to test connections on reserve and to use an inactive connection timeout of 600 s.
14. Data source two
I found this configuration.
This case was strange because the test table name held the query “SQL Select 1 from dual”, and DUAL is an Oracle-specific table. However, when a data source that points to MS-SQL Server is created, the default query generated by the WebLogic console is “SQL Select 1”. Thus, it seems that this data source originally pointed to Oracle and was later modified to point to MS-SQL Server.
15. Data source two
In addition, when I used the Administration Console to tune the data source, the following error was issued as soon as the Test Table Name was modified.
Therefore, this data source was deleted and created again to fix the test table name, to enable test connections on reserve and to set the timeout to 600 s.
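The three checks applied to both data sources can be summarised as a small review routine. This is only a sketch: the dictionary keys and the sample values are hypothetical labels chosen for illustration, not WebLogic attribute names, and nothing here talks to a real server.

```python
def check_pool(settings, vendor):
    """Return warnings for a connection pool described as a plain dict."""
    warnings = []
    if not settings.get("test_on_reserve", False):
        warnings.append("test connections on reserve is disabled")
    if settings.get("inactive_timeout_s", 0) <= 0:
        warnings.append("no timeout for inactive connections")
    # DUAL exists only in Oracle, so it should not appear for other vendors.
    query = settings.get("test_table_name", "").upper()
    if vendor != "oracle" and "DUAL" in query:
        warnings.append("test table name references DUAL (Oracle-specific)")
    return warnings


# Hypothetical snapshots of a data source before and after tuning.
before = {"test_on_reserve": False, "inactive_timeout_s": 0,
          "test_table_name": "SQL Select 1 from dual"}
after = {"test_on_reserve": True, "inactive_timeout_s": 600,
         "test_table_name": "SQL Select 1"}

print(check_pool(before, vendor="sqlserver"))
print(check_pool(after, vendor="sqlserver"))
```

The tuned configuration produces an empty list, while the untuned one reports all three findings described in the slides.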
16. Looking for JDBC connection leaks after tuning connection pools
After the connection pools were modified, the system was monitored between 17:00 and 23:50, and a JDBC connection leak related to the data sources was detected. This is shown in the following figure.
17. Looking for JDBC connection leaks after tuning connection pools
With this in mind, the log file was checked to find the source code that does not close resources properly. The following figure shows the code that was found.
18. Looking for JDBC connection leaks after tuning connection pools
Again, some objects related to Oracle TopLink are responsible for a leak, this time of resources, which may be related to the finalizer problem shown in the previous sections, where the memory leak was analysed. Hence, it is necessary to review the DBAdapter configuration or to look for bugs in TopLink or for programming or modelling errors.
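To make the idea of "code that does not close resources properly" concrete, here is a generic illustration of how a reserved connection is lost when an error occurs before it is released. It is not the code found in the log. The pool class and function names are hypothetical, and the pool only counts reservations.

```python
class TinyPool:
    """Toy stand-in for a connection pool that only counts reservations."""

    def __init__(self):
        self.in_use = 0

    def reserve(self):
        self.in_use += 1
        return object()

    def release(self, conn):
        self.in_use -= 1


def run_query(conn, fail):
    if fail:
        raise RuntimeError("query failed")


def leaky_call(pool, fail):
    conn = pool.reserve()
    run_query(conn, fail)  # an exception here skips the release below
    pool.release(conn)


def safe_call(pool, fail):
    conn = pool.reserve()
    try:
        run_query(conn, fail)
    finally:
        pool.release(conn)  # always runs, even when the query fails


def count_leaks(call, attempts=5):
    """Run failing calls and report how many connections stay reserved."""
    pool = TinyPool()
    for _ in range(attempts):
        try:
            call(pool, fail=True)
        except RuntimeError:
            pass
    return pool.in_use


print(count_leaks(leaky_call))  # 5 connections never returned
print(count_leaks(safe_call))   # 0
```

A timeout for inactive connections, as configured on the data sources, lets the pool reclaim connections lost this way, but it does not remove the underlying cause.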
19. Analysing DBAdapters and outbound connections
This analysis found a problem in the DbAdapter related to one of the data sources shown previously: it was using a PlatformClassName for Oracle databases with a data source that points to an MS-SQL Server database. This was the misconfiguration found.
20. Analysing DBAdapters and outbound connections
This configuration was fixed to use a proper PlatformClassName for MS-SQL
Server and now it looks like this.
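A mismatch of this kind can be spotted with a simple consistency check between the platform class and the JDBC URL of the data source. The sketch below is a heuristic under stated assumptions: the function name is hypothetical, the class names are EclipseLink platform classes used as examples rather than the values from this server, and the URLs are made up.

```python
def platform_matches_url(platform_class_name, jdbc_url):
    """Heuristic: does the platform class fit the vendor in the JDBC URL?

    Returns True or False, or None when the vendor is not covered here.
    """
    url = jdbc_url.lower()
    if url.startswith("jdbc:oracle:"):
        vendor = "oracle"
    elif "sqlserver" in url:
        vendor = "sqlserver"
    else:
        return None
    # Compare against the simple class name only, because package names
    # such as "oracle.toplink..." would otherwise always match "oracle".
    simple_name = platform_class_name.rsplit(".", 1)[-1].lower()
    return vendor in simple_name


ORACLE_PLATFORM = "org.eclipse.persistence.platform.database.oracle.Oracle10Platform"
SQLSERVER_PLATFORM = "org.eclipse.persistence.platform.database.SQLServerPlatform"
URL = "jdbc:sqlserver://dbhost:1433;databaseName=app"  # hypothetical URL

print(platform_matches_url(ORACLE_PLATFORM, URL))     # False: the misconfiguration
print(platform_matches_url(SQLSERVER_PLATFORM, URL))  # True: the fixed setup
```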
21. Monitoring the heap after fixing the DBAdapter and tuning the connection pools
There were no Full GCs, which means the Old Generation was not depleted. In addition, the Old Generation used almost 1.6 GB of the 4.5 GB available, which means most objects are dying in the Young Generation (weak generational hypothesis). This log was generated using -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
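The same two observations (no Full GC, and Old Generation occupancy well below its capacity) can be pulled out of a GC log with a short script. This is a sketch only: the sample lines are invented in the general shape that the parallel collector prints with -XX:+PrintGCDetails on older JVMs, they are not taken from the real log, and the exact format varies between JVM versions.

```python
import re

# Captures young-gen occupancy after GC and whole-heap occupancy after GC.
YOUNG = re.compile(r"\[PSYoungGen: \d+K->(\d+)K\(\d+K\)\] \d+K->(\d+)K\(\d+K\)")


def summarize_gc_log(lines):
    """Return (number of Full GCs, peak old-gen occupancy in KB)."""
    full_gcs = 0
    peak_old_kb = 0
    for line in lines:
        if "[Full GC" in line:
            full_gcs += 1
            continue
        match = YOUNG.search(line)
        if match:
            young_after_kb, heap_after_kb = map(int, match.groups())
            # Old-gen occupancy = heap occupancy minus young-gen occupancy.
            peak_old_kb = max(peak_old_kb, heap_after_kb - young_after_kb)
    return full_gcs, peak_old_kb


# Hypothetical sample lines, not from the author's log.
SAMPLE = [
    "100.123: [GC [PSYoungGen: 1310720K->45056K(1415168K)] "
    "2883584K->1617920K(6133760K), 0.0512340 secs]",
    "160.456: [GC [PSYoungGen: 1303808K->51200K(1415168K)] "
    "2876672K->1677312K(6133760K), 0.0498120 secs]",
]

full_gcs, peak_old_kb = summarize_gc_log(SAMPLE)
print(full_gcs, round(peak_old_kb / 1024 / 1024, 2), "GB")
```

Here the Old Generation occupancy is derived by subtracting the Young Generation occupancy from the whole-heap occupancy after each minor collection, since minor GC lines do not print it directly.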
22. Conclusions and recommendations
According to this analysis, the memory leak, which at first seemed to be a TopLink bug, was caused by the misconfiguration of outbound connections within the DBAdapter. In this particular case, the problem was caused by using a PlatformClassName intended for Oracle databases with an MS-SQL Server database.
Furthermore, given this error with the PlatformClassName and the wrong test table name, it looks like this outbound connection and its data source had previously pointed to Oracle. If that is the case, the recommendation is to review the impact of migrations and configuration changes before applying them to the application servers.