Troubleshooting SOA suite11g

Usage Rights

© All Rights Reserved

Presentation Transcript

  • Oracle SOA Suite 11g Troubleshooting Methodology. Compiled by: Amit Deo, Oracle FMW SME Consultant. Note: The middleware universe is full of workarounds :)
  • Slide 2 of 64 © | Agenda
      1. Introduction
      2. The Problem
      3. The Basics of Troubleshooting: Where Do You Start?
      4. Infrastructure Issues
      5. Performance Issues
      6. Deployment Issues
      7. Summary
  • Slide 3 of 64 © | INTRODUCTION
  • Slide 5 of 64 © | THE PROBLEM
  • Slide 6 of 64 © | How Every Large Company Troubleshoots
      ▪ T-Mobile's support team had an exceedingly difficult time pinpointing the specific cause of the problem.
      ▪ Although the team included representatives from each IT functional area, they had no way to troubleshoot from the source, and no single team had visibility of the complete picture.
      ▪ In general, resolving a problem took T-Mobile's melded support team multiple days.
  • Slide 8 of 64 © | Problem With Troubleshooting Integrations
      ▪ In the past, application and network admins were blamed for everything.
  • Slide 9 of 64 © | Problem With Troubleshooting Integrations
      ▪ In the FMW universe, the integration folks are the new target.
  • Slide 10 of 64 © | Problem With Troubleshooting Integrations
      ▪ Numerous touch points
      ▪ Numerous SOA technologies
      ▪ Focus of this document is on Oracle SOA Suite 11g
      [Diagram labels: Web Application, OEG, OSB, SOA Suite, OSB, ODI/OAM/OIM; markers 1, 3, 2, 4]
  • Slide 11 of 64 © | Real World Scenario – Bizarre Behaviour
      ▪ We created a WLST wrapper script that loops through all managed servers and performs garbage collection on each
      ▪ OSB repeatedly fails over HTTPS or for other connectivity reasons
      ▪ Always getting OutOfMemoryError: PermGen space after new installs/deployments
      ▪ Weird… but at least consistent
  • Slide 12 of 64 © | Real World Scenario – Convoluted & Unclear
      ▪ The infamous and ever misleading “Unable to access the following endpoints” error
  • Slide 13 of 64 © | Real World Scenario – Convoluted & Unclear
      ▪ Could be:
          ▪ Caused by: java.net.SocketTimeoutException: Read timed out
          ▪ Message send failed: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  • Slide 14 of 64 © | THE BASIC PRINCIPLES OF TROUBLESHOOTING: WHERE DO YOU START?
  • Slide 15 of 64 © | What is Troubleshooting?
      ▪ Part skill
          ▪ Some people have a natural tendency to pinpoint problem areas
          ▪ Can be learned; usually involves a methodical approach and logic
      ▪ Part knowledge
          ▪ Without understanding the product, it doesn't matter how smart you are :)
          ▪ Most frustrating when it's related to an area we don't know
  • Slide 16 of 64 © | Existing Resources
      ▪ Co-workers
      ▪ Internet searches
      ▪ OTN discussion forums http://support.oracle.com
      ▪ My Oracle Support http://support.oracle.com
      ▪ Oracle Troubleshooting Guide http://docs.oracle.com/cd/E15586_01/fusionapps.1111/e14496/soa_trouble.htm
      ▪ Oracle SOA Suite 11g Administrator’s Handbook http://www.packtpub.com/oracle-soa-suite-11g-administrators-handbook/book
  • Slide 17 of 64 © | Start Somewhere – Narrow Down Problem Area
      [Diagram: Issues branch into Runtime (Composite, Infrastructure), Performance (Server-wide, Service-specific), and Deployment]
  • Slide 18 of 64 © | INFRASTRUCTURE ISSUES
  • Slide 19 of 64 © | Troubleshooting the Infrastructure
      ▪ Could be a server issue
      ▪ Could be a coding issue
      ▪ Could be a business fault that should be handled by the code (contact the dev teams)
      ▪ Must be able to differentiate between infrastructure errors and composite instance errors
  • Slide 20 of 64 © | Troubleshooting the Infrastructure
      1. Use logs
      2. Use thread dumps
  • Slide 21 of 64 © | 1. Using Logs
      ▪ The soa_server1.out log file records most runtime issues. For all other issues, refer to the servername.log file.
      ▪ Must differentiate between infrastructure errors and composite instance errors
  • Slide 22 of 64 © | Example: Infrastructure Error
      ▪ Random crashes immediately after go-live
      ▪ Only happened in Production
      ▪ No warning signs
      ▪ Error does not appear on the EM console
      <Aug 5, 2013 12:00:02 AM EDT> <Error> <oracle.soa.bpel.engine.dispatch> <BEA-000000> <failed to handle message javax.ejb.EJBException: EJB Exception: java.lang.StackOverflowError...
  • Slide 23 of 64 © | Example: Business Fault
      ▪ Often easy to distinguish
      ▪ Should be handled by the code
      ▪ Shows as a faulted instance on the EM console
      <Aug 6, 2013 10:10:33 AM EDT> <Error> <oracle.soa.mediator.serviceEngine> <BEA-000000> <Got an exception: oracle.fabric.common.FabricInvocationException: javax.xml.ws.soap.SOAPFaultException: Message: Organization 129024 not found. Stack trace: at Core.WebServices.Message.MessageWebService.SaveNotification(Organization organization, Notification notification) in c:Data1.0CoreMessageMessageWebService.svc.cs:line 100, detail=javax.xml.ws.soap.SOAPFaultException:
  • Slide 24 of 64 © | Example: System Fault (but not your fault!)
      ▪ Thrown by external system
      ▪ Shows as a faulted instance on the EM console
      ▪ No action needed on the SOA side; follow up with the target system
      <Aug 6, 2013 10:10:33 AM EDT> <Error> <oracle.soa.mediator.serviceEngine> <BEA-000000> <Got an exception: oracle.fabric.common.FabricInvocationException: javax.xml.ws.soap.SOAPFaultException: CreateCustomer failed with Message: Cannot insert the value NULL into column 'CustomerID', table '@Customers'; column does not allow nulls. INSERT fails.
  • Slide 25 of 64 © | Example: System Fault
      ▪ The infamous and ever misleading “Unable to access the following endpoints” error
  • Slide 26 of 64 © | Example: System Fault
      ▪ In this case, due to:
          ▪ Message send failed: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
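The “Unable to access the following endpoints” message hides at least two different root causes, and the nested exception text is what tells them apart. The following is a minimal sketch in Python, not part of the original deck: the function name `find_root_cause` and the explanatory labels are assumptions, while the marker strings are taken from the slides.

```python
# Marker strings come from the slides; the labels and the function name
# are hypothetical and only illustrate the idea.
ROOT_CAUSE_MARKERS = [
    ("PKIX path building failed",
     "certificate trust: the endpoint's certificate chain is not trusted by the calling JVM"),
    ("java.net.SocketTimeoutException",
     "timeout: the endpoint did not answer within the read timeout"),
]


def find_root_cause(error_text):
    """Return a label for the first known marker found in the text, or None."""
    for marker, label in ROOT_CAUSE_MARKERS:
        if marker in error_text:
            return label
    return None


sample = ("Message send failed: sun.security.validator.ValidatorException: "
          "PKIX path building failed: unable to find valid certification path "
          "to requested target")
print(find_root_cause(sample))
```

Searching the full stack trace rather than the first line matters here, because the generic endpoint message comes first and the real cause is nested further down.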
  • Slide 27 of 64 © | Example: Coding or Infrastructure Problem?
      ▪ Just an infrastructure warning
      ▪ Threads would eventually clear themselves up
      ▪ Does not show on the EM console
      ▪ Due to a failed transaction that continues to retry
      <Sep 30, 2013 11:30:04 PM EDT> <Warning> <oracle.integration.platform.instance.store.async> <BEA-000000> <Unable to allocate additional threads, as all the threads [10] are in use. Threads distribution : Fabric Instance Activity = 1,Fabric-Instance-Manager = 9,>
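The four examples above share one log layout, so a first-pass triage can be scripted. This is a rough Python sketch under stated assumptions: the field order (timestamp, severity, logger, message ID, message) is inferred from the excerpts on the slides, and the triage rules and category names are illustrative heuristics, not an Oracle-defined classification.

```python
import re

# Field layout assumed from the slide excerpts:
# <timestamp> <severity> <logger> <message id> <message ...
LINE = re.compile(r"<([^>]*)> <([^>]*)> <([^>]*)> <([^>]*)> <(.*)", re.DOTALL)


def parse_entry(line):
    """Split one server log entry into named fields, or return None."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    keys = ("timestamp", "severity", "logger", "message_id", "message")
    return dict(zip(keys, match.groups()))


def triage(entry):
    """Heuristic first guess at who owns the problem (hypothetical rules)."""
    message = entry["message"]
    if "SOAPFaultException" in message:
        # Business or system fault raised by the service that was called.
        return "fault returned by a called service"
    if re.search(r"java\.lang\.\w+Error", message):
        # For example StackOverflowError or OutOfMemoryError.
        return "infrastructure error"
    if entry["severity"] == "Warning":
        return "infrastructure warning"
    return "unclassified"
```

A faulted instance still needs a human to decide whether it is a business fault for the dev team or a system fault for the target system; the script only separates those from server-side errors.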
  • Slide 28 of 64 © | Modifying Logger Levels
      ▪ A lot more information is logged in the soa_server1-diagnostic.log file
  • Slide 29 of 64 © | Modifying Logger Levels
      ▪ A lot more information is logged in the soa_server1-diagnostic.log file
      [2012-01-01T22:35:56.144-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter] [ecid: cb680017c6a0acfe:-3f1527ec:13487d1ea4c:-8000-0000000000000fe1,0:2] JmsProducer_execute:[default destination = jndi/CustomerJMSQueue]: Successfully produced message.
      [2012-01-01T22:35:56.256-05:00] [soa_server1] [NOTIFICATION] [] [oracle.soa.adapter] [ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0] JMSAdapter JMSConsumer JMSMessageConsumer_consume: Got message with ID ID:<458362.1325475356144.0> from destination jndi/CustomerJMSQueue
      [2012-01-01T22:35:56.261-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter] [ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0] JMS Adapter JMSProducer:CustomerJMS [ CustomerProduce_ptt::CustomerProduce(body) ] XMLHelper_convertJmsMessageHeadersAndPropertiesToXML: <JMSInboundHeadersAndProperties xmlns="http://xmlns.oracle.com/pcbpel/adapter/jms/">[[
      <JMSInboundHeaders>
      <JMSMessageID>ID:<458362.1325475356144.0></JMSMessageID>
      <JMSTimestamp>1325475356144</JMSTimestamp>
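Once TRACE logging is on, the diagnostic log grows quickly, and grouping entries by ECID (the execution context ID shown in each entry) is one way to follow a single request. The Python sketch below assumes the bracketed field order seen in the excerpt above; real diagnostic logs may carry additional bracketed fields, and the function name `group_by_ecid` is hypothetical.

```python
import re
from collections import defaultdict

# Field order assumed from the excerpt:
# [timestamp] [server] [level] [] [logger] [ecid: <id>,<suffix>] message
ODL_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]*)\] \[(?P<server>[^\]]*)\] \[(?P<level>[^\]]*)\] "
    r"\[[^\]]*\] \[(?P<logger>[^\]]*)\] \[ecid: (?P<ecid>[^\],]+)[^\]]*\] "
    r"(?P<message>.*)$"
)


def group_by_ecid(lines):
    """Map each ECID to the (level, message) pairs logged under it."""
    groups = defaultdict(list)
    for line in lines:
        match = ODL_LINE.match(line.strip())
        if match:  # continuation lines such as XML payloads are skipped
            groups[match.group("ecid")].append(
                (match.group("level"), match.group("message"))
            )
    return dict(groups)
```

In the excerpt, the consume entry and the header-conversion entry share one ECID, while the produce entry belongs to a different one.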
  • Slide 30 of 64 © | 2. Using Thread Dumps
      ▪ When a managed server goes into warning state, what are you supposed to do?
  • Slide 31 of 64 © | Understanding Stuck Threads
      ▪ Navigate to Servers > (managed server) > Monitoring > Threads
  • Slide 32 of 64 © | Understanding Stuck Threads
      ▪ AdminServer.log
        ####<Dec 23, 2011 6:03:49 PM EST> <Error> <WebLogicServer> <soahost1> <AdminServer> <BEA-000337> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "658" seconds
      ▪ bam_server1.log
        ####<Dec 23, 2011 5:53:36 PM EST> <Error> <JMX> <soahost1> <bam_server1> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1324680816405> <BEA-149500> <An exception occurred while registering the MBean com.bea:Name=AdminServer,Type=WebServiceRequestBufferingQueue,WebServiceBuffering=AdminServer,Server=AdminServer,WebService=AdminServer. java.lang.OutOfMemoryError: PermGen space
  • Slide 33 of 64 © | Understanding Stuck Threads
      1. We found AdminServer to be in the “Warning” state, due to a stuck thread.
      2. We confirmed that there was indeed a stuck “ExecuteThread”, as shown on both the Oracle WebLogic Administration Console and in the AdminServer.log file.
      3. By reviewing the soa_server1.log and bam_server1.log files, we found startup errors in the BAM server log.
      4. The BAM server was unable to register an AdminServer MBean due to the java.lang.OutOfMemoryError exception that was thrown.
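Step 2 of the walkthrough above (confirming the stuck thread in the server log) can be scripted. The following Python sketch assumes the `####<...>` entry layout shown in the AdminServer.log excerpt; the function name `find_stuck_threads` is hypothetical.

```python
import re

# Layout assumed from the AdminServer.log excerpt:
# ####<timestamp> <severity> <subsystem> <machine> <server> <message id> <[STUCK] ...
STUCK = re.compile(
    r"####<[^>]*> <[^>]*> <[^>]*> <(?P<machine>[^>]*)> <(?P<server>[^>]*)> .*"
    r"\[STUCK\] ExecuteThread: '(?P<thread>\d+)' for queue: '(?P<queue>[^']*)' "
    r"has been busy for \"(?P<seconds>\d+)\" seconds"
)


def find_stuck_threads(lines):
    """Return one dict per stuck-thread entry found in the given log lines."""
    found = []
    for line in lines:
        match = STUCK.search(line)
        if match:
            info = match.groupdict()
            info["seconds"] = int(info["seconds"])
            found.append(info)
    return found
```

The result only says which server and thread are stuck and for how long; as in steps 3 and 4, the root cause usually has to be traced through the other servers' logs.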
  • Slide 34 of 64 © | PERFORMANCE ISSUES
  • Slide 35 of 64 © | Server-Wide Performance Issues
      ▪ Is logging in to Oracle Enterprise Manager Fusion Middleware Control extremely slow?
      ▪ Are all composite instances taking unusually long to complete?
      ▪ Are the logs or your dehydration database growing unusually quickly?
      ▪ Are you seeing an exceptionally high number of errors in the logs?
  • Slide 36 of 64 © | Check Available Disk Space
      ▪ Often an overlooked area
      root@soahost1:/root> df -m
      Filesystem  1M-blocks    Used Available Use% Mounted on
      /dev/sda8         996     451       494  48% /
      /dev/sda9      815881  697454     76314  91% /u01
      /dev/sda7         996      36       909   4% /home
      /dev/sda5        1984     138      1744   8% /tmp
      /dev/sda3        1984     283      1598  16% /var
      /dev/sda2        5950    3842      1802  69% /usr
      /dev/sda1          99      12        83  13% /boot
      tmpfs            8023       0      8023   0% /dev/shm
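Because disk space is easy to overlook, the check can be automated. The Python sketch below parses `df -m` style output like the listing above and flags file systems at or above a usage threshold; the 90% default and the function name `filesystems_over` are assumptions chosen for illustration.

```python
def filesystems_over(df_output, threshold_pct=90):
    """Return (mount point, use %) pairs at or above the threshold."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        # Expect: filesystem, size, used, available, use%, mount point
        if len(parts) < 6 or not parts[4].endswith("%"):
            continue
        use_pct = int(parts[4].rstrip("%"))
        if use_pct >= threshold_pct:
            flagged.append((parts[5], use_pct))
    return flagged
```

Applied to the listing above, only /u01 (91%) crosses the assumed threshold, which is typically where the middleware home and logs live.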
  • Slide 37 of 64 © | Check CPU, RAM, and I/O
      ▪ The vmstat or top command easily outputs CPU, memory, and I/O statistics
      ▪ Do not rely on Linux's reported free memory; it is better to look at swap space usage
      ▪ Why does Linux report close to 100% memory usage all the time? (The kernel uses otherwise idle RAM for buffers and page cache.)
      root@soahost1:/root> vmstat -S m
      procs -------memory--------- --swap-- ---io-- --system-- ----cpu-----
       r  b swpd free buff cache  si so bi bo in cs us sy id wa st
       0  0    0   59  402 15055   0  0  2 16  0  0  2  2 96  1  0
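The advice to look at swap rather than "free" memory can be made concrete. This Python sketch parses the last two lines of `vmstat` output (column names and one sample row) and reports swap activity; the function names and the "approximately available" figure (free + buff + cache) are illustrative assumptions, since not all cached memory is necessarily reclaimable.

```python
def parse_vmstat(output):
    """Map vmstat column names to the integer values of the last sample row."""
    lines = output.strip().splitlines()
    names = lines[-2].split()
    values = [int(value) for value in lines[-1].split()]
    return dict(zip(names, values))


def memory_summary(stats):
    """Summarize memory pressure from one vmstat sample (units follow -S)."""
    return {
        # Rough upper bound on memory the kernel could hand to applications.
        "approx_available": stats["free"] + stats["buff"] + stats["cache"],
        # Non-zero si/so means pages are actively moving to or from swap.
        "swapping": stats["si"] > 0 or stats["so"] > 0,
        "swap_used": stats["swpd"],
    }
```

For the sample above, free memory is only 59 MB, yet no swap is in use and nothing is being swapped in or out, so the host is not short of memory.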
  • Slide 38 of 64 © | Check OS Resources
      ▪ System log files can reveal resource issues:
        root@soahost1:/root> cat /var/log/messages
        Aug 31 20:53:22 uslx286 sshd[22480]: fatal: setresuid 10000: Resource temporarily unavailable
      ▪ Too many running processes can exhaust system resources:
        root@soahost1:/root> ps -A | wc -l
        297
      ▪ Too many open files can exhaust system resources:
        root@soahost1:/root> lsof | wc -l
        6064
  • Slide 39 of 64 © | JVM Performance Tuning
      ▪ For performance, consider the following:
          ▪ Switching from Sun JDK to JRockit JDK
          ▪ Optimizing JVM settings
      ▪ Additional JVM performance tuning documentation from Oracle can be found at:
        http://docs.oracle.com/cd/E23943_01/web.1111/e13814.pdf
        http://docs.oracle.com/cd/E15289_01/doc.40/e15060.pdf
  • Slide 40 of 64 © | JVM Logging
      ▪ Add this to the PORT_MEM_ARGS argument in the setSOADomainEnv.sh (.cmd) script:
        -XX:+HeapDumpOnOutOfMemoryError
      ▪ Although this is not a performance setting, I recommend it so that the heap is dumped to an hprof file when java.lang.OutOfMemoryError exceptions are thrown
      ▪ This is useful for later analysis and troubleshooting
  • Slide 41 of 64 © | Check JVM
      ▪ Ensure that the heap allocated to the JVM is appropriately sized (that is, compare heap versus non-heap usage)
      ▪ Ensure that there is no excessive garbage collection
      ▪ Monitor JVM thread performance
  • Slide 42 of 64 © | Check Data Sources
      ▪ Data source errors are usually easy to identify: when a data source is exhausted, errors show up everywhere
  • Slide 43 of 64 © | Check Database Performance
      ▪ Involve a DBA who is familiar with the platform.
  • Slide 44 of 64 © | Viewing Performance Summary Graphs
      ▪ Navigate to Monitoring > Performance Summary
      ▪ Can choose metrics to display for any composite
  • Slide 45 of 64 © | Viewing Request Processing Metrics
      ▪ Right-click on Monitoring > Request Processing
      ▪ Utilizing SQL queries is so much better
  • Slide 46 of 64 © | Composite Instance Performance
      ▪ Remember the SQL output from the last page?
      ▪ Let’s also get the invoke durations
      SELECT composite_instance_id,
             composite_creation_date,
             component_name,
             action,
             component_state,
             TO_CHAR((TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),12,2))*60*60) +
                     (TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),15,2))*60) +
                     TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),18,4)),'999990.000') duration
      FROM mediator_instance
      WHERE component_name = 'Order.Create'
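The duration arithmetic in the query is easier to follow with a worked example. The Python sketch below mirrors the three SUBSTR calls, assuming that subtracting the two timestamp columns yields an interval rendered as `+DDDDDDDDD HH:MI:SS.FFFFFF` (Oracle's usual text form for a day-to-second interval); the helper names are hypothetical.

```python
def substr(text, start, length):
    """Oracle-style SUBSTR: 1-based start position, fixed length."""
    return text[start - 1:start - 1 + length]


def duration_seconds(interval_text):
    """Recompute the query's duration expression for one interval string."""
    hours = int(substr(interval_text, 12, 2))      # HH
    minutes = int(substr(interval_text, 15, 2))    # MI
    seconds = float(substr(interval_text, 18, 4))  # SS.F (one decimal kept)
    return hours * 3600 + minutes * 60 + seconds


# One minute and 2.345678 seconds between created_time and updated_time.
print(duration_seconds("+000000000 00:01:02.345678"))
```

Two consequences follow from the character positions under that assumption: the days portion of the interval is ignored, so instances running longer than 24 hours are under-reported, and the seconds are cut to one decimal place even though the output mask shows three.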
  • Slide 47 of 64 © | DEPLOYMENT ISSUES
  • Slide 48 of 64 © | Understanding the Ant Deployment Process {we are not using Ant, but having this info won't hurt}
      ▪ Involves:
      1. Compilation
         ant -f ant-sca-package.xml package -DcompositeDir=$CODE/HelloWorld -DcompositeName=HelloWorld -Drevision=1.0
      2. Deployment
         ant -f ant-sca-deploy.xml deploy -DserverURL=$SOAURL/soa-infra/deployer -Duser=$USERNAME -Dpassword=$PASSWORD -DsarLocation=$CODE/HelloWorld/deploy/sca_HelloWorld_rev1.0.jar -Dpartition=default -Doverwrite=true -DforceDefault=true
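The two commands are linked by the SAR file: packaging writes it and deployment reads it. The Python sketch below builds both argument lists from the same inputs so the path cannot drift between steps. It assumes the SAR name follows the `sca_<name>_rev<revision>.jar` pattern seen in the example; the function names are hypothetical and the property names are the ones shown on the slide.

```python
def sar_location(composite_dir, composite_name, revision):
    """Path of the SAR, assuming the naming pattern from the slide."""
    return f"{composite_dir}/deploy/sca_{composite_name}_rev{revision}.jar"


def package_command(composite_dir, composite_name, revision):
    """Argument list for the compilation (packaging) step."""
    return [
        "ant", "-f", "ant-sca-package.xml", "package",
        f"-DcompositeDir={composite_dir}",
        f"-DcompositeName={composite_name}",
        f"-Drevision={revision}",
    ]


def deploy_command(server_url, user, password, sar, partition="default"):
    """Argument list for the deployment step."""
    return [
        "ant", "-f", "ant-sca-deploy.xml", "deploy",
        f"-DserverURL={server_url}/soa-infra/deployer",
        f"-Duser={user}",
        f"-Dpassword={password}",
        f"-DsarLocation={sar}",
        f"-Dpartition={partition}",
        "-Doverwrite=true",
        "-DforceDefault=true",
    ]
```

The lists can be handed to `subprocess.run` on the client machine; keeping them as lists rather than one shell string avoids quoting problems with passwords.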
  • Slide 49 of 64 © | Understanding the Ant Compilation Process
      ▪ Compilation is done via the package target in ant-sca-package.xml
      ▪ The package target calls other targets to perform:
      1. Cleanup
      2. Validation
      3. Compilation
  • Slide 50 of 64 © | Compilation: The init Target
      ▪ Removes any existing SAR files
      clean:
           [echo] deleting /u01/svn/HelloWorld/deploy/sca_HelloWorld_rev1.0.jar
  • Slide 51 of 64 © | Compilation: The scac-validate Target
      ▪ Sets environment variables and validates all resources within the code
      scac-validate:
           [echo] Running scac-validate in /u01/svn/HelloWorld/composite.xml
           [echo] oracle.home = /u01/app/oracle/middleware/Oracle_SOA1/bin/..
          [input] skipping input as property compositeDir has already been set.
          [input] skipping input as property compositeName has already been set.
          [input] skipping input as property revision has already been set.
  • Slide 52 of 64 © | Compilation: The scac Target
      ▪ Compiles the code
      scac:
           [scac] Validating composite "/u01/svn/HelloWorld/composite.xml"
           [scac] error: location . Load of wsdl "HelloWorldWebService.wsdl with Message part element undefined in wsdl [file:/u01/svn/HelloWorld/ .
           [echo]
           [echo] ERROR IN TRYCATCH BLOCK:
           [echo] /u01/scripts/build.soa.xml:112: The following error occurred while executing this line: .
           [echo] /u01/app/oracle/middleware/Oracle_SOA1/bin/ant-sca-compile.xml:269: Java returned: 1 Check log file : /tmp/out.err for errors
  • Slide 53 of 64 © | Types of Errors
      ▪ Understand that ant runs on the client machine, not the SOA server
        [echo] /u01/app/oracle/middleware/Oracle_SOA1/bin/ant-sca-deploy.xml:188: java.lang.OutOfMemoryError: PermGen space
      ▪ For compilation errors, check out.err and understand adf-config.xml
        oracle.fabric.common.wsdl.SchemaBuilder.loadEmbeddedSchemas(SchemaBuilder.java:492)
        Caused by: java.io.IOException: oracle.mds.exception.MDSException: MDS-00054: The file to be loaded oramds:/apps/Common/HelloWorld.xsd does not exist.
      ▪ Deployment errors are usually straightforward
        [deployComposite] INFO: Creating HTTP connection to host:soahost1, port:8001
        [deployComposite] java.net.UnknownHostException: soahost1
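The three error types above each leave a distinctive string in the ant output or in out.err, so a build script can point the engineer in the right direction. A minimal Python sketch follows; the hint wording and the function name `explain_build_error` are assumptions, and the marker strings come from the examples on the slide.

```python
# Marker strings come from the slide; the hints are illustrative wording.
BUILD_ERROR_HINTS = [
    ("java.lang.OutOfMemoryError: PermGen space",
     "client-side memory: ant runs on the client machine, so tune the "
     "client JVM (for example via ANT_OPTS), not the SOA server"),
    ("MDS-00054",
     "compilation: an oramds:/ reference could not be loaded; check "
     "out.err and the MDS settings in adf-config.xml"),
    ("java.net.UnknownHostException",
     "deployment: the target host name could not be resolved; check the "
     "serverURL value"),
]


def explain_build_error(output):
    """Return hints for every known marker present in the build output."""
    return [hint for marker, hint in BUILD_ERROR_HINTS if marker in output]
```

An empty result means none of the known markers were present, in which case out.err is still the first place to look.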
  • Slide 54 of 64 © | Location of out.err
      ▪ On Unix/Linux: /tmp/out.err
      ▪ On Microsoft Windows: C:\Users\[user]\AppData\Local\Temp\out.err
  • Slide 55 of 64 © | OTHER STUFF
  • Slide 56 of 64 © | The DMS Spy Servlet
      ▪ The DMS Spy Servlet displays current Dynamic Monitoring Service (DMS) metrics
      ▪ Navigate to http://<host>:<soaport>/dms/Spy
        http://docs.oracle.com/cd/E15586_01/core.1111/e10108/monitor.htm#CFAHIAIB
  • Slide 57 of 64 © | Check Event Delivery Network (EDN)
      ▪ The EDN Database Debug Log can be accessed at: http://<host>:<soaport>/soa-infra/events/edn-db-log
      ▪ Setting the oracle.integration.platform.blocks.event.saq logger to TRACE:32 makes the body of the event message available in the EDN trace
  • Slide 58 of 64 © | SUMMARY
  • Slide 59 of 64 © | Summary
      ▪ Troubleshooting is part politics, part product knowledge
      ▪ Oracle SOA Suite 11g errors can mostly be classified into:
          ▪ Runtime (or infrastructure) errors
          ▪ Performance issues/errors
          ▪ Deployment errors
  • Slide 60 of 64 © | Summary
      ▪ For infrastructure errors:
          ▪ Identify whether it is a composite or an infrastructure error
          ▪ Consider increasing logger levels
          ▪ Identifying the root cause of stuck threads may require some drill-down investigation
  • Slide 61 of 64 © | Summary
      ▪ For performance issues:
          ▪ Identify whether it is a server-wide performance issue or specific to a single composite
          ▪ Check overall system health, even the obvious areas
          ▪ Composite instance performance metrics are easily obtained through SQL; for OSB/Paris, run SoapUI unit tests
  • Slide 62 of 64 © | Summary
      ▪ For deployment errors:
          ▪ Understand the ant compilation (i.e., packaging) and deployment processes
          ▪ Understand adf-config.xml
  • Slide 63 of 64 © | Oracle SOA Suite 11g Administrator’s Handbook
      ▪ http://www.packtpub.com/oracle-soa-suite-11g-administrators-handbook/book
      ▪ Chapter 6: Troubleshooting the Oracle SOA Suite 11g Infrastructure
      ▪ Highly recommended book
  • Slide 64 of 64 | Contact Information
      Amit Deo, Senior Consultant, amitsdeo@gmail.com