Oracle SOA Suite 11g Troubleshooting Methodology

Collaborate 13

Published in: Technology

  1. Oracle SOA Suite 11g Troubleshooting Methodology
     April 10th, 2013, 16:15-17:15, Mile High Ballroom 3C
     Harold Dost III, Senior Consultant, Raastech, Inc.
  2. Agenda
     Slide 2 of 64 © Raastech, Inc. 2012 | All rights reserved.
     1. Introduction
     2. The Problem
     3. The Art of Troubleshooting: Where Do You Start?
     4. Infrastructure Issues
     5. Performance Issues
     6. Deployment Issues
     7. Summary
  3. INTRODUCTION
  4. About Me: Harold Dost III
     • 5+ years of Oracle middleware experience
     • Experience in large implementations involving SOA Suite, BAM, AIA, OSB, OSR, ODI, OWSM, OER, OEG, and more
     • OCE (SOA Foundation Practitioner)
  5. THE PROBLEM
  6. How Every Large Company Troubleshoots
     • The Macy’s support team had an exceedingly difficult time pinpointing the specific cause of the problem.
     • Not only did the team involve representatives for each IT functional area, they had no way to troubleshoot from the source, and no one team had visibility of the complete picture.
     • In general, resolving problems took the Macy’s melded support team multiple days.
     Source: http://www.splunk.com/web_assets/pdfs/secure/Troubleshooting_Critical_Applications.pdf
  8. Problem With Troubleshooting Integrations
     • In the past, network admins were to blame for everything.
  9. Problem With Troubleshooting Integrations
     • In the 21st century, the integration folks are the new target.
  10. Problem With Troubleshooting Integrations
      • Numerous touch points
      • Numerous SOA technologies
      • Focus of this presentation is on Oracle SOA Suite 11g
      [Diagram: Web Application, OEG, OSB, SOA Suite, OSB, and ODI, with touch points numbered 1 to 4]
  11. Real World Scenario – Bizarre Behaviour
      • We created an Ant wrapper script that loops through and deploys all composites
      • It calls the deploy target in ant-sca-deploy.xml
      • It always fails with OutOfMemoryError: PermGen space after exactly 66 composite deployments
      • Weird… but at least consistent
  12. Real World Scenario – Vague & Unclear
      • The infamous and ever misleading “Unable to access the following endpoints” error
  13. Real World Scenario – Vague & Unclear
      • Could be:
          • Caused by: java.net.SocketTimeoutException: Read timed out
          • Message send failed: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
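The point of the two slides above is that the generic endpoint error hides a more specific root cause deeper in the log. A minimal Python sketch of that lookup follows. The `KNOWN_CAUSES` table and the `diagnose` function are hypothetical names introduced here for illustration, and the hints are short readings of the two exceptions quoted on the slide, not output of any Oracle tool.

```python
import re

# Hypothetical table: root-cause exception names quoted on the slide,
# mapped to a short hint. Extend it with causes seen in your own logs.
KNOWN_CAUSES = {
    "SocketTimeoutException": "target endpoint did not answer in time (read timed out)",
    "SunCertPathBuilderException": "certificate chain of the target is not trusted (PKIX path building failed)",
}


def diagnose(log_text):
    """Return (exception_name, hint) for the first known root cause in log_text."""
    for name, hint in KNOWN_CAUSES.items():
        # Word boundaries avoid matching a longer, unrelated class name.
        if re.search(r"\b" + re.escape(name) + r"\b", log_text):
            return name, hint
    return None, "no known root cause found; read the full stack trace"
```

Feeding it the log text that surrounds an "Unable to access the following endpoints" message gives a first hint about whether to look at the network or at the trust store.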
  14. THE ART OF TROUBLESHOOTING: WHERE DO YOU START?
  15. What is Troubleshooting?
      • Part skill
          • Some people have a natural tendency to pinpoint problem areas
          • Can be learned; usually involves a methodical approach and logic
      • Part knowledge
          • Without understanding the product, it doesn’t matter how smart you are
          • Most frustrating when the problem is in an area we don’t know
  16. Existing Resources
      • Co-workers
      • Internet searches
      • OTN discussion forums: http://support.oracle.com
      • My Oracle Support: http://support.oracle.com
      • Oracle Troubleshooting Guide: http://docs.oracle.com/cd/E15586_01/fusionapps.1111/e14496/soa_trouble.htm
      • Oracle SOA Suite 11g Administrator’s Handbook: http://www.packtpub.com/oracle-soa-suite-11g-administrators-handbook/book
  17. Start Somewhere – Narrow Down Problem Area
      [Diagram: Issues branch into Performance (server-wide, service-specific), Runtime (composite, infrastructure), and Deployment]
  18. INFRASTRUCTURE ISSUES
  19. Troubleshooting the Infrastructure
      • Could be a server issue
      • Could be a coding issue
      • Could be a business fault that should be handled by the code
      • Must be able to differentiate between infrastructure errors and composite instance errors
  20. Troubleshooting the Infrastructure
      1. Use logs
      2. Use thread dumps
  21. 1. Using Logs
      • The soa_server1.out log file records most runtime issues
      • Must differentiate between infrastructure errors and composite instance errors
  22. Example: Infrastructure Error
      • Random crashes immediately after go-live
      • Only happened in Production
      • No warning signs
      • Error does not appear on the EM console
          <Aug 5, 2011 12:00:02 AM EDT> <Error> <oracle.soa.bpel.engine.dispatch> <BEA-000000> <failed to handle message
          javax.ejb.EJBException: EJB Exception: java.lang.StackOverflowError...
  23. Example: Business Fault
      • Often easy to distinguish
      • Should be handled by the code
      • Shows as a faulted instance on the EM console
          <Aug 6, 2011 10:10:33 AM EDT> <Error> <oracle.soa.mediator.serviceEngine> <BEA-000000> <Got an exception: oracle.fabric.common.FabricInvocationException: javax.xml.ws.soap.SOAPFaultException: Message: Organization 129024 not found.
          Stack trace: at Core.WebServices.Message.MessageWebService.SaveNotification(Organization organization, Notification notification) in c:Data1.0CoreMessageMessageWebService.svc.cs:line 100, detail=javax.xml.ws.soap.SOAPFaultException:
  24. Example: System Fault (but not your fault!)
      • Thrown by external system
      • Shows as a faulted instance on the EM console
      • No action needed on your side; follow up with the target system
          <Aug 6, 2011 10:10:33 AM EDT> <Error> <oracle.soa.mediator.serviceEngine> <BEA-000000> <Got an exception: oracle.fabric.common.FabricInvocationException: javax.xml.ws.soap.SOAPFaultException: CreateCustomer failed with Message: Cannot insert the value NULL into column 'CustomerID', table '@Customers'; column does not allow nulls. INSERT fails.
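The three examples above share one log layout, so a first-pass triage can be scripted. The sketch below is a rough heuristic in Python, not an Oracle-provided classifier: `parse_entry` and `classify` are hypothetical helpers, the regular expression only assumes the `<timestamp> <severity> <subsystem> <BEA-id> <message` shape visible in the excerpts, and telling a business fault from a system fault still needs a person to read the message.

```python
import re

# Matches the entry shape shown in the slide excerpts:
# <timestamp> <severity> <subsystem> <BEA-id> <message...
ENTRY = re.compile(r"<([^>]*)> <(\w+)> <([^>]*)> <(BEA-\d+)> <(.*)", re.DOTALL)


def parse_entry(text):
    """Split one log entry into its fields, or return None if it does not match."""
    match = ENTRY.match(text.strip())
    if not match:
        return None
    timestamp, severity, subsystem, message_id, message = match.groups()
    return {
        "timestamp": timestamp,
        "severity": severity,
        "subsystem": subsystem,
        "message_id": message_id,
        "message": message,
    }


def classify(entry):
    """Heuristic only: JVM-level errors point at the infrastructure,
    SOAP faults point at a fault returned by a service."""
    message = entry["message"]
    if "StackOverflowError" in message or "OutOfMemoryError" in message:
        return "infrastructure"
    if "SOAPFaultException" in message:
        return "service fault"
    return "unclassified"
```

Entries that come back as "service fault" are the ones to look up as faulted instances on the EM console; "infrastructure" entries are the ones that will not appear there.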
  25. Example: System Fault
      • The infamous and ever misleading “Unable to access the following endpoints” error
  26. Example: System Fault
      • In this case, due to:
          Message send failed: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
  27. Example: Coding or Infrastructure Problem?
      • Just an infrastructure warning
      • Threads would eventually clear themselves up
      • Does not show on the EM console
      • Due to a failed transaction that continues to retry
          <Sep 30, 2011 11:30:04 PM EDT> <Warning> <oracle.integration.platform.instance.store.async> <BEA-000000> <Unable to allocate additional threads, as all the threads [10] are in use. Threads distribution : Fabric Instance Activity = 1,Fabric-Instance-Manager = 9,>
  28. Modifying Logger Levels
      • A lot more information is logged in the soa_server1-diagnostic.log file
  29. Modifying Logger Levels
      • A lot more information is logged in the soa_server1-diagnostic.log file
          [2012-01-01T22:35:56.144-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter] [ecid: cb680017c6a0acfe:-3f1527ec:13487d1ea4c:-8000-0000000000000fe1,0:2] JmsProducer_execute:[default destination = jndi/CustomerJMSQueue]: Successfully produced message.
          [2012-01-01T22:35:56.256-05:00] [soa_server1] [NOTIFICATION] [] [oracle.soa.adapter] [ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0] JMSAdapter JMSConsumer JMSMessageConsumer_consume: Got message with ID ID:<458362.1325475356144.0> from destination jndi/CustomerJMSQueue
          [2012-01-01T22:35:56.261-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter] [ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0] JMS Adapter JMSProducer:CustomerJMS [ CustomerProduce_ptt::CustomerProduce(body) ] XMLHelper_convertJmsMessageHeadersAndPropertiesToXML: <JMSInboundHeadersAndProperties xmlns="http://xmlns.oracle.com/pcbpel/adapter/jms/">[[
          <JMSInboundHeaders>
          <JMSMessageID>ID:&lt;458362.1325475356144.0></JMSMessageID>
          <JMSTimestamp>1325475356144</JMSTimestamp>
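Each diagnostic log entry above carries an `ecid` field, and entries that share one can be read together. The following Python sketch groups entries by that field. It assumes only what the excerpt shows (a leading `[timestamp]` and an `[ecid: ...]` field whose value ends at the first comma); `group_by_ecid` is a hypothetical helper, and real diagnostic logs may carry additional fields that this sketch simply ignores.

```python
import re
from collections import defaultdict

TIMESTAMP = re.compile(r"^\[([^\]]+)\]")   # leading [2012-01-01T22:35:56.144-05:00]
ECID = re.compile(r"\[ecid: ([^,\]]+)")    # value up to the first comma or bracket


def group_by_ecid(lines):
    """Map each ECID to the timestamps of the entries that carry it."""
    groups = defaultdict(list)
    for line in lines:
        timestamp = TIMESTAMP.match(line)
        ecid = ECID.search(line)
        if timestamp and ecid:  # continuation lines (XML payload) are skipped
            groups[ecid.group(1)].append(timestamp.group(1))
    return dict(groups)
```

Applied to the excerpt, the consume and convert entries end up under the same key, while the produce entry sits under its own.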
  30. 2. Using Thread Dumps
      • When a managed server goes into warning state, what are you supposed to do?
  31. Understanding Stuck Threads
      • Navigate to Servers > (managed server) > Monitoring > Threads
  32. Understanding Stuck Threads
      • AdminServer.log
          ####<Dec 23, 2011 6:03:49 PM EST> <Error> <WebLogicServer> <soahost1> <AdminServer> <BEA-000337> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "658" seconds
      • bam_server1.log
          ####<Dec 23, 2011 5:53:36 PM EST> <Error> <JMX> <soahost1> <bam_server1> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1324680816405> <BEA-149500> <An exception occurred while registering the MBean com.bea:Name=AdminServer,Type=WebServiceRequestBufferingQueue,WebServiceBuffering=AdminServer,Server=AdminServer,WebService=AdminServer.
          java.lang.OutOfMemoryError: PermGen space
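Stuck-thread entries like the AdminServer.log line above follow a recognizable pattern, so they are easy to pull out of a large log. A small Python sketch follows. The pattern is derived only from the single excerpt on the slide, and `find_stuck_threads` is a hypothetical helper name.

```python
import re

# Pattern taken from the excerpt: [STUCK] ExecuteThread: '0' for queue:
# 'weblogic.kernel.Default (self-tuning)' has been busy for "658" seconds
STUCK = re.compile(
    r"\[STUCK\] ExecuteThread: '(\d+)' for queue: '([^']+)' "
    r"has been busy for \"(\d+)\" seconds"
)


def find_stuck_threads(log_text):
    """Return a list of (thread_id, queue, busy_seconds) for each stuck-thread entry."""
    return [
        (thread_id, queue, int(seconds))
        for thread_id, queue, seconds in STUCK.findall(log_text)
    ]
```

Sorting the result by `busy_seconds` shows which thread has been stuck longest, which is usually the one worth inspecting first in the Threads monitoring page.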
  33. Understanding Stuck Threads
      1. We found AdminServer to be in the “Warning” state, due to a stuck thread.
      2. We confirmed that there was indeed a stuck “ExecuteThread”, as shown on both the Oracle WebLogic Administration Console and in the AdminServer.log file.
      3. By reviewing the soa_server1.log and bam_server1.log files, we found startup errors in the BAM server log.
      4. The BAM server was unable to register an AdminServer MBean due to the java.lang.OutOfMemoryError exception that was thrown.
  34. PERFORMANCE ISSUES
  35. Server Wide Performance Issues
      • Is logging in to Oracle Enterprise Manager Fusion Middleware Control extremely slow?
      • Are all composite instances taking unusually long to complete?
      • Are the logs or your dehydration database growing unusually quickly?
      • Are you seeing an exceptionally high number of errors in the logs?
  36. Check available disk space
      • Often an overlooked area
          root@soahost1:/root> df -m
          Filesystem  1M-blocks    Used Available Use% Mounted on
          /dev/sda8         996     451       494  48% /
          /dev/sda9      815881  697454     76314  91% /u01
          /dev/sda7         996      36       909   4% /home
          /dev/sda5        1984     138      1744   8% /tmp
          /dev/sda3        1984     283      1598  16% /var
          /dev/sda2        5950    3842      1802  69% /usr
          /dev/sda1          99      12        83  13% /boot
          tmpfs            8023       0      8023   0% /dev/shm
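Because disk space is easy to overlook, the check above lends itself to automation. The Python sketch below parses `df -m` output of the shape shown and flags filesystems at or above a threshold. The function name and the 90% default are assumptions made for illustration, not values from the presentation.

```python
def filesystems_over(df_output, threshold_pct=90):
    """Return (mount_point, use_pct) pairs at or above threshold_pct.

    Expects `df -m` style text: a header row, then six columns per line.
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            # df wraps long device names onto their own line; skip those fragments
            continue
        use_pct = int(fields[4].rstrip("%"))
        if use_pct >= threshold_pct:
            flagged.append((fields[5], use_pct))
    return flagged
```

With the output from the slide, only `/u01` at 91% crosses the default threshold.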
  37. Check CPU, RAM, and I/O
      • The vmstat command easily outputs CPU, memory, and I/O statistics
      • Do not rely on Linux’s reporting of available memory; it is best to look at swap space usage instead
      • Why Linux reports 100% memory usage all the time: http://blog.raastech.com/2008/01/why-linux-reports-100-memory-usage-all.html
          root@soahost1:/root> vmstat -S m
          procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
           r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
           0  0      0     59    402  15055    0    0     2    16    0    0  2  2 96  1  0
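The advice to watch swap rather than free memory can be checked directly from the vmstat columns. The Python sketch below reads the `si`/`so` (swap in/out) and `swpd` columns from output of the shape shown above. `parse_vmstat` and `is_swapping` are hypothetical helpers, and treating any non-zero swap activity as a warning sign is an assumption of this sketch.

```python
def parse_vmstat(output):
    """Map vmstat column names to the integer values of the last sample row."""
    lines = output.strip().splitlines()
    names = lines[1].split()     # second header row: r b swpd free ...
    values = lines[-1].split()   # most recent sample
    return dict(zip(names, (int(v) for v in values)))


def is_swapping(stats):
    """True when memory is being swapped in or out, or swap is already in use."""
    return stats["si"] > 0 or stats["so"] > 0 or stats["swpd"] > 0
```

For the sample on the slide, free memory looks low (59 MB) while swap is untouched, which is the situation the linked blog post describes as normal.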
  38. Check OS Resources
      • System log files can reveal resource issues:
          root@soahost1:/root> cat /var/log/messages
          Aug 31 20:53:22 uslx286 sshd[22480]: fatal: setresuid 10000: Resource temporarily unavailable
      • Too many running processes can exhaust system resources:
          root@soahost1:/root> ps -A | wc -l
          297
      • Too many open files can exhaust system resources:
          root@soahost1:/root> lsof | wc -l
          6064
  39. JVM Performance Tuning
      • For performance, consider the following:
          • Switching from Sun JDK to JRockit JDK
          • Optimizing JVM settings
      • Additional JVM performance tuning documentation from Oracle can be found at:
          http://docs.oracle.com/cd/E23943_01/web.1111/e13814.pdf
          http://docs.oracle.com/cd/E15289_01/doc.40/e15060.pdf
  40. JVM Logging
      • Add this to the PORT_MEM_ARGS argument in the setSOADomainEnv.sh(.cmd) script:
          -XX:+HeapDumpOnOutOfMemoryError
      • Although this is not a performance setting, we recommend it so that the heap is dumped to an hprof file whenever a java.lang.OutOfMemoryError exception is thrown
      • This is useful for later analysis and troubleshooting
  41. Check JVM
      • Ensure that the heap allocated to the JVM is appropriately sized (that is, compare heap versus non-heap usage)
      • Ensure that there is no excessive garbage collection
      • Monitor JVM thread performance
  42. Check Data Sources
      • Data source errors are usually easy to identify: when a data source is exhausted, errors show up everywhere
  43. Check Database Performance
      • Involve a DBA
  44. Viewing Performance Summary Graphs
      • Navigate to Monitoring > Performance Summary
      • Can choose metrics to display for any composite
  45. Viewing Request Processing Metrics
      • Right-click on Monitoring > Request Processing
      • Using SQL queries is so much better
  46. Composite Instance Performance
      • Remember the SQL output from the last slide?
      • Let’s also get the invoke durations
          SELECT composite_instance_id,
                 composite_creation_date,
                 component_name,
                 action,
                 component_state,
                 TO_CHAR(
                   (TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),12,2))*60*60) +
                   (TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),15,2))*60) +
                   TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),18,4)),
                   '999990.000') duration
          FROM   mediator_instance
          WHERE  component_name = 'Order.Create'
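The duration expression in the query above is plain substring arithmetic on the text form of a time interval. The Python sketch below mirrors the three SUBSTR calls to make the calculation visible. It assumes the interval renders as `+DDDDDDDDD HH:MI:SS.FF...` (sign, nine day digits, a space, then the time), which is an assumption about the database's default formatting rather than something stated on the slide; `duration_seconds` is a hypothetical name.

```python
def duration_seconds(interval_text):
    """Mirror the query's arithmetic: hours*3600 + minutes*60 + seconds.

    SQL SUBSTR positions are 1-based, so position 12 is Python index 11.
    """
    hours = int(interval_text[11:13])      # SUBSTR(..., 12, 2)
    minutes = int(interval_text[14:16])    # SUBSTR(..., 15, 2)
    seconds = float(interval_text[17:21])  # SUBSTR(..., 18, 4), e.g. "03.4"
    return hours * 60 * 60 + minutes * 60 + seconds
```

Under that assumed format, two things follow from the substring positions: the day part of the interval is never read, and only the first decimal digit of the seconds is kept, so the trailing digits of the '999990.000' mask would always be zero.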
  47. DEPLOYMENT ISSUES
  48. Understanding the Ant Deployment Process
      • Involves:
          1. Compilation
              ant -f ant-sca-package.xml package -DcompositeDir=$CODE/HelloWorld -DcompositeName=HelloWorld -Drevision=1.0
          2. Deployment
              ant -f ant-sca-deploy.xml deploy -DserverURL=$SOAURL/soa-infra/deployer -Duser=$USERNAME -Dpassword=$PASSWORD -DsarLocation=$CODE/HelloWorld/deploy/sca_HelloWorld_rev1.0.jar -Dpartition=default -Doverwrite=true -DforceDefault=true
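A wrapper that loops over many composites needs to assemble these two command lines repeatedly. The Python sketch below builds them as argument lists using only the targets and properties shown on the slide. The function names are hypothetical, and the SAR path follows the `sca_<name>_rev<revision>.jar` pattern seen in the example rather than any documented rule.

```python
def sar_path(code_dir, name, revision):
    """SAR location, following the naming seen in the slide's example."""
    return f"{code_dir}/{name}/deploy/sca_{name}_rev{revision}.jar"


def package_command(code_dir, name, revision):
    """Argument list for the packaging (compilation) step."""
    return [
        "ant", "-f", "ant-sca-package.xml", "package",
        f"-DcompositeDir={code_dir}/{name}",
        f"-DcompositeName={name}",
        f"-Drevision={revision}",
    ]


def deploy_command(soa_url, user, password, sar_location, partition="default"):
    """Argument list for the deployment step."""
    return [
        "ant", "-f", "ant-sca-deploy.xml", "deploy",
        f"-DserverURL={soa_url}/soa-infra/deployer",
        f"-Duser={user}",
        f"-Dpassword={password}",
        f"-DsarLocation={sar_location}",
        f"-Dpartition={partition}",
        "-Doverwrite=true",
        "-DforceDefault=true",
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the directory that holds the Ant scripts. Running every composite as a separate Ant process is a design choice of this sketch; the deck does not say how the original wrapper invoked Ant.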
  49. Understanding the Ant Compilation Process
      • Compilation is done via the package target in ant-sca-package.xml
      • The package target calls other targets to perform:
          1. Cleanup
          2. Validation
          3. Compilation
  50. Compilation: The init Target
      • Removes any existing SAR files
          clean:
               [echo] deleting /u01/svn/HelloWorld/deploy/sca_HelloWorld_rev1.0.jar
  51. Compilation: The scac-validate Target
      • Sets environment variables and validates all resources within the code
          scac-validate:
               [echo] Running scac-validate in /u01/svn/HelloWorld/composite.xml
               [echo] oracle.home = /u01/app/oracle/middleware/Oracle_SOA1/bin/..
              [input] skipping input as property compositeDir has already been set.
              [input] skipping input as property compositeName has already been set.
              [input] skipping input as property revision has already been set.
  52. Compilation: The scac Target
      • Compiles the code
          scac:
               [scac] Validating composite "/u01/svn/HelloWorld/composite.xml"
               [scac] error: location . Load of wsdl "HelloWorldWebService.wsdl with Message part element undefined in wsdl [file:/u01/svn/HelloWorld/ .
               [echo]
               [echo] ERROR IN TRYCATCH BLOCK:
               [echo] /u01/scripts/build.soa.xml:112: The following error occurred while executing this line: .
               [echo] /u01/app/oracle/middleware/Oracle_SOA1/bin/ant-sca-compile.xml:269: Java returned: 1 Check log file : /tmp/out.err for errors
  53. Types of Errors
      • Understand that ant runs on the client machine, not the SOA server
          [echo] /u01/app/oracle/middleware/Oracle_SOA1/bin/ant-sca-deploy.xml:188: java.lang.OutOfMemoryError: PermGen space
      • For compilation errors, check out.err and understand adf-config.xml
          oracle.fabric.common.wsdl.SchemaBuilder.loadEmbeddedSchemas(SchemaBuilder.java:492)
          Caused by: java.io.IOException: oracle.mds.exception.MDSException: MDS-00054: The file to be loaded oramds:/apps/Common/HelloWorld.xsd does not exist.
      • Deployment errors are usually straightforward
          [deployComposite] INFO: Creating HTTP connection to host:soahost1, port:8001
          [deployComposite] java.net.UnknownHostException: soahost1
  54. Location of out.err
      • On Unix/Linux: /tmp/out.err
      • On Microsoft Windows: C:\Users\[user]\AppData\Local\Temp\out.err
  55. OTHER STUFF
  56. The DMS Spy Servlet
      • The DMS Spy Servlet displays live Dynamic Monitoring Service (DMS) metrics
      • Navigate to http://<host>:<soaport>/dms/Spy
      • Reference: http://docs.oracle.com/cd/E15586_01/core.1111/e10108/monitor.htm#CFAHIAIB
  57. Check Event Delivery Network (EDN)
      • The EDN Database Debug Log can be accessed at: http://<host>:<soaport>/soa-infra/events/edn-db-log
      • Changing the oracle.integration.platform.blocks.event.saq logger to TRACE:32 makes the body of the event message available in the EDN trace
  58. SUMMARY
  59. Summary
      • Troubleshooting is part art, part product knowledge
      • Oracle SOA Suite 11g errors can mostly be classified into:
          • Runtime (or infrastructure) errors
          • Performance issues/errors
          • Deployment errors
  60. Summary
      • For infrastructure errors:
          • Identify whether it is a composite or an infrastructure error
          • Consider increasing logger levels
          • Identifying the root cause of stuck threads may require some drill-down investigation
  61. Summary
      • For performance issues:
          • Identify whether it is a server-wide performance issue, or specific to a single composite
          • Check overall system health, even the obvious areas
          • Obtaining composite instance performance metrics is easily done through SQL
  62. Summary
      • For deployment errors:
          • Understand the ant compilation (i.e., packaging) and deployment processes
          • Understand adf-config.xml
  63. Book
      • Oracle SOA Suite 11g Administrator’s Handbook: http://www.packtpub.com/oracle-soa-suite-11g-administrators-handbook/book
      • Chapter 6: Troubleshooting the Oracle SOA Suite 11g Infrastructure
      • “Highly recommended, a tour de force.” ~Mark Nelson, Oracle A-Team
        http://redstack.wordpress.com/2012/10/28/a-review-of-oracle-soa-suite-11g-administrators-handbook/
  64. Contact Information
      Harold Dost III, Senior Consultant
      harold.dost@raastech.com
      @hdost
  65. Evaluation
      Session #: 185, Oracle SOA Suite 11g Troubleshooting Methodology
      ioug.org/eval
