Advertisement

More Related Content

Slideshows for you(20)

Similar to Iib v10 performance problem determination examples(20)

Advertisement

Iib v10 performance problem determination examples

  1. © 2009 IBM Corporation IBM Integration Bus V10 Performance How to analyse your system to optimise performance and throughput Martin Ross - martin.ross@uk.ibm.com
  2. © 2015 IBM Corporation 14 July 2015 Overview  The purpose of this presentation is to demonstrate how to find the cause of poor performance for an IBM Integration Bus node (broker) for two different types of problem.  The examples are obtained on a Windows system but the principles of investigation and problem determination apply equally on all platforms. The system level tools will differ though.
  3. © 2015 IBM Corporation 14 July 2015 Agenda  Introduction  Tools  Techniques  Demonstration
  4. © 2015 IBM Corporation 14 July 2015 What are the main performance costs in message flows? ZYX…CBA Parsing Tree Navigation Root.Body.Level1.Level2. Level3.Description.Line[1]; Tree Copying Set OutputRoot = InputRoot; Resource Access Processing Logic
  5. © 2015 IBM Corporation 14 July 2015 Integration Bus Processes Integration Node Integration Server Application Application Message flows Message flows Libraries Libraries Integration Server [n] Application Application Message flows Message flows Libraries Libraries  Bipservice – Lightweight and resilient process that starts and monitors the bipbroker process – If the bipbroker process fails, bipservice will restart it  Bipbroker – A more substantial process. Contains the deployment manager and administrative agent. All commands, toolkit connections and WebUI go through this process. – Responsible for starting and monitoring the biphttplistener, bipMQTT and DataFlowEngine processes. – If any process fail, bipbroker will restart them.  BipMQTT – Handles MQTT events  Biphttplistener – Runs the brokerwide HTTP connector for HTTP and SOAP nodes.  DataFlowEngine – Runtime engine for all deployed resources.
  6. © 2015 IBM Corporation 14 July 2015 Which resources and how much  Understand typical resource utilisation – need to understand if resource utilisation is higher than expected or running as normal...  In busy times expect to use what is needed (!) – Exactly what will depend on the configuration and the applications – Typical to use CPU and memory plus I/O to some level  In quiet times Message Broker and MQ processes – Should use very little CPU – Should use very little I/O capacity – Will retain memory  Some memory sizes whilst running the Coordinated Request Reply sample – Bipservice 3.7 MB – Bipbroker 112 MB – Biphttplistener 35 MB – DataFlowEngine 154 MB • Can use from ~100 MB to GigaBytes depending on number of flows, complexity of the message flow, the size of the messages  MQ processes – Expect it to be less than IBM Integration Bus (76 MB for a simple queue manager) – Will depend on number of open queues, channels, queue buffer sizes etc.
  7. © 2015 IBM Corporation 14 July 2015 Tools that are needed  Monitoring tools – At the operating system level to observe • System resource usage – CPU, memory, I/O activity • Heaviest resource users – At the component level to observe • Behaviour within the particular component (MQ / IBM Integration Bus) – Both types of tools are needed • They have different views of the world • They are complimentary  Driving tools – Needed to generate a continuous workload • Important to assess performance after warm-up during sustained activity
  8. © 2015 IBM Corporation 14 July 2015 UNIX tools  Vmstat  Iostat  Nmon  filemon System Configuration: lcpu=64 mem=8192MB kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------- r b avm fre re pi po fr sr cy in sy cs us sy id wa 1 0 1977672 25823 0 0 0 0 0 0 3 958 696 4 0 96 0 1 0 1977838 25719 0 2 0 98 100 0 29 2941 2250 4 0 96 0 1 0 1977685 25872 0 0 0 0 0 0 2 636 483 4 0 96 0 System configuration: lcpu=64 drives=5 paths=6 vdisks=2 tty: tin tout avg-cpu: % user % sys % idle % iowait 0.0 29.5 3.6 0.1 96.2 0.0 Disks: % tm_act Kbps tps Kb_read Kb_wrtn hdisk3 0.0 0.0 0.0 0 0 hdisk2 0.0 0.0 0.0 0 0 hdisk0 0.0 4.0 1.0 8 0 hdisk1 0.0 0.0 0.0 0 0 cd0 0.0 0.0 0.0 0 0
  9. © 2015 IBM Corporation 14 July 2015 Windows tools – Process Explorer  Watch system activity in detail on Windows  Watch – CPU usage – Commit charge – I/O activity – Physical memory history – Summary information – Individual processes  Download from https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
  10. © 2015 IBM Corporation 14 July 2015 Process Explorer  DataFlowEngine.exe: – This is the Integration Server  amqzlaa0.exe – This is the MQ agent for LOCAL connections (including the broker)  amqrmppa.exe – This is the MQ agent for CLIENT connections  Can quickly see system is busy  Customise by selecting columns of choice
  11. © 2015 IBM Corporation 14 July 2015 Process Explorer – Task Information
  12. © 2015 IBM Corporation 14 July 2015 Find Integration Server name from PID mqsilist DEMO -d2 ----------------------------------- BIP1286I: Integration server 'default' on integration node 'DEMO' is running. Number of message flows that are enabled to run: '4'. Number of applications that are enabled to run: '2'. Process ID: '8248' UUID: 'e1306ebc-3c3a-43c2-b18a-bbdb99e07d5c' Short description: '' Long description: '' BIP8071I: Successful command completion.----------------------------------- BIP1286I: Integration server 'default' on integration node 'DEMO' is running. Number of message flows that are enabled to run: '4'. Number of applications that are enabled to run: '2'. Process ID: '8248' UUID: 'e1306ebc-3c3a-43c2-b18a-bbdb99e07d5c' Short description: '' Long description: '' BIP8071I: Successful command completion. -bash-4.1$ ps -ef | grep DataFlowEngine mqm 4331 4302 0 13:12 pts/0 00:00:00 grep DataFlowEngine mqm 28788 28701 99 12:57 ? 00:41:56 DataFlowEngine CSIM d2cd939a-a7a3-46ce-8168- b89c77744511 default
  13. © 2015 IBM Corporation 14 July 2015 Key tools at the component level  Integration Bus – User trace – Trace nodes – Activity Log – WebUI • Accounting & Statistics: Compare flow statistics at the node (broker), server (execution group), container (application or library) or at an individual message flow level • Resource Statistics: View resource use at the execution group level  MQ Explorer  Java Healthcenter
  14. © 2015 IBM Corporation 14 July 2015 Statistics scope Node (broker) Server (execution group) Node Message Flow Thread Message Model Terminals Resource Statistics Accounting & Statistics
  15. © 2015 IBM Corporation 14 July 2015 Accounting & Statistics  Dynamic reporting of message flow usage for • Problem determination • Chargeback • Profiling  Data at two levels 1. Universal (CPU cost, IO, elapsed time) 2. Broker specific (messages, commits, backouts)  A Choice of intervals • Short or snapshot • Long or archive  Attach a label to the information – AccountingOrigin to collect data to identify and collate this information according to the specific origin of a message, even in consolidated flows. Choice of destination 1. Trace 2. PubSub 3. SMF (z/OS only)
  16. © 2015 IBM Corporation 14 July 2015 Subscribing to Accounting & Statistics  Publish/Subscribe data is published on the topic $SYS/Broker/brokerName/StatisticsAccounting/recordType/executionGroupLabel/messageFlowLabel A subscription for $SYS/Broker/+/StatisticsAccounting/+ receives all statistics for all brokers Notes: The following three characters have a special meaning: – The topic level separator "/" – The multilevel wildcard "#" – The single-level wildcard "+"
  17. © 2015 IBM Corporation 14 July 2015 Resource Statistics  Dynamic reporting of the performance and operating details of resources used by execution groups • Problem determination • Profiling  Supported resources 1. CICS 2. CORBA 3. FTEAgenet 4. JDBCConnectionPools 5. JVM 6. ODBC 7. SOAPInput 8. Security 9. Outbound Sockets  Data • XML messages distributed using PubSub • Access as raw data or view in WebUI  Regular reporting • Data published approximately every 20 seconds
  18. © 2015 IBM Corporation 14 July 2015 Subscribing to Resource Statistics  Data is published on the topic $SYS/Broker/brokerName/ResourceStatistics/executionGroupLabel A subscription for $SYS/Broker/+/ResourceStatistics/+ receives all statistics for all brokers Notes: The following three characters have a special meaning: – The topic level separator "/" – The multilevel wildcard "#" – The single-level wildcard "+"
  19. © 2015 IBM Corporation 14 July 2015 WebUI – Accounting & Statistics  Using the WebUI in IBM Integration Bus v10: – Control statistics at all levels – Easily view and compare flows, helping to understand which are processing the most messages or have the highest elapsed time – Easily view and compare nodes, helping to understand which have the highest CPU or elapsed times – View all statistics metrics available for each flow – View historical flow data
  20. © 2015 IBM Corporation 14 July 2015 WebUI – Resource Statistics  View resource statistics for resource managers in IIB such as: – JVM – ODBC – JDBC – parsers, etc.
  21. © 2015 IBM Corporation 14 July 2015 MQ Explorer
  22. © 2015 IBM Corporation 14 July 2015 IBM Support Assistant and Java Health Centre  Java Health Centre is provided as part of the IBM Support Assistant – Offers very low overhead monitoring tool – Runs along side an IBM Java application  Get visibility, monitoring and profiling in the following application areas: – Performance • Java method profiling • Lock analysis • Garbage collection – Memory – System – Java Class – File input and – Object  Enable the application JVM prior to use – IBM_JAVA_OPTIONS=-Xhealthcenter
  23. © 2015 IBM Corporation 14 July 2015 Demonstration of analysing performance issues  Identify problems in two message flows using – Process Explorer – WebUI Statistics – MQ Explorer – Java Healthcenter Java Compute TransformCoordinated Request Reply
  24. © 2015 IBM Corporation 14 July 2015 Demonstration 1 Analysing a performance problem in the Coordinated Request Reply Scenario
  25. © 2015 IBM Corporation 14 July 2015 Coordinated Request Reply message flows  Consists of three message flows – Request • Converts incoming message from XML to CWF • Saves the incoming message in a queue for subsequent reply processing • Writes a message for the back end reply message flow – BackendReplyApp • Sets the completion time in the message • Writes a reply message – Reply • Reads the message from the back end message flow • Retrieves the original message saved by the request message flow • Writes an output message
  26. © 2015 IBM Corporation 14 July 2015 Coordinated Request Reply queues  The queues – Request – BackendReplyApp – Reply CSIM_SERVER_IN_Q GET_REPTO_STORE GET_REPTO_STORE GET_BACKEND_REPGET_BACKEND_REQ GET_BACKEND_REQ GET_BACKEND_REP CSIM_COMMON_REPLY_Q
  27. © 2015 IBM Corporation 14 July 2015 Run and investigate Steps 1. Ensure all components are started and the applications works as expected - Message flows, databases, external applications etc. 2. Start a load generator [JMSPerfharness in this case] 3. Look at activity - Is processing happening at the expected rate? - Is CPU usage as expected? - Is memory usage as expected? 4. If things do not seem as expected - Look for build up of messages - Poor service times 5. Enable and view statistics 6. Analyse statistics 7. Examine message flows
  28. © 2015 IBM Corporation 14 July 2015 Step 1 – Check flows are running using the WebUI  Check the server is running  Check the flows are running  Check the event/sys log for any errors  Processing messages and no errors
  29. © 2015 IBM Corporation 14 July 2015 Step 2 – Start a load generator  Run JMSPerfharness – Using 10 threads  All threads start successfully – Each thread PUTs a message then GETs a message so should be no messages on queues for any period of time  Check event/sys log for any error messages
  30. © 2015 IBM Corporation 14 July 2015 Step 3 – Look at CPU activity  Messages being processed but: – Rate is low, much lower than expected – Very little CPU being used • Integration Server does not register any CPU activity
  31. © 2015 IBM Corporation 14 July 2015 Step 4 – Look for a build up of messages  Key queues are  Build up of messages on queues: – GET_REPTO_STORE – GET_BACKEND_REQ  What does this mean? GET_BACKEND_REQ GET_BACKEND_REP CSIM_SERVER_IN_Q GET_REPTO_STORE GET_BACKEND_REQ GET_BACKEND_REP GET_REPTO_STORE CSIM_COMMON_REPLY_Q Request BackendReplyApp Reply
  32. © 2015 IBM Corporation 14 July 2015 Step 4 – Look for a build up of messages...  Looking at the flows – Queue GET_REPTO_STORE is used by Request and Reply message flows – Queue GET_BACKEND_REQ is used by BackendReplyApp message flow – GET_REPTO_STORE is used mid-flow (so flows using this are less likely to be the problem) – GET_BACKEND_REQ is the input queue for the BackendReplyApp • Indicates flow is not running fast enough or not enough instances allocated  Need to investigate what is happening with BackendReplyApp – For this use WebUI flow statistics
  33. © 2015 IBM Corporation 14 July 2015 Step 5 – Enable flow statistics  Start and stop statistics using the WebUI for: – All flows in a server – All flows in a container – Individual flows
  34. © 2015 IBM Corporation 14 July 2015 Step 5 – View statistics  Select the statistics view  Drill down to the problem flow  Start by comparing flows  Flow analysis view for most detail
  35. © 2015 IBM Corporation 14 July 2015 Step 6 – Compare flows  Compare flows to determine which one might be causing the problem  We can see that the BackendReplyApp flow has an average elapsed time of 1,000.9 milliseconds. It only has 1 active thread, and has processed 20 messages in the 20 second statistical snapshot period.  This matches the rate we see in JMSPerfHarness!
  36. © 2015 IBM Corporation 14 July 2015 Step 6 – Analyse the flow  Display historical flow details such as message rate, CPU and elapsed time  View all nodes within the flow to determine and sort by average elapsed and CPU times  The compute node Modify_CompletionTime seems to be a problem!  What does high elapsed time and low CPU time suggest the problem might be?
  37. © 2015 IBM Corporation 14 July 2015 Step 7 – Review the code  Having worked out which node is causing the problem  We can quickly see why the node is taking 1 second elapsed time but little CPU
  38. © 2015 IBM Corporation 14 July 2015 Problem found!!  1 second sleep in the compute node within the message flow is causing slow processing times and no CPU usage – Matches the observations at the start • Low CPU and low message rate  Unlikely to be so easy in future but slow service times, like slow synchronous web service invocations would have the same effect  If it was slow web service response times then allocate more additional instances to improve processing rate
  39. © 2015 IBM Corporation 14 July 2015 Summary of steps for this investigation  Use a systemic approach – Key steps used were 1.Ensure all components are started and the applications works as expected - Message flows, databases, external applications etc. 2.Start a load generator [JMSPerfharness in this case] 3.Look at activity - Is processing happening at the expected rate? - Is CPU usage as expected? - Is memory usage as expected? 4.If things do not seem as expected - Look for build up of messages - Poor service times 5.Enable and view statistics 6.Analyse statistics 7.Examine message flows  It is very important to – Use tools • System level and component level – Start at a high level – system level and then close-in on the problem
  40. © 2015 IBM Corporation 14 July 2015 Demonstration 2 Analysing a performance problem in the Java Compute Transform Scenario
  41. © 2015 IBM Corporation 14 July 2015 JavaComputeTransform message flows  Consists of one message flow – JavaCompute • Reads an XML message • Transforms to a different format using a Java Compute node JAVA_COMPUTE_IN JAVA_COMPUTE_OUT
  42. © 2015 IBM Corporation 14 July 2015 What is the problem we need to solve?  The problem is characterised by – Low message rate – High CPU usage at both system and Integration Server level – Sufficient messages on the input queue  Likely issue is one of high CPU usage in a message flow – But which flow and which node?
  43. © 2015 IBM Corporation 14 July 2015 Compare the flows  All of the elapsed and CPU time is in the JavaCompute message flow, so continue investigation here
  44. © 2015 IBM Corporation 14 July 2015 Finding the processing Node for investigation  The majority of the elapsed and CPU time within the flow is spent in the JavaCompute Node  What might cause this?  As this is a Java Compute Node continue investigation using the Java Healthcenter
  45. © 2015 IBM Corporation 14 July 2015 Find the Integration Server port for Java Health Center Environment variable: IBM_JAVA_OPTIONS=-Xhealthcenter Opens ports starting 1972, the Integration Server running the JavaComputeTransform scneario is using port 1974
  46. © 2015 IBM Corporation 14 July 2015 Alternate method for finding the port number > mqsilist DEMO -d2 ----------------------------------- BIP1286I: Integration server 'default' on integration node 'DEMO' is running. Number of message flows that are enabled to run: '4'. Number of applications that are enabled to run: '2'. Process ID: '7284' UUID: 'e1306ebc-3c3a-43c2-b18a-bbdb99e07d5c' Short description: '' Long description: '' BIP8071I: Successful command completion. > netstat -a -b -n -o [runmqlsr.exe] TCP [::]:1972 [::]:0 LISTENING 7352 [bipbroker.exe] TCP [::]:1973 [::]:0 LISTENING 7748 [biphttplistener.exe] TCP [::]:1974 [::]:0 LISTENING 7284 [DataFlowEngine.exe] TCP [::]:4417 [::]:0 LISTENING 7352 [bipbroker.exe] TCP [::]:49152 [::]:0 LISTENING 924
  47. © 2015 IBM Corporation 14 July 2015 Invoking the Java Health Center
  48. © 2015 IBM Corporation 14 July 2015 Attaching to the Integration Server JVM
  49. © 2015 IBM Corporation 14 July 2015 Connect to a port
  50. © 2015 IBM Corporation 14 July 2015 Connection complete and ready to analyse
  51. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – CPU
  52. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – Classes
  53. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – Environment
  54. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – Garbage Collection
  55. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – I/O
  56. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – Locking
  57. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – Native Memory
  58. © 2015 IBM Corporation 14 July 2015 Analysis and Recommendations – Profiling
  59. © 2015 IBM Corporation 14 July 2015 The cause  Having worked out which node is causing the problem  We can quickly see why the node is consuming a lot of CPU  A call to the method bubble_sort() just before propagating out of the node is sorting the entire output message
  60. © 2015 IBM Corporation 14 July 2015 If you suspect there is a product problem  Identify the problem as best you can  Find the simplest test that recreates the problem  Collect the data identified in the Must Gather list – For IIB http://www-01.ibm.com/support/docview.wss?rs=849&uid=swg21209857 – For MQ: http://www-01.ibm.com/support/docview.wss?uid=swg21229861#MG6
  61. © 2015 IBM Corporation 14 July 2015 Summary  Wide range of tools available covering operating system and component performance – Expect to use multiple tools – After all it is important to understand what is happening at different levels – Demonstration has shown how to use the key tools for MQ and IIB to debug a problem  Practice before hand – Being familiar with the tools is a great help in a crisis – Learning a new tool and solving a crisis is not a good combination  Know your applications and systems – What is normal in terms of processing rate, CPU usage etc. – This information allows to know whether there is a problem and to what extent
  62. © 2015 IBM Corporation 14 July 2015 Additional Information  WebSphere Message Broker: Designing for Performance – http://www-01.ibm.com/support/docview.wss?rs=849&uid=swg24006518  WebSphere Message Broker: Message display, test & performance utilities (IH03) – http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg24000637  IBM Monitoring and Diagnostic Tools for Java – Getting started with Health Center – http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/getting_started.html  IBM Monitoring and Diagnostic Tools for Java – Health Center – http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/  IBM Monitoring and Diagnostic Tools for Java – Knowledge Center – http://www-01.ibm.com/support/knowledgecenter/#!/SS3KLZ/SS3KLZ/welcome_tools_family.html
  63. © 2015 IBM Corporation 14 July 2015 Backup chart  MQ processes  Additional Instances usage and tuning
  64. © 2015 IBM Corporation 14 July 2015 WebSphere MQ V7 Processes Task Function AMQALMPX The checkpoint processor that periodically takes journal checkpoints. AMQZMUC0 Utility manager. This job executes critical queue manager utilities, for example the journal chain manager. AMQZXMA0 The execution controller that is the first job started by the queue manager. It handles MQCONN requests, and starts agent processes to process WebSphere MQ API calls AMQZFUMA Object authority manager (OAM) AMQZLAA0 Queue manager agents that perform most of the work for applications that connect to the queue manager using MQCNO_STANDARD_BINDING. AMQZLAS0 Queue manager agent. AMQZMUF0 Utility Manager AMQZMGR0 Process controller. This job is used to start up and manage listeners and services. AMQZMUR0 Utility manager. This job executes critical queue manager utilities, for example the journal chain manager. AMQZDMAA Deferred Message Processor AMQFQPUB Publish/subscribe process. AMQFCXBA Broker worker job. RUNMQBRK Broker control job. AMQRMPPA Channel process pooling job. AMQCRSTA TCP/IP-invoked channel responder.
  65. © 2015 IBM Corporation 14 July 2015 WebSphere MQ V7 Processes Task Function AMQCRS6B LU62 receiver channel and client connection. AMQRRMFA Repository manager for clusters. AMQCLMAA Non-threaded TCP/IP listener. AMQPCSEA PCF command processor that handles PCF and remote administration requests. RUNMQTRM Trigger monitor. RUNMQDLQ Dead letter queue handler. RUNMQCHI The channel initiator. RUNMQCHL Sender channel job that is started for each sender channel. RUNMQLSR Threaded TCP/IP listener. AMQXSSVN Shared memory servers. AMQRCMLA Channel MQSC and PCF command processor. AMQZTRCN Trace.  Number present at any time will vary – Dependent on configuration, applications running, etc. – Some will always be present, such as AMGZXMA0 (Execution Controller)
  66. © 2015 IBM Corporation 14 July 2015 Additional Instances usage and tuning  Integration Server level data contains the following data for each message flow in it: – MessageFlowName – TotalElapsedTime – MaximumElapsedTime – MinimumElapsedTime – TotalCPUTime – MaximumCPUTime – MinimumCPUTime – CPUTimeWaitingForInputMessage – ElapsedTimeWaitingForInputMessage – TotalInputMessages – TotalNumberOfTimeOutsWaitingForRepliesToAggregateMessages – TotalSizeOfInputMessages – MaximumSizeOfInputMessages – MinimumSizeOfInputMessages – NumberOfThreadsInPool – TimesMaximumNumberOfThreadsReached – TotalNumberOfMQErrors – TotalNumberOfMessagesWithErrors – TotalNumberOfErrorsProcessingMessages – TotalNumberOfCommits – TotalNumberOfBackouts  Fields NumberOfThreadsInPool and TimesMaximumNumberOfThreadsReached show for every message flow the number of additional instances allocated and the number of times they were all used – Use this data to determine if: • More additional instances are required • Too many are allocated
  67. © 2015 IBM Corporation 14 July 2015 Additional Instances usage and tuning  % Time Thread Pool Limit Reached = TimesMaximumNumberOfThreadsReached / TotalInputMessages
Advertisement