Real Life Java EE Performance Tuning


    • Real Life Java EE Performance Tuning Matt Brasier Principal Consultant C2B2 Consulting LTD [email_address]
    • About Me
      • Professional Services Consultant
      • Customers include
        • Red Hat (JBoss)
        • BEA
        • Cape Clear
        • Government/Finance/Telecoms
      • C2B2 Consulting
        • SOA and Java EE consultancy
        • Fast, Reliable, Manageable, Secure
    • What we will cover
      • Philosophy
        • How I approach a performance problem situation
      • Enterprise Java Performance
        • What kind of things affect performance of Enterprise Systems
      • Case Study 1
        • A new version of the application runs slowly
      • Case Study 2
        • Logging in takes a long time in the live environment
      • Case Study 3
        • The application does not scale
    • What we will learn
      • Philosophy
        • Suggestions to keep in mind when looking at a performance problem
      • Tools
        • Suggested tools for looking at a performance problem
      • Techniques
        • How to use the tools, knowledge and skills to solve your performance problem
    • Philosophy
      • ‘A good understanding’ is the best performance tuning tool
      • Prefer common and open source tools
      • Observe, Hypothesize, Tweak, Test
      • ‘Trust no-one’
    • Classic Java performance problems
      • Memory leaks
        • Increased GC time
      • Poor GC or JVM Memory configuration
      • CPU bound code
      • IO bound code
      • Memory bound code
        • Increased GC time
    • Enterprise Java Performance
      • CAVEAT: Consultancy Selection Bias
      • 80/20: 80% of the time is spent finding the problem, 20% fixing it
      • Many ‘Enterprise’ Java performance problems turn out not to be ‘classic’ performance bottlenecks
        • Infrastructure/Middleware performance
      • There are many factors that can affect the performance of an enterprise system
        • Not just code
    • Enterprise Java Performance
      • Not all Java EE performance problems are classical ‘Java performance problems’
      • Common types of Java EE performance problem
        • Resource starvation
        • Threading problems
        • ‘Suboptimal configuration’
        • Network related problems
        • Scalability problems
    • A Good Understanding
      • Consider the system as a whole
      • Know how infrastructure components work
        • Not just what they do, but how they do it
      • How do the Java EE specifications say they should work?
    • Approach
      • Understand the system
      • Understand the environment
      • Understand the situation
      • Talk to people who know
        • But trust no-one
      • Take a look for myself
      • Observe, Hypothesize, Tweak, Test
        • Rinse and repeat
    • Case Study 1
      • Existing customer calls
        • “We deployed a new version of the application, and it is running a lot slower”
      • The Environment
        • Sun Java 5
        • WebLogic Server 9.2 Cluster (3 nodes)
        • WebLogic Integration 9.2 Cluster (3 nodes)
        • Documentum Document Management
        • Oracle Database
        • Solaris OS
    • Case Study 1
      • The System
        • Web Application
        • WLI based workflow system
      • The situation
        • New version deployed into the performance testing environment
        • Automated performance tests indicate the application is approximately 30% slower
    • Case Study 1
      • Observe
        • No monitoring in place
        • Some alerting, but no historical data
      • Hypothesize
        • If we had more monitoring, we would stand a better chance
      • Tweak
        • Put some monitoring in place
        • Hyperic HQ from SpringSource
    • Case Study 1
      • Test
        • Re-run tests
      • Observe
        • Monitoring indicates that one server is slower
          • Handling fewer requests per second
          • Lots of transaction timeouts
          • Higher CPU
          • Less network traffic
      • Tweak
        • Add more monitoring to the slow server
        • Examine log files
        • Thread dumps!
    • Case Study 1
      • Hypothesize
        • Thread dumps show lots of threads in logging code waiting to write to the log file
        • Log files for the slow server have DEBUG messages in them
          • The other servers don’t
      • “The logging configurations are identical, the servers are configured with Maven”
        • Trust no-one
      • Test
        • Log in to the server and manually check the logging configuration
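The slides do not say which logging framework the application used. As a hedged illustration of checking the logging configuration that is actually in force (rather than the one the build tool is believed to have deployed), here is a sketch that uses java.util.logging purely as a stand-in; the class name `LogLevelAudit` is hypothetical, and the check only means something when run inside each server's own JVM.

```java
import java.util.Collections;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class LogLevelAudit {
    /** Walks up the logger hierarchy to find the level that actually applies to this logger. */
    public static Level effectiveLevel(Logger logger) {
        for (Logger l = logger; l != null; l = l.getParent()) {
            if (l.getLevel() != null) {
                return l.getLevel();
            }
        }
        return Level.INFO; // fallback if no level is set anywhere (the JUL root normally has one)
    }

    /** True if FINE (debug-style) messages would be written by this logger. */
    public static boolean debugEnabled(Logger logger) {
        return logger.isLoggable(Level.FINE);
    }

    public static void main(String[] args) {
        LogManager manager = LogManager.getLogManager();
        for (String name : Collections.list(manager.getLoggerNames())) {
            Logger logger = manager.getLogger(name);
            // getLogger can return null for loggers that have already been garbage collected
            if (logger != null && debugEnabled(logger)) {
                System.out.println("Debug logging ON for '" + name
                        + "' (effective level " + effectiveLevel(logger) + ")");
            }
        }
    }
}
```

Running the same check on every node and comparing the output is one way to spot the odd server out.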
    • Case Study 1
      • Solution
        • Debug logging was enabled on one server
        • Turned debug logging off - the system was then about the same speed as the old release
    • Hyperic HQ
      • Monitoring tool
        • Not a profiling tool
      • Historical data
        • Trends
        • Abnormal behaviour
        • ‘Hot’ spots
      • Wide variety of data
        • JVM level statistics
        • JMX statistics
        • OS statistics
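Hyperic HQ's own API is not shown here. As a rough sketch of the kind of JVM- and OS-level numbers a JMX-based monitor collects, the class below (the name `JvmSnapshot` is hypothetical) reads the standard platform MXBeans from `java.lang.management`. A single sample says little; recording samples at regular intervals is what produces the historical data and trends described above, such as GC time creeping up over time. `getSystemLoadAverage()` returns a negative value on platforms where no load average is available.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

public class JvmSnapshot {
    /** Total time spent in garbage collection since JVM start, summed over all collectors. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** One line of heap, thread, GC and OS statistics. */
    public static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return String.format("heapUsed=%dMB heapCommitted=%dMB threads=%d gcTimeMs=%d loadAvg=%.2f cpus=%d",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, threads.getThreadCount(),
                totalGcTimeMillis(), os.getSystemLoadAverage(), os.getAvailableProcessors());
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a timestamped sample every second; a real monitor would store these.
        for (int i = 0; i < 5; i++) {
            System.out.println(System.currentTimeMillis() + " " + snapshot());
            Thread.sleep(1000);
        }
    }
}
```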
    • Thread Dumps
      • My number 2 tool for finding performance problems
        • Ctrl-Break on Windows
        • kill -3 on Unix/Linux
        • The jstack tool
        • Available from the consoles of many application servers
      • All threads in the VM and what they are doing at that moment
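Besides Ctrl-Break, kill -3 and jstack, a JVM can also produce a thread dump of itself, which helps when you only have application-level access. A minimal sketch, assuming Java 6 or later (for `ThreadMXBean.dumpAllThreads`) and a hypothetical class name `ThreadDumper`; it formats every frame by hand because `ThreadInfo.toString()` abbreviates long stacks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    /** Returns a jstack-like text dump of every live thread in this JVM. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Only ask for lock details the JVM says it can provide, to avoid UnsupportedOperationException.
        ThreadInfo[] infos = mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName()); // the lock this thread is blocked or waiting on
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```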
    • Thread Dumps
      • A number of thread dumps over time gives a good picture
        • Any operation that appears a lot is a suspect
        • Understand what ‘normal’ thread dumps look like
      • http://java.sun.com/developer/technicalArticles/Programming/Stacktrace/
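The idea of taking several dumps and seeing what keeps appearing can be scripted. The sketch below, assuming Java 8 or later and a hypothetical class `TopFrameSampler`, takes a few in-process dumps a short interval apart and counts how often each method sits at the top of a stack; methods that dominate the count are the suspects. Some of them (idle threads parked in the same wait method, for example) will be perfectly normal, which is why knowing what a normal dump looks like matters.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class TopFrameSampler {
    /** Takes `samples` dumps `intervalMillis` apart and counts each top-of-stack method. */
    public static Map<String, Integer> sample(int samples, long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                StackTraceElement[] stack = info.getStackTrace();
                if (stack.length > 0) {
                    String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                    counts.merge(top, 1, Integer::sum);
                }
            }
            if (i < samples - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Five dumps, two seconds apart; print the ten most frequent top-of-stack methods.
        sample(5, 2000).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```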
    • Thread Dump
    • Thread Dumps
      • Look near the top of each stack
      • Look for stacks with your code in them
      • Look for long stacks
      • Look for deadlocks and other threading issues
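Deadlocks are one threading issue the JVM can detect for you. A hedged sketch using `ThreadMXBean.findDeadlockedThreads()`, which needs Java 6 or later (Java 5 only offers `findMonitorDeadlockedThreads()`, which covers `synchronized` monitors only); `DeadlockCheck` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class DeadlockCheck {
    /** Describes each deadlocked thread, or returns an empty array if there is no deadlock. */
    public static String[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers synchronized monitors and java.util.concurrent locks; returns null if none are deadlocked.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return new String[0];
        }
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // null if the thread has terminated in the meantime
                report.add("\"" + info.getThreadName() + "\" waiting for " + info.getLockName()
                        + " held by \"" + info.getLockOwnerName() + "\"");
            }
        }
        return report.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] deadlocks = findDeadlocks();
        if (deadlocks.length == 0) {
            System.out.println("No deadlocks found");
        }
        for (String line : deadlocks) {
            System.out.println(line);
        }
    }
}
```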
    • The Understanding
      • What does a normal WebLogic thread dump look like?
      • It is not normal to see logging code frequently in a thread dump
      • Lots of threads all waiting on a single lock object is a Bad Thing™
      • If three servers are supposed to do the same thing, their thread dumps should look similar
        • Over time
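Lots of threads waiting on one lock shows up in a dump as a group of BLOCKED threads naming the same monitor. A small sketch, with the hypothetical class name `LockContention`, that counts BLOCKED threads per lock from an in-process dump; in Case Study 1 the busy lock would presumably have been the one guarding the log file. It only sees threads waiting to enter a `synchronized` block: threads parked on `java.util.concurrent` locks are in the WAITING state and are not counted.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.HashMap;
import java.util.Map;

public class LockContention {
    /** Maps each monitor to the number of threads currently BLOCKED trying to enter it. */
    public static Map<String, Integer> blockedThreadsPerLock() {
        Map<String, Integer> counts = new HashMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Report only locks with more than one waiter.
        blockedThreadsPerLock().forEach((lock, n) -> {
            if (n > 1) {
                System.out.println(n + " threads blocked on " + lock);
            }
        });
    }
}
```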
    • Lessons
      • Thread dumps hold a lot of information
      • Infrastructure configuration faults are more common than infrastructure bugs
      • Automated/continuous build and deploy solutions are no silver bullet
        • Check the results yourself
      • Believe your ‘instincts’
    • Case Study 2
      • Customer Call
        • “We deployed our application into the live environment and it takes several minutes for users to log in”
      • Environment
        • Apache web servers
        • WebLogic Portal 8.1 Cluster (2 nodes)
        • Oracle Database
        • Windows Server 2003
        • Bespoke Single Sign On server
    • Case Study 2
      • The System
        • Web application based on WSRP portlets
        • Oracle database storing user data
      • The Situation
        • The first users to log-in in the morning find that it takes several minutes
        • After the first few log-ins, the application runs fine
    • Case Study 2
      • Hypothesize
        • The bespoke Single Sign On server makes me suspicious
          • Bespoke code is tested less
      • Test
        • Turn on debug logging for the SSO implementation
        • Observe timings of log messages
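Observing timings through debug log messages works best when each step logs its own duration. A minimal sketch of such a helper using java.util.logging; the class `StepTimer` and the step names are hypothetical, and the real SSO code from this case study is not shown in the slides.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

public class StepTimer {
    private static final Logger LOG = Logger.getLogger(StepTimer.class.getName());

    /** Runs one step, logs how long it took at FINE (debug) level, and returns the step's result. */
    public static <T> T timed(String step, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            // Logged even if the step throws, so slow failures are visible too.
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            LOG.log(Level.FINE, "{0} took {1} ms", new Object[] {step, elapsedMillis});
        }
    }

    public static void main(String[] args) {
        // Hypothetical steps standing in for the SSO check and the profile load.
        String user = timed("sso.validateToken", () -> "alice");
        int profileSize = timed("profile.load", () -> user.length());
        System.out.println(user + " " + profileSize);
    }
}
```

The FINE messages only appear once both the logger and its handler are set to FINE or lower; the default console handler is at INFO.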
    • Case Study 2
      • Observe
        • The logs indicate that the SSO log-in is proceeding as expected
        • It appears that loading the user’s profile data from the database is taking a long time
      • Hypothesize
        • TCP timeouts when connecting to the database due to a firewall
    • Case Study 2
      • Test
        • Observe the connection pool statistics in the WebLogic console
        • The console indicates that a large number of connections have been opened during the time the application has been running
          • Connections are not normally closed and re-opened
        • See how long you need to leave the system before the problem occurs
    • Case Study 2
      • Solution
        • Discussions with the networking team revealed a firewall configured to silently terminate network connections that had been idle for 60 minutes
        • Set WebLogic to test connections after they have been idle for 50 minutes.
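The fix here was a WebLogic connection pool setting, not application code. To show the mechanism, the sketch below is a deliberately minimal, hypothetical pool (`IdleTestingPool`) that re-tests any connection idle for longer than a threshold before handing it out, so a connection the firewall has silently dropped is discovered and replaced instead of stalling a user's log-in. Connections are generic and the current time is passed in explicitly so the logic can be exercised without a database; this is not how WebLogic implements the feature.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Minimal pool sketch: connections idle longer than a threshold are re-tested before reuse. */
public class IdleTestingPool<C> {
    private static final class IdleEntry<T> {
        final T connection;
        final long idleSinceMillis;

        IdleEntry(T connection, long idleSinceMillis) {
            this.connection = connection;
            this.idleSinceMillis = idleSinceMillis;
        }
    }

    private final Deque<IdleEntry<C>> idle = new ArrayDeque<>();
    private final Supplier<C> opener;       // opens a new physical connection
    private final Predicate<C> isAlive;     // e.g. runs a cheap test query against the database
    private final long testAfterIdleMillis; // keep this below the firewall's idle cut-off
    private int openedSinceStart;

    public IdleTestingPool(Supplier<C> opener, Predicate<C> isAlive, long testAfterIdleMillis) {
        this.opener = opener;
        this.isAlive = isAlive;
        this.testAfterIdleMillis = testAfterIdleMillis;
    }

    /** Hands out a connection, re-testing it first if it has been idle for too long. */
    public synchronized C reserve(long nowMillis) {
        while (!idle.isEmpty()) {
            IdleEntry<C> entry = idle.pop();
            boolean needsTest = nowMillis - entry.idleSinceMillis >= testAfterIdleMillis;
            if (!needsTest || isAlive.test(entry.connection)) {
                return entry.connection;
            }
            // Test failed (e.g. silently dropped by a firewall): discard it; a real pool would also close it.
        }
        openedSinceStart++;
        return opener.get();
    }

    public synchronized void release(C connection, long nowMillis) {
        idle.push(new IdleEntry<>(connection, nowMillis));
    }

    /** Steady growth here means connections are being discarded and re-opened. */
    public synchronized int openedSinceStart() {
        return openedSinceStart;
    }

    public static void main(String[] args) {
        // Integers stand in for connections; times are simulated in milliseconds.
        long minute = 60_000L;
        int[] next = {0};
        IdleTestingPool<Integer> pool = new IdleTestingPool<>(() -> ++next[0], c -> false, 50 * minute);
        pool.release(pool.reserve(0), 0);
        System.out.println(pool.reserve(10 * minute)); // idle 10 min: reused without a test, prints 1
        pool.release(1, 10 * minute);
        System.out.println(pool.reserve(71 * minute)); // idle 61 min: tested, fails, replaced, prints 2
    }
}
```

Keeping the test threshold (50 minutes) below the firewall's cut-off (60 minutes) is the important design choice; the opened-since-start counter corresponds to the pool statistic that gave the problem away.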
    • Lessons
      • Consider the system as a whole
        • Hardware
        • Networking
        • OS
        • Middleware
        • Application
    • The Understanding
      • Firewalls are often configured to silently terminate idle TCP connections
      • The TCP protocol requires that a connection is closed by both sides, or times out
        • The timeout is several minutes
      • In a healthy WebLogic connection pool, the number of connections opened since the server started is no higher than the maximum number of connections in the pool
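Because a silently dropped connection is never closed by the other side, an application read waiting for a reply on it can block until TCP eventually gives up. An explicit socket read timeout bounds that wait. The sketch below (hypothetical class `ReadTimeoutDemo`) uses a local peer that accepts a connection but never replies, which only approximates a firewall drop, to show `Socket.setSoTimeout` turning an indefinite wait into a prompt `SocketTimeoutException`. For database connections the equivalent would be `java.sql.Statement.setQueryTimeout` or the driver's own timeout properties, and testing idle connections remains the actual fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /** Reads one byte; returns true if the read gave up after timeoutMillis instead of blocking. */
    public static boolean readTimesOut(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis); // 0, the default, means block indefinitely
        InputStream in = socket.getInputStream();
        try {
            in.read(); // returns data, or -1 if the peer closed the connection cleanly
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        }
    }

    /** Connects to a local peer that accepts but never replies, a rough stand-in for a dropped connection. */
    public static boolean demoAgainstSilentPeer(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {
            return readTimesOut(client, timeoutMillis);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + demoAgainstSilentPeer(500));
    }
}
```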
    • Case Study 3
      • Customer call
        • “It takes about 20 seconds to render a page, and the performance does not scale”
      • Environment
        • WebLogic Portal 9.1 Cluster (2 nodes)
        • Oracle 10g Database
        • Red Hat Enterprise Linux
    • Case Study 3
      • The System
        • Online content delivery system
        • WebLogic Portal with a commercial set of portlets
      • The Situation
        • Two problems
          • Running the performance tests with 20 threads in JMeter is twice as slow as running the tests with 10 threads
          • Viewing a content item takes around 20 seconds
    • Case Study 3
      • Handle the two problems separately
        • They may be related, they may not be
    • Case Study 3
      • Observe
        • Viewing a content item takes around 16 seconds on my laptop
      • Test
        • Is the rendering speed dependent on the browser used?
        • Is the rendering speed dependent on the client machine?
        • What does the page source look like?
    • Case Study 3
      • Observe
        • In Opera the page renders quickly except for the table of contents on the left
        • In Firefox, the whole page renders at the same time
        • The page renders faster in IE and Opera than in Firefox
        • The page renders faster on faster machines
        • There is a lot of JavaScript, and AJAX is used to load the table of contents
    • Case Study 3
      • Hypothesize
        • The AJAX rendering of the TOC is taking a long time, and slowing down the whole page load
      • Tweak
        • Remove the TOC from the page
        • Disable JavaScript in the browser
      • Test
        • The page renders in less than 2 seconds
    • Case Study 3
      • Hypothesize
        • JMeter does not execute the JavaScript, so the poor performance seen in JMeter is not related to the slow page load
    • Case Study 3
      • Solution 1
        • The portlet developers used AJAX to render the table of contents for each content item; this is much slower than just constructing the table of contents on the server side
        • Rewrite the portlet to construct the table of contents on the server side
        • Developers sometimes select a technology to enhance their CVs, not to implement a business requirement
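Building the table of contents on the server means it arrives as ready-made HTML with the rest of the page. A sketch of that rendering step, assuming a hypothetical `Heading` model (level, anchor, title), since the slides do not describe the portlet's content structure or templating; it emits properly nested `<ul>`/`<li>` lists and escapes the text.

```java
import java.util.Arrays;
import java.util.List;

public class TableOfContents {
    /** Hypothetical content model: one heading of a content item (level 1 = top level). */
    public static final class Heading {
        final int level;
        final String anchor;
        final String title;

        public Heading(int level, String anchor, String title) {
            this.level = level;
            this.anchor = anchor;
            this.title = title;
        }
    }

    /** Renders headings as nested HTML lists on the server. */
    public static String render(List<Heading> headings) {
        StringBuilder html = new StringBuilder();
        int depth = 0; // number of currently open <ul> elements
        for (Heading h : headings) {
            // Never skip a level: a heading more than one level deeper is nested just one level down.
            int level = Math.max(1, Math.min(h.level, depth + 1));
            if (level > depth) {
                html.append("<ul>"); // open a nested list inside the still-open <li>
                depth++;
            } else {
                html.append("</li>"); // close the previous item
                while (depth > level) {
                    html.append("</ul></li>"); // close nested lists and their parent items
                    depth--;
                }
            }
            html.append("<li><a href=\"#").append(escape(h.anchor)).append("\">")
                .append(escape(h.title)).append("</a>");
        }
        if (depth > 0) {
            html.append("</li>");
            while (depth > 1) {
                html.append("</ul></li>");
                depth--;
            }
            html.append("</ul>");
        }
        return html.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList(
                new Heading(1, "intro", "Introduction"),
                new Heading(2, "scope", "Scope"),
                new Heading(1, "summary", "Summary"))));
    }
}
```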
    • Case Study 3
      • Problem 2 – Scalability
      • Observe
        • Running the tests in JMeter with 10 users, each page response takes 5s
        • Running the tests with 20 users, each page response takes 12s
        • JMeter is being run on an old laptop, which is at 100% CPU in both cases
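A closed-loop load test like these JMeter runs measures response time at the client, so anything that slows the client inflates the result. The sketch below, with the hypothetical class `ClosedLoopLoad`, runs N simulated users issuing requests back to back and reports the mean client-side response time; the request is a stand-in `Runnable`, where a real test would issue an HTTP request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ClosedLoopLoad {
    /** Runs `users` threads, each making `requestsPerUser` back-to-back requests; returns the mean in ms. */
    public static double meanResponseMillis(int users, int requestsPerUser, Runnable request) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                results.add(pool.submit(() -> {
                    long totalNanos = 0;
                    for (int i = 0; i < requestsPerUser; i++) {
                        long start = System.nanoTime();
                        request.run(); // the time measured includes any delay caused by the client itself
                        totalNanos += System.nanoTime() - start;
                    }
                    return totalNanos;
                }));
            }
            long sumNanos = 0;
            for (Future<Long> f : results) {
                sumNanos += f.get();
            }
            return sumNanos / 1e6 / ((double) users * requestsPerUser);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for an HTTP request: a fixed 50 ms "server" delay.
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int users : new int[] {10, 20}) {
            System.out.printf("%d users: mean %.1f ms%n", users, meanResponseMillis(users, 20, fakeRequest));
        }
    }
}
```

With the sleeping stub the mean should stay roughly constant for 10 and 20 users, because the client has spare capacity. If each simulated request also consumed significant client CPU on an undersized machine, the mean would rise with the user count even while the server sat idle.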
    • Case Study 3
      • Hypothesize
        • As the test machine is at 100% CPU, it is the performance of JMeter that is being measured, not the performance of WebLogic
      • Observe
        • WebLogic is running at around 2% CPU usage, with many idle threads
    • Case Study 3
      • Tweak
        • Run the test from a number of more modern machines, and make sure each one does not exceed 70% CPU
      • Observe
        • Four machines can each run 20 threads and get responses in 1.5 seconds, and WebLogic is still running at around 5% CPU and not struggling
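The 70% ceiling can be checked automatically before a run's results are trusted. A rough sketch, with the hypothetical class `LoadClientHeadroom`, that uses the operating system's one-minute load average divided by the number of CPUs as a proxy for CPU utilisation. This is an approximation (load average counts runnable threads, not CPU percentage), and the load average is unavailable on some platforms, in which case the check reports no saturation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadClientHeadroom {
    /**
     * True if loadAverage / cpus exceeds maxUtilisation (e.g. 0.7).
     * Returns false when the load average is unavailable (negative).
     */
    public static boolean saturated(double loadAverage, int cpus, double maxUtilisation) {
        if (loadAverage < 0 || cpus <= 0) {
            return false;
        }
        return loadAverage / cpus > maxUtilisation;
    }

    /** Applies the check to the machine this JVM is running on. */
    public static boolean thisMachineSaturated(double maxUtilisation) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return saturated(os.getSystemLoadAverage(), os.getAvailableProcessors(), maxUtilisation);
    }

    public static void main(String[] args) {
        if (thisMachineSaturated(0.7)) {
            System.err.println("WARNING: load client is above 70% busy; results may measure the client, not the server");
        } else {
            System.out.println("Load client has headroom (or load average unavailable)");
        }
    }
}
```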
    • Case Study 3
      • Solution
        • The problem was that the test client was not able to generate the loads requested, resulting in the performance of the test client being measured
        • Use a larger test client
    • Useful tools
      • Ethereal/Wireshark
        • Network traffic sniffer
        • See when requests/responses were sent/received
      • Firebug + YSlow
        • Firefox plugin for performance analysis
    • Lessons
      • Separate problems should initially be prioritised and investigated separately
        • Keep in mind that they may be related
      • Ensure the test system can generate the required load
        • It should have plenty of free resources available
    • Lessons
      • The consultant effect
        • Take a step back
        • Get a fresh perspective
    • The Understanding
      • A slow test client will give slow results
      • Client side rendering is usually less efficient than server side
      • WebLogic is normally fast!
    • What did we learn?
      • Simple tools can provide a lot of information
      • Understanding how the system should behave will help highlight possible causes
      • Experience is vital
        • Write a log of what you find
      • Take a step back from the problem
        • Use a second pair of eyes
    • What did we learn?
      • Philosophy
        • Understand the system as a whole
        • A deep understanding of how it should work
      • Tools
        • Thread dumps
        • Monitoring tools
        • Packet sniffing
      • Techniques
        • Observe, Hypothesize, Tweak, Test
    • Questions
    • Session Evaluation
      • Please complete a session evaluation and hand it in to any conference staff member or at the registration desk. Thank you.