Java Application Profiling: Tips and Tricks
● System or Kernel CPU
● Lock Contention
● Volatile Usage
● Data Structure Resizing
● Increasing Parallelism
● Other Tips
Strategies
Next we will talk about Java application profiling tips and tricks: examples of using tools to solve performance issues. In general, an optimization opportunity falls into one of the categories listed above.
Let's first look at system CPU.
Example of a function hotspot showing 65% of total CPU attributed to FileOutputStream.write.
Use a BufferedOutputStream with its default size in place of a bare FileOutputStream; a BufferedOutputStream can also be constructed with an explicit size.
If an explicit size is specified, consider one that is a multiple of the operating system's page size, since operating systems fetch memory most efficiently in page-size multiples.
CPU time was reduced from 45.182 to 6.655 seconds after this change.
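The slides do not show the code itself; a minimal sketch of the change above might look like the following (the file name and record contents are made up for illustration, and 32 KB is one plausible multiple of a typical 4 KB page size):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedWriteDemo {
    // 8192 bytes is BufferedOutputStream's default; an explicit size that is
    // a multiple of a typical 4 KB OS page size is a reasonable choice.
    private static final int BUF_SIZE = 8 * 4096;

    public static void main(String[] args) throws IOException {
        // Hypothetical output file, used only for this sketch.
        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream("out.dat"), BUF_SIZE)) {
            byte[] record = "sample record\n".getBytes();
            for (int i = 0; i < 100_000; i++) {
                out.write(record);  // buffered: far fewer write() system calls
            }
        }
    }
}
```

Wrapping the stream batches many small writes into a few large system calls, which is what removes the kernel-CPU hotspot seen in the profile.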
Use of Java NIO can also improve performance in cases where a large number of connections are open (e.g., a chat application).
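The slides do not include an NIO example; as a sketch, one selector thread can monitor many connections instead of blocking one thread per connection (the port and server setup here are illustrative only):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class NioServerSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));  // ephemeral port for the sketch
        server.configureBlocking(false);        // required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);

        // A real server would loop on selector.select() and dispatch the
        // ready keys; here we only verify the non-blocking setup.
        System.out.println("listening on port " + server.socket().getLocalPort());
        server.close();
        selector.close();
    }
}
```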
Next we will look at lock contention.
Trace the time spent in the __lwp_cond_wait and _lwp_park functions.
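The slides name only the tracing targets, not a fix; one common way to reduce the contention such a trace reveals is to replace a single coarse lock with a finer-grained structure. A minimal sketch, assuming the contended state is a shared map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class ContentionSketch {
    public static void main(String[] args) throws InterruptedException {
        // ConcurrentHashMap uses fine-grained internal locking, so threads
        // updating different bins do not serialize on one monitor the way
        // they would with Collections.synchronizedMap(new HashMap<>()).
        Map<Integer, Integer> counts = new ConcurrentHashMap<>();
        int threads = 4, perThread = 10_000;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge(i % 100, 1, Integer::sum);  // atomic update
                }
                done.countDown();
            }).start();
        }
        done.await();
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("total = " + total);
    }
}
```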
Next we will look at the use of volatile fields.
Reduce false sharing caused by updates to volatile fields in code. In this case, an update to a field cached on core 0 causes a cache reload on core 1.
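The slides do not show the mitigation; one classic (if fragile, since the JVM may reorder fields) technique is to pad hot volatile fields so they land on separate cache lines. A sketch, assuming 64-byte cache lines:

```java
public class PaddedCounterSketch {
    // A volatile write invalidates the whole cache line it lives on. If two
    // independently updated volatiles share a line, cores ping-pong that
    // line between them (false sharing) even though the data is unrelated.
    static class Counter {
        volatile long value;
        // Padding intended to push any following hot field onto a different
        // 64-byte cache line; note the JVM does not guarantee field order.
        long p1, p2, p3, p4, p5, p6, p7;
    }

    public static void main(String[] args) {
        Counter a = new Counter();
        Counter b = new Counter();
        a.value = 1;  // updated by one thread in a real workload
        b.value = 2;  // updated by another thread in a real workload
        System.out.println(a.value + b.value);
    }
}
```

On JDK 8+, the jdk.internal.vm.annotation.Contended annotation (with the appropriate JVM flag) is the supported way to request this layout.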
In this example, 11.4% of all char[] allocations result from calls to StringBuilder's expandCapacity method. This suggests that constructing the builder with an explicit initial capacity could avoid the overhead of reallocating the char[] on each resize.
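The fix above can be sketched as follows (the loop and sizes are illustrative, not from the slides):

```java
public class StringBuilderCapacitySketch {
    public static void main(String[] args) {
        int n = 10_000;
        // The default StringBuilder capacity is only 16 characters, so a
        // large append loop forces repeated char[] reallocation and copying.
        // Pre-sizing to the expected final length avoids those resizes.
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        System.out.println(sb.length());
    }
}
```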
Next we will look at increasing parallelism.
Reduce lock contention by identifying parallelism in the task. There are two types of parallelism: data parallelism and task parallelism.
Data parallelism:
● The same task is performed on different subsets of the same data.
● Computation is synchronous.
● Speedup is greater, since a single flow of execution is applied across all subsets of the data.
● The amount of parallelization is proportional to the input size.
Task parallelism:
● Different tasks are performed on the same or different data.
● Computation is asynchronous.
● Speedup is smaller, since each processor executes a different thread or process on the same or a different set of data.
● The amount of parallelization is proportional to the number of independent tasks performed.
● It is designed for optimum load balance on a multiprocessor system.
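The two kinds of parallelism above can be sketched in a few lines; the data set and the two tasks here are made up for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        // Data parallelism: the same operation (summing) is applied to
        // different chunks of the same data set across worker threads.
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println("sum = " + sum);

        // Task parallelism: two different tasks run concurrently on the
        // same data, each on its own thread.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Long> max = pool.submit(
            () -> LongStream.rangeClosed(1, 1_000_000).max().getAsLong());
        Future<Long> min = pool.submit(
            () -> LongStream.rangeClosed(1, 1_000_000).min().getAsLong());
        System.out.println("max = " + max.get() + ", min = " + min.get());
        pool.shutdown();
    }
}
```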
Timeline view of Solaris Studio (16-64): only thread 1 is active. Investigate the parallelizability of thread 1.