Troubleshooting Methods and Tools for Production Java Applications, 空望, 2011-11
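2. Contents: I. Linux basics and performance monitoring tools; II. JVM basics and performance monitoring tools; III. References
3. Part I: Linux basics and performance monitoring tools
4. Linux performance monitoring points: man
5. Linux performance monitoring points: CPU, memory, IO, network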
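6. Linux performance monitoring tools (CPU): basic concepts
   • Context switches: if there are more runnable threads than CPUs, the OS eventually preempts the running thread so that other threads can use the CPU. It saves the execution context of the thread being switched out and restores the context of the thread being switched in.
   • Run queue: each CPU maintains a run queue of threads. When the CPU subsystem is under heavy load, the kernel scheduler cannot keep up with system requests, so runnable processes pile up in the run queue. The longer the run queue grows, the longer threads wait before they get to run.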
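7. Linux performance monitoring tools (CPU): basic concepts
   • Load: the number of threads waiting in the CPU run queue combined with the number currently executing. A safe load is generally no higher than the number of CPUs.
   • CPU utilization: the percentage of CPU in use, broken down into User Time, System Time, Wait IO and Idle.
   • Interrupts: devices tell the kernel that they are done processing, for example when a network card delivers a network packet or a piece of hardware completes an IO request.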
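8. Linux performance monitoring tools (CPU)
   • Run queues: each processor's run queue should hold no more than 1-3 threads. For example, a dual-core processor should not have more than 6 threads in its run queue.
   • CPU utilization: when a CPU is fully used, the utilization categories should be balanced roughly as follows: 65%-70% User Time, 30%-35% System Time, 0%-5% Idle Time.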
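The sketch below is a rough in-process check of the run-queue rule above. It reads the 1-minute load average through the standard `OperatingSystemMXBean` and compares it with the number of CPUs × 3. Treating the load average as the run-queue length is only an approximation, because on Linux the load average also counts tasks in uninterruptible I/O wait. The class name `LoadCheck` and the threshold constant are illustrative choices, not part of the original slides.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadCheck {
    // Upper end of the slide's "1-3 threads per CPU run queue" rule of thumb.
    static final int MAX_THREADS_PER_CPU = 3;

    /** Load average divided by the number of CPUs. */
    static double loadPerCpu(double loadAverage, int cpus) {
        return loadAverage / cpus;
    }

    /** True when load exceeds cpus * maxPerCpu (e.g. more than 6 on a dual-core machine). */
    static boolean isOverloaded(double loadAverage, int cpus, int maxPerCpu) {
        return loadAverage > (double) cpus * maxPerCpu;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage(); // 1-minute average; negative if unavailable
        if (load < 0) {
            System.out.println("Load average is not available on this platform");
            return;
        }
        System.out.printf("cpus=%d load=%.2f load/cpu=%.2f overloaded=%b%n",
                cpus, load, loadPerCpu(load, cpus),
                isOverloaded(load, cpus, MAX_THREADS_PER_CPU));
    }
}
```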
9. Linux performance monitoring tools (CPU)
   View CPU information: cat /proc/cpuinfo
   Count CPUs: grep 'processor' /proc/cpuinfo | wc -l
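You can also take the same CPU count from Java, reading it both from `/proc/cpuinfo` and from the JVM. This is a minimal sketch that assumes a Linux host. `CpuInfo` is a hypothetical helper name. It matches only lines that start with `processor`, which is slightly stricter than grep's substring match.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CpuInfo {
    /** Counts "processor : N" entries, like grep 'processor' /proc/cpuinfo | wc -l. */
    static long countProcessors(List<String> cpuinfoLines) {
        return cpuinfoLines.stream()
                .filter(line -> line.startsWith("processor"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.exists(cpuinfo)) {
            System.out.println("/proc/cpuinfo not found (not a Linux host?)");
            return;
        }
        System.out.println("processors in /proc/cpuinfo: " + countProcessors(Files.readAllLines(cpuinfo)));
        System.out.println("Runtime.availableProcessors(): " + Runtime.getRuntime().availableProcessors());
    }
}
```

The two numbers can legitimately differ, for example when CPU affinity or container limits restrict the process.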
10. Linux performance monitoring tools (CPU): uptime shows how long the system has been running. In order, it prints the current time, how long the system has been up, the number of logged-in users, and the load average over the last 1, 5 and 15 minutes.
   [kongwang@dev211017 21573]$ uptime
    16:23:39 up 263 days,  2:12,  2 users,  load average: 0.00, 0.00, 0.00
   Related command: w
   [kongwang@dev211017 21573]$ w
    16:35:12 up 263 days,  2:23,  2 users,  load average: 0.07, 0.03, 0.00
   USER  TTY    FROM          LOGIN@  IDLE   JCPU   PCPU   WHAT
   root  pts/0  10.13.41.119  Mon15   0.00s  0.31s  0.00s  w
   root  pts/1  10.13.41.119  15:42   23:49  0.05s  0.05s  -bash
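When you need to collect load averages from scripts or logs, you can parse the `uptime`/`w` line directly. This is a minimal sketch that assumes the English/C-locale output format shown above; other locales may print decimal commas. `UptimeParser` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UptimeParser {
    // Matches "load average: 0.07, 0.03, 0.00" (1, 5 and 15 minutes).
    private static final Pattern LOAD =
            Pattern.compile("load average:\\s*([\\d.]+),\\s*([\\d.]+),\\s*([\\d.]+)");

    /** Returns {load1, load5, load15}, or null if the line contains no load average. */
    static double[] parseLoad(String uptimeLine) {
        Matcher m = LOAD.matcher(uptimeLine);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = " 16:35:12 up 263 days,  2:23,  2 users,  load average: 0.07, 0.03, 0.00";
        double[] load = parseLoad(line);
        System.out.printf("1m=%.2f 5m=%.2f 15m=%.2f%n", load[0], load[1], load[2]);
    }
}
```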
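11. Linux performance monitoring tools (CPU): vmstat (virtual memory statistics), a real-time performance monitoring tool
12. Linux performance monitoring tools: case study
   1. The number of context switches is higher than the number of interrupts, which shows that the kernel spends a considerable amount of time switching thread contexts.
   2. The large number of context switches unbalances the CPU utilization categories: the percentage of time waiting for IO requests (wa) is very high and the user time percentage (us) is very low.
   3. Because the CPUs are blocked on IO requests, a considerable number of runnable threads are also waiting in the run queue.
13. Linux performance monitoring tools (CPU): top. Useful keys: H shows all threads of each process; 1 shows each CPU separately.
14. Linux performance monitoring tools (CPU): mpstat (Multiprocessor Statistics) shows not only the average across all CPUs but also the statistics of a specific CPU.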
15. Linux performance monitoring tools (CPU): sar shows historical as well as real-time data.
   sar -q shows the load:
   sar -q 1 3
   Linux 2.6.9-89.ELxenU (item101c.cm3)   03/08/2010
   07:15:43 PM   runq-sz  plist-sz  ldavg-1  ldavg-5  ldavg-15
   07:15:44 PM         3       585     2.21     2.34      2.11
   07:15:45 PM         0       585     2.21     2.34      2.11
   07:15:46 PM         1       585     2.21     2.34      2.11
   Average:            1       585     2.21     2.34      2.11
   sar -u shows CPU utilization:
   sar -u 1 10
   Linux 2.6.9-89.ELxenU (item101c.cm3)   03/08/2010
   07:14:57 PM   CPU   %user   %nice   %system   %iowait   %idle
   07:14:58 PM   all   29.75    0.00     13.75      0.00   56.50
   07:14:59 PM   all   27.82    0.00     10.53      0.00   61.65
   07:15:00 PM   all   25.00    0.00     11.75      0.00   63.25
   07:15:01 PM   all   25.56    0.00     12.53      0.00   61.90
   View the data of a past day. This is important because it lets you compare current behavior with earlier behavior:
   sar -u -f /var/log/sa/sa01
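16. Linux performance monitoring tools (memory)
   • Virtual memory: uses the disk to extend physical memory.
   • kswapd: the kswapd process is responsible for making sure that memory stays free.
   • pdflush: synchronizes in-memory content with the filesystem. When a write returns, the data has not actually reached the disk yet. It is first written to the system cache, and the pdflush kernel threads later write the dirty pages to disk.
   ps -ef | grep kswapd
   ps -ef | grep pdflush
17. Linux performance monitoring tools (memory): cat /proc/meminfo shows memory information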
18. Linux performance monitoring tools (memory): ps aux
   -bash-3.00$ ps aux
   USER  PID %CPU %MEM   VSZ  RSS TTY STAT START  TIME COMMAND
   root    1  0.0  0.0  2324  436 ?   S    2009   1:12 init [3]
   RSS: resident physical memory, in KB
   RSS of the Java process: ps -p javaid -o rss
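On Linux, the RSS value that `ps -o rss` reports is also available as `VmRSS` in `/proc/<pid>/status`, and a Java process can read it for itself. This sketch assumes a Linux `/proc` filesystem. `RssReader` is a hypothetical helper. Both sources report kilobytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class RssReader {
    /** Extracts VmRSS in kB from /proc/<pid>/status lines; -1 if the field is absent. */
    static long vmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Format: "VmRSS:\t  123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self"; // "self" means this JVM
        List<String> lines = Files.readAllLines(Paths.get("/proc", pid, "status"));
        System.out.println("RSS (kB): " + vmRssKb(lines));
    }
}
```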
19. Linux performance monitoring tools (memory): sar -r shows memory and swap utilization
   sar -r 1 10
   Linux 2.6.9-89.ELxenU (item66.cm4)   03/10/2010
   08:04:53 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
   08:04:54 PM     863652    3330784     79.41     127128   1063420    2092800       3672      0.18      3372
   08:04:55 PM     863572    3330864     79.41     127128   1063420    2092800       3672      0.18      3372
   08:04:56 PM     863636    3330800     79.41     127128   1063420    2092800       3672      0.18      3372
   08:04:57 PM     863732    3330704     79.41     127128   1063420    2092800       3672      0.18      3372
   08:04:58 PM     863604    3330832     79.41     127128   1063420    2092800       3672      0.18      3372
   08:04:59 PM     863860    3330576     79.40     127128   1063420    2092800       3672      0.18      3372
  20. 20. Linux性能监测工具-内存使用vmstatField DescriptionSwapd The amount of virtual memory in KB currently in use. As free memory reaches low thresholds, more data is paged to the swap device. 当前虚拟内存使用的总额(单位:KB).空闲内存达到最低的阀值时,更多的 数据被转换成页到交换设备中.Free The amount of physical RAM in kilobytes currently available to running applications. 当前内存中可用空 间字节数. The amount of physical memory in kilobytes in the buffer cache as a result of read() and write()Buff operations. 当前内存中用于read()和write()操作的缓冲区中缓存字节数 The amount of physical memory in kilobytes mapped into process address space. 当前内存中映射到进Cache 程地址空间字节数so The amount of data in kilobytes written to the swap disk. 写入交换空间的字节数总额 The amount of data in kilobytes written from the swap disk back into RAM. 从交换空间写回内存的字节si 数总额 The amount of disk blocks paged out from the RAM to the filesystem or swap device. 磁盘块页面从内存bo 到文件或交换设备的总额 The amount of disk blocks paged into RAM from the filesystem or swap device. 磁盘块页面从文件或交bi 换设备到内存的总额
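21. Linux performance monitoring tools (memory): case study
   1. A large number of disk pages (bi) are being read into memory, and the data cache (cache) in the process address space keeps growing.
   2. At this point free memory (free) stays at 17 MB, even though data read from disk is consuming RAM.
   3. The buffer cache (buff) is clearly shrinking.
   4. Meanwhile kswapd keeps writing dirty pages to the swap device (so), and virtual memory usage (swpd) keeps growing.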
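22. Linux performance monitoring tools (disk I/O): the I/O subsystem is the slowest part of a Linux system. This is mainly due to the distance between the CPU and the physical disk (platter rotation and seek time). If the time to access the disk versus memory were converted into minutes and seconds, it would be the difference between 7 days and 7 minutes. Fundamentally, the Linux kernel therefore tries to keep I/O to a minimum.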
  23. 23. Linux性能监测工具- disk I/Odf 检查文件系统的磁盘空间占用情况 df -hadu 能以指定的目彔下的子目彔为单位,显示每个目彔内所有档案所占用的磁盘空间大小 du –ah
  24. 24. Linux性能监测工具-disk I/OIostat常用参数 –x -d
  25. 25. Linux性能监测工具-disk I/Osar -b
  26. 26. Linux性能监测工具-networkIfconfig 网络配置信息
  27. 27. Linux性能监测工具-networkping用于查看网络上的主机是否在工作 TTL 是由发送主机设置的,以防止数据包丌断在 IP 互联网络上永丌终止地循环。转发 IP 数据包时,要求路由器至少将 TTL 减小 1。
  28. 28. Linux性能监测工具-networknetstat 用于显示不IP、TCP、UDP和ICMP协议相关的统计数据,一般用于检验本机各端口的网络连接情况。 常用参数:-a或–all 显示所有连线中的Socket。-n或–numeric 直接使用IP地址,而丌通过域名服务器-p或–programs 显示正在使用Socket的程序识别码和程序名称-t或–tcp 显示TCP传输协议的连线状况。netstat –anptProto Recv-Q Send-Q Local Address Foreign Address StatePID/Program nametcp 0 0 :::11090 :::* LISTEN -tcp 0 0 :::22 :::* LISTEN -tcp 0 0 ::ffff:192.168.211.17:49411 ::ffff:192.168.207.169:9600 ESTABLISHED–
  29. 29. Linux性能监测工具-networksar -n SOCK查看网络连接资源totsck Total number of used sockets.tcpsck Number of TCP sockets currently in use.
  30. 30. Linux性能监测工具-networksar -n DEV 查看网络流量 rxpck/s Total number of packets received per second.txpck/s Total number of packets transmitted per second.rxbyt/s Total number of bytes received per second.txbyt/s Total number of bytes transmitted per second.
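31. Linux performance monitoring tools (other): lsof lists information about the files opened by processes.
   COMMAND: process name
   PID: process ID
   USER: process owner
   FD: file descriptor, which the application uses to identify the file, e.g. cwd, txt
   TYPE: file type, e.g. DIR, REG
   DEVICE: name of the disk
   SIZE: file size
   NODE: inode (the file's identifier on disk)
   NAME: exact name of the open file
   Useful for "too many open files" errors. Common options: -p, +D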
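Before you reach for `lsof` on a "too many open files" error, a HotSpot/OpenJDK JVM can report its own descriptor usage. This sketch relies on `com.sun.management.UnixOperatingSystemMXBean`. That is a JDK-specific interface, not standard Java SE, and it is only present on Unix-like platforms. `FdUsage` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {
    /** Percentage of the file-descriptor limit in use (0 if the limit is unknown). */
    static double usagePercent(long open, long max) {
        return max <= 0 ? 0.0 : 100.0 * open / max;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof UnixOperatingSystemMXBean)) {
            System.out.println("File descriptor counts are not exposed on this platform/JVM");
            return;
        }
        UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
        long open = unix.getOpenFileDescriptorCount();
        long max = unix.getMaxFileDescriptorCount();
        System.out.printf("open fds=%d max=%d (%.1f%%)%n", open, max, usagePercent(open, max));
    }
}
```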
32. Linux performance monitoring tools (other)
   • Pipes. Count the machines connected to port 1234:
     netstat -antp | grep 1234 | wc -l
     Count Java threads:
     ps -eLf | grep java | wc -l
   • find: find work/ -name
   • grep: ps -eLf | grep java | wc -l
   • awk: rank and count the pages in the Favorites (收藏夹) Apache log:
     awk -F'"' '{print $2}' 2011-11-02-taobao-access_log | awk '{print $2}' | awk -F'?' '{print $1}' | awk -F'-' '{print $1}' | sort | uniq -c | sort -rn > 11.txt
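If you prefer Java to shell, you can mirror the awk pipeline as below. This is a sketch under stated assumptions. `PageCounter` is hypothetical. It assumes the request is the first double-quoted field of each log line, as in Apache's common/combined formats. Unlike `uniq -c`, it skips lines without a request instead of counting them as an empty page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageCounter {
    /** Mirrors the awk chain: 2nd field of the quoted request, cut at the first '?' and the first '-'. */
    static String pageOf(String logLine) {
        String[] quoted = logLine.split("\"", -1);         // awk -F'"' '{print $2}'
        if (quoted.length < 2) return null;
        String[] request = quoted[1].trim().split("\\s+"); // awk '{print $2}'
        if (request.length < 2) return null;
        String page = request[1];
        int q = page.indexOf('?');                         // awk -F'?' '{print $1}'
        if (q >= 0) page = page.substring(0, q);
        int dash = page.indexOf('-');                      // awk -F'-' '{print $1}'
        if (dash >= 0) page = page.substring(0, dash);
        return page;
    }

    /** Like sort | uniq -c | sort -rn: (page, count) pairs, most frequent first. */
    static List<Map.Entry<String, Integer>> rank(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String page = pageOf(line);
            if (page != null) counts.merge(page, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageCounter <access_log>");
            return;
        }
        // ISO-8859-1 accepts every byte, so unusual characters in the log cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, Integer> e : rank(lines)) {
            System.out.printf("%7d %s%n", e.getValue(), e.getKey());
        }
    }
}
```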
33. Break
34. Part II: JVM basics and performance monitoring tools
35. JVM architecture
36. JVM heap structure
   • Young Generation: Eden, where new objects get instantiated, and 2 Survivor Spaces that hold live objects during minor GC
   • Old Generation: objects that are longer-lived are eventually promoted, or tenured, to the old generation
   • Permanent Generation: VM and Java class metadata as well as interned Strings and class static variables
37. JVM GC: minor garbage collection illustration
   • One survivor space is always empty and serves as the destination for minor collections.
   • At the end of the minor garbage collection, the two survivor spaces swap roles.
   • The eden is entirely empty; only one survivor space is in use; and the occupancy of the old generation has grown slightly.
   • Major collections occur when the tenured space fills up. Major collections free up Eden and both survivor spaces.
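You can observe the generations described above from inside a running JVM through the standard `MemoryPoolMXBean` API. Pool names depend on the collector and the JDK version. With the parallel collector, for example, they are `PS Eden Space`, `PS Survivor Space`, `PS Old Gen` and, up to Java 7, `PS Perm Gen`. This sketch therefore prints whatever the JVM reports. `HeapPools` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    /** Names of the heap pools (Eden, Survivor, Old/Tenured under most collectors). */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage(); // null if the pool is no longer valid
            if (u == null) continue;
            String max = u.getMax() < 0 ? "undefined" : (u.getMax() / 1024) + " KB";
            System.out.printf("%-28s %-8s used=%d KB committed=%d KB max=%s%n",
                    pool.getName(), pool.getType() == MemoryType.HEAP ? "heap" : "non-heap",
                    u.getUsed() / 1024, u.getCommitted() / 1024, max);
        }
    }
}
```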
38. JVM heap size settings: -Xms, -Xmx, -XX:NewSize, -XX:MaxNewSize, -Xmn (sets NewSize = MaxNewSize), -XX:PermSize, -XX:MaxPermSize, -XX:NewRatio=n, -XX:SurvivorRatio=n
39. Selecting a collector
   • If the application has a small data set (up to approximately 100MB), then select the serial collector with -XX:+UseSerialGC.
   • If the application will be run on a single processor and there are no pause time requirements, then let the VM select the collector, or select the serial collector with -XX:+UseSerialGC.
40. Selecting a collector
   • If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of one second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC and (optionally) enable parallel compaction with -XX:+UseParallelOldGC.
   • If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second, then
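41. JVM GC logging: -verbose:gc -Xloggc:/home/admin/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps
42. Main JVM options
   • Behavioral options change the basic behavior of the VM.
   • Performance tuning options are knobs which can be used to tune VM performance.
   • Debugging options generally enable tracing, printing, or output of VM information.
43. OutOfMemoryError. Java memory problems occur in two main areas:
   • Java memory: the heap and the permanent generation
   • Native memory: the JVM process's own memory and third-party native code used by Java
   Insufficient Java memory:
   • The Java heap is exhausted and no new object or memory block can be allocated.
   • The permanent generation is exhausted and no more classes can be loaded.
   Insufficient native memory:
   • There is not enough physical memory left to allocate.
   • Third-party native code has a memory leak bug, e.g. the native code of the Oracle OCI driver.
   • Bugs in the JVM's JIT compiler or in the JVM itself.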
44. Heap size starting point
   • From the GC log you will get:
     – Approximation of the Live Data Size (LDS): the heap occupancy after each full GC
     – Approximation of the max perm gen size: the perm gen occupancy after each full GC
   • GC log example:
     170.517: [Full GC [PSYoungGen: 10128K->0K(163392K)] [ParOldGen: 348898K->161378K(350208K)] 359026K->161378K(513600K)
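Extracting the LDS by hand gets tedious on long GC logs. The sketch below pulls the old generation's after-GC occupancy out of a Full GC line. The young generation is emptied in the example (`10128K->0K`), so that value equals the heap occupancy after the Full GC (161378K). The sketch assumes the parallel collector's log format shown above: `ParOldGen`, or `PSOldGen` without parallel compaction. Other collectors log differently. `LiveDataSize` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiveDataSize {
    // Matches e.g. "[ParOldGen: 348898K->161378K(350208K)]" as before->after(capacity).
    private static final Pattern OLD_GEN =
            Pattern.compile("\\[(?:ParOldGen|PSOldGen): (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Old-gen occupancy in KB after a Full GC (an approximation of the LDS), or -1. */
    static long oldGenAfterFullGcKb(String gcLogLine) {
        if (!gcLogLine.contains("Full GC")) return -1; // only full GCs give the LDS
        Matcher m = OLD_GEN.matcher(gcLogLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        String line = "170.517: [Full GC[PSYoungGen: 10128K->0K(163392K)]"
                + "[ParOldGen: 348898K->161378K(350208K)]359026K->161378K(513600K)";
        System.out.println("LDS ~ " + oldGenAfterFullGcKb(line) + "K");
    }
}
```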
45. Initial heap configuration
   • You can now make an informed decision on choosing a reasonable heap size. Rule of thumb:
     – Set -Xms and -Xmx to 3x to 4x LDS
     – Set both -XX:PermSize and -XX:MaxPermSize to around 1.2x to 1.5x the max perm gen size
   • Set the generation sizes accordingly. Rule of thumb:
     – Young gen should be around 1x to 1.5x LDS
     – Old gen should be around 2x to 3x LDS
     – e.g., young gen should be around 1/3 to 1/4 of the heap
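The rules of thumb above reduce to simple arithmetic. This sketch, using the hypothetical class `HeapSizing`, computes the suggested ranges from an LDS and a maximum perm gen size. The LDS of 158 MB comes from the example Full GC line (161378K, rounded up). The 64 MB perm size is a made-up value, because the example line does not show the permanent generation. `-XX:PermSize`/`-XX:MaxPermSize` only apply to JVMs up to Java 7; Java 8 replaced the permanent generation with Metaspace.

```java
public class HeapSizing {
    /** Suggested sizes in MB; each low/high pair is one rule-of-thumb range. */
    static final class Suggestion {
        final long heapLow, heapHigh;   // -Xms/-Xmx: 3x to 4x LDS
        final long youngLow, youngHigh; // young gen: 1x to 1.5x LDS
        final long oldLow, oldHigh;     // old gen: 2x to 3x LDS
        final long permLow, permHigh;   // PermSize/MaxPermSize: 1.2x to 1.5x max perm usage

        Suggestion(long ldsMb, long maxPermMb) {
            heapLow = 3 * ldsMb;
            heapHigh = 4 * ldsMb;
            youngLow = ldsMb;
            youngHigh = ldsMb * 3 / 2;
            oldLow = 2 * ldsMb;
            oldHigh = 3 * ldsMb;
            permLow = maxPermMb * 6 / 5;  // integer MB, rounded down
            permHigh = maxPermMb * 3 / 2;
        }
    }

    public static void main(String[] args) {
        long ldsMb = (161378 + 1023) / 1024; // 161378K from the example Full GC line, rounded up: 158
        long maxPermMb = 64;                 // hypothetical: the example line shows no perm gen figure
        Suggestion s = new Suggestion(ldsMb, maxPermMb);
        // Heap = 4x LDS with young = 1x LDS leaves old = 3x LDS, i.e. young is 1/4 of the heap.
        System.out.printf("-Xms%dm -Xmx%dm -Xmn%dm -XX:PermSize=%dm -XX:MaxPermSize=%dm%n",
                s.heapHigh, s.heapHigh, s.youngLow, s.permHigh, s.permHigh);
    }
}
```

A heap of 4x LDS with a young generation of 1x LDS leaves 3x LDS for the old generation, which satisfies all three rules at once. With these inputs, `main` prints `-Xms632m -Xmx632m -Xmn158m -XX:PermSize=96m -XX:MaxPermSize=96m`.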
46. JVM tuning advice
   • You should try to maximize the number of objects reclaimed in the young generation. This is probably the most important piece of advice when sizing a heap and/or tuning the young generation.
   • Your application's memory footprint should not exceed the available physical memory. This is probably the second most important piece of advice when sizing a heap.
47. JVM tuning advice
   • Applications with emphasis on performance tend to set -Xms and -Xmx to the same value. When -Xms != -Xmx, heap growth or shrinking requires a Full GC.
   • Applications with emphasis on performance almost always set -XX:PermSize and -XX:MaxPermSize to the same value. Growing or shrinking the permanent generation requires a Full GC too.
48. JVM tuning advice
   • Try to retain as many objects as possible in the survivor spaces so that they can be reclaimed in the young generation. This means less promotion into the old generation and less frequent old GCs.
   • But also, try not to unnecessarily copy very long lived objects between the survivors. That adds unnecessary overhead on minor GCs.
   • It is not always easy to find the perfect balance.
49. JVM tuning advice
   1. Higher tenuring threshold → promotes fewer objects
      – Possibly (but not necessarily) longer young GC times
      – Increases the number of objects reclaimed in the young gen
      – Better overall efficiency
   2. Lower tenuring threshold → promotes more objects
      – Possibly (but not necessarily) shorter young GC times
      – More load / pressure on the old gen
      – More frequent old GCs
      – Could make fragmentation more severe
50. Java crash
   • A crash, or fatal error, causes a process to terminate abnormally.
   • After a crash the JVM writes a file whose name starts with hs_err_pid (sometimes the process dies before it can do so).
   • Possible causes:
     – a bug in the Java virtual machine itself;
     – system libraries, APIs, or third-party libraries;
     – a shortage of system resources.
51. Java crash file format. A crash file consists of the following parts:
   • A header that provides a brief description of the crash.
   • A section with thread information:
     – Thread Information
     – Signal Information
     – Register Context
     – Machine Instructions
     – Thread Stack
     – Further Details
   • A section with process information:
     – Thread List
     – VM State
     – Mutexes and Monitors
     – Heap Summary
     – Memory Map
52. Crash file
53. Some crash examples: crash in native code; StackOverflow
  54. 54. java性能监测工具jinfo prints Java configuration information for a given Javaprocess or core file or a remote debug server.打印某个参数 -flag
  55. 55. java性能监测工具Jps: lists the instrumented HotSpot Java Virtual Machines (JVMs)on the target system
  56. 56. java性能监测工具jmap 内存查看工具常用参数: -heap -histo –permstat -dump:format=b,file=HeapDump.hprof
  57. 57. java性能监测工具jstat可以查看gc 情况 常用参数:gcutil,gcnew,gcold,jstat -gcutil 929 1000 S0 S1 E O P YGC YGCT FGC FGCT GCT 19.95 0.00 9.05 2.91 41.48 4 0.531 1 0.090 0.620 19.95 0.00 9.05 2.91 41.48 4 0.531 1 0.090 0.620 19.95 0.00 9.05 2.91 41.48 4 0.531 1 0.090 0.620 19.95 0.00 9.05 2.91 41.48 4 0.531 1 0.090 0.620 19.95 0.00 9.05 2.91 41.48 4 0.531 1 0.090 0.620
  58. 58. java性能监测工具jstack :线程查看工具jstack java进程id同killall -3 java
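59. Java monitoring tools: TIMED_WAITING example
60. Java monitoring tools: finding a busy thread
   • Find the busiest thread: top -H -p javaid
   • Convert its thread ID to hex: printf "0x%x\n" 30490 prints 0x771a
   • Dump the threads: sudo -u admin /opt/taobao/java/bin/jstack 9813 > /home/kongwang/thread3.txt, then look at what the thread with nid=0x771a is doing: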
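You can script the top → printf → jstack lookup. This sketch (hypothetical class `NidLookup`) converts the decimal thread ID from `top -H` into the `nid=0x...` form and finds the matching thread in a saved jstack file. The JVMs of this talk print `nid` in hex. Some newer JDK releases may print it in decimal, so the sketch accepts either form.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;

public class NidLookup {
    /** Decimal Linux thread ID from top -H to jstack's hex nid, like printf "0x%x". */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    /** Index of the jstack header line for this thread, or -1; accepts hex or decimal nid. */
    static int findThread(List<String> dumpLines, long tid) {
        Pattern nid = Pattern.compile("nid=(?:" + Pattern.quote(toNid(tid)) + "|" + tid + ")\\b");
        for (int i = 0; i < dumpLines.size(); i++) {
            if (nid.matcher(dumpLines.get(i)).find()) return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java NidLookup <tid from top -H> <jstack output file>");
            return;
        }
        long tid = Long.parseLong(args[0]);
        List<String> dump = Files.readAllLines(Paths.get(args[1]));
        int start = findThread(dump, tid);
        if (start < 0) {
            System.out.println("no thread with nid " + toNid(tid));
            return;
        }
        // Print the header and its stack frames, up to the blank line that ends the entry.
        for (int i = start; i < dump.size() && !dump.get(i).trim().isEmpty(); i++) {
            System.out.println(dump.get(i));
        }
    }
}
```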
  61. 61. java性能监测工具Eclipse Memory Analyzer: a fast and feature-rich heap analyzerthat helps you find memory leaks and high memory consumption issues
  62. 62. java性能监测工具Java VisualVM: a tool that provides a visual interface for viewing detailedinformation about Java technology-based applications (Java applications)while they are running on a Java Virtual Machine (JVM)
  63. 63. java性能监测工具JConsole
  64. 64. 日志分析Filter日志应用错误日志Apache日志--4 [2ms, 0%, 0%] - Process method : getCollectInfoCountByNoKeyword | | +---6 [2ms, 0%, 0%] - Process method :getCollectInfoListByKeywordWithCache | | +---8 [3,320ms (1,020ms), 98%, 97%] - Process method :getChannelItemsByIds | | | `---1,012 [2,300ms, 69%, 67%] -getChannelItemsByIds ==> 20 | | +---3,328 [1ms, 0%, 0%] - Process method :getLatest4CollectItemSearchResultByOwnerId | | +---3,329 [1ms, 0%, 0%] - Process method :getLatest4CollectItemSearchResultByOwnerId
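65. Common workflow for diagnosing web application servers. On a web application server, the usual symptom is high load. High load is mostly caused by insufficient resources, for example a database connection pool that is too small.
   1. Check the load with top and vmstat.
   2. Count Java threads with ps -eLf | grep java | wc -l and Apache threads with ps -eLf | grep httpd | wc -l. This tells you whether the machine is overloaded. Log size can also be an indicator.
   3. Use the filter log to find where the system is slow.
   4. Use the debug log to check whether the cache, the database, or other systems you depend on are healthy.
   5. Dump the threads to see what they are doing.
   6. Check Java GC status with jstat.
   7. Dump the heap to check for Java memory leaks.
   8. Use sar to look at the machine's history, which helps with troubleshooting.
   9. Experience!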
66. References
   • http://java.sun.com/docs/hotspot/index.html
   • http://java.sun.com/performance/reference/whitepapers/tuning.html
   • http://java.sun.com/performance/reference/whitepapers/6_performance.html
   • 《深入理解jvm高级特性与最佳实践》 (Understanding the JVM: Advanced Features and Best Practices)
   • Java Performance
   • Linux Performance Monitoring
   • Performance Tuning for Linux Servers
   • Troubleshooting Guide for Java SE 6 with HotSpot VM
   • sed & awk, 2nd Edition