The New Systems Performance


A brief talk on systems performance for the July 2013 meetup "A Midsummer Night's System", video: http://www.youtube.com/watch?v=P3SGzykDE4Q. This summarizes how systems performance has changed from the 1990's to today. This was the reason for writing a new book on systems performance, to provide a reference that is up to date, covering new tools, technologies, and methodologies.

Transcript of "The New Systems Performance"

  1. The New Systems Performance
     Brendan Gregg (and many others)
     Prentice Hall, 2013
     A brief talk for: A Midsummer Night's System, San Francisco, July 2013
  2. Systems Performance
     • Analysis of apps to metal. Think LAMP not AMP.
     • An activity for everyone: from casual to full time.
     • The basis is the system
     • The target is everything
     • All software can cause performance problems
     [Slide diagram: the operating system stack, from Applications and System Libraries through the System Call Interface into the Kernel (VFS, Sockets, File Systems, TCP/UDP, Volume Managers, IP, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers, Resource Controls), down to Firmware and Metal]
  3. 1990's Systems Performance
     * Proprietary Unix, closed source, static tools
       $ vmstat 1
        kthr      memory            page            disk          faults      cpu
        r b w   swap   free  re  mf pi po fr de sr cd cd s0 s5   in   sy   cs us sy id
        0 0 0 8475356 565176  2   8  0  0  0  0  1  0  0 -0 13  378  101  142  0  0 99
        1 0 0 7983772 119164  0   0  0  0  0  0  0 224  0  0  0 1175 5654 1196  1 15 84
        0 0 0 8046208 181600  0   0  0  0  0  0  0 322  0  0  0 1473 6931 1360  1  7 92
        [...]
     * Limited metrics and documentation
     * Some perf issues could not be solved
     * Analysis methodology constrained by tools
     * Perf experts used inference and experimentation
     * Literature is still around
  4. 2010's Systems Performance
     • Open source (the norm)
       • Ultimate documentation
     • Dynamic tracing
       • Observe everything
     • Visualizations
       • Comprehend many metrics
     • Cloud computing
       • Resource controls
     • Methodologies
       • Where to begin, and steps to root cause
  5. 1990's Operating System Internals
     • Studied in text books
     [Slide diagram: the operating system as applications, libraries, syscalls, and the kernel]
  6. 2010's Operating System Internals
     • Study the source, observe in production with dynamic tracing
     [Slide diagram: the same operating system layers: applications, libraries, syscalls, kernel]
  7. 1990's Systems Observability
     For example, Solaris 9:
     [Slide diagram: the Operating System (Applications: DBs, all server types, ...; System Libraries; System Call Interface; Solaris Kernel with VFS, Sockets, File Systems, TCP/UDP, Volume Managers, IP, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers) and the Hardware beneath it (CPUs, CPU Interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, I/O Controller, Expander Interconnect, Network Controller, Interface Transports, Ports, Disks), annotated with the static tools that observe each area: apptrace, sotruss, truss, netstat, snoop, kstat, lockstat, prstat, ps, vmstat, iostat, cpustat, cputrack, mpstat, and various sar/kstat consumers]
  8. 2010's Systems Observability
     For example, SmartOS:
     [Slide diagram: the same Operating System and Hardware stack (illumos Kernel), annotated with tools: truss, kstat, plockstat, lockstat, prstat, ps, vmstat, iostat, snoop, intrstat, nicstat, netstat, cpustat, cputrack, mpstat, sar, and dtrace]
  9. Dynamic Tracing turned on the light
     • Example DTrace scripts from the DTraceToolkit, DTrace book, ...
     [Slide diagram: the scripts placed over the parts of the stack they observe]
     Services: cifs*.d, iscsi*.d, nfsv3*.d, nfsv4*.d, ssh*.d, httpd*.d
     Language providers: node*.d, erlang*.d, j*.d, js*.d, php*.d, pl*.d, py*.d, rb*.d, sh*.d
     Databases: mysql*.d, postgres*.d, redis*.d, riak*.d
     Applications: hotuser, umutexmax.d, lib*.d
     System calls: opensnoop, statsnoop, errinfo, dtruss, rwtop, rwsnoop, mmap.d, kill.d, shellsnoop, zonecalls.d, weblatency.d, fddist
     VFS and file systems: fswho.d, fssnoop.d, sollife.d, solvfssnoop.d, dnlcsnoop.d, zfsslower.d, ziowait.d, ziostacks.d, spasync.d, metaslab_free.d
     Block device and disk: iosnoop, iotop, disklatency.d, satacmds.d, satalatency.d, scsicmds.d, scsilatency.d, sdretry.d, sdqueue.d
     Device drivers: ide*.d, mpt*.d, macops.d, ixgbecheck.d, ngesnoop.d, ngelink.d
     Sockets: soconnect.d, soaccept.d, soclose.d, socketio.d, so1stbyte.d, sotop.d, soerror.d
     IP: ipstat.d, ipio.d, ipproto.d, ipfbtsnoop.d, ipdropper.d
     TCP/UDP/ICMP: tcpstat.d, tcpaccept.d, tcpconnect.d, tcpioshort.d, tcpio.d, tcpbytes.d, tcpsize.d, tcpnmap.d, tcpconnlat.d, tcp1stbyte.d, tcpfbtwatch.d, tcpsnoop.d, tcpconnreqmaxq.d, tcprefused.d, tcpretranshosts.d, tcpretranssnoop.d, tcpsackretrans.d, tcpslowstart.d, tcptimewait.d, udpstat.d, udpio.d, icmpstat.d, icmpsnoop.d
     Scheduler and virtual memory: priclass.d, pridist.d, cv_wakeup_slow.d, displat.d, capslat.d, minfbypid.d, pgpginbypid.d
 10. Actual DTrace Example
     • Given an interesting kernel function, eg, ZFS SPA sync:
       usr/src/uts/common/fs/zfs/spa.c:
       /*
        * Sync the specified transaction group.  New blocks may be dirtied as
        * part of the process, so we iterate until it converges.
        */
       void
       spa_sync(spa_t *spa, uint64_t txg)
       {
               dsl_pool_t *dp = spa->spa_dsl_pool;
       [...]
     • Trace and print timestamp with latency:
       # dtrace -n 'fbt::spa_sync:entry { self->ts = timestamp; }
           fbt::spa_sync:return /self->ts/ {
               printf("%Y %d ms", walltimestamp, (timestamp - self->ts) / 1000000);
               self->ts = 0; }'
       dtrace: description 'fbt::spa_sync:entry ' matched 2 probes
       CPU     ID                    FUNCTION:NAME
         0  53625                 spa_sync:return 2013 Jul 26 17:37:02 12 ms
         0  53625                 spa_sync:return 2013 Jul 26 17:37:08 726 ms
         6  53625                 spa_sync:return 2013 Jul 26 17:37:17 6913 ms
         6  53625                 spa_sync:return 2013 Jul 26 17:37:17 59 ms
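     A possible follow-on, not from the original slide: the same entry/return probe pair can feed a DTrace aggregation, so spa_sync latency is summarized as a distribution rather than printed per event:
       # dtrace -n 'fbt::spa_sync:entry { self->ts = timestamp; }
           fbt::spa_sync:return /self->ts/ {
               /* power-of-two histogram of sync latency, in milliseconds */
               @["spa_sync (ms)"] = quantize((timestamp - self->ts) / 1000000);
               self->ts = 0; }'
     On exit (Ctrl-C), dtrace prints the aggregation automatically.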
 11. Linux Dynamic Tracing Example
     • Two DTrace ports in progress (dtrace4linux, Oracle), and SystemTap
     • perf has basic dynamic tracing (not programmatic); eg:
       # perf probe --add='tcp_sendmsg'
       Add new event:
         probe:tcp_sendmsg    (on tcp_sendmsg)
       [...]
       # perf record -e probe:tcp_sendmsg -aR -g sleep 5
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.091 MB perf.data (~3972 samples) ]
       # perf report --stdio
       [...]
       # Overhead  Command      Shared Object       Symbol
       # ........  .......  .................  ...........
       #
         100.00%     sshd  [kernel.kallsyms]  [k] tcp_sendmsg
                     |
                     --- tcp_sendmsg
                         sock_aio_write
                         do_sync_write       active traced call stacks from
                         vfs_write           arbitrary kernel locations!
                         sys_write
                         system_call
                         __GI___libc_write
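     For comparison, an illustrative SystemTap sketch (not from the slides; assumes SystemTap and kernel debuginfo are installed) that places a similar dynamic probe on tcp_sendmsg and counts calls by process name for five seconds:
       # stap -e 'global calls
           probe kernel.function("tcp_sendmsg") { calls[execname()] <<< 1 }
           probe timer.s(5) { exit() }
           probe end { foreach (name in calls) printf("%s %d\n", name, @count(calls[name])) }'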
 12. 1990's Performance Visualizations
     Text-based and line graphs
     $ iostat -x 1
                         extended device statistics
     device   r/s  w/s  kr/s  kw/s wait actv  svc_t  %w  %b
     sd0      0.0  0.1   5.2   3.9  0.0  0.0   69.8   0   0
     sd5      0.0  0.0   0.0   0.0  0.0  0.0    1.1   0   0
     sd12     0.0  0.2   0.2   1.1  0.0  0.0    3.1   0   0
     sd12     0.0  0.0   0.0   0.0  0.0  0.0    0.0   0   0
     sd13     0.0  0.0   0.0   0.0  0.0  0.0    0.0   0   0
     sd14     0.0  0.0   0.0   0.0  0.0  0.0    1.9   0   0
     sd15     0.0  0.0   0.0   0.0  0.0  0.0    0.0   0   0
     sd16     0.0  0.0   0.0   0.0  0.0  0.0    0.0   0   0
     nfs6     0.0  0.0   0.0   0.0  0.0  0.0    0.0   0   0
     [...]
 13. 2010's Performance Visualizations
     • Utilization and latency heat maps, flame graphs
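     As a rough illustration of the flame graph workflow on Linux (not in the slides; assumes perf and the FlameGraph scripts from https://github.com/brendangregg/FlameGraph):
       git clone https://github.com/brendangregg/FlameGraph    # stackcollapse-perf.pl, flamegraph.pl
       perf record -F 99 -a -g -- sleep 30                     # sample all CPU stacks at 99 Hertz for 30 seconds
       perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > cpu.svg
     The resulting cpu.svg renders the sampled stacks as a flame graph, with frame width proportional to CPU time.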
 14. Cloud Computing
     • Performance may be limited by cloud resource controls, rather than physical limits
     • Hardware Virtualization complicates things: as a guest you can't analyze down to metal directly
     • Hopefully the cloud provider provides an API for accessing physical statistics, or does the analysis on your behalf
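     One way a Linux guest could check whether resource controls, rather than physical limits, are the bottleneck is its cgroup CPU statistics; a minimal sketch, assuming cgroup v1 mounted at /sys/fs/cgroup and the instance running in the top-level cpu cgroup:
       # nr_throttled and throttled_time increase when a CPU bandwidth cap is being enforced
       cat /sys/fs/cgroup/cpu/cpu.stat
       # the quota/period ratio gives the effective CPU allowance
       cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.cfs_period_us
     On SmartOS/illumos, the analogous zone limits can be inspected with prctl and kstat instead.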
 15. 1990's Performance Methodologies
     • Tools Method
       • 1. For each vendor tool:
       • 2. For each useful tool metric:
       • 3. Look for problems
     • Ad Hoc Checklist Method
       • 10 recent issues: run A, if B, fix C
     • Analysis constrained by tool metrics
 16. 2010's Performance Methodologies
     • Documented in "Systems Performance":
       • Workload Characterization
       • USE Method
       • Thread State Analysis Method
       • Drill-down Analysis
       • Latency Analysis
       • Event Tracing
       • ...
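     To make one of these concrete: a first pass of the USE Method (check Utilization, Saturation, and Errors for every resource) can be mapped to standard tools. An illustrative Linux mapping, not a checklist from the book or the slides:
       vmstat 1            # CPU utilization (us+sy) and saturation (run-queue length, r)
       mpstat -P ALL 1     # per-CPU utilization, to spot imbalance and single hot CPUs
       iostat -x 1         # disk utilization (%util) and saturation (queue length)
       free -m             # memory utilization; vmstat's si/so columns show saturation (swapping)
       sar -n DEV 1        # network interface utilization
       sar -n EDEV 1       # network interface errors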
 17. The New Systems Performance
     • Really understand how systems work
     • New observability, visualizations, methodologies
     • Older performance tools and approaches still used as appropriate
     • Great time to be working with systems, both enterprise and the cloud!