What Linux can learn from Solaris performance
and vice-versa

Brendan Gregg
Lead Performance Engineer
brendan@joyent.com
@brendangregg

SCaLE12x
February, 2014
Linux vs Solaris Performance Differences
[Diagram: the kernel stack (applications, system libraries, system call interface, VFS, sockets, file systems, TCP/UDP, volume managers, IP, block device interface, Ethernet, scheduler, virtual memory, virtualization, device drivers) annotated with each system's strengths. Linux: up-to-date packages, futex, likely()/unlikely(), CONFIGurable, RCU, DynTicks, SLUB, btrfs, I/O scheduler, overcommit & OOM killer, lazy TLB, KVM, more device drivers. Solaris: DTrace, libumem, symbols, microstate accounting, mature fully preemptive kernel, CPU scalability, ZFS, Zones, Crossbow, FireEngine, MPSS, process swapping]
whoami
• Lead Performance Engineer at Joyent
• Work on Linux and SmartOS performance
• Work/Research: tools, visualizations, methodologies
• Did kernel engineering at Sun Microsystems; worked on
DTrace and ZFS
Joyent
• High-Performance Cloud Infrastructure
• Compete on cloud instance/OS performance
• OS Virtualization for bare metal performance (Zones)
• Core developers of illumos/SmartOS and Node.js
• Recently launched Manta: a high performance object store
• KVM for Linux guests
• Certified Ubuntu on Joyent cloud now available!
SCaLE11x: Linux Performance Analysis
[Diagram: the Linux observability tool map — the software stack from applications (DBs, all server types, ...) through system libraries, the system call interface, kernel subsystems (VFS, sockets, file systems, TCP/UDP, volume managers, IP, block device interface, Ethernet, scheduler, virtual memory, device drivers) down to hardware (CPUs, DRAM, memory bus, CPU interconnect, I/O bridge, I/O bus, expander interconnect, I/O and network controllers, disks, ports) — annotated with the tools that observe each layer: strace, perf, dtrace, stap, lttng, ktap, top, ps, pidstat, mpstat, vmstat, slabtop, free, iostat, iotop, blktrace, tcpdump, nicstat, ip, netstat, swapon, ping, traceroute, and various: sar, dstat, /proc]
Latest version: http://www.brendangregg.com/linuxperf.html
SCaLE12x: Linux and Solaris Performance
• Also covered in my new book,
Systems Performance
(Prentice Hall, 2013)

• Focus is on understanding
systems and the methodologies
to analyze them. Linux and
Solaris are used as examples
Agenda
• Why systems differ
• Specific differences
• What Solaris could learn from Linux
• What Linux could learn from Solaris
• What both can learn
• Results
Terminology
• For convenience in this talk:
• Linux = an operating system distribution which uses the Linux kernel. eg, Ubuntu.
• Solaris = a distribution from the family of operating systems whose kernel code originated from Sun Solaris.
• SmartOS = a Solaris-family distribution developed by Joyent, based on the illumos kernel, which was based on the OpenSolaris kernel, which was based on the Solaris kernel.
• System = everything you get when you pick a Linux or Solaris distribution: the kernel, libraries, tools, and package repos.
• Opinions in this presentation are my own, and I do not
represent Oracle Solaris. I'll actually be talking about SmartOS.
Why Systems Differ
Why Systems Differ
• Does the system even matter?
• Will your application perform the same on Linux and Solaris?
Example
• Let's start with this simple program:
perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'

• This counts to 100,000,000, setting a variable
• To simplify this further, we're only interested in performance of
the loop, which dominates runtime, not program startup.
Example Results
• One of these is Linux, the other SmartOS. Same hardware:
systemA$ time perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'
real    0m18.534s
user    0m18.450s
sys     0m0.018s

systemB$ time perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'
real    0m16.253s
user    0m16.230s
sys     0m0.010s
• One system is 14% slower
• Imagine that's your system – you'd want to know why
• I recently had a customer with a complex performance issue try
a one-liner like this, as a simple test, and with a similar result.
It's an interesting tour of some system differences
Possible Differences: Versioning
• Different versions of Perl
• Applications improve performance from release to release
• Linux and SmartOS distributions use entirely different
package repos; different software versions are common

• Different compilers used to build Perl
• Compilers come from package repos, too. I've seen 50%
performance improvements by gcc version alone

• Different compiler options used to build Perl
• Application Makefile: #ifdef Linux -O3 #else -O0. ie, the
performance difference is due to a Makefile decision

• 32-bit vs 64-bit builds
Possible Differences: OS
• Different system libraries
• If any library calls are used. eg: strcmp(), malloc(),
memcpy(), ... These implementations vary between Linux
and Solaris, and can perform very differently

• Robert Mustacchi enhanced libumem to provide improved
malloc() performance on SmartOS. This can make a
noticeable difference for some workloads

• Different background tasks
• Runtime could be perturbed by OS daemons doing async
housekeeping. These differ between Linux and Solaris
Possible Differences: Observability
• Can the 14% be root caused?
• Observability tools differ. These don't cause the 14%, but
can make the difference as to whether you can diagnose
and fix it – or not.

• DTrace has meant that anything can be solved; without an
equivalent on Linux, you may have to live with that 14%

• Although, Linux observability is getting much better...
Possible Differences: Kernel
• Can the kernel make a difference? ... As a reminder:
perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'

• The program makes no system calls during the loop
Possible Differences: Kernel, cont.
• Can the kernel make a difference? ... As a reminder:
perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'

• The program makes no system calls during the loop
• Yes, for a number of reasons:
• Setting the string involves memory I/O, and the kernel
controls memory placement. Allocating nearby memory in
a NUMA system can significantly improve performance

• The kernel may also control the CPU clock speed (eg, Intel
SpeedStep), and vary it for temp or power reasons

• The program could be perturbed by interrupts: eg, network
I/O (although the performance effect should be small).
Possible Differences: Kernel, cont.
• During a perturbation, the kernel CPU scheduler may
migrate the thread to another CPU, which can hurt
performance (cold caches, memory locality)

• Sure, but would that happen for this simple Perl program?
Possible Differences: Kernel, cont.
• During a perturbation, the kernel CPU scheduler may
migrate the thread to another CPU, which can hurt
performance (cold caches, memory locality)

• Sure, but would that happen for this simple Perl program?
# dtrace -n 'profile-99 /pid == $target/ { @ = lquantize(cpu, 0, 16, 1); }' -c ...
       value  ------------- Distribution ------------- count
         < 0 |                                         0
           0 |                                         1
           1 |@@@@@                                    483
           2 |                                         1
           3 |@@@@@@@                                  663
           4 |                                         2
           5 |@@@                                      276
           6 |                                         0
           7 |@@@@@@                                   512
           8 |                                         1
           9 |@@@                                      288
          10 |                                         0
          11 |@@@@@@                                   576
          12 |                                         0
          13 |@@@@@                                    442
          14 |                                         2
          15 |@@@                                      308
          16 |                                         0

Yes, a lot! This shows the CPUs Perl ran on. It should stay put, but instead runs across many. We've been fixing this in SmartOS.
Kernel Myths and Realities
• Myth: "The kernel gets out of the way for applications"
• The only case where the kernel gets out of the way is
when your software calls halt() or shutdown()

• The performance difference between kernels may be
small, eg, 5% – but I have already seen a 5x difference this
year
arch/ia64/kernel/smp.c:

void
cpu_die(void)
{
	max_xtp();
	local_irq_disable();
	cpu_halt();
	/* Should never be here */
	BUG();
	for (;;);
}

unintentional kernel humor...
Other Differences
• The previous example was simple. Any applications that do
I/O (ie, everything) encounter more differences:

• Different network stack implementations, including support
for different TCP/IP features

• Different file systems, storage I/O stacks
• Different device drivers and device feature support
• Different resource control implementations
• Different virtualization technologies
• Different community support: stackoverflow, meetups, ...
Types of Differences
[Diagram: the kernel stack annotated with where systems differ — app versions from system repos; compiler options; OS daemons; system library implementations (malloc(), str*(), ...); observability tools; syscall interface; file systems (ZFS, btrfs, ...); I/O scheduling; TCP/IP stack and features; network device CPU fanout; device driver support and features; scheduler classes and behavior; memory allocation and locality; resource controls; virtualization technologies]
Specific Differences
Specific Differences
• Comparing systems is like comparing countries
• I'm often asked: how's Australia different from the US?
• Where do I start!?
• I'll categorize performance differences into big or small, based
on their engineering cost, not their performance effect

• If one system is 2x faster than another for a given workload,
the real question for the slower system is:

• Is this a major undertaking to fix?
• Is there a quick fix or workaround?
• Using SmartOS for specific examples...
Big Differences
• Major bodies of perf work and other big differences, include:
• Linux
• up-to-date packages, large community, more device
drivers, futex, RCU, btrfs, DynTicks, SLUB, I/O scheduling
classes, overcommit & OOM killer, lazy TLB, likely()/
unlikely(), CONFIGurable

• SmartOS
• Mature: Zones, ZFS, DTrace, fully pre-emptable kernel
• Microstate accounting, symbols by default, CPU scalability,
MPSS, libumem, FireEngine, Crossbow, binary /proc,
process swapping
Big Differences: Linux
• Up-to-date packages: latest application versions, with the latest performance fixes
• Large community: weird perf issue? May be answered on stackoverflow, or discussed at meetups
• More device drivers: there can be better coverage for high performing network cards or driver features
• futex: fast user-space mutex
• RCU: fast-performing read-copy updates
• btrfs: modern file system with pooled storage
• DynTicks: dynamic ticks: tickless kernel, reduces interrupts and saves power
• SLUB: simplified version of the SLAB kernel memory allocator, improving performance
Big Differences: Linux, cont.
• I/O scheduling classes: block I/O classes: deadline, anticipatory, ...
• Overcommit & OOM killer: doing more with less main memory
• Lazy TLB: higher performing munmap()
• likely()/unlikely(): kernel is embedded with compiler information for branch prediction, improving runtime perf
• CONFIGurable: lightweight kernels possible by disabling features
Big Differences: SmartOS
• Mature Zones: OS virtualization for high-performing server instances
• Mature ZFS: fully-featured and high-performing modern integrated file system with pooled storage
• Mature DTrace: programmable dynamic and static tracing for performance analysis
• Mature fully pre-emptable kernel: support for real-time systems was an early Sun differentiator
• Microstate accounting: numerous high-resolution thread state times for performance debugging
• Symbols: symbols available for profiling tools by default
• CPU scalability: code is often tested, and bugs fixed, for large SMP servers (mainframes)
• MPSS: multiple page size support (not just hugepages)
Big Differences: SmartOS, cont.
• libumem: high-performing memory allocation library, with per-thread CPU caches
• FireEngine: high-performing TCP/IP stack enhancements, including vertical perimeters and IP fanout
• Crossbow: high-performing virtualized network interfaces, as used by OS virtualization
• binary /proc: process statistics are binary (slightly more efficient) by default
• Process swapping: apart from paging (what Linux calls swapping), Solaris can still swap out entire processes
Big Differences: Linux vs SmartOS
[Diagram: the kernel stack with each system's big differences placed at the layer they affect. Linux: up-to-date packages, futex, likely()/unlikely(), CONFIGurable, RCU, DynTicks, SLUB, btrfs, I/O scheduler, overcommit & OOM killer, lazy TLB, more device drivers. SmartOS: DTrace, libumem, symbols, microstate accounting, mature fully preemptive kernel, CPU scalability, ZFS, MPSS, resource controls, Zones, Crossbow, FireEngine, process swapping]
Small Differences
• Smaller performance-related differences, tunables, bugs
• Linux
• glibc, better TCP defaults, better CPU affinity, perf stat, a
working sar, htop, splice(), fadvise(), ionice, /usr/bin/time,
mpstat %steal, voluntary preemption, swappiness, various
accounting frameworks, tcp_tw_reuse/recycle, TCP tail
loss probe, SO_REUSEPORT, ...

• SmartOS
• perf tools by default, kstat, vfsstat, iostat -e, ptime -m,
CPU-only load averages, some STREAMS leftovers, ZFS
SCSI cache flush by default, different TCP slow start
default, ...
Small Differences, cont.
• Small differences change frequently: a feature is added to one
kernel, then the other a year later; a difference may only exist
for a short period of time.

• These small kernel differences may still make a significant
performance difference, but are classified as "small" based on
engineering cost.
System Similarities
• It's important to note that many performance-related features
are roughly equivalent:

• Both are Unix-like systems: processes, kernel, syscalls,
time sharing, preemption, virtual memory, paged virtual
memory, demand paging, ...

• Similar modern features: unified buffer cache, memory
mapped files, multiprocessor support, CPU scheduling
classes, CPU sets, 64-bit support, memory locality,
resource controls, PIC profiler, epoll, ...
Non Performance Differences
• Linux
• Open source (vs Oracle Solaris), "everyone knows it",
embedded Linux, popular and well supported desktop/
laptop use...

• SmartOS
• SMF/FMA, contracts, privileges, mdb (postmortem
debugging), gcore, crash dumps by default, ...
WARNING

The next sections are not suitable for those suffering
Not Invented Here (NIH) syndrome,
or those who are easily trolled
What Solaris can learn from Linux performance
What Solaris can learn from Linux performance
• Packaging
• Community
• Compiler Options
• likely()/unlikely()
• Tickless Kernel
• Process Swapping
• Overcommit & OOM Killer
• SLUB
• Lazy TLB
• TIME_WAIT Recycling
• sar
• KVM
• Either learning what to do, or learning what not to do...
Packaging
• Linux package repositories are often well stocked and updated
• Convenience aside, this can mean that users run newer
software versions, along with the latest perf fixes

• They find "Linux is faster", but the real difference is the version
of: gcc, openssl, mysql, ... Solaris is unfairly blamed
Packaging, cont.
• Packaging is important and needs serious support
• Dedicated staff, community
• eg, Joyent has dedicated staff for the SmartOS package repo,
which is based on pkgsrc from NetBSD

• It's not just the operating system that matters; it's the
ecosystem
Community
• A large community means:
• Q/A sites have performance tips: stackoverflow, ...
• Conference talks on performance (this one!), slides, video
• Weird issues more likely found and fixed by someone else
• More case studies shared: what tuning/config worked
• A community helps people hear about the latest tools, tuning,
and developments, and adopt them
Community, cont.
• Linux users expect to Google a question and find an answer
on stackoverflow

• Either foster a community to share content on tuning, tools,
configuration, or, have staff to create such content.

• Hire a good community manager!
Compiler Options
• Apps may compile with optimizations for Linux only. eg:
• #ifdef Linux -O3 #else -O0 (oh, ha ha ha)
• Developers are often writing software on Linux, and that
platform gets the most attention. (Works on my system.)

• I've also seen 64-bit vs 32-bit. #ifdef Linux USE_FUTEX would
be fine, since Solaris doesn't have them yet.

• Last time I found compiler differences using Flame Graphs:
[Figure: CPU flame graphs for the Linux and SmartOS builds; the Linux build shows an extra function, UnzipDocid()]
Compiler Options, cont.
• Can be addressed by tuning packages in the repo
• Also file bugs/patches with developers to tune Makefiles
• Someone has to do this, eg, package repo staff/community
who find and do the workarounds anyway
likely()/unlikely()
• These become compiler hints (__builtin_expect) for branch
prediction, and are throughout the Linux kernel:
net/ipv4/tcp_output.c, tcp_transmit_skb():
[...]
	if (likely(clone_it)) {
		if (unlikely(skb_cloned(skb)))
			skb = pskb_copy(skb, gfp_mask);
		else
			skb = skb_clone(skb, gfp_mask);
		if (unlikely(!skb))
			return -ENOBUFS;
	}
[...]

• The Solaris kernel doesn't do this yet
• If the kernel is built using profile feedback instead (which should be even better), I don't know about it
• The actual perf difference is likely to be small
likely()/unlikely(), cont.
• Could be adopted by kernel engineering
• Might help readability, might not
Tickless Kernel
• Linux does this already (DynTicks), which reduces interrupts
and improves processor power saving (good for laptops and
embedded devices)

• Solaris still has a clock() routine, to perform various kernel
housekeeping functions

• Called by default at 100 Hertz
• If hires_tick=1, at 1000 Hertz
• I've occasionally encountered perf issues involving 10 ms
latencies, that don't exist on Linux

• ... which become 1 ms latencies after setting hires_tick=1
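The hires_tick workaround is an /etc/system tunable (a tuning sketch; it requires a reboot, and makes the clock() routine run ten times as often):

```
* /etc/system: run the clock at 1000 Hz instead of the default 100 Hz
set hires_tick=1
```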
Tickless Kernel, cont.
• Sun/Oracle did start work on this years ago...
Process Swapping
• Linux doesn't do it. Linux "swapping" means paging.
• Process swapping made sense on the PDP-11/20, where the maximum process size was 64 Kbytes
• Paging was added later in BSD, but the swapping code remained
Process Swapping, cont.
• Consider ditching it
• All that time learning what swapping is could be spent learning
more useful features
Overcommit & OOM Killer
• On Linux, malloc() may never fail
• No virtual memory limit (main memory + swap) like Solaris
by default. Tunable using vm.overcommit_memory

• More user memory can be allocated than can be stored.
May be great for small devices, running applications that
sparsely use the memory they allocate

• Don't worry, if Linux runs very low on available main memory,
a sacrificial process is identified by the kernel and killed
by the Out Of Memory (OOM) Killer, based on an OOM score

• OOM score was just added to htop (1.0.2, Jan 2014)
[Screenshot: htop showing the OOM score column]
Overcommit & OOM Killer, cont.
• Solaris can learn why not to do this (cautionary tale)
• If an important app depended on this, and couldn't be fixed,
the kernel could have an overcommit option that wasn't default

• ... this is why so much new code doesn't check for ENOMEM
SLUB
• Linux integrated the Solaris kernel SLAB allocator, then later
simplified it: The SLUB allocator

• Removed object queues and per-CPU caches, leaving NUMA
optimization to the page allocator's free lists

• Worth considering?
Lazy TLB
• Lazy TLB mode: a way to delay TLB updates (shootdowns)
• munmap() heavy workloads on Solaris can experience heavy
HAT CPU cross calls. Linux doesn't seem to have this problem.

                      TLB        Lazy TLB
As seen by Solaris:   Correct    Reckless
As seen by Linux:     Paranoid   Fast
Lazy TLB, cont.
• This difference needs to be investigated, quantified, and
possibly fixed (tunable?)
TIME_WAIT Recycling
• A localhost HTTP benchmark on Solaris:
# netstat -s 1 | grep ActiveOpen
	tcpActiveOpens      =728004	tcpPassiveOpens     =726547	<- fast
	tcpActiveOpens      =     0	tcpPassiveOpens     =     0
	tcpActiveOpens      =  4939	tcpPassiveOpens     =  4939
	tcpActiveOpens      =  5849	tcpPassiveOpens     =  5798
	tcpActiveOpens      =  1341	tcpPassiveOpens     =  1292	<- slow
	tcpActiveOpens      =  1006	tcpPassiveOpens     =  1008
	tcpActiveOpens      =   872	tcpPassiveOpens     =   870
	tcpActiveOpens      =   932	tcpPassiveOpens     =   932
	tcpActiveOpens      =   879	tcpPassiveOpens     =   879
	tcpActiveOpens      =   562	tcpPassiveOpens     =   586
	tcpActiveOpens      =   613	tcpPassiveOpens     =   594

• Connection rate drops by 5x due to sessions in TIME_WAIT
• Linux avoids this by recycling better (tcp_tw_reuse/recycle)
• Usually doesn't hurt production workloads, as it must be a
flood of connections from a single host to a single port. It
comes up in benchmarks/evaluations.
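For reference, the Linux tunables are sysctls. A sketch of the /etc/sysctl.conf entries (use with care: tcp_tw_recycle is known to break clients behind NAT):

```
# recycle TIME_WAIT sessions faster
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
```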
TIME_WAIT Recycling, cont.
• Improve tcp_time_wait_processing()
• This is being fixed for illumos/SmartOS
sar
• Linux sar is awesome, and has extra options:
$ sar -n DEV -n TCP -n ETCP 1
11:16:34 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
11:16:35 PM      eth0    104.00    675.00      7.35    984.72      0.00      0.00      0.00
11:16:35 PM      eth1      7.00      0.00      0.38      0.00      0.00      0.00      0.00
11:16:35 PM   ip6tnl0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:16:35 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:16:35 PM   ip_vti0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:16:35 PM      sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:16:35 PM     tunl0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

11:16:34 PM  active/s passive/s    iseg/s    oseg/s
11:16:35 PM      0.00      0.00     99.00    681.00

11:16:34 PM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
11:16:35 PM      0.00      0.00      0.00      0.00      0.00

• -n DEV: network interface statistics
• -n TCP: TCP statistics
• -n ETCP: TCP error statistics
• Linux sar's other metrics are also far less buggy
sar, cont.
• Sar must be fixed for the 21st century
• Use the Linux sar options and column names, which follow a
neat convention
KVM
• The KVM type 2 hypervisor originated for Linux
• While Zones are faster, KVM can run different kernels (Linux)
• vs Type 1 hypervisors (Xen):
• KVM has better perf observability, as it can use the regular OS tools
• KVM can use OS resource controls, just like any other process
KVM, cont.
• illumos/SmartOS learned this, Joyent ported KVM!
• Oracle Solaris doesn't have it yet
What Linux can learn from Solaris performance
What Linux can learn from Solaris performance
• ZFS
• Zones
• STREAMS
• Symbols
• prstat -mLc
• vfsstat
• DTrace
• Culture

• Either learning what to do, or learning what not to do...
ZFS
• More performance features than you can shake a stick at:
• Pooled storage, COW, logging (batching writes), ARC,
variable block sizes, dynamic striping, intelligent prefetch,
multiple prefetch streams, snapshots, ZIO pipeline,
compression (lzjb can improve perf by reducing I/O load),
SLOG, L2ARC, vdev cache, data deduplication (possibly
better cache reach)

• The Adaptive Replacement Cache (ARC) can make a big
difference: it can resist perturbations (backups) and stay warm

• ZFS I/O throttling (in illumos/SmartOS) throttles disk I/O at the
VFS layer, to solve cloud noisy neighbor issues

• ZFS is mature. Widespread use in critical environments
ZFS, cont.
• Linux has been learning about ZFS for a while
• http://zfsonlinux.org/
• btrfs
Zones
• Ancestry: chroot → FreeBSD jails → Solaris Zones
• OS Virtualization. Zero I/O path overheads.
Zones, cont.
• Compare to HW Virtualization:
[Diagram: the I/O control flow under hardware virtualization, with extra layers (guest kernel, virtual devices, hypervisor/host kernel) between the application and the device; Zones have none of these]
• This shows the initial I/O control flow. There are optimizations/variants for improving the HW Virt I/O path, esp for Xen.
Zones, cont.
• Comparing 1 GB instances on Joyent
• Max network throughput:
• KVM: 400 Mbits/sec
• Zones: 4.54 Gbits/sec (over 10x)
• Max network IOPS:
• KVM: 18,000 packets/sec
• Zones: 78,000 packets/sec (over 4x)
• Numbers go much higher for larger instances
• http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen
Zones, cont.
• Performance analysis for Zones is also easy. Analyze the
applications as usual:
[Diagram: in a zone, applications run directly on the shared kernel stack (system libraries, system call interface, VFS through device drivers, scheduler, virtual memory, resource controls) on bare metal, so the usual OS tools analyze them directly]
Zones, cont.
• Compared to HW Virt (KVM):
[Diagram: with KVM, guest applications run on a full Linux kernel stack inside a QEMU process on the host kernel; host-side tools can analyze host applications and QEMU directly, but guest internals lie beyond an observability boundary at the virtual devices, and must be analyzed from within the guest and correlated with host activity]
Zones, cont.
• Linux has been learning: LXC & cgroups, but not widespread
adoption yet. Docker will likely drive adoption.
STREAMS
• AT&T modular I/O subsystem
• Like Unix shell pipes, but for kernel messages. Can push
modules into the stream to customize processing

• Introduced (fully) in Unix 8th Ed Research Unix, became SVr4
STREAMS, and was used by Solaris for network TCP/IP stack

• With greater demands for TCP/IP performance, the overheads
of STREAMS reduced scalability

• Sun switched high-performing paths to be direct function calls
STREAMS, cont.
• A cautionary tale: not good for high performance code paths
Symbols
• Compilers on Linux strip symbols by default, making perf
profiler output inscrutable without the dbgsym packages
57.14%  sshd  libc-2.15.so  [.] connect
        |
        --- connect
           |
           |--25.00%-- 0x7ff3c1cddf29
           |
           |--25.00%-- 0x7ff3bfe82761        What??
           |           0x7ff3bfe82b7c
           |
           |--25.00%-- 0x7ff3bfe82dfc
            --25.00%-- [...]

• Linux compilers also drop frame pointers, making stacks hard
to profile. Please use -fno-omit-frame-pointer to stop this.

• As a workaround, perf_events has "-g dwarf" for libunwind
• Solaris keeps symbols and stacks, and often has CTF too,
making Mean Time To Flame Graph very fast
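In practice, keeping stacks and symbols means building with flags like these (a build sketch; the file names are illustrative):

```
# keep frame pointers for stack walking, and debug symbols for profilers
gcc -O2 -g -fno-omit-frame-pointer -o app app.c
```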
Symbols, cont.
Flame Graphs need
symbols and stacks
Symbols, cont.
• Keep symbols and frame pointers. Faster resolution for
performance analysis and troubleshooting.
prstat -mLc
• Per-thread time broken down into states, from a top-like tool:
$ prstat -mLc 1
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 63037 root      83  16 0.1 0.0 0.0 0.0 0.0 0.5  30 243 45K   0 node/1
 12927 root      14  49 0.0 0.0 0.0 0.0  34 2.9  6K 365 .1M   0 ab/1
 63037 root     0.5 0.6 0.0 0.0 0.0 3.7  95 0.4  1K   0  1K   0 node/2
[...]

• These columns narrow an investigation immediately, and have been critical for solving countless issues. Unsung hero of Solaris performance analysis
• Well suited for the Thread State Analysis (TSA) methodology, which I've taught in class, and which has helped students get started and fix unknown perf issues
• http://www.brendangregg.com/tsamethod.html
prstat -mLc, cont.
• Linux has various thread states: delay accounting, I/O
accounting, schedstats. Can they be added to htop?
See TSA Method for use case and desired metrics.
vfsstat
• VFS-level iostat (added to SmartOS, not Solaris):
$ vfsstat -M 1
   r/s    w/s   Mr/s  Mw/s ractv wactv read_t writ_t  %r  %w   d/s  del_t zone
 761.0  267.1   15.4   1.6   0.0   0.0   12.0   24.7   0   0   1.3   23.5 5716a5b6
4076.8 2796.0   41.7   2.3   0.1   0.0   16.6    3.1   6   0   0.0    0.0 5716a5b6
4945.1 2807.4  157.1   2.3   0.1   0.0   25.2    3.4  12   0   0.0    0.0 5716a5b6
3550.9 1910.4  109.7   1.6   0.4   0.0  112.9    3.3  39   0   0.0    0.0 5716a5b6
[...]

• Shows what the applications request from the file system, and the true performance that they experience
• iostat includes asynchronous I/O
• vfsstat sees issues iostat can't: lock contention, resource control throttling
[Diagram: vfsstat observes at the VFS layer, between the system call interface and the file systems; iostat observes lower, at the block device interface, below the volume managers]
vfsstat, cont.
• Add vfsstat, or VFS metrics to sar.
DTrace
• Programmable, real-time, dynamic and static
tracing, for performance analysis and
troubleshooting, in dev and production

• Used on Solaris, illumos/SmartOS,
Mac OS X, FreeBSD, ...

• Solve virtually any perf issue, eg, fix the earlier Perl 14% delta, no matter where the problem is. Without DTrace's capabilities, you may have to wear that 14%.

• Users can write their own custom DTrace
one-liners and scripts, or use/modify others
(eg, mine).
DTrace: illumos Scripts
• Some of my DTrace scripts from the DTraceToolkit, DTrace book...
[Diagram: the kernel stack annotated with scripts per layer. Services: cifs*.d, iscsi*.d, nfsv3*.d, nfsv4*.d, ssh*.d, httpd*.d. Language providers: node*.d, erlang*.d, j*.d, js*.d, php*.d, pl*.d, py*.d, rb*.d, sh*.d. Databases: mysql*.d, postgres*.d, redis*.d, riak*.d. System libraries: hotuser, umutexmax.d, lib*.d. System calls: opensnoop, statsnoop, errinfo, dtruss, rwtop, rwsnoop, mmap.d, kill.d, shellsnoop, zonecalls.d, weblatency.d, fddist. VFS/file systems: fswho.d, fssnoop.d, sollife.d, solvfssnoop.d, dnlcsnoop.d, zfsslower.d, ziowait.d, ziostacks.d, spasync.d, metaslab_free.d. Block I/O: iosnoop, iotop, disklatency.d, satacmds.d, satalatency.d, scsicmds.d, scsilatency.d, sdretry.d, sdqueue.d. Device drivers: ide*.d, mpt*.d, macops.d, ixgbecheck.d, ngesnoop.d, ngelink.d. Sockets and TCP/IP: soconnect.d, soaccept.d, soclose.d, socketio.d, so1stbyte.d, sotop.d, soerror.d, ipstat.d, ipio.d, ipproto.d, ipfbtsnoop.d, ipdropper.d, tcpstat.d, tcpaccept.d, tcpconnect.d, tcpioshort.d, tcpio.d, tcpbytes.d, tcpsize.d, tcpnmap.d, tcpconnlat.d, tcp1stbyte.d, tcpfbtwatch.d, tcpsnoop.d, tcpconnreqmaxq.d, tcprefused.d, tcpretranshosts.d, tcpretranssnoop.d, tcpsackretrans.d, tcpslowstart.d, tcptimewait.d, udpstat.d, udpio.d, icmpstat.d, icmpsnoop.d. Scheduler: priclass.d, pridist.d, cv_wakeup_slow.d, displat.d, capslat.d. Virtual memory: minfbypid.d, pgpginbypid.d]
DTrace, cont.
• What Linux needs to learn about DTrace:

Feature #1 is production safety
• There should be NO risk of panics or freezes. It should be an
everyday tool like top(1).

• Related to production safety is the minimization of overheads,
which can be done with in-kernel summaries. Some of the
Linux tools need to learn how to do this, too, as the overheads
of dump & post-analysis can get too high.

• Features aren't features if users don't use them
DTrace, cont.
• Linux might get DTrace-like capabilities via:
• dtrace4linux
• perf_events
• ktap
• SystemTap
• LTTng
• The Linux kernel has the necessary frameworks which are
sourced by these tools: tracepoints, kprobes, uprobes

• ... and another thing Linux can learn:
• DTrace has a memorable unofficial mascot (the ponycorn
by Deirdré Straughan, using General Zoi's pony creator).
She's created some for the Linux tools too...
dtrace4linux
• Two DTrace ports in development for Linux:
• 1. dtrace4linux
• https://github.com/dtrace4linux/linux
• Mostly by Paul Fox
• Not safe for production use yet;
I've used it to solve issues by
first reproducing them in the lab

• 2. Oracle Enterprise Linux DTrace
• There has been steady progress: Oracle Linux 6.5 featured "full DTrace integration" (Dec 2013)
dtrace4linux: Example
• Tracing ext4 read/write calls with size distributions (bytes):
#!/usr/sbin/dtrace -s

fbt::vfs_read:entry, fbt::vfs_write:entry
/stringof(((struct file *)arg0)->f_path.dentry->d_sb->s_type->name) == "ext4"/
{
	@[execname, probefunc + 4] = quantize(arg2);
}

dtrace:::END
{
	printa("\n   %s %s (bytes)%@d", @);
}

# ./ext4rwsize.d
dtrace: script './ext4rwsize.d' matched 3 probes
^C
CPU     ID                    FUNCTION:NAME
  1      2                             :END
[...]
   vi read (bytes)
           value  ------------- Distribution ------------- count
             128 |                                         0
             256 |                                         1
             512 |@@@@@@@                                  17
            1024 |@                                        2
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         75
            8192 |                                         0
dtrace4linux: Example
• Tracing TCP retransmits (tcpretransmit.d for 3.11.0-17):
#!/usr/sbin/dtrace -qs
dtrace:::BEGIN { trace("Tracing TCP retransmits... Ctrl-C to end.\n"); }
fbt::tcp_retransmit_skb:entry {
this->so = (struct sock *)arg0;
this->d = (unsigned char *)&this->so->__sk_common; /* 1st is skc_daddr */
printf("%Y: retransmit to %d.%d.%d.%d, by:", walltimestamp,
this->d[0], this->d[1], this->d[2], this->d[3]);
stack(99);
}
# ./tcpretransmit.d
Tracing TCP retransmits... Ctrl-C to end.
1970 Jan 1 12:24:45: retransmit to 50.95.220.155, by:
              kernel`tcp_retransmit_skb
              kernel`dtrace_int3_handler+0xcc
              kernel`dtrace_int3+0x3a
              kernel`tcp_retransmit_skb+0x1
              kernel`tcp_retransmit_timer+0x276
              kernel`tcp_write_timer_handler+0xa0
              kernel`tcp_write_timer+0x6c
              kernel`call_timer_fn+0x36
              kernel`run_timer_softirq+0x1fd
              kernel`__do_softirq+0xf7
              kernel`call_softirq+0x1c
[...]
... that used to work
perf_events
• In the Linux tree. perf-tools package. Can do sampling, static
and dynamic tracing, with stack traces and local variables

• Often involves an enable → collect → dump → analyze cycle
• A powerful profiler, loaded with
features (eg, libunwind stacks!)

• Isn't programmable, and so has
limited ability for processing data
in-kernel. Does counts.

• You can post-process in userland, but passing all of the
event data to user level incurs overhead;
it can be Gbytes of data
perf_events: Example
• Dynamic tracing of tcp_sendmsg() with size:
# perf probe --add 'tcp_sendmsg size'
[...]
# perf record -e probe:tcp_sendmsg -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.052 MB perf.data (~2252 samples) ]
# perf script
# ========
# captured on: Fri Jan 31 23:49:55 2014
# hostname : dev1
# os release : 3.13.1-ubuntu-12-opt
[...]
# ========
#
sshd  1301 [001]   502.424719: probe:tcp_sendmsg: (ffffffff81505d80) size=b0
sshd  1301 [001]   502.424814: probe:tcp_sendmsg: (ffffffff81505d80) size=40
sshd  2371 [000]   502.952590: probe:tcp_sendmsg: (ffffffff81505d80) size=27
sshd  2372 [000]   503.025023: probe:tcp_sendmsg: (ffffffff81505d80) size=3c0
sshd  2372 [001]   503.203776: probe:tcp_sendmsg: (ffffffff81505d80) size=98
sshd  2372 [001]   503.281312: probe:tcp_sendmsg: (ffffffff81505d80) size=2d0
[...]
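When only counts are needed, the record-and-dump step can be skipped by letting perf count events in-kernel with perf stat. A minimal sketch, assuming perf is installed, root privileges, and that the probe:tcp_sendmsg dynamic probe from above has already been added:

```shell
# Count tcp_sendmsg() calls in-kernel for 5 seconds, instead of writing
# every event to perf.data (assumes the probe added with "perf probe"):
if command -v perf >/dev/null 2>&1; then
    perf stat -e probe:tcp_sendmsg -a sleep 5 || echo "probe not available"
else
    echo "perf not installed on this system"
fi
echo "done"
```

The count is maintained in-kernel and only the summary crosses to user level, so the overhead stays low regardless of the event rate.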
ktap
• A new static/dynamic tracing tool for Linux
• Lightweight, simple, based on lua. Uses bytecode for
programmable and safe tracing

• Suitable for use on embedded Linux
• http://www.ktap.org
• Features are limited (still in
development), but I've been
impressed so far

• In development, so I can't recommend
production use yet
ktap: Example
• Summarize read() syscalls by return value (size/err):
# ktap -e 's = {}; trace syscalls:sys_exit_read { s[arg2] += 1 }
    trace_end { histogram(s); }'
^C
                    value ------------- Distribution ------------- count
                       11 |@@@@@@@@@@@@@@@@@@@@@@@@                50
                       18 |@@@@@@                                  13
                       72 |@@                                      6
                     1024 |@                                       4
                        0 |                                        2
                        2 |                                        2
                      446 |                                        1
                      515 |                                        1
                       48 |                                        1

(a histogram of a key/value table)
• Write scripts (excerpt from syslatl.kp, highlighting time delta):
trace syscalls:sys_exit_* {
if (self[tid()] == nil) { return }
delta = (gettimeofday_us() - self[tid()]) / (step * 1000)
if (delta > max) { max = delta }
lats[delta] += 1
self[tid()] = nil
}
ktap: Setup
• Installing on Ubuntu (~5 minutes):
# apt-get install git gcc make
# git clone https://github.com/ktap/ktap
# cd ktap
# make
# make install
# make load

• Example dynamic tracing of tcp_sendmsg() and stacks:
# ktap -e 's = ptable(); trace probe:tcp_sendmsg { s[backtrace(12, -1)] <<< 1 }
    trace_end { for (k, v in pairs(s)) { print(k, count(v), "\n"); } }'
Tracing... Hit Ctrl-C to end.
^C
ftrace_regs_call
sock_aio_write
do_sync_write
vfs_write
SyS_write
system_call_fastpath
17
SystemTap
• Sampling, static and dynamic tracing, fully programmable
• The most featured of all the tools. Does some things that
DTrace can't (eg, loops).

• http://sourceware.org/systemtap
• Has its own tracing language,
which is compiled (gcc) into
kernel modules (slow; safe?)

• I used it a lot in 2011, and had
problems with panics/freezes;
never felt safe to run it on my
customer's production systems

• Needs vmlinux/debuginfo
SystemTap: Setup
• Setting up SystemTap on Ubuntu (2014):
# ./opensnoop.stp
semantic error: while resolving probe point: identifier 'syscall' at ./opensnoop.stp:11:7
        source: probe syscall.open
                      ^
semantic error: no match
Pass 2: analysis failed. [man error::pass2]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.   ← helpful tips...
# more /usr/share/doc/systemtap/README.Debian
[...]
supported yet, see Debian bug #691167). To use systemtap you need to
manually install the linux-image-*-dbg and linux-header-* packages
that match your running kernel. To simplify this task you can use the
stap-prep command. Please always run this before reporting a bug.
# stap-prep
You need package linux-image-3.11.0-17-generic-dbgsym but it does not seem
to be available
Ubuntu -dbgsym packages are typically in a separate repository
Follow https://wiki.ubuntu.com/DebuggingProgramCrash to add this
repository
SystemTap: Setup, cont.
• After following ubuntu's DebuggingProgramCrash site:
# apt-get install linux-image-3.11.0-17-generic-dbgsym
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  linux-image-3.11.0-17-generic-dbgsym
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 834 MB of archives.
After this operation, 2,712 MB of additional disk space will be used.
Get:1 http://ddebs.ubuntu.com/ saucy-updates/main linux-image-3.11.0-17-generic-dbgsym amd64 3.11.0-17.31 [834 MB]
0% [1 linux-image-3.11.0-17-generic-dbgsym 1,581 kB/834 MB 0%]  215 kB/s 1h 4min 37s

(...but my perf issue is happening now...)

• In fairness:
• 1. The Red Hat SystemTap developer's primary focus is to
get it working on Red Hat (where they say it works fine)

• 2. Lack of CTF isn't SystemTap's fault, as said earlier
SystemTap: Example
• opensnoop.stp:
#!/usr/bin/stap

probe begin
{
	printf("\n%6s %6s %16s %s\n", "UID", "PID", "COMM", "PATH");
}

probe syscall.open
{
	printf("%6d %6d %16s %s\n", uid(), pid(), execname(), filename);
}

• Output:
# ./opensnoop.stp

   UID    PID             COMM PATH
     0  11108             sshd <unknown>
     0  11108             sshd <unknown>
     0  11108             sshd /lib/x86_64-linux-gnu/libwrap.so.0
     0  11108             sshd /lib/x86_64-linux-gnu/libpam.so.0
     0  11108             sshd /lib/x86_64-linux-gnu/libselinux.so.1
     0  11108             sshd /usr/lib/x86_64-linux-gnu/libck-connector.so.0
[...]
LTTng
• Profiling, static and dynamic tracing
• Based on Linux Trace Toolkit (LTT), which dabbled with
dynamic tracing (DProbes) in 2001

• Involves an enablestartstopview cycle
• Designed to be highly efficient
• I haven't used it properly yet,
so I don't have an informed
opinion (sorry LTTng, not
your fault)
LTTng, cont.
• Example sequence:
# lttng create session1
# lttng enable-event sched_process_exec -k
# lttng start
# lttng stop
# lttng view
# lttng destroy session1
DTrace, cont.
• 2014 is an exciting year for dynamic tracing and Linux –
one of these may reach maturity and win!
DTrace, final word
• What Oracle Solaris can learn from dtrace4linux:
• Dynamic tracing is crippled without source code
• Oracle could give customers scripts to run, but customers
lose any practical chance of writing them themselves

• If the dtrace4linux port is completed, it will be more useful
than Oracle Solaris DTrace (unless they open source it again)
Culture
• Sun Microsystems, out of necessity, developed a performance
engineering culture that had an appetite for understanding and
measuring the system: data-driven analysis

• "If your several-million-dollar Ultra Enterprise 10000 doesn't
perform well and your company is losing non-trivial sums of
money every minute because of it, you call Sun Service and
start demanding answers."
– System Performance Tuning [Musumeci 02]

• Includes the diagnostic cycle:
• hypothesis  instrumentation  data  hypothesis
• Some areas of Linux are already learning this (and some
areas of Solaris never did)
Culture, cont.
• Linux perf issues are often debugged using only top(1), *stat,
sar(1), strace(1), and tcpdump(8). These leave many areas
not measured.
[diagram: the kernel drawn as just three layers, a "top layer",
"strace layer", and "tcpdump layer", annotated: "If only it were
this simple..."]
• What about the other tools and metrics that are part of Linux?
perf_events, tracepoints/kprobes/uprobes, schedstats, I/O
accounting, blktrace, etc.
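Several of those sources can be inspected directly from the shell; a quick, hedged tour (availability depends on kernel config, so each file is checked before it is read):

```shell
# Peek at a few Linux metric sources beyond top/strace/tcpdump.
# These /proc files are standard Linux, but some depend on kernel
# config options, so test readability before reading.
for f in /proc/schedstat /proc/self/io /proc/diskstats; do
    if [ -r "$f" ]; then
        echo "== $f =="
        head -3 "$f"         # first few lines only
    else
        echo "== $f == (not available on this kernel)"
    fi
done
```

/proc/schedstat exposes scheduler statistics, /proc/self/io per-process I/O accounting, and /proc/diskstats the block-device counters that iostat(1) reads.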
Culture, cont.
• Understand the system, and measure if at all possible
• Hypothesis  instrumentation  data  hypothesis
• Use perf_events (and others once they are stable/safe)
• strace(1) is intermediate, not advanced
• High performance doesn't just mean hardware, system, and
config. It foremost means analysis of performance limiters.
What Both can Learn
What Both can Learn
• Get better at benchmarking
Benchmarking
• How Linux vs Solaris performance is often compared
• Results incorrect or misleading almost 100% of the time
• Get reliable benchmark results by active benchmarking:
• Analyze performance of all components during the
benchmark, to identify limiters

• Takes time: benchmarks may take weeks, not days
• Opposite of passive benchmarking: fire and forget
• If SmartOS loses a benchmark, my management demands
answers, and I almost always overturn the result with analysis

• Test variance as well as steady workloads. Measure jitter.
These differ between systems as well.
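The idea can be sketched as a wrapper script: observability tools run for the duration of the benchmark, and their logs are then analyzed for the limiter. The workload function and tool choices below are placeholders for whatever benchmark and analysis tools apply:

```shell
#!/bin/sh
# Active-benchmarking sketch: observe the system while the benchmark
# runs, rather than fire-and-forget. run_benchmark is a placeholder.
run_benchmark() { sleep 2; }           # substitute the real load generator

vmstat 1    > vmstat.log 2>&1 & vm=$!  # CPU and memory, per second
iostat -x 1 > iostat.log 2>&1 & io=$!  # disk I/O (needs sysstat installed)

run_benchmark                          # the benchmark itself

kill $vm $io 2>/dev/null
echo "run complete: analyze vmstat.log and iostat.log for the limiter"
```

In a real run there would be more tools (mpstat, pidstat, tcpdump, a profiler), and the analysis step is where the weeks go.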
Results
Out-of-the-Box
• Out-of-the-box, which is faster, Linux or Solaris?
Out-of-the-Box
• Out-of-the-box, which is faster, Linux or Solaris?
• There are many differences; it's basically a crapshoot.
I've seen everything from 5% to 5x differences either way

• It really depends on workload and platform characteristics:
• CPU usage, FS I/O, TCP I/O, TCP connect rates, default
TCP tunables, synchronous writes, lock calls, library calls,
multithreading, multiprocessor, network driver support, ...

• From prior Linux vs SmartOS comparisons, it's hard to pick a
winner, but in general my expectations are:

• SmartOS: for heavy file system or network I/O workloads
• Linux: for CPU-bound workloads
Out-of-the-Box Relevance
• Out-of-the-box performance isn't that interesting: if you care
about performance, why not do some analysis and tuning?
In Practice
• With some analysis and tuning, which is faster?
In Practice
• With some analysis and tuning, which is faster?
• Depends on the workload, and which differences matter to you
• With analysis, I can usually make SmartOS beat Linux
• DTrace and microstate accounting give me a big
advantage: I can analyze and fix all the small differences
(which sometimes exist as apps are developed for Linux)

• Although, perf/ktap/... are catching up
• I can do the same and make Linux beat SmartOS, but it's
much more time-consuming without an equivalent DTrace

• On the same hardware, it's more about the performance
engineer than the kernel. DTrace doesn't run itself.
At Joyent
• Joyent builds high performance SmartOS instances that
frequently beat Linux, but that's due to more than just the OS.
We use:

• Config: OS virtualization (Zones), all instances on ZFS
• Hardware: 10 GbE networks, plenty of DRAM for the ARC
• Analysis: DTrace to quickly root-cause issues and tune
• With SmartOS (ZFS, DTrace, Zones, KVM) configured for
performance, and with some analysis, I expect to win most
head-to-head performance comparisons
Learning From Linux
• Joyent has also been learning from Linux to improve SmartOS.
• Package repos: dedicated staff
• Community: dedicated staff
• Compiler options: dedicated repo staff
• KVM: ported!
• TCP TIME_WAIT: fixed localhost; more fixes to come
• sar: fix understood
• More to do...
References
• General Zoi's Awesome Pony Creator:
http://generalzoi.deviantart.com/art/Pony-Creator-v3-397808116

• More perf examples: http://www.brendangregg.com/perf.html
• More ktap examples: http://www.brendangregg.com/ktap.html
• http://www.ktap.org/doc/tutorial.html
• Flame Graphs: http://www.brendangregg.com/flamegraphs.html
• Diagnostic cycle: http://dtrace.org/resources/bmc/cec_analytics.pdf
• http://www.brendangregg.com/activebenchmarking.html
• https://blogs.oracle.com/OTNGarage/entry/doing_more_with_dtrace_on
Thank You
• More info:
• illumos: http://illumos.org
• SmartOS: http://smartos.org
• DTrace: http://dtrace.org
• Joyent: http://joyent.com
• Systems Performance book:
http://www.brendangregg.com/sysperf.html

• Me: http://www.brendangregg.com, http://dtrace.org/blogs/brendan,
@brendangregg, brendan@joyent.com

 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
Performance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedBrendan Gregg
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at NetflixBrendan Gregg
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceBrendan Gregg
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsBrendan Gregg
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityBrendan Gregg
 
YOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixBrendan Gregg
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019Brendan Gregg
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityBrendan Gregg
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Brendan Gregg
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance AnalysisBrendan Gregg
 

More from Brendan Gregg (20)

Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
Performance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting Started
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
re:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at Netflix
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing Tools
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
 
YOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflix
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
BPF Tools 2017
BPF Tools 2017BPF Tools 2017
BPF Tools 2017
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF Observability
 
FlameScope 2018
FlameScope 2018FlameScope 2018
FlameScope 2018
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance Analysis
 

Recently uploaded

Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfChristopherTHyatt
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKUXDXConf
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfEasyPrinterHelp
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoUXDXConf
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 

Recently uploaded (20)

Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 

What Linux can learn from Solaris performance and vice-versa

  • 1. What Linux can learn from Solaris performance and vice-versa
Brendan Gregg, Lead Performance Engineer
brendan@joyent.com, @brendangregg
SCaLE12x, February 2014
  • 2. Linux vs Solaris Performance Differences: a kernel block diagram (applications, system libraries, system call interface, VFS/file systems/volume managers, sockets/TCP/UDP/IP/Ethernet, scheduler, virtual memory, virtualization, device drivers) annotated with each system's distinctive features. Linux: up-to-date packages, futex, likely()/unlikely(), CONFIGurable, RCU, DynTicks, SLUB, btrfs, I/O scheduler, KVM, more device drivers, overcommit & OOM killer, lazy TLB. Solaris: DTrace, libumem, symbols, microstate accounting, mature fully preemptive kernel, CPU scalability, ZFS, Zones, Crossbow, FireEngine, process swapping, MPSS.
  • 3. whoami
• Lead Performance Engineer at Joyent
• Work on Linux and SmartOS performance
• Work/Research: tools, visualizations, methodologies
• Did kernel engineering at Sun Microsystems; worked on DTrace and ZFS
  • 4. Joyent
• High-Performance Cloud Infrastructure
• Compete on cloud instance/OS performance
• OS Virtualization for bare metal performance (Zones)
• Core developers of illumos/SmartOS and Node.js
• Recently launched Manta: a high performance object store
• KVM for Linux guests
• Certified Ubuntu on Joyent cloud now available!
  • 5. SCaLE11x: Linux Performance Analysis. A diagram mapping Linux observability tools onto the operating system and hardware components they observe: perf, dtrace, stap, lttng, ktap (kernel tracing); strace, pidstat, top, ps (processes); netstat, tcpdump, nicstat, ip, ping, traceroute (networking); vmstat, slabtop, free, swapon (memory); iostat, iotop, blktrace (storage); mpstat (CPUs); plus various others: sar, dstat, /proc. Latest version: http://www.brendangregg.com/linuxperf.html
  • 6. SCaLE12x: Linux and Solaris Performance
• Also covered in my new book, Systems Performance (Prentice Hall, 2013)
• Focus is on understanding systems and the methodologies to analyze them. Linux and Solaris are used as examples
  • 7. Agenda
• Why systems differ
• Specific differences
• What Solaris could learn from Linux
• What Linux could learn from Solaris
• What both can learn
• Results
  • 8. Terminology
• For convenience in this talk:
• Linux = an operating system distribution which uses the Linux kernel, eg, Ubuntu
• Solaris = a distribution from the family of operating systems whose kernel code originated from Sun Solaris
• SmartOS = a Solaris-family distribution developed by Joyent, based on the illumos kernel, which was based on the OpenSolaris kernel, which was based on the Solaris kernel
• System = everything you get when you pick a Linux or Solaris distribution: the kernel, libraries, tools, and package repos
• Opinions in this presentation are my own, and I do not represent Oracle Solaris. I'll actually be talking about SmartOS.
  • 10. Why Systems Differ
• Does the system even matter?
• Will your application perform the same on Linux and Solaris?
  • 11. Example
• Let's start with this simple program:
perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'
• This counts to 100,000,000, setting a variable
• To simplify this further, we're only interested in the performance of the loop, which dominates runtime, not program startup.
  • 12. Example Results
• One of these is Linux, the other SmartOS. Same hardware:
systemA$ time perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'
real 0m18.534s  user 0m18.450s  sys 0m0.018s
systemB$ time perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'
real 0m16.253s  user 0m16.230s  sys 0m0.010s
• One system is 14% slower
• Imagine that's your system; you'd want to know why
• I recently had a customer with a complex performance issue try a one-liner like this, as a simple test, and with a similar result. It's an interesting tour of some system differences
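The same kind of experiment can be reproduced in any language. A minimal Python sketch of the equivalent tight loop, timed with time.perf_counter() (my own illustration, not from the talk; the iteration count is reduced to keep it quick):

```python
# Hypothetical Python equivalent of the Perl one-liner: repeatedly
# assign a string in a tight loop, and time only the loop itself.
import time

def loop(n):
    s = None
    for _ in range(n):
        s = "SCaLE12x"   # repeated assignment, as in the Perl loop
    return s

start = time.perf_counter()
result = loop(1_000_000)
elapsed = time.perf_counter() - start
print(result)
```

Run it with the same interpreter version on both systems so the comparison isolates OS differences rather than runtime versions.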
  • 13. Possible Differences: Versioning
• Different versions of Perl
• Applications improve performance from release to release
• Linux and SmartOS distributions use entirely different package repos; different software versions are common
• Different compilers used to build Perl
• Compilers come from package repos, too. I've seen 50% performance improvements by gcc version alone
• Different compiler options used to build Perl
• Application Makefile: #ifdef Linux -O3 #else -O0; ie, the performance difference is due to a Makefile decision
• 32-bit vs 64-bit builds
  • 14. Possible Differences: OS
• Different system libraries, if any library calls are used, eg: strcmp(), malloc(), memcpy(), ... These implementations vary between Linux and Solaris, and can perform very differently
• Robert Mustacchi enhanced libumem to provide improved malloc() performance on SmartOS. This can make a noticeable difference for some workloads
• Different background tasks: runtime could be perturbed by OS daemons doing async housekeeping. These differ between Linux and Solaris
  • 15. Possible Differences: Observability
• Can the 14% be root caused?
• Observability tools differ. These don't cause the 14%, but can make the difference as to whether you can diagnose and fix it, or not.
• DTrace has meant that anything can be solved; without an equivalent on Linux, you may have to live with that 14%
• Although, Linux observability is getting much better...
  • 16. Possible Differences: Kernel
• Can the kernel make a difference? ... As a reminder:
perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'
• The program makes no system calls during the loop
  • 17. Possible Differences: Kernel, cont.
• Can the kernel make a difference? ... As a reminder:
perl -e 'for ($i = 0; $i < 100_000_000; $i++) { $s = "SCaLE12x" }'
• The program makes no system calls during the loop
• Yes, for a number of reasons:
• Setting the string involves memory I/O, and the kernel controls memory placement. Allocating nearby memory in a NUMA system can significantly improve performance
• The kernel may also control the CPU clock speed (eg, Intel SpeedStep), and vary it for temperature or power reasons
• The program could be perturbed by interrupts, eg, network I/O (although the performance effect should be small).
  • 18. Possible Differences: Kernel, cont.
• During a perturbation, the kernel CPU scheduler may migrate the thread to another CPU, which can hurt performance (cold caches, memory locality)
• Sure, but would that happen for this simple Perl program?
  • 19. Possible Differences: Kernel, cont.
• During a perturbation, the kernel CPU scheduler may migrate the thread to another CPU, which can hurt performance (cold caches, memory locality)
• Sure, but would that happen for this simple Perl program?
# dtrace -n 'profile-99 /pid == $target/ { @ = lquantize(cpu, 0, 16, 1); }' -c ...
value ------------- Distribution ------------- count
  < 0 |                                            0
    0 |                                            1
    1 |@@@@@                                     483
    2 |                                            1
    3 |@@@@@@@                                   663
    4 |                                            2
    5 |@@@                                       276
    6 |                                            0
    7 |@@@@@@                                    512
    8 |                                            1
    9 |@@@                                       288
   10 |                                            0
   11 |@@@@@@                                    576
   12 |                                            0
   13 |@@@@@                                     442
   14 |                                            2
   15 |@@@                                       308
   16 |                                            0
• Yes, a lot! This shows the CPUs Perl ran on. It should stay put, but instead runs across many. We've been fixing this in SmartOS.
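On Linux, without DTrace, one quick way to see which CPU a thread is running on is field 39 ("processor") of /proc/&lt;pid&gt;/stat; sampling it repeatedly reveals migrations. A minimal Linux-specific sketch (my own illustration, not from the talk):

```python
# Read the CPU this process last ran on from /proc/self/stat.
# The comm field (field 2) can contain spaces, so split after the
# closing ")" and count from field 3 onward; "processor" is field 39.
def current_cpu(pid="self"):
    with open(f"/proc/{pid}/stat") as f:
        after_comm = f.read().rsplit(")", 1)[1].split()
    return int(after_comm[36])   # field 39 overall = index 36 here

print(current_cpu())
```

Calling this in a loop while the benchmark runs (passing its pid) shows whether the thread stays put or wanders across CPUs.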
  • 20. Kernel Myths and Realities
• Myth: "The kernel gets out of the way for applications"
• The only case where the kernel gets out of the way is when your software calls halt() or shutdown()
• The performance difference between kernels may be small, eg, 5%; but I have already seen a 5x difference this year
arch/ia64/kernel/smp.c:
void cpu_die(void)
{
	max_xtp();
	local_irq_disable();
	cpu_halt();
	/* Should never be here */
	BUG();
	for (;;);
}
• ... unintentional kernel humor
  • 21. Other Differences
• The previous example was simple. Any applications that do I/O (ie, everything) encounter more differences:
• Different network stack implementations, including support for different TCP/IP features
• Different file systems, storage I/O stacks
• Different device drivers and device feature support
• Different resource control implementations
• Different virtualization technologies
• Different community support: stackoverflow, meetups, ...
  • 22. Types of Differences: the kernel block diagram again, annotated with where differences arise: app versions from system repos; compiler options; system library implementations (malloc(), str*(), ...); OS daemons; observability tools; syscall interface; file systems (ZFS, btrfs, ...); I/O scheduling; TCP/IP stack; scheduler classes and behavior; memory allocation and locality; resource controls; virtualization technologies; device driver support and features; network device CPU fanout.
  • 24. Specific Differences
• Comparing systems is like comparing countries. I'm often asked: how's Australia different from the US? Where do I start!?
• I'll categorize performance differences into big or small, based on their engineering cost, not their performance effect
• If one system is 2x faster than another for a given workload, the real question for the slower system is: Is this a major undertaking to fix? Is there a quick fix or workaround?
• Using SmartOS for specific examples...
  • 25. Big Differences
• Major bodies of perf work and other big differences include:
• Linux: up-to-date packages, large community, more device drivers, futex, RCU, btrfs, DynTicks, SLUB, I/O scheduling classes, overcommit & OOM killer, lazy TLB, likely()/unlikely(), CONFIGurable
• SmartOS: mature Zones, ZFS, DTrace, fully pre-emptable kernel; microstate accounting, symbols by default, CPU scalability, MPSS, libumem, FireEngine, Crossbow, binary /proc, process swapping
  • 26. Big Differences: Linux
• Up-to-date packages: latest application versions, with the latest performance fixes
• Large community: weird perf issue? May be answered on stackoverflow, or discussed at meetups
• More device drivers: there can be better coverage for high-performing network cards or driver features
• futex: fast user-space mutex
• RCU: fast-performing read-copy updates
• btrfs: modern file system with pooled storage
• DynTicks: dynamic ticks: tickless kernel, reduces interrupts and saves power
• SLUB: simplified version of the SLAB kernel memory allocator, improving performance
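The futex row above is what backs ordinary user-space locks on Linux: an uncontended acquire and release completes entirely in user space, with the kernel entered only on contention. A small sketch using Python's threading.Lock, whose underlying primitive glibc implements on top of futexes on Linux:

```python
# Four threads increment a shared counter under a lock; on Linux the
# lock's uncontended fast path avoids syscalls thanks to futexes.
import threading

lock = threading.Lock()
counter = 0

def worker(n):
    global counter
    for _ in range(n):
        with lock:       # acquire/release; kernel involved only on contention
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 40000
```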
  • 27. Big Differences: Linux, cont.
• I/O scheduling classes: block I/O classes: deadline, anticipatory, ...
• Overcommit & OOM killer: doing more with less main memory
• Lazy TLB: higher-performing munmap()
• likely()/unlikely(): kernel is embedded with compiler information for branch prediction, improving runtime perf
• CONFIGurable: lightweight kernels possible by disabling features
  • 28. Big Differences: SmartOS
• Mature Zones: OS virtualization for high-performing server instances
• Mature ZFS: fully-featured and high-performing modern integrated file system with pooled storage
• Mature DTrace: programmable dynamic and static tracing for performance analysis
• Mature fully pre-emptable kernel: support for real-time systems was an early Sun differentiator
• Microstate accounting: numerous high-resolution thread state times for performance debugging
• Symbols: symbols available for profiling tools by default
• CPU scalability: code is often tested, and bugs fixed, for large SMP servers (mainframes)
• MPSS: multiple page size support (not just hugepages)
  • 29. Big Differences: SmartOS, cont.
• libumem: high-performing memory allocation library, with per-thread CPU caches
• FireEngine: high-performing TCP/IP stack enhancements, including vertical perimeters and IP fanout
• Crossbow: high-performing virtualized network interfaces, as used by OS virtualization
• binary /proc: process statistics are binary (slightly more efficient) by default
• Process swapping: apart from paging (what Linux calls swapping), Solaris can still swap out entire processes
  • 30. Big Differences: Linux vs SmartOS: the kernel block diagram once more, with the Linux features (up-to-date packages, futex, likely()/unlikely(), CONFIGurable, RCU, DynTicks, SLUB, btrfs, I/O scheduler, overcommit & OOM killer, lazy TLB, more device drivers) and SmartOS features (DTrace, libumem, symbols, microstate accounting, mature fully preemptive kernel, CPU scalability, ZFS, Zones, Crossbow, FireEngine, MPSS, process swapping, resource controls) placed on the components they affect.
  • 31. Small Differences
• Smaller performance-related differences, tunables, bugs
• Linux: glibc, better TCP defaults, better CPU affinity, perf stat, a working sar, htop, splice(), fadvise(), ionice, /usr/bin/time, mpstat %steal, voluntary preemption, swappiness, various accounting frameworks, tcp_tw_reuse/recycle, TCP tail loss probe, SO_REUSEPORT, ...
• SmartOS: perf tools by default, kstat, vfsstat, iostat -e, ptime -m, CPU-only load averages, some STREAMS leftovers, ZFS SCSI cache flush by default, different TCP slow start default, ...
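Of the small Linux differences above, SO_REUSEPORT is easy to demonstrate: it allows several sockets to bind the same address and port, with the kernel distributing incoming connections across them. A minimal sketch (Linux 3.9+; my own illustration):

```python
# Two listeners on the same port: without SO_REUSEPORT on both
# sockets, the second bind() would fail with EADDRINUSE.
import socket

def make_listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    return s

a = make_listener(0)          # port 0: the kernel assigns a free port
port = a.getsockname()[1]
b = make_listener(port)       # second bind to the same port succeeds
```

In a real server, each worker process would create its own listener this way, avoiding a shared accept queue.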
  • 32. Small Differences, cont.
• Small differences change frequently: a feature is added to one kernel, then the other a year later; a difference may only exist for a short period of time.
• These small kernel differences may still make a significant performance difference, but are classified as "small" based on engineering cost.
  • 33. System Similarities
• It's important to note that many performance-related features are roughly equivalent:
• Both are Unix-like systems: processes, kernel, syscalls, time sharing, preemption, virtual memory, paged virtual memory, demand paging, ...
• Similar modern features: unified buffer cache, memory mapped files, multiprocessor support, CPU scheduling classes, CPU sets, 64-bit support, memory locality, resource controls, PIC profiler, epoll, ...
  • 34. Non Performance Differences
• Linux: open source (vs Oracle Solaris), "everyone knows it", embedded Linux, popular and well supported desktop/laptop use...
• SmartOS: SMF/FMA, contracts, privileges, mdb (postmortem debugging), gcore, crash dumps by default, ...
  • 35. WARNING: The next sections are not suitable for those suffering Not Invented Here (NIH) syndrome, or those who are easily trolled
  • 36. What Solaris can learn from Linux performance
  • 37. What Solaris can learn from Linux performance
• Packaging
• Community
• Compiler Options
• likely()/unlikely()
• Tickless Kernel
• Process Swapping
• Overcommit & OOM Killer
• SLUB
• Lazy TLB
• TIME_WAIT Recycling
• sar
• KVM
• Either learning what to do, or learning what not to do...
  • 38. Packaging
• Linux package repositories are often well stocked and updated
• Convenience aside, this can mean that users run newer software versions, along with the latest perf fixes
• They find "Linux is faster", but the real difference is the version of gcc, openssl, mysql, ... Solaris is unfairly blamed
  • 39. Packaging, cont. • Packaging is important and needs serious support • Dedicated staff, community • eg, Joyent has dedicated staff for the SmartOS package repo, which is based on pkgsrc from NetBSD • It's not just the operating system that matters; it's the ecosystem
  • 40. Community • A large community means: • Q/A sites have performance tips: stackoverflow, ... • Conference talks on performance (this one!), slides, video • Weird issues more likely found and fixed by someone else • More case studies shared: what tuning/config worked • A community helps people hear about the latest tools, tuning, and developments, and adopt them
  • 41. Community, cont. • Linux users expect to Google a question and find an answer on stackoverflow • Either foster a community to share content on tuning, tools, configuration, or, have staff to create such content. • Hire a good community manager!
• 42. Compiler Options • Apps may compile with optimizations for Linux only, eg: #ifdef Linux: -O3, #else: -O0 ("Oh, ha ha ha") • Developers are often writing software on Linux, and that platform gets the most attention. ("Works on my system.") • I've also seen 64-bit vs 32-bit differences. #ifdef Linux USE_FUTEX would be fine, since Solaris doesn't have futexes yet. • Last time, I found compiler differences using Flame Graphs [Flame Graph comparison, Linux vs SmartOS, showing an extra function: UnzipDocid()]
  • 43. Compiler Options, cont. • Can be addressed by tuning packages in the repo • Also file bugs/patches with developers to tune Makefiles • Someone has to do this, eg, package repo staff/community who find and do the workarounds anyway
• 44. likely()/unlikely() • These become compiler hints (__builtin_expect) for branch prediction, and are used throughout the Linux kernel; eg, net/ipv4/tcp_output.c, tcp_transmit_skb():

        [...]
        if (likely(clone_it)) {
                if (unlikely(skb_cloned(skb)))
                        skb = pskb_copy(skb, gfp_mask);
                else
                        skb = skb_clone(skb, gfp_mask);
                if (unlikely(!skb))
                        return -ENOBUFS;
        }
        [...]

• The Solaris kernel doesn't do this yet • If the kernel is built using profile feedback instead – which should be even better – I don't know about it • The actual perf difference is likely to be small
  • 45. likely()/unlikely(), cont. • Could be adopted by kernel engineering • Might help readability, might not
  • 46. Tickless Kernel • Linux does this already (DynTicks), which reduces interrupts and improves processor power saving (good for laptops and embedded devices) • Solaris still has a clock() routine, to perform various kernel housekeeping functions • Called by default at 100 Hertz • If hires_tick=1, at 1000 Hertz • I've occasionally encountered perf issues involving 10 ms latencies, that don't exist on Linux • ... which become 1 ms latencies after setting hires_tick=1
  • 47. Tickless Kernel, cont. • Sun/Oracle did start work on this years ago...
  • 48. Process Swapping • Linux doesn't do it. Linux "swapping" means paging. • Process swapping made sense on the PDP-11/20, where the maximum process size was 64 Kbytes • Paging was added later in BSD, but the swapping code remained
  • 49. Process Swapping, cont. • Consider ditching it • All that time learning what swapping is could be spent learning more useful features
  • 50. Overcommit & OOM Killer • On Linux, malloc() may never fail • No virtual memory limit (main memory + swap) like Solaris by default. Tunable using vm.overcommit_memory • More user memory can be allocated than can be stored. May be great for small devices, running applications that sparsely use the memory they allocate • Don't worry, if Linux runs very low on available main memory, a sacrificial process is identified by the kernel and killed by the Out Of Memory (OOM) Killer, based on an OOM score • OOM score was just added to htop (1.0.2, Jan 2014):
  • 51. Overcommit & OOM Killer, cont. • Solaris can learn why not to do this (cautionary tale) • If an important app depended on this, and couldn't be fixed, the kernel could have an overcommit option that wasn't default • ... this is why so much new code doesn't check for ENOMEM
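As a sketch of the tunables mentioned above (Linux-specific; policy values as documented in the kernel's overcommit accounting docs):

```shell
# Inspect the current overcommit policy:
#   0 = heuristic overcommit (default), 1 = always overcommit,
#   2 = strict: commit limit = swap + overcommit_ratio% of RAM
sysctl vm.overcommit_memory

# Opt in to strict accounting, so malloc() can fail with ENOMEM
# up front instead of triggering the OOM killer later:
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80

# The OOM killer's per-process score (higher = more likely victim):
cat /proc/self/oom_score
```

This is a config fragment, not a recommendation: strict accounting can break applications that rely on sparse allocations.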
  • 52. SLUB • Linux integrated the Solaris kernel SLAB allocator, then later simplified it: The SLUB allocator • Removed object queues and per-CPU caches, leaving NUMA optimization to the page allocator's free lists • Worth considering?
• 53. Lazy TLB • Lazy TLB mode: a way to delay TLB updates (shootdowns) • munmap() heavy workloads on Solaris can experience heavy HAT CPU cross calls. Linux doesn't seem to have this problem.

                          TLB        Lazy TLB
    As seen by Solaris:   Correct    Reckless
    As seen by Linux:     Paranoid   Fast
  • 54. Lazy TLB, cont. • This difference needs to be investigated, quantified, and possibly fixed (tunable?)
• 55. TIME_WAIT Recycling • A localhost HTTP benchmark on Solaris:

    # netstat -s 1 | grep ActiveOpen
            tcpActiveOpens      =728004     tcpPassiveOpens     =726547
            tcpActiveOpens      =     0     tcpPassiveOpens     =     0
            tcpActiveOpens      =  4939     tcpPassiveOpens     =  4939   <- fast
            tcpActiveOpens      =  5849     tcpPassiveOpens     =  5798
            tcpActiveOpens      =  1341     tcpPassiveOpens     =  1292   <- slow
            tcpActiveOpens      =  1006     tcpPassiveOpens     =  1008
            tcpActiveOpens      =   872     tcpPassiveOpens     =   870
            tcpActiveOpens      =   932     tcpPassiveOpens     =   932
            tcpActiveOpens      =   879     tcpPassiveOpens     =   879
            tcpActiveOpens      =   562     tcpPassiveOpens     =   586
            tcpActiveOpens      =   613     tcpPassiveOpens     =   594

• Connection rate drops by 5x due to sessions in TIME_WAIT • Linux avoids this by recycling better (tcp_tw_reuse/recycle) • Usually doesn't hurt production workloads, as it must be a flood of connections from a single host to a single port. It comes up in benchmarks/evaluations.
  • 56. TIME_WAIT Recycling, cont. • Improve tcp_time_wait_processing() • This is being fixed for illumos/SmartOS
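For reference, a sketch of the Linux tunables mentioned above (2014-era kernels; note tcp_tw_recycle is known to break clients behind NAT, so use with care):

```shell
# Reuse TIME_WAIT sockets for new outbound connections when safe
# (based on TCP timestamps):
sysctl -w net.ipv4.tcp_tw_reuse=1

# Fast recycling of TIME_WAIT sockets; dangerous with NATed clients:
sysctl -w net.ipv4.tcp_tw_recycle=1
```

These only matter for high rates of connections between one client and one server:port, which is exactly what localhost benchmarks generate.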
• 57. sar • Linux sar is awesome, and has extra options:

    $ sar -n DEV -n TCP -n ETCP 1
    11:16:34 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
    11:16:35 PM      eth0    104.00    675.00      7.35    984.72      0.00      0.00      0.00
    11:16:35 PM      eth1      7.00      0.00      0.38      0.00      0.00      0.00      0.00
    11:16:35 PM   ip6tnl0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
    11:16:35 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
    11:16:35 PM   ip_vti0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
    11:16:35 PM      sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
    11:16:35 PM     tunl0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

    11:16:34 PM  active/s passive/s    iseg/s    oseg/s
    11:16:35 PM      0.00      0.00     99.00    681.00

    11:16:34 PM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
    11:16:35 PM      0.00      0.00      0.00      0.00      0.00

• -n DEV: network interface statistics • -n TCP: TCP statistics • -n ETCP: TCP error statistics • Linux sar's other metrics are also far less buggy
  • 58. sar, cont. • Sar must be fixed for the 21st century • Use the Linux sar options and column names, which follow a neat convention
  • 59. KVM • The KVM type 2 hypervisor originated for Linux • While Zones are faster, KVM can run different kernels (Linux) • vs Type 1 hypervisors (Xen): • KVM has better perf observability, as it can use the regular OS tools • KVM can use OS resource controls, just like any other process
  • 60. KVM, cont. • illumos/SmartOS learned this, Joyent ported KVM! • Oracle Solaris doesn't have it yet
  • 61. What Linux can learn from Solaris performance
  • 62. What Linux can learn from Solaris performance • ZFS • Zones • STREAMS • Symbols • prstat -mLc • vfsstat • DTrace • Culture • Either learning what to do, or learning what not to do...
• 63. ZFS • More performance features than you can shake a stick at: • Pooled storage, COW, logging (batching writes), ARC, variable block sizes, dynamic striping, intelligent prefetch, multiple prefetch streams, snapshots, ZIO pipeline, compression (lzjb can improve perf by reducing I/O load), SLOG, L2ARC, vdev cache, data deduplication (possibly better cache reach) • The Adaptive Replacement Cache (ARC) can make a big difference: it can resist perturbations (backups) and stay warm • ZFS I/O throttling (in illumos/SmartOS) throttles disk I/O at the VFS layer, to solve cloud noisy neighbor issues • ZFS is mature. Widespread use in critical environments
  • 64. ZFS, cont. • Linux has been learning about ZFS for a while • http://zfsonlinux.org/ • btrfs
• 65. Zones • Ancestry: chroot → FreeBSD jails → Solaris Zones • OS virtualization. Zero I/O path overheads.
• 66. Zones, cont. • Compare to HW Virtualization: • This shows the initial I/O control flow. There are optimizations/variants for improving the HW Virt I/O path, esp for Xen.
  • 67. Zones, cont. • Comparing 1 GB instances on Joyent • Max network throughput: • KVM: 400 Mbits/sec • Zones: 4.54 Gbits/sec (over 10x) • Max network IOPS: • KVM: 18,000 packets/sec • Zones: 78,000 packets/sec (over 4x) • Numbers go much higher for larger instances • http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen
• 68. Zones, cont. • Performance analysis for Zones is also easy: analyze the applications as usual. [Diagram: a zone's applications sit directly on the full stack — system libraries, system call interface, kernel (VFS, sockets, file systems, TCP/UDP, volume managers, IP, block device interface, Ethernet, scheduler, virtual memory, resource controls), device drivers, firmware, metal]
• 69. Zones, cont. • Compare to analyzing HW Virt (KVM): from the host, the guest is analyzed as a process (QEMU/KVM), but the guest's own applications and kernel sit behind an observability boundary, and host results must be correlated with guest-internal tools. [Diagram: guest Linux stack atop virtual devices and the host kernel's KVM stack, with the observability boundary between them]
  • 70. Zones, cont. • Linux has been learning: LXC & cgroups, but not widespread adoption yet. Docker will likely drive adoption.
• 71. STREAMS • AT&T modular I/O subsystem • Like Unix shell pipes, but for kernel messages. Modules can be pushed into the stream to customize processing • Introduced (fully) in 8th Edition Research Unix, became SVr4 STREAMS, and was used by Solaris for the network TCP/IP stack • With greater demands for TCP/IP performance, the overheads of STREAMS reduced scalability • Sun switched high-performing paths to be direct function calls
  • 72. STREAMS, cont. • A cautionary tale: not good for high performance code paths
• 73. Symbols • Compilers on Linux strip symbols by default, making perf profiler output inscrutable without the dbgsym packages:

    57.14%     sshd  libc-2.15.so        [.] connect
               |
               --- connect
                  |
                  |--25.00%-- 0x7ff3c1cddf29
                  |
                  |--25.00%-- 0x7ff3bfe82761     <- what??
                  |           0x7ff3bfe82b7c
                  |
                  |--25.00%-- 0x7ff3bfe82dfc
                   --25.00%-- [...]

• Linux compilers also drop frame pointers, making stacks hard to profile. Please use -fno-omit-frame-pointer to stop this. • As a workaround, perf_events has "-g dwarf" for libunwind • Solaris keeps symbols and stacks, and often has CTF too, making Mean Time To Flame Graph very fast
  • 74. Symbols, cont. Flame Graphs need symbols and stacks
  • 75. Symbols, cont. • Keep symbols and frame pointers. Faster resolution for performance analysis and troubleshooting.
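A sketch of the build flags involved (gcc; app.c and the binary name are placeholders):

```shell
# Keep frame pointers and debug symbols so profilers can walk and
# resolve stacks:
gcc -O2 -fno-omit-frame-pointer -g -o app app.c

# Without frame pointers, perf_events can fall back to DWARF-based
# unwinding (libunwind) at higher overhead:
perf record -g dwarf ./app
```

-fno-omit-frame-pointer costs one general-purpose register; in most workloads that is a small price for always-profileable stacks.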
• 76. prstat -mLc • Per-thread time broken down into states, from a top-like tool:

    $ prstat -mLc 1
       PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
     63037 root       83  16 0.1 0.0 0.0 0.0 0.0 0.5  30 243 45K   0 node/1
     12927 root       14  49 0.0 0.0 0.0 0.0  34 2.9  6K 365 .1M   0 ab/1
     63037 root      0.5 0.6 0.0 0.0 0.0 3.7  95 0.4  1K   0  1K   0 node/2
    [...]

• These columns narrow an investigation immediately, and have been critical for solving countless issues. Unsung hero of Solaris performance analysis • Well suited for the Thread State Analysis (TSA) methodology, which I've taught in class, and which has helped students get started and fix unknown perf issues • http://www.brendangregg.com/tsamethod.html
  • 77. prstat -mLc, cont. • Linux has various thread states: delay accounting, I/O accounting, schedstats. Can they be added to htop? See TSA Method for use case and desired metrics.
• 78. vfsstat • VFS-level iostat (added to SmartOS, not Solaris):

    $ vfsstat -M 1
       r/s    w/s   Mr/s  Mw/s ractv wactv read_t writ_t  %r  %w   d/s del_t zone
     761.0  267.1   15.4   1.6   0.0   0.0   12.0   24.7   0   0   1.3  23.5 5716a5b6
    4076.8 2796.0   41.7   2.3   0.1   0.0   16.6    3.1   6   0   0.0   0.0 5716a5b6
    4945.1 2807.4  157.1   2.3   0.1   0.0   25.2    3.4  12   0   0.0   0.0 5716a5b6
    3550.9 1910.4  109.7   1.6   0.4   0.0  112.9    3.3  39   0   0.0   0.0 5716a5b6
    [...]

• Shows what the applications request from the file system, and the true performance that they experience • iostat includes asynchronous I/O • vfsstat sees issues iostat can't: lock contention, resource control throttling [Diagram: vfsstat instruments the VFS layer, above the file systems and volume managers; iostat instruments the block device interface, below them]
  • 79. vfsstat, cont. • Add vfsstat, or VFS metrics to sar.
  • 80. DTrace • Programmable, real-time, dynamic and static tracing, for performance analysis and troubleshooting, in dev and production • Used on Solaris, illumos/SmartOS, Mac OS X, FreeBSD, ... • Solve virtually any perf issue. eg, fix the earlier Perl 15% delta, no matter where the problem is. Without DTrace's capabilities, you may have to wear that 15%. • Users can write their own custom DTrace one-liners and scripts, or use/modify others (eg, mine).
• 81. DTrace: illumos Scripts • Some of my DTrace scripts from the DTraceToolkit and the DTrace book, mapped to the software stack:
    Services: cifs*.d, iscsi*.d, nfsv3*.d, nfsv4*.d, ssh*.d, httpd*.d
    Language providers: node*.d, erlang*.d, j*.d, js*.d, php*.d, pl*.d, py*.d, rb*.d, sh*.d
    Databases: mysql*.d, postgres*.d, redis*.d, riak*.d
    Applications: hotuser, umutexmax.d, lib*.d
    System calls: opensnoop, statsnoop, errinfo, dtruss, rwtop, rwsnoop, mmap.d, kill.d, shellsnoop, zonecalls.d, weblatency.d, fddist
    VFS/file systems: fswho.d, fssnoop.d, sollife.d, solvfssnoop.d, dnlcsnoop.d, zfsslower.d, ziowait.d, ziostacks.d, spasync.d, metaslab_free.d
    Block I/O: iosnoop, iotop, disklatency.d, satacmds.d, satalatency.d, scsicmds.d, scsilatency.d, sdretry.d, sdqueue.d
    Device drivers: ide*.d, mpt*.d, macops.d, ixgbecheck.d, ngesnoop.d, ngelink.d
    Sockets/TCP/IP: soconnect.d, soaccept.d, soclose.d, socketio.d, so1stbyte.d, sotop.d, soerror.d, ipstat.d, ipio.d, ipproto.d, ipfbtsnoop.d, ipdropper.d, tcpstat.d, tcpaccept.d, tcpconnect.d, tcpioshort.d, tcpio.d, tcpbytes.d, tcpsize.d, tcpnmap.d, tcpconnlat.d, tcp1stbyte.d, tcpfbtwatch.d, tcpsnoop.d, tcpconnreqmaxq.d, tcprefused.d, tcpretranshosts.d, tcpretranssnoop.d, tcpsackretrans.d, tcpslowstart.d, tcptimewait.d, udpstat.d, udpio.d, icmpstat.d, icmpsnoop.d
    Scheduler/virtual memory: priclass.d, pridist.d, cv_wakeup_slow.d, displat.d, capslat.d, minfbypid.d, pgpginbypid.d
  • 82. DTrace, cont. • What Linux needs to learn about DTrace: Feature #1 is production safety • There should be NO risk of panics or freezes. It should be an everyday tool like top(1). • Related to production safety is the minimization of overheads, which can be done with in-kernel summaries. Some of the Linux tools need to learn how to do this, too, as the overheads of dump & post-analysis can get too high. • Features aren't features if users don't use them
  • 83. DTrace, cont. • Linux might get DTrace-like capabilities via: • dtrace4linux • perf_events • ktap • SystemTap • LTTng • The Linux kernel has the necessary frameworks which are sourced by these tools: tracepoints, kprobes, uprobes • ... and another thing Linux can learn: • DTrace has a memorable unofficial mascot (the ponycorn by Deirdré Straughan, using General Zoi's pony creator). She's created some for the Linux tools too...
• 84. dtrace4linux • Two DTrace ports are in development for Linux: • 1. dtrace4linux • https://github.com/dtrace4linux/linux • Mostly by Paul Fox • Not safe for production use yet; I've used it to solve issues by first reproducing them in the lab • 2. Oracle Enterprise Linux DTrace • There has been steady progress: Oracle Linux 6.5 featured "full DTrace integration" (Dec 2013)
• 85. dtrace4linux: Example • Tracing ext4 read/write calls with size distributions (bytes):

    #!/usr/sbin/dtrace -s

    fbt::vfs_read:entry, fbt::vfs_write:entry
    /stringof(((struct file *)arg0)->f_path.dentry->d_sb->s_type->name) == "ext4"/
    {
            @[execname, probefunc + 4] = quantize(arg2);
    }

    dtrace:::END
    {
            printa("\n  %s %s (bytes)%@d", @);
    }

    # ./ext4rwsize.d
    dtrace: script './ext4rwsize.d' matched 3 probes
    ^C
    CPU     ID                    FUNCTION:NAME
      1      2                             :END
    [...]
      vi read (bytes)
               value  ------------- Distribution ------------- count
                 128 |                                         0
                 256 |                                         1
                 512 |@@@@@@@                                  17
                1024 |@                                        2
                2048 |                                         0
                4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        75
                8192 |                                         0
• 86. dtrace4linux: Example • Tracing TCP retransmits (tcpretransmit.d for 3.11.0-17):

    #!/usr/sbin/dtrace -qs

    dtrace:::BEGIN
    {
            trace("Tracing TCP retransmits... Ctrl-C to end.\n");
    }

    fbt::tcp_retransmit_skb:entry
    {
            this->so = (struct sock *)arg0;
            this->d = (unsigned char *)&this->so->__sk_common;  /* 1st is skc_daddr */
            printf("%Y: retransmit to %d.%d.%d.%d, by:", walltimestamp,
                this->d[0], this->d[1], this->d[2], this->d[3]);
            stack(99);
    }

    # ./tcpretransmit.d
    Tracing TCP retransmits... Ctrl-C to end.
    1970 Jan  1 12:24:45: retransmit to 50.95.220.155, by:
                  kernel`tcp_retransmit_skb
                  kernel`dtrace_int3_handler+0xcc
                  kernel`dtrace_int3+0x3a
                  kernel`tcp_retransmit_skb+0x1
                  kernel`tcp_retransmit_timer+0x276     <- that
                  kernel`tcp_write_timer                   used to
                  kernel`tcp_write_timer_handler+0xa0      work...
                  kernel`tcp_write_timer+0x6c
                  kernel`call_timer_fn+0x36
                  kernel`tcp_write_timer
                  kernel`run_timer_softirq+0x1fd
                  kernel`__do_softirq+0xf7
                  kernel`call_softirq+0x1c
    [...]
• 87. perf_events • In the Linux tree. perf-tools package. Can do sampling, static and dynamic tracing, with stack traces and local variables • Often involves an enable → collect → dump → analyze cycle • A powerful profiler, loaded with features (eg, libunwind stacks!) • Isn't programmable, and so has limited ability for processing data in-kernel. Does counts. • You can post-process in userland, but passing all event data incurs overhead; can be Gbytes of data
• 88. perf_events: Example • Dynamic tracing of tcp_sendmsg() with size:

    # perf probe --add 'tcp_sendmsg size'
    [...]
    # perf record -e probe:tcp_sendmsg -a
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.052 MB perf.data (~2252 samples) ]
    # perf script
    # ========
    # captured on: Fri Jan 31 23:49:55 2014
    # hostname : dev1
    # os release : 3.13.1-ubuntu-12-opt
    [...]
    # ========
    #
    sshd  1301 [001]  502.424719: probe:tcp_sendmsg: (ffffffff81505d80) size=b0
    sshd  1301 [001]  502.424814: probe:tcp_sendmsg: (ffffffff81505d80) size=40
    sshd  2371 [000]  502.952590: probe:tcp_sendmsg: (ffffffff81505d80) size=27
    sshd  2372 [000]  503.025023: probe:tcp_sendmsg: (ffffffff81505d80) size=3c0
    sshd  2372 [001]  503.203776: probe:tcp_sendmsg: (ffffffff81505d80) size=98
    sshd  2372 [001]  503.281312: probe:tcp_sendmsg: (ffffffff81505d80) size=2d0
    [...]
• 89. ktap • A new static/dynamic tracing tool for Linux • Lightweight and simple, based on Lua. Uses bytecode for programmable and safe tracing • Suitable for use on embedded Linux • http://www.ktap.org • Features are limited (still in development), but I've been impressed so far • In development, so I can't recommend production use yet
• 90. ktap: Example • Summarize read() syscalls by return value (size/err):

    # ktap -e 's = {}; trace syscalls:sys_exit_read { s[arg2] += 1 }
        trace_end { histogram(s); }'
    ^C
                        value ------------- Distribution ------------- count
                           11 |@@@@@@@@@@@@@@@@@@@@@@@@               50
                           18 |@@@@@@                                  13
                           72 |@@                                       6
                         1024 |@                                        4
                            0 |                                         2
                            2 |                                         2
                          446 |                                         1
                          515 |                                         1
                           48 |                                         1

    (a histogram of a key/value table)

• Write scripts (excerpt from syslatl.kp, highlighting the time delta):

    trace syscalls:sys_exit_* {
            if (self[tid()] == nil) { return }
            delta = (gettimeofday_us() - self[tid()]) / (step * 1000)
            if (delta > max) { max = delta }
            lats[delta] += 1
            self[tid()] = nil
    }
• 91. ktap: Setup • Installing on Ubuntu (~5 minutes):

    # apt-get install git gcc make
    # git clone https://github.com/ktap/ktap
    # cd ktap
    # make
    # make install
    # make load

• Example dynamic tracing of tcp_sendmsg() and stacks:

    # ktap -e 's = ptable(); trace probe:tcp_sendmsg { s[backtrace(12, -1)] <<< 1 }
        trace_end { for (k, v in pairs(s)) { print(k, count(v), "\n"); } }'
    Tracing... Hit Ctrl-C to end.
    ^C
    ftrace_regs_call
    sock_aio_write
    do_sync_write
    vfs_write
    SyS_write
    system_call_fastpath
            17
  • 92. SystemTap • Sampling, static and dynamic tracing, fully programmable • The most featured of all the tools. Does some things that DTrace can't (eg, loops). • http://sourceware.org/systemtap • Has its own tracing language, which is compiled (gcc) into kernel modules (slow; safe?) • I used it a lot in 2011, and had problems with panics/freezes; never felt safe to run it on my customer's production systems • Needs vmlinux/debuginfo
• 93. SystemTap: Setup • Setting up SystemTap on Ubuntu (2014):

    # ./opensnoop.stp
    semantic error: while resolving probe point: identifier 'syscall' at ./opensnoop.stp:11:7
            source: probe syscall.open
                          ^
    semantic error: no match
    Pass 2: analysis failed. [man error::pass2]
    Tip: /usr/share/doc/systemtap/README.Debian should help you get started.   <- helpful tips...
    # more /usr/share/doc/systemtap/README.Debian
    [...]
    supported yet, see Debian bug #691167). To use systemtap you need to
    manually install the linux-image-*-dbg and linux-header-* packages that
    match your running kernel. To simplify this task you can use the
    stap-prep command. Please always run this before reporting a bug.
    # stap-prep
    You need package linux-image-3.11.0-17-generic-dbgsym but it does not seem to be available
    Ubuntu -dbgsym packages are typically in a separate repository
    Follow https://wiki.ubuntu.com/DebuggingProgramCrash to add this repository
• 94. SystemTap: Setup, cont. • After following Ubuntu's DebuggingProgramCrash site:

    # apt-get install linux-image-3.11.0-17-generic-dbgsym
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following NEW packages will be installed:
      linux-image-3.11.0-17-generic-dbgsym
    0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
    Need to get 834 MB of archives.
    After this operation, 2,712 MB of additional disk space will be used.
    Get:1 http://ddebs.ubuntu.com/ saucy-updates/main linux-image-3.11.0-17-generic-dbgsym
      amd64 3.11.0-17.31 [834 MB]
    0% [1 linux-image-3.11.0-17-generic-dbgsym 1,581 kB/834 MB 0%]  215 kB/s  1h 4min 37s
          ^ but my perf issue is happening now...

• In fairness: • 1. The Red Hat SystemTap developers' primary focus is to get it working on Red Hat (where they say it works fine) • 2. Lack of CTF isn't SystemTap's fault, as said earlier
• 95. SystemTap: Example • opensnoop.stp:

    #!/usr/bin/stap

    probe begin
    {
            printf("\n%6s %6s %16s %s\n", "UID", "PID", "COMM", "PATH");
    }

    probe syscall.open
    {
            printf("%6d %6d %16s %s\n", uid(), pid(), execname(), filename);
    }

• Output:

    # ./opensnoop.stp
       UID    PID             COMM PATH
         0  11108             sshd <unknown>
         0  11108             sshd <unknown>
         0  11108             sshd /lib/x86_64-linux-gnu/libwrap.so.0
         0  11108             sshd /lib/x86_64-linux-gnu/libpam.so.0
         0  11108             sshd /lib/x86_64-linux-gnu/libselinux.so.1
         0  11108             sshd /usr/lib/x86_64-linux-gnu/libck-connector.so.0
    [...]
• 96. LTTng • Profiling, static and dynamic tracing • Based on Linux Trace Toolkit (LTT), which dabbled with dynamic tracing (DProbes) in 2001 • Involves an enable → start → stop → view cycle • Designed to be highly efficient • I haven't used it properly yet, so I don't have an informed opinion (sorry LTTng, not your fault)
• 97. LTTng, cont. • Example sequence:

    # lttng create session1
    # lttng enable-event sched_process_exec -k
    # lttng start
    # lttng stop
    # lttng view
    # lttng destroy session1
  • 98. DTrace, cont. • 2014 is an exciting year for dynamic tracing and Linux – one of these may reach maturity and win!
  • 99. DTrace, final word • What Oracle Solaris can learn from dtrace4linux: • Dynamic tracing is crippled without source code • Oracle could give customers scripts to run, but customers lose any practical chance of writing them themselves • If the dtrace4linux port is completed, it will be more useful than Oracle Solaris DTrace (unless they open source it again)
• 100. Culture • Sun Microsystems, out of necessity, developed a performance engineering culture that had an appetite for understanding and measuring the system: data-driven analysis • If your several-million-dollar Ultra Enterprise 10000 doesn’t perform well and your company is losing nontrivial sums of money every minute because of it, you call Sun Service and start demanding answers. – System Performance Tuning [Musumeci 02] • Includes the diagnostic cycle: • hypothesis → instrumentation → data → hypothesis • Some areas of Linux are already learning this (and some areas of Solaris never did)
• 101. Culture, cont. • Linux perf issues are often debugged using only top(1), *stat, sar(1), strace(1), and tcpdump(8). These leave many areas not measured. [Diagram: the kernel drawn as just three layers — a top layer, an strace layer, and a tcpdump layer — if only it were this simple...] • What about the other tools and metrics that are part of Linux? perf_events, tracepoints/kprobes/uprobes, schedstats, I/O accounting, blktrace, etc.
• 102. Culture, cont. • Understand the system, and measure if at all possible • Hypothesis → instrumentation → data → hypothesis • Use perf_events (and others once they are stable/safe) • strace(1) is intermediate, not advanced • High performance doesn't just mean hardware, system, and config. It foremost means analysis of performance limiters.
  • 103. What Both can Learn
  • 104. What Both can Learn • Get better at benchmarking
  • 105. Benchmarking • How Linux vs Solaris performance is often compared • Results incorrect or misleading almost 100% of the time • Get reliable benchmark results by active benchmarking: • Analyze performance of all components during the benchmark, to identify limiters • Takes time: benchmarks may take weeks, not days • Opposite of passive benchmarking: fire and forget • If SmartOS loses a benchmark, my management demands answers, and I almost always overturn the result with analysis • Test variance as well as steady workloads. Measure jitter. These differ between systems as well.
  • 107. Out-of-the-Box • Out-of-the-box, which is faster, Linux or Solaris?
  • 108. Out-of-the-Box • Out-of-the-box, which is faster, Linux or Solaris? • There are many differences, it's basically a crapshoot. I've seen everything from 5% to 5x differences either way • It really depends on workload and platform characteristics: • CPU usage, FS I/O, TCP I/O, TCP connect rates, default TCP tunables, synchronous writes, lock calls, library calls, multithreading, multiprocessor, network driver support, ... • From prior Linux vs SmartOS comparisons, it's hard to pick a winner, but in general my expectations are: • SmartOS: for heavy file system or network I/O workloads • Linux: for CPU-bound workloads
  • 109. Out-of-the-Box Relevance • Out-of-the-box performance isn't that interesting: if you care about performance, why not do some analysis and tuning?
  • 110. In Practice • With some analysis and tuning, which is faster?
  • 111. In Practice • With some analysis and tuning, which is faster? • Depends on the workload, and which differences matter to you • With analysis, I can usually make SmartOS beat Linux • DTrace and microstate accounting give me a big advantage: I can analyze and fix all the small differences (which sometimes exist as apps are developed for Linux) • Although, perf/ktap/... are catching up • I can do the same and make Linux beat SmartOS, but it's much more time-consuming without an equivalent DTrace • On the same hardware, it's more about the performance engineer than the kernel. DTrace doesn't run itself.
  • 112. At Joyent • Joyent builds high performance SmartOS instances that frequently beat Linux, but that's due to more than just the OS. We use: • Config: OS virtualization (Zones), all instances on ZFS • Hardware: 10 GbE networks, plenty of DRAM for the ARC • Analysis: DTrace to quickly root-cause issues and tune • With SmartOS (ZFS, DTrace, Zones, KVM) configured for performance, and with some analysis, I expect to win most head-to-head performance comparisons
  • 113. Learning From Linux • Joyent has also been learning from Linux to improve SmartOS. • Package repos: dedicated staff • Community: dedicated staff • Compiler options: dedicated repo staff • KVM: ported! • TCP TIME_WAIT: fixed localhost; more fixes to come • sar: fix understood • More to do...
• 114. References • General Zoi's Awesome Pony Creator: http://generalzoi.deviantart.com/art/Pony-Creator-v3-397808116 • More perf examples: http://www.brendangregg.com/perf.html • More ktap examples: http://www.brendangregg.com/ktap.html • http://www.ktap.org/doc/tutorial.html • Flame Graphs: http://www.brendangregg.com/flamegraphs.html • Diagnostic cycle: http://dtrace.org/resources/bmc/cec_analytics.pdf • http://www.brendangregg.com/activebenchmarking.html • https://blogs.oracle.com/OTNGarage/entry/doing_more_with_dtrace_on
• 115. Thank You • More info: • illumos: http://illumos.org • SmartOS: http://smartos.org • DTrace: http://dtrace.org • Joyent: http://joyent.com • Systems Performance book: http://www.brendangregg.com/sysperf.html • Me: http://www.brendangregg.com, http://dtrace.org/blogs/brendan, @brendangregg, brendan@joyent.com