ZFS Performance
Analysis and Tools
Brendan Gregg
Lead Performance Engineer
brendan@joyent.com
@brendangregg

ZFS Day
October, 2012
whoami

• G’Day, I’m Brendan
• These days I do systems performance analysis of the cloud
• Regarding ZFS:
  • Perf analysis of ZFS (mostly using DTrace) for 5+ years,
    both enterprise and cloud usage
  • Wrote many DTrace-based ZFS perf analysis tools,
    including those in the DTrace book
  • Developed ZFS L2ARC while at Sun
Who is Joyent

• Cloud computing provider (public cloud + software)
• Use ZFS as much as possible:
  • Host storage
  • Guest storage: OS virtualization (SmartOS), and KVM guests
    (Linux, Windows)
• We use ZFS because:
  • Reliability: checksums, COW
  • Features: snapshots, clones, compression, ...
  • Performance: large ARCs
  • It can boil oceans
Joyent, cont.

• We build tools for ZFS automation and observability.
• Performance is a key company feature.
  • Need to solve FS/disk issues fast.
Agenda

• My top 12 tools for ZFS performance analysis (unsorted):
  • iostat
  • vfsstat
  • zfsslower.d
  • iosnoop
  • iostacks
  • metaslab_free.d
  • spasync.d
  • arcstat
  • arcaccess.d
  • latency counters
  • scatter plots
  • heat maps (CA)
• For cloud computing from within a Zone, add:
  • mysqld_pid_fslatency.d
  • syscall with fi_fs == zfs
Functional diagram: full stack

• Unix 101

[Diagram: Process (user-land) issues logical I/O through the Syscall
Interface to VFS, then ZFS (and other file systems, kernel); physical I/O
leaves the Block Device Interface for the Disks.]

Functional diagram: full stack, cont.

• Unix 102

[Diagram: same stack, annotated. The syscall path can be synchronous, but
disk I/O is often async due to write buffering and read ahead; iostat(1)
observes the Block Device Interface only.]

Functional diagram: full stack, cont.

• DTrace 301

[Diagram: tools mapped to stack layers. Syscall Interface:
mysql_pid_fslatency.d, syscall tracing with fi_fs == zfs. VFS: vfsstat.
ZFS: zioslower.d, spasync.d, metaslab_free.d, arcstat.pl, arcaccess.d.
ZFS to Block Device Interface: iostacks.d. Block Device Interface: iostat,
iosnoop. Kernel drivers as needed: see DTrace book chap 4.]
ZFS Internals

• That’s just my top 12
• Use more as needed

DTRACE ALL THE THINGS!

http://hub.opensolaris.org/bin/view/Community+Group+zfs/source
http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
iostat

• Block-device level (almost disk-level) I/O statistics:

$ iostat -xnz 1
[...]
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   11.0     0.0    52.0  0.0  0.0    0.0    1.0   0   1 c0t0d0
    1.0  381.0    16.0 43325.5  0.0  4.0    0.0   10.4   1  12 c0t1d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    9.0     0.0    34.0  0.0  0.0    0.0    0.1   0   0 c0t0d0
    1.0  154.9    16.0  1440.5  0.0  2.0    0.0   12.6   0  10 c0t1d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    8.0     0.0    36.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    6.0    0.0    96.0     0.0  0.0  0.0    0.0    7.9   0   4 c0t1d0

(workload columns show the ZFS→Disk workload; latency and utilization
columns show the disks’ resulting performance)
iostat, cont.

• Effective tool for a class of disk issues, especially:

$ iostat -xnz 1
[...]
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100 c0t0d0

• Disks “out to lunch” (PERC ECC error): no I/O completing, 4 outstanding,
  device 100% busy
iostat, cont.

• Minor nits:
  • does not show read/write latency separately. ZFS TXG flushes
    drag up the latency, which looks alarming, but are
    asynchronous. Can use DTrace for the split.
  • no higher level context: PID, ZFS dataset, file pathname, ...
    (not its role)
• Major problem (although, not iostat’s fault): commonly confused
  with application-level (logical) I/O.
  • The I/O rates, sizes, and latency can dramatically differ
    between logical file system I/O and physical disk I/O.
  • Users commonly draw the wrong conclusions when only
    provided with iostat statistics to understand a system.
iostat, cont.

• iostat output (or disk kstats) graphed by various monitoring software

So MANY graphs!
iostat, cont.

For so LITTLE visibility. This leaves many perf issues unsolved.

[Diagram: the tool-to-layer map again — iostat and iosnoop see only the
Block Device Interface; the syscall, VFS, and ZFS layers above require
other tools: mysql_pid_fslatency.d, syscall tracing with fi_fs == zfs,
vfsstat, zioslower.d, spasync.d, metaslab_free.d, arcstat.pl, arcaccess.d,
iostacks.d; kernel drivers as needed, see DTrace book chap 4.]
vfsstat

• VFS-level I/O statistics (VFS-iostat):

# vfsstat -Z 1
  r/s   w/s  kr/s  kw/s ractv wactv read_t writ_t  %r  %w   d/s  del_t zone
  1.2   2.8   0.6   0.2   0.0   0.0    0.0    0.0   0   0   0.0    0.0 global (0)
  0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.0   0   0   0.0   34.9 9cc2d0d3 (2)
  0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.0   0   0   0.0   46.5 72188ca0 (3)
  0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0   16.5 4d2a62bb (4)
  0.3   0.1   0.1   0.0   0.0   0.0    0.0    0.3   0   0   0.0   27.6 8bbc4000 (5)
  5.9   0.2   0.5   0.0   0.0   0.0    0.0    0.1   0   0   5.0   11.3 d305ee44 (6)
  0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.0   0   0   0.0  132.0 9897c8f5 (7)
  0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.0   0   0   0.0   40.7 5f3c7d9e (9)
  0.2   0.8   0.5   0.1   0.6   0.0    0.0    0.0   0   0   0.0   31.9 22ef87fc (10)

(columns: r/s..kw/s = App→ZFS workload; ractv..%w = ZFS resulting
performance; d/s, del_t = ZFS I/O throttling)
vfsstat, cont.

• Good high-level summary of logical I/O: application FS workload
• Summarizes by zone
  • Impetus was observability for cloud “noisy neighbors”
  • Shows effect of ZFS I/O throttling (performance isolation)
• Summarizes the performance applications actually experience!
  • Usually a lot better than disk-level, due to ZFS caching (ARC,
    L2ARC) and buffering
• Required kernel changes, new kstats (thanks Bill Pijewski)
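vfsstat’s counters are cumulative kstats; the per-interval figures above come from differencing two snapshots and dividing by the interval. A minimal sketch of that delta arithmetic in Python (field names are illustrative, not the exact zone_vfs kstat schema):

```python
# Sketch: per-second rates from two cumulative kstat snapshots, the same
# delta/interval arithmetic vfsstat performs each interval.
# Field names here are illustrative, not the exact zone_vfs schema.

def rates(prev, cur, interval):
    """Return per-second rates for each cumulative counter."""
    return {k: (cur[k] - prev[k]) / interval for k in cur}

prev = {"reads": 1000, "writes": 500, "nread": 1048576}
cur  = {"reads": 1600, "writes": 800, "nread": 3145728}

r = rates(prev, cur, interval=2.0)
print(r["reads"])   # 300.0 reads/s over the 2 s interval
```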
zfsslower.d

• ZFS reads/writes slower than 10 ms:

# ./zfsslower.d 10
TIME                 PROCESS   D  KB  ms FILE
2012 Sep 30 04:56:22 beam.smp  R   2  12 /var/db/riak/leveldb/.../205788.sst
2012 Sep 30 04:56:23 beam.smp  R   4  15 /var/db/riak/leveldb/.../152831.sst
2012 Sep 30 04:56:24 beam.smp  R   3  11 /var/db/riak/leveldb/.../220432.sst
2012 Sep 30 04:56:24 beam.smp  R   2  12 /var/db/riak/leveldb/.../208619.sst
2012 Sep 30 04:56:25 beam.smp  R   0  21 /var/db/riak/leveldb/.../210710.sst
2012 Sep 30 04:56:25 beam.smp  R   2  18 /var/db/riak/leveldb/.../217757.sst
2012 Sep 30 04:56:25 beam.smp  R   2  13 /var/db/riak/leveldb/.../191942.sst
2012 Sep 30 04:56:26 cat       R   5  10 /db/run/beam.smp.pid
2012 Sep 30 04:56:26 beam.smp  R   2  11 /var/db/riak/leveldb/.../220389.sst
2012 Sep 30 04:56:27 beam.smp  R   2  12 /var/db/riak/leveldb/.../186749.sst
[...]

• Traces at VFS level to show the true application suffered I/O time
  • allows immediate confirm/deny of FS (incl. disk) based issue
zfsslower.d

• ZFS reads/writes slower than 100 ms:

# ./zfsslower.d 100
TIME                 PROCESS   D  KB   ms FILE
2012 Sep 30 05:01:17 beam.smp  R   2  144 /var/db/riak/leveldb/.../238108.sst
2012 Sep 30 05:01:54 beam.smp  R   1  149 /var/db/riak/leveldb/.../186222.sst
2012 Sep 30 05:02:35 beam.smp  R   2  188 /var/db/riak/leveldb/.../200051.sst
2012 Sep 30 05:02:35 beam.smp  R   2  159 /var/db/riak/leveldb/.../209376.sst
2012 Sep 30 05:02:35 beam.smp  R   1  178 /var/db/riak/leveldb/.../203436.sst
2012 Sep 30 05:02:40 beam.smp  R   1  172 /var/db/riak/leveldb/.../204688.sst
2012 Sep 30 05:03:11 beam.smp  R   0  200 /var/db/riak/leveldb/.../219837.sst
2012 Sep 30 05:03:38 beam.smp  R   1  142 /var/db/riak/leveldb/.../222443.sst
[...]

(these are less frequent)
zfsslower.d, cont.

• All ZFS reads/writes:

# ./zfsslower.d
TIME                 PROCESS   D  KB  ms FILE
2012 Sep 30 04:46:09 beam.smp  R   2   0 /var/db/riak/leveldb/.../221844.sst
2012 Sep 30 04:46:09 beam.smp  R   2   0 /var/db/riak/leveldb/.../221155.sst
2012 Sep 30 04:46:09 beam.smp  R   2   0 /var/db/riak/leveldb/.../215917.sst
2012 Sep 30 04:46:09 beam.smp  R   1   0 /var/db/riak/leveldb/.../190134.sst
2012 Sep 30 04:46:09 beam.smp  R   3   0 /var/db/riak/leveldb/.../234539.sst
2012 Sep 30 04:46:09 nginx     W   0   0 /db/log/apps/prod17_nginx_access.log
2012 Sep 30 04:46:09 beam.smp  R   4   0 /var/db/riak/leveldb/.../205368.sst
2012 Sep 30 04:46:09 beam.smp  R   2   1 /var/db/riak/leveldb/.../199665.sst
2012 Sep 30 04:46:09 beam.smp  R   0   0 /var/db/riak/leveldb/.../177866.sst
2012 Sep 30 04:46:09 beam.smp  R  56   0 /var/db/riak/leveldb/.../177866.sst
2012 Sep 30 04:46:09 beam.smp  R   0   0 /var/db/riak/leveldb/.../177866.sst
2012 Sep 30 04:46:09 beam.smp  R   2   0 /var/db/riak/leveldb/.../177866.sst
2012 Sep 30 04:46:09 beam.smp  R   5   0 /var/db/riak/leveldb/.../218473.sst
2012 Sep 30 04:46:09 beam.smp  R   2   0 /var/db/riak/leveldb/.../210712.sst
2012 Sep 30 04:46:09 beam.smp  R   3   0 /var/db/riak/leveldb/.../234194.sst
2012 Sep 30 04:46:09 beam.smp  R   0   0 /var/db/riak/leveldb/.../233302.sst
2012 Sep 30 04:46:09 beam.smp  R  41   0 /var/db/riak/leveldb/.../233302.sst
2012 Sep 30 04:46:09 beam.smp  R   0   0 /var/db/riak/leveldb/.../233302.sst
[...]
zfsslower.d, cont.

• Written in DTrace

[...]
fbt::zfs_read:entry,
fbt::zfs_write:entry
{
	self->path = args[0]->v_path;
	self->kb = args[1]->uio_resid / 1024;
	self->start = timestamp;
}

fbt::zfs_read:return,
fbt::zfs_write:return
/self->start && (timestamp - self->start) >= min_ns/
{
	this->iotime = (timestamp - self->start) / 1000000;
	this->dir = probefunc == "zfs_read" ? "R" : "W";
	printf("%-20Y %-16s %1s %4d %6d %s\n", walltimestamp,
	    execname, this->dir, self->kb, this->iotime,
	    self->path != NULL ? stringof(self->path) : "<null>");
}
[...]

• zfsslower.d, also on github, originated from the DTrace book
zfsslower.d, cont.

• Traces VFS/ZFS interface (kernel),
  from usr/src/uts/common/fs/zfs/zfs_vnops.c:

/*
 * Regular file vnode operations template
 */
vnodeops_t *zfs_fvnodeops;
const fs_operation_def_t zfs_fvnodeops_template[] = {
	VOPNAME_OPEN,		{ .vop_open = zfs_open },
	VOPNAME_CLOSE,		{ .vop_close = zfs_close },
	VOPNAME_READ,		{ .vop_read = zfs_read },
	VOPNAME_WRITE,		{ .vop_write = zfs_write },
	VOPNAME_IOCTL,		{ .vop_ioctl = zfs_ioctl },
	VOPNAME_GETATTR,	{ .vop_getattr = zfs_getattr },
[...]
iosnoop

• Traces block-device-level I/O:

# ./iosnoop
  UID   PID D     BLOCK   SIZE     COMM PATHNAME
  103  5891 R 238517878 131072 beam.smp <none>
  103  5891 R 544800120 131072 beam.smp <none>
  103  5891 R 286317766 131072 beam.smp <none>
  103  5891 R  74515841 131072 beam.smp <none>
  103  5891 R  98341339 131072 beam.smp <none>
[...]

• Has many options:

# ./iosnoop -Dots
STIME(us)     TIME(us)      DELTA(us) DTIME(us) UID  PID D     BLOCK   SIZE     COMM PATHNAME
787809430864  787809435385       4520      4539 103 5891 R 128302372 131072 beam.smp <none>
787809472876  787809478812       5935      5955 103 5891 R 143783200 131072 beam.smp <none>
787809479681  787809483470       3788      3809 103 5891 R  84913822 131072 beam.smp <none>
787809484451  787809490703       6251      6266 103 5891 R  14964144 131072 beam.smp <none>
787809489555  787809497167       7611      6463 103 5891 R 283398320 131072 beam.smp <none>
787809491122  787809498010       6888       842 103 5891 R 288468148 131072 beam.smp <none>
[...]
iosnoop, cont.

• Written in DTrace

[...]
io:genunix::done
/start_time[args[0]->b_edev, args[0]->b_blkno]/
{
[...]
	/* fetch entry values */
	this->dev = args[0]->b_edev;
	this->blk = args[0]->b_blkno;
	this->suid = start_uid[this->dev, this->blk];
	this->spid = start_pid[this->dev, this->blk];
	this->sppid = start_ppid[this->dev, this->blk];
	self->sargs = (int)start_args[this->dev, this->blk] == 0 ?
	    "" : start_args[this->dev, this->blk];
	self->scomm = start_comm[this->dev, this->blk];
[...]
	printf("%5d %5d %1s %8d %6d ",
	    this->suid, this->spid, args[0]->b_flags & B_READ ? "R" : "W",
	    args[0]->b_blkno, args[0]->b_bcount);
[...]

• From the DTraceToolkit
iosnoop, cont.

• 370 lines of code, mostly to process options:

USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]
               [-m mount_point] [-n name] [-p PID]
       iosnoop          # default output
            -a          # print all data (mostly)
            -A          # dump all data, space delimited
            -D          # print time delta, us (elapsed)
            -e          # print device name
            -g          # print command arguments
            -i          # print device instance
            -N          # print major and minor numbers
            -o          # print disk delta time, us
            -s          # print start time, us
            -t          # print completion time, us
            -v          # print completion time, string
            -d device   # instance name to snoop
            -f filename # snoop this file only
            -m mount_point # this FS only
            -n name     # this process name only
            -p PID      # this PID only
  eg,
       iosnoop -v       # human readable timestamps
       iosnoop -N       # print major and minor numbers
       iosnoop -m /     # snoop events on filesystem / only
iosnoop, cont.

• I originally wrote it in 2004 to solve disk-I/O-by-process
• Shipped by default on Mac OS X (/usr/bin), Oracle Solaris 11
• I wrote a companion called iotop. A similar iotop tool later
  appeared for Linux (via the blk tracing framework)
• Doesn’t do ZFS pathnames yet (known issue; hackathon?)
  • Pathnames from the block-device layer was always a party
    trick: relied on a cached vnode->v_path
  • ZFS aggregates I/O. At the block-device layer, no one vnode
    is responsible (or vnode pointer)
iosnoop, cont.

• Locate latency origin:

[Diagram: correlate zfsslower.d (at the VFS level) with iosnoop (at the
Block Device Interface) to determine which layer of the stack latency
originates from.]
iostacks.d

• Just a one-liner really (could make an “iostacks.d”):

# dtrace -n 'io:::start { @[stack()] = count(); }'
dtrace: description 'io:::start ' matched 6 probes
^C
              genunix`ldi_strategy+0x53
              zfs`vdev_disk_io_start+0xcc
              zfs`zio_vdev_io_start+0xab
              zfs`zio_execute+0x88
              zfs`zio_nowait+0x21
              zfs`vdev_mirror_io_start+0xcd
              zfs`zio_vdev_io_start+0x250
              zfs`zio_execute+0x88
              zfs`zio_nowait+0x21
              zfs`arc_read_nolock+0x4f9
              zfs`arc_read+0x96
              zfs`dsl_read+0x44
              zfs`dbuf_read_impl+0x166
              zfs`dbuf_read+0xab
              zfs`dmu_buf_hold_array_by_dnode+0x189
              zfs`dmu_buf_hold_array+0x78
              zfs`dmu_read_uio+0x5c
              zfs`zfs_read+0x1a3
              genunix`fop_read+0x8b
              genunix`read+0x2a7
              143
iostacks.d, cont.

• Stack recognition chart:

# dtrace -n 'io:::start { @[stack()] = count(); }'
dtrace: description 'io:::start ' matched 6 probes
^C
              genunix`ldi_strategy+0x53
              zfs`vdev_disk_io_start+0xcc
              zfs`zio_vdev_io_start+0xab
              zfs`zio_execute+0x88
              zfs`zio_nowait+0x21
              zfs`vdev_mirror_io_start+0xcd
              zfs`zio_vdev_io_start+0x250
              zfs`zio_execute+0x88
              zfs`zio_nowait+0x21
              zfs`arc_read_nolock+0x4f9
              zfs`arc_read+0x96
              zfs`dsl_read+0x44
              zfs`dbuf_read_impl+0x166
              zfs`dbuf_read+0xab
              zfs`dmu_buf_hold_array_by_dnode+0x189
              zfs`dmu_buf_hold_array+0x78
              zfs`dmu_read_uio+0x5c
              zfs`zfs_read+0x1a3
              genunix`fop_read+0x8b
              genunix`read+0x2a7
              143

(this stack shows a syscall read() that missed in the ARC)
iostacks.d, cont.

• Stack recognition chart:

# dtrace -n 'io:::start { @[stack()] = count(); }'
dtrace: description 'io:::start ' matched 6 probes
^C
              genunix`ldi_strategy+0x53
              zfs`vdev_disk_io_start+0xcc
              zfs`zio_vdev_io_start+0xab
              zfs`zio_execute+0x88
              zfs`vdev_queue_io_done+0x70
              zfs`zio_vdev_io_done+0x80
              zfs`zio_execute+0x88
              genunix`taskq_thread+0x2d0
              unix`thread_start+0x8
              395

(this stack shows the ZIO pipeline)
iostacks.d, cont.

• From zio, when parent == NULL (first zio):

# dtrace -n 'zio_create:entry /arg0 == NULL/ { @[stack()] = count(); }'
[...]
              zfs`zio_null+0x77
              zfs`zio_root+0x2d
              zfs`dmu_buf_hold_array_by_dnode+0x113
              zfs`dmu_buf_hold_array+0x78
              zfs`dmu_write+0x80
              zfs`space_map_sync+0x288
              zfs`metaslab_sync+0x135
              zfs`vdev_sync+0x7f
              zfs`spa_sync+0x38b
              zfs`txg_sync_thread+0x204
              unix`thread_start+0x8
              7

(this stack shows an SPA sync)
iostacks.d, cont.

• Can identify the reason for disk I/O
  • including unexplained additional I/O from ZFS
• Times it doesn’t work (ZIO pipeline task pool), can try a similar
  one-liner from fbt::zio_create:entry, for the stack trace from the
  creation of all ZIOs.
metaslab_free.d

• Traces ZFS metaslab details:

# ./metaslab_free.d
Tracing ZFS metaslab alloc.  metaslab_df_free_pct = 4 %

2012 Sep 30 05:32:47 free %pct by allocations:
      13       12
      89       34
      81      287
      12      389

2012 Sep 30 05:32:49 free %pct by allocations:
      89       15
      81      265
      12      378

• ZFS pools are split into metaslabs (eg, 10 GB each)
• Metaslabs have two allocation algorithms:
  metaslab %free >= metaslab_df_free_pct == first fit (fast)
  metaslab %free <  metaslab_df_free_pct == best fit (slow)
metaslab_free.d, cont.

• Written in DTrace

dtrace:::BEGIN
{
	printf("Tracing ZFS metaslab alloc.  metaslab_df_free_pct = %d %%\n",
	    `metaslab_df_free_pct);
}

fbt::metaslab_df_alloc:entry
{
	this->pct = args[0]->sm_space * 100 / args[0]->sm_size;
	@[this->pct] = count();
}

profile:::tick-1s
{
	printf("\n%Y free %%pct by allocations:", walltimestamp);
	printa(@);
	trunc(@);
}

• metaslab_free.d is also on github
metaslab_free.d, cont.

• Shows if allocations are currently fast fit or best fit
• Correlate to performance changes
• Use while tuning
  • Can’t just try out a new setting and watch performance: perf
    can improve if ZFS switched to a new, emptier metaslab at
    the same time. Script identifies if that occurred.
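The first-fit/best-fit switchover described above can be sketched as a tiny Python model (names mirror the D script; the real logic lives in the ZFS metaslab code, and this is only an illustration):

```python
# Sketch of the metaslab allocation policy described above: at or above
# metaslab_df_free_pct free space, ZFS uses fast first-fit; below it,
# slower best-fit. Illustrative model only, not the ZFS implementation.

METASLAB_DF_FREE_PCT = 4  # default threshold shown by metaslab_free.d

def alloc_strategy(sm_space, sm_size):
    # same integer arithmetic as the D script: free percent of the metaslab
    pct_free = sm_space * 100 // sm_size
    return "first-fit" if pct_free >= METASLAB_DF_FREE_PCT else "best-fit"

print(alloc_strategy(sm_space=13, sm_size=100))  # first-fit (13% free)
print(alloc_strategy(sm_space=2,  sm_size=100))  # best-fit  (2% free)
```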
spasync.d

• Traces ZFS SPA syncs:

# ./spasync.d
Tracing ZFS spa_sync() slower than 1 ms...
2012 Sep 30 05:56:07 zones       30 ms, 25 MB 464 I/O
2012 Sep 30 05:56:12 zones       22 ms, 15 MB 366 I/O
2012 Sep 30 05:56:17 zones       39 ms, 19 MB 361 I/O
2012 Sep 30 05:56:22 zones      143 ms, 31 MB 536 I/O
2012 Sep 30 05:56:27 zones       32 ms, 27 MB 465 I/O
2012 Sep 30 05:56:32 zones       27 ms, 24 MB 468 I/O
2012 Sep 30 05:56:37 zones       31 ms, 23 MB 489 I/O
2012 Sep 30 05:56:42 zones      200 ms, 85 MB 956 I/O
2012 Sep 30 05:56:47 zones      122 ms, 46 MB 593 I/O
2012 Sep 30 05:56:52 zones       48 ms, 18 MB 394 I/O
2012 Sep 30 05:56:57 zones       43 ms, 16 MB 409 I/O
2012 Sep 30 05:57:02 zones       33 ms, 27 MB 526 I/O
[...]

• Check for correlations with I/O latency
spasync.d, cont.

• Written in DTrace

[...]
fbt::spa_sync:entry
/!self->start/
{
	in_spa_sync = 1;
	self->start = timestamp;
	self->spa = args[0];
}

io:::start
/in_spa_sync/
{
	@io = count();
	@bytes = sum(args[0]->b_bcount);
}

fbt::spa_sync:return
/self->start && (this->ms = (timestamp - self->start) / 1000000) >= MIN_MS/
{
	normalize(@bytes, 1048576);
	printf("%-20Y %-10s %6d ms, ", walltimestamp,
	    stringof(self->spa->spa_name), this->ms);
	printa("%@d MB %@d I/O\n", @bytes, @io);
}
[...]

• spasync.d is also on github
spasync.d, cont.

• Simple and effective: helps diagnose TXG sync issues, providing
  information to help guide tuning.
• Long heritage: Matt Ahrens → Roch Bourbonnais → Ben Rockwood →
  Brendan Gregg
%iowait

• I/O wait percent

# mpstat 1
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  483   0  427  2216  913 4415    6   79   45    0  9275    7  11   0  83
  1  625   0  153  4171 1748 2887    4   42   37    0  5909    7   8   0  85
  2  269   0  702  1720  636 2943    5   71   39    0  6422    5   8   0  87
  3  281   0  161  1433  529 2255    3   39   35    0  5269    5   5   0  90
  4  173   0  577  3361 1465 2565    3   50   38    0  4645    4   6   0  90
  5  172   0   98  1108  418 1640    2   27   30    0  3699    3   3   0  94
  6  139   0  678  2406  802 2509    4   60   34    0  5135    4   6   0  90
  7  140   0   91  1343  527 2196    2   36   31    0  4879    4   4   0  92
  8  124   0  527  1145  419 1676    2   43   29    0  3335    4   4   0  92
[...]
%iowait

• I/O wait percent ... still zero in illumos!

(same mpstat output as the previous slide: the wt column is 0 on every CPU)
arcstat

• ZFS ARC and L2ARC statistics:

$ ./arcstat 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
04:45:47     0     0      0     0    0     0    0     0    0    14G   14G
04:45:49   15K    10      0    10    0     0    0     1    0    14G   14G
04:45:50   23K    81      0    81    0     0    0     1    0    14G   14G
04:45:51   65K    25      0    25    0     0    0     4    0    14G   14G
04:45:52   30K    11      0    11    0     0    0     3    0    14G   14G
[...]

• Statistics are per-interval:
  • read, miss: total ARC accesses, misses
  • miss%, dm%, pm%, mm%: ARC miss percentage total,
    demand, prefetch, metadata
  • dmis, pmis, mmis: misses for demand, prefetch, metadata
  • arcsz, c: ARC size, ARC target size
arcstat, cont.

• Written in Perl, uses kstats (zfs::arcstats:):

[...]
sub snap_stats {
	my %prev = %cur;
	if ($kstat->update()) {
		printf("<State Changed>\n");
	}
	my $hashref_cur = $kstat->{zfs}{0}{arcstats};
	%cur = %$hashref_cur;
	foreach my $key (keys %cur) {
		next if $key =~ /class/;
		if (defined $prev{$key}) {
			$d{$key} = $cur{$key} - $prev{$key};
[...]
sub calculate {
	%v = ();
	$v{time} = strftime("%H:%M:%S", localtime);
	$v{hits} = $d{hits}/$int;
	$v{miss} = $d{misses}/$int;
	$v{read} = $v{hits} + $v{miss};
	$v{"hit%"} = 100*$v{hits}/$v{read} if $v{read} > 0;
	$v{"miss%"} = 100 - $v{"hit%"} if $v{read} > 0;
[...]

• github.com/mharsch/arcstat
arcstat, cont.

• Options to configure output, including L2ARC stats
• Originally by Neelakanth Nadgir, then Jason Hellenthal (FreeBSD
  port) and Mike Harsch
• Crucial data when analyzing cache performance
  (and easier to read than the raw form: kstat -pn arcstats)
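The hits/misses arithmetic arcstat performs can be sketched in Python, assuming two snapshots of the cumulative arcstats kstats (a model of the Perl above, not the tool itself):

```python
# Sketch of arcstat's core calculation: interval deltas of the cumulative
# hits/misses kstats, then a miss percentage. Snapshot values below are
# made-up illustrative numbers.

def arc_summary(prev, cur, interval=1):
    hits = (cur["hits"] - prev["hits"]) / interval
    misses = (cur["misses"] - prev["misses"]) / interval
    reads = hits + misses
    miss_pct = 100.0 * misses / reads if reads > 0 else 0.0
    return {"read": reads, "miss": misses, "miss%": miss_pct}

prev = {"hits": 1_000_000, "misses": 5_000}
cur  = {"hits": 1_015_000, "misses": 5_010}
s = arc_summary(prev, cur)
print(s)  # 15010 reads/s, 10 misses/s, a sub-0.1% miss rate
```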
arcaccess.d

• ARC population age:

# ./arcaccess.d -n 'tick-10s { exit(0); }'
lbolt rate is 100 Hertz.
Tracing lbolts between ARC accesses...

           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 729988
               1 |                                         3805
               2 |                                         3038
               4 |                                         2028
               8 |                                         1428
              16 |                                         1398
              32 |                                         1618
              64 |                                         2883
             128 |                                         738
             256 |                                         681
             512 |                                         338
            1024 |                                         569
            2048 |                                         166
            4096 |                                         607
            8192 |                                         632
           16384 |                                         808
           32768 |                                         373
           65536 |                                         110
          131072 |                                         142
          262144 |                                         39
          524288 |                                         5
         1048576 |                                         97
         2097152 |                                         10
         4194304 |                                         44
         8388608 |                                         617
        16777216 |                                         1
        33554432 |                                         0

(slide annotates ages alongside the value column: 10 ms, 1 second,
1 minute, 1 hour, 1 day)
arcaccess.d, cont.

• Written in DTrace

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
	printf("lbolt rate is %d Hertz.\n", `hz);
	printf("Tracing lbolts between ARC accesses...");
}

fbt::arc_access:entry
{
	self->ab = args[0];
	self->lbolt = args[0]->b_arc_access;
}

fbt::arc_access:return
/self->lbolt/
{
	@ = quantize(self->ab->b_arc_access - self->lbolt);
	self->ab = 0;
	self->lbolt = 0;
}

• http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/
arcaccess.d, cont.

• Shows population age of the ARC based on access time
• Helps determine working set size and turnover rate
• Previous example showed the ARC was so large, it was able to keep
  buffers that had not been accessed in over a day.
  • Turnover rate == low; working set size == entirely fits, and is
    likely much smaller.
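To read the histogram above, lbolt deltas convert to time using the 100 Hz rate the script prints. A sketch of that conversion:

```python
# Sketch: convert the lbolt deltas from arcaccess.d's histogram into
# seconds, assuming the 100 Hz lbolt rate the script reports.

HZ = 100  # lbolt ticks per second

def lbolt_age_seconds(lbolts):
    return lbolts / HZ

print(lbolt_age_seconds(1))        # 0.01 -> the "10 ms" annotation
print(lbolt_age_seconds(100))      # 1.0  -> the "1 second" annotation
print(lbolt_age_seconds(8388608))  # ~83886 s, approaching a day (86400 s)
```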
Latency counters

• kstat -p zone_vfs:::*_ops, shows latency outlier counts since boot:

$ kstat -p zone_vfs:::*_ops
zone_vfs:0:global:100ms_ops     13
zone_vfs:0:global:10ms_ops      2220
zone_vfs:0:global:10s_ops       0
zone_vfs:0:global:1s_ops        0
zone_vfs:1:5cedb79e:100ms_ops   173367
zone_vfs:1:5cedb79e:10ms_ops    64071247
zone_vfs:1:5cedb79e:10s_ops     0
zone_vfs:1:5cedb79e:1s_ops      430

• reads/writes:

$ kstat -p zone_vfs:::reads zone_vfs:::writes
zone_vfs:0:global:reads 666148884
zone_vfs:1:5cedb79e-51c6-41b3-b99d-358b33:reads 1791879457
zone_vfs:0:global:writes        46861078
zone_vfs:1:5cedb79e-51c6-41b3-b99d-358b33:writes        356075500
Latency counters

• The same counters, annotated: yikes, 430 ops > 1s! However, this is
  a small fraction of the zone’s reads + writes since boot:
  >1s = 0.00002%
Latency counters, cont.

• Added by Bill Pijewski of Joyent
• Proven invaluable, usually to quickly eliminate ZFS (and disk) as a
  source of issues, after the fact (when counters are near-zero)
• Has identified real ZFS/disk-level issues as well
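The "0.00002%" figure above is simple arithmetic over the counters shown; a sketch using those values:

```python
# Sketch: the outlier-fraction arithmetic behind the ">1s = 0.00002%"
# note, using the zone's counter values from the kstat output above.

ops_over_1s = 430
total_ops = 1791879457 + 356075500   # zone reads + writes since boot

pct = 100.0 * ops_over_1s / total_ops
print(f"{pct:.5f}%")  # prints 0.00002%
```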
Scatter plots

• iosnoop -Dots > outfile + awk/perl + gnuplot/R =

[Scatter plot: latency (up to 60 ms) vs time (s); read = red, write = blue]
Scatter plots, cont.

• Has been an effective last-resort to investigate nasty PERC
  issues: reads queueing behind writes
• Has a big data problem: many points, x-y coordinates, takes time
  and space to capture and render.
• Heat maps solve this, and can be real-time
Heat maps

• WHAT DOES IT MEAN?

[Heat map: latency over time. Successive slides annotate the same image: a
band of disk I/O latency and a band of DRAM I/O latency as the workload
becomes cached, forming a bi-modal distribution.]
Heat maps

• File system latency is commonly bi-modal: cache hits vs misses
• Average latency, stddev, 99th percentile, don’t make a lot of sense
  • Given line graphs of these, we would not be able to solve
    many problems
  • Need to see the full distribution: heat maps
• Also good at identifying outliers
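The essence of a latency heat map is quantizing (time, latency) events into a 2-D grid of counts; a minimal sketch (bucket counts and ranges are arbitrary illustrative choices):

```python
# Sketch: quantize (time, latency) I/O events into a 2-D grid of counts,
# the data structure behind a latency heat map. Ranges and bucket counts
# are arbitrary choices for illustration.

def heatmap(events, t_max, lat_max, t_buckets=60, lat_buckets=50):
    grid = [[0] * t_buckets for _ in range(lat_buckets)]
    for t, lat in events:
        x = min(int(t / t_max * t_buckets), t_buckets - 1)
        y = min(int(lat / lat_max * lat_buckets), lat_buckets - 1)
        grid[y][x] += 1   # darker pixel = more I/O at this time/latency cell
    return grid

# two fast (cache-hit-like) events and one slow (disk-like) outlier
events = [(0.1, 0.0002), (0.1, 0.0003), (30.0, 0.05)]  # (seconds, latency s)
g = heatmap(events, t_max=60.0, lat_max=0.06)
print(g[0][0])  # prints 2: both fast events share one low-latency cell
```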
Heat maps, cont.

• Latency outliers, cloud wide (entire datacenter):

[Heat map annotated with latency bands, from best to worst:
Good, Bad, Very Bad, Inconceivable]
Cloud computing: within the zone

• Suspected “noisy neighbors” hurting ZFS performance
  • ZFS I/O throttling was added to address this
• Need ZFS observability to confirm/deny issues from within a zone
  (eg, Joyent SmartMachine), however, kernel access (including
  DTrace io and fbt providers) is not available.
• vfsstat (already covered) does work, as it is zone-aware
• Cloud safe DTrace tools include:
  • mysql_pid_latency.d: tracing FS calls from within the app
  • syscall tracing when fds[].fi_fs == “zfs”
• Scripts are in:
  http://dtrace.org/blogs/brendan/2011/05/18/file-system-latency-part-3/
  http://dtrace.org/blogs/brendan/2011/05/24/file-system-latency-part-4/
Recap

[Diagram: tools mapped to stack layers. Syscall Interface:
mysql_pid_fslatency.d, syscall tracing with fi_fs == zfs. VFS: vfsstat.
ZFS: zioslower.d, spasync.d, metaslab_free.d, arcstat.pl, arcaccess.d.
ZFS to Block Device Interface: iostacks.d. Block Device Interface: iostat,
iosnoop. Kernel drivers as needed: see DTrace book chap 4.]
There’s more

• There are more tools
  • zpool iostat
  • zpool status
    • degraded? resilvering?
  • zfs get all
    • recsize match workload? compression? dedup?
  • ziostat
There’s more, cont.

• There are more tools
• And much, much, more DTrace

DTRACE ALL THE THINGS?
Environmental Impact Assessment (EIA) report on Rampal 1320MW coal-based powe...Probir Bidhan
 
Basics of c++ Programming Language
Basics of c++ Programming LanguageBasics of c++ Programming Language
Basics of c++ Programming LanguageAhmad Idrees
 
Engaging Learners with Technology
Engaging Learners with TechnologyEngaging Learners with Technology
Engaging Learners with TechnologyDean Shareski
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through ExamplesSri Ambati
 
Introductory Lecture on photography
Introductory Lecture on photographyIntroductory Lecture on photography
Introductory Lecture on photographyAditya Rao
 

Viewers also liked (20)

Sony corporation
Sony corporationSony corporation
Sony corporation
 
SharePoint Intranet Trends & Best Practices: 2010 and beyond
SharePoint Intranet Trends & Best Practices: 2010 and beyond SharePoint Intranet Trends & Best Practices: 2010 and beyond
SharePoint Intranet Trends & Best Practices: 2010 and beyond
 
New forever clean 9 booklet
New forever clean 9 bookletNew forever clean 9 booklet
New forever clean 9 booklet
 
RETAIL BRANDING: THE PSYCHOLOGY OF VISUAL MERCHANDISING
RETAIL BRANDING: THE PSYCHOLOGY OF VISUAL MERCHANDISINGRETAIL BRANDING: THE PSYCHOLOGY OF VISUAL MERCHANDISING
RETAIL BRANDING: THE PSYCHOLOGY OF VISUAL MERCHANDISING
 
Best Practices in Maintenance and Reliability
Best Practices in Maintenance and ReliabilityBest Practices in Maintenance and Reliability
Best Practices in Maintenance and Reliability
 
Circuit Analysis – DC Circuits
Circuit Analysis – DC CircuitsCircuit Analysis – DC Circuits
Circuit Analysis – DC Circuits
 
Fmcg training modules-bfg
Fmcg training modules-bfgFmcg training modules-bfg
Fmcg training modules-bfg
 
Tweak Your Resume
Tweak Your ResumeTweak Your Resume
Tweak Your Resume
 
Cardiac cycle ppt (2)
Cardiac cycle ppt (2)Cardiac cycle ppt (2)
Cardiac cycle ppt (2)
 
Engineering Geology
Engineering GeologyEngineering Geology
Engineering Geology
 
Environmental Impact Assessment (EIA) report on Rampal 1320MW coal-based powe...
Environmental Impact Assessment (EIA) report on Rampal 1320MW coal-based powe...Environmental Impact Assessment (EIA) report on Rampal 1320MW coal-based powe...
Environmental Impact Assessment (EIA) report on Rampal 1320MW coal-based powe...
 
Basics of c++ Programming Language
Basics of c++ Programming LanguageBasics of c++ Programming Language
Basics of c++ Programming Language
 
Engaging Learners with Technology
Engaging Learners with TechnologyEngaging Learners with Technology
Engaging Learners with Technology
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
Pneumonia
Pneumonia Pneumonia
Pneumonia
 
Bacteriology
BacteriologyBacteriology
Bacteriology
 
Schedule Review
Schedule ReviewSchedule Review
Schedule Review
 
Beowulf
BeowulfBeowulf
Beowulf
 
Introductory Lecture on photography
Introductory Lecture on photographyIntroductory Lecture on photography
Introductory Lecture on photography
 
Process sequence of weaving
Process sequence of weavingProcess sequence of weaving
Process sequence of weaving
 

Similar to ZFSperftools2012

ZFS for Databases
ZFS for DatabasesZFS for Databases
ZFS for Databasesahl0003
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databasesAngelo Rajadurai
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storageAngelo Rajadurai
 
제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustreTommy Lee
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment StrategiesMongoDB
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)MongoDB
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Anne Nicolas
 
Running E-Business Suite Database on Oracle Database Appliance
Running E-Business Suite Database on Oracle Database ApplianceRunning E-Business Suite Database on Oracle Database Appliance
Running E-Business Suite Database on Oracle Database ApplianceMaris Elsins
 
OSC-Fall-Tokyo-2012-v9.pdf
OSC-Fall-Tokyo-2012-v9.pdfOSC-Fall-Tokyo-2012-v9.pdf
OSC-Fall-Tokyo-2012-v9.pdfnitinscribd
 
Jackpot! Attacking Arcade Machines
Jackpot! Attacking Arcade MachinesJackpot! Attacking Arcade Machines
Jackpot! Attacking Arcade MachinesPatrick Sayler
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014Philippe Fierens
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Community
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Community
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfsRami Jebara
 
GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS  GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS Gluster.org
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems Baruch Osoveskiy
 

Similar to ZFSperftools2012 (20)

ZFS for Databases
ZFS for DatabasesZFS for Databases
ZFS for Databases
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databases
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storage
 
제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
 
Deployment
DeploymentDeployment
Deployment
 
Running E-Business Suite Database on Oracle Database Appliance
Running E-Business Suite Database on Oracle Database ApplianceRunning E-Business Suite Database on Oracle Database Appliance
Running E-Business Suite Database on Oracle Database Appliance
 
OSC-Fall-Tokyo-2012-v9.pdf
OSC-Fall-Tokyo-2012-v9.pdfOSC-Fall-Tokyo-2012-v9.pdf
OSC-Fall-Tokyo-2012-v9.pdf
 
Jackpot! Attacking Arcade Machines
Jackpot! Attacking Arcade MachinesJackpot! Attacking Arcade Machines
Jackpot! Attacking Arcade Machines
 
Azure Databases with IaaS
Azure Databases with IaaSAzure Databases with IaaS
Azure Databases with IaaS
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
 
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
Ceph Day Berlin: CEPH@DeutscheTelekom - a 2+ years production liaison
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfs
 
GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS  GlusterFS w/ Tiered XFS
GlusterFS w/ Tiered XFS
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 
Hostvn ceph in production v1.1 dungtq
Hostvn   ceph in production v1.1 dungtqHostvn   ceph in production v1.1 dungtq
Hostvn ceph in production v1.1 dungtq
 

More from Brendan Gregg

Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedBrendan Gregg
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Brendan Gregg
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
Performance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedBrendan Gregg
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsBrendan Gregg
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityBrendan Gregg
 
YOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixBrendan Gregg
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019Brendan Gregg
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityBrendan Gregg
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Brendan Gregg
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance AnalysisBrendan Gregg
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFBrendan Gregg
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFBrendan Gregg
 

More from Brendan Gregg (20)

Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
Performance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting Started
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
LPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing Tools
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
 
YOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflix
 
eBPF Perf Tools 2019
eBPF Perf Tools 2019eBPF Perf Tools 2019
eBPF Perf Tools 2019
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
BPF Tools 2017
BPF Tools 2017BPF Tools 2017
BPF Tools 2017
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF Observability
 
FlameScope 2018
FlameScope 2018FlameScope 2018
FlameScope 2018
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
 
Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
LISA17 Container Performance Analysis
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance Analysis
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
 

Recently uploaded

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfChristopherTHyatt
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
Transforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXTransforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXUXDXConf
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfalexjohnson7307
 

Recently uploaded (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Transforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXTransforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UX
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 

ZFSperftools2012

9. ZFS Internals

•
•

That’s just my top 12
Use more as needed
10. ZFS Internals, cont.

•
•

That’s just my top 12
Use more as needed: DTrace
13. iostat

•

Block-device level (almost disk-level) I/O statistics:

    $ iostat -xnz 1
    [...]
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0   11.0    0.0    52.0  0.0  0.0    0.0    1.0   0   1 c0t0d0
        1.0  381.0   16.0 43325.5  0.0  4.0    0.0   10.4   1  12 c0t1d0
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0    9.0    0.0    34.0  0.0  0.0    0.0    0.1   0   0 c0t0d0
        1.0  154.9   16.0  1440.5  0.0  2.0    0.0   12.6   0  10 c0t1d0
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0    8.0    0.0    36.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
        6.0    0.0   96.0     0.0  0.0  0.0    0.0    7.9   0   4 c0t1d0

•

Slide callouts: r/s...kw/s = ZFS-to-disk workload; wait...%b = resulting disk performance
14. iostat, cont.

•

Effective tool for a class of disk issues, especially disks “out to lunch” (PERC ECC error):

    $ iostat -xnz 1
    [...]
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0    0.0    0.0     0.0  0.0  4.0    0.0    0.0   0 100 c0t0d0
                        extended device statistics
        r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
        0.0    0.0    0.0     0.0  0.0  4.0    0.0    0.0   0 100 c0t0d0
    [...]

•

The same line repeats every interval: no I/O completing, I/O stuck active (actv 4.0), disk 100% busy
15. iostat, cont.

•

Minor nits:

•
•

•

does not show read/write latency separately. ZFS TXG flushes drag up the latency, which looks alarming, but they are asynchronous. Can use DTrace for the split.
no higher-level context: PID, ZFS dataset, file pathname, ... (not its role)

Major problem (although not iostat’s fault): commonly confused with application-level (logical) I/O.

•
•

The I/O rates, sizes, and latency can differ dramatically between logical file system I/O and physical disk I/O.
Users commonly draw the wrong conclusions when given only iostat statistics to understand a system.
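The read/write latency split mentioned above can be obtained with a short DTrace sketch at the block device level (a minimal example, assuming the stable io provider on illumos/SmartOS; not one of the deck’s named tools):

```d
/* split block-device I/O latency into read and write
 * distributions (nanoseconds); summaries print on Ctrl-C */
io:::start
{
	/* key on the buf pointer, as iosnoop-style scripts do */
	start_ts[arg0] = timestamp;
}

io:::done
/start_ts[arg0]/
{
	@[args[0]->b_flags & B_READ ? "read (ns)" : "write (ns)"] =
	    quantize(timestamp - start_ts[arg0]);
	start_ts[arg0] = 0;
}
```

Run with, e.g., `dtrace -s rwsplit.d`; asynchronous TXG write bursts then show up in the write distribution only, leaving the read distribution as the latency that matters.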
  • 16. iostat, cont. • iostat output (or disk kstats) graphed by various monitoring software So MANY graphs!
• 17. iostat, cont.
• For so LITTLE visibility — this leaves many perf issues unsolved
• [Full-stack diagram: iostat and iosnoop observe only the block device interface and disks. The layers above need other tools: mysql_pid_fslatency.d and syscall tracing with fi_fs == "zfs" at the syscall interface; vfsstat at VFS; zioslower.d, spasync.d, metaslab_free.d, arcstat.pl, arcaccess.d for ZFS; iostacks.d; kernel drivers as needed (see DTrace book chap 4)]
• 18. vfsstat
• VFS-level I/O statistics (VFS-iostat):

    # vfsstat -Z 1
     r/s   w/s  kr/s  kw/s ractv wactv read_t writ_t  %r  %w   d/s  del_t zone
     1.2   2.8   0.6   0.2   0.0   0.0    0.0    0.0   0   0   0.0    0.0 global (0)
     0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.0   0   0   0.0   34.9 9cc2d0d3 (2)
     0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.0   0   0   0.0   46.5 72188ca0 (3)
     0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0   16.5 4d2a62bb (4)
     0.3   0.1   0.1   0.3   0.0   0.0    0.0    0.0   0   0   0.0   27.6 8bbc4000 (5)
     5.9   0.2   0.5   0.1   0.0   0.0    0.0    0.0   0   0   5.0   11.3 d305ee44 (6)
     0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.0   0   0   0.0  132.0 9897c8f5 (7)
     0.1   0.0   0.1   0.0   0.0   0.0    0.0    0.1   0   0   0.0   40.7 5f3c7d9e (9)
     0.2   0.8   0.5   0.6   0.0   0.0    0.0    0.0   0   0   0.0   31.9 22ef87fc (10)

  (slide annotations: r/s..writ_t = app→ZFS workload and ZFS resulting performance; d/s, del_t = ZFS I/O throttling)
• 19. vfsstat, cont.
• Good high-level summary of logical I/O: the application FS workload
• Summarizes by zone
  • Impetus was observability for cloud “noisy neighbors”
  • Shows the effect of ZFS I/O throttling (performance isolation)
• Summarizes the performance applications actually experience!
  • Usually much better than disk-level, due to ZFS caching (ARC, L2ARC) and buffering
• Required kernel changes, new kstats (thanks Bill Pijewski)
• 20. zfsslower.d
• ZFS reads/writes slower than 10 ms:

    # ./zfsslower.d 10
    TIME                  PROCESS   D KB  ms FILE
    2012 Sep 30 04:56:22  beam.smp  R  2  12 /var/db/riak/leveldb/.../205788.sst
    2012 Sep 30 04:56:23  beam.smp  R  4  15 /var/db/riak/leveldb/.../152831.sst
    2012 Sep 30 04:56:24  beam.smp  R  3  11 /var/db/riak/leveldb/.../220432.sst
    2012 Sep 30 04:56:24  beam.smp  R  2  12 /var/db/riak/leveldb/.../208619.sst
    2012 Sep 30 04:56:25  beam.smp  R  0  21 /var/db/riak/leveldb/.../210710.sst
    2012 Sep 30 04:56:25  beam.smp  R  2  18 /var/db/riak/leveldb/.../217757.sst
    2012 Sep 30 04:56:25  beam.smp  R  2  13 /var/db/riak/leveldb/.../191942.sst
    2012 Sep 30 04:56:26  cat       R  5  10 /db/run/beam.smp.pid
    2012 Sep 30 04:56:26  beam.smp  R  2  11 /var/db/riak/leveldb/.../220389.sst
    2012 Sep 30 04:56:27  beam.smp  R  2  12 /var/db/riak/leveldb/.../186749.sst
    [...]

• Traces at the VFS level to show the true I/O time the application suffered
  • allows immediate confirm/deny of an FS (incl. disk) based issue
• 21. zfsslower.d
• ZFS reads/writes slower than 100 ms:

    # ./zfsslower.d 100
    TIME                  PROCESS   D KB   ms FILE
    2012 Sep 30 05:01:17  beam.smp  R  2  144 /var/db/riak/leveldb/.../238108.sst
    2012 Sep 30 05:01:54  beam.smp  R  1  149 /var/db/riak/leveldb/.../186222.sst
    2012 Sep 30 05:02:35  beam.smp  R  2  188 /var/db/riak/leveldb/.../200051.sst
    2012 Sep 30 05:02:35  beam.smp  R  2  159 /var/db/riak/leveldb/.../209376.sst
    2012 Sep 30 05:02:35  beam.smp  R  1  178 /var/db/riak/leveldb/.../203436.sst
    2012 Sep 30 05:02:40  beam.smp  R  1  172 /var/db/riak/leveldb/.../204688.sst
    2012 Sep 30 05:03:11  beam.smp  R  0  200 /var/db/riak/leveldb/.../219837.sst
    2012 Sep 30 05:03:38  beam.smp  R  1  142 /var/db/riak/leveldb/.../222443.sst
    [...]

  (slide annotation: less frequent)
• 22. zfsslower.d, cont.
• All ZFS reads/writes:

    # ./zfsslower.d
    TIME                  PROCESS   D KB  ms FILE
    2012 Sep 30 04:46:09  beam.smp  R  2   0 /var/db/riak/leveldb/.../221844.sst
    2012 Sep 30 04:46:09  beam.smp  R  2   0 /var/db/riak/leveldb/.../221155.sst
    2012 Sep 30 04:46:09  beam.smp  R  2   0 /var/db/riak/leveldb/.../215917.sst
    2012 Sep 30 04:46:09  beam.smp  R  1   0 /var/db/riak/leveldb/.../190134.sst
    2012 Sep 30 04:46:09  beam.smp  R  3   0 /var/db/riak/leveldb/.../234539.sst
    2012 Sep 30 04:46:09  nginx     W  0   0 /db/log/apps/prod17_nginx_access.log
    2012 Sep 30 04:46:09  beam.smp  R  4   0 /var/db/riak/leveldb/.../205368.sst
    2012 Sep 30 04:46:09  beam.smp  R  2   1 /var/db/riak/leveldb/.../199665.sst
    2012 Sep 30 04:46:09  beam.smp  R  0   0 /var/db/riak/leveldb/.../177866.sst
    2012 Sep 30 04:46:09  beam.smp  R 56   0 /var/db/riak/leveldb/.../177866.sst
    2012 Sep 30 04:46:09  beam.smp  R  0   0 /var/db/riak/leveldb/.../177866.sst
    2012 Sep 30 04:46:09  beam.smp  R  2   0 /var/db/riak/leveldb/.../177866.sst
    2012 Sep 30 04:46:09  beam.smp  R  5   0 /var/db/riak/leveldb/.../218473.sst
    2012 Sep 30 04:46:09  beam.smp  R  2   0 /var/db/riak/leveldb/.../210712.sst
    2012 Sep 30 04:46:09  beam.smp  R  3   0 /var/db/riak/leveldb/.../234194.sst
    2012 Sep 30 04:46:09  beam.smp  R  0   0 /var/db/riak/leveldb/.../233302.sst
    2012 Sep 30 04:46:09  beam.smp  R 41   0 /var/db/riak/leveldb/.../233302.sst
    2012 Sep 30 04:46:09  beam.smp  R  0   0 /var/db/riak/leveldb/.../233302.sst
    [...]
• 23. zfsslower.d, cont.
• Written in DTrace

    [...]
    fbt::zfs_read:entry,
    fbt::zfs_write:entry
    {
            self->path = args[0]->v_path;
            self->kb = args[1]->uio_resid / 1024;
            self->start = timestamp;
    }

    fbt::zfs_read:return,
    fbt::zfs_write:return
    /self->start && (timestamp - self->start) >= min_ns/
    {
            this->iotime = (timestamp - self->start) / 1000000;
            this->dir = probefunc == "zfs_read" ? "R" : "W";
            printf("%-20Y %-16s %1s %4d %6d %s\n", walltimestamp,
                execname, this->dir, self->kb, this->iotime,
                self->path != NULL ? stringof(self->path) : "<null>");
    }
    [...]

• zfsslower.d, also on github, originated from the DTrace book
• 24. zfsslower.d, cont.
• Traces the VFS/ZFS interface (kernel), from usr/src/uts/common/fs/zfs/zfs_vnops.c:

    /*
     * Regular file vnode operations template
     */
    vnodeops_t *zfs_fvnodeops;
    const fs_operation_def_t zfs_fvnodeops_template[] = {
            VOPNAME_OPEN,           { .vop_open = zfs_open },
            VOPNAME_CLOSE,          { .vop_close = zfs_close },
            VOPNAME_READ,           { .vop_read = zfs_read },
            VOPNAME_WRITE,          { .vop_write = zfs_write },
            VOPNAME_IOCTL,          { .vop_ioctl = zfs_ioctl },
            VOPNAME_GETATTR,        { .vop_getattr = zfs_getattr },
    [...]
• 25. iosnoop
• Traces block-device-level I/O:

    # ./iosnoop
      UID   PID D     BLOCK    SIZE       COMM PATHNAME
      103  5891 R 238517878  131072   beam.smp <none>
      103  5891 R 544800120  131072   beam.smp <none>
      103  5891 R 286317766  131072   beam.smp <none>
      103  5891 R  74515841  131072   beam.smp <none>
      103  5891 R  98341339  131072   beam.smp <none>
    [...]

• Has many options:

    # ./iosnoop -Dots
    STIME(us)    TIME(us)     DELTA(us) DTIME(us) UID  PID D     BLOCK   SIZE     COMM PATHNAME
    787809430864 787809435385      4520      4539 103 5891 R 128302372 131072 beam.smp <none>
    787809472876 787809478812      5935      5955 103 5891 R 143783200 131072 beam.smp <none>
    787809479681 787809483470      3788      3809 103 5891 R  84913822 131072 beam.smp <none>
    787809484451 787809490703      6251      6266 103 5891 R  14964144 131072 beam.smp <none>
    787809489555 787809497167      7611      6463 103 5891 R 283398320 131072 beam.smp <none>
    787809491122 787809498010      6888       842 103 5891 R 288468148 131072 beam.smp <none>
    [...]
• 26. iosnoop, cont.
• Written in DTrace

    [...]
    io:genunix::done
    /start_time[args[0]->b_edev, args[0]->b_blkno]/
    {
    [...]
            /* fetch entry values */
            this->dev = args[0]->b_edev;
            this->blk = args[0]->b_blkno;
            this->suid = start_uid[this->dev, this->blk];
            this->spid = start_pid[this->dev, this->blk];
            this->sppid = start_ppid[this->dev, this->blk];
            self->sargs = (int)start_args[this->dev, this->blk] == 0 ?
                "" : start_args[this->dev, this->blk];
            self->scomm = start_comm[this->dev, this->blk];
    [...]
            printf("%5d %5d %1s %8d %6d ",
                this->suid, this->spid,
                args[0]->b_flags & B_READ ? "R" : "W",
                args[0]->b_blkno, args[0]->b_bcount);
    [...]

• From the DTraceToolkit
• 27. iosnoop, cont.
• 370 lines of code, mostly to process options:

    USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]
                   [-m mount_point] [-n name] [-p PID]

           iosnoop              # default output
                    -a          # print all data (mostly)
                    -A          # dump all data, space delimited
                    -D          # print time delta, us (elapsed)
                    -e          # print device name
                    -g          # print command arguments
                    -i          # print device instance
                    -N          # print major and minor numbers
                    -o          # print disk delta time, us
                    -s          # print start time, us
                    -t          # print completion time, us
                    -v          # print completion time, string
                    -d device   # instance name to snoop
                    -f filename # snoop this file only
                    -m mount_point  # this FS only
                    -n name     # this process name only
                    -p PID      # this PID only
      eg,
           iosnoop -v           # human readable timestamps
           iosnoop -N           # print major and minor numbers
           iosnoop -m /         # snoop events on filesystem / only
• 28. iosnoop, cont.
• I originally wrote it in 2004 to solve disk-I/O-by-process
  • Shipped by default on Mac OS X (/usr/bin), Oracle Solaris 11
  • I wrote a companion called iotop. A similar iotop tool later appeared for Linux (via the blk tracing framework)
• Doesn’t do ZFS pathnames yet (known issue; hackathon?)
  • Pathnames from the block-device layer were always a party trick: it relied on a cached vnode->v_path
  • ZFS aggregates I/O. At the block-device layer, no one vnode is responsible (or vnode pointer)
• 29. iosnoop, cont.
• Locate the latency origin: [full-stack diagram] correlate zfsslower.d (at the VFS/ZFS interface) with iosnoop (at the block device interface) to determine whether the latency originated in ZFS or at the disks
• 30. iostacks.d
• Just a one-liner really (could make an “iostacks.d”):

    # dtrace -n 'io:::start { @[stack()] = count(); }'
    dtrace: description 'io:::start ' matched 6 probes
    ^C
                  genunix`ldi_strategy+0x53
                  zfs`vdev_disk_io_start+0xcc
                  zfs`zio_vdev_io_start+0xab
                  zfs`zio_execute+0x88
                  zfs`zio_nowait+0x21
                  zfs`vdev_mirror_io_start+0xcd
                  zfs`zio_vdev_io_start+0x250
                  zfs`zio_execute+0x88
                  zfs`zio_nowait+0x21
                  zfs`arc_read_nolock+0x4f9
                  zfs`arc_read+0x96
                  zfs`dsl_read+0x44
                  zfs`dbuf_read_impl+0x166
                  zfs`dbuf_read+0xab
                  zfs`dmu_buf_hold_array_by_dnode+0x189
                  zfs`dmu_buf_hold_array+0x78
                  zfs`dmu_read_uio+0x5c
                  zfs`zfs_read+0x1a3
                  genunix`fop_read+0x8b
                  genunix`read+0x2a7
                  143
• 31. iostacks.d, cont.
• Stack recognition chart:

    # dtrace -n 'io:::start { @[stack()] = count(); }'
    dtrace: description 'io:::start ' matched 6 probes
    ^C
                  genunix`ldi_strategy+0x53
                  zfs`vdev_disk_io_start+0xcc
                  zfs`zio_vdev_io_start+0xab
                  zfs`zio_execute+0x88
                  zfs`zio_nowait+0x21
                  zfs`vdev_mirror_io_start+0xcd
                  zfs`zio_vdev_io_start+0x250
                  zfs`zio_execute+0x88
                  zfs`zio_nowait+0x21
                  zfs`arc_read_nolock+0x4f9      <- arc-miss
                  zfs`arc_read+0x96
                  zfs`dsl_read+0x44
                  zfs`dbuf_read_impl+0x166
                  zfs`dbuf_read+0xab
                  zfs`dmu_buf_hold_array_by_dnode+0x189
                  zfs`dmu_buf_hold_array+0x78
                  zfs`dmu_read_uio+0x5c
                  zfs`zfs_read+0x1a3
                  genunix`fop_read+0x8b
                  genunix`read+0x2a7             <- syscall read()
                  143
• 32. iostacks.d, cont.
• Stack recognition chart:

    # dtrace -n 'io:::start { @[stack()] = count(); }'
    dtrace: description 'io:::start ' matched 6 probes
    ^C
                  genunix`ldi_strategy+0x53
                  zfs`vdev_disk_io_start+0xcc
                  zfs`zio_vdev_io_start+0xab
                  zfs`zio_execute+0x88
                  zfs`vdev_queue_io_done+0x70
                  zfs`zio_vdev_io_done+0x80
                  zfs`zio_execute+0x88
                  genunix`taskq_thread+0x2d0
                  unix`thread_start+0x8
                  395

  (slide annotation: ZIO pipeline)
• 33. iostacks.d, cont.
• From zio_create(), when parent == NULL (first zio):

    # dtrace -n 'zio_create:entry /arg0 == NULL/ { @[stack()] = count(); }'
    [...]
                  zfs`zio_null+0x77
                  zfs`zio_root+0x2d
                  zfs`dmu_buf_hold_array_by_dnode+0x113
                  zfs`dmu_buf_hold_array+0x78
                  zfs`dmu_write+0x80
                  zfs`space_map_sync+0x288
                  zfs`metaslab_sync+0x135        <- SPA sync
                  zfs`vdev_sync+0x7f
                  zfs`spa_sync+0x38b
                  zfs`txg_sync_thread+0x204
                  unix`thread_start+0x8
                  7
• 34. iostacks.d, cont.
• Can identify the reason for disk I/O
  • including unexplained additional I/O from ZFS
• When it doesn’t work (I/O issued from the ZIO pipeline task pool), try a similar one-liner on fbt::zio_create:entry, for the stack trace from the creation of all ZIOs
• 35. metaslab_free.d
• Traces ZFS metaslab details:

    # ./metaslab_free.d
    Tracing ZFS metaslab alloc.  metaslab_df_free_pct = 4 %

    2012 Sep 30 05:32:47 free %pct by allocations:
          13       12
          89       34
          81      287
          12      389

    2012 Sep 30 05:32:49 free %pct by allocations:
          89       15
          81      265
          12      378

• ZFS pools are split into metaslabs (eg, 10 GB each)
• Metaslabs have two allocation algorithms:
    metaslab %free >= metaslab_df_free_pct == first fit (fast)
    metaslab %free <  metaslab_df_free_pct == best fit (slow)
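The threshold logic above can be sketched in a few lines. This is a simplified illustration of the check in metaslab_df_alloc() (the function name and the 4% constant come from the slides; the Python itself is an assumption, and the real allocator also weighs free-space contiguity):

```python
METASLAB_DF_FREE_PCT = 4  # threshold printed by metaslab_free.d

def df_alloc_strategy(sm_space, sm_size):
    """Pick the allocator from the metaslab space map's free percentage
    (simplified sketch of metaslab_df_alloc()'s decision)."""
    free_pct = sm_space * 100 // sm_size
    if free_pct >= METASLAB_DF_FREE_PCT:
        return "first fit (fast)"
    return "best fit (slow)"

print(df_alloc_strategy(89, 100))  # 89% free -> first fit (fast)
print(df_alloc_strategy(2, 100))   # 2% free  -> best fit (slow)
```

This mirrors what the script reports: allocations from metaslabs at 81–89% free are first-fit, while those from a 12–13% free metaslab are approaching the slow path.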
• 36. metaslab_free.d, cont.
• Written in DTrace

    dtrace:::BEGIN
    {
            printf("Tracing ZFS metaslab alloc.  metaslab_df_free_pct = %d %%\n",
                `metaslab_df_free_pct);
    }

    fbt::metaslab_df_alloc:entry
    {
            this->pct = args[0]->sm_space * 100 / args[0]->sm_size;
            @[this->pct] = count();
    }

    profile:::tick-1s
    {
            printf("\n%Y free %%pct by allocations:", walltimestamp);
            printa(@);
            trunc(@);
    }

• metaslab_free.d is also on github
  • 37. metaslab_free.d, cont. • • • Shows if allocations are currently fast fit or best fit Correlate to performance changes Use while tuning • Can’t just try out a new setting and watch performance: perf can improve if ZFS switched to a new, emptier, metaslab at the same time. Script identifies if that occurred.
• 38. spasync.d
• Traces ZFS SPA syncs:

    # ./spasync.d
    Tracing ZFS spa_sync() slower than 1 ms...
    2012 Sep 30 05:56:07 zones     30 ms, 25 MB 464 I/O
    2012 Sep 30 05:56:12 zones     22 ms, 15 MB 366 I/O
    2012 Sep 30 05:56:17 zones     39 ms, 19 MB 361 I/O
    2012 Sep 30 05:56:22 zones    143 ms, 31 MB 536 I/O
    2012 Sep 30 05:56:27 zones     32 ms, 27 MB 465 I/O
    2012 Sep 30 05:56:32 zones     27 ms, 24 MB 468 I/O
    2012 Sep 30 05:56:37 zones     31 ms, 23 MB 489 I/O
    2012 Sep 30 05:56:42 zones    200 ms, 85 MB 956 I/O
    2012 Sep 30 05:56:47 zones    122 ms, 46 MB 593 I/O
    2012 Sep 30 05:56:52 zones     48 ms, 18 MB 394 I/O
    2012 Sep 30 05:56:57 zones     43 ms, 16 MB 409 I/O
    2012 Sep 30 05:57:02 zones     33 ms, 27 MB 526 I/O
    [...]

• Check for correlations with I/O latency
• 39. spasync.d, cont.
• Written in DTrace

    [...]
    fbt::spa_sync:entry
    /!self->start/
    {
            in_spa_sync = 1;
            self->start = timestamp;
            self->spa = args[0];
    }

    io:::start
    /in_spa_sync/
    {
            @io = count();
            @bytes = sum(args[0]->b_bcount);
    }

    fbt::spa_sync:return
    /self->start && (this->ms = (timestamp - self->start) / 1000000) > MIN_MS/
    {
            normalize(@bytes, 1048576);
            printf("%-20Y %-10s %6d ms, ", walltimestamp,
                stringof(self->spa->spa_name), this->ms);
            printa("%@d MB %@d I/O\n", @bytes, @io);
    }
    [...]

• spasync.d is also on github
• 40. spasync.d, cont.
• Simple and effective: helps diagnose TXG sync issues, providing information to help guide tuning
• Long heritage: Matt Ahrens → Roch Bourbonnais → Ben Rockwood → Brendan Gregg
• 41. %iowait
• I/O wait percent

    # mpstat 1
    CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
      0  483   0  427  2216  913 4415    6   79   45    0  9275    7  11   0  83
      1  625   0  153  4171 1748 2887    4   42   37    0  5909    7   8   0  85
      2  269   0  702  1720  636 2943    5   71   39    0  6422    5   8   0  87
      3  281   0  161  1433  529 2255    3   39   35    0  5269    5   5   0  90
      4  173   0  577  3361 1465 2565    3   50   38    0  4645    4   6   0  90
      5  172   0   98  1108  418 1640    2   27   30    0  3699    3   3   0  94
      6  139   0  678  2406  802 2509    4   60   34    0  5135    4   6   0  90
      7  140   0   91  1343  527 2196    2   36   31    0  4879    4   4   0  92
      8  124   0  527  1145  419 1676    2   43   29    0  3335    4   4   0  92
    [...]
• 42. %iowait
• I/O wait percent ... still zero in illumos! (note the wt column)

    # mpstat 1
    CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
      0  483   0  427  2216  913 4415    6   79   45    0  9275    7  11   0  83
      1  625   0  153  4171 1748 2887    4   42   37    0  5909    7   8   0  85
      2  269   0  702  1720  636 2943    5   71   39    0  6422    5   8   0  87
      3  281   0  161  1433  529 2255    3   39   35    0  5269    5   5   0  90
      4  173   0  577  3361 1465 2565    3   50   38    0  4645    4   6   0  90
      5  172   0   98  1108  418 1640    2   27   30    0  3699    3   3   0  94
      6  139   0  678  2406  802 2509    4   60   34    0  5135    4   6   0  90
      7  140   0   91  1343  527 2196    2   36   31    0  4879    4   4   0  92
      8  124   0  527  1145  419 1676    2   43   29    0  3335    4   4   0  92
    [...]
• 43. arcstat
• ZFS ARC and L2ARC statistics:

    $ ./arcstat 1
        time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
    04:45:47     0     0      0     0    0     0    0     0    0    14G   14G
    04:45:49   15K    10      0    10    0     0    0     1    0    14G   14G
    04:45:50   23K    81      0    81    0     0    0     1    0    14G   14G
    04:45:51   65K    25      0    25    0     0    0     4    0    14G   14G
    04:45:52   30K    11      0    11    0     0    0     3    0    14G   14G
    [...]

• Statistics are per-interval:
  • read, miss: total ARC accesses, misses
  • miss%, dm%, pm%, mm%: ARC miss percentage total, demand, prefetch, metadata
  • dmis, pmis, mmis: misses for demand, prefetch, metadata
  • arcsz, c: ARC size, ARC target size
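The per-interval statistics are simple delta arithmetic over two cumulative arcstats samples. A minimal sketch (the dict field names are illustrative, standing in for the hits/misses kstat counters):

```python
def arcstat_interval(prev, cur, interval=1):
    """Turn two cumulative arcstats samples into per-interval rates,
    the way arcstat prints one row per interval."""
    hits = (cur["hits"] - prev["hits"]) / interval
    miss = (cur["misses"] - prev["misses"]) / interval
    read = hits + miss
    misspct = 100.0 * miss / read if read > 0 else 0.0
    return {"read": read, "miss": miss, "miss%": misspct}

prev = {"hits": 1000000, "misses": 5000}
cur  = {"hits": 1015000, "misses": 5010}
print(arcstat_interval(prev, cur))  # read=15010, miss=10, miss% ~0.07
```

This is why a busy interval can show 15K reads but a near-zero miss%: the rates are computed over the interval, not since boot.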
• 44. arcstat, cont.
• Written in Perl, uses kstats (zfs::arcstats:):

    [...]
    sub snap_stats {
            my %prev = %cur;

            if ($kstat->update()) {
                    printf("<State Changed>\n");
            }
            my $hashref_cur = $kstat->{"zfs"}{0}{"arcstats"};
            %cur = %$hashref_cur;
            foreach my $key (keys %cur) {
                    next if $key =~ /class/;
                    if (defined $prev{$key}) {
                            $d{$key} = $cur{$key} - $prev{$key};
    [...]
    sub calculate {
            %v = ();
            $v{"time"} = strftime("%H:%M:%S", localtime);
            $v{"hits"} = $d{"hits"} / $int;
            $v{"miss"} = $d{"misses"} / $int;
            $v{"read"} = $v{"hits"} + $v{"miss"};
            $v{"hit%"} = 100 * $v{"hits"} / $v{"read"} if $v{"read"} > 0;
            $v{"miss%"} = 100 - $v{"hit%"} if $v{"read"} > 0;
    [...]

• github.com/mharsch/arcstat
• 45. arcstat, cont.
• Options to configure output, including L2ARC stats
• Originally by Neelakanth Nadgir, then Jason Hellenthal (FreeBSD port) and Mike Harsch
• Crucial data when analyzing cache performance (and easier to read than the raw form: kstat -pn arcstats)
• 46. arcaccess.d
• ARC population age:

    # ./arcaccess.d -n 'tick-10s { exit(0); }'
    lbolt rate is 100 Hertz.
    Tracing lbolts between ARC accesses...

             value  ------------- Distribution ------------- count
                -1 |                                         0
                 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 729988
                 1 |                                         3805      <- ~10 ms
                 2 |                                         3038
                 4 |                                         2028
                 8 |                                         1428
                16 |                                         1398
                32 |                                         1618
                64 |                                         2883     <- ~1 second
               128 |                                         738
               256 |                                         681
               512 |                                         338
              1024 |                                         569
              2048 |                                         166
              4096 |                                         607     <- ~1 minute
              8192 |                                         632
             16384 |                                         808
             32768 |                                         373
             65536 |                                         110
            131072 |                                         142
            262144 |                                         39      <- ~1 hour
            524288 |                                         5
           1048576 |                                         97
           2097152 |                                         10
           4194304 |                                         44
           8388608 |                                         617     <- ~1 day
          16777216 |                                         1
          33554432 |                                         0
• 47. arcaccess.d, cont.
• Written in DTrace

    #!/usr/sbin/dtrace -s

    #pragma D option quiet

    dtrace:::BEGIN
    {
            printf("lbolt rate is %d Hertz.\n", `hz);
            printf("Tracing lbolts between ARC accesses...");
    }

    fbt::arc_access:entry
    {
            self->ab = args[0];
            self->lbolt = args[0]->b_arc_access;
    }

    fbt::arc_access:return
    /self->lbolt/
    {
            @ = quantize(self->ab->b_arc_access - self->lbolt);
            self->ab = 0;
            self->lbolt = 0;
    }

• http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/
• 48. arcaccess.d, cont.
• Shows the population age of the ARC based on access time
  • Helps determine working set size and turnover rate
• The previous example showed the ARC was so large that it was able to keep buffers that had not been accessed in over a day
  • Turnover rate == low; working set size == entirely fits, and is likely much smaller
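The histogram's bucket values are lbolt deltas, which at the 100 Hz rate printed by the script convert to wall-clock age as follows (the conversion is simple; the sample deltas below are illustrative):

```python
HZ = 100  # lbolt tick rate reported by arcaccess.d

def lbolt_age_seconds(delta):
    """Convert an lbolt delta between ARC accesses to seconds."""
    return delta / HZ

for delta in (1, 100, 6000, 360000, 8640000):
    print(f"{delta:>8} lbolts = {lbolt_age_seconds(delta):g} s")
# 1 lbolt = 10 ms; 100 = 1 second; 6000 = 1 minute;
# 360000 = 1 hour; 8640000 = 1 day
```

This is how the age markers (10 ms ... 1 day) line up against the power-of-two quantize buckets in the output above.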
• 49. Latency counters
• kstat -p zone_vfs:::*_ops, showing latency outlier counts since boot:

    $ kstat -p zone_vfs:::*_ops
    zone_vfs:0:global:100ms_ops     13
    zone_vfs:0:global:10ms_ops      2220
    zone_vfs:0:global:10s_ops       0
    zone_vfs:0:global:1s_ops        0
    zone_vfs:1:5cedb79e:100ms_ops   173367
    zone_vfs:1:5cedb79e:10ms_ops    64071247
    zone_vfs:1:5cedb79e:10s_ops     0
    zone_vfs:1:5cedb79e:1s_ops      430

• reads/writes:

    $ kstat -p zone_vfs:::reads zone_vfs:::writes
    zone_vfs:0:global:reads                          666148884
    zone_vfs:1:5cedb79e-51c6-41b3-b99d-358b33:reads  1791879457
    zone_vfs:0:global:writes                         46861078
    zone_vfs:1:5cedb79e-51c6-41b3-b99d-358b33:writes 356075500
• 50. Latency counters
• kstat -p zone_vfs:::*_ops, showing latency outlier counts since boot:

    $ kstat -p zone_vfs:::*_ops
    zone_vfs:0:global:100ms_ops     13
    zone_vfs:0:global:10ms_ops      2220
    zone_vfs:0:global:10s_ops       0
    zone_vfs:0:global:1s_ops        0
    zone_vfs:1:5cedb79e:100ms_ops   173367
    zone_vfs:1:5cedb79e:10ms_ops    64071247
    zone_vfs:1:5cedb79e:10s_ops     0
    zone_vfs:1:5cedb79e:1s_ops      430       <- yikes, 430 ops > 1s!

• however, this is a small fraction; against the reads/writes:

    $ kstat -p zone_vfs:::reads zone_vfs:::writes
    zone_vfs:0:global:reads                          666148884
    zone_vfs:1:5cedb79e-51c6-41b3-b99d-358b33:reads  1791879457
    zone_vfs:0:global:writes                         46861078
    zone_vfs:1:5cedb79e-51c6-41b3-b99d-358b33:writes 356075500

  > 1s = 0.00002%
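Computing that fraction from the zone 1 counters shown above:

```python
# Counters for zone 1, from the kstat output above:
reads  = 1791879457
writes = 356075500
ops_1s = 430           # operations that took longer than 1 second

fraction = ops_1s / (reads + writes)
print(f"{100 * fraction:.5f}%")  # 0.00002%
```

So 430 slow operations sounds bad in isolation, but the counters show they represent roughly two in every ten million I/Os.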
• 51. Latency counters, cont.
• Added by Bill Pijewski of Joyent
• Proven invaluable, usually to quickly eliminate ZFS (and disk) as a source of issues, after the fact (when the counters are near-zero)
• Has identified real ZFS/disk-level issues as well
• 52. Scatter plots
• iosnoop -Dots > outfile + awk/perl + gnuplot/R
• [Scatter plot: x-axis = time (s), y-axis = latency; reads = red, writes = blue; a band of ~60 ms latency outliers is visible]
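A sketch of the awk/perl step: pull (completion time, latency, direction) out of each `iosnoop -Dots` line for plotting. The column positions are assumed from the -Dots header shown earlier (STIME, TIME, DELTA, DTIME, UID, PID, D, BLOCK, SIZE, COMM, PATHNAME):

```python
def parse_iosnoop_dots(line):
    """Return (completion time in seconds, latency in ms, 'R' or 'W')
    from one iosnoop -Dots output line (column layout assumed)."""
    f = line.split()
    time_s   = int(f[1]) / 1e6   # TIME(us) -> seconds, for the x axis
    delta_ms = int(f[2]) / 1e3   # DELTA(us) -> milliseconds, for the y axis
    return time_s, delta_ms, f[6]

line = ("787809430864 787809435385 4520 4539 103 5891 R "
        "128302372 131072 beam.smp <none>")
print(parse_iosnoop_dots(line))  # (787809.435385, 4.52, 'R')
```

Feeding the resulting tuples to gnuplot or R, colored by the R/W field, reproduces the red/blue scatter plot described above.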
• 53. Scatter plots, cont.
• Has been an effective last resort to investigate nasty PERC issues: reads queueing behind writes
• Has a big data problem: many points, x-y coordinates; takes time and space to capture and render
• Heat maps solve this, and can be real-time
• 55. Heat maps
• WHAT DOES IT MEAN? [Latency heat map: an upper band of disk I/O and a lower band of DRAM I/O; over time the workload becomes cached and the disk band fades]
• 56. Heat maps
• WHAT DOES IT MEAN? [Latency heat map showing a bi-modal distribution: two distinct latency bands]
  • 57. Heat maps • • File system latency is commonly bi-modal: cache hits vs misses Average latency, stddev, 99th percentile, don’t make a lot of sense • • Given line graphs of these, we would not be able to solve many problems Need to see the full distribution: heat maps • Also good at identifying outliers
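A heat map is just a 2-D frequency count: bucket each I/O by time column and (typically log-spaced) latency row, then color cells by count. A minimal sketch, with illustrative bucket bounds (not the bounds any particular tool uses):

```python
import bisect
from collections import Counter

def heatmap_counts(events, t_bin_s=1.0, lat_bounds_ms=(0.1, 1, 10, 100, 1000)):
    """events: (time_s, latency_ms) pairs -> Counter over (x, y) cells,
    where x is the time bucket and y the latency band index."""
    counts = Counter()
    for t, lat in events:
        x = int(t // t_bin_s)                     # time column
        y = bisect.bisect_left(lat_bounds_ms, lat)  # latency row
        counts[(x, y)] += 1
    return counts

# A bi-modal workload: DRAM hits (~0.05 ms) and disk misses (~5 ms)
events = [(0.2, 0.05), (0.7, 0.06), (1.1, 5.0), (1.9, 6.0)]
print(heatmap_counts(events))  # two cells: (0, 0) -> 2 and (1, 2) -> 2
```

Unlike a scatter plot, the data reduces to one small counter matrix per interval regardless of I/O rate, which is what makes real-time, datacenter-wide heat maps practical. The cache-hit and cache-miss modes land in separate rows, making the bi-modal distribution visible at a glance.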
  • 58. Heat maps, cont. • Latency outliers, cloud wide (entire datacenter):
• 59. Heat maps, cont.
• Latency outliers, cloud wide (entire datacenter): [heat map annotated from bottom to top: Good, Bad, Very Bad, Inconceivable]
• 60. Cloud computing: within the zone
• Suspected “noisy neighbors” hurting ZFS performance
  • ZFS I/O throttling was added to address this
• Need ZFS observability to confirm/deny issues from within a zone (eg, a Joyent SmartMachine); however, kernel access (including the DTrace io and fbt providers) is not available
  • vfsstat (already covered) does work, as it is zone-aware
• Cloud-safe DTrace tools include:
  • mysql_pid_fslatency.d: tracing FS calls from within the app
  • syscall tracing when fds[].fi_fs == “zfs”
• Scripts are in:
  http://dtrace.org/blogs/brendan/2011/05/18/file-system-latency-part-3/
  http://dtrace.org/blogs/brendan/2011/05/24/file-system-latency-part-4/
• 61. Recap
• [Full-stack diagram, as before: mysql_pid_fslatency.d and syscall tracing with fi_fs == "zfs" at the syscall interface; vfsstat at VFS; zioslower.d, spasync.d, metaslab_free.d, arcstat.pl, arcaccess.d for ZFS; iostacks.d; kernel drivers as needed (see DTrace book chap 4); iostat and iosnoop at the block device interface and disks]
• 62. There’s more
• There are more tools:
  • zpool iostat
  • zpool status: degraded? resilvering?
  • zfs get all: recsize match workload? compression? dedup?
  • ziostat
• 63. There’s more, cont.
• There are more tools
• And much, much more: DTrace
• 66. DTrace book: Chapter 4 (Disks) covers disk I/O subsystem scripts; Chapter 5 (File Systems) includes ZFS scripts
  • 67. Future • More tools • as needed, working on customer issues
• 68. Future
• More tools
  • as needed, working on customer issues
  • But not too many... [image: Wenger Giant knife]
  • 69. Thank you! • • • email: brendan@joyent.com twitter: @brendangregg Resources: • http://dtrace.org/blogs/brendan • http://dtrace.org/blogs/brendan/2012/01/09/activity-of-the-zfs-arc/ • http://dtrace.org/blogs/brendan/2011/06/03/file-system-latency-part-5/ • https://github.com/brendangregg/dtrace-cloud-tools/tree/master/fs • http://www.dtracebook.com • More on latency heat maps: http://queue.acm.org/detail.cfm?id=1809426