ATO Linux Performance 2018

Linux Performance 2018
Brendan Gregg, Senior Performance Architect at Netflix
Oct 2018
http://neuling.org/linux-next-size.html
https://kernelnewbies.org/Linux_4.18
https://lwn.net/Kernel/
Post frequency:
4 per year
4 per week
http://vger.kernel.org/vger-lists.html
#linux-kernel
LKML: 400 per day
https://meltdownattack.com/
Cloud Hypervisor (patches)
Linux Kernel (KPTI)
CPU (microcode)
Application (retpoline)
KPTI: Linux 4.15 & backports
Server A: 31353 MySQL queries/sec
Server B: 22795 queries/sec (27% slower)
serverA# mpstat 1
Linux 4.14.12-virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU)
01:09:13 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:09:14 AM all 86.89 0.00 13.08 0.00 0.00 0.00 0.00 0.00 0.00 0.03
01:09:15 AM all 86.77 0.00 13.23 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:09:16 AM all 86.93 0.00 13.02 0.00 0.00 0.00 0.03 0.00 0.00 0.03
[...]
serverB# mpstat 1
Linux 4.14.12-virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU)
01:09:44 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:09:45 AM all 82.94 0.00 17.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:09:46 AM all 82.78 0.00 17.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:09:47 AM all 83.14 0.00 16.86 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[...]
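The 27% figure follows from the two throughput numbers quoted above. A minimal sketch of the arithmetic (the function name is illustrative and not from the talk):

```python
def percent_slower(baseline_qps, observed_qps):
    """Return how much slower `observed_qps` is, as a percentage of the baseline."""
    return (baseline_qps - observed_qps) / baseline_qps * 100.0


server_a = 31353  # MySQL queries/sec on Server A
server_b = 22795  # queries/sec on Server B
print(f"Server B is {percent_slower(server_a, server_b):.1f}% slower")
```

The same mpstat samples show %sys rising from about 13% to about 17% while %usr falls, which is consistent with more time being spent in the kernel on Server B.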
[Diagram: CPU, MMU, TLB, Main Memory, Page Table; virtual address to physical address translation; TLB hit, or miss (page table walk)]
Linux KPTI patches for Meltdown flush the Translation Lookaside Buffer
Server A: TLB miss walks 3.5%
Server B: TLB miss walks 19.2% (about 16 percentage points higher)
serverA# ./tlbstat 1
K_CYCLES K_INSTR IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC K_ITLBCYC DTLB% ITLB%
95913667 99982399 1.04 86588626 115441706 1507279 1837217 1.57 1.92
95810170 99951362 1.04 86281319 115306404 1507472 1842313 1.57 1.92
95844079 100066236 1.04 86564448 115555259 1511158 1845661 1.58 1.93
95978588 100029077 1.04 86187531 115292395 1508524 1845525 1.57 1.92
[...]
serverB# ./tlbstat 1
K_CYCLES K_INSTR IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC K_ITLBCYC DTLB% ITLB%
95911236 80317867 0.84 911337888 719553692 10476524 7858141 10.92 8.19
95927861 80503355 0.84 913726197 721751988 10518488 7918261 10.96 8.25
95955825 80533254 0.84 912994135 721492911 10524675 7929216 10.97 8.26
96067221 80443770 0.84 912009660 720027006 10501926 7911546 10.93 8.24
[...]
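The derived columns in the tlbstat output appear to be simple ratios of the raw counters: IPC as K_INSTR / K_CYCLES, and DTLB% / ITLB% as the walk cycles divided by total cycles. That relationship is inferred from the numbers shown here, not taken from the tlbstat source, so treat this as a sketch that only re-derives the printed values:

```python
def derive(k_cycles, k_instr, k_dtlbcyc, k_itlbcyc):
    """Recompute IPC and the TLB-walk cycle percentages from raw counters."""
    ipc = k_instr / k_cycles
    dtlb_pct = 100.0 * k_dtlbcyc / k_cycles
    itlb_pct = 100.0 * k_itlbcyc / k_cycles
    return round(ipc, 2), round(dtlb_pct, 2), round(itlb_pct, 2)


# First sample row from each server in the output above.
server_a = derive(95913667, 99982399, 1507279, 1837217)
server_b = derive(95911236, 80317867, 10476524, 7858141)

for name, (ipc, dtlb, itlb) in (("A", server_a), ("B", server_b)):
    print(f"Server {name}: IPC {ipc}, DTLB% {dtlb}, ITLB% {itlb}, "
          f"total walk cycles {dtlb + itlb:.2f}%")
```

Adding DTLB% and ITLB% per row gives the combined "TLB miss walks" percentage quoted for each server.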
http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html
Enhanced BPF
also known as just "BPF"
Linux 4.*
User-Defined BPF Programs: SDN Configuration, DDoS Mitigation, Intrusion Detection, Container Security, Observability, Firewalls (bpfilter), Device Drivers, …
Kernel Runtime: verifier, BPF, BPF actions
Event Targets: sockets, kprobes, uprobes, tracepoints, perf_events
eBPF is solving new things: off-CPU + wakeup analysis
eBPF bcc Linux 4.4+
https://github.com/iovisor/bcc
e.g., identify multimodal disk I/O latency and outliers
with bcc/eBPF biolatency
# biolatency -mT 10
Tracing block device I/O... Hit Ctrl-C to end.
19:19:04
msecs : count distribution
0 -> 1 : 238 |********* |
2 -> 3 : 424 |***************** |
4 -> 7 : 834 |********************************* |
8 -> 15 : 506 |******************** |
16 -> 31 : 986 |****************************************|
32 -> 63 : 97 |*** |
64 -> 127 : 7 | |
128 -> 255 : 27 |* |
19:19:14
msecs : count distribution
0 -> 1 : 427 |******************* |
2 -> 3 : 424 |****************** |
[…]
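The biolatency output above is a power-of-two histogram: each row covers a range such as 4 -> 7 or 16 -> 31. A minimal pure-Python sketch of that bucketing follows. The helper names and sample latencies are made up for illustration, and this is not bcc's in-kernel implementation:

```python
from collections import Counter


def log2_bucket(value):
    """Return the (low, high) power-of-two range that contains `value`."""
    if value <= 1:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)  # largest power of two <= value
    return (low, 2 * low - 1)


def histogram(latencies_ms):
    """Count how many latencies fall into each power-of-two bucket."""
    return Counter(log2_bucket(v) for v in latencies_ms)


samples = [0, 1, 2, 3, 5, 9, 20, 25, 130]  # hypothetical latencies in msecs
for (low, high), count in sorted(histogram(samples).items()):
    print(f"{low:>6} -> {high:<6} : {count}")
```

Because bucket width doubles each row, a distribution with two separate peaks (as in the 4 -> 7 and 16 -> 31 rows above) and a distant outlier group (128 -> 255) stays visible in a few lines of text.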
bcc/eBPF programs are laborious: biolatency
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

typedef struct disk_key {
    char disk[DISK_NAME_LEN];
    u64 slot;
} disk_key_t;
BPF_HASH(start, struct request *);
STORAGE

// time block I/O
int trace_req_start(struct pt_regs *ctx, struct request *req)
{
    u64 ts = bpf_ktime_get_ns();
    start.update(&req, &ts);
    return 0;
}

// output
int trace_req_completion(struct pt_regs *ctx, struct request *req)
{
    u64 *tsp, delta;

    // fetch timestamp and calculate delta
    tsp = start.lookup(&req);
    if (tsp == 0) {
        return 0;   // missed issue
    }
    delta = bpf_ktime_get_ns() - *tsp;
    FACTOR

    // store as histogram
    STORE

    start.delete(&req);
    return 0;
}
"""

# code substitutions
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000000;')
    label = "msecs"
else:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000;')
    label = "usecs"
if args.disks:
    bpf_text = bpf_text.replace('STORAGE',
        'BPF_HISTOGRAM(dist, disk_key_t);')
    bpf_text = bpf_text.replace('STORE',
        'disk_key_t key = {.slot = bpf_log2l(delta)}; ' +
        'void *__tmp = (void *)req->rq_disk->disk_name; ' +
        'bpf_probe_read(&key.disk, sizeof(key.disk), __tmp); ' +
        'dist.increment(key);')
else:
    bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
    bpf_text = bpf_text.replace('STORE',
        'dist.increment(bpf_log2l(delta));')
if debug or args.ebpf:
    print(bpf_text)
    if args.ebpf:
        exit()

# load BPF program
b = BPF(text=bpf_text)
if args.queued:
    b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
else:
    b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
    b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
b.attach_kprobe(event="blk_account_io_completion",
    fn_name="trace_req_completion")

print("Tracing block device I/O... Hit Ctrl-C to end.")

# output
exiting = 0 if args.interval else 1
dist = b.get_table("dist")
while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("%-8s\n" % strftime("%H:%M:%S"), end="")

    dist.print_log2_hist(label, "disk")
    dist.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()
… rewritten in bpftrace (launched Oct 2018)!
#!/usr/local/bin/bpftrace

BEGIN
{
    printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}

kprobe:blk_account_io_start
{
    @start[arg0] = nsecs;
}

kprobe:blk_account_io_completion
/@start[arg0]/
{
    @usecs = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}
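Both the bcc and bpftrace versions rely on the same pattern: record a timestamp when a request is issued, keyed by the request, then compute the delta and delete the entry on completion. A pure-Python model of that logic is sketched below. The request IDs and timestamps are invented, and a dict stands in for the BPF map, so this shows only the control flow and not how BPF maps work:

```python
start = {}         # plays the role of the @start[arg0] map
latencies_us = []  # stands in for the @usecs histogram


def on_io_start(req_id, now_ns):
    """Record the issue time for a request."""
    start[req_id] = now_ns


def on_io_completion(req_id, now_ns):
    """Compute latency in microseconds, then drop the map entry."""
    issued = start.pop(req_id, None)
    if issued is None:
        return  # completion without a recorded start: skip it
    latencies_us.append((now_ns - issued) // 1000)


on_io_start("req1", 1_000_000)
on_io_start("req2", 1_200_000)
on_io_completion("req1", 1_450_000)
on_io_completion("req3", 1_500_000)  # never seen at start, so ignored
on_io_completion("req2", 2_200_000)
print(latencies_us)
```

The `None` check corresponds to the `/@start[arg0]/` filter in the bpftrace script and to the `tsp == 0` test in the bcc version: completions for requests issued before tracing began are skipped rather than producing bogus latencies.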
eBPF bpftrace (aka BPFtrace) Linux 4.9+
https://github.com/iovisor/bpftrace
# Syscall count by program
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Read size distribution by process
bpftrace -e 'tracepoint:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'
# Files opened by process
bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
# Trace kernel function
bpftrace -e 'kprobe:do_nanosleep { printf("sleep by %s", comm); }'
# Trace user-level function
bpftrace -e 'uretprobe:/bin/bash:readline { printf("%s\n", str(retval)); }'
…
Good for one-liners & short scripts; bcc is good for complex tools
bpftrace Internals
eBPF XDP
https://www.netronome.com/blog/frnog-30-faster-networking-la-francaise/
Linux 4.8+
eBPF bpfilter
https://lwn.net/Articles/747551/
Linux 4.18+
ipfwadm (1.2.1)
ipchains (2.2.10)
iptables
nftables (3.13)
bpfilter (4.18+)
jit-compiled
NIC offloading
BBR
TCP congestion control algorithm
Bottleneck Bandwidth and RTT
1% packet loss: we see 3x better throughput
Linux 4.9
https://twitter.com/amernetflix/status/892787364598132736
https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/
https://queue.acm.org/detail.cfm?id=3022184
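On kernels that include BBR, the congestion control algorithm can be chosen per socket with the TCP_CONGESTION socket option (it can also be set system-wide through the net.ipv4.tcp_congestion_control sysctl). The sketch below is not from the talk. It assumes a Linux host and falls back to whatever algorithm the kernel already uses if BBR is unavailable or not permitted. The function name is illustrative:

```python
import socket


def try_bbr():
    """Request BBR on one TCP socket and return the algorithm in effect.

    Returns None where TCP_CONGESTION is unavailable (non-Linux platforms)
    or a socket cannot be created.
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.setsockopt(socket.IPPROTO_TCP, opt, b"bbr")
            except OSError:
                pass  # BBR not built in, module not loaded, or not permitted
            raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name such as b"bbr\0..." or b"cubic\0...".
    return raw.split(b"\0", 1)[0].decode()


print(try_bbr())
```

Reading the option back after setting it is a cheap way to confirm which algorithm a connection will actually use.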
Kyber
Multiqueue block I/O scheduler
Tune target read & write latency
Up to 300x lower 99th percentile latencies in our testing
Linux 4.12
[Diagram: Kyber (simplified): reads (sync) -> dispatch; writes (async) -> dispatch; completions -> queue size adjust]
https://lwn.net/Articles/720675/
Hist Triggers
Linux 4.17
https://www.kernel.org/doc/html/latest/trace/histogram.html
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
[…]
{ stacktrace:
         __kmalloc+0x11b/0x1b0
         seq_buf_alloc+0x1b/0x50
         seq_read+0x2cc/0x370
         proc_reg_read+0x3d/0x80
         __vfs_read+0x28/0xe0
         vfs_read+0x86/0x140
         SyS_read+0x46/0xb0
         system_call_fastpath+0x12/0x6a
} hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768
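The histogram above is produced by writing a trigger string to the event's `trigger` file, which sits next to the `hist` file being read. A hedged sketch of building that string follows. The helper names are hypothetical, and the write step is left commented out because it needs root and a mounted debugfs:

```python
from pathlib import Path

# Directory of the kmalloc event, as used in the `cat` command above.
EVENT_DIR = Path("/sys/kernel/debug/tracing/events/kmem/kmalloc")


def hist_trigger(keys, vals, sort):
    """Build a hist trigger string in the same form as the trigger info above."""
    return f"hist:keys={','.join(keys)}:vals={','.join(vals)}:sort={sort}"


def install(trigger, event_dir=EVENT_DIR):
    """Write the trigger to the event's `trigger` file (needs root and debugfs)."""
    (event_dir / "trigger").write_text(trigger + "\n")


trigger = hist_trigger(["stacktrace"], ["bytes_req", "bytes_alloc"], "bytes_alloc")
print(trigger)
# install(trigger)  # uncomment on a test system with root privileges
```

The `size=2048` field in the trigger info is not set here, so this sketch relies on the kernel reporting a default table size.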
ftrace advanced summaries
PSI
Pressure Stall Information
More saturation metrics!
Linux 4.?
not merged yet
https://lwn.net/Articles/759781/
The USE Method: for each resource, check Utilization (%), Saturation, and Errors
/proc/pressure/cpu
/proc/pressure/memory
/proc/pressure/io
10-, 60-, and 300-second averages
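PSI was not merged at the time of this talk, so the file format used below is an assumption based on the interface as it appears in later kernels: one line per stall type (`some`, and for memory and io also `full`) with `avg10`, `avg60`, `avg300`, and `total` fields. A minimal parsing sketch, using a made-up sample instead of a live /proc file:

```python
def parse_pressure(text):
    """Parse PSI text into {"some": {"avg10": 0.0, ...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result


# Hypothetical contents of /proc/pressure/memory.
sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.05 total=65432\n"
)
stats = parse_pressure(sample)
print(stats["some"]["avg10"], stats["full"]["avg300"])
```

On a kernel with PSI enabled, the same function could be fed the contents of /proc/pressure/cpu, /proc/pressure/memory, or /proc/pressure/io to track the three averaging windows as saturation metrics.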
More perf 4.4 - 4.19 (2016 - 2018)
• TCP listener lockless (4.4)
• copy_file_range() (4.5)
• madvise() MADV_FREE (4.5)
• epoll multithread scalability (4.5)
• Kernel Connection Multiplexor (4.6)
• Writeback management (4.10)
• Hybrid block polling (4.10)
• BFQ I/O scheduler (4.12)
• Async I/O improvements (4.13)
• In-kernel TLS acceleration (4.13)
• Socket MSG_ZEROCOPY (4.14)
• Asynchronous buffered I/O (4.14)
• Longer-lived TLB entries with PCID (4.14)
• mmap MAP_SYNC (4.15)
• Software-interrupt context hrtimers (4.16)
• Idle loop tick efficiency (4.17)
• perf_event_open() [ku]probes (4.17)
• AF_XDP sockets (4.18)
• Block I/O latency controller (4.19)
• CAKE for bufferbloat (4.19)
• New async I/O polling (4.19)
… and many minor improvements to:
• perf
• CPU scheduling
• futexes
• NUMA
• Huge pages
• Slab allocation
• TCP, UDP
• Drivers
• Processor support
• GPUs
Take Aways
1. Run latest
2. Browse major features
e.g., https://kernelnewbies.org/Linux_4.19
Some Linux perf Resources
- http://www.brendangregg.com/linuxperf.html
- https://kernelnewbies.org/LinuxChanges
- https://lwn.net/Kernel
- https://github.com/iovisor/bcc
- http://blog.stgolabs.net/search/label/linux
- http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html
ATO Linux Performance 2018

  • 1. Linux Performance 2018 Brendan Gregg Senior Performance Architect Oct 2018
  • 3. https://kernelnewbies.org/Linux_4.18 https://lwn.net/Kernel/ Post frequency: 4 per year 4 per week http://vger.kernel.org/vger-lists.html #linux-kernel LKML400 per day
  • 5. Cloud Hypervisor (patches) Cloud Hypervisor (patches) Linux Kernel (KPTI) Linux Kernel (KPTI) CPU (microcode) CPU (microcode) Application (retpolne) Application (retpolne) KPTI Linux 4.15 & backports
  • 6. Server A: 31353 MySQL queries/sec Server B: 22795 queries/sec (27% slower) serverA# mpstat 1 Linux 4.14.12-virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU) 01:09:13 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 01:09:14 AM all 86.89 0.00 13.08 0.00 0.00 0.00 0.00 0.00 0.00 0.03 01:09:15 AM all 86.77 0.00 13.23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 01:09:16 AM all 86.93 0.00 13.02 0.00 0.00 0.00 0.03 0.00 0.00 0.03 [...] serverB# mpstat 1 Linux 4.14.12-virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU) 01:09:44 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 01:09:45 AM all 82.94 0.00 17.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 01:09:46 AM all 82.78 0.00 17.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 01:09:47 AM all 83.14 0.00 16.86 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [...]
  • 7. CPUCPU MMUMMU Main Memory Main Memory TLBTLB Virtual Address Physical Address hit miss (walk) Page Table Page Table Linux KPTI patches for Meltdown flush the Translation Lookaside Buffer
  • 8. Server A: TLB miss walks 3.5% Server B: TLB miss walks 19.2% (16% higher) serverA# ./tlbstat 1 K_CYCLES K_INSTR IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC K_ITLBCYC DTLB% ITLB% 95913667 99982399 1.04 86588626 115441706 1507279 1837217 1.57 1.92 95810170 99951362 1.04 86281319 115306404 1507472 1842313 1.57 1.92 95844079 100066236 1.04 86564448 115555259 1511158 1845661 1.58 1.93 95978588 100029077 1.04 86187531 115292395 1508524 1845525 1.57 1.92 [...] serverB# ./tlbstat 1 K_CYCLES K_INSTR IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC K_ITLBCYC DTLB% ITLB% 95911236 80317867 0.84 911337888 719553692 10476524 7858141 10.92 8.19 95927861 80503355 0.84 913726197 721751988 10518488 7918261 10.96 8.25 95955825 80533254 0.84 912994135 721492911 10524675 7929216 10.97 8.26 96067221 80443770 0.84 912009660 720027006 10501926 7911546 10.93 8.24 [...]
  • 10. Enhanced BPF Kernel kprobeskprobes uprobesuprobes tracepointstracepoints socketssockets SDN ConfigurationSDN Configuration User-Defined BPF Programs … Event TargetsRuntime also known as just "BPF" Linux 4.* perf_eventsperf_events BPF actions BPF actions BPFBPF verifierverifier DDoS MitigationDDoS Mitigation Intrusion DetectionIntrusion Detection Container SecurityContainer Security ObservabilityObservability Firewalls (bpfilter)Firewalls (bpfilter) Device DriversDevice Drivers
  • 11. eBPF is solving new things: off-CPU + wakeup analysis
  • 12. eBPF bcc Linux 4.4+ https://github.com/iovisor/bcc
  • 13. e.g., identify multimodal disk I/O latency and outliers with bcc/eBPF biolatency # biolatency -mT 10 Tracing block device I/O... Hit Ctrl-C to end. 19:19:04 msecs : count distribution 0 -> 1 : 238 |********* | 2 -> 3 : 424 |***************** | 4 -> 7 : 834 |********************************* | 8 -> 15 : 506 |******************** | 16 -> 31 : 986 |****************************************| 32 -> 63 : 97 |*** | 64 -> 127 : 7 | | 128 -> 255 : 27 |* | 19:19:14 msecs : count distribution 0 -> 1 : 427 |******************* | 2 -> 3 : 424 |****************** | […]
  • 14. bcc/eBPF programs are laborious: biolatency
    # define BPF program
    bpf_text = """
    #include <uapi/linux/ptrace.h>
    #include <linux/blkdev.h>

    typedef struct disk_key {
        char disk[DISK_NAME_LEN];
        u64 slot;
    } disk_key_t;
    BPF_HASH(start, struct request *);
    STORAGE

    // time block I/O
    int trace_req_start(struct pt_regs *ctx, struct request *req)
    {
        u64 ts = bpf_ktime_get_ns();
        start.update(&req, &ts);
        return 0;
    }

    // output
    int trace_req_completion(struct pt_regs *ctx, struct request *req)
    {
        u64 *tsp, delta;

        // fetch timestamp and calculate delta
        tsp = start.lookup(&req);
        if (tsp == 0) {
            return 0;   // missed issue
        }
        delta = bpf_ktime_get_ns() - *tsp;
        FACTOR

        // store as histogram
        STORE

        start.delete(&req);
        return 0;
    }
    """

    # code substitutions
    if args.milliseconds:
        bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000000;')
        label = "msecs"
    else:
        bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000;')
        label = "usecs"
    if args.disks:
        bpf_text = bpf_text.replace('STORAGE',
            'BPF_HISTOGRAM(dist, disk_key_t);')
        bpf_text = bpf_text.replace('STORE',
            'disk_key_t key = {.slot = bpf_log2l(delta)}; ' +
            'void *__tmp = (void *)req->rq_disk->disk_name; ' +
            'bpf_probe_read(&key.disk, sizeof(key.disk), __tmp); ' +
            'dist.increment(key);')
    else:
        bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
        bpf_text = bpf_text.replace('STORE',
            'dist.increment(bpf_log2l(delta));')
    if debug or args.ebpf:
        print(bpf_text)
        if args.ebpf:
            exit()

    # load BPF program
    b = BPF(text=bpf_text)
    if args.queued:
        b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
    else:
        b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
        b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
    b.attach_kprobe(event="blk_account_io_completion",
        fn_name="trace_req_completion")

    print("Tracing block device I/O... Hit Ctrl-C to end.")

    # output
    exiting = 0 if args.interval else 1
    dist = b.get_table("dist")
    while (1):
        try:
            sleep(int(args.interval))
        except KeyboardInterrupt:
            exiting = 1

        print()
        if args.timestamp:
            print("%-8s\n" % strftime("%H:%M:%S"), end="")

        dist.print_log2_hist(label, "disk")
        dist.clear()

        countdown -= 1
        if exiting or countdown == 0:
            exit()
  • 15. … rewritten in bpftrace (launched Oct 2018)!
    #!/usr/local/bin/bpftrace

    BEGIN
    {
        printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
    }

    kprobe:blk_account_io_start
    {
        @start[arg0] = nsecs;
    }

    kprobe:blk_account_io_completion
    /@start[arg0]/
    {
        @usecs = hist((nsecs - @start[arg0]) / 1000);
        delete(@start[arg0]);
    }
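Both versions of biolatency follow the same pattern: save a timestamp keyed by the request when I/O is issued, then look it up on completion, compute the delta, and delete the entry. A minimal user-space sketch of that pattern, with a plain dict standing in for the BPF map, is below. The class name, method names, and the integer request IDs are hypothetical and only mimic the probe logic; nothing here touches the kernel.

```python
class LatencyTracer:
    """Pairs issue and completion events by request ID, like the @start map."""

    def __init__(self):
        self.start = {}       # request ID -> issue timestamp (ns)
        self.deltas_us = []   # completed I/O latencies (us)

    def on_issue(self, req_id, now_ns):
        """Equivalent of the start probe: remember when the request was issued."""
        self.start[req_id] = now_ns

    def on_completion(self, req_id, now_ns):
        """Equivalent of the completion probe: compute latency, then clean up."""
        issued_ns = self.start.pop(req_id, None)
        if issued_ns is None:
            return None  # missed issue: tracing began after this I/O started
        delta_us = (now_ns - issued_ns) // 1000
        self.deltas_us.append(delta_us)
        return delta_us


if __name__ == "__main__":
    tracer = LatencyTracer()
    tracer.on_issue(1, 1_000)
    tracer.on_issue(2, 2_000)
    print(tracer.on_completion(1, 5_001_000))
    print(tracer.on_completion(2, 2_500_000))
    print(tracer.on_completion(3, 9_000_000))  # never issued while tracing
```

The "missed issue" branch matters in practice: I/O already in flight when tracing starts completes without a saved timestamp, and must be skipped rather than counted.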
  • 16. eBPF bpftrace (aka BPFtrace) Linux 4.9+ https://github.com/iovisor/bpftrace
    # Syscall count by program
    bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    # Read size distribution by process
    bpftrace -e 'tracepoint:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'
    # Files opened by process
    bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
    # Trace kernel function
    bpftrace -e 'kprobe:do_nanosleep { printf("sleep by %s", comm); }'
    # Trace user-level function
    bpftrace -e 'uretprobe:/bin/bash:readline { printf("%s\n", str(retval)); }'
    …
    Good for one-liners & short scripts; bcc is good for complex tools
  • 19. eBPF bpfilter https://lwn.net/Articles/747551/ Linux 4.18+ ipfwadm (1.2.1) → ipchains (2.2.10) → iptables → nftables (3.13) → bpfilter (4.18+): jit-compiled, NIC offloading
  • 20. BBR TCP congestion control algorithm Bottleneck Bandwidth and RTT 1% packet loss: we see 3x better throughput Linux 4.9 https://twitter.com/amernetflix/status/892787364598132736 https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/ https://queue.acm.org/detail.cfm?id=3022184
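On Linux, the congestion control algorithm can be chosen per socket with the TCP_CONGESTION socket option, which Python exposes as socket.TCP_CONGESTION. The sketch below assumes a kernel that has the BBR module available; the helper name is hypothetical, and when BBR cannot be selected it simply reports whatever algorithm the socket ends up with.

```python
import socket


def try_set_congestion_control(sock, name="bbr"):
    """Request a congestion control algorithm; return the one actually in effect.

    Returns None where the TCP_CONGESTION option is unavailable (non-Linux).
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode())
    except OSError:
        pass  # algorithm not available or not permitted; keep the default
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name, e.g. b"cubic\x00\x00..."
    return raw.split(b"\x00", 1)[0].decode()


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(try_set_congestion_control(s))
    s.close()
```

The system-wide default is a sysctl (net.ipv4.tcp_congestion_control); the per-socket option is convenient for trying BBR on one service without changing everything else on the host.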
  • 21. Kyber Multiqueue block I/O scheduler Tune target read & write latency Up to 300x lower 99th percentile latencies in our testing Linux 4.12 [Figure: Kyber (simplified): reads (sync) and writes (async) each have a dispatch queue; completions drive queue size adjustment] https://lwn.net/Articles/720675/
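The scheduler for a block device is selected through sysfs, and Kyber's latency targets are exposed as read_lat_nsec and write_lat_nsec under the device's iosched directory. The sketch below only parses and builds paths so it can run anywhere; the helper names and the device name nvme0n1 are hypothetical, and actually switching schedulers or writing targets requires root on a multiqueue device.

```python
def parse_schedulers(text):
    """Parse a queue/scheduler file such as 'mq-deadline [kyber] bfq none'.

    Returns (active, available); the active scheduler is the bracketed one.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def kyber_paths(dev):
    """sysfs paths for scheduler selection and Kyber's latency targets."""
    base = "/sys/block/%s/queue" % dev
    return {
        "scheduler": base + "/scheduler",
        "read_lat_nsec": base + "/iosched/read_lat_nsec",
        "write_lat_nsec": base + "/iosched/write_lat_nsec",
    }


if __name__ == "__main__":
    print(parse_schedulers("mq-deadline [kyber] bfq none\n"))
    for key, path in kyber_paths("nvme0n1").items():  # hypothetical device
        print(key, path)
```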
  • 22. Hist Triggers Linux 4.17 https://www.kernel.org/doc/html/latest/trace/histogram.html
    # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
    # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
    […]
    { stacktrace:
        __kmalloc+0x11b/0x1b0
        seq_buf_alloc+0x1b/0x50
        seq_read+0x2cc/0x370
        proc_reg_read+0x3d/0x80
        __vfs_read+0x28/0xe0
        vfs_read+0x86/0x140
        SyS_read+0x46/0xb0
        system_call_fastpath+0x12/0x6a
    } hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768
    ftrace advanced summaries
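A hist trigger is a colon-separated string written to an event's trigger file, after which the summary can be read from the matching hist file. The sketch below composes the trigger shown in the output above and builds the file path; the helper names are hypothetical, the tracing directory is assumed to be mounted at /sys/kernel/debug/tracing, and the write itself (not performed here) needs root.

```python
TRACING_DIR = "/sys/kernel/debug/tracing"  # assumed mount point


def hist_trigger(keys, vals=(), sort=None, size=None):
    """Compose a hist trigger string from its fields."""
    parts = ["hist", "keys=" + ",".join(keys)]
    if vals:
        parts.append("vals=" + ",".join(vals))
    if sort:
        parts.append("sort=" + sort)
    if size:
        parts.append("size=%d" % size)
    return ":".join(parts)


def event_file(subsystem, event, name):
    """Path to a per-event control file such as 'trigger' or 'hist'."""
    return "%s/events/%s/%s/%s" % (TRACING_DIR, subsystem, event, name)


def install_trigger(subsystem, event, trigger):
    """Write the trigger to the event's trigger file (requires root)."""
    with open(event_file(subsystem, event, "trigger"), "w") as f:
        f.write(trigger)


if __name__ == "__main__":
    trig = hist_trigger(["stacktrace"], ["bytes_req", "bytes_alloc"],
                        sort="bytes_alloc", size=2048)
    print(trig)
    print(event_file("kmem", "kmalloc", "trigger"))
```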
  • 23. PSI Pressure Stall Information More saturation metrics! Linux 4.? not merged yet https://lwn.net/Articles/759781/ [Figure: the USE Method: for each resource, check utilization (%), saturation, and errors] /proc/pressure/cpu /proc/pressure/memory /proc/pressure/io 10-, 60-, and 300-second averages
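Reading these files from a monitoring script could look like the sketch below. It assumes the line format used by kernels that later shipped PSI (4.20 and newer), where each line starts with "some" or "full" followed by avg10=, avg60=, avg300= and total= fields; the sample text and function name are hypothetical, and the format is an assumption relative to the unmerged patches described here.

```python
def parse_pressure(text):
    """Parse /proc/pressure/* content into {kind: {field: value}}.

    Assumed line format:
        some avg10=1.23 avg60=0.50 avg300=0.10 total=123456
    avg* are percentages of time stalled; total is cumulative microseconds.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            result[kind][key] = float(value)
    return result


if __name__ == "__main__":
    # Hypothetical sample; on a PSI-enabled kernel, read /proc/pressure/io instead.
    sample = (
        "some avg10=1.23 avg60=0.50 avg300=0.10 total=123456\n"
        "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
    )
    for kind, stats in parse_pressure(sample).items():
        print(kind, stats)
```

The three averages map directly onto the saturation column of the USE Method: a non-zero avg10 with a low avg300 suggests a recent burst, while all three elevated suggests sustained pressure.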
  • 24. More perf 4.4 - 4.19 (2016 - 2018) ● TCP listener lockless (4.4) ● copy_file_range() (4.5) ● madvise() MADV_FREE (4.5) ● epoll multithread scalability (4.5) ● Kernel Connection Multiplexor (4.6) ● Writeback management (4.10) ● Hybrid block polling (4.10) ● BFQ I/O scheduler (4.12) ● Async I/O improvements (4.13) ● In-kernel TLS acceleration (4.13) ● Socket MSG_ZEROCOPY (4.14) ● Asynchronous buffered I/O (4.14) ● Longer-lived TLB entries with PCID (4.14) ● mmap MAP_SYNC (4.15) ● Software-interrupt context hrtimers (4.16) ● Idle loop tick efficiency (4.17) ● perf_event_open() [ku]probes (4.17) ● AF_XDP sockets (4.18) ● Block I/O latency controller (4.19) ● CAKE for bufferbloat (4.19) ● New async I/O polling (4.19) … and many minor improvements to: • perf • CPU scheduling • futexes • NUMA • Huge pages • Slab allocation • TCP, UDP • Drivers • Processor support • GPUs
  • 25. Take Aways 1. Run latest 2. Browse major features eg, https://kernelnewbies.org/Linux_4.19
  • 26. Some Linux perf Resources - http://www.brendangregg.com/linuxperf.html - https://kernelnewbies.org/LinuxChanges - https://lwn.net/Kernel - https://github.com/iovisor/bcc - http://blog.stgolabs.net/search/label/linux - http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html