Performance Tuning EC2 Instances

Brendan Gregg
Brendan GreggSenior Performance Architect at Netflix
PFC306 
Brendan Gregg, Performance Engineering, Netflix 
November 12, 2014 | Las Vegas, NV
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
S3 
EC2 
Cassandra 
Applications 
(Services) 
EVCache 
ELB 
Elasticsearch 
SES SQS
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Start 
i2 Select memory to 
cache working set 
Find best 
balance
ASG-v011 
… 
Instance 
Instance 
Instance 
ASG Cluster 
prod1 
ASG-v010 
… 
Instance 
Instance 
Instance 
Canary 
ELB
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Select instance families Select resources 
From any desired 
resource, see 
types & cost
eg, 8 vCPU:
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Acceptable Headroom Unacceptable
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Cost per hour 
Services
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
# schedtool –B PID
vm.swappiness = 0 # from 60
# echo never > /sys/kernel/mm/transparent_hugepage/enabled # from madvise
vm.dirty_ratio = 80 # from 40 
vm.dirty_background_ratio = 5 # from 10 
vm.dirty_expire_centisecs = 12000 # from 3000 
mount -o defaults,noatime,discard,nobarrier …
/sys/block/*/queue/rq_affinity2 
/sys/block/*/queue/scheduler noop 
/sys/block/*/queue/nr_requests256 
/sys/block/*/queue/read_ahead_kb 256 
mdadm –chunk=64 ...
net.core.somaxconn = 1000 
net.core.netdev_max_backlog = 5000 
net.core.rmem_max = 16777216 
net.core.wmem_max = 16777216 
net.ipv4.tcp_wmem = 4096 12582912 16777216 
net.ipv4.tcp_rmem = 4096 12582912 16777216 
net.ipv4.tcp_max_syn_backlog = 8096 
net.ipv4.tcp_slow_start_after_idle = 0 
net.ipv4.tcp_tw_reuse = 1 
net.ipv4.ip_local_port_range = 10240 65535 
net.ipv4.tcp_abort_on_overflow = 1 # maybe
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Resource 
Utilization 
X (%)
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Application 
System Libraries 
System Calls 
Kernel 
Devices
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
$ sar -n TCP,ETCP,DEV 1 
Linux 3.2.55 (test-e4f1a80b) 08/18/2014 _x86_64_ (8 CPU) 
09:10:43 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s 
09:10:44 PM lo 14.00 14.00 1.34 1.34 0.00 0.00 0.00 
09:10:44 PM eth0 4114.00 4186.00 4537.46 28513.24 0.00 0.00 0.00 
09:10:43 PM active/s passive/s iseg/s oseg/s 
09:10:44 PM 21.00 4.00 4107.00 22511.00 
09:10:43 PM atmptf/s estres/s retrans/s isegerr/s orsts/s 
09:10:44 PM 0.00 0.00 36.00 0.00 1.00 
[…]
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Stack frame 
Mouse-over 
frames to 
quantify 
Ancestry
# git clone https://github.com/brendangregg/FlameGraph 
# cd FlameGraph 
# perf record -F 99 -ag -- sleep 60 
# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
Performance Tuning EC2 Instances
Broken 
Java stacks 
(missing 
frame 
pointer) 
Kernel 
TCP/IP 
GC 
Idle 
thread 
Time 
Locks 
epoll
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
# ./iosnoop –ts 
Tracing block I/O. Ctrl-C to end. 
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms 
5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62 
5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42 
5982800.304962 5982800.305446 supervise 1801 W 202,1 17039616 4096 0.48 
5982800.305250 5982800.305676 supervise 1801 W 202,1 17039624 4096 0.43 
[…] 
# ./iosnoop –h 
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration] 
-d device # device string (eg, "202,1) 
-i iotype # match type (eg, '*R*' for all reads) 
-n name # process name to match on I/O issue 
-p PID # PID to match on I/O issue 
-Q # include queueing time in LATms 
-s # include start time of I/O (s) 
-t # include completion time of I/O (s) 
[…]
Performance Tuning EC2 Instances
# perf record –e skb:consume_skb –ag -- sleep 10 
# perf report 
[...] 
74.42% swapper [kernel.kallsyms] [k] consume_skb 
| 
--- consume_skb 
arp_process 
arp_rcv 
__netif_receive_skb_core 
__netif_receive_skb 
netif_receive_skb 
virtnet_poll 
net_rx_action 
__do_softirq 
irq_exit 
do_IRQ 
ret_from_intr 
[…] 
Summarizing stack traces for a 
tracepoint 
perf_events can do many things, 
it is hard to pick just one example
Performance Tuning EC2 Instances
ec2-guest# ./showboost 
CPU MHz : 2500 
Turbo MHz : 2900 (10 active) 
Turbo Ratio : 116% (10 active) 
CPU 0 summary every 5 seconds... 
Real CPU MHz 
TIME C0_MCYC C0_ACYC UTIL RATIO MHz 
06:11:35 6428553166 7457384521 51% 116% 2900 
06:11:40 6349881107 7365764152 50% 115% 2899 
06:11:45 6240610655 7239046277 49% 115% 2899 
[...]
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Region App Breakdowns 
Metrics 
Options 
Interactive 
Graph 
Summary Statistics
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
Utilization Saturation 
Errors 
Per device 
Breakdowns
Performance Tuning EC2 Instances
Performance Tuning EC2 Instances
http://aws.amazon.com/ec2/instance-types/ 
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html 
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html 
http://www.slideshare.net/cpwatson/cpn302-yourlinuxamioptimizationandperformance 
http://www.brendangregg.com/blog/2014-09-27/from-clouds-to-roots.html 
http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html 
http://www.brendangregg.com/linuxperf.html 
http://www.slideshare.net/brendangregg/linux-performance-tools-2014 
http://www.brendangregg.com/USEmethod/use-linux.html 
http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html 
https://github.com/brendangregg/FlameGraph https://github.com/brendangregg/perf-tools
Performance Tuning EC2 Instances
Talk Time Title 
PFC-305 Wednesday, 1:15pm Embracing Failure: Fault Injection and Service Reliability 
BDT-403 Wednesday, 2:15pm Next Generation Big Data Platform at Netflix 
PFC-306 Wednesday, 3:30pm Performance Tuning EC2 
DEV-309 Wednesday, 3:30pm From Asgard to Zuul, How Netflix’s proven Open Source 
Tools can accelerate and scale your services 
ARC-317 Wednesday, 4:30pm Maintaining a Resilient Front-Door at Massive Scale 
PFC-304 Wednesday, 4:30pm Effective Inter-process Communications in the Cloud: The 
Pros and Cons of Micro Services Architectures 
ENT-209 Wednesday, 4:30pm Cloud Migration, Dev-Ops and Distributed Systems 
APP-310 Friday, 9:00am Scheduling using Apache Mesos in the Cloud
Performance Tuning EC2 Instances
1 of 81

Recommended

re:Invent 2019 BPF Performance Analysis at Netflix by
re:Invent 2019 BPF Performance Analysis at Netflixre:Invent 2019 BPF Performance Analysis at Netflix
re:Invent 2019 BPF Performance Analysis at NetflixBrendan Gregg
5.5K views65 slides
LISA2019 Linux Systems Performance by
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceBrendan Gregg
374.3K views64 slides
Container Performance Analysis by
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
448.6K views75 slides
YOW2020 Linux Systems Performance by
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
1.9K views64 slides
Performance Wins with eBPF: Getting Started (2021) by
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
1.4K views30 slides
Linux Systems Performance 2016 by
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
504.5K views72 slides

More Related Content

What's hot

BPF Internals (eBPF) by
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
15.3K views122 slides
USENIX ATC 2017: Visualizing Performance with Flame Graphs by
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame GraphsBrendan Gregg
672.9K views66 slides
How to use KASAN to debug memory corruption in OpenStack environment- (2) by
How to use KASAN to debug memory corruption in OpenStack environment- (2)How to use KASAN to debug memory corruption in OpenStack environment- (2)
How to use KASAN to debug memory corruption in OpenStack environment- (2)Gavin Guo
8K views32 slides
(CMP402) Amazon EC2 Instances Deep Dive by
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep DiveAmazon Web Services
27.4K views59 slides
Broken Linux Performance Tools 2016 by
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
822.7K views95 slides
Linux BPF Superpowers by
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF SuperpowersBrendan Gregg
422.8K views60 slides

What's hot(20)

BPF Internals (eBPF) by Brendan Gregg
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
Brendan Gregg15.3K views
USENIX ATC 2017: Visualizing Performance with Flame Graphs by Brendan Gregg
USENIX ATC 2017: Visualizing Performance with Flame GraphsUSENIX ATC 2017: Visualizing Performance with Flame Graphs
USENIX ATC 2017: Visualizing Performance with Flame Graphs
Brendan Gregg672.9K views
How to use KASAN to debug memory corruption in OpenStack environment- (2) by Gavin Guo
How to use KASAN to debug memory corruption in OpenStack environment- (2)How to use KASAN to debug memory corruption in OpenStack environment- (2)
How to use KASAN to debug memory corruption in OpenStack environment- (2)
Gavin Guo8K views
Broken Linux Performance Tools 2016 by Brendan Gregg
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
Brendan Gregg822.7K views
Linux BPF Superpowers by Brendan Gregg
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
Brendan Gregg422.8K views
BPF: Tracing and more by Brendan Gregg
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg200.3K views
Debug dpdk process bottleneck & painpoints by Vipin Varghese
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpoints
Vipin Varghese819 views
eBPF Trace from Kernel to Userspace by SUSE Labs Taipei
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
SUSE Labs Taipei8.5K views
Kernel_Crash_Dump_Analysis by Buland Singh
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_Analysis
Buland Singh1.9K views
DPDK: Multi Architecture High Performance Packet Processing by Michelle Holley
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
Michelle Holley9.1K views
Lisa12 methodologies by Brendan Gregg
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologies
Brendan Gregg16.3K views
Linux Performance Analysis: New Tools and Old Secrets by Brendan Gregg
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg603.9K views
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started by Anne Nicolas
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Anne Nicolas44.2K views
Linux Performance Profiling and Monitoring by Georg Schönberger
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
Georg Schönberger8.5K views
Understanding a kernel oops and a kernel panic by Joseph Lu
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panic
Joseph Lu2.4K views
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt by Anne Nicolas
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Anne Nicolas5.6K views
Meet cute-between-ebpf-and-tracing by Viller Hsiao
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
Viller Hsiao8.8K views

Viewers also liked

Velocity 2015 linux perf tools by
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
1.1M views142 slides
Velocity 2017 Performance analysis superpowers with Linux eBPF by
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFBrendan Gregg
735.7K views54 slides
Blazing Performance with Flame Graphs by
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBrendan Gregg
323.6K views170 slides
SREcon 2016 Performance Checklists for SREs by
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsBrendan Gregg
206.5K views79 slides
ACM Applicative System Methodology 2016 by
ACM Applicative System Methodology 2016ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016Brendan Gregg
158K views57 slides
Stop the Guessing: Performance Methodologies for Production Systems by
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production SystemsBrendan Gregg
32.1K views58 slides

Viewers also liked(20)

Velocity 2015 linux perf tools by Brendan Gregg
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
Brendan Gregg1.1M views
Velocity 2017 Performance analysis superpowers with Linux eBPF by Brendan Gregg
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPF
Brendan Gregg735.7K views
Blazing Performance with Flame Graphs by Brendan Gregg
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame Graphs
Brendan Gregg323.6K views
SREcon 2016 Performance Checklists for SREs by Brendan Gregg
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
Brendan Gregg206.5K views
ACM Applicative System Methodology 2016 by Brendan Gregg
ACM Applicative System Methodology 2016ACM Applicative System Methodology 2016
ACM Applicative System Methodology 2016
Brendan Gregg158K views
Stop the Guessing: Performance Methodologies for Production Systems by Brendan Gregg
Stop the Guessing: Performance Methodologies for Production SystemsStop the Guessing: Performance Methodologies for Production Systems
Stop the Guessing: Performance Methodologies for Production Systems
Brendan Gregg32.1K views
Netflix: From Clouds to Roots by Brendan Gregg
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
Brendan Gregg67.8K views
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013 by Amazon Web Services
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Amazon Web Services13.2K views
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ... by Amazon Web Services
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
Amazon Web Services2.2K views
AWS re:Invent 2016: Design Patterns for High Availability: Lessons from Amazo... by Amazon Web Services
AWS re:Invent 2016: Design Patterns for High Availability: Lessons from Amazo...AWS re:Invent 2016: Design Patterns for High Availability: Lessons from Amazo...
AWS re:Invent 2016: Design Patterns for High Availability: Lessons from Amazo...
Amazon Web Services4.6K views
No data loss pipeline with apache kafka by Jiangjie Qin
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
Jiangjie Qin23.9K views
RxNetty vs Tomcat Performance Results by Brendan Gregg
RxNetty vs Tomcat Performance ResultsRxNetty vs Tomcat Performance Results
RxNetty vs Tomcat Performance Results
Brendan Gregg13K views
Linux 4.x Tracing: Performance Analysis with bcc/BPF by Brendan Gregg
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Brendan Gregg10.7K views
G1 Garbage Collector: Details and Tuning by Simone Bordet
G1 Garbage Collector: Details and TuningG1 Garbage Collector: Details and Tuning
G1 Garbage Collector: Details and Tuning
Simone Bordet18.7K views
Am I reading GC logs Correctly? by Tier1 App
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
Tier1 App2K views
Troubleshooting PostgreSQL Streaming Replication by Alexey Lesovsky
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
Alexey Lesovsky9.6K views
Row Pattern Matching in SQL:2016 by Markus Winand
Row Pattern Matching in SQL:2016Row Pattern Matching in SQL:2016
Row Pattern Matching in SQL:2016
Markus Winand109.4K views
Designing Tracing Tools by Brendan Gregg
Designing Tracing ToolsDesigning Tracing Tools
Designing Tracing Tools
Brendan Gregg5.7K views
Java Performance Analysis on Linux with Flame Graphs by Brendan Gregg
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame Graphs
Brendan Gregg24.7K views

Similar to Performance Tuning EC2 Instances

(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014 by
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014Amazon Web Services
36.8K views81 slides
Performance tweaks and tools for Linux (Joe Damato) by
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Ontico
2.1K views113 slides
Debugging Ruby Systems by
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby SystemsEngine Yard
4.3K views94 slides
Debugging Ruby by
Debugging RubyDebugging Ruby
Debugging RubyAman Gupta
7.2K views46 slides
ATO Linux Performance 2018 by
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
3.3K views26 slides
Performance tuning jvm by
Performance tuning jvmPerformance tuning jvm
Performance tuning jvmPrem Kuppumani
1.7K views22 slides

Similar to Performance Tuning EC2 Instances(20)

(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014 by Amazon Web Services
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
Amazon Web Services36.8K views
Performance tweaks and tools for Linux (Joe Damato) by Ontico
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
Ontico2.1K views
Debugging Ruby Systems by Engine Yard
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
Engine Yard4.3K views
Debugging Ruby by Aman Gupta
Debugging RubyDebugging Ruby
Debugging Ruby
Aman Gupta7.2K views
ATO Linux Performance 2018 by Brendan Gregg
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
Brendan Gregg3.3K views
PerfUG 3 - perfs système by Ludovic Piot
PerfUG 3 - perfs systèmePerfUG 3 - perfs système
PerfUG 3 - perfs système
Ludovic Piot1.1K views
May2010 hex-core-opt by Jeff Larkin
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
Jeff Larkin459 views
Debugging linux issues with eBPF by Ivan Babrou
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPF
Ivan Babrou1.7K views
Java/Spring과 Node.js의공존 by 동수 장
Java/Spring과 Node.js의공존Java/Spring과 Node.js의공존
Java/Spring과 Node.js의공존
동수 장28.9K views
SiteGround Tech TeamBuilding by Marian Marinov
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuilding
Marian Marinov795 views
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ... by PROIDEA
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
PROIDEA76 views
Deep Dive on Amazon EC2 Instances (March 2017) by Julien SIMON
Deep Dive on Amazon EC2 Instances (March 2017)Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)
Julien SIMON1.9K views
Direct Code Execution - LinuxCon Japan 2014 by Hajime Tazaki
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014
Hajime Tazaki1.3K views
Практический опыт профайлинга и оптимизации производительности Ruby-приложений by Olga Lavrentieva
Практический опыт профайлинга и оптимизации производительности Ruby-приложенийПрактический опыт профайлинга и оптимизации производительности Ruby-приложений
Практический опыт профайлинга и оптимизации производительности Ruby-приложений
Olga Lavrentieva1.6K views
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,... by Ontico
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Ontico9.9K views

More from Brendan Gregg

YOW2021 Computing Performance by
YOW2021 Computing PerformanceYOW2021 Computing Performance
YOW2021 Computing PerformanceBrendan Gregg
2K views108 slides
IntelON 2021 Processor Benchmarking by
IntelON 2021 Processor BenchmarkingIntelON 2021 Processor Benchmarking
IntelON 2021 Processor BenchmarkingBrendan Gregg
1K views17 slides
Systems@Scale 2021 BPF Performance Getting Started by
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedBrendan Gregg
1.5K views30 slides
Performance Wins with BPF: Getting Started by
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting StartedBrendan Gregg
2K views24 slides
UM2019 Extended BPF: A New Type of Software by
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareBrendan Gregg
33.1K views48 slides
LPC2019 BPF Tracing Tools by
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing ToolsBrendan Gregg
1.8K views9 slides

More from Brendan Gregg(20)

YOW2021 Computing Performance by Brendan Gregg
YOW2021 Computing PerformanceYOW2021 Computing Performance
YOW2021 Computing Performance
Brendan Gregg2K views
IntelON 2021 Processor Benchmarking by Brendan Gregg
IntelON 2021 Processor BenchmarkingIntelON 2021 Processor Benchmarking
IntelON 2021 Processor Benchmarking
Brendan Gregg1K views
Systems@Scale 2021 BPF Performance Getting Started by Brendan Gregg
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
Brendan Gregg1.5K views
Performance Wins with BPF: Getting Started by Brendan Gregg
Performance Wins with BPF: Getting StartedPerformance Wins with BPF: Getting Started
Performance Wins with BPF: Getting Started
Brendan Gregg2K views
UM2019 Extended BPF: A New Type of Software by Brendan Gregg
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
Brendan Gregg33.1K views
LPC2019 BPF Tracing Tools by Brendan Gregg
LPC2019 BPF Tracing ToolsLPC2019 BPF Tracing Tools
LPC2019 BPF Tracing Tools
Brendan Gregg1.8K views
LSFMM 2019 BPF Observability by Brendan Gregg
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
Brendan Gregg8.3K views
YOW2018 CTO Summit: Working at netflix by Brendan Gregg
YOW2018 CTO Summit: Working at netflixYOW2018 CTO Summit: Working at netflix
YOW2018 CTO Summit: Working at netflix
Brendan Gregg3.5K views
YOW2018 Cloud Performance Root Cause Analysis at Netflix by Brendan Gregg
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
Brendan Gregg142.4K views
NetConf 2018 BPF Observability by Brendan Gregg
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF Observability
Brendan Gregg2.7K views
Linux Performance 2018 (PerconaLive keynote) by Brendan Gregg
Linux Performance 2018 (PerconaLive keynote)Linux Performance 2018 (PerconaLive keynote)
Linux Performance 2018 (PerconaLive keynote)
Brendan Gregg426.3K views
How Netflix Tunes EC2 Instances for Performance by Brendan Gregg
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg524.1K views
LISA17 Container Performance Analysis by Brendan Gregg
LISA17 Container Performance AnalysisLISA17 Container Performance Analysis
LISA17 Container Performance Analysis
Brendan Gregg9.3K views
Kernel Recipes 2017: Performance Analysis with BPF by Brendan Gregg
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
Brendan Gregg5.3K views
EuroBSDcon 2017 System Performance Analysis Methodologies by Brendan Gregg
EuroBSDcon 2017 System Performance Analysis MethodologiesEuroBSDcon 2017 System Performance Analysis Methodologies
EuroBSDcon 2017 System Performance Analysis Methodologies
Brendan Gregg15.8K views
OSSNA 2017 Performance Analysis Superpowers with Linux BPF by Brendan Gregg
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
Brendan Gregg5.1K views
USENIX ATC 2017 Performance Superpowers with Enhanced BPF by Brendan Gregg
USENIX ATC 2017 Performance Superpowers with Enhanced BPFUSENIX ATC 2017 Performance Superpowers with Enhanced BPF
USENIX ATC 2017 Performance Superpowers with Enhanced BPF
Brendan Gregg7.1K views

Recently uploaded

Zero to Automated in Under a Year by
Zero to Automated in Under a YearZero to Automated in Under a Year
Zero to Automated in Under a YearNetwork Automation Forum
15 views23 slides
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Safe Software
280 views86 slides
Case Study Copenhagen Energy and Business Central.pdf by
Case Study Copenhagen Energy and Business Central.pdfCase Study Copenhagen Energy and Business Central.pdf
Case Study Copenhagen Energy and Business Central.pdfAitana
16 views3 slides
Serverless computing with Google Cloud (2023-24) by
Serverless computing with Google Cloud (2023-24)Serverless computing with Google Cloud (2023-24)
Serverless computing with Google Cloud (2023-24)wesley chun
11 views33 slides
Network Source of Truth and Infrastructure as Code revisited by
Network Source of Truth and Infrastructure as Code revisitedNetwork Source of Truth and Infrastructure as Code revisited
Network Source of Truth and Infrastructure as Code revisitedNetwork Automation Forum
27 views45 slides
Melek BEN MAHMOUD.pdf by
Melek BEN MAHMOUD.pdfMelek BEN MAHMOUD.pdf
Melek BEN MAHMOUD.pdfMelekBenMahmoud
14 views1 slide

Recently uploaded(20)

Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software280 views
Case Study Copenhagen Energy and Business Central.pdf by Aitana
Case Study Copenhagen Energy and Business Central.pdfCase Study Copenhagen Energy and Business Central.pdf
Case Study Copenhagen Energy and Business Central.pdf
Aitana16 views
Serverless computing with Google Cloud (2023-24) by wesley chun
Serverless computing with Google Cloud (2023-24)Serverless computing with Google Cloud (2023-24)
Serverless computing with Google Cloud (2023-24)
wesley chun11 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker40 views
Five Things You SHOULD Know About Postman by Postman
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About Postman
Postman36 views
Future of AR - Facebook Presentation by ssuserb54b561
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
ssuserb54b56115 views
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma39 views
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc11 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2218 views

Performance Tuning EC2 Instances

  • 1. PFC306 Brendan Gregg, Performance Engineering, Netflix November 12, 2014 | Las Vegas, NV
  • 9. S3 EC2 Cassandra Applications (Services) EVCache ELB Elasticsearch SES SQS
  • 13. Start i2 Select memory to cache working set Find best balance
  • 14. ASG-v011 … Instance Instance Instance ASG Cluster prod1 ASG-v010 … Instance Instance Instance Canary ELB
  • 17. Select instance families Select resources From any desired resource, see types & cost
  • 26. Cost per hour Services
  • 37. vm.swappiness = 0 # from 60
  • 38. # echo never > /sys/kernel/mm/transparent_hugepage/enabled # from madvise
  • 39. vm.dirty_ratio = 80 # from 40 vm.dirty_background_ratio = 5 # from 10 vm.dirty_expire_centisecs = 12000 # from 3000 mount -o defaults,noatime,discard,nobarrier …
  • 40. /sys/block/*/queue/rq_affinity2 /sys/block/*/queue/scheduler noop /sys/block/*/queue/nr_requests256 /sys/block/*/queue/read_ahead_kb 256 mdadm –chunk=64 ...
  • 41. net.core.somaxconn = 1000 net.core.netdev_max_backlog = 5000 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_wmem = 4096 12582912 16777216 net.ipv4.tcp_rmem = 4096 12582912 16777216 net.ipv4.tcp_max_syn_backlog = 8096 net.ipv4.tcp_slow_start_after_idle = 0 net.ipv4.tcp_tw_reuse = 1 net.ipv4.ip_local_port_range = 10240 65535 net.ipv4.tcp_abort_on_overflow = 1 # maybe
  • 42. echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
  • 51. Application System Libraries System Calls Kernel Devices
  • 54. $ sar -n TCP,ETCP,DEV 1 Linux 3.2.55 (test-e4f1a80b) 08/18/2014 _x86_64_ (8 CPU) 09:10:43 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s 09:10:44 PM lo 14.00 14.00 1.34 1.34 0.00 0.00 0.00 09:10:44 PM eth0 4114.00 4186.00 4537.46 28513.24 0.00 0.00 0.00 09:10:43 PM active/s passive/s iseg/s oseg/s 09:10:44 PM 21.00 4.00 4107.00 22511.00 09:10:43 PM atmptf/s estres/s retrans/s isegerr/s orsts/s 09:10:44 PM 0.00 0.00 36.00 0.00 1.00 […]
  • 59. Stack frame Mouse-over frames to quantify Ancestry
  • 60. # git clone https://github.com/brendangregg/FlameGraph # cd FlameGraph # perf record -F 99 -ag -- sleep 60 # perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
  • 62. Broken Java stacks (missing frame pointer) Kernel TCP/IP GC Idle thread Time Locks epoll
  • 65. # ./iosnoop –ts Tracing block I/O. Ctrl-C to end. STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms 5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62 5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42 5982800.304962 5982800.305446 supervise 1801 W 202,1 17039616 4096 0.48 5982800.305250 5982800.305676 supervise 1801 W 202,1 17039624 4096 0.43 […] # ./iosnoop –h USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration] -d device # device string (eg, "202,1) -i iotype # match type (eg, '*R*' for all reads) -n name # process name to match on I/O issue -p PID # PID to match on I/O issue -Q # include queueing time in LATms -s # include start time of I/O (s) -t # include completion time of I/O (s) […]
  • 67. # perf record –e skb:consume_skb –ag -- sleep 10 # perf report [...] 74.42% swapper [kernel.kallsyms] [k] consume_skb | --- consume_skb arp_process arp_rcv __netif_receive_skb_core __netif_receive_skb netif_receive_skb virtnet_poll net_rx_action __do_softirq irq_exit do_IRQ ret_from_intr […] Summarizing stack traces for a tracepoint perf_events can do many things, it is hard to pick just one example
  • 69. ec2-guest# ./showboost CPU MHz : 2500 Turbo MHz : 2900 (10 active) Turbo Ratio : 116% (10 active) CPU 0 summary every 5 seconds... Real CPU MHz TIME C0_MCYC C0_ACYC UTIL RATIO MHz 06:11:35 6428553166 7457384521 51% 116% 2900 06:11:40 6349881107 7365764152 50% 115% 2899 06:11:45 6240610655 7239046277 49% 115% 2899 [...]
  • 72. Region App Breakdowns Metrics Options Interactive Graph Summary Statistics
  • 75. Utilization Saturation Errors Per device Breakdowns
  • 78. http://aws.amazon.com/ec2/instance-types/ http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html http://www.slideshare.net/cpwatson/cpn302-yourlinuxamioptimizationandperformance http://www.brendangregg.com/blog/2014-09-27/from-clouds-to-roots.html http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html http://www.brendangregg.com/linuxperf.html http://www.slideshare.net/brendangregg/linux-performance-tools-2014 http://www.brendangregg.com/USEmethod/use-linux.html http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html https://github.com/brendangregg/FlameGraph https://github.com/brendangregg/perf-tools
  • 80. Talk Time Title PFC-305 Wednesday, 1:15pm Embracing Failure: Fault Injection and Service Reliability BDT-403 Wednesday, 2:15pm Next Generation Big Data Platform at Netflix PFC-306 Wednesday, 3:30pm Performance Tuning EC2 DEV-309 Wednesday, 3:30pm From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services ARC-317 Wednesday, 4:30pm Maintaining a Resilient Front-Door at Massive Scale PFC-304 Wednesday, 4:30pm Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures ENT-209 Wednesday, 4:30pm Cloud Migration, Dev-Ops and Distributed Systems APP-310 Friday, 9:00am Scheduling using Apache Mesos in the Cloud