November 12, 2014 | Las Vegas, NV 
PFC306 
Brendan Gregg, Performance Engineering, Netflix
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014

Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM and PV) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix's use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance.


  1. November 12, 2014 | Las Vegas, NV; PFC306; Brendan Gregg, Performance Engineering, Netflix
  2. S3; EC2; Cassandra; EVCache; Applications (Services); ELB; Elasticsearch; SQS; SES
  3. Start; i2; Select memory to cache working set; Find best balance
  4. Instance, Instance, Instance, …; ASG-v011; Instance, Instance, Instance, …; ASG-v010; ASG Cluster prod1; Canary; ELB
  5. Select instance families; Select resources; From any desired resource, see types & cost
  6. eg, 8 vCPU:
  7. Headroom; Unacceptable; Acceptable
  8. Services; Cost per hour
  9. # schedtool -B PID
  10. vm.swappiness = 0 # from 60
  11. # echo never > /sys/kernel/mm/transparent_hugepage/enabled # from madvise
  12. vm.dirty_ratio = 80 # from 40
      vm.dirty_background_ratio = 5 # from 10
      vm.dirty_expire_centisecs = 12000 # from 3000
      mount -o defaults,noatime,discard,nobarrier …
  13. /sys/block/*/queue/rq_affinity 2
      /sys/block/*/queue/scheduler noop
      /sys/block/*/queue/nr_requests 256
      /sys/block/*/queue/read_ahead_kb 256
      mdadm --chunk=64 ...
  14. net.core.somaxconn = 1000
      net.core.netdev_max_backlog = 5000
      net.core.rmem_max = 16777216
      net.core.wmem_max = 16777216
      net.ipv4.tcp_wmem = 4096 12582912 16777216
      net.ipv4.tcp_rmem = 4096 12582912 16777216
      net.ipv4.tcp_max_syn_backlog = 8096
      net.ipv4.tcp_slow_start_after_idle = 0
      net.ipv4.tcp_tw_reuse = 1
      net.ipv4.ip_local_port_range = 10240 65535
      net.ipv4.tcp_abort_on_overflow = 1 # maybe
  15. echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
  16. Resource Utilization (%); X
  17. Application; System Libraries; System Calls; Kernel; Devices
  18. $ sar -n TCP,ETCP,DEV 1
      Linux 3.2.55 (test-e4f1a80b)  08/18/2014  _x86_64_  (8 CPU)
      09:10:43 PM  IFACE   rxpck/s  txpck/s   rxkB/s    txkB/s  rxcmp/s  txcmp/s  rxmcst/s
      09:10:44 PM  lo        14.00    14.00     1.34      1.34     0.00     0.00      0.00
      09:10:44 PM  eth0    4114.00  4186.00  4537.46  28513.24     0.00     0.00      0.00
      09:10:43 PM  active/s  passive/s   iseg/s    oseg/s
      09:10:44 PM     21.00       4.00  4107.00  22511.00
      09:10:43 PM  atmptf/s  estres/s  retrans/s  isegerr/s  orsts/s
      09:10:44 PM      0.00      0.00      36.00       0.00     1.00
      […]
  19. Stack frame; Mouse-over frames to quantify; Ancestry
  20. # git clone https://github.com/brendangregg/FlameGraph
      # cd FlameGraph
      # perf record -F 99 -ag -- sleep 60
      # perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
  21. Broken Java stacks (missing frame pointer); Kernel; TCP/IP; GC; Idle thread; Time; Locks; epoll
  22. # ./iosnoop -ts
      Tracing block I/O. Ctrl-C to end.
      STARTs          ENDs            COMM       PID   TYPE  DEV    BLOCK     BYTES  LATms
      5982800.302061  5982800.302679  supervise  1809  W     202,1  17039600  4096   0.62
      5982800.302423  5982800.302842  supervise  1809  W     202,1  17039608  4096   0.42
      5982800.304962  5982800.305446  supervise  1801  W     202,1  17039616  4096   0.48
      5982800.305250  5982800.305676  supervise  1801  W     202,1  17039624  4096   0.43
      […]
      # ./iosnoop -h
      USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
          -d device   # device string (eg, "202,1)
          -i iotype   # match type (eg, '*R*' for all reads)
          -n name     # process name to match on I/O issue
          -p PID      # PID to match on I/O issue
          -Q          # include queueing time in LATms
          -s          # include start time of I/O (s)
          -t          # include completion time of I/O (s)
      […]
  23. # perf record -e skb:consume_skb -ag -- sleep 10
      # perf report
      [...]
      74.42% swapper [kernel.kallsyms] [k] consume_skb
             |
             --- consume_skb
                 arp_process
                 arp_rcv
                 __netif_receive_skb_core
                 __netif_receive_skb
                 netif_receive_skb
                 virtnet_poll
                 net_rx_action
                 __do_softirq
                 irq_exit
                 do_IRQ
                 ret_from_intr
      […]
      Summarizing stack traces for a tracepoint
      perf_events can do many things; it is hard to pick just one example
  24. ec2-guest# ./showboost
      CPU MHz     : 2500
      Turbo MHz   : 2900 (10 active)
      Turbo Ratio : 116% (10 active)
      CPU 0 summary every 5 seconds...
      TIME      C0_MCYC     C0_ACYC     UTIL  RATIO  MHz
      06:11:35  6428553166  7457384521   51%   116%  2900
      06:11:40  6349881107  7365764152   50%   115%  2899
      06:11:45  6240610655  7239046277   49%   115%  2899
      [...]
      Real CPU MHz
  25. Region; App; Breakdowns; Metrics; Options; Interactive Graph; Summary Statistics
  26. Utilization; Saturation; Errors; Per device; Breakdowns
  27. http://aws.amazon.com/ec2/instance-types/
      http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html
      http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html
      http://www.slideshare.net/cpwatson/cpn302-yourlinuxamioptimizationandperformance
      http://www.brendangregg.com/blog/2014-09-27/from-clouds-to-roots.html
      http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html
      http://www.brendangregg.com/linuxperf.html
      http://www.slideshare.net/brendangregg/linux-performance-tools-2014
      http://www.brendangregg.com/USEmethod/use-linux.html
      http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
      https://github.com/brendangregg/FlameGraph
      https://github.com/brendangregg/perf-tools
  28. Talk     Time               Title
      PFC-305  Wednesday, 1:15pm  Embracing Failure: Fault Injection and Service Reliability
      BDT-403  Wednesday, 2:15pm  Next Generation Big Data Platform at Netflix
      PFC-306  Wednesday, 3:30pm  Performance Tuning EC2
      DEV-309  Wednesday, 3:30pm  From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services
      ARC-317  Wednesday, 4:30pm  Maintaining a Resilient Front-Door at Massive Scale
      PFC-304  Wednesday, 4:30pm  Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
      ENT-209  Wednesday, 4:30pm  Cloud Migration, Dev-Ops and Distributed Systems
      APP-310  Friday, 9:00am     Scheduling using Apache Mesos in the Cloud
  29. http://bit.ly/awsevals
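
Slides 10, 12, and 14 list target values for kernel tunables. The following is a minimal Python sketch of one way to compare a running Linux system against such a list. It assumes the standard mapping of a dotted sysctl name to a file under /proc/sys. The helper names (`sysctl_path`, `read_sysctl`, `audit`) and the subset of settings shown are illustrative choices, not something from the talk, and the values are the talk's examples rather than general recommendations.

```python
from pathlib import Path

# Values copied from slides 10, 12, and 14 (a subset, chosen for illustration).
DESIRED = {
    "vm.swappiness": "0",
    "vm.dirty_ratio": "80",
    "vm.dirty_background_ratio": "5",
    "vm.dirty_expire_centisecs": "12000",
    "net.core.somaxconn": "1000",
    "net.core.netdev_max_backlog": "5000",
    "net.ipv4.tcp_wmem": "4096 12582912 16777216",
    "net.ipv4.tcp_rmem": "4096 12582912 16777216",
}


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name):
    """Read the live value, collapsing tabs and spaces; None if unreadable."""
    try:
        return " ".join(sysctl_path(name).read_text().split())
    except OSError:
        return None


def audit(desired, reader=read_sysctl):
    """Return {name: (current, wanted)} for every setting that differs."""
    diffs = {}
    for name, wanted in desired.items():
        current = reader(name)
        if current != wanted:
            diffs[name] = (current, wanted)
    return diffs


if __name__ == "__main__":
    for name, (current, wanted) in audit(DESIRED).items():
        print(f"{name}: current={current} slide={wanted}")
```

The `reader` parameter exists so the comparison can be exercised without touching /proc, for example by passing a dictionary's `get` method. The script only reports differences and changes nothing on the system.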

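
The showboost columns on slide 24 can be checked by hand. The sketch below assumes that C0_MCYC counts cycles at the base (non-turbo) clock rate while the CPU is busy and that C0_ACYC counts cycles at the actual clock rate, in the manner of the x86 MPERF and APERF counters. The slide itself does not define these columns, and the function name `boost_summary` is made up for illustration.

```python
def boost_summary(mcyc, acyc, base_mhz=2500, interval_s=5):
    """Derive (util %, ratio %, MHz) from one interval's cycle deltas.

    Assumption (not stated on the slide): mcyc advances at the base clock
    rate and acyc at the actual clock rate, both only while the CPU is busy.
    Integer division truncates the results.
    """
    # Cycles the base clock would tick if the CPU were busy for the whole interval.
    base_cycles = base_mhz * 1_000_000 * interval_s
    util_pct = mcyc * 100 // base_cycles
    ratio_pct = acyc * 100 // mcyc
    real_mhz = acyc * base_mhz // mcyc
    return util_pct, ratio_pct, real_mhz


# The three sample rows from slide 24: (C0_MCYC, C0_ACYC)
rows = [
    (6428553166, 7457384521),
    (6349881107, 7365764152),
    (6240610655, 7239046277),
]
for mcyc, acyc in rows:
    print(boost_summary(mcyc, acyc))
```

With the slide's base clock of 2500 MHz and 5-second interval, this prints (51, 116, 2900), (50, 115, 2899), and (49, 115, 2899), which agree with the UTIL, RATIO, and MHz columns shown.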