(CMP402) Amazon EC2 Instances Deep Dive

Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instances. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.

  1. 1. CMP402 Amazon EC2 Instances Deep Dive: Delivering System Performance • John Phillips, Principal Product Manager, Amazon EC2 • October 7, 2015 • © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. Amazon Elastic Compute Cloud is Big [Diagram labels: Instances, APIs, Networking, EC2, EC2 Purchase Options]
  3. 3. Amazon EC2 Instances [Diagram labels: Host Server, Hypervisor, Guest 1, Guest 2, Guest n]
  4. 4. Amazon EC2 Instances History [Timeline, 2006 to 2016, instance types in order of appearance: m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge, m2.xlarge, m2.4xlarge, m2.2xlarge, cc1.4xlarge, t1.micro, cg1.4xlarge, cc2.8xlarge, m1.medium, hi1.4xlarge, m3.xlarge, m3.2xlarge, hs1.8xlarge, cr1.8xlarge, c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge, g2.2xlarge, i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge, m3.medium, m3.large, r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge, t2.micro, t2.small, t2.medium, c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge, g2.8xlarge, t2.large, m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge]
  5. 5. What to Expect from the Session • Defining system performance and how it is characterized for different workloads • How Amazon EC2 instances deliver performance while providing flexibility and agility • How to make the most of your EC2 instance experience through the lens of several instance types
  6. 6. Defining Performance
  7. 7. Hiring a Server? • Servers are hired to do jobs • Performance is measured differently depending on the job
  8. 8. Defining Performance: Perspective Matters • What performance means depends on your perspective: – Response time – Throughput – Consistency [Diagram labels: Workload, Application, System libraries, System calls, Kernel, Devices]
  9. 9. Simple Performance Model for Single Thread • Using CPU: executing (in user mode) • Not using CPU: waiting for turn on CPU, waiting for disk or network I/O, thread locks, memory paging, or for more work.
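One rough way to see this split for a single thread is to compare CPU time against wall-clock time around a piece of work. The sketch below is illustrative only: `time_split` and `busy` are hypothetical names, and it assumes a single-threaded Python process, so that process CPU time (user plus system) approximates the thread's time on the CPU.

```python
import time


def time_split(work, *args):
    """Run work(*args) and return (cpu_seconds, wall_seconds).

    cpu_seconds is time spent executing on a CPU (user + system mode).
    wall_seconds - cpu_seconds approximates time spent waiting:
    for a turn on the CPU, for I/O, for locks, or for more work.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


def busy(n):
    """CPU-bound sample workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    samples = (("busy loop", busy, 2_000_000), ("sleep", time.sleep, 0.2))
    for name, job, arg in samples:
        cpu, wall = time_split(job, arg)
        print(f"{name}: on CPU {cpu:.3f}s of {wall:.3f}s wall time")
```

The busy loop should report CPU time close to wall time, while the sleep should report almost no CPU time; the exact figures depend on the machine.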
  10. 10. Performance Factors
      Resource | Performance factors | Key indicators
      CPU | Sockets, number of cores, clock frequency, bursting capability | CPU utilization, run queue length
      Memory | Memory capacity | Free memory, anonymous paging, thread swapping
      Network interface | Max bandwidth, packet rate | Receive throughput, transmit throughput over max bandwidth
      Disks | Input / output operations per second, throughput | Wait queue length, device utilization, device errors
  11. 11. Resource Utilization • For given performance, how efficiently are resources being used • Something at 100% utilization can’t accept any more work • Low utilization can indicate more resource is being purchased than needed
  12. 12. Example: Web Application • MediaWiki installed on Apache with 140 pages of content • Load increased in intervals over time
  13. 13. Example: Web Application • Memory stats
  14. 14. Example: Web Application • Disk stats
  15. 15. Example: Web Application • Network stats
  16. 16. Example: Web Application • CPU stats
  17. 17. Instance Selection = Performance Tuning • Picking an instance is tantamount to resource performance tuning • Give back instances as easily as you can acquire new ones • Find an ideal instance type and workload combination
  18. 18. Delivering Compute Performance with Amazon EC2 Instances
  19. 19. CPU Instructions and Protection Levels • CPU has at least two protection levels; on x86, the kernel runs in ring 0 and applications run in ring 3 • Privileged instructions can’t be executed in user mode, which protects the system; applications use system calls to ask the kernel to perform them [Diagram labels: Kernel, Application]
  20. 20. Example: Web application system calls
  21. 21. X86 CPU Virtualization: Prior to Intel VT-x • Binary translation for privileged instructions • Para-virtualization (PV) • PV requires going through the VMM, adding latency • Applications that are system call bound are most affected [Diagram labels: VMM, Application, Kernel, PV]
  22. 22. X86 CPU Virtualization: After Intel VT-x • Hardware assisted virtualization (HVM) • PV-HVM uses PV drivers opportunistically for operations that are slow when emulated, e.g. network and block I/O [Diagram labels: Kernel, Application, VMM, PV-HVM]
  23. 23. Tip: Use PV-HVM AMIs with EBS
  24. 24. Time Keeping Explained • Time keeping in an instance is deceptively hard • gettimeofday(), clock_gettime(), QueryPerformanceCounter() • The TSC: CPU counter, accessible from userspace; requires calibration, vDSO; invariant on Sandy Bridge+ processors • Xen pvclock: does not support vDSO • On current generation instances, use TSC as clocksource
  25. 25. Tip: Use TSC as clocksource
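On a Linux guest, the active and available clocksources are exposed through sysfs, so the tip above can be checked from inside the instance. The sketch below assumes the standard Linux path `/sys/devices/system/clocksource/clocksource0`; the function names `read_clocksources` and `should_switch_to_tsc` are hypothetical, and the code only reads the setting, it does not change it.

```python
from pathlib import Path

# Standard Linux sysfs location for the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available); (None, []) if the files are not exposed."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


def should_switch_to_tsc(current, available):
    """True when TSC is offered by the kernel but is not the active clocksource."""
    return current != "tsc" and "tsc" in available


if __name__ == "__main__":
    current, available = read_clocksources()
    print(f"current={current} available={available}")
    if should_switch_to_tsc(current, available):
        print("TSC is available but not in use")
```

If the check suggests a change, the usual routes on Linux are writing `tsc` to `current_clocksource` as root or booting with the `clocksource=tsc` kernel parameter; test either on a non-production instance first.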
  26. 26. CPU Performance and Scheduling • Hypervisor ensures every guest receives CPU time: fixed allocation, uncapped vs. capped, variable allocation • Different schedulers can be used depending on the goal: fairness, response time / deadline, shares
  27. 27. Review: C4 Instances • Custom Intel E5-2666 v3 at 2.9 GHz • P-state and C-state controls
      Model | vCPU | Memory (GiB) | EBS (Mbps)
      c4.large | 2 | 3.75 | 500
      c4.xlarge | 4 | 7.5 | 750
      c4.2xlarge | 8 | 15 | 1,000
      c4.4xlarge | 16 | 30 | 2,000
      c4.8xlarge | 36 | 60 | 4,000
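Treating instance selection as tuning can be made concrete with a small lookup over the C4 figures above. This is a minimal sketch, not a sizing tool: `smallest_fit` is a hypothetical helper, the catalog holds only the five C4 rows from this slide, and real selection should be driven by measured utilization rather than by static requirements.

```python
# (model, vCPU, memory in GiB, EBS bandwidth in Mbps), smallest first,
# copied from the C4 table on this slide.
C4 = [
    ("c4.large", 2, 3.75, 500),
    ("c4.xlarge", 4, 7.5, 750),
    ("c4.2xlarge", 8, 15, 1000),
    ("c4.4xlarge", 16, 30, 2000),
    ("c4.8xlarge", 36, 60, 4000),
]


def smallest_fit(min_vcpu, min_memory_gib, min_ebs_mbps=0, catalog=C4):
    """Return the first (smallest) model meeting every requirement, else None."""
    for model, vcpu, memory_gib, ebs_mbps in catalog:
        if (vcpu >= min_vcpu
                and memory_gib >= min_memory_gib
                and ebs_mbps >= min_ebs_mbps):
            return model
    return None


if __name__ == "__main__":
    # A workload needing 6 vCPUs and 10 GiB of memory.
    print(smallest_fit(6, 10))
```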
  28. 28. What’s new in C4: P-state and C-state control • By entering deeper idle states, non-idle cores can achieve up to 300 MHz higher clock frequencies • But deeper idle states require more time to exit and may not be appropriate for latency-sensitive workloads
  29. 29. Tip: P-state control for AVX2 • If an application makes heavy use of AVX2 on all cores, the processor may attempt to draw more power than it should • Processor will transparently reduce frequency • Frequent changes of CPU frequency can slow an application
  30. 30. Review: T2 Instances • Lowest cost EC2 Instance at $0.013 per hour • Burstable performance • Fixed allocation enforced with CPU Credits
      Model | vCPU | CPU Credits / Hour | Memory (GiB) | Storage
      t2.micro | 1 | 6 | 1 | EBS Only
      t2.small | 1 | 12 | 2 | EBS Only
      t2.medium | 2 | 24 | 4 | EBS Only
      t2.large | 2 | 36 | 8 | EBS Only
  31. 31. How Credits Work • A CPU Credit provides the performance of a full CPU core for one minute • An instance earns CPU credits at a steady rate • An instance consumes credits when active • Credits expire (leak) after 24 hours [Chart labels: Baseline Rate, Burst Rate, Credit Balance]
  32. 32. Tip: Monitor CPU credit balance
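The credit rules on the T2 slides reduce to simple arithmetic: one credit buys one full core for one minute, so an earn rate of 6 credits per hour sustains 6/60 = 10% of a core. The sketch below works that out and steps a balance minute by minute. It is a simplified model under stated assumptions: the function names are hypothetical, the 24-hour expiry is approximated as a cap of 24 hours' worth of earned credits, and the real accounting inside EC2 may differ.

```python
# CPU credits earned per hour, from the T2 table on slide 30.
CREDITS_PER_HOUR = {"t2.micro": 6, "t2.small": 12, "t2.medium": 24, "t2.large": 36}


def baseline_percent(instance_type):
    """Sustained CPU, as a percentage of one full core, that the earn rate pays for."""
    return CREDITS_PER_HOUR[instance_type] * 100 / 60


def simulate_balance(instance_type, cores_used_by_minute, start_balance=0.0):
    """Return the credit balance after each minute.

    cores_used_by_minute: per-minute CPU use in full cores
    (1.0 = one vCPU at 100% for that minute).
    Assumption: expiry after 24 hours is modeled as a balance cap.
    """
    earn_per_minute = CREDITS_PER_HOUR[instance_type] / 60
    cap = CREDITS_PER_HOUR[instance_type] * 24
    balance = start_balance
    history = []
    for cores_used in cores_used_by_minute:
        balance = min(cap, balance + earn_per_minute)  # steady earning
        balance = max(0.0, balance - cores_used)       # spending while active
        history.append(balance)
    return history


if __name__ == "__main__":
    print(f"t2.micro baseline: {baseline_percent('t2.micro'):.0f}% of a core")
    burst = simulate_balance("t2.micro", [1.0] * 60, start_balance=30.0)
    print(f"balance after a 60-minute full burst: {burst[-1]:.1f}")
```

In this model a t2.micro that starts with 30 credits and runs flat out loses 0.9 credits per minute, so the balance reaches zero after roughly 34 minutes, which is why the balance is worth monitoring.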
  33. 33. Monitoring CPU Performance in Guest • Indicators that work is being done • User time • System time (kernel mode) • Wait I/O, threads blocked on disk I/O • Else, Idle • What happens if OS is scheduled off the CPU?
  34. 34. Tip: How to interpret Steal Time • Fixed CPU allocations can be offered through CPU caps • Steal time happens when a CPU cap is enforced • Leverage CloudWatch metrics
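Inside a Linux guest, steal time appears as the eighth counter on the aggregate `cpu` line of `/proc/stat` (after user, nice, system, idle, iowait, irq, and softirq). The sketch below computes the steal share between two samples of that line. The helper names are hypothetical, and the guest counters that follow steal are left out because Linux already includes guest time in user time.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # old kernels report fewer fields
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Share of elapsed CPU ticks between two samples that were stolen."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    return 100.0 * deltas["steal"] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

A typical use is to call `parse_cpu_line(read_cpu_line())` twice, a few seconds apart, and pass both results to `steal_percent`. On a T2 instance with an empty credit balance, a high value is the cap being enforced rather than a noisy neighbor.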
  35. 35. Delivering I/O Performance with Amazon EC2 Instances
  36. 36. I/O and Devices Virtualization • Scheduling I/O requests between virtual devices and shared physical hardware • Split driver model • Intel VT-d • Direct pass through and IOMMU for dedicated devices • Enhanced Networking
  37. 37. Split Driver Model [Diagram labels: Hardware, Driver Domain, Guest Domain, Guest Domain, VMM, Frontend driver, Frontend driver, Backend driver, Device Driver, Physical CPU, Physical Memory, Network Device, Virtual CPU, Virtual Memory, CPU Scheduling]
  38. 38. Split Driver Model • Each virtual device has two main components • Communication ring buffer • An event channel signaling activity in the ring buffer • Data is transferred through shared pages • Shared pages require inter-domain permissions, or granting
  39. 39. Review: I2 Instances • 16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD • 365K random read IOPS for 32 vCPU instance
      Model | vCPU | Memory (GiB) | Storage (GB) | Read IOPS | Write IOPS
      i2.xlarge | 4 | 30.5 | 1 x 800 SSD | 35,000 | 35,000
      i2.2xlarge | 8 | 61 | 2 x 800 SSD | 75,000 | 75,000
      i2.4xlarge | 16 | 122 | 4 x 800 SSD | 175,000 | 155,000
      i2.8xlarge | 32 | 244 | 8 x 800 SSD | 365,000 | 315,000
  40. 40. Granting in pre-3.8.0 Kernels • Requires “grant mapping” prior to 3.8.0 • Grant mappings are expensive operations due to TLB flushes [Diagram label: read(fd, buffer,…)]
  41. 41. Granting in 3.8.0+ Kernels, Persistent and Indirect • Grant mappings are set up in a pool once • Data is copied in and out of the grant pool [Diagram labels: read(fd, buffer…); Copy to and from grant pool]
  42. 42. Tip: Use 3.8+ kernel • Amazon Linux 13.09 or later • Ubuntu 14.04 or later • RHEL7 or later • Etc.
  43. 43. Event Handling • Guest vCPUs are interrupted to process events. • Pre-2.6.36 kernels: notifications went to a single virtual hardware interrupt • Post-2.6.36 kernels: allow instance to tell hypervisor to deliver notification to a specific vCPU for balancing • Check "dmesg" for the following text: "Xen HVM callback vector for event delivery is enabled" • Also, check that the version of irqbalance is 1.0.7 or higher
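The kernel checks from the last two tips (3.8 or later for persistent grants, 2.6.36 or later plus the callback message for event delivery) are easy to script. The sketch below is one possible way to do it: the function names are hypothetical, the version parser only looks at the leading numeric part of a release string such as the one returned by `platform.release()`, and the dmesg text is passed in as a string so that collecting it is left to the caller.

```python
import re

# Message quoted on the Event Handling slide.
CALLBACK_MESSAGE = "Xen HVM callback vector for event delivery is enabled"


def kernel_version(release):
    """Parse '3.13.0-24-generic' into (3, 13, 0); a missing patch level becomes 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def kernel_at_least(release, minimum):
    """True if the release is at or above minimum, e.g. (3, 8) or (2, 6, 36)."""
    padded = tuple(minimum) + (0,) * (3 - len(minimum))
    return kernel_version(release) >= padded


def callback_vector_enabled(dmesg_text):
    """True if the kernel log reports per-vCPU event delivery."""
    return CALLBACK_MESSAGE in dmesg_text


if __name__ == "__main__":
    import platform

    release = platform.release()
    try:
        print(f"{release}: persistent grants possible = {kernel_at_least(release, (3, 8))}")
    except ValueError:
        print(f"{release}: not a Linux-style kernel release string")
```

For the dmesg check, the output of the `dmesg` command can be captured and passed to `callback_vector_enabled`; on some systems reading the kernel log requires root.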
  44. 44. Split Driver Model: Networking [Diagram labels: Hardware, Driver Domain, Guest Domain, Guest Domain, VMM, Frontend driver, Frontend driver, Backend driver, Device Driver, Physical CPU, Physical Memory, Network Device, Virtual CPU, Virtual Memory, CPU Scheduling, Sockets, Application]
  49. 49. Device Pass Through: Enhanced Networking • SR-IOV eliminates need for driver domain • Physical network device exposes virtual function to instance • Requires a specialized driver, which means: • Your instance OS needs to know about it • EC2 needs to be told your instance can use it
  50. 50. After Enhanced Networking [Diagram labels: Hardware, Driver Domain, Guest Domain, Guest Domain, VMM, Frontend driver, NIC Driver, Backend driver, Device Driver, Physical CPU, Physical Memory, SR-IOV Network Device, Virtual CPU, Virtual Memory, CPU Scheduling, Sockets, Application]
  53. 53. Tip: Use Enhanced Networking • Highest packets-per-second • Lowest variance in latency • Instance OS must support it • Look for SR-IOV property of instance or image
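Whether the guest is actually using the SR-IOV virtual function can be checked from inside a Linux instance by looking at which kernel driver is bound to the network interface. The sketch below reads the standard sysfs link `/sys/class/net/<interface>/device/driver`. The function names are hypothetical, and the default driver list is an assumption: `ixgbevf` is the Intel virtual function driver used for enhanced networking on instance types of this generation, and other hardware would need other names.

```python
import os


def nic_driver(interface, sys_class_net="/sys/class/net"):
    """Return the kernel driver bound to a network interface, or None if unknown."""
    link = os.path.join(sys_class_net, interface, "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def uses_vf_driver(interface, sys_class_net="/sys/class/net", vf_drivers=("ixgbevf",)):
    """True if the interface is bound to a known SR-IOV virtual function driver."""
    return nic_driver(interface, sys_class_net) in vf_drivers


if __name__ == "__main__":
    for name in ("eth0",):
        print(f"{name}: driver={nic_driver(name)} sriov_vf={uses_vf_driver(name)}")
```

This covers only the "instance OS needs to know about it" half of the requirement. The EC2 side is a separate flag on the instance or image (the `sriovNetSupport` attribute), which has to be checked through the EC2 API or console.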
  54. 54. Summary
  55. 55. Instance Selection = Performance Tuning • Find an instance type and workload combination – Define performance – Monitor resource utilization – Make changes
  56. 56. Recap: Getting the Most Out of EC2 Instances • PV-HVM • Time keeping: use TSC • C state and P state controls • Monitor T2 CPU credits • Persistent grants for I/O performance • Event callbacks and IRQ balancing • Enhanced Networking
  57. 57. Next steps • Visit the EC2 Instance Documentation • Come visit us in the Developer Chat to hear more
  58. 58. Thank you!
  59. 59. Remember to complete your evaluations!
