Lessons Learned Managing Large AWS Environments

How do you optimize management of 500+ AWS servers? In this presentation I share my experiences using Amazon Web Services, covering techniques for web scale. Learn how to optimize your costs, handle security, automate, and be prepared for handling failure.

Published in: Technology, Business

Lessons Learned Managing Large AWS Environments

  1. 1. Lessons learned managing large AWS Environments Ronald Bradford http://ronaldbradford.com @RonaldBradford 2013.06
  2. 2. SCOPE: Consulting experiences with AWS; several different clients; largest 500+ servers; some 40-50+ servers; some 2-5 servers; LAMP/RoR/RDS/Windows. EffectiveMySQL.com - Performance, Scalability & Business Continuity
  3. 3. ABOUT MYSELF: Ronald Bradford; Enterprise Data Architecture; 24 years with RDBMS, 13 years with MySQL; using AWS 4+ years; published author (4 books); accomplished presenter (8 years); works as an independent MySQL consultant
  4. 4. Covering: 1. Products; 2. Cost; 3. Web Scale; 4. Security; 5. Instrumentation; 6. Failure
  5. 5. Part 1: AWS Products & Ecosystem
  6. 6. ABOUT AWS (Amazon Web Services): 30+; many, many products and features; EC2, S3, EBS, ELB, RDS, EMR, VPC, CDN, SWF, SQS, SES, SNS, IAM, ...; Mechanical Turk; Flexible Payments Service (FPS)
  7. 7. AWS CONSOLE: May 2013 compared with Aug 2012
  8. 8. Announcements: Product announcements (http://aws.amazon.com/about-aws/newsletters/); pricing changes; new instance types; new features (e.g. IOPS); new products (e.g. Redshift, OpsWorks)
  9. 9. Announcements (examples in presentation): Product announcements (http://aws.amazon.com/about-aws/newsletters/); pricing changes; new instance types; new features (e.g. IOPS); new products (e.g. Redshift, OpsWorks)
  10. 10. ECOSYSTEM: AWS Marketplace (https://aws.amazon.com/marketplace/); over 800
  11. 11. Product growth: When I started there was no RDS, In-memory Cache, DynamoDB or Glacier; no Elastic Beanstalk or OpsWorks; no management console
  12. 12. Part 2: AWS Costs
  13. 13. Operating cost: Are you monitoring your costs? Daily; hourly
  14. 14. Operating Cost: $ ec2_cost.sh (https://github.com/ronaldbradford/aws)
  15. 15. Operating Cost: $29,000 p.m.; $ ec2_cost.sh (https://github.com/ronaldbradford/aws)
  16. 16. Your Money: What is AWS costing you? Instance types/sizes (http://aws.amazon.com/ec2/instance-types); cost options (http://aws.amazon.com/ec2/pricing)
  17. 17. Instance Types: General-purpose; compute-optimized; memory-optimized; storage-optimized; GPU
  18. 18. $ Instance Prices: Large Instance (m1.large)
  19. 19. $ Instance Prices, Large Instance (m1.large): On Demand $0.24; Reserved $0.136 * + annual contract (+$0.043); Spot $0.03+ * (can be terminated; budget); per hour investment
  22. 22. $ Instance Prices, Large Instance (m1.large): On Demand $0.24 (was $0.32 until 11/19/2012, was $0.26 until 1/16/2013); Reserved $0.136 * + annual contract (+$0.043), Light/Medium/Heavy utilization, 40% saving; Spot $0.03+ * (can be terminated; budget), up to 80+% saving; per hour investment
  23. 23. SPOT EXAMPLE: One hour (24 cents) buys 1 x Large - Reserved (7.5G, 4 CPUs, 850G), or 8 x Large - Spot, or 1 x Eight Extra Large - Spot (cc2.8xlarge: 60G, 88 CPUs, 3.4T, 10Gb NIC)
  24. 24. SPOT EXAMPLE: One hour (24 cents) buys 1 x Large - Reserved (7.5G, 4 CPUs, 850G), or 8 x Large - Spot, or 1 x Eight Extra Large - Spot (cc2.8xlarge: 60G, 88 CPUs, 3.4T, 10Gb NIC); note: price has changed 3 times in 8 months
  25. 25. SPOT HISTORY (2013): 3c to 10c in Zone A; 3c to 40c in Zone B
      $ ec2-describe-spot-price-history -t m1.large -d Linux/UNIX
      SPOTINSTANCEPRICE  0.030000  2013-05-28T17:20:41-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.100000  2013-05-28T17:07:02-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.030000  2013-05-28T16:37:51-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.100000  2013-05-28T16:31:03-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.030000  2013-05-28T16:24:48-0500  m1.large  Linux/UNIX  us-east-1d
      SPOTINSTANCEPRICE  0.030000  2013-05-28T16:24:48-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.100000  2013-05-28T16:15:03-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.060000  2013-05-28T16:08:34-0500  m1.large  Linux/UNIX  us-east-1d
      SPOTINSTANCEPRICE  0.030000  2013-05-28T16:01:59-0500  m1.large  Linux/UNIX  us-east-1b
      SPOTINSTANCEPRICE  0.240000  2013-05-28T15:55:12-0500  m1.large  Linux/UNIX  us-east-1b
      SPOTINSTANCEPRICE  0.030000  2013-05-28T15:48:32-0500  m1.large  Linux/UNIX  us-east-1b
      SPOTINSTANCEPRICE  0.030000  2013-05-28T15:42:07-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.045000  2013-05-28T15:35:47-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.050000  2013-05-28T15:35:47-0500  m1.large  Linux/UNIX  us-east-1b
      SPOTINSTANCEPRICE  0.400000  2013-05-28T15:29:15-0500  m1.large  Linux/UNIX  us-east-1b
      SPOTINSTANCEPRICE  0.260000  2013-05-28T15:22:47-0500  m1.large  Linux/UNIX  us-east-1b
      SPOTINSTANCEPRICE  0.030000  2013-05-28T15:16:01-0500  m1.large  Linux/UNIX  us-east-1d
      SPOTINSTANCEPRICE  0.030000  2013-05-28T15:16:01-0500  m1.large  Linux/UNIX  us-east-1a
      SPOTINSTANCEPRICE  0.026000  2013-05-28T15:09:30-0500  m1.large  Linux/UNIX  us-east-1a
  26. 26. SPOT HISTORY (2012): 2.6c to 17c (1/2 of 34c); one AZ only
      $ ec2-describe-spot-price-history -t m1.large -d Linux/UNIX
      0.0260  2012-09-27T09:45:46-0800  m1.large  Linux/UNIX  us-east-1b
      0.0260  2012-09-27T09:45:46-0800  m1.large  Linux/UNIX  us-east-1d
      0.0290  2012-09-27T09:38:37-0800  m1.large  Linux/UNIX  us-east-1b
      0.0370  2012-09-27T09:38:37-0800  m1.large  Linux/UNIX  us-east-1d
      0.0600  2012-09-27T09:31:29-0800  m1.large  Linux/UNIX  us-east-1b
      0.1700  2012-09-27T09:31:29-0800  m1.large  Linux/UNIX  us-east-1d
      0.1600  2012-09-27T09:24:20-0800  m1.large  Linux/UNIX  us-east-1d
      0.0600  2012-09-27T09:17:11-0800  m1.large  Linux/UNIX  us-east-1b
      0.0900  2012-09-27T09:17:11-0800  m1.large  Linux/UNIX  us-east-1d
      0.0260  2012-09-27T09:09:55-0800  m1.large  Linux/UNIX  us-east-1c
      0.0260  2012-09-27T09:09:55-0800  m1.large  Linux/UNIX  us-east-1b
  27. 27. Using SPOTS: Is your volume predictable? Splitting on-demand/spot instances; can work be done asynchronously, i.e. can it be queued? Is work restartable?
  28. 28. Using SPOTS: Is your volume predictable? Splitting on-demand/spot instances; can work be done asynchronously, i.e. can it be queued? Is work restartable? WARNING: Not for general workloads
  29. 29. Instance sizes: Evaluating the right instance size; what is your bottleneck?
  30. 30. Instance sizes: Evaluating the right instance size; what is your bottleneck? Developing a tool to recommend savings
  31. 31. TRUSTED ADVISOR: AWS now offers Trusted Advisor; recommendations to save money, improve performance and close security problems (http://aws.amazon.com/premiumsupport/trustedadvisor/)
  32. 32. COST SAVINGS: Other players: http://www.newvem.com/ and http://www.cloudyn.com/
  33. 33. OTHER COST SAVINGS: CDN - CloudFront; bandwidth (http://aws.amazon.com/cloudfront/); reduce response size (e.g. 10%); storage: old EBS snapshots; remove unused instances; NEW (announced 1/9/2013): CloudWatch Alarm Actions
  34. 34. Part 3: Web Scale (hint: no humans)
  35. 35. ABOUT WEB SCALE: GUI = #FAIL; CLI is necessary; manual CLI use is slow; automation is crucial; parallel
  36. 36. AWS CLIs: Different for EC2, ELB, RDS etc.; updated frequently (i.e. monthly)
      $ git clone https://github.com/ronaldbradford/aws.git
      $ cd aws/scripts
      $ ./aws_cli_configure.sh
  37. 37. AWS CLIs: Different for EC2, ELB, RDS etc.; updated frequently (i.e. monthly); simple helper:
      $ git clone https://github.com/ronaldbradford/aws.git
      $ cd aws/scripts
      $ ./aws_cli_configure.sh
  38. 38. RTFM: http://aws.amazon.com/archives/Amazon-EC2
  39. 39. Identifiers: Access Key ID; Secret Access Key; X.509 Certificates (2 of): Private (*) & Public; AWS Account ID; Canonical User ID; see https://portal.aws.amazon.com/gp/aws/securityCredentials
  40. 40. CLI Examples, Launch Script: On-demand/spot or switch between; verify SSH; verify MySQL; verify replication in sync; add to ELB
  41. 41. CLI Examples, Audit Script: Consolidates information; parallel operations; unused EC2/EBS etc.; feeds reporting; ELB/EC2 usage
  42. 42. CLI Examples, Others: Cost measurement; cloning (optimizes scale-up); move servers between load balancers; spot history graphing; spot history email alerts
  43. 43. Part 4: AWS Security
  44. 44. SECURITY: Do not give away the front door keys; do not open all the windows
  45. 45. SECURITY OPTIONS (learn the different benefits): Keypairs; security groups; Virtual Private Cloud (VPC); Identity and Access Management (IAM); multi-factor authentication (http://aws.amazon.com/mfa/)
  46. 46. SECURITY TIPS: Restrict open access to ports 80/443; jump box; restrict IP access; additional authentication; per-user SSH authentication; do not use keypair
  47. 47. Products: Many others (AWS Summit 2013): Cloudaware, Dome9, Enstratius, SafeNet, AlertLogic
  48. 48. Part 5: Instrumentation
  49. 49. Instrumentation
  50. 50. Instrumentation: What is important to you?
  51. 51. Instrumentation: What is important to you? All server stats
  52. 52. Instrumentation: What is important to you? All server stats; sampling issues
  53. 53. Instrumentation: What is important to you? All server stats; sampling issues; deceiving averages (frequency)
  54. 54. REQUESTS PER SEC: 5 second averages, not 1 minute sample (https://github.com/ronaldbradford/reqstat)
  56. 56. REQUESTS PER SEC: -1,500 RPS; 5 second averages, not 1 minute sample (https://github.com/ronaldbradford/reqstat)
  57. 57. Outliers
  58. 58. Outliers: I care about these
  59. 59. TESTING: End to end testing critical; network latency; ELB performance
  60. 60. Products: AWS CloudWatch; many others (AWS Summit 2013): Datadog, AppDynamics, Boundary, CopperEgg
  61. 61. Products: AWS CloudWatch; what features matter? Many others (AWS Summit 2013): Datadog, AppDynamics, Boundary, CopperEgg
  62. 62. Part 6: Failure
  63. 63. FAILURE
  64. 64. FAILURE: Instances fail
  65. 65. FAILURE: Instances fail; outages occur; AWS scheduled reboots
  66. 66. FAILURE: Instances fail; outages occur; AWS scheduled reboots; be prepared; Chaos Monkey (http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html)
  67. 67. CONCLUSION
  68. 68. CONCLUSION: Cost management (saving money)
  69. 69. CONCLUSION: Cost management (saving money); CLI automation
  70. 70. CONCLUSION: Cost management (saving money); CLI automation; instrumentation (inc business metrics)
  71. 71. CONCLUSION: Cost management (saving money); CLI automation; instrumentation (inc business metrics); distribute your application & data
  72. 72. CONCLUSION: Cost management (saving money); CLI automation; instrumentation (inc business metrics); distribute your application & data; disaster is inevitable
  73. 73. AWS for FREE (http://aws.amazon.com/free/): Free EC2 t1.micro for a year; free RDS t1.micro for a year; S3, DynamoDB, SimpleDB, +++
  74. 74. Ronald Bradford http://effectiveMySQL.com
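
The spot history on slide 25 can be summarised per availability zone. What follows is a minimal Python sketch, not the author's script. It assumes that the flattened price and zone columns in the transcript pair up row by row, and the function name `price_range_by_zone` is hypothetical.

```python
from collections import defaultdict

# Prices (dollars per hour) and zones copied from slide 25
# (m1.large, Linux/UNIX, 2013-05-28).
# Assumption: the n-th price belongs with the n-th zone.
PRICES = [0.03, 0.10, 0.03, 0.10, 0.03, 0.03, 0.10, 0.06, 0.03, 0.24,
          0.03, 0.03, 0.045, 0.05, 0.40, 0.26, 0.03, 0.03, 0.026]
ZONES = ["us-east-1a", "us-east-1a", "us-east-1a", "us-east-1a",
         "us-east-1d", "us-east-1a", "us-east-1a", "us-east-1d",
         "us-east-1b", "us-east-1b", "us-east-1b", "us-east-1a",
         "us-east-1a", "us-east-1b", "us-east-1b", "us-east-1b",
         "us-east-1d", "us-east-1a", "us-east-1a"]


def price_range_by_zone(prices, zones):
    """Return {zone: (lowest, highest)} spot price for each zone."""
    seen = defaultdict(list)
    for price, zone in zip(prices, zones):
        seen[zone].append(price)
    return {zone: (min(values), max(values)) for zone, values in seen.items()}


if __name__ == "__main__":
    ranges = price_range_by_zone(PRICES, ZONES)
    for zone, (low, high) in sorted(ranges.items()):
        print(f"{zone}: {low * 100:.1f}c to {high * 100:.1f}c per hour")
```

Grouping by zone before taking the minimum and maximum is what separates a quiet zone from a volatile one; a single range over all rows would hide that difference.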

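
The point made on slides 53 to 56, that 5 second averages reveal what a 1 minute sample hides, can be illustrated with a small Python sketch. The per-second request counts are invented for illustration, and `bucket_averages` is a hypothetical helper, not part of reqstat.

```python
def bucket_averages(per_second_counts, bucket_seconds):
    """Average requests per second over consecutive fixed-size buckets."""
    averages = []
    for start in range(0, len(per_second_counts), bucket_seconds):
        bucket = per_second_counts[start:start + bucket_seconds]
        averages.append(sum(bucket) / len(bucket))
    return averages


# Hypothetical minute of traffic: a steady 1,000 requests per second
# with a 5 second stall in the middle where nothing is served.
traffic = [1000] * 30 + [0] * 5 + [1000] * 25

one_minute = bucket_averages(traffic, 60)
five_second = bucket_averages(traffic, 5)

print("1 minute average:", round(one_minute[0]))
print("lowest 5 second average:", min(five_second))
```

In this made-up minute the single 1 minute figure still looks healthy, while one of the twelve 5 second buckets drops to zero. That is the kind of outlier the coarser sample averages away.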