Best Practices for Application Management in AWS


A recent talk I gave on AWS best practices for LOPSA Columbus.

Published in: Technology

  2. INTRODUCTION BRAD ADAIR ▸ Director of Infrastructure Services at IQ Innovations, LLC. ▸ Have been working in IT for 12+ years in various areas ranging from desktop support to system administration to management. ▸ AWS Certified Solutions Architect ▸ Have been working heavily in AWS for about 2.5 years. ▸ Email: ▸ Twitter: @bpadair
  3. INTRODUCTION APPLICATION MANAGEMENT IN AWS ▸ Public cloud in general, and AWS in particular, are changing the way that we think about infrastructure and the way we manage the applications that run on that infrastructure. ▸ Less permanence, more ephemeral and temporary. ▸ More purpose-built and dedicated resources. ▸ Less “make it fit”
  4. INTRODUCTION FOUR DOMAINS ▸ Performance ▸ Security ▸ Reliability ▸ Scalability
  5. PERFORMANCE WHAT DO WE MEAN? ▸ What do we mean when we talk about performance? ▸ Getting as much power as possible? ▸ Getting just enough? ▸ What about growth?
  6. PERFORMANCE GENERAL GUIDANCE ▸ Use Trusted Advisor to find (somewhat) obvious performance issues. ▸ Things like over-utilized instances, excessive security group rules, and cache-hit ratio can be found here. ▸ Plan for performance to scale, not grow. ▸ Monitor, monitor, monitor.
  7. PERFORMANCE DATABASES ▸ Need special consideration. ▸ RDS, DynamoDB, EC2 instance. ▸ If using EC2, use provisioned IOPS and RAID-0 volumes. ▸ Do not put databases on EFS file systems. ▸ Replication - yes/no - where?
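As a rough illustration of why the slide pairs provisioned IOPS with RAID-0: striping spreads I/O across every member volume, so capacity and the IOPS ceiling both add up, at the cost of losing the whole array if any one volume fails. The sketch below is only back-of-the-envelope arithmetic. The function name, the volume figures, and the optional instance-level cap are all illustrative assumptions, not values from the talk.

```python
def raid0_estimate(volume_count, iops_per_volume, size_gib_per_volume,
                   instance_iops_cap=None):
    """Estimate usable size and IOPS ceiling for a RAID-0 stripe of identical volumes.

    instance_iops_cap is a hypothetical per-instance limit; when given, the
    stripe cannot exceed it no matter how many volumes are added.
    """
    if volume_count < 1:
        raise ValueError("need at least one volume")
    total_iops = volume_count * iops_per_volume
    if instance_iops_cap is not None:
        total_iops = min(total_iops, instance_iops_cap)
    return {
        "size_gib": volume_count * size_gib_per_volume,  # RAID-0 has no parity overhead
        "iops": total_iops,
    }


# Four hypothetical 500 GiB volumes provisioned at 3000 IOPS each.
print(raid0_estimate(4, 3000, 500))
print(raid0_estimate(4, 3000, 500, instance_iops_cap=10000))
```

Because RAID-0 has no redundancy, a setup like this normally relies on replication or snapshots for durability, which ties back to the replication question on the slide.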
  8. PERFORMANCE CASE STUDY: IQ INNOVATIONS ▸ Two data centers and a public cloud provider. ▸ All CentOS running on ESXi. ▸ MySQL database. ▸ Apache, Tomcat, Grails stack on app servers. ▸ One client's configuration: 8 servers dedicated to MySQL, 14 app servers, 1 NFS server, 2 utility servers. ▸ Performance was terrible. ▸ Average app response time: ~600ms ▸ Average end-user response time: ~4s ▸ Constantly running out of memory and restarting ▸ Nowhere to grow
  9. PERFORMANCE CASE STUDY: IQ INNOVATIONS ▸ Moved to AWS. Eliminated the colocation space and other cloud provider. ▸ Still running MySQL and CentOS. ▸ Databases moved to RDS. Application servers moved to EC2. ▸ Same client configuration: 6 RDS instances for databases, 4 app servers, 1 utility server, EFS to replace SAN. ▸ Performance improved dramatically: ▸ App response time: ~80-100ms ▸ End-user response time: ~1-2s ▸ No more memory issues. ▸ Cost savings of about 50%.
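To put the before and after figures from the two case-study slides side by side, the arithmetic can be sketched as below. The inputs are the approximate numbers quoted on the slides (the "~" values are treated as exact here, which is an assumption), and the helper name is illustrative.

```python
def pct_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Server counts for the one client configuration described on the slides.
servers_before = 8 + 14 + 1 + 2   # MySQL + app + NFS + utility
servers_after = 6 + 4 + 1         # RDS instances + app + utility (EFS is a managed service)

print("servers:", servers_before, "->", servers_after)
print("server count reduction: %.0f%%" % pct_reduction(servers_before, servers_after))

# App response time: ~600 ms before, ~80-100 ms after.
print("app response reduction: %.1f%% to %.1f%%"
      % (pct_reduction(600, 100), pct_reduction(600, 80)))

# End-user response time: ~4 s before, ~1-2 s after.
print("end-user response reduction: %.0f%% to %.0f%%"
      % (pct_reduction(4, 2), pct_reduction(4, 1)))
```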
  10. SECURITY HAVEN’T WE BEEN DOING THIS FOREVER? ▸ Yes, and a lot of existing knowledge still applies. ▸ You still need smart policies. ▸ Your application still needs to protect against common attack vectors. ▸ Some things do change with a move to AWS, however. ▸ You are no longer responsible for physical security. ▸ You are no longer responsible for hypervisor security or patching. ▸ Depending on the service you may not even be responsible for OS security and patching.
  11. SECURITY BEST PRACTICES ▸ Trusted Advisor. This is a recurring theme. ▸ Bastion hosts ▸ VPC ▸ Peering ▸ Security groups ▸ NACLs ▸ COMMON SENSE!
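One way the bastion-host and security-group items fit together: if SSH is meant to go through a bastion, no other security group should accept port 22 from the whole internet. The sketch below checks for that. It assumes the groups have already been fetched and follow the dictionary shape returned by the EC2 DescribeSecurityGroups API (`GroupId`, `IpPermissions`, `IpRanges`, `Ipv6Ranges`); the sample data and function names are made up for illustration.

```python
def open_to_world(rule, port):
    """True if an ingress rule allows `port` over TCP from any address."""
    proto = rule.get("IpProtocol")
    if proto == "-1":          # "-1" means all protocols and all ports
        covers_port = True
    elif proto == "tcp":
        covers_port = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
    else:
        covers_port = False
    if not covers_port:
        return False
    world_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
    world_v6 = any(r.get("CidrIpv6") == "::/0" for r in rule.get("Ipv6Ranges", []))
    return world_v4 or world_v6


def exposed_groups(groups, port=22):
    """Return the IDs of security groups that expose `port` to the internet."""
    return [
        g["GroupId"]
        for g in groups
        if any(open_to_world(r, port) for r in g.get("IpPermissions", []))
    ]


# Hypothetical sample data in the DescribeSecurityGroups shape.
sample_groups = [
    {"GroupId": "sg-app", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]
print(exposed_groups(sample_groups))
```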
  12. SECURITY COMMON MISTAKES ▸ Console access for everyone. ▸ Overly permissive policies. ▸ Lack of two-factor authentication. ▸ Overly or publicly exposed access keys. ▸ No access key rotation.
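Two of these mistakes lend themselves to simple automated checks. The sketch below flags IAM policy statements that allow every action on every resource, and lists access keys older than a chosen age. It works on plain dictionaries: the policy follows the standard IAM JSON policy layout, and the key records use the `AccessKeyId` and `CreateDate` fields that IAM returns when listing access keys. The 90-day threshold, the key IDs, and the function names are assumptions for illustration, not recommendations from the talk.

```python
from datetime import datetime, timedelta, timezone


def _as_list(value):
    """IAM allows a single value or a list in most policy fields."""
    return value if isinstance(value, list) else [value]


def overly_permissive_statements(policy):
    """Return Allow statements that grant "*" actions on "*" resources."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions and "*" in resources:
            flagged.append(stmt)
    return flagged


def stale_keys(keys, now, max_age_days=90):
    """Return IDs of access keys created more than `max_age_days` ago."""
    limit = timedelta(days=max_age_days)
    return [k["AccessKeyId"] for k in keys if now - k["CreateDate"] > limit]


# Hypothetical inputs.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"},
    ],
}
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
keys = [
    {"AccessKeyId": "key-old", "CreateDate": now - timedelta(days=200)},
    {"AccessKeyId": "key-new", "CreateDate": now - timedelta(days=10)},
]
print(len(overly_permissive_statements(policy)), stale_keys(keys, now))
```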
  13. RELIABILITY EASIER AND HARDER SIMULTANEOUSLY ▸ A lot of the work for reliability is done for you. ▸ It is a mistake to put too much trust in this. ▸ The tools are there, but you have to choose to use them. ▸ Architecture matters.
  14. RELIABILITY CRITICAL THINGS TO UNDERSTAND ▸ Availability zones ▸ Regions ▸ Difference between AZs and Regions and how they should be used together. ▸ Replication of different services. ▸ Availability SLAs. ▸ S3 storage classes/levels
  15. RELIABILITY CASE STUDY: CONFIDENTIAL COMPANY ▸ Pre-AWS: ▸ Only in one data center due to cost. ▸ Had clients nationwide, but all resources were centralized. ▸ Had to have 4 or more hours of downtime for deployments. ▸ Many SPoF including storage and network. Redundancy was attempted but not done well.
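To make the availability point concrete, here is a small sketch of the usual composition arithmetic: components that must all work multiply their availabilities, while redundant copies (for example, the same tier in two availability zones) only fail together. It assumes failures are independent, which real systems only approximate, and the percentages are illustrative placeholders rather than actual AWS SLA figures.

```python
import math


def serial(availabilities):
    """Availability of a chain where every component must be up."""
    return math.prod(availabilities)


def parallel(availabilities):
    """Availability of redundant components where any one being up is enough."""
    return 1 - math.prod(1 - a for a in availabilities)


def downtime_hours_per_year(availability):
    """Expected yearly downtime for a given availability (365-day year)."""
    return (1 - availability) * 365 * 24


single_az = 0.99                          # hypothetical single-AZ tier
two_az = parallel([single_az, single_az])  # same tier deployed in two AZs
print(two_az, downtime_hours_per_year(single_az), downtime_hours_per_year(two_az))

# A request that must pass through a load balancer and then the two-AZ tier.
print(serial([0.9999, two_az]))
```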
  16. RELIABILITY CASE STUDY: CONFIDENTIAL COMPANY ▸ AWS Setup: ▸ Multiple VPCs spread across multiple regions to provide redundancy and be close to customers. ▸ VPC peering to reduce single points of failure. ▸ Multi-AZ RDS instances for databases. ▸ EFS for network-based storage. ▸ Replication of databases across regions. ▸ IaC templates for VPCs to allow for rapid reproduction in other regions.
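The slide does not say which IaC tool was used, so the following is only a sketch of the idea, assuming CloudFormation-style JSON templates: one parameterized template is rendered per region, and the CIDR blocks are checked for overlap first because peered VPCs cannot have overlapping address ranges. The region list, CIDR ranges, and logical resource name `AppVpc` are hypothetical.

```python
import ipaddress
import json
from itertools import combinations


def vpc_template(name, cidr):
    """Build a minimal CloudFormation-style template containing a single VPC."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AppVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": cidr,
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                    "Tags": [{"Key": "Name", "Value": name}],
                },
            }
        },
    }


def assert_no_overlap(cidrs):
    """Raise ValueError if any two CIDR blocks overlap."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    for a, b in combinations(networks, 2):
        if a.overlaps(b):
            raise ValueError(f"{a} overlaps {b}")


# Hypothetical region-to-CIDR plan.
REGION_CIDRS = {"us-east-1": "10.0.0.0/16", "us-west-2": "10.1.0.0/16"}

assert_no_overlap(REGION_CIDRS.values())
templates = {
    region: vpc_template(f"app-{region}", cidr)
    for region, cidr in REGION_CIDRS.items()
}
print(json.dumps(templates["us-west-2"], indent=2))
```

Adding a region then comes down to adding one entry to the plan, which is the "rapid reproduction" benefit the slide describes.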
  17. SCALABILITY WHAT IS SCALABILITY ▸ Scalability is about more than simply adding more resources in response to increased demand. ▸ Scalability needs to include both scaling up and scaling down. ▸ Goal is to maximize user experience while minimizing cost.
  18. SCALABILITY DIFFERENT APPROACH ▸ Provision with small spikes in mind, but not growth. ▸ Scale to growth. ▸ Schedule scale downs and scale ups. ▸ Auto-scaling is your friend. ▸ Monitor, monitor, monitor. Don’t alert, alert, alert.
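A rough sketch of how scheduled and metric-driven scaling can combine: a schedule sets the minimum fleet size for the time of day, and a simple proportional rule sizes the fleet so the average metric moves back toward its target, in both directions. This is a simplified model for illustration and not the algorithm AWS Auto Scaling uses. The business hours, capacities, and target value are assumptions.

```python
import math


def scheduled_floor(hour, business_min=4, off_hours_min=1):
    """Minimum instance count by hour of day (hypothetical 08:00-18:00 business hours)."""
    return business_min if 8 <= hour < 18 else off_hours_min


def desired_capacity(current, metric, target, min_size, max_size):
    """Proportional rule: resize so the per-instance metric approaches `target`."""
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


# 4 instances averaging 90% CPU against a 60% target at 10:00: scale out.
print(desired_capacity(4, 90, 60, scheduled_floor(10), 10))
# 4 instances averaging 15% CPU at 02:00: scale in to the off-hours floor.
print(desired_capacity(4, 15, 60, scheduled_floor(2), 10))
```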
  19. SCALABILITY COMMON MISTAKES ▸ Over-provisioning. ▸ Reserving too quickly. ▸ Planning for vertical scaling as opposed to horizontal. ▸ Provisioning for growth instead of planning for it. ▸ Manual intervention. ▸ Under-analysis of utilization.
  20. QUESTIONS?