AWS migration: getting to Data Center heaven with AWS and Chef


A success story: migrating an entire platform from a data center to AWS, with Chef managing the configuration.



  1. Juan Vicente Herrera Ruiz de Alejo @jvicenteherrera
  2. About Me ● @jvicenteherrera
  3. Description ● Social feed aggregation/recommendation app ● Client: a Global Fortune 500 company that makes video consoles, TVs and, many years ago, Walkmans… ● Around 1,000,000 registered users and 170,000 DAU expected on the platform by the end of 2013 ● All servers run in AWS; deployments and configuration management are handled by Chef.
  4. System stats
     ● Main components: Custom API (Java), Beanstalk, RabbitMQ, Redis, MongoDB (sharded)
     ● EC2:
       – Production env: reserved instances for the minimum configuration; on-demand instances for scale-out
       – Staging env: reserved instances for ½ day
       – Elastic Load Balancers
       – Security Groups and ACLs
       – Key pairs per subnet
       – Current EC2 region is us-east-1
  5. Main AWS products used
  6. Architecture/Infrastructure — [diagram: VPC subnets for DEV, staging and production (APP and DB tiers), each in its own security group; NAT instances per environment; a public subnet hosting DNS/VPN, Nexus, Git, Chef and Jenkins servers; a Nagios forwarder; and ELBs in front of the staging and production web servers]
  7. APP and DB VPC — [diagram: production APP and DB subnets, each in its own security group; the DB subnet holds two sharded MongoDB replica sets with arbiters and three config servers, MySQL master/slave, LDAP master/slave, Redis master/slave, a logs node and the RabbitMQ servers; the APP subnet holds the ELB-fronted APP1 servers, APP2 master/slave, APP3 servers, Solr master/slave, Varnish master/slave and Alfresco]
  8. Improvements achieved (I)
     ● APIs are stateless, so you can scale out very easily; nodes are created by Chef (knife)
     ● Fine integration with Chef: ensures the same configuration in all environments and avoids misconfiguration in production; bootstrapping EC2 instances is fully integrated with knife
     ● A quick, reliable way to create an exact mirror of production (staging) with Chef and CloudFormation
       – Before AWS/Chef → creating a staging env took 6 weeks
       – After AWS/Chef → creating a staging env takes less than 1 day
  9. Improvements achieved (II)
     ● Save costs managing non-production environments
       – Before AWS/Chef → environments up 24/7
       – After AWS/Chef → environments up 8 hours on working days (cron scripts using the EC2 API tools)
       – Python script example
     ● Outage recovery plan handled with node snapshots (MongoDB) or Chef (the other nodes are stateless)
     ● Very quick response and customized consulting for the project from the Amazon team
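The Python script linked from the slide is not included in the transcript. A minimal sketch of such a start/stop script, assuming boto3 (a modern stand-in for the EC2 API tools the deck used) and a hypothetical `Environment` tag on each non-production instance:

```python
from datetime import datetime

# Non-production environments only need to be up 8 hours on working days.
WORK_START, WORK_END = 8, 16

def should_be_running(now):
    """True during working hours (08:00-16:00) on weekdays (Mon=0 .. Fri=4)."""
    return now.weekday() < 5 and WORK_START <= now.hour < WORK_END

def toggle_environment(env="staging", region="us-east-1"):
    """Start or stop every instance tagged Environment=<env>.

    boto3 and the Environment tag are assumptions for this sketch; the
    original deck invoked the classic EC2 API tools from cron instead.
    """
    import boto3  # imported lazily so the pure helper above has no AWS dependency

    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:Environment", "Values": [env]}]
    )["Reservations"]
    ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if not ids:
        return
    if should_be_running(datetime.now()):
        ec2.start_instances(InstanceIds=ids)
    else:
        ec2.stop_instances(InstanceIds=ids)
```

A cron entry running `toggle_environment` shortly after 8:00 and 16:00 reproduces the 8-hours-per-working-day schedule described above.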
  10. Example: create a new node
      Staging example with dynamic IP (DHCP):
        knife ec2 server create -I ami-af71f8c6 -r "role[apache]" -f m1.medium --region us-east-1 -S scp-staging -i /Users/juanvi/keypairs/scp-staging.pem -g sg-2418e54b -s subnet-919cecfc -x ec2-user -N stapp-apache-Test -E staging
      Staging example with static IP:
        ec2-run-instances ami-af71f8c6 -k vpc-public-10-234-1 -g sg-379e6d58 -s subnet-cb9596a0 -t m1.xlarge --private-ip-address <static-ip>
        knife bootstrap <node-address> -i /Users/juanvi/keypairs/scp-staging.pem -r "role[webserver]" -N STAGING-public-webserver2 -x ec2-user -E staging --sudo
  11. What we have learned ● It is strongly recommended to run servers in more than one Availability Zone (e.g. us-east-1a and us-east-1d) to avoid total downtime in case of an outage
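Spreading nodes across zones can be sketched as a simple round-robin assignment in a provisioning script (the node names below are illustrative, not from the deck):

```python
from itertools import cycle

def spread_across_azs(node_names, azs=("us-east-1a", "us-east-1d")):
    """Assign each node an Availability Zone round-robin, so a single
    zone outage cannot take down every replica of a tier."""
    return dict(zip(node_names, cycle(azs)))
```

The resulting zone for each node can then be passed to `knife ec2 server create` through its availability-zone option when the node is launched.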
  12. What we have learned (II) ● For certain services, balance with TCP instead of HTTP. Balancing requests to the different nodes of our APIs over TCP internally solved some problems with HTTP requests whose sessions were never closed. We only use HTTP balancing for requests that reach the public Apache. In HTTP balancing mode, many Apache connections were not closed properly and at peak hours we hit the connection limit; switching the ELB to TCP balancing mode solved it.
  13. What we have learned (III) ● Use CloudFormation to create network resources automatically
       – Before CloudFormation → create all the resources one by one
       – After CloudFormation → create all the nodes and network resources of an entire environment in one execution
       – CloudFormation example
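The CloudFormation example linked from the slide is not in the transcript. A minimal sketch of a template fragment in this spirit — resource names, VPC parameter and CIDR block are invented for illustration, and the ELB uses a TCP listener as recommended on the previous slide:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {
    "ProdVpcId": { "Type": "String" }
  },
  "Resources": {
    "ApiSubnet": {
      "Type": "AWS::EC2::Subnet",
      "Properties": {
        "VpcId": { "Ref": "ProdVpcId" },
        "CidrBlock": "10.234.2.0/24"
      }
    },
    "ApiElb": {
      "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
      "Properties": {
        "Subnets": [ { "Ref": "ApiSubnet" } ],
        "Listeners": [
          { "LoadBalancerPort": "80", "InstancePort": "8080", "Protocol": "TCP" }
        ],
        "HealthCheck": {
          "Target": "TCP:8080",
          "Interval": "30",
          "Timeout": "5",
          "HealthyThreshold": "3",
          "UnhealthyThreshold": "5"
        }
      }
    }
  }
}
```

One `cfn-create-stack`/`create-stack` call on a template like this creates the subnet and balancer together, which is what turns "create resources one by one" into "one execution".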
  14. What we have learned (IV) ● Analyze performance tests to choose the minimum number of nodes that will run 24/7 and the sizes for which to buy reserved instances; reserved instances cut the cost to roughly two thirds
       – Before AWS/Chef → performance tests were limited because servers were too costly to keep available, so tests were simulated
       – After AWS/Chef → high-powered instances available on demand for just a few hours or days, at reduced cost
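The two-thirds figure is back-of-the-envelope arithmetic; the prices below are invented for illustration and are not real AWS rates:

```python
HOURS_PER_YEAR = 24 * 365

def yearly_cost(hourly_rate, upfront=0.0, hours=HOURS_PER_YEAR):
    """Total cost of one always-on instance over a year: upfront fee plus hourly usage."""
    return upfront + hourly_rate * hours

# Hypothetical prices for one 24/7 node:
on_demand = yearly_cost(0.12)                  # no upfront, higher hourly rate
reserved = yearly_cost(0.045, upfront=350.0)   # 1-year reservation, lower rate
savings_ratio = reserved / on_demand           # ~0.71, i.e. roughly two thirds
```

Only the nodes identified by the performance tests as always needed are worth reserving; scale-out capacity stays on demand.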
  15. What we have learned (V)
     ● It is advisable to use a large number of small instances running close to 100% CPU, instead of a few powerful machines with wasted resources, and to launch new nodes and balance requests among them when load increases
     ● Pre-warm the load balancers if you expect an exponential increase in requests
     ● Ask AWS support to raise the initial limit on the number of EC2 instances that can run simultaneously (20)
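Sizing "many small instances near 100% CPU" comes down to simple capacity arithmetic; a sketch with a hypothetical per-node throughput figure:

```python
import math

def nodes_needed(peak_rps, rps_per_node, headroom=0.1):
    """Minimum number of small instances for a given peak load, keeping
    10% headroom so nodes run near, but not at, full CPU."""
    usable = rps_per_node * (1 - headroom)
    return math.ceil(peak_rps / usable)

# e.g. 900 req/s at peak, each small node handling ~100 req/s:
# nodes_needed(900, 100) -> 10
```

When load grows past the planned peak, the same arithmetic tells you how many extra nodes to launch with knife before adding them to the ELB.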
  16. Things to consider
     • You must adapt to the instance sizes, whose resources (CPU, RAM…) are predefined and not customizable
     • You have no control over the evolution of the products your service depends on
     • You don't have access to the logs of some components (for example, the load balancers)
     • There is a danger of lock-in to AWS services, with the consequent difficulty of migrating to another data center
  17. Thank you for your attention ● @jvicenteherrera