Case Study: Lucidchart's Migration to VPC

Originally presented at CloudConnect 2013 in Chicago, IL.

Published in: Technology, Business

  1. Case Study: Lucidchart's Migration to VPC, by Matthew Barlocker (www.lucidchart.com/jobs)
  2. “The Barlocker”
     • Chief Architect at Lucid Software Inc since 2011
     • Bachelor's in CS from BYU
     • Managed data center, Rackspace, and AWS deployments
     • Love to play board games, go 4-wheeling, wrestle my sons, and fly airplanes
     • nineofclouds.blogspot.com
  3. Why Lucid Chose VPC
     • Same price as EC2 Classic
     • Interoperability with existing AWS services (S3, Route53, etc.)
     • New features like internal ELBs and on-the-fly security group changes
     • Heightened security using only private IPs
  4. Other Benefits
     • All ELBs have security groups
     • Additional security layer with network ACLs
     • Elastic IPs stay associated with stopped instances
     • VPN support for common hardware
     • Reserved instances can be transferred between EC2 Classic and VPC
  5. Drawbacks
     • Cost & maintenance of NAT instance(s)
     • Setup time
     • New terminology
     • VPN or SSH tunnel is required to access instances on private subnets
     • Internal DNS names are disabled by default
  6. Things You Should Know
     • Instances in the public subnets must have an Elastic IP to communicate with the internet
     • NAT instances are just normal instances that are configured to be routers
     • NAT instances must be in a public subnet
     • Public & private subnets are defined by their route tables, network ACLs, and DHCP options
  7. Migration Plan
  8. Migration Constraints
     • EC2 cannot connect to private VPC servers
     • Private VPC server connections must go through the NAT instances
     • EC2 & VPC have different security groups, load balancers, and autoscale groups
     • EC2 & VPC share EBS volumes, snapshots, instance sizes, zones, and regions
  9. Migration Plan
     • Move top layer first
     • Move one layer at a time
     • Meticulously manage security groups
     • Move monitoring/utility servers last
     • http://nineofclouds.blogspot.com/search/label/VPC
  10. Starting Layout
  11. Move Webservers First
  12. Move Next Layer
  13. Move Databases Next
  14. Top 5 Pain Points
  15. Pain Point 5: Setup & Terminology
     • Took time to determine which VPC configuration we wanted
     • Took time to troubleshoot network ACL and security group issues
     • It took one person 3 days
     • We have not had to revisit the configuration since we got it working
     • Unavoidable
  16. Pain Point 4: Security Groups
     • Private VPC instances communicate through the NAT instances
     • EC2 instances only see traffic from the NAT
     • EC2 security groups were open to the entire VPC
     • Avoidable by doing 2 moves: one to public VPC, one to private VPC
  17. Pain Point 3: VPN
     • Highly available configuration supported for some hardware
     • We chose OpenVPN, which took 3 days to configure and test properly
     • Avoidable in several ways
  18. Pain Point 2: MongoDB Election = Downtime
     • MongoDB has an election process to determine primary and secondaries
     • To elect a primary, a majority of servers must vote
     • Because EC2 cannot speak to VPC, we had to move each server to the public subnet, and then to the private subnet afterward
     • During the move from public to private, MongoDB died for 15 minutes
     • Avoidable by not using MongoDB
  19. Pain Point 1: NAT Bandwidth
     • The traffic between private VPC and EC2 exceeded the capacity of our NAT instances
     • Requests timed out as throughput maxed out
     • Downtime of 30 minutes on some services
     • Completely avoidable! Increase the size of the NAT instances during the migration, and decrease it after the migration is done.
  20. Thank You
  21. www.lucidchart.com/jobs
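
Slide 6 points out that a subnet is public or private because of its configuration, not because of a flag. A minimal sketch in Python of that idea, looking only at the route table (network ACLs and DHCP options, which the slide also mentions, are ignored here). The route dictionaries and the function name are hypothetical and not taken from the talk; the only AWS detail assumed is that internet gateway IDs start with `igw-`.

```python
from typing import Dict, List


def classify_subnet(routes: List[Dict[str, str]]) -> str:
    """Label a subnet by where its default route (0.0.0.0/0) points.

    `routes` is a hypothetical, simplified view of a route table:
    each entry has a "destination" CIDR and a "target" ID.
    """
    for route in routes:
        if route["destination"] == "0.0.0.0/0":
            # Default route straight to an internet gateway: public subnet.
            if route["target"].startswith("igw-"):
                return "public"
            # Default route through anything else (e.g. a NAT instance).
            return "private"
    # No default route at all: the subnet cannot reach the internet.
    return "isolated"


# Illustrative route tables (all IDs are made up).
public_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-11111111"},
]
private_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "i-22222222"},  # NAT instance
]

print(classify_subnet(public_routes))
print(classify_subnet(private_routes))
```

A check like this is one way to confirm, before moving a layer, that NAT instances sit in a subnet classified as public and that private servers do not.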

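Slide 18 states that a majority of servers must vote to elect a MongoDB primary. The sketch below, in Python, shows how that rule can be checked against a planned move before starting. It is an illustration only: the talk does not give the replica set size or the exact move sequence, so the numbers and function names here are assumptions.

```python
from typing import List


def has_majority(reachable_voters: int, total_voters: int) -> bool:
    """True if the reachable members form a strict majority of the set."""
    return 2 * reachable_voters > total_voters


def steps_without_majority(total_voters: int,
                           unreachable_per_step: List[int]) -> List[int]:
    """Return the indexes of plan steps where no primary could be elected.

    `unreachable_per_step[i]` is how many voting members are cut off
    from the rest (for example, mid-move) during step i.
    """
    return [
        step
        for step, unreachable in enumerate(unreachable_per_step)
        if not has_majority(total_voters - unreachable, total_voters)
    ]


# Hypothetical 3-member replica set: moving one member at a time keeps
# a majority, while having two members unreachable at once does not.
print(steps_without_majority(3, [0, 1, 1, 1]))
print(steps_without_majority(3, [0, 1, 2, 1]))
```

Running the plan through a check like this flags any step where too many members would be unreachable at once, which is the condition under which an election cannot succeed.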