10 Tips for Your Journey to the Public Cloud

Presented at Velocity 2015.


  1. 10 Tips for Your Journey to the Public Cloud
     Suchi Upadhyayula, Director of Product Development, Intuit
     Sean McCluskey, Director of Quality and Operations, Intuit
     May 28, 2015
  2. Quick Facts About Mint
  3. Millions of Active Users
  4. > 50TB of Financial Data
  5. > 400 Servers (in 10 pods, > 90 MySQL shards)
  6. 1.5k req/sec, 80k concurrent connections, 120k concurrent sessions
  7. Mint is on …
     • Tablets: iPad, Android, Surface
     • Smart Phones: iPhone, Android, Win 8
     • Web
     • Desktops: Mac, Win 8
  8. 10 Tips from Our Journey
  9. Load Balancing (Tip 1)
     • Security policy against terminating SSL on ELB
       – ELB acts as a dumb pass-through
     • Routing logic to support the bulkhead pattern (pods) was too complex for current ELBs
     • Developed a proxy layer to:
       – Terminate SSL
       – Implement routing logic
       – Provide access audit logging
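
The deck does not show how the proxy maps requests to pods. As a minimal sketch of bulkhead-style routing, assuming each user is pinned to one pod by a stable hash of the user ID, the logic could look like this. The function names, the hashing scheme, and the `internal.example` hostnames are all hypothetical; a real deployment might use a lookup table instead.

```python
import hashlib

POD_COUNT = 10  # the deck mentions 10 pods; the mapping scheme below is an assumption


def pod_for_user(user_id: str, pod_count: int = POD_COUNT) -> int:
    """Map a user to a pod deterministically (hypothetical scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce to a pod index.
    return int.from_bytes(digest[:8], "big") % pod_count


def upstream_for_user(user_id: str) -> str:
    """Return the (hypothetical) upstream host the proxy would forward to."""
    return f"pod-{pod_for_user(user_id):02d}.internal.example"
```

Because the mapping is deterministic, a failure in one pod affects only the users assigned to it, which is the point of the bulkhead pattern.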
  10. Securing Sensitive Customer Data (Tip 2)
     • Multi-layer encryption (integrated with Amazon’s Key Management System) with periodic key rotation:
       – Application encryption of sensitive data
       – Encryption in flight
       – File level encryption at rest
     • Reviewed fields to identify sensitive data to be “application level” encrypted
       – Dropping of clear text columns before data ready to ship
     • >50TB of data encrypted
  11. Establishing a Framework for Low Latency (Tip 3)
     • Prepare for latency impact due to encryption
       – Mint planned for 30% degradation
     • Continuous measurement of TP50, TP90, TP99 for critical features
       – Weekly review of TPs to drive improvements to reduce latency
       – Constant tuning of code and single page architecture
       – Able to maintain TP50 & TP90 SLAs
     • Create a culture of continuous focus on TPs to drive improvements
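
TP50, TP90, and TP99 are the 50th, 90th, and 99th percentile latencies. The following is a minimal sketch of how they might be computed from raw samples and checked against a degradation budget like the 30% mentioned above. It assumes the nearest-rank percentile definition; the function names are illustrative and not taken from the deck.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def tp_report(latencies_ms):
    """Return TP50, TP90, and TP99 for a list of latency samples in milliseconds."""
    return {f"TP{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}


def within_budget(baseline_ms, measured_ms, allowed_degradation=0.30):
    """True if the measured latency stays within the planned degradation of the baseline."""
    return measured_ms <= baseline_ms * (1 + allowed_degradation)
```

A weekly review could compare each critical feature's `tp_report` against the previous week's values and flag any percentile that falls outside the budget.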
  12. Infrastructure as Code (Tip 4)
     • Configuration change in the infrastructure resulted in a release failing to deploy and requiring rollback
     • What we learned:
       – In AWS, operations spends a lot of time writing code: CloudFormation templates, deployment automation, monitors
       – Development rigor was new to the operations team
       – Needed to adopt development practices within operations: designs, code reviews, testing, validation, formal release processes for infrastructure
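
One way to apply testing to infrastructure code is to lint templates before they are deployed. The sketch below is an assumption about what such a check might look like, not the team's actual tooling: it walks a CloudFormation-style template (already parsed into a Python dict) and reports every `Ref` that points to neither a declared parameter, a declared resource, nor a CloudFormation pseudo parameter.

```python
# CloudFormation pseudo parameters, which are valid Ref targets without being declared.
PSEUDO_PARAMETERS = {
    "AWS::AccountId", "AWS::NotificationARNs", "AWS::NoValue", "AWS::Partition",
    "AWS::Region", "AWS::StackId", "AWS::StackName", "AWS::URLSuffix",
}


def find_refs(node):
    """Yield every name used in a {"Ref": name} expression anywhere in the template."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                yield value
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def undefined_refs(template):
    """Return a sorted list of Ref targets that the template never defines."""
    defined = set(template.get("Parameters", {})) | set(template.get("Resources", {}))
    return sorted(set(find_refs(template)) - defined - PSEUDO_PARAMETERS)
```

Running a check like this in a unit test or a pre-merge hook catches a class of configuration mistakes before a release reaches the deploy step.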
  13. Migrating Large Volumes of Data (Tip 5)
     • Not feasible to copy >50TB (and growing) of secure data “over the wire”
     • Plan for data transport to AWS:
       – Encrypted, physically secured drives shipped to AWS; 3 days to ship a backup copy to AWS and upload it
       – Catch-up replication
       – Final drive shipment needs to be timed so that replication can catch up to the shipment window and sustain data growth prior to production cutover
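
The timing constraint on the final shipment can be made concrete with a little arithmetic. While the drives are in transit, new changes accumulate at the source; once replication starts, it must outpace ongoing change to drain that backlog. The sketch below assumes constant rates. Only the 3-day shipping window comes from the deck; the change and replication rates are hypothetical inputs.

```python
def catch_up_days(shipping_days, daily_change_tb, replication_tb_per_day):
    """Days of replication needed to drain the backlog built up during shipping.

    The backlog is shipping_days * daily_change_tb. Replication drains it at the
    net rate (replication throughput minus ongoing change).
    """
    if replication_tb_per_day <= daily_change_tb:
        raise ValueError("replication can never catch up at these rates")
    backlog_tb = shipping_days * daily_change_tb
    return backlog_tb / (replication_tb_per_day - daily_change_tb)
```

For example, with a 3-day shipment, 0.5 TB of change per day, and 2 TB per day of replication throughput (the last two figures are made up), the backlog of 1.5 TB drains in one day, so cutover could be planned no earlier than four days after the backup is taken.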
  14. High Availability and Disaster Recovery (Tip 6)
     • Recovery Time Objective (RTO): time to restore a service to operation
     • Recovery Point Objective (RPO): amount of data acceptable to lose
     • Solve for availability first with Multi-AZ
     • Determine acceptable RTO/RPO and solve for regional failures second
       – Balance lower RTO/RPO against increased cost and complexity
       – Recognize the technology you use to handle regional failures will add complexity that could increase outages
     [Diagram: two regions, US-EAST and US-WEST, each with three Availability Zones]
  15. Monitoring and Diagnostics (Tip 7)
     • Disassociate from IPs
       – Instances, ELBs, and their IP addresses are dynamic
       – The number of instances is constantly changing
       – When an instance has issues it can be “blown away”
     • Build resilient and self-healing infrastructure
       – Monitoring should then be built to complement this
       – If you alert on failure, have the courtesy to alert on healing
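
The last two points can be sketched as a small state tracker that is keyed by a logical service name rather than an IP address and that emits a message on both failure and recovery. This is an illustrative sketch only; the class name and message formats are assumptions, not something described in the deck.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HealthAlerter:
    """Tracks health per logical service name (not per IP) and reports state changes."""

    _unhealthy: set = field(default_factory=set)

    def observe(self, service: str, healthy: bool) -> Optional[str]:
        """Return a message when the service changes state, otherwise None."""
        if not healthy and service not in self._unhealthy:
            self._unhealthy.add(service)
            return f"ALERT: {service} is unhealthy"
        if healthy and service in self._unhealthy:
            self._unhealthy.remove(service)
            return f"RESOLVED: {service} has recovered"
        return None
```

Because only transitions produce messages, a self-healing replacement of an instance results in one alert and one matching resolution instead of a stream of repeated pages.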
  16. End-to-End Testing (Tip 8)
     • In addition to validating the full functionality of the production environment, you also need to validate:
       – Build, config, deploy, and validation infrastructure
       – Logging, monitoring, and other systems that ensure the environment is healthy
       – Access controls and security
       – Auto-Scaling
     • Continuous synthetic testing in the production environment
       – Provide an end-to-end test to ensure the customer experience doesn’t degrade
  17. Managing Costs (Tip 9)
     • Compute: reserved vs. on-demand
       – If compute is “on” for more than 9 hours per day, reserved will save money
       – On-demand for seasonal workloads and rare peaks
       – Reaper scripts: shut down unused instances
     • Snapshots drove significant cost savings
     • Storage is cheap
       – A lot of work that yields a small return
     • IOPS are not
       – Optimizing IOPS per shard saved a lot of money
     Savings Distribution (chart):
       – Snapshots: 42.17%
       – Compute: 34.19%
       – IOPS: 17.09%
       – Storage: 3.42%
       – Other: 3.13%
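
The 9-hours-per-day rule of thumb follows from a break-even calculation: a reserved instance is paid for around the clock, while an on-demand instance is paid for only while it runs. The sketch below shows that arithmetic under simplifying assumptions (a single effective hourly rate for reserved capacity with any upfront fee amortized in). The prices in the usage note are hypothetical and chosen only to reproduce a 9-hour break-even.

```python
def break_even_hours_per_day(on_demand_hourly, reserved_effective_hourly):
    """Daily running hours above which reserved capacity is cheaper than on-demand."""
    return 24 * reserved_effective_hourly / on_demand_hourly


def cheaper_option(hours_per_day, on_demand_hourly, reserved_effective_hourly):
    """Compare daily cost: on-demand bills only running hours, reserved bills all 24."""
    on_demand_daily = hours_per_day * on_demand_hourly
    reserved_daily = 24 * reserved_effective_hourly
    return "reserved" if reserved_daily < on_demand_daily else "on-demand"
```

With a hypothetical on-demand rate of 1.00 per hour and an effective reserved rate of 0.375 per hour, the break-even is 9 hours per day: a server running 12 hours a day is cheaper reserved, and one running 6 hours a day is cheaper on-demand.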
  18. Release Operations (Tip 10)
     • Infrastructure deployed independently of applications
       – DB schema
       – AMI
       – Infrastructure as code
       – Application
     • Support rollbacks for everything (blue-green)
       – We can always go back to N-1, ALWAYS!!
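
The blue-green idea of always being able to return to N-1 can be illustrated with a toy router that remembers which version was live before the most recent promotion. This is a simplified sketch, not the team's release tooling; the class and method names are hypothetical, and a real system would switch traffic between two full environments rather than two strings.

```python
class BlueGreenRouter:
    """Keeps the live version and the previous one so that traffic can be switched back."""

    def __init__(self, live_version):
        self.live = live_version
        self.previous = None  # nothing to roll back to until the first promotion

    def promote(self, new_version):
        """Send traffic to new_version and keep the old live version as N-1."""
        self.previous, self.live = self.live, new_version

    def rollback(self):
        """Switch traffic back to N-1; the two environments swap roles."""
        if self.previous is None:
            raise RuntimeError("no previous version to roll back to")
        self.live, self.previous = self.previous, self.live
```

The same pattern would need to hold for each independently deployed layer listed above (DB schema, AMI, infrastructure templates, application) for a full rollback to be possible.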
  19. Summary
     1. Load balancing: Evaluate if ELB is sufficient and plan ahead
     2. Security: Multi-layer encryption, AWS Key Management
     3. Low latency: Measure and improve TP50, TP90, TP99
     4. Infrastructure as code: Design, review, test templates
     5. Migrating large volumes of data: Encrypted drives
     6. HA/DR: Multi-AZ, multi-region
     7. Monitoring and diagnostics: Disassociate from IP addresses
     8. End-to-end testing: Don’t forget to test auto-scaling
     9. Managing costs: Compute is more expensive than storage
     10. Release operations: Rollback-ready, blue-green
  20. Thank You