The art of infrastructure elasticity

Art of designing an elastic and scalable infrastructure for a queuing application

Transcript of "The art of infrastructure elasticity"

  1. The Art of Infrastructure Elasticity
     April 28th, 2012, Cloud Developer Conference 2012, Bangalore
     Harish Ganesan, CTO and Co-Founder, 8KMiles
     Harish11g.AWS@gmail.com
  2. Agenda
     • Problem
     • Challenges
     • Requirements
     • Solution Architecture
     • Q&A
  3. What is the problem scenario?
  4. Big Sales Promotion every quarter by the Enterprise
  5. • Massive numbers of concurrent online visitors
     • Limited processing capacity of the Booking Engine (~3k requests/sec)
  6. • Unhappy visitors
     • More booking opportunities lost
  7. Solution (Step 1):
     • Create a Queuing App in front of the Booking Engine
     • Efficiently queue the concurrent visitors
  8. Solution (Step 2): Moderate and move the visitors waiting in the Queuing App to the Booking Engine
  9. What are the Challenges?
  10. Concurrency
      • HTTP/AJAX/REST requests
        • Total: 500+ million requests in 6 hours
        • Average: 23k+ requests/sec
        • Peak: 80k+ requests/sec
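The average on this slide follows from the totals. The quick Python check below uses only figures from the slides: 500 million requests in 6 hours, the 80k requests/sec peak, and the ~3k requests/sec Booking Engine limit from slide 5. The "+" on each figure means these are lower bounds.

```python
# Back-of-the-envelope check of the concurrency figures (slides 5 and 10).
TOTAL_REQUESTS = 500_000_000   # "500+ million requests"
WINDOW_SECONDS = 6 * 60 * 60   # "in 6 hours"
PEAK_RPS = 80_000              # "Peak: 80k+ requests/sec"
BOOKING_ENGINE_RPS = 3_000     # "~3k requests/sec" Booking Engine capacity

average_rps = TOTAL_REQUESTS / WINDOW_SECONDS
peak_to_capacity = PEAK_RPS / BOOKING_ENGINE_RPS

print(f"average: {average_rps:,.0f} requests/sec")          # average: 23,148 requests/sec
print(f"peak / booking capacity: {peak_to_capacity:.1f}x")  # peak / booking capacity: 26.7x
```

At peak, the queuing tier therefore faces roughly 27 times the request rate the Booking Engine can process.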
  11. Queue efficiency
      • Allot unique queue numbers to visitors
      • Allot queue numbers on a fair basis (as far as possible)
      • Reduce the wait time in the queue-number allotment process
      • Reduce the overall queue wait time for the visitor
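One common way to meet the "unique" and "fair" requirements is a single atomic counter per queue that hands out numbers in arrival order. The slides do not say how numbers were generated; slide 31 only says the BG processors allot them before inserting into Redis. The sketch below is therefore an assumption. It uses an in-process, thread-safe counter standing in for something like Redis `INCR`, and `QueueNumberAllocator` is a hypothetical name.

```python
import threading
from itertools import count


class QueueNumberAllocator:
    """Hands out strictly increasing, never-repeated queue numbers.

    Hypothetical in-process stand-in for an atomic counter such as Redis INCR.
    Numbers are issued in the order callers acquire the lock, which
    approximates arrival order.
    """

    def __init__(self, start=1):
        self._counter = count(start)
        self._lock = threading.Lock()

    def allot(self):
        # Reading and advancing the counter happen under one lock, so two
        # concurrent visitors can never receive the same number.
        with self._lock:
            return next(self._counter)


# Usage: 8 threads simulate bursts of concurrent visitors.
allocator = QueueNumberAllocator()
issued = []
issued_lock = threading.Lock()


def visitor_burst(n):
    for _ in range(n):
        number = allocator.allot()
        with issued_lock:
            issued.append(number)


threads = [threading.Thread(target=visitor_burst, args=(1_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(issued), len(set(issued)))  # 8000 8000
```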
  12. Load Volatility
      [Chart: yearly compute utilization, with peak utilization during promos, wasted capacity, and complete under-utilization of the infrastructure at other times]
      • Pattern of massive utilization and under-utilization
  13. IP Whitelisting
      [Diagram: EC2 instances in the public cloud calling 3rd-party services; the IP addresses of the source EC2 instances need to be whitelisted in the 3rd-party services gateway]
      • The Booking Engine needs EC2 IP whitelisting for security
      • A consecutive IP range is needed
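The "consecutive IP range" requirement is easy to check before handing addresses to the 3rd-party gateway. Below is a minimal sketch using Python's standard `ipaddress` module. `whitelist_cidrs` is a hypothetical helper, and the addresses come from the 203.0.113.0/24 documentation range, not from the project's real Elastic IPs.

```python
import ipaddress


def whitelist_cidrs(ips):
    """Return the CIDR blocks covering a consecutive run of IPv4 addresses.

    Raises ValueError if the addresses are not consecutive, because the
    gateway on this slide needs a consecutive IP range.
    """
    addrs = sorted({ipaddress.IPv4Address(ip) for ip in ips})
    if not addrs:
        return []
    for prev, cur in zip(addrs, addrs[1:]):
        if int(cur) - int(prev) != 1:
            raise ValueError(f"addresses are not consecutive: gap after {prev}")
    # summarize_address_range yields the minimal list of networks spanning first..last.
    return list(ipaddress.summarize_address_range(addrs[0], addrs[-1]))


# Hypothetical Elastic IPs from the 203.0.113.0/24 documentation range.
eips = [f"203.0.113.{i}" for i in range(16, 32)]
print([str(net) for net in whitelist_cidrs(eips)])  # ['203.0.113.16/28']
```

Expressing the whitelist as CIDR blocks keeps the gateway rules short. A range that is consecutive but not aligned (for example .16 to .33) still works; it simply yields more than one block.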
  14. Variety of OS / Software
      • RedHat OS for the Load Balancer, NoSQL and Queue layers
      • Apache Tomcat for the Java Web/App layer
      • CentOS for the processing programs
      • MySQL for result storage
      • Hadoop for analytics
  15. What are the requirements from the enterprise?
  16. Requirements
      • Elastic infrastructure
        • Create the infrastructure 2 hrs before the promo
        • Tear down the infrastructure 2 hrs after the promo
        • Elastically expand the infrastructure during the promo
      • Highly scalable and available
      • Log analytics
      • Complete infrastructure automation
  17. Solution Architecture
  18. Solution Architecture, Option 1: Single Queue (initial thought)
      [Diagram: Concurrent visitors → single Queuing Application → Booking Engine]
  19. Solution Architecture, Option 2: Parallel Queue (recommended)
      [Diagram: Concurrent visitors → parallel Queuing Applications → Booking Engine]
  20. Request types
      • A customer visit is an HTTP request to the Queuing Application
      • The visitor's current queue position is fetched by an AJAX call to the Queuing Application every X seconds
        • More wait ~ more calls
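Each waiting visitor polls every X seconds, so the polling load grows with both queue length and wait time. The sketch below shows that relationship. X is not given on the slides, so the 10-second interval and the visitor counts are hypothetical.

```python
import math


def polls_per_visitor(wait_seconds, poll_interval_s):
    """AJAX position checks one visitor makes while waiting (one every X seconds)."""
    return math.floor(wait_seconds / poll_interval_s)


def polling_rps(waiting_visitors, poll_interval_s):
    """Request rate generated by everyone currently waiting in the queue."""
    return waiting_visitors / poll_interval_s


# Hypothetical numbers: X = 10 s, a 10-minute wait, 500,000 visitors waiting.
print(polls_per_visitor(600, 10))   # 60
print(polling_rps(500_000, 10))     # 50000.0
```

Doubling X halves the polling rate, at the cost of less frequent position updates for each visitor.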
  21. Solution Step 1: The Cloud?
      • Amazon Web Services
      • We had 4+ years of architecture experience in AWS
      • It satisfied many of the customer requirements and challenges in this use case
  22. Solution Step 2: R53/NW (Route 53 / Network)
      [Diagram: Users → Amazon Route 53 → EC2 instances in an Amazon Virtual Private Cloud, with VPC Subnet 1 in Availability Zone 1 and VPC Subnet 2 in Availability Zone 2]
      • Amazon VPC with Multi-AZ subnet configurations (HA)
      • Amazon Route 53 for managed DNS
      • DNS round-robin (RR) algorithm at Route 53
  23. Solution Step 3: Load Balancing
      [Diagram: Users → Amazon Route 53 → HAProxy EC2 Instance 1 and HAProxy EC2 Instance 2 (each m1.large with an Elastic IP and EBS volumes, using the round-robin algorithm), in VPC Subnet 1, Availability Zone 1]
  24. Solution Step 3: Load Balancing
      • HAProxy vs Amazon ELB
      • Custom programs to auto scale HAProxy
      • Elastic HAProxy tier: instances are attached to / detached from Route 53
      • HAProxy IPs whitelisted in the 3rd-party gateway
      • 16 HAProxy instances across 2 AZs and 2 subnets
      • Round-robin (RR) load-balancing algorithm
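Round robin plus attach/detach amounts to a rotation over a member list that can change while traffic flows. The sketch below models only that rotation, for example over the HAProxy Elastic IPs registered in Route 53. It is not how Route 53 or HAProxy implement round robin internally, and `RoundRobinPool` and the IP labels are hypothetical.

```python
class RoundRobinPool:
    """Rotates through a mutable list of backends (e.g. HAProxy Elastic IPs)."""

    def __init__(self, members=()):
        self._members = list(members)
        self._next = 0  # index of the member to hand out next

    def attach(self, member):
        """Add a backend to the rotation (no-op if already present)."""
        if member not in self._members:
            self._members.append(member)

    def detach(self, member):
        """Remove a backend; raises ValueError if it is not attached."""
        idx = self._members.index(member)
        self._members.pop(idx)
        # Keep the cursor on the member that would have come next.
        if idx < self._next:
            self._next -= 1
        if self._next >= len(self._members):
            self._next = 0

    def pick(self):
        """Return the next backend in round-robin order."""
        if not self._members:
            raise LookupError("no backends attached")
        member = self._members[self._next]
        self._next = (self._next + 1) % len(self._members)
        return member


pool = RoundRobinPool(["eip-1", "eip-2", "eip-3"])
print([pool.pick() for _ in range(4)])  # ['eip-1', 'eip-2', 'eip-3', 'eip-1']
pool.detach("eip-2")                    # e.g. an HAProxy instance taken out of rotation
print([pool.pick() for _ in range(3)])  # ['eip-3', 'eip-1', 'eip-3']
```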
  25. Solution Step 4: Web/App Servers
      [Diagram: Users → Amazon Route 53 → HAProxy EC2 Instance 1 (round-robin algorithm) → Web/App EC2 Instances 1, 2 and 3 (c1.xlarge with an Elastic IP and EBS volumes), in VPC Subnet 1, Availability Zone 1]
  26. Solution Step 4: Web/App Servers
      • 3 Web/App instances under every HAProxy
      • c1.xlarge instance type for the Web/App instances
      • Custom programs to auto scale the c1.xlarge instances
      • Automatic attach / detach from HAProxy
      • Every Web/App instance has an EIP for IP whitelisting
      • 48 Web/App EC2 instances spread across 2 AZs
  27. Solution Step 5: Queue Servers
      [Diagram: Users → Amazon Route 53 → HAProxy EC2 Instance 1 (round-robin algorithm) → Web/App 1, 2 and 3 → RabbitMQ (m1.large with EBS volumes), in VPC Subnet 1, Availability Zone 1]
  28. Solution Step 5: Queue Servers
      • RabbitMQ vs Amazon SQS
      • FIFO / concurrency / no duplicate messages
      • 1 RabbitMQ instance for queuing in every sector
      • m1.large instance type
      • 16 RabbitMQ instances overall
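The properties that favored RabbitMQ here (first-in, first-out order and no duplicate messages) can be stated as a tiny in-memory model. This is not RabbitMQ client code. It only illustrates the contract the queue layer had to honor. `VisitorQueue` and the idea of de-duplicating by visitor ID are assumptions.

```python
from collections import deque


class VisitorQueue:
    """FIFO of visitor IDs that ignores repeat arrivals of the same visitor."""

    def __init__(self):
        self._fifo = deque()
        self._seen = set()  # every visitor ever queued gets exactly one slot

    def enqueue(self, visitor_id):
        """Append a visitor; return False if that visitor was already queued."""
        if visitor_id in self._seen:
            return False
        self._seen.add(visitor_id)
        self._fifo.append(visitor_id)
        return True

    def dequeue(self):
        """Remove and return the oldest visitor (raises IndexError when empty)."""
        return self._fifo.popleft()

    def __len__(self):
        return len(self._fifo)


q = VisitorQueue()
for vid in ["v1", "v2", "v1", "v3"]:  # "v1" refreshes the page and arrives twice
    q.enqueue(vid)
print([q.dequeue() for _ in range(len(q))])  # ['v1', 'v2', 'v3']
```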
  29. Solution Step 6: Processors/Redis
      [Diagram: Single Sector View showing Amazon Route 53, an HAProxy (round-robin algorithm), Web/App 1 to 3, RabbitMQ, two Processors nodes, a Redis Master and a Redis Slave, and the Booking Engine]
      Components of a single sector:
        1. One HAProxy
        2. Three Web/App
        3. One RabbitMQ
        4. Two BG Processor Nodes
        5. Two Redis
      Sector is not an AWS term; it is an 8KMiles term for the logical EC2 instance groups in this use case.
  30. Solution Step 6: Redis
      • Redis vs Amazon DynamoDB
      • Redis: NoSQL key-value data store
      • Visitors are shown their current queue position from Redis every X seconds
      • 1 Redis master-slave pair for every sector
      • m1.large instance type for Redis
      • 32 Redis instances overall
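Showing a visitor's position needs only two pieces of data per sector: the queue number each visitor was allotted, and the highest number already moved to the Booking Engine. The sketch keeps both in plain Python structures standing in for Redis. The field names and the position formula are assumptions, not the project's actual Redis schema.

```python
from typing import Optional


class QueuePositionStore:
    """Hypothetical stand-in for one sector's Redis data."""

    def __init__(self):
        self.visitor_number = {}  # visitor_id -> allotted queue number
        self.last_admitted = 0    # highest queue number moved to the Booking Engine

    def position(self, visitor_id) -> Optional[int]:
        """1 means "next in line"; None means the visitor was already admitted."""
        pos = self.visitor_number[visitor_id] - self.last_admitted
        return pos if pos > 0 else None


store = QueuePositionStore()
store.visitor_number.update({"alice": 3, "bob": 5})
store.last_admitted = 2
print(store.position("alice"), store.position("bob"))  # 1 3
store.last_admitted = 3
print(store.position("alice"), store.position("bob"))  # None 2
```

With this layout, each AJAX poll costs one lookup and one subtraction. That is cheap enough to repeat every X seconds for every waiting visitor.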
  31. Solution Step 6: Processors
      • BG Processors: Java programs to
        • RabbitMQ → Redis: allot queue numbers to visitor requests and insert them into Redis
        • Redis → Booking Engine: moderate the movement of queued visitors from Redis to the Booking Engine
        • Process the response status / booking status / inactive visitors / timeouts
      • 2 BG Processor nodes per sector
      • CPU intensive: c1.xlarge instance type
      • 32 BG Processor instances overall
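The slides do not say how the BG processors moderate the flow into the Booking Engine. One standard way to cap a release rate is a token bucket, sketched below as an assumption. If the ~3k requests/sec Booking Engine capacity (slide 5) were split evenly across 16 sectors, each sector's processors would get 3000 / 16 = 187.5 admissions/sec. That even split, `TokenBucket`, and the injectable clock are all hypothetical.

```python
import time


class TokenBucket:
    """Admit at most `rate` visitors/sec on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full
        self.clock = clock
        self.updated = clock()

    def try_admit(self, n=1):
        """Consume n tokens if available; return whether the visitors may move on."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Deterministic demo with a fake clock instead of real time.
fake_now = [0.0]
bucket = TokenBucket(rate=3000 / 16, capacity=3000 / 16, clock=lambda: fake_now[0])
print(sum(bucket.try_admit() for _ in range(1000)))  # 187 admitted in the first instant
fake_now[0] = 1.0
print(sum(bucket.try_admit() for _ in range(1000)))  # 187 more after one second
```

A processor loop would call `try_admit()` before moving each visitor from Redis to the Booking Engine, and retry later whenever it returns False.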
  32. Overall Solution Architecture
      [Diagram: Amazon Route 53 → Sectors 1, 2, 3, 4, 5 ... 16, each containing HAProxy, Web/App, RabbitMQ, Redis and BG Programs → Booking Engine]
      Sector is not an AWS term; it is an 8KMiles term for the logical EC2 instance groups in this use case.
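Multiplying the per-sector counts from slides 26 to 31 by the 16 sectors reproduces the per-tier totals quoted on those slides. A quick check:

```python
SECTORS = 16

# Instances per sector, as stated on slides 26, 28, 29, 30 and 31.
PER_SECTOR = {
    "HAProxy": 1,
    "Web/App": 3,
    "RabbitMQ": 1,
    "Redis (master + slave)": 2,
    "BG Processor": 2,
}

fleet = {tier: n_per_sector * SECTORS for tier, n_per_sector in PER_SECTOR.items()}
total = sum(fleet.values())

for tier, n in fleet.items():
    print(f"{tier:<24}{n:>4}")
print(f"{'Total':<24}{total:>4}")
```

The sector stack alone comes to 144 instances. The slides do not itemize what makes up the rest of the "150+ EC2 instances" on slide 44.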
  33. Scalability
      [Diagram: Amazon Route 53 in front of Sectors 1 to 4 inside an Amazon Virtual Private Cloud, with sectors in VPC Subnet 1 (Availability Zone 1, AZ-1) and VPC Subnet 2 (Availability Zone 2, AZ-2)]
  34. Scalability
      • New sectors containing the LB, Web, Queue, NoSQL and BG stack are created automatically depending on the load
      • The same AZ or multiple AZs can be specified for the creation
      • CloudWatch custom metrics used
      • Automated Java programs were used for sector creation
      • No manual intervention needed
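The decision behind "created automatically depending upon the load" reduces to comparing the observed load with what one sector can serve. Below is a minimal sketch, built on several assumptions. The per-sector capacity of 5,000 requests/sec is simply the slide 10 peak divided over 16 sectors. The 20% headroom and both function names are hypothetical. The load input would correspond to the CloudWatch custom measurements mentioned on the slide.

```python
import math


def sectors_needed(observed_rps, per_sector_rps, headroom=0.2, minimum=1):
    """Number of sectors to run for the observed load plus a safety margin."""
    if per_sector_rps <= 0:
        raise ValueError("per_sector_rps must be positive")
    target = observed_rps * (1 + headroom)
    return max(minimum, math.ceil(target / per_sector_rps))


def scaling_action(current_sectors, observed_rps, per_sector_rps, headroom=0.2):
    """Positive: sectors to create; negative: sectors that could be retired."""
    return sectors_needed(observed_rps, per_sector_rps, headroom) - current_sectors


print(sectors_needed(80_000, 5_000, headroom=0.0))  # 16
print(sectors_needed(80_000, 5_000))                # 20
print(scaling_action(16, 80_000, 5_000))            # 4
print(scaling_action(16, 20_000, 5_000))            # -11
```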
  35. High Availability @ Instance level
      [Diagram: Amazon Route 53 in front of EC2 instances inside an Amazon Virtual Private Cloud, spread across VPC Subnet 1 (Availability Zone 1, AZ-1) and VPC Subnet 2 (Availability Zone 2, AZ-2)]
  36. High Availability @ Instance level
      • HA built in at the Web/App, Redis and BG Processor instances
      • Any failed / non-responsive EC2 instance is automatically detected and replaced by Java programs
      • No manual intervention needed
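Automatic detection usually means tolerating a blip but acting on a sustained failure. The sketch flags an instance for replacement after N consecutive failed health checks. The threshold, the instance IDs, and `HealthTracker` are hypothetical. The replacement itself (launching a new EC2 instance and attaching it) is left out.

```python
class HealthTracker:
    """Flags instances for replacement after `threshold` consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = {}  # instance_id -> consecutive failed checks

    def record(self, instance_id, healthy):
        """Record one health-check result; return True when the instance should be replaced."""
        if healthy:
            self.failures[instance_id] = 0  # any success resets the streak
            return False
        self.failures[instance_id] = self.failures.get(instance_id, 0) + 1
        return self.failures[instance_id] >= self.threshold


tracker = HealthTracker(threshold=3)
checks = [True, False, False, True, False, False, False]
print([tracker.record("web-app-7", ok) for ok in checks])
# [False, False, False, False, False, False, True]
```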
  37. High Availability @ Sector level
      [Diagram: Amazon Route 53 in front of Sectors 1 to 6 inside an Amazon Virtual Private Cloud, spread across VPC Subnet 1 (Availability Zone 1, AZ-1) and VPC Subnet 2 (Availability Zone 2, AZ-2)]
  38. High Availability @ Sector level
      • Any failed / non-responsive instances inside sectors are automatically detected and replaced by Java programs
      • If Sector-3 fails, the other sectors remain active and can take requests
  39. High Availability @ AZ level
      [Diagram: Amazon Route 53 in front of EC2 instances inside an Amazon Virtual Private Cloud, spread across AZ-1 (VPC Subnet 1) and AZ-2 (VPC Subnet 2)]
  40. High Availability @ AZ level
      • If the entire AZ-2 fails, the load is balanced to the instances in AZ-1
      • Automated programs create new sectors inside AZ-1 to handle the load
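Replacing a failed AZ's capacity comes down to deciding how many new sectors each surviving AZ should launch. The sketch below assumes the 16 sectors are split evenly (8 per AZ), which the slides do not state. `failover_plan` is a hypothetical helper that spreads the lost sectors as evenly as possible over the surviving zones.

```python
def failover_plan(sectors_by_az, failed_az):
    """Return {surviving_az: sectors_to_launch} that replaces the failed AZ's sectors."""
    lost = sectors_by_az[failed_az]
    survivors = sorted(az for az in sectors_by_az if az != failed_az)
    if not survivors:
        raise RuntimeError("no surviving Availability Zone to fail over to")
    share, extra = divmod(lost, len(survivors))
    # The first `extra` survivors (in sorted order) take one additional sector each.
    return {az: share + (1 if i < extra else 0) for i, az in enumerate(survivors)}


print(failover_plan({"AZ-1": 8, "AZ-2": 8}, failed_az="AZ-2"))  # {'AZ-1': 8}
```

With only two AZs, as in this setup, the rule reduces to "launch everything in AZ-1". The even spread matters only when more zones are in use.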
  41. Log Analytics
      [Diagram: (1) EC2 instances → (2) S3 bucket with logs → Elastic MapReduce jobs on an HDFS cluster → (3) RDS MySQL]
      • Redis, Web/App, HAProxy and RabbitMQ (RBQ) logs synced to S3
      • Elastic MapReduce jobs to process / analyze the logs
      • Processed results moved to RDS MySQL for reports / visualizations
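The slides do not show the Elastic MapReduce jobs. As an illustration of the kind of job involved, here is a mapper and reducer in the style of a Hadoop Streaming job that counts requests per sector per minute. The log format (`epoch_seconds sector status path`, whitespace-separated) is a simplified, hypothetical one, not real HAProxy or Tomcat output. The local `sorted()` call stands in for Hadoop's shuffle-and-sort.

```python
from itertools import groupby


def mapper(lines):
    """Emit ("<sector>\t<minute>", 1) for every well-formed log line."""
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # skip malformed lines instead of failing the job
        epoch_s, sector = int(parts[0]), parts[1]
        yield f"{sector}\t{epoch_s // 60}", 1


def reducer(pairs):
    """Sum counts per key; pairs must arrive sorted by key, as Hadoop delivers them."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(n for _, n in group)


logs = [
    "120 sector-1 200 /queue",
    "130 sector-1 200 /position",
    "185 sector-1 503 /queue",
    "121 sector-2 200 /queue",
    "garbage",
]
# Local run: sorted() plays the role of the shuffle/sort between map and reduce.
for key, total in reducer(sorted(mapper(logs))):
    print(key, total)
```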
  42. Monitoring
      • Nagios + Puppet (combined) for auto-scaled monitoring infrastructure and deployment
      • CloudWatch custom metrics / Tomcat Valve / automated Java programs for EC2
  43. Backup
      • No backups, only syncs to S3
      • Golden AMI snapshots to S3
      • Periodic sync of data between EC2 and S3
      • Periodic log sync from Web/App to S3
  44. Infrastructure
      • Amazon Route 53
      • Amazon VPC with public and private subnets
      • 150+ EC2 instances, 2 AZs, 1 Region
      • 70+ Elastic IPs
      • 200+ EBS volumes
      • S3 buckets
      • Suite of monitoring tools
      • 1 Puppet server
      • Amazon CloudWatch
      • Amazon CloudFront
  45. Infrastructure Elasticity
      • Entire infrastructure created 2 hrs before the promo
      • Infrastructure torn down 2 hrs after the promo
      • ~30 mins to launch the infrastructure in AWS
      • ~45 mins to tear it down
      • Automated failure detection / rectification
      • Automated programs for infrastructure creation
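The durations on this slide and slide 16 determine when automation has to start and stop around each promo. Below is a sketch using Python's `datetime`. The promo date and times are hypothetical; only the 6-hour window length comes from slide 10, and `promo_timeline` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def promo_timeline(promo_start, promo_end,
                   lead=timedelta(hours=2), tail=timedelta(hours=2),
                   launch=timedelta(minutes=30), teardown=timedelta(minutes=45)):
    """Key infrastructure timestamps around one promo, using the slide 45 durations."""
    ready_by = promo_start - lead
    teardown_start = promo_end + tail
    return {
        "start_launch": ready_by - launch,
        "infra_ready": ready_by,
        "start_teardown": teardown_start,
        "teardown_done": teardown_start + teardown,
    }


# Hypothetical 6-hour promo on an arbitrary date.
plan = promo_timeline(datetime(2012, 6, 1, 10, 0), datetime(2012, 6, 1, 16, 0))
for step, when in plan.items():
    print(f"{step:<15}{when:%H:%M}")
lifetime = plan["teardown_done"] - plan["start_launch"]
print(lifetime)  # 11:15:00
```

Roughly 11 hours of infrastructure for a 6-hour promo illustrates the point of slide 12: the fleet exists only around the peak rather than all year.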
  46. Infrastructure Cost
      • ~10K USD per promo
      • Not inclusive of data charges
      • Unthinkable savings
      • Visitor experience was good
      • More bookings per promo
      Power of elasticity is simply priceless
      AWS is “AWSome”
  47. Need help architecting highly elastic solutions on AWS?
  48. Leave it to the experts, we will handle this:
      • Cloud Architecture Consulting
      • Cloud Application Development
      • Cloud Migration & Implementation
      • Cloud Adoption Strategy
      “Let’s get the job done”
  49. Q&A
      Harish11g.aws@gmail.com
      http://in.linkedin.com/in/harishganesan
      www.twitter.com/harish11g
      http://harish11g.blogspot.com
      Amazon Web Services: aws.amazon.com, aws.amazon.com/contact-us/aws-sales
