Cloud Scale: AWS and Azure
Lessons Learned
October 15th, 2014
Nick Stephens
Cloud Scale Challenge
• Pariveda held an internal competition to build a highly scalable cloud application
• The application had to be built on two of the most popular clouds
– AWS and Azure
• It was a great learning experience
Competition - Rules
• Build a simple e-commerce site
– Search for Products
– Add to Cart
– Submit Order
• Build on both AWS and Azure
– Must use 3 services from each cloud
• Best performance for price wins
Competition - SLAs
• Search for Product
– 600,000 requests/min with response within 1 sec
• Add to Cart
– 30,000 requests/min with response within 500 ms
– Request must be persisted within 10 sec
• Submit Order
– 3,000 requests/min with response within 500 ms
– Request must be persisted within 10 sec
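The SLAs are easier to size in per-second terms. The back-of-the-envelope calculation below (in Python for brevity) converts each target rate and applies Little's law (average requests in flight = arrival rate × time in system) to bound how many requests each endpoint may be holding at once if every response took the full SLA limit. It is an illustrative sketch, not a method the teams are described as using.

```python
import math

# SLA targets from the slide: (requests per minute, response limit in seconds)
SLAS = {
    "Search for Product": (600_000, 1.0),
    "Add to Cart": (30_000, 0.5),
    "Submit Order": (3_000, 0.5),
}

def per_second(requests_per_min: int) -> float:
    """Convert a per-minute request rate to a per-second rate."""
    return requests_per_min / 60

def max_in_flight(requests_per_min: int, response_limit_s: float) -> int:
    """Little's law, L = lambda * W: concurrent requests in the system
    if every response takes the full SLA limit (a worst case)."""
    return math.ceil(per_second(requests_per_min) * response_limit_s)

for name, (rpm, limit) in SLAS.items():
    print(f"{name}: {per_second(rpm):,.0f} req/s, up to {max_in_flight(rpm, limit):,} in flight")
# Search for Product: 10,000 req/s, up to 10,000 in flight
# Add to Cart: 500 req/s, up to 250 in flight
# Submit Order: 50 req/s, up to 25 in flight
```

Search dominates by a factor of 20 over Add to Cart, which is why it drove most of the scale-out effort.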
Competition - Deliverables
• Teams pick their most cost-effective solution
• Demo the chosen solution to the judges
• Must prove SLAs were met by generating load on the system
My Team’s Solution
• Strategy
– Re-use as much as possible
• Chose IaaS over PaaS for portability
– Pick the right technology for the problem
• Chose NodeJS because the workload was network-heavy with low CPU needs
– Handle Add to Cart and Submit Order requests asynchronously
• Queue requests to scale more easily
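A minimal sketch of the enqueue-and-acknowledge pattern, written in Python and assuming an in-process `queue.Queue` in place of the Redis queue the team actually used; `handle_add_to_cart`, `worker`, and the `persisted` list are hypothetical names. The handler returns as soon as the request is queued, so its response time excludes the database write, and meeting the 10-second persistence SLA becomes the worker's job.

```python
import queue
import threading
import time

# Hypothetical stand-in for the Redis queue between the web tier and the worker.
requests_q = queue.Queue()
persisted = []  # stands in for the real data store

def handle_add_to_cart(user_id: str, product_id: str) -> dict:
    """Web handler: enqueue the request and acknowledge immediately."""
    requests_q.put({"user": user_id, "product": product_id, "queued_at": time.monotonic()})
    return {"status": "accepted"}

def worker() -> None:
    """Worker: drain the queue and persist each request off the request path."""
    while True:
        item = requests_q.get()
        if item is None:  # sentinel used to stop the worker
            break
        item["persisted_at"] = time.monotonic()
        persisted.append(item)

t = threading.Thread(target=worker, daemon=True)
t.start()
print(handle_add_to_cart("u1", "p42"))  # {'status': 'accepted'}
requests_q.put(None)  # FIFO order: the cart request is processed before the sentinel
t.join()
```

In the deployed system the worker would be a separate process (the NodeJS worker on the AWS architecture slide) reading from Redis, so the web and worker tiers can be scaled independently.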
My Team’s Solution
• Development
– Coded to an interface to abstract cloud-specific storage logic
• Separate implementations for each cloud
– Used Redis as a queue with the Redisq library
• VM with Redis on AWS
• Redis Cache on Azure
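A minimal sketch of coding to an interface, assuming a hypothetical `CartStore` protocol; the slide does not give the team's actual interface, and their code was NodeJS rather than Python. The business logic depends only on the interface, so each cloud needs its own implementation class (for example, one wrapping DynamoDB and one wrapping Azure Storage) while everything else is shared. The in-memory class below stands in for those cloud backends.

```python
from typing import Protocol

class CartStore(Protocol):
    """Hypothetical storage interface that the application code depends on."""
    def save_cart_item(self, user_id: str, product_id: str) -> None: ...
    def get_cart(self, user_id: str) -> list[str]: ...

class InMemoryCartStore:
    """Placeholder backend. A DynamoDB- or Azure Storage-backed class would
    implement the same two methods using that cloud's SDK."""
    def __init__(self) -> None:
        self._carts: dict[str, list[str]] = {}

    def save_cart_item(self, user_id: str, product_id: str) -> None:
        self._carts.setdefault(user_id, []).append(product_id)

    def get_cart(self, user_id: str) -> list[str]:
        return list(self._carts.get(user_id, []))  # return a copy

def add_to_cart(store: CartStore, user_id: str, product_id: str) -> list[str]:
    """Cloud-agnostic business logic: it only talks to the interface."""
    store.save_cart_item(user_id, product_id)
    return store.get_cart(user_id)

print(add_to_cart(InMemoryCartStore(), "u1", "p42"))  # ['p42']
```

Swapping clouds then means constructing a different store object at startup; `add_to_cart` itself does not change.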
My Team’s Solution
• AWS Architecture
– NodeJS Web Server
– Redis Server (Queue)
– NodeJS Worker
• Services Used
– EC2
– DynamoDB
– CloudSearch
My Team’s Solution
• Testing
– Needed to generate heavy load on the system to prove SLAs
• Built a custom load test rig to capture client response times and request persistence times
– Response times were captured in a SQL database for easy reporting
– Used Remote Desktop to monitor servers
• Watched CPU and network traffic to gauge performance
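A sketch of the reporting side of such a rig, assuming in-memory SQLite as a stand-in for the team's SQL database and a hypothetical `samples` table; the slide does not describe the actual schema. Storing each measurement as a row lets one query answer "what fraction of requests met the SLA?" for either response time or persistence time. The sample values are illustrative only.

```python
import sqlite3
from typing import Optional

# In-memory SQLite stands in for the team's SQL database (an assumption);
# the `samples` table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE samples (
           operation   TEXT NOT NULL,
           response_ms REAL NOT NULL,  -- client-observed response time
           persist_ms  REAL            -- time until stored; NULL for searches
       )"""
)

def record(operation: str, response_ms: float, persist_ms: Optional[float] = None) -> None:
    """Store one client-side measurement."""
    conn.execute("INSERT INTO samples VALUES (?, ?, ?)", (operation, response_ms, persist_ms))

def sla_compliance(operation: str, limit_ms: float, column: str = "response_ms") -> float:
    """Fraction of non-NULL samples for `operation` whose `column` is within `limit_ms`."""
    if column not in ("response_ms", "persist_ms"):  # allowlist before building SQL
        raise ValueError(f"unknown column: {column}")
    total, within = conn.execute(
        f"SELECT COUNT({column}), SUM({column} <= ?) FROM samples WHERE operation = ?",
        (limit_ms, operation),
    ).fetchone()
    return (within or 0) / total if total else 0.0

record("Add to Cart", response_ms=120.0, persist_ms=2_500.0)
record("Add to Cart", response_ms=640.0, persist_ms=3_100.0)
print(sla_compliance("Add to Cart", 500))                   # 0.5
print(sla_compliance("Add to Cart", 10_000, "persist_ms"))  # 1.0
```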
My Team’s Solution
• Competition Results
– We demoed our solution but didn’t meet all SLAs
• Only achieved approximately 300,000 searches/min, half of the 600,000 target
– We hadn’t tested our system at that scale
• We discovered a bottleneck during the demo
– We didn’t have all of the deployment automated
• We couldn’t quickly redeploy, scale out, and retest
Winning Team’s Solution
• Development
– Developed the AWS and Azure solutions separately
• Both started out using .NET on Windows
– AWS solution switched to NodeJS on Linux
• Linux instances are much cheaper than Windows instances
– Azure solution ended up being cheaper
• SQS transaction costs added up to more than Azure Storage transaction costs
Winning Team’s Solution
• Azure Architecture
– .NET Web API
– PaaS
– Azure Storage
• Services Used
– Web Roles
– Worker Roles
– Azure Storage
Winning Team’s Solution
• Testing
– Wrote a custom test harness
• Could view aggregate results from the test runners
– Added application servers until the SLAs were met
– Tried different instance sizes
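Generating hundreds of thousands of requests per minute may take several load-test machines, so a harness has to combine their results. A minimal sketch of that aggregation follows, plus a rough estimate of the next server count to try; the slide does not say how the winning team's harness worked, so `RunnerResult`, its fields, the treatment of the latency SLA as a hard limit on every response, and the linear-scaling assumption in `servers_to_try` are all hypothetical. The runner numbers are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class RunnerResult:
    """Hypothetical summary that one load-test runner reports for a run."""
    requests: int          # requests this runner completed
    duration_s: float      # length of this runner's test window
    max_latency_ms: float  # slowest response this runner observed

def aggregate(results: list[RunnerResult]) -> tuple[float, float]:
    """Combine runners that ran concurrently into
    (total requests per minute, slowest response in ms)."""
    window_s = max(r.duration_s for r in results)  # runners overlap, so use the longest window
    total_rpm = sum(r.requests for r in results) / window_s * 60
    worst_ms = max(r.max_latency_ms for r in results)
    return total_rpm, worst_ms

def meets_sla(results: list[RunnerResult], target_rpm: float, limit_ms: float) -> bool:
    """True if the combined rate hits the target and no response exceeded the limit."""
    rpm, worst_ms = aggregate(results)
    return rpm >= target_rpm and worst_ms <= limit_ms

def servers_to_try(current: int, measured_rpm: float, target_rpm: float) -> int:
    """Next server count to test, ASSUMING throughput grows linearly with servers."""
    per_server = measured_rpm / current
    return math.ceil(target_rpm / per_server)

runners = [RunnerResult(50_000, 60.0, 420.0), RunnerResult(50_000, 60.0, 910.0),
           RunnerResult(50_000, 60.0, 380.0)]
print(aggregate(runners))                   # (150000.0, 910.0)
print(meets_sla(runners, 600_000, 1_000))   # False
print(servers_to_try(2, 150_000, 600_000))  # 8
```

Because scaling is rarely perfectly linear (a shared bottleneck caps it), the estimate is only a starting point; the next load test confirms or refutes it.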
Lessons Learned
• Scale Out not Up
– This type of problem is network-bound
– More instances were better than larger instances
• Synchronous writes were possible for this scenario
– Teams that used synchronous writes had to scale out more
– Asynchronous writes give quicker responses and scale better
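Little's law offers one way to read why synchronous writes forced more scale-out. Under the simplifying assumption that the web tier holds each request for its whole time in the system, the number of requests in flight is $L = \lambda W$, and the two designs differ only in what sits on the request path ($t_{\text{app}}$, $t_{\text{write}}$, and $t_{\text{enqueue}}$ are illustrative symbols, not measurements from the competition):

```latex
\begin{aligned}
W_{\text{sync}} &= t_{\text{app}} + t_{\text{write}}, \qquad
W_{\text{async}} = t_{\text{app}} + t_{\text{enqueue}} \\
L_{\text{sync}} - L_{\text{async}} &= \lambda\,(W_{\text{sync}} - W_{\text{async}})
  = \lambda\,(t_{\text{write}} - t_{\text{enqueue}})
\end{aligned}
```

For Add to Cart, $\lambda = 30{,}000/60 = 500$ requests/s, so every millisecond by which a durable write exceeds an enqueue adds 0.5 requests in flight across the web tier. The writes still happen in the asynchronous design, but on workers, off the request path, with up to 10 seconds to complete.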
Lessons Learned
• Capture metrics to judge performance
– Metrics can show bottlenecks
– Metrics give an objective measure of performance
• Use existing tools whenever possible
– Some teams used a load testing service instead of a custom tool
– This allowed those teams to focus more on the application
Lessons Learned
• Automate deployment as much as possible
– Makes the process fast and reliable
• No clear winner in AWS vs Azure
– Team submissions were split between AWS and Azure
– Each cloud had similar but unique feature sets
– Either cloud could have won with the right architecture
QUESTIONS?
