Building and Managing Scalable Applications on AWS: 1 to 500K users

This presentation session from the Cloud Management, Services and Applications Theatre at Cloud Expo Europe 2014 explores the techniques and AWS services that you can use in order to build highly scalable web applications on AWS. It also features a great overview of a high-scalability mobile application built by Myriad Group, an AWS customer, which serves over 41 million users.


  • Introduce yourself, who the crowd is, and our goal for today
  • Background intro: Scaling is a big topic, with lots of opinions, guides, how-tos, and 3rd parties. If you are new to scaling on AWS, you might ask yourself this question: “So how do I scale?”
  • We need some basics to lay the foundations we’ll need to build our knowledge of AWS on top of.
  • Next up we have Availability Zones, which exist within our regions. There are at minimum two of these in every region, and generally speaking your infrastructure will live in one or more AZs inside a given region. We’ll be talking a lot about Multi-AZ architectures today, as it’s a core component of having a highly available, highly redundant, and highly durable infrastructure on AWS. https://aws.amazon.com/about-aws/globalinfrastructure
  • Last up we have our Edge locations, which are the homes of our CloudFront CDN and Route 53 DNS service. There are over 40 of these around the globe right now, and we’re constantly trying to add these in new places to help best meet our customers’ needs. We also sometimes have multiple edge locations in a single area to help handle the capacity, e.g. New York, which has three today. https://aws.amazon.com/about-aws/globalinfrastructure
  • Over 30 services as represented in groupings in the console
  • Over 30 services, the 25 major services as seen in the console
  • So let’s start from day one, user one, of our new infrastructure and application.
  • This here is the most basic setup you would need to serve a web application. We have Route 53 for DNS, an EC2 instance running our webapp and database, and an Elastic IP attached to the EC2 instance so Route 53 can direct traffic to us. Now in scaling this infrastructure, the only real option we have is to get a bigger EC2 instance…
  • Scaling the one EC2 instance we have to a larger one is the simplest approach to start with. There are a lot of different AWS instance types to choose from depending on your workload. Some have high I/O, CPU, memory, or local storage. You can also make use of EBS-Optimized instances and Provisioned IOPS to help scale the storage for this instance quite a bit.
  • So while we could potentially reach a few hundred or few thousand users supported by this single instance, it’s not a long-term play.
  • We also have to consider some other issues with this infrastructure: no failover, no redundancy, and too many eggs in one basket, since we have both the database and the webapp on the same instance.
  • The first thing we can do to address the issue of too many eggs in one basket, and to overcome the “no bigger boat” problem, is to split out our webapp and database into two instances. This gives us more flexibility in scaling these two things independently. And since we are breaking out the database, this is a great time to think about maybe making use of a database service instead of managing this ourselves…
  • At AWS there are a lot of different options for running databases. One is to just install pretty much any database you can think of on an EC2 instance and manage all of it yourself. If you are really comfortable doing DBA-like activities, such as backups, patching, security, and tuning, this could be an option for you. If not, then we have a few options that we think are a better idea.

    First is Amazon RDS, or Relational Database Service. With RDS you get a managed database instance of either MySQL, Oracle, or SQL Server, with features such as automated daily backups, simple scaling, patch management, snapshots and restores, high availability, and read replicas, depending on the engine you go with.

    Next up we have DynamoDB, a NoSQL database built on top of SSDs. DynamoDB is based on the Dynamo whitepaper published by Amazon.com back in 2007, considered the grandfather of most modern NoSQL databases like Cassandra and Riak. The DynamoDB we have here at AWS is kind of like a cousin of the original paper. One of the key concepts behind DynamoDB is what we call “zero administration”. With DynamoDB the only knobs to tweak are the reads and writes per second you want the DB to be able to perform. You set it, and it will give you that capacity, with query responses averaging single-digit milliseconds. We’ve had customers with loads such as half a million reads and writes per second without DynamoDB even blinking.

    Lastly we have Amazon Redshift, a multi-petabyte-scale data warehouse service. With Redshift, much like most AWS services, the idea is that you can start small and scale as you need to, while only paying for the scale you are at. What this means is that you can store 1 TB of data per year for less than a thousand dollars with Redshift. This is several times cheaper than most other data warehouse providers’ costs, and again, you can scale and grow as your business dictates without needing to sign an expensive contract upfront.
  • Given that we have all these different options, from running pretty much anything you want yourself, to making use of one of the database services AWS provides, how do you choose? How do you decide between SQL and NoSQL?
  • So for this scenario today, we’re going to go with RDS and MySQL as our database engine.
  • Next up we need to address the lack of failover and redundancy in our infrastructure. We’re going to do this by adding in another webapp instance, and enabling the Multi-AZ feature of RDS, which will give us a standby instance in a different AZ from the primary. We’re also going to replace our EIP with an Elastic Load Balancer to share the load between our two web instances.
  • For those who aren’t familiar yet with ELB (Elastic Load Balancing), it is a highly scalable load balancing service that you can put in front of tiers of your application where you have multiple instances that you want to share load across. ELB is a really great service, in that it does a lot for you without you having to do much. It will create a self-healing, self-scaling load balancer that can do things such as SSL termination, handle sticky sessions, and have multiple listeners. It will also do health checks back to the instances behind it, and puts a bunch of metrics into CloudWatch for you as well. This is a key service in building highly available infrastructures on AWS.
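  • The behaviour described above can be sketched in miniature. This is a hypothetical, in-memory stand-in for illustration only, not the ELB API; the instance names are invented:

```python
class ToyLoadBalancer:
    """Toy round-robin balancer with health checks, mimicking what ELB does."""

    def __init__(self, instances):
        self._order = list(instances)
        self.healthy = set(instances)
        self._next = 0

    def health_check(self, instance, is_up):
        # ELB-style: take instances in or out of service based on checks
        if is_up:
            self.healthy.add(instance)
        else:
            self.healthy.discard(instance)

    def route(self):
        # Round-robin over the pool, skipping unhealthy instances
        for _ in range(len(self._order)):
            candidate = self._order[self._next % len(self._order)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances behind the balancer")


lb = ToyLoadBalancer(["web-a", "web-b"])
lb.health_check("web-b", is_up=False)   # simulate an instance failing its check
assert lb.route() == "web-a"
assert lb.route() == "web-a"            # web-b stays out of rotation until it recovers
```

    With the real service you get this behaviour (plus SSL termination, stickiness and CloudWatch metrics) without writing any of it yourself.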
  • Read this slide.
  • Most of you will get to this point and be pretty well off honestly. You can take this really pretty far for most web applications. We could scale this out over another AZ maybe. Add in another tier of read replicas.
  • But it’s not that efficient in either performance or cost, and since those are important too, let’s clean up this infrastructure a bit.
  • The biggest things we can do, and these are incredibly important, are to lighten up some of the work our webapp is doing and to make life easier for our database. We can start by moving any static assets from our webapp instances to S3, and then serving those objects via CloudFront. We can also move things like session information and any other temporary application data to a memory-based cache such as ElastiCache, or to DynamoDB. We can also use this same cache to store some of our database query results, which will keep us from hitting the database too much.
  • Talk about S3
  • Talk about CloudFront. Make sure to mention the two charts to the right. Static content will certainly speed up your site, but static & dynamic content is even better. The chart down below shows data from a real customer who went from very little traffic to a huge spike of over 60 gigabits per second, without having to do anything on their side or notify AWS at all.
  • Imagine, for instance, if you cached the search pages for highly requested queries. This could take load off your search, your web application, your database, etc. So now we can see here that we’ve got CloudFront in front of both S3 and our ELB. Now that we’ve got that covered, let’s move back to the session information and database queries we can be caching as well.
  • Talk about DynamoDB. We could use it in this workflow to store session information from our web application.
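  • As a hypothetical sketch of that idea: sessions become key-value items looked up by session ID, so any web instance can read them and the web tier stays stateless. The table shape and attribute names below are invented, and a plain dict stands in for a DynamoDB table keyed on `session_id`:

```python
import time
import uuid

sessions = {}  # stand-in for a "Sessions" table with hash key "session_id"

def put_session(user_id):
    """Create a session item; any web instance can later fetch it by ID."""
    session_id = str(uuid.uuid4())
    sessions[session_id] = {
        "session_id": session_id,
        "user_id": user_id,
        "created_at": int(time.time()),  # could drive a TTL-based expiry
    }
    return session_id

def get_session(session_id):
    # Single-key lookup: the access pattern DynamoDB serves in milliseconds
    return sessions.get(session_id)

sid = put_session("user-42")
assert get_session(sid)["user_id"] == "user-42"
```

    Because no session lives on a particular web instance, instances can be added or terminated freely, which is what makes the auto scaling later in the talk safe.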
  • Read slide
  • Read slide
  • Talk about auto-scaling.
  • Imagine a “typical” week of traffic to Amazon.com. This pattern might look a lot like your own traffic, with a peak during the middle of the day, and a low down in the middle of the night.
  • Given this pattern, it becomes really easy to do capacity planning. We can provision, say, 15% over what we see our normal peak to be, and be happy with the capacity we have for a while, so long as our traffic matches this pattern. BUT, what if I told you that this was the first week of November for Amazon.com?
  • And this was the month of November! We can see a pretty big growth here at the end of the month with Black Friday and Cyber Monday sales in the US.
  • If we attempted to apply our “add 15% capacity for spikes” rule, we’d be in trouble for the month of November.
  • That’s a lot of potential wasted infrastructure and cost: potentially 76% wasted, while only 24% on average gets utilized over the month. Traditionally this is how IT did things: you bought servers based on a 6-12 month vision of what growth might be. So, since we can all agree this is bad, what is the solution?
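  • The arithmetic behind those percentages is just mean load divided by peak-provisioned capacity. The daily numbers below are made up purely to reproduce a ratio in that ballpark; the real Amazon.com figures are not public:

```python
# Hypothetical daily load for a 30-day November: flat most of the month,
# spiking for Black Friday / Cyber Monday at the end.
daily_load = [18] * 27 + [60, 100, 80]

provisioned = max(daily_load)  # fixed capacity sized for the peak day
utilization = sum(daily_load) / (provisioned * len(daily_load))

assert provisioned == 100
assert round(utilization, 2) == 0.24  # ~24% used, ~76% of the fleet idle on average
```

    The exact split depends entirely on the shape of the spike, but any fixed fleet sized for a short peak wastes most of its capacity the rest of the time.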
  • Well what if we could map our capacity directly to our needs based on our end-users? We could make sure that at any point in time, we were only provisioning what we needed, vs some arbitrary capacity guess.
  • Read slide.
  • If we add in auto scaling, our caching layer (both inside and outside our infrastructure), and the read replicas with MySQL, we can now handle a pretty serious load. This could potentially even get us into the millions of users by itself if we continued to scale it horizontally and vertically.
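  • The read-replica part of that setup implies read/write splitting in the application. A minimal sketch, with placeholder endpoint names standing in for the RDS primary and replica endpoints:

```python
import itertools

# Placeholder endpoints; a real app would use its RDS endpoint hostnames.
PRIMARY = "mysql-primary.example.internal"
REPLICAS = itertools.cycle([
    "mysql-replica-1.example.internal",
    "mysql-replica-2.example.internal",
])

def endpoint_for(sql):
    """Writes go to the primary; reads fan out across replicas round-robin."""
    verb = sql.lstrip().split()[0].upper()
    if verb in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
        return PRIMARY
    return next(REPLICAS)

assert endpoint_for("INSERT INTO users VALUES (1)") == PRIMARY
assert endpoint_for("SELECT * FROM users").startswith("mysql-replica")
```

    One caveat worth stating in the talk: MySQL replication is asynchronous, so reads routed to a replica may briefly lag a just-committed write.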

Building and Managing Scalable Applications on AWS: 1 to 500K users Presentation Transcript

  • 1. Building and Managing Scalable Applications on AWS: 1 to 500K users Ian Massingham IanMmmm Technical Evangelist, Amazon Web Services Feb, 2014
  • 2. WARNING: NOT A MARKETING PRESENTATION
  • 3. • US: Ian Massingham – Technical Evangelist, Amazon Web Services, @IanMmmm • YOU: Here to learn more about scaling and managing applications on AWS • THIS SESSION: About best practices and things to think about when building for large scale
  • 4. So how do we scale?
  • 5. Some Basics
  • 6. Regions & Availability Zones: US West (Oregon), EU (Ireland), AWS GovCloud (US), Asia Pacific (Tokyo), US East (Virginia), Asia Pacific (Sydney), US West (N. California), South America (Sao Paulo), Asia Pacific (Singapore)
  • 7. Edge Locations
  • 8. Service Reference Model Deployment & Administration App Services Compute Storage & Content Delivery Database Networking AWS Global Infrastructure
  • 9. AWS OpsWorks Amazon SNS Amazon CloudSearch Amazon SES Amazon SWF Amazon SQS Amazon CloudWatch AWS Elastic Beanstalk AWS IAM AWS CloudFormation Deployment & Administration AWS Data Pipeline Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon RedShift Amazon Elastic Transcoder App Services Compute Amazon EC2 Amazon EMR Storage & Content Delivery Database AWS Storage Gateway Amazon VPC Networking Amazon Route 53 Amazon S3 AWS Direct Connect AWS Global Infrastructure Amazon CloudFront Amazon Glacier
  • 10. So let’s start from day one, user one ( you )
  • 11. Day One, User One
    A single EC2 instance, with the full stack on this host: web app, database, management, etc.
    A single Elastic IP; Route 53 for DNS.
    [Diagram: user → Route 53 → Elastic IP → EC2 instance]
  • 12. “We’re gonna need a bigger box” • Simplest approach • Can now leverage PIOPs • High I/O instances • High memory instances • High CPU instances • High storage instances • Easy to change instance sizes • Will hit an endpoint eventually hi1.4xlarge m2.4xlarge m1.small
  • 13. “We’re gonna need a bigger box” • Simplest approach • Can now leverage PIOPs • High I/O instances • High memory instances • High CPU instances • High storage instances • Easy to change instance sizes • Will hit an endpoint eventually hi1.4xlarge m2.4xlarge m1.small
  • 14. Day One, User One • We could potentially get to a few hundred to a few thousand depending on application complexity and traffic • No failover • No redundancy • All our eggs in one basket Amazon Route 53 User Elastic IP EC2 Instance
  • 15. Day One, User One: • We could potentially get to a few hundred to a few thousand depending on application complexity and traffic • No failover • No redundancy • All our eggs in one basket Amazon Route 53 User Elastic IP EC2 Instance
  • 16. Day Two, User >1 First let’s separate out our single host into more than one. • Web • Database • Make use of a database service? Amazon Route 53 User Elastic IP EC2 Instance Database Instance
  • 17. Database Options
    Self-managed: Database Server on Amazon EC2. Your choice of database running on Amazon EC2; Bring Your Own License (BYOL).
    Fully managed: Amazon RDS. Microsoft SQL Server, Oracle or MySQL as a managed service; flexible licensing (BYOL or license included).
    Fully managed: Amazon DynamoDB. Managed NoSQL database service using SSD storage; seamless scalability; zero administration.
    Fully managed: Amazon Redshift. Massively parallel, petabyte-scale data warehouse service; fast, powerful and easy to scale.
  • 18. But how do I choose what DB technology I need? SQL? NoSQL? Not a binary decision!
  • 19. User >100
    First let’s separate out our single host into more than one: web and database. Use RDS to make your life easier.
    [Diagram: user → Route 53 → Elastic IP → EC2 instance + RDS DB instance]
  • 20. User >1000
    Next let’s address our lack of failover and redundancy issues: Elastic Load Balancing; another web instance, in another Availability Zone; enable Amazon RDS Multi-AZ.
    [Diagram: user → Route 53 → Elastic Load Balancing → web instances in two Availability Zones; RDS DB instance active (Multi-AZ) with a standby in the other AZ]
  • 21. Elastic Load Balancing
    Create highly scalable applications: distribute load across EC2 instances in multiple Availability Zones.
    Availability Zones: load balance across instances in multiple Availability Zones.
    Health checks: automatically checks the health of instances and takes them in or out of service.
    Session stickiness: routes requests to the same instance.
    Secure sockets layer: supports SSL offload from web and application servers with flexible cipher support.
    Monitoring: publishes metrics to CloudWatch.
  • 22. Scaling this horizontally and vertically will get us pretty far (10s-100s of thousands)
  • 23. User >10ks–100ks
    [Architecture diagram: user → Route 53 → Elastic Load Balancing → multiple web instances spread across two Availability Zones; RDS DB instance active (Multi-AZ) with a standby in the second AZ, plus RDS read replicas in both AZs]
  • 24. This will take us pretty far honestly, but we care about performance and efficiency, so let’s clean this up a bit
  • 25. Shift Some Load Around
    Let’s lighten the load on our web and database instances:
    Move static content from the web instance to Amazon S3 and CloudFront.
    Move dynamic content from Elastic Load Balancing to CloudFront.
    Move session/state and DB caching to ElastiCache or Amazon DynamoDB.
    [Diagram: user → Route 53 → Amazon CloudFront → Elastic Load Balancer → web instance, with Amazon S3, ElastiCache, Amazon DynamoDB, and an RDS DB instance active (Multi-AZ) in the Availability Zone]
  • 26. Working with S3 – Amazon Simple Storage Service
    Object-based storage for the web; “infinitely scalable”; 11 9s of durability.
    Supports fine-grained permission control, and encryption in transit and at rest.
    Reduced redundancy storage is 1/3 cheaper; Amazon Glacier for super-long-term storage.
    Ties in well with CloudFront and with Amazon EMR; acts as a logging endpoint for Amazon S3, CloudFront, and Billing.
    Good for things like static assets (css, js, images, videos), backups, logs, and ingest of files for processing.
  • 27. Amazon CloudFront
    Amazon CloudFront is a web service for scalable content delivery.
    Cache static content at the edge for faster delivery; helps lower the load on origin infrastructure.
    Dynamic and static content; low TTLs (as short as 0 seconds); custom SSL certificates; zone apex support; streaming video.
    Lower costs for origin fetches (between Amazon S3/EC2 and CloudFront).
    Optimized to work with EC2, Amazon S3, Elastic Load Balancing, and Route 53.
    [Charts: server load and response time with no CDN, with a CDN for static content, and with a CDN for static & dynamic content; volume of data delivered (Gbps) for one customer over a single day, peaking at over 60 Gbps between 8:00 AM and 9:00 PM]
  • 28. Shift Some Load Around
    Let’s lighten the load on our web and database instances:
    Move static content from the web instance to Amazon S3 and CloudFront.
    Move dynamic content from Elastic Load Balancing to CloudFront.
    Move session/state and DB caching to ElastiCache or Amazon DynamoDB.
    [Diagram: user → Route 53 → Amazon CloudFront → Elastic Load Balancing → web instance, with Amazon S3, ElastiCache, Amazon DynamoDB, and an RDS DB instance active (Multi-AZ) in the Availability Zone]
  • 29. Amazon DynamoDB
    Provisioned throughput NoSQL database; fast, predictable performance; fully distributed, fault-tolerant architecture; considerations for non-uniform data.
    Provisioned throughput: dial up or down provisioned read/write capacity.
    Predictable performance: average single-digit millisecond latencies from SSD-backed infrastructure.
    Strong consistency: be sure you are reading the most up-to-date values.
    Fault tolerant: data replicated across Availability Zones.
    Monitoring: integrated with CloudWatch.
    Secure: integrates with AWS Identity and Access Management (IAM).
    Elastic MapReduce: integrates with Amazon Elastic MapReduce for complex analytics on large datasets.
  • 30. Now that our Web tier is much more lightweight, we can revisit the beginning of our talk…
  • 31. Auto Scaling!
  • 32. Auto Scaling
    Automatic resizing of compute clusters based on demand; Amazon CloudWatch triggers the Auto Scaling policy.
    Control: define minimum and maximum instance pool sizes and when scaling and cool-down occur.
    Integrated with Amazon CloudWatch: use metrics gathered by CloudWatch to drive scaling.
    Instance types: run Auto Scaling for On-Demand and Spot Instances; compatible with VPC.
    aws autoscaling create-auto-scaling-group --auto-scaling-group-name MyGroup --launch-configuration-name MyConfig --min-size 4 --max-size 200 --availability-zones us-west-2c
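  • The decision logic a scaling policy encodes can be sketched as toy code. The thresholds and step sizes here are illustrative only; in the real service, CloudWatch alarms trigger the policies and Auto Scaling enforces the min/max bounds:

```python
MIN_SIZE, MAX_SIZE = 4, 200  # matching the slide's --min-size / --max-size

def desired_capacity(current, avg_cpu):
    """Toy target-capacity rule: grow under load, shrink when idle,
    never leaving the [MIN_SIZE, MAX_SIZE] bounds."""
    if avg_cpu > 70:      # hypothetical scale-out alarm threshold
        current += 2
    elif avg_cpu < 30:    # hypothetical scale-in alarm threshold
        current -= 1
    return max(MIN_SIZE, min(MAX_SIZE, current))

assert desired_capacity(4, avg_cpu=85) == 6    # busy fleet grows
assert desired_capacity(4, avg_cpu=10) == 4    # never drops below the minimum
```

    A cool-down period (not modelled here) prevents the real service from reacting to every metric sample and thrashing the fleet.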
  • 33. Typical Weekly Traffic to Amazon.com Sunday Monday Tuesday Wednesday Thursday Friday Saturday
  • 34. Typical Weekly Traffic to Amazon.com Provisioned Capacity Sunday Monday Tuesday Wednesday Thursday Friday Saturday
  • 35. November Traffic to Amazon.com November
  • 36. November Traffic to Amazon.com Provisioned Capacity November
  • 37. November Traffic to Amazon.com 76% Provisioned Capacity November 24%
  • 38. November Traffic to Amazon.com November
  • 39. Auto Scaling lets you do this!
  • 40. User >500k+
    [Architecture diagram: user → Route 53 → Amazon CloudFront → Elastic Load Balancing → web instances across two Availability Zones; Amazon S3 and DynamoDB alongside; ElastiCache in each AZ; RDS DB instance active (Multi-AZ) with a standby in the second AZ and read replicas in both]
  • 41. Want to go beyond 500K users? Let me introduce an AWS customer that has done that…
  • 42.
  • 43. Myriad Overview
    Myriad delivers consumer applications, messaging solutions, and embedded software to leading OEMs, mobile operators, and pay TV providers worldwide.
    Messaging Services: instant messaging to tens of millions of users in Latin America; access to social networking applications via USSD on feature phones in Latin America, Asia and Africa; working with some of the world’s most successful mobile phone operators.
    Java: a long-established business providing a Java VM for embedded devices.
    Connected Home Solutions: found in tens of millions of set-top boxes and Blu-ray players; driving the convergence of mobile, tablets and TV through the creation of a suite of exciting connected home solutions.
    Worldwide operation with 200 people located in the UK, France, Mexico, Brazil, India and China.
  • 44. Myriad Messenger
    Latin American chat service launched in 16 markets; 41M users through current carrier partnerships.
    Service platform and clients undergoing redevelopment to support rapid growth with complementary carrier and OTT services.
    The challenge: build a new platform that will deliver superior performance, features and scale in a short time with a small team.
  • 45. High level platform overview
    Platform is Java. All services are developed as OSGi bundles running in the Apache Felix OSGi container.
    OSGi allows us to separate functional interfaces from implementations dynamically: we can quickly and painlessly swap out implementations if they don’t perform or scale.
    We have a single design pattern for all system components, and we have developed OSGi services over AWS.
    [Diagram: MSNGR clients → REST API servers → message queue → message queue handlers, with user service, addressbook service, WAP service, message service and routing service; message notification via a notification queue, an SMS aggregator, and Amazon cloud messaging]
  • 46. AWS Application Services
    Each of our persistent services runs over DynamoDB. This gives us unlimited scale, predictable latency and the ability to individually tune the IO requirements to suit the service (e.g. the address book needs less IO than the message service, and greater read capacity than write capacity).
    We make extensive use of SQS to decouple system components and allow us to scale more effectively, flatten daily system usage patterns, and give clients a very fast response to the REST API.
    SNS for internal message routing.
    ElastiCache provides caches for performance (and cost reduction) over system hotspots.
    S3 for storage of media, extended message bodies, branding packages and code deployments.
    EC2/CloudWatch/Auto Scaling for flexible processing on demand.
  • 47. Thank You Read! • aws.amazon.com/documentation • aws.amazon.com/architecture • aws.amazon.com/start-ups Listen! • aws.amazon.com/podcast Ian Massingham IanMmmm Technical Evangelist, Amazon Web Services Feb, 2014