O'Reilly Webcast: Architecting Applications For The Cloud


This presentation analyzes aspects of the Amazon EC2 IaaS cloud environment that differ from a traditional data center and introduces general best practices for ensuring data privacy, storage persistence, and reliable DBMS backup. Presented by Jorge Noa, CTO of HyperStratus.

Published in: Technology

  1. 1. Architecting Applications for the Cloud Jorge Noa CTO, HyperStratus Jorge.Noa@HyperStratus.com v7 Copyright 2009 HyperStratus
  2. 2. About HyperStratus • Silicon Valley-based cloud computing consultancy • Founded by executives with deep experience in corporate IT, enterprise software, and global consultancy • We assist clients in establishing cloud computing strategies, cloud application architectures, system selection and implementations • We also provide cloud computing training and workshops
  3. 3. Introduction to Cloud Architecture
  4. 4. What is the Cloud? UC Berkeley RAD Lab Definition • Huge Resources: The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning • No Commitment: The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs • Pay by the Drink: The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed
  5. 5. Key Cloud Benefits • Huge Resources: IT agility as systems can be sized to meet demand -- as load scales, system resources are easily obtained to ensure SLAs can be met • No Commitment: No longer face the tradeoff between overprovisioning (waste of capital) and underprovisioning (waste of users) • Pay by the Drink: Move IT payments from CAPEX to OPEX. Pay only for actual resources consumed. Tie IT cost to business benefit received
  6. 6. Cloud Service Categories • Infrastructure as a Service (IaaS) – Amazon EC2 – GoGrid – Eucalyptus • Platform as a Service (PaaS) – Google AppEngine (Python, Java) – Windows Azure (.Net) • Software as a Service (SaaS) – Salesforce.com – Gmail
  7. 7. How the Cloud is Delivered [Figure: delivery models ranked from More Structured / Less Control to Less Structured / More Control: Public Cloud -- SaaS, Public Cloud -- PaaS, Private Cloud -- IaaS, Public Cloud -- IaaS]
  8. 8. IaaS Cloud Providers [Figure: providers grouped by cloud type (Internal Private Cloud, External Private Cloud, Virtual Private Cloud, Public Cloud) along an Isolated-to-Shared axis. Providers named: Amazon (AWS), GoGrid, Rackspace, CohesiveFT (VPN Cubed), Amazon VPC (IPsec VPN), IBM, HP, HP (EDS), Cisco/VMware, Terremark, Microsoft, AT&T, 3Tera, Eucalyptus]
  9. 9. Cloud Application Example • Grows from 1MM to 100+ MM insurance claims/day in one week • Traditional solution: $750K new hardware + $30K/month maintenance/hosting • Cloud solution: $600/month Amazon Web Services
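The two cost figures on this slide can be compared over a fixed period. The following is a minimal sketch of that arithmetic, assuming a 12-month comparison horizon; the horizon and the function names are illustrative and do not come from the presentation.

```python
def traditional_cost(months, hardware=750_000, monthly=30_000):
    """Up-front hardware plus monthly maintenance/hosting, in USD."""
    return hardware + monthly * months


def cloud_cost(months, monthly=600):
    """Pay-as-you-go monthly Amazon Web Services bill, in USD."""
    return monthly * months


if __name__ == "__main__":
    months = 12  # assumed comparison horizon, not stated on the slide
    print("traditional:", traditional_cost(months))
    print("cloud:      ", cloud_cost(months))
```

Under that assumption the traditional solution costs $1,110,000 in the first year against $7,200 for the cloud solution.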
  10. 10. Cloud Taxonomy Source: Christofer Hoff, Cloud Security Alliance “Security Guidance for Critical Areas of Focus in Cloud Computing,” Page 22 • Foundation of cloud is virtualization • Upper cloud services are incremental to lower cloud services • Lower level services are key for higher level services
  11. 11. IaaS/PaaS in Detail Components Providers Adapted: Christofer Hoff, “The Frogs Who Desired a King” • Amazon AWS EC2 is an IaaS environment with RESTful Web Services API to allocate & manage resources
  12. 12. IaaS/PaaS in Detail Components Providers Adapted: Christofer Hoff, “The Frogs Who Desired a King” • AWS SQS, SimpleDB, and CloudFront are PaaS Middleware • Google AppEngine and Microsoft Azure are PaaS AppServers
  13. 13. Basic Amazon AWS Concepts and Considerations
  14. 14. Amazon Web Services • Elastic Compute Cloud – EC2 (IaaS) • Simple Storage Service – S3 (IaaS) • Elastic Block Storage – EBS (IaaS) • SimpleDB (SDB) (PaaS) • Simple Queue Service – SQS (PaaS) • CloudFront (S3 based Content Delivery Network – PaaS) • Consistent AWS Web Services API
  15. 15. IaaS Taxonomy : AWS Components • Images - S3 “Gold-Master” VM Images • Compute - EC2 Instance Types • Storage - Default Local Disks, EBS, S3 • Network – Regions, Availability Zones, Virtual NICs • IPAM/DNS – Internet Protocol Address Management – Domain Name System • Security – Network Firewalls – S3 file ACLs
  16. 16. IaaS Taxonomy : AWS Components (cont) • IAM/Auth – (Identity Access Mgmt) AWS Credentials & X.509 Certificates • VMM – (Virtual Machine Mgmt) Self-Discovery, Auto-Configuration • LB & Transport – (Load Balancing) AWS Auto-Scaling • API – Web API, Command-Line Tools • Mgmt - AWS Mgmt Console, Firefox Elasticfox plug-in
  17. 17. PaaS Taxonomy : AWS Components • Messaging/Queuing – Simple Queue Service (SQS) • Database – SimpleDB (SDB)
  18. 18. IaaS Network Component : EC2 Regions & Zones • Amazon EC2 locations are composed of Regions which contain Availability Zones. • Regions consist of one or more Availability Zones and are geographically dispersed in separate geographic areas or countries – Currently only two Regions: “us-east-1”, “eu-west-1” • Availability Zones are distinct datacenter locations that are engineered to be insulated from failures in other Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region – E.g. “us-east-1a”, “us-east-1b”, …
  19. 19. IaaS Network Component : EC2 Regions & Zones (cont) • Traffic between Availability Zones in a single Region is on AWS-controlled redundant infrastructure • All traffic between Regions is across multiple Tier-1 public Internet infrastructure
  20. 20. IaaS Image Component: EC2 and AMIs • EC2 provides elastic computing capacity. • EC2 instances provide empty virtual machines into which users install desired software assets: operating system, middleware platforms, configuration files and application(s). • EC2 instantiates the collection of these running instance assets as an “Amazon Machine Image” (AMI). • An AMI is digitally signed and encrypted by the owner using a private X.509 key. AWS has a copy of the corresponding public X.509 certificate for decrypting an AMI at EC2 Instance “launch” time. • An AMI is equivalent to a “Gold Master” image of the configured VM in an EC2 instance. • Multiple EC2 instances can be launched from the same AMI.
  21. 21. IaaS Image Component : S3 and AMIs • S3 File storage – Reliable web accessible file-based storage using a special file name syntax – “<bucket>/<folder>/<filename>” • EC2 AMIs are stored in S3 as a “bundle” of segmented 10MB files and EC2 VM instances are instantiated (launched) from their S3 AMI. • Users can create their own AMIs from scratch (P2V); use pre-built public AMIs; or use a pre-built AMI as a starting point and then add custom software assets to finalize the desired AMI. • Updating an EC2 AMI requires a full “bundling” process and results in a second AMI, different than the original one.
  22. 22. IaaS Compute Component: AWS EC2 • EC2 is based upon Xen Hypervisor (with significant constraints) • 1 EC2-CU = CPU capacity of 1.0-1.2 GHz 2007 Opteron or 2007 Xeon • Compute capacity is defined at granular levels – i.e. number of CPU cores and “Compute Units” per core (1 core @ 1 CU up to 8 cores @ 2.5 CU) • Virtual Memory ranges are 1.7GB, 7.5GB and 15GB depending on instance type • Default quota of 20 VM instances per account
  23. 23. IaaS Compute Component : EC2 Compute Unit • Several AWS benchmarks and tests manage the consistency and predictability of the performance of an EC2 Compute Unit • Over Time, there may be several different types of physical commodity hardware underlying EC2 instances, but EC2-CU performance should remain constant
  24. 24. IaaS Storage Component : EC2, EBS, S3 • EC2 Instance Default Local Storage – ephemeral virtual disks that are an integral part of the EC2 VM instance – Range from 170GB to 1.8TB total space, 1 to 5 disks • Elastic Block Storage – EC2 Additional persistent disk volumes that can be attached and mounted on a running VM. – 1TB max per volume, default quota of 20 volumes • S3 File storage – Reliable web accessible file-based storage. – 5GB max per file
  25. 25. IaaS Storage Component : EBS • An EBS volume is created in a user specified AWS Availability Zone. • It is the AWS equivalent of a local SAN RAID disk and can only be attached to one running EC2 instance at a time, in the same Zone • Appears to the running VM's OS as a standard disk drive • Must be partitioned and/or formatted with a file system before being mounted • Higher reliability, lower latency and higher throughput than Instance Default Storage • Supports live snapshots to S3
  26. 26. IaaS Storage Component : S3 • S3 File storage – Reliable web accessible file storage (s3.amazonaws.com). • Buckets are created in user assigned Regions (“us-east-1”, “eu-west-1”) • Unlimited number of index folders and files (i.e. objects) per bucket, 5GB max per file • Files in a bucket are replicated to geographically dispersed Zones in the bucket’s Region
  27. 27. IaaS Storage Component : EC2 Ephemeral Storage Notes • All Default Local instance storage devices (i.e. non-EBS EC2 volumes) are ephemeral and all data on them is lost when the instance is terminated (or crashes and cannot be rebooted). Use S3, EBS, or SDB for permanent data. • Analogous to the file system lifecycle of a Linux Live-CD that uses RAM drives • However, default instance storage data is retained on reboot. • This is a major EC2 constraint that must be taken into consideration in an application’s design.
  28. 28. EC2 Dynamic Data : Typical S3 Usage Pattern
  29. 29. EC2 Dynamic Data : Typical EBS Usage Pattern
  30. 30. IaaS Network Component : EC2 Virtual NIC • Each EC2 Instance has only one Virtual NIC that is assigned a dynamic EC2 MAC Address and internal private IP Address • AWS VM Prevents network cross-talk among users • No visibility beyond individual machine NIC traffic -- even among correlated machines in the same application configuration • Communicating within multi-tier VM configurations typically involves dynamic DNS server registration
  31. 31. IaaS IPAM/DNS Component : EC2 IP Addresses & DNS • No customer control of initial VM IP Address or DNS name assignments • EC2 routers map two IP addresses to the EC2 Instance • dynamic EC2 Private Address (RFC-1918, e.g. 10.x.x.x) • dynamic EC2 Public Address using Network Address Translation (NAT) (Note: public address range belongs to AWS) • IP Address is a component of the DNS name • Up to 5 fixed public Elastic-IP Addresses and DNS names can be pre-allocated for an AWS account and later assigned to a running EC2 instance.
  32. 32. IaaS Security Component : EC2 Security Groups & ACLs • EC2 Security Groups function as network firewall configurations. – A Security Group is a named collection of incoming network traffic rules for an EC2 account. • Access to each S3 file is controlled by its own Access Control List (ACL). – ACL allows READ, WRITE, and FULL CONTROL (includes access to ACL) privileges on: • “Everyone” • “Authenticated Users” (only valid AWS users) • A list of individual AWS users or groups
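The idea of a Security Group as a named collection of incoming traffic rules can be modelled locally. The sketch below is not the EC2 API: the `IngressRule` class, the `is_allowed` helper, and the example address ranges are all hypothetical, and it assumes a default-deny policy with single-port rules for simplicity.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str     # e.g. "tcp"
    port: int         # single destination port, for simplicity
    source_cidr: str  # allowed source range, e.g. "203.0.113.0/24"


def is_allowed(rules, protocol, port, source_ip):
    """Return True if any rule admits the incoming connection (default deny)."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.port == port
                and addr in ipaddress.ip_network(rule.source_cidr)):
            return True
    return False


# Hypothetical group for a web tier: HTTP from anywhere, SSH from one range.
web_group = [
    IngressRule("tcp", 80, "0.0.0.0/0"),
    IngressRule("tcp", 22, "203.0.113.0/24"),
]
```

With these rules, `is_allowed(web_group, "tcp", 22, "198.51.100.7")` is rejected because the source address falls outside the one range allowed to use SSH.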
  33. 33. PaaS Messaging/Queuing Component : AWS SQS • Highly Reliable Message Queuing Service with built-in redundancy within user assigned Regions • Messages accessible from anywhere via Web API • Up to 8 KB of Unicode data per message • Messages can be retained in queues for up to 4 days • Messages can be sent and read simultaneously but FIFO not guaranteed • Queues can be securely shared with other AWS accounts and Anonymously. Queue sharing can also be restricted by IP address and time-of-day.
  34. 34. PaaS Database Component : AWS SimpleDB US Beta • Enhanced ISAM-like database service • Simple web services interface to create and store multiple data sets and query your data • Data is automatically indexed • Data stored in US-east-1 Region (Beta restriction) and automatically replicated to geographically dispersed Zones • Requests originating from an application running in same Amazon Region will have near-LAN latency.
  35. 35. PaaS Database Component : AWS SimpleDB US Beta (cont) • Similar to MyISAM with enhanced features – No SQL grammar support – No table JOIN – Simple WHERE criteria • 100 domains (tables) quota per account, max 10GB per domain, max 256 attributes (columns) per row, max 1KB data per attribute (cell) • Typically used to store App logs, EC2 Instance configurations, Application state, Instance status, analytics, indexes to S3 data • Scale-out is as simple as creating new domains, rather than building out new servers.
  36. 36. AWS Cloud Application Design
  37. 37. Cloud App Design Attributes • Abstract Resources: Focus on your needs, not on hardware specs. As your needs change, so should your resources. • On-Demand Provisioning: Ask for what you need, exactly when you need it. Get rid of it when you don’t need it. • Scalability: Design should allow for resources to scale up or down depending on usage needs. • No Up-Front Costs: No contracts or long-term commitments. Pay only for what you use but design for the possibility of enhanced resource usage. • Dynamism: Each machine instance must be capable of dynamically identifying its configuration and relationship to other resources in the system.
  38. 38. Cloud Application Design: 10 Best Practices 1. Build cloud apps, not apps in the cloud 2. Virtualize the application stack 3. Design for failure and nothing fails 4. Design for scalability 5. Loose coupling lets you maximize plug&play 6. Design for dynamism 7. Build Security into every component 8. Leverage native cloud storage options 9. Leverage best cloud Management Tools 10. Don't fear cloud constraints
  39. 39. Best Practices: Don’t Just Build apps in the cloud [Figure: traditional application stack in separate tiers (Load Balancer, Web Tier, Business Tier, Messaging, Data Tier) with back-up nodes. Source: GigaSpace, “Practical Guide for Developing Enterprise Application on the Cloud”] • Don’t simply port traditional Apps to the Cloud • Traditional Application Stacks are architected in functional silos • Each silo has its own machines, network, management, and support
  40. 40. Build Cloud Apps: Virtualize the Application Stack [Figure: Users, Load Balancer, Web Processing Units, Business Processing Units, DB. Source: GigaSpace, “Practical Guide for Developing Enterprise Application on the Cloud”] • Re-factor to use standardized VM containers. Each instance should use self-discovery, be self configurable, and network independent • Use cloud standardized Messaging & DB when possible • Leverage inherent EBS replication and snapshots for DBMS
  41. 41. Build Cloud Apps: Compensate for Ephemeral Storage • EC2 instance default storage can only be used for transient data (e.g. intermediate or temp data files). Don’t use it for archival data logs such as login logs or error dumps. – Consider using SDB to store persistent archival data records that can be associated with a key (e.g. timestamp) • If OK to recover only from most recent backup, consider restoring data from S3 at boot-up and backing-up current data to S3 at shutdown. • If not OK, use EBS attached volumes for all persistent file data. • DBMS should always use EBS volumes
  42. 42. Build Cloud Apps: Compensate for Ephemeral Storage (cont) • Consider using soft-links (Linux) to map portions of the ephemeral Default Storage application file tree to persistent EBS volumes – This can be used for archival data logs such as login logs or error dumps (i.e. /var/log/ files can be soft linked to an EBS volume). • If only small chunks of persistent storage are needed for each Instance, consider using EBS volumes exported on EC2 NFS servers.
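The soft-link technique on this slide can be sketched as follows. This is a minimal illustration, assuming a Linux host and an EBS volume that is already attached, formatted, and mounted; `link_to_persistent` and the example paths are hypothetical names, not part of any AWS tool.

```python
import os
import shutil
import tempfile


def link_to_persistent(ephemeral_path, persistent_dir):
    """Replace ephemeral_path with a symlink pointing at persistent_dir.

    persistent_dir stands in for a directory on a mounted EBS volume.
    Existing files are moved there first so nothing is lost.
    """
    os.makedirs(persistent_dir, exist_ok=True)
    if os.path.islink(ephemeral_path):
        return  # already linked, nothing to do
    if os.path.isdir(ephemeral_path):
        for name in os.listdir(ephemeral_path):
            shutil.move(os.path.join(ephemeral_path, name), persistent_dir)
        os.rmdir(ephemeral_path)
    os.symlink(persistent_dir, ephemeral_path)


if __name__ == "__main__":
    # Demonstration in a scratch directory rather than the real /var/log.
    root = tempfile.mkdtemp()
    app_logs = os.path.join(root, "app", "logs")
    ebs_logs = os.path.join(root, "mnt", "ebs", "logs")
    os.makedirs(app_logs)
    link_to_persistent(app_logs, ebs_logs)
    print(app_logs, "->", os.readlink(app_logs))
```

The application keeps writing to its usual path, while the data lands on the persistent volume and survives instance termination.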
  43. 43. Build Cloud Apps: Compensate for Dynamic IP Address • Attach ElasticIP for Internet-facing EC2 instances (e.g. the HAProxy load-balancer instance) • Use dynamic DNS CNAME registration of EC2 instance internal IP address or use SDB • EC2 instances should only use the internal IP address for communicating with each other (free!).
  44. 44. Best Practices: Design for Failure • “Everything fails, all the time”, Werner Vogels, CTO Amazon.com • Avoid single points of failure • Assume everything fails, and design backwards • Design for failure and your App won’t fail
  45. 45. Design for Failure: What Can Fail in AWS? • The EC2 Instance may crash • Portions of Zone may not be accessible (i.e. internal network problem within Zone) – EC2 Instance in a Zone may not be launch-able – EBS volumes in a Zone may not be accessible • AWS Services in a Region may not be accessible (very low probability) – S3 buckets in Region may not be accessible – SDB domains (tables) in a Region may not be accessible – SQS Queues in a Region may not be accessible
  46. 46. Design for Failure: Use Failure Tolerant Features • Use Elastic IP addresses (or their DNS names) for consistent and re-mappable routes • Use multiple EC2 Availability Zones • Use EBS for persistent file systems and snapshots. – Snapshots can be used to restore EBS volumes in other Zones – Use Rsync for real-time synchronization of EBS volumes across Zones • Create multiple DBMS slaves across Availability Zones • Use real-time monitoring (Amazon CloudWatch or RightScale)
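Using multiple Availability Zones implies some fallback logic when one Zone cannot serve a request. The sketch below shows one possible shape for it, assuming a caller-supplied `launch` function; `ZoneUnavailable`, `launch_with_failover`, and the launcher itself are hypothetical and do not correspond to a real AWS client call.

```python
class ZoneUnavailable(Exception):
    """Raised by the (hypothetical) launcher when a Zone cannot serve a request."""


def launch_with_failover(launch, zones):
    """Try each Availability Zone in order.

    Returns (zone, result) for the first Zone that succeeds and raises
    RuntimeError if every Zone fails.
    """
    errors = {}
    for zone in zones:
        try:
            return zone, launch(zone)
        except ZoneUnavailable as exc:
            errors[zone] = str(exc)  # remember why this Zone was skipped
    raise RuntimeError(f"all zones failed: {errors}")
```

A caller would pass its preferred Zone first, for example `launch_with_failover(my_launcher, ["us-east-1a", "us-east-1b"])`, where `my_launcher` wraps whatever provisioning call the deployment actually uses.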
  47. 47. Best Practices: Design for Scalability • A scalable architecture is critical to take advantage of a scalable infrastructure • Characteristics of Truly Scalable Service – Increasing resources results in a proportional (linear) increase in performance – A scalable service is capable of handling heterogeneity – A scalable service is operationally efficient – A scalable service is resilient – A scalable service becomes more cost effective when it grows
  48. 48. Design for Scalability: Linear Performance Increase • E.g. Doubling EC2 instances doubles performance (doubles throughput while maintaining same response time) – Minimize centralized locks • No central point of data storage contention – Shared Nothing – Sharding – Distributed Caching • Loose coupling of processing requestors and responders
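Sharding removes the central point of storage contention by routing each key to one of several independent stores. A minimal sketch of hash-based shard selection follows; the `shard_for` function and the key format are illustrative assumptions, not something the presentation specifies.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is salted per process and would route the same key differently
    on different instances.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for claim_id in ("claim-1001", "claim-1002", "claim-1003"):
        print(claim_id, "-> shard", shard_for(claim_id, 4))
```

Simple modulo sharding remaps most keys when the number of shards changes; consistent hashing is a common alternative when shards are added or removed often.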
  49. 49. Design for Scalability : Use AWS Elastic Features • Use Load Balancing on multiple layers: either your own (e.g. HAProxy EC2 instance) or AWS Elastic Load Balancing • Use Cloud monitoring systems: either your own (e.g. CollectD) or AWS CloudWatch • Use Auto-scaling technology (Free with CloudWatch)
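Auto-scaling ultimately reduces to a rule that turns a monitored metric into a target instance count. The following sketch shows a simple threshold rule; the thresholds, the one-instance step, and the `desired_instances` name are assumptions for illustration, and the upper bound of 20 mirrors the default per-account instance quota mentioned earlier in the deck.

```python
def desired_instances(current, avg_cpu, low=30.0, high=70.0, minimum=2, maximum=20):
    """Threshold rule: add one instance above `high`, remove one below `low`.

    avg_cpu is the average CPU utilisation (percent) across the cluster,
    as reported by whatever monitoring system is in use.
    """
    if avg_cpu > high:
        target = current + 1
    elif avg_cpu < low:
        target = current - 1
    else:
        target = current
    # Never drop below the redundancy floor or exceed the quota ceiling.
    return max(minimum, min(maximum, target))
```

Keeping a gap between `low` and `high` avoids repeatedly launching and terminating instances when load hovers around a single threshold.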
  50. 50. Best Practices: Build Loosely Coupled Systems • Use Independent components • Design everything as a Black Box with well defined inputs and outputs • Use subsystem de-coupling for Hybrid models • Use Load-balanced clusters of Black Boxes to maximize plug&play
  51. 51. Loose Coupling: Use Message Queues [Figure: Tight Coupling (Controller A, Controller B, Controller C linked directly) versus Loose Coupling using Queues (Controllers A, B, C linked through queues, Q)] • Use a message queue system such as Amazon SQS or Gearman to pass along requests • Each message queue consumer can be a cluster of EC2 instances
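The queue-based pattern can be demonstrated in a single process. In the sketch below, Python's `queue.Queue` stands in for a service such as Amazon SQS and worker threads stand in for a cluster of EC2 consumers; `run_pipeline` and the upper-casing "work" are placeholders chosen for illustration.

```python
import queue
import threading


def run_pipeline(requests, num_workers=3):
    """Put requests on a queue and let a pool of workers consume them."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                work.task_done()
                return
            with lock:
                results.append(item.upper())  # stand-in for real processing
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for request in requests:
        work.put(request)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    work.join()
    for t in threads:
        t.join()
    return results
```

The producer never addresses a specific consumer, so workers can be added or removed freely. Results may come back in any order, so consumers should not depend on strict ordering.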
  52. 52. Best Practices: Design for Dynamism • Don’t assume health or fixed location of components • Use designs that are resilient to reboot and re-launch • Bootstrap your instances based on self-discovery (E.g. EC2 Metadata API) – Store configurations in SimpleDB to bootstrap instances • Enable dynamic configuration – Store application, subsystem, and EC2 instance state in SimpleDB so instances can know health of system
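Self-discovery through the EC2 instance metadata service can be sketched as below. This assumes the plain HTTP GET form of the service at `http://169.254.169.254/latest/meta-data/`, which is only reachable from inside an instance; newer instances may additionally require a session token. The `discover` function and its injectable `fetch` parameter are illustrative choices that let the logic run without a live endpoint.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def http_fetch(path, timeout=2):
    """Fetch one metadata value; only works from inside an EC2 instance."""
    with urllib.request.urlopen(METADATA_BASE + path, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def discover(fetch=http_fetch):
    """Collect the facts an instance needs to configure itself at boot."""
    return {
        "instance_id": fetch("instance-id"),
        "zone": fetch("placement/availability-zone"),
        "private_ip": fetch("local-ipv4"),
    }
```

A boot script could call `discover()` and use the returned instance ID as the key for looking up its role and configuration in a store such as SimpleDB.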
  53. 53. Best Practices: Security in every component • Use de-perimeterized security model • Create distinct network Security Groups for each Amazon EC2 instance cluster • Use group-based network rules for controlling access between components • Restrict external access to specific IP ranges • Encrypt data “at-rest” in Amazon S3 • Encrypt data “in-transit” (SSL) • Consider encrypted EBS file systems for sensitive data
  54. 54. Best Practices: Leverage Storage Solutions • Amazon S3: large static objects • Amazon CloudFront: content distribution • Amazon SimpleDB: simple data indexing/querying • Amazon EC2 local disk drive: transient data • Amazon EBS: RDBMS persistent storage + S3 Snapshots
  55. 55. Best Practices: Leverage Best AWS Mgt Tools • Management of any but the simplest cloud application configurations is very cumbersome without advanced tools. • RightScale is a script-based instance provisioning, monitoring, & auto-scaling system – Supports collaborative sharing & reuse of scripts • Kaavo Infrastructure & Middleware On Demand (IMOD) is an “Application Centric Management System” – manages a multitier cloud application system as though it were a monolithic application
  56. 56. Best Practices: Don't fear cloud constraints • Think “out of the box” and leverage cloud features to solve EC2 constraints • Not enough EC2 instance RAM? – Distribute load across machines – Try shared distributed cache • Components use Static IP addresses? – Boot script for software reconfiguration from SimpleDB or use DNS CNAME • Local data center DBMS has better IOPS? – Try multiple read-only / sharding / DB clustering
  57. 57. AWS Management Tools
  58. 58. AWS Management Tools: Basic Tools • Amazon native AWS tools only leverage basic AWS API capability – AWS Management Console • Firefox plugins are slightly more advanced – Elasticfox – EC2 Instance, EBS, EIP management – S3 Organizer – S3 file upload/download (similar to ftp plugin) • CloudBerry Explorer – Windows S3 file upload/download application, slightly better than S3 Organizer
  59. 59. AWS Management Tools: Ideal Advanced Tools • Attaching EBS volumes, EIPs, and other resources should be scripted and managed by “Cloud Deployment & Mgmt System” (CDMS) • CDMS should incorporate standards-based Performance Monitoring services • Should incorporate standards-based Event Notification services • Should incorporate Auto-scaling configuration services as remediation of Performance/Load Events • CDMS should incorporate Administrator Collaboration allowing sharing and partitioning of admin responsibilities
  60. 60. AWS Management Tools: Ideal Advanced Tools (cont) • CDMS should allow for automated provisioning of EC2 instances • Should allow sharing of scripts and launch/terminate of instances based on group roles or at least read/write/execute rights. • Should allow for re-use of generalized scripts • Should allow for auto-scaling based on dynamic load evaluation functions • CDMS should support escalating event notification to groups of users. – Should have interfaces to other EMS (e.g. Nagios)
  61. 61. AWS Management Tools: RightScale • Script-based instance provisioning, monitoring, & auto-scaling system • Manages complex deployments involving multiple instance clusters • Re-use of version-controlled scripts in different deployments • Full automation of auto-scaling, remediation, notification and automatic configuration • Cloud application developer and administrator collaboration framework
  62. 62. RightScale Provisioning Pattern Adapted: 2009 CommunityOne West Conference: “Practical Cloud Computing Patterns” • RightScale proxy server uses modified Push Pattern – “Boot Finished” event triggers automated “provisioning commands” sequence
  63. 63. RightScale Lifecycle Mgmt Pattern • RightScale uses an Injection Pattern to push individual command scripts into a running EC2 instance or an entire deployed cluster of instances • Boot Scripts are automatically run at Instance Launch after OS “boot_finished” event • Operational Scripts are run during automated Event Handling or manual operations • Decommissioning Scripts are automatically run prior to Instance Termination
  64. 64. Current RightScale Cloud Service Monitoring Pattern Source: 2009 CommunityOne West Conference: “Practical Cloud Computing Patterns” • Based on collectd framework
  65. 65. Native AWS CloudWatch Source: 2009 CommunityOne West Conference: “Practical Cloud Computing Patterns” • RightScale will likely eventually incorporate CloudWatch
  66. 66. AWS Management Tools: Kaavo IMOD • Kaavo Infrastructure & Middleware On Demand “Application Centric Management System” • Proxy server manages complex multitier cloud application system as if it were a monolithic application via IMOD System Definitions • Quickstart: Kaavo provides out of the box System Definitions for deploying popular multi-tier HA infrastructure: • Ruby on Rails, LAMP, Tomcat, JBoss • IMOD workflow engine monitors application run-time state events and responds dynamically with user customized Event Workflows (e.g. MySQL scale-up/scale-down)
  67. 67. Kaavo IMOD : Source: Kaavo IMOD Data Sheet • IMOD Engine monitors events and responds with Work Flows
  68. 68. Q&A : More Resources • www.hyperstratus.com – White Paper: “Migrating Applications to the Cloud: An Amazon Web Services Case Study” – Cloud Computing Workshops (via Unitek Education) – Jorge.Noa@hyperstratus.com