AWS Sydney Summit 2013 - Architecting for High Availability

Session 3, Presentation 6 from the AWS Sydney Summit

Published in: Technology
  • HA means different things to different people, so let’s agree on some fundamental definitions. HA is also implemented differently depending on app architecture and workload. Does HA mean that the app is simply alive or reachable? Or that it is servicing requests within an acceptable level of performance? Typically, a higher HA percentage means more cost. The higher the level of HA, the less room there is for human intervention. http://en.wikipedia.org/wiki/High_availability
  • Scalability is important to availability. If an application cannot handle growth, it will be overwhelmed and availability will suffer. But a scalable app doesn’t guarantee HA.
  • Monitoring typically uses a combination of systems:
  • 1) SQS buffers building up 2) Launching transcoding 3) Overprovisioning to catch up 4) Back to normal
  • Writing a decider requires you to review the state of the workflow. The decider itself is stateless, but SWF keeps the state and tells the decider what has happened. [Point out that a decider can return several decisions in the same call. This allows for parallel processing.] To write workers and deciders you can use the SWF SDK (provided for Java, .NET, PHP) or call the API directly, but to make this easier [CUE NEXT SLIDE]

    1. Architecting for High Availability
       Joseph Ziegler, AWS Technical Evangelist, @jiyosub
       Guest presenter: Alexander Courtis, Solutions Architect, SilverQuest Consulting
    2. High Availability Principles
       Design for reliable, affordable, fault-tolerant systems that operate with a minimal amount of human interaction from day one
    3. Agenda
       • Objective
         – Review services and approaches to build a highly available architecture on AWS
       • Sections
         – High Availability Overview
         – Relevant AWS Features and Services
         – Principles in Practice
       • Customer Case Study
         – Carsguide
    5. What is High Availability (HA)?
       • Availability: Percentage of time an application operates during its work cycle.
       • Loss of availability is known as an outage or downtime.
         – App is offline, unreachable or partially available.
         – App is slow to use.
         – Planned and unplanned.
       • Goal
         – No downtime.
         – Always available.
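To make the percentage definition on this slide concrete, here is a minimal sketch that converts an availability target into the downtime it allows over a period. The class and method names are illustrative and not from the talk, and the targets used in `main` are common examples rather than figures from the slides.

```java
import java.time.Duration;

/** Converts an availability percentage into the downtime it allows (illustrative helper). */
final class AvailabilityBudget {

    /** Allowed downtime over the given period for a target availability in percent. */
    static Duration allowedDowntime(double availabilityPercent, Duration period) {
        if (availabilityPercent < 0.0 || availabilityPercent > 100.0) {
            throw new IllegalArgumentException("availability must be within 0..100");
        }
        double unavailableFraction = (100.0 - availabilityPercent) / 100.0;
        long seconds = Math.round(period.getSeconds() * unavailableFraction);
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        Duration year = Duration.ofDays(365);
        for (double target : new double[] {99.0, 99.9, 99.99}) {
            Duration downtime = allowedDowntime(target, year);
            System.out.printf("%.2f%% -> %d minutes of downtime per year%n",
                    target, downtime.toMinutes());
        }
    }
}
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is one way to see why higher availability targets leave less time for a person to react.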
    6. HA is related to …
       • Scalability
         – Ability of an application to accommodate growth without changing design.
         – If app cannot scale, then availability will be impacted.
         – Scalability doesn’t guarantee availability.
       • Fault Tolerance
         – Built-in redundancy so apps can continue functioning when components fail.
         – FT is crucial to HA.
       • Disaster Recovery
         – The process, policies and procedures related to restoring service after a catastrophic event.
    7. Automation
       • “Everything is an API” philosophy enables automation of AWS resources.
       • AWS is literally a programmable data center.
       • Provisioning resources is a web service call away.
       • Many different ways to automate:
         – AWS CloudFormation
         – Numerous SDKs: Java, .NET, Python, Ruby, PHP
         – Command line tools
       • Automation is one of the key differentiators between AWS and traditional infrastructure.
       • Automation assists with HA.
    9. AWS GLOBAL INFRASTRUCTURE
    10. Regions: US-WEST (Oregon), EU-WEST (Ireland), ASIA PAC (Tokyo), ASIA PAC (Singapore), US-WEST (N. California), SOUTH AMERICA (Sao Paulo), US-EAST (Virginia), GOV CLOUD, ASIA PAC (Sydney)
    12. AWS BUILDING BLOCKS
        • Inherently Highly Available and Fault Tolerant Services
          – Amazon S3
          – Amazon DynamoDB
          – Amazon CloudFront
          – Amazon Route53
          – Elastic Load Balancing
          – Amazon SQS
          – Amazon SNS
          – Amazon SES
          – Amazon SWF
          – …
        • Highly Available with the right architecture
          – Amazon EC2
          – Amazon EBS
          – Amazon RDS
          – Amazon VPC
    13. Relevant Features of AWS
        • Leverage FT services whenever possible.
        • Use multiple AZs
        • Use abstract machine and system representations
          – Build images from recipes, stacks from CloudFormation
        • Implement elasticity
          – Bootstrapping, load balancing, Auto Scaling, etc…
          – Instance asks: “Who am I and what is my role?”
    15. Principles of HA
        1. DESIGN FOR FAILURE
        2. MULTIPLE AVAILABILITY ZONES
        3. SCALING
        4. SELF-HEALING
        5. LOOSE COUPLING
    16. LET’S BUILD A HIGHLY AVAILABLE SYSTEM
    17. #1 DESIGN FOR FAILURE
    18. « Everything fails all the time » Werner Vogels, CTO of Amazon
    19. AVOID SINGLE POINTS OF FAILURE
    20. AVOID SINGLE POINTS OF FAILURE: ASSUME EVERYTHING FAILS, AND WORK BACKWARDS
    21. YOUR GOAL: Applications should continue to function
    22. AMAZON EBS: ELASTIC BLOCK STORE
    23. AMAZON ELB: ELASTIC LOAD BALANCING
    24. HEALTH CHECKS
    25. #2 MULTIPLE AVAILABILITY ZONES
    26. AMAZON RDS MULTI-AZ
    27. AMAZON ELB AND MULTIPLE AZs
    28. #3 SCALING
    29. AUTO SCALING: SCALE UP/DOWN EC2 CAPACITY
    30. #4 SELF-HEALING
    31. HEALTH CHECKS + AUTO SCALING
    32. HEALTH CHECKS + AUTO SCALING = SELF-HEALING
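The "health checks + Auto Scaling = self-healing" equation can be sketched as a small reconciliation loop. This is a toy in-memory model written for illustration only: it makes no AWS calls, and the class name, the instance ids, and the idea of passing the health check in as a predicate are all assumptions rather than anything shown in the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of "health checks + Auto Scaling = self-healing"; no AWS calls are made. */
final class SelfHealingFleet {
    private final int desiredCapacity;
    private final List<String> instances = new ArrayList<>();
    private int launched = 0;

    SelfHealingFleet(int desiredCapacity) {
        this.desiredCapacity = desiredCapacity;
        reconcile(id -> true); // initial launch up to the desired capacity
    }

    /**
     * One reconciliation pass: drop every instance that fails the health check,
     * then launch replacements until the desired capacity is met again.
     * Returns the number of instances that were removed as unhealthy.
     */
    int reconcile(Predicate<String> isHealthy) {
        int before = instances.size();
        instances.removeIf(isHealthy.negate());
        int removed = before - instances.size();
        while (instances.size() < desiredCapacity) {
            instances.add("i-" + (++launched)); // hypothetical instance id
        }
        return removed;
    }

    List<String> instances() {
        return List.copyOf(instances);
    }

    public static void main(String[] args) {
        SelfHealingFleet fleet = new SelfHealingFleet(3);
        int replaced = fleet.reconcile(id -> !id.equals("i-2")); // pretend i-2 fails its check
        System.out.println(replaced + " replaced, fleet is now " + fleet.instances());
    }
}
```

The point of the sketch is that nobody has to decide which instance to repair: the loop only compares the healthy set against the desired capacity and fills the gap.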
    33. #5 LOOSE COUPLING
    34. BUILD LOOSELY COUPLED SYSTEMS
        The looser they are coupled, the bigger they scale, the more fault tolerant they get…
    35. AMAZON SQS: SIMPLE QUEUE SERVICE
    36. PUBLISH & NOTIFY, RECEIVE, TRANSCODE
    38. CLOUDWATCH METRICS FOR AMAZON SQS + AUTO SCALING
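One way to picture scaling on a queue metric is a rule that maps the visible backlog to a worker count. The sketch below is a simplified stand-in, not the CloudWatch or Auto Scaling API: the class name, the bounds, and the messages-per-worker ratio are assumptions chosen for illustration.

```java
/** Toy scaling rule driven by queue depth; thresholds and names are illustrative only. */
final class QueueDepthScaler {
    private final int minWorkers;
    private final int maxWorkers;
    private final int messagesPerWorker;

    QueueDepthScaler(int minWorkers, int maxWorkers, int messagesPerWorker) {
        if (minWorkers < 0 || maxWorkers < minWorkers || messagesPerWorker <= 0) {
            throw new IllegalArgumentException("invalid scaling bounds");
        }
        this.minWorkers = minWorkers;
        this.maxWorkers = maxWorkers;
        this.messagesPerWorker = messagesPerWorker;
    }

    /** Workers needed to drain the visible backlog, clamped to [minWorkers, maxWorkers]. */
    int desiredWorkers(int visibleMessages) {
        int backlog = Math.max(0, visibleMessages);
        int needed = (backlog + messagesPerWorker - 1) / messagesPerWorker; // ceiling division
        return Math.max(minWorkers, Math.min(maxWorkers, needed));
    }

    public static void main(String[] args) {
        QueueDepthScaler scaler = new QueueDepthScaler(1, 10, 100);
        // Backlog builds up, the fleet grows to catch up, then returns to normal.
        for (int backlog : new int[] {0, 250, 1200, 40, 0}) {
            System.out.println(backlog + " messages -> " + scaler.desiredWorkers(backlog) + " workers");
        }
    }
}
```

The sequence in `main` mirrors the four phases in the speaker notes: the buffer builds up, transcoding capacity is launched, the fleet overprovisions to catch up, and it then drops back to the minimum.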
    39. Simple Workflow (SWF)
    40. Keeps track of:
        • State
        • Executed tasks
        • Timeouts
        • Errors
    41. WORKFLOW ACTORS
    42. DECIDERS: COORDINATION LOGIC
        1. Poll for work on a decision list
           – Long polling: 60 seconds
        2. Evaluate workflow execution history
           – SWF sends full history in JSON format
        3. Return decision to Amazon SWF
           – Usually scheduling another task
    43. Workers: COORDINATION LOGIC
        1. Poll for work on a specific task list
           – Long polling: 60 seconds
        2. Execute work, send heartbeats
           – SWF sends input data from deciders
        3. Return success / failure
           – Detailed data can be provided to deciders
    44. NO NEW LANGUAGE TO LEARN
        YOUR CODE IS YOUR WORKFLOW LANGUAGE
        SWF MAINTAINS STATE
    45. AWS FLOW FRAMEWORK
        Java Library • Entire workflow can be expressed in sequential code • Integrated with Java Utils API
    46. CHAINED TASKS WITHOUT DECISIONS? use AMAZON SQS
        NOTIFY, RECEIVE, TRANSCODE
    47. TASK GRAPH WITH DECISIONS? use AMAZON SWF
        Tasks: SPAM CHECK, RECEIVE VIDEO, CHECK LENGTH, REJECT, SHORTEN VIDEO, PUBLISH & NOTIFY, TRANSCODE
        Outcomes: GOOD, LONG, OK, SPAM
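As a rough illustration of a stateless decider for a task graph like this one, the sketch below chooses the next tasks purely from the recorded history. It does not use the SWF SDK or the Flow Framework. The edges between tasks (spam leads to reject, a long video is shortened before transcoding, and so on) are assumed, since the slide text only preserves the node and outcome labels, and the map-based history is a simplification of the JSON history SWF actually sends.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stateless decider sketch for the video task graph; the edges are assumed, not from the slide. */
final class VideoDecider {

    /**
     * Looks only at the recorded history (completed task -> outcome) and
     * returns the tasks to schedule next. An empty list means the workflow is done.
     */
    static List<String> decide(Map<String, String> history) {
        if (!history.containsKey("RECEIVE_VIDEO")) return List.of("RECEIVE_VIDEO");
        if (!history.containsKey("SPAM_CHECK")) return List.of("SPAM_CHECK");
        if ("SPAM".equals(history.get("SPAM_CHECK"))) {
            if (history.containsKey("REJECT")) return List.of();
            return List.of("REJECT");
        }
        if (!history.containsKey("CHECK_LENGTH")) return List.of("CHECK_LENGTH");
        boolean tooLong = "LONG".equals(history.get("CHECK_LENGTH"));
        if (tooLong && !history.containsKey("SHORTEN_VIDEO")) return List.of("SHORTEN_VIDEO");
        if (!history.containsKey("TRANSCODE")) return List.of("TRANSCODE");
        if (!history.containsKey("PUBLISH_AND_NOTIFY")) return List.of("PUBLISH_AND_NOTIFY");
        return List.of();
    }

    public static void main(String[] args) {
        // Stand-in for SWF: it keeps the history and calls the decider after every completed task.
        Map<String, String> history = new HashMap<>();
        List<String> next = decide(history);
        while (!next.isEmpty()) {
            for (String task : next) {
                String outcome = task.equals("SPAM_CHECK") ? "GOOD"
                        : task.equals("CHECK_LENGTH") ? "LONG" : "DONE";
                System.out.println(task + " -> " + outcome);
                history.put(task, outcome);
            }
            next = decide(history);
        }
    }
}
```

Because `decide` holds no state of its own, it can be called any number of times, on any machine, with the same history and give the same answer. It returns a list so that a real decider could schedule several tasks in one call.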
    48. Principles of HA
        1. DESIGN FOR FAILURE
        2. MULTIPLE AVAILABILITY ZONES
        3. SCALING
        4. SELF-HEALING
        5. LOOSE COUPLING
    49. YOUR GOAL: Applications should continue to function
    50. IT’S ALL ABOUT CHOICE: BALANCE COST & HIGH AVAILABILITY
    52. Alexander Courtis, Solutions Architect
    53. carsguide.com.au – Lead Tracker
        • Requirements
        • Architecture
        • Development Approach
        • Technologies
    55. Lead Tracking Process
        Persist, Audit, B2B, Notify
    56. Non-Functional Requirements
        • Meet B2B SLAs
          – Fault Tolerant
          – Scalable
          – Fully Auditable
        • Partial Manual Recovery
        • Parallel Execution
    57. Alex On Software Engineering: Principle #4
        • The Best Developers Are The Laziest
        • Avoid Inventing Octagonal Wheels
        • Work Very Hard Avoiding Future Work
          – Automate Testing
          – Production Requires Little To No Maintenance
        • Break Into Small, Independent Chunks
    59. And The Winner Is… Amazon SWF + Spring Framework
    60. Architecture diagram
        • Availability Zone #1: Decider, Decider, Worker, Worker, Worker, Worker
        • Availability Zone #2: Decider, Decider, Worker, Worker, Worker, Worker
        • Also shown: RDS DB Instance, RDS DB Instance Standby (Multi-AZ), DynamoDB, Amazon SES, Amazon SNS
    62. Development
        • Don’t Start With SWF
        • Build Stateless, Standalone Services
        • Unit / Integration Test Services
        • Wrap Services As SWF Workers
        • Build SWF Deciders For Repeatable Workflows
        • Build A Single “Master” Decider
    63. Artifacts
        • 2 Artifacts
          – Client JAR, used by external application servers to start the process
          – Master JAR, containing SWF deciders/workers and services
        • Why have a single Master JAR?
          – To make bootstrapping as simple as possible: each server instance is identical, you just select a “flavour”, i.e. Decider or Worker
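The single-artifact idea can be sketched as one entry point that picks its role at startup. This is a hypothetical outline rather than the Carsguide code: the class name, the `FLAVOUR` environment variable, and the returned status strings are all made up for illustration, and the real Master JAR would start SWF pollers instead of printing a message.

```java
import java.util.Locale;

/** One launcher for every server instance; the role is chosen at startup (names are hypothetical). */
final class MasterLauncher {

    enum Flavour { DECIDER, WORKER }

    /** Parses "decider" or "worker" (any case, surrounding whitespace ignored). */
    static Flavour parseFlavour(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("flavour is required: decider or worker");
        }
        return Flavour.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    /** Placeholder for starting the matching poller; returns a description of what would start. */
    static String start(Flavour flavour) {
        switch (flavour) {
            case DECIDER:
                return "polling decision task list";
            case WORKER:
                return "polling activity task list";
            default:
                throw new IllegalStateException("unknown flavour " + flavour);
        }
    }

    public static void main(String[] args) {
        // The first argument wins; otherwise fall back to a hypothetical FLAVOUR environment variable.
        String raw = args.length > 0 ? args[0] : System.getenv("FLAVOUR");
        System.out.println(start(parseFlavour(raw)));
    }
}
```

With this shape every instance boots from the same artifact, and the only per-instance difference is a single argument or environment value, which keeps bootstrapping scripts identical across the fleet.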
    65. B2B Services
        Spring Web Services, Apache JAXB, Apache, Amazon SES, GSON
    66. Lead Persistence
        • Well Structured, Fixed Schema Data
        • Transactional
          – Relational Database
        Spring Data JPA + Amazon RDS
    67. Audit Persistence
        • Important
        • Variable Format, Unstructured Data
        • Write Often, Read Rarely
          – NoSQL
          – Document Data Store
        Spring Data + Amazon DynamoDB
    68. Invoking SWF
        • SWF is invoked via a simple JSON web service call
          – Roll your own
          – Java SDK client
        • Suit yourself
        • We used the Java SDK client
    69. Workers
        • Wrap your services as an SWF Worker or Activity
        • AspectJ-generated classes
    70. Worker Example
        @Activities(version = "1.0")
        @ActivityRegistrationOptions(
                defaultTaskHeartbeatTimeoutSeconds = FlowConstants.NONE,
                defaultTaskScheduleToCloseTimeoutSeconds = 180,
                defaultTaskScheduleToStartTimeoutSeconds = 60,
                defaultTaskStartToCloseTimeoutSeconds = 60)
        public interface MyFancyActivities {

            /**
             * Post something that is worthy
             *
             * @param wowFancy mandatory; must be fancy
             * @return populated log indicating success or failure
             */
            FancyLog postFancy(FancyThing wowFancy);
            ...
    71. Deciders
        • No GUI or unmanageable “code”
        • Synchronous code, using Promises
        • Orchestrates workers and other decider workflows
        • Executes many times
          – Stateless
    72. Decider Implementation Example
        public class RogerDeciderImpl {
            ...
            @Override
            public void decide(final Stuff bigStuff) {
                Promise<StanDecision> stan = stanClient.decide(bigStuff);
                Promise<FranDecision> fran = franClient.decide(bigStuff);
                Promise<EarthDestroDecision> decision = rogerClient.decide(stan, fran);
                klausClient.audit(decision);
                mothershipClient.blowUp(decision);
            }
    73. Deployment
        • EC2 instances managed via Puppet
        • Apache Maven does everything from source code management to running the processes
        • Is there a better way to bootstrap?
        Amazon Elastic Beanstalk + pom.xml = Alex’s Amazing Elastic Mavenstalk™
    74. Architecting for High Availability
