Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) - AWS re:Invent 2018

624 views

Published on

Applications built on a microservices-based architecture and packaged as containers bring several benefits to your organization. In this session, Duolingo, a popular language-learning platform and an Amazon ECS customer, describes its journey from a monolith to a microservices architecture. We highlight the hurdles you may encounter, discuss how to plan your migration to microservices, and explain how you can use Amazon ECS to manage this journey.

  • How to Manifest Anything You Want in 24 hours ♥♥♥ https://tinyurl.com/y6pnne55
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) - AWS re:Invent 2018

  1. 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. From Monolith to Microservices (And All the Bumps along the Way) Max Blaze Staff Operations Engineer Duolingo C O N 3 6 0 - R Brent Langston Sr. Developer Advocate Amazon Web Services
  2. 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How do I prepare?
  3. 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. People WorkflowTechnical
  4. 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What are some hurdles to prepare for? People
  5. 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What are some hurdles to prepare for? Old ways in contrast to new ways People
  6. 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What are some hurdles to prepare for? Old ways in contrast to new ways “How can I develop locally?” People
  7. 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What are some hurdles to prepare for? Old ways in contrast to new ways “How can I develop locally?” “Deployments are painful. Will that get worse?” People
  8. 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What are some hurdles to prepare for? Old ways in contrast to new ways “How can I develop locally?” “Deployments are painful. Will that get worse?” “My config management already handles everything perfectly!” People
  9. 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What are some hurdles to prepare for? Old ways in contrast to new ways “How can I develop locally?” “Deployments are painful. Will that get worse?” “My config management already handles everything perfectly!” “What do you mean I have to be on call?” People
  10. 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How can AWS help? People
  11. 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How does Docker help with the migration? Workflow
  12. 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How does Docker help with the migration? • Packaging It becomes the same, no matter the language. Docker multi-stage builds—tests in one container, copies the final artifact to a newer, smaller container. Workflow
  13. 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How does Docker help with the migration? • Packaging It becomes the same, no matter the language. Docker multi-stage builds—tests in one container, copies the final artifact to a newer, smaller container. • Distribution It becomes the same, no matter the language. Docker pulls the desired image tag, and Docker runs the container. Language doesn’t matter. Workflow
  14. 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How does Docker help with the migration? • Packaging It becomes the same, no matter the language. Docker multi-stage builds—tests in one container, copies the final artifact to a newer, smaller container. • Distribution It becomes the same, no matter the language. Docker pulls the desired image tag, and Docker runs the container. Language doesn’t matter. • Scaling up and down becomes fast Docker images start quickly, so we can add them and remove them in response to current demand. Workflow
  15. 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How does Docker help with the migration? • Packaging It becomes the same, no matter the language. Docker multi-stage builds—tests in one container, copies the final artifact to a newer smaller container. • Distribution It becomes the same, no matter the language. Docker pulls the desired image tag, and Docker runs the container. Language doesn’t matter. • Scaling up and down becomes fast. Docker images start quickly, so we can add them and remove them in response to current demand. • Experimenting becomes easier, so it’s encouraged. Since packaging and distribution are trivially easy, we can experiment with code changes, and roll them back, or iterate on them with ease, typically without any additional demand for resources. Workflow Workflow
  16. 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How can AWS help? Workflow Workflow
  17. 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? Technical
  18. 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? Is the process running? Replaces monit, supervisord, systemd Technical
  19. 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? Is the process healthy? Include load balancer healthchecks Include Docker healthcheck Only healthcheck one layer Technical
  20. 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? How do I get traffic to my processes? Integration with ALB/NLB Technical
  21. 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? How does one service find another? Service discovery Technical
  22. 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? When should I scale up (or down)? Metrics monitoring TPS calculation Auto scaling Technical
  23. 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? How do I deploy new code? Docker replaces Fabricator/Chef/Ansible/and the like No more copying files, moving symlinks, restarting processes No custom deploy logic to maintain tag with git-sha, deploy new tag Blue green deploys for most things Some processes should be red/black Technical
  24. 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? How do I add, replace, or remove instances? Instances are now simply resources, nothing special No “special snowflakes” Configuration is immutable Replace, don’t reconfigure quickly add or drain/replace/remove instances Technical
  25. 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What problems does orchestration solve? How do I save money? Replace “special snowflakes” with generic instances that are resource pools Docker makes it easier to co-locate services Instances can run at higher utilization Fewer instances are necessary Can use instance store for disk space, rather than Amazon EBS Amazon ECS integrates with spot fleet for even cheaper instances Technical
  26. 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How can AWS help? Technical
  27. 27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Introducing: Max Blaze Staff Operations Engineer Duolingo C O N 3 6 0 - R
  28. 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  29. 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Free and accessible language education for all
  30. 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The most downloaded education app in the world
  31. 31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. 30+ languages/80+ courses
  32. 32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  33. 33. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. 300M+ users worldwide
  34. 34. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. 150 employees
  35. 35. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Duolingo growth
  36. 36. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. A brief history Launch Config management AWS Auto Scaling First microservice Centralized dashboards + logging Infrastructure as code First microservice on Amazon ECS
  37. 37. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why move to microservices? Scalability Cost savings Velocity ReliabilityFlexibility
  38. 38. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How do you decide what to carve out of your monolith first? • Start with a small, but impactful feature • Move up in size, complexity, and risk • Consider dependencies
  39. 39. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. 0.99 Monolith System availability
  40. 40. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Chained microservices 0.99 * 0.99 * 0.99 = 0.97 0.990.990.99
  41. 41. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Independent microservices 1 - (1 - 0.99)3 = 0.999999 0.990.990.99
  42. 42. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why use Docker for microservices? • Standardizes the build process and encapsulates dependencies • Local development environment similar to production • Quick deployments and rollbacks • Flexible resource allocation
  43. 43. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Simplifying local development setup (old way) 1. Clone this repository. 2. Set up and activate a virtualenv and install requirements using pip install -r requirements.txt. 3. Download and install Postgres: brew install postgresql 4. Run Postgres locally: postgres -D /usr/local/var/postgres 5. Download pgAdmin3 (not totally necessary, but will make life easier). 6. Using pgAdmin3, create a new login role under your local server with name "admin" and password "somepassword". 7. Create a DB called “db". 8. Run the migration script in the repo using python manage.py db upgrade. 9. Check that your DB is now populated with tables. 10. Set up and run memcached: brew install memcached 11. Set up and run redis: brew install redis-server 12. Set up and run elasticsearch: brew install elasticsearch 13. Finally, try to run the server using python application.py. You can test if it’s working by going here (you should see user info) and here (you should see {'data': []}).
  44. 44. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Simplifying local development setup (new way) $ docker-compose build $ docker-compose up
  45. 45. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why use Docker with Amazon ECS? Task auto scaling Manageability Amazon CloudWatch metrics Dynamic ALB targetsTask-level AWS IAM
  46. 46. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Microservice abstractions at Duolingo Web service (internal or external) Worker service (daemon or cron) Monitoring Amazon Route53 ALB Amazon ECS tasks Amazon SQS Amazon ECS task Event Data stores Amazon RDS Amazon DynamoDB Redis/Memcached AWS KMS ELK stack CloudWatch Grafana PagerDuty
  47. 47. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Microservice definition in Terraform module “duolingo-api" { source = “repo/terraform//modules/ecs_web_service" environment = “prod" product = “duolingo" service = “api" owner = “Max Blaze" min_count = 2 max_count = 4 cpu = 512 memory = 256 ecs_cluster = "prod" internal = "true" container_port = 5000 version = "${var.version}" } Billing tags Resources Auto scaling
  48. 48. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Aurora database cluster definition in Terraform module “duolingo-api—db“ { source = “repo/terraform//modules/ecs_web_service" product = “duolingo” service = “api” subservice = “db" owner = “Max Blaze” cluster_identifier = “duolingo-api-db-cluster" identifier = “duolingo-api" engine = "aurora-postgresql" name = "duolingo" password = "${data.aws_kms_secret.duolingo_api_db.duolingo_api_db}" instance_class = "db.r4.large" num_cluster_instances = 2 } Billing tags Instance type DB engine
  49. 49. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Continuous integration and deployment Developer GitHub Jenkins Terraform Deployment Terraform state Amazon S3 bucket Plan/apply with version string Amazon ECS Amazon ECR repo Dockerfile build Docker image push Docker image pull
  50. 50. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Load balancing • ALBs and CLBs operate at different network layers • ALBs are more strict when handling malformed requests • ALBs default to HTTP/2 • Headers are always passed as lowercase • There are differences in Amazon CloudWatch metrics ALB ≠ CLB
  51. 51. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Task-level AWS IAM role permissions • Apply permissions at the service level • Do not share permissions across microservices • Needs to be supported by the AWS client library
  52. 52. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Standardizing microservices • Develop a common naming scheme for repos and services • Autogenerate as much of the initial service as possible • Move core functionality to shared base libraries • Provide standard alarms and dashboards • Periodically review microservices for consistency and quality
  53. 53. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Monitoring microservices Web service dashboard • Local time and UTC • Healthy, unhealthy, and running tasks • Latency average and percentiles • Number of requests • CPU and memory utilization (min/avg/max) • Service errors by AZ • ALB errors by AZ
  54. 54. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Monitoring microservices Worker service dashboard • Local time and UTC • Running tasks • CPU and memory utilization (min/avg/max) • Visible messages • Deleted messages
  55. 55. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Monitoring microservices PagerDuty integration • Schedules and rotations are defined in Terraform • Emergency alarms page (high latency) • Warning alarms go to e-mail (low memory) • Include links to playbooks • All pages are also visible in Slack
  56. 56. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Grading microservices Architecture Documentation Processes Tests
  57. 57. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Grading microservices
  58. 58. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. • Cluster • Instance type • Pricing options • Auto scale • Add/remove AZs Web Worker API • Task • Resource allocation • Auto scale Cost reduction options
  59. 59. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cluster starting point c3.2xlarge Reserved instances On-demand
  60. 60. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. High-CPU instance generations Speed $/hour Disk c3.large - 0.105 SSD c4.large +20% of c3 0.100 None (Amazon EBS- only) c5d.large +25% of c4 0.096 NVMe c5 is 50% faster than c3!
  61. 61. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Moving to a new Amazon EC2 generation Latest instances are generally faster and cheaper but… • “cpu” units in Amazon ECS will not be equivalent • Auto scaling may not work properly between generations (1 vCPU core = 1024 units) c5.large cpu = 1024 c3.large cpu = 1024 c4.large cpu = 1024 > >
  62. 62. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Fixed number of instances c5d.2xlarge Reserved instances On-demand c5d.large…c5d.18xlarge m5d.large...m5d.24xlarge Reserved instances On-demand Spot Auto scaling
  63. 63. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Fixed number of instances c5d.2xlarge Reserved instances On-demand c5d.large…c5d.18xlarge m5d.large...m5d.24xlarge Reserved instances On-demand Spot
  64. 64. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Spotinst cluster features • Mixes families + sizes • Uses RIs before spot • 15 minute spot notice • Fits capacity to Amazon ECS tasks • AZ capacity heat map
  65. 65. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Spotinst cluster features • Drains Amazon ECS tasks • Cluster “headroom” • Spreads capacity across AZs • Bills on % of savings • Terraform support
  66. 66. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Auto scaling with Spotinst
  67. 67. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What about per-microservice costs? • Audit CPU/memory allocations for each service • Update auto scaling and/or CPU allocations as needed Goals 60% CPU 60–80% Memory
  68. 68. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Adjusting allocated CPU for scaling allocatedCPU * currentUtilization = actualCPU actualCPU/desiredUtilization = Units to set Example: Current utilization: 40% Desired utilization: 60% 1024 * 40% = 409.6 409.6/60% = 682.67 Set Amazon ECS “cpu” allocation to 683 (1 vCPU core = 1024 units)
  69. 69. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Adjusting allocated memory • Track memory usage between deployments • Constantly increasing memory usage points to memory leaks • Set containers to restart if memory exceeds 100%
  70. 70. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. API costs ListAllMyBuckets + GetObject > 50% of Amazon S3 cost!
  71. 71. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cost savings May July August September October Amazon EC2 compute costs > 60% reduction from May to October > 60% reduction in compute costs > 30% reduction in costs per monthly active user (MAU) > 25% reduction total AWS bill
  72. 72. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Key results • Scalability • Manage ~100 microservices • Velocity • Teams deploy to their own services • Flexibility • Officially support three different programming languages • Reliability • 99.99% availability achieved last quarter • Cost • 60% reduction in compute costs
  73. 73. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  74. 74. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Resources • Books • Building Microservices: Designing Fine-Grained Systems (Sam Newman) • Microservices in Production (Susan J. Fowler) • References • ec2instances.info • github.com/open-guides/og-aws • Tools and services • ansible.com • docker.com • elastic.io • github.com • grafana.com • jenkins.io • pagerduty.com • spotinst.com • terraform.io
  75. 75. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  76. 76. Please complete the session survey in the mobile app. ! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

×