Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018

In modern, microservices-based applications, it’s critical to have end-to-end observability of each microservice and the communications between them in order to quickly identify and debug issues. In this session, we cover the techniques and tools to achieve consistent, full-application observability, including monitoring, tracing, logging, and service mesh.

  1. 1. Observability for Modern Applications (CON306). Evgeny Shulyatyev, Software Engineering Manager, Cloud Platform, Autodesk; Nathan Taber, Sr. Product Marketing Manager, AWS. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. 2. Agenda: Autodesk’s cloud transformation; Key challenges; Autodesk’s resiliency cookbook (Step 1: Standardized cloud platform; Step 2: Full-stack observability for individual services; Step 3: Unified logging and distributed tracing across services; Step 4: Resiliency patterns across services); Service mesh; Introducing AWS App Mesh; How it works
  4. 4. Image courtesy of Tesla Motors, Inc. Image courtesy of Gensler. The Martian © 2015 Twentieth Century Fox. All rights reserved.
  5. 5. © 2018 Autodesk
  6. 6. Automate workflow: DETAILS; Design; Modeling, detail, development; Fabrication, pre-assembly; Installation
  7. 7. .NET, Go, Go, Django, .NET, Java, Go, Node.js, Java, Node.js, Node.js, Node.js
  8. 8. .NET, Go, Go, Django, .NET, Node.js, Node.js, Node.js, Java, Go, Node.js, Java
  9. 9. Key challenges: Full-stack observability; Logging; Tracing; Profiling; Telemetry; Standardization; Retrofitting
  10. 10. Our journey so far. Resiliency. 1: Standardized cloud platform; 2: Full-stack observability for individual services; 3: Unified logging and distributed tracing across services; 4: Resiliency patterns across services
  12. 12. Autodesk “CloudOS”. Pillars: Security and compliance; CI/CD; Developer velocity; Cost efficiency; Availability and resiliency. Goals: Accelerate innovation with faster, automated, and more reliable releases; Lock in security and compliance for all teams, self-serve, with minimal effort; Provide a well-lit path to build, deploy, and run services, so product teams can focus on customer problems
  13. 13. Autodesk’s CloudOS platform: Product teams; Automated CI/CD pipeline; AWS Cloud. 1: CI/CD best practices; 2: Standardized deploy, run, and monitor; 3: Compliance framework
  14. 14. Autodesk’s CloudOS platform, 1: CI/CD best practices. Product teams; Automated CI/CD pipeline; AWS Cloud (Containers, Serverless, Batch). CI pipelines: Source code, Learning content, Localization, Release notes. Defect detection: Codacy, SonarQube. Security: WhiteSource, Checkmarx. Deploy risk mitigation: Blue/green deployments, Automated post-release testing. Deployment templates: Standardized pipeline, Containers, Serverless, Batch. Key metrics: Deployment frequency, Change lead time, Mean time to recover, Change failure rate
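
Three of the key metrics named on this slide can be computed from a plain list of deployment records. The sketch below is illustrative only: the `Deployment` record and its field names are assumptions, not part of Autodesk's platform, and mean time to recover is left out because it needs incident records rather than deployment records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    # Hypothetical record shape, chosen for illustration.
    committed: datetime  # when the change was merged
    deployed: datetime   # when it reached production
    failed: bool         # whether it caused a production failure


def deployment_frequency(deploys, period_days):
    """Deployments per day over the observation window."""
    return len(deploys) / period_days


def change_lead_time(deploys):
    """Average time from commit to production."""
    total = sum((d.deployed - d.committed for d in deploys), timedelta())
    return total / len(deploys)


def change_failure_rate(deploys):
    """Fraction of deployments that caused a failure."""
    return sum(d.failed for d in deploys) / len(deploys)
```

Tracking these as ratios and averages over a fixed window keeps them comparable across teams with very different release volumes.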
  15. 15. Autodesk’s CloudOS platform, 2: Standardized deploy, run, and monitor. Product teams; Automated CI/CD pipeline; AWS Cloud. Workloads: Containers, Batch, Serverless. Runtime: Linux, Windows, GPU. Infrastructure: Zero-downtime patching; Automated capacity management; Monitoring, security, and compliance controls. Cluster management: Linux, Windows, GPU, Batch. Workloads; Capacity; AWS Batch
  16. 16. Autodesk’s CloudOS platform, 3: Compliance framework. Product teams; Automated CI/CD pipeline; AWS Cloud. Built-in security and compliance controls; Automated change management and audit trails; Streamlined compliance
  18. 18. Full-stack observability for individual services: Container application; Amazon EC2 nodes; Amazon ECS cluster; Infrastructure dependencies; Single pane of glass; Alerting and escalation
  19. 19. Full-stack infrastructure observability (columns: Stack, Monitors, Tools). Container application: Application performance monitoring (APM) agent, Unified logging. Amazon EC2 nodes: Disk, Memory, CPU, Network I/O, Net response time, Docker daemon health, Security vulnerabilities, Orphan tasks, Amazon ECS agent status. Amazon ECS cluster: Pending Amazon ECS tasks, AWS account limits, Auto Scaling group limits. Infrastructure dependencies: Vault, Jenkins, ServiceNow, Artifactory
  20. 20. Single pane of glass: Standardized dashboards for key metrics; Automated provisioning; Service summary; Key API metrics; Service dependencies; Underlying infrastructure
  21. 21. Unified alerting and escalation: Alerting source; Incident record; Service Operations Center (SOC); SME escalation; I2I Process
  23. 23. Need a consistent way to collect and measure metrics of services. MTTR (mean time to recover): Forensics, incident management. MTBF (mean time between failures): Analytics, insights to drive features + resiliency. MTTD (mean time to detect): Monitoring, real-time operational problem detection and notification
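
As a rough illustration of how these three measures relate, the sketch below derives them from a list of incident records. The `Incident` shape is hypothetical, and the definitions are one common convention rather than anything stated in the deck: MTTD runs from fault start to detection, MTTR from detection to resolution, and MTBF is the average healthy gap between one incident ending and the next beginning.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape, chosen for illustration.
    started: datetime   # when the fault began
    detected: datetime  # when monitoring raised it
    resolved: datetime  # when service was restored


def _mean_delta(deltas):
    """Average a list of timedeltas."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mttd(incidents):
    """Mean time to detect: fault start to detection."""
    return _mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents):
    """Mean time to recover, measured here from detection to resolution."""
    return _mean_delta([i.resolved - i.detected for i in incidents])


def mtbf(incidents):
    """Mean time between failures: healthy time between consecutive incidents."""
    ordered = sorted(incidents, key=lambda i: i.started)
    if len(ordered) < 2:
        raise ValueError("MTBF needs at least two incidents")
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return _mean_delta(gaps)
```

All three depend on consistent timestamps across services, which is the point of collecting them in one standard way.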
  24. 24. Unified logging. Problem: Log data in various formats • Cross-service tracing impossible • Complexity for monitoring, forensics, analytics. Solution: Standardize the log data model • Annotate log records with distributed tracing states • Adopt OpenTracing (http://opentracing.io) • Provide SDK that supports major languages • Integrate with vendor APM products
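
A minimal sketch of the solution above, assuming a JSON log model: every record shares the same fields and carries the current trace and span identifiers, so records from different services can be joined on `trace_id`. The field names and the helper functions are assumptions for illustration; this is not Autodesk's SDK, and a real setup would take the identifiers from the tracer (for example an OpenTracing-compatible one) instead of generating them by hand.

```python
import contextvars
import io
import json
import logging
import uuid

# Hypothetical trace context; a real tracer would populate these.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
span_id_var = contextvars.ContextVar("span_id", default=None)


class UnifiedJsonFormatter(logging.Formatter):
    """Render every record with the same fields, including trace state."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
            "span_id": span_id_var.get(),
        })


def build_logger(service, stream):
    """Create a logger that writes the unified format to the given stream."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(UnifiedJsonFormatter(service))
    logger.handlers = [handler]
    return logger


def handle_request(logger, incoming_trace_id=None):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    trace_id_var.set(incoming_trace_id or uuid.uuid4().hex)
    span_id_var.set(uuid.uuid4().hex[:16])
    logger.info("order received")
    return trace_id_var.get()
```

Passing the returned trace ID to downstream calls is what lets a log search reconstruct one request across services.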
  25. 25. Example: Unified logging
  26. 26. Unified logging architecture
  27. 27. Unified logging – End-to-end tracing (AWS X-Ray)
  29. 29. .NET, Go, Go, Django, .NET, Node.js, Node.js, Node.js, Java, Go, Node.js, Java
  30. 30. Monitoring; Degraded state; .NET; Go
  31. 31. Degraded state; Outage (chart labels: Latency, Time (ms))
  32. 32. Resiliency patterns: Traffic shaping; Rate limiting; Circuit breaking; Retries; Throttling
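
Two of these patterns, retries and circuit breaking, can be sketched in a few lines. This is an illustrative in-process version with hypothetical names and thresholds, not code from the talk: the breaker fails fast after repeated errors, and the retry helper backs off exponentially but stops as soon as the breaker is open.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry with exponential backoff; never retry against an open circuit."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A caller would wrap a dependency call as `retry(lambda: breaker.call(fetch))`, where `fetch` stands for any function that talks to a downstream service. Repeating this logic in every language and client library is the maintenance cost that motivates moving it out of process.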
  33. 33. Implementation options. Option 1, in-process (SDK): microservice container. Option 2, out-of-process (sidecar proxy): microservice container plus proxy
  34. 34. Implementation options. Option 1, in-process (SDK): microservice container. Option 2, out-of-process (sidecar proxy): microservice container plus proxy
  35. 35. Option 1: In-process resiliency SDK: SDK maintenance; Application code changes; Retrofitting; Unknown dependencies. Languages: Java, Scala, Node.js, Python, C++, Django, .NET, Go, … Databases: MySQL (hosted and Amazon Relational Database Service (Amazon RDS)), Aurora, Microsoft SQL Server, PostgreSQL (hosted and Amazon RDS), Redis, InfluxDB, RabbitMQ, MongoDB, Amazon DynamoDB, Cassandra, …
  36. 36. Implementation options. Option 1, in-process (SDK): microservice container. Option 2, out-of-process (sidecar proxy): microservice container plus proxy
  37. 37. Option 2: Sidecar proxy. Decouple operational logic and SDKs. Amazon ECS task / Kubernetes Pod: Microservice container; Proxy; Port 8081; Port 8080; External traffic
  38. 38. Option 2: Sidecar proxy. Out-of-process and language independent: Logging, Tracing, Metrics, Resiliency patterns. Separation of operational and business logic. Integration with legacy services. However…
  39. 39. Centralized production-grade configuration of proxies at scale is difficult
  40. 40. We need a control plane: Centralized location to manage configuration of proxies at scale; Dynamic configuration reload without redeploying code; Compatibility across different compute platforms; Production-grade and fully managed
  42. 42. Introducing AWS App Mesh
  43. 43. AWS App Mesh configures every proxy (diagram labels: Microservice, Proxy)
  44. 44. Easily deliver configuration and receive data: Infra Operator; Application Developer; Metrics; Intent; Microservice; Proxy
  45. 45. Why AWS App Mesh: libraries or application code vs. mesh. Overall: migrate to microservices more safely and quickly. Reduce work required by developers; Provide operational controls decoupled from application logic; Use any language or platform; Simplify visibility, troubleshooting, and deployments
  46. 46. App Mesh uses Envoy proxy
  47. 47. Why AWS App Mesh vs. building or running your own mesh: No need to spend on Dev to build and Ops to maintain; Not tied to application deployment system (e.g., container orchestration); Works across different compute systems; Gradual migration, onboard services one at a time
  48. 48. Why AWS App Mesh vs. existing control plane solutions: Works across clusters, container services; Integrations with AWS and partner tools; Run by AWS for scale and stability; Extensible architecture from OSS base
  49. 49. Services connect directly
  50. 50. Deployments: B, B’, A
  51. 51. Traffic controls
  52. 52. Application observability + others: Universal metrics collection for a wide range of monitoring tools
  54. 54. How it works. Mesh [sample_app]: Elastic Load Balancing; Virtual node A (Service discovery, Listener, Backends); Virtual node B (Service discovery, Listener, Backends). Layers: App Mesh; Microservices
  55. 55. Virtual node: Logical representation of runtime services. Backends: Set of destinations that this node will communicate with (hostnames). Service discovery: Describes how its callers locate this node. Listeners: Policies to handle incoming traffic
  56. 56. Connecting microservices. Mesh [sample_app]: Virtual router with HTTP route (Prefix: /; Targets: B, B’); Virtual node A (Service discovery, Listener, Backends); Virtual node B (Service discovery, Listener, Backends); Virtual node B’ (Service discovery, Listener, Backends)
  57. 57. Deployments: B, B’, A
  58. 58. Virtual router: HTTP route; Prefix: /; Targets: B, B’
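
The route on this slide sends requests matching a prefix to one of several targets. App Mesh expresses this as weighted targets on a route; the sketch below models only the selection logic in plain Python so it can run without AWS. The class names, the `pick_target` helper, and the 90/10 split in the usage note are assumptions for illustration and are not the App Mesh API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedTarget:
    virtual_node: str  # name of the destination virtual node
    weight: int        # relative share of traffic


@dataclass(frozen=True)
class HttpRoute:
    prefix: str
    targets: tuple

    def matches(self, path):
        """A route applies when the request path starts with its prefix."""
        return path.startswith(self.prefix)


def pick_target(route, path, rng=random):
    """Return a virtual node name chosen in proportion to the weights."""
    if not route.matches(path):
        return None
    nodes = [t.virtual_node for t in route.targets]
    weights = [t.weight for t in route.targets]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

For example, `HttpRoute("/", (WeightedTarget("B", 90), WeightedTarget("B'", 10)))` would send roughly one request in ten to the new version B’, and shifting the weights over time moves traffic without redeploying either service.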
  59. 59. Representing your sample_app in AWS App Mesh. Mesh [sample_app]: Service C (Virtual router, Virtual node C); Service D (Virtual router, Virtual node D); Service A; Service B (Virtual router, Virtual node B)
  60. 60. AWS App Mesh is available as a preview for all customers: Observability and traffic control; AWS container services compatibility; Regions
  61. 61. AWS App Mesh is available as a preview for all customers. Preview today, GA in 2019. Learn more at: aws.amazon.com/app-mesh and github.com/awslabs/aws-app-mesh-examples
  62. 62. Thank you! Evgeny Shulyatyev (https://www.linkedin.com/in/evgeny-shulyatyev-741b3026) and Nathan Taber (https://www.linkedin.com/in/natetaber/)