
AIOps - Steps Towards Autonomous Operations - AWS Summit Sydney 2019


Automation has delivered efficiencies in business and IT operations, and extending it with predictive maintenance will further improve reliability. In this session, learn how to architect a predictive and preventative remediation solution for your applications and infrastructure resources. We show you how to collect performance and operational intelligence, recognise and predict patterns using AI and machine learning, and fix issues. We show you how to achieve this using AWS-native solutions, including Amazon SageMaker and Amazon CloudWatch.


  1. SUMMIT SYDNEY
  2. AIOps: Steps towards autonomous operations. Sri (Srichakri) Nadendla, Enterprise Solutions Architect, Amazon Web Services. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  3. Agenda: Effective operational practices; Enablers for autonomous operations; Demo using Amazon SageMaker
  4. Operations: School of hard knocks
  5. Whatever can go wrong, will go wrong.
  6. Operations objectives: Keep it safe; Keep the lights on; Reliability (Availability + Performance); Security
  7. Ops holy grail: Prevent; Correct; Baseline
  8. Enablers to effective operations: Collection; Patterns; Actions
  9. Foundations to observability: Alerts and notifications; Data, tools and patterns; Ingestion; Metrics, events and logs; Threat intel; Budgets; Planned events
  10. Step 1 – Enable instrumentation: Metrics, events and logs
  11. AWS services in the context: Instrumentation
  12. Step 2 – Ingest and store: Data storage; Ingestion; Metrics, events and logs
  13. AWS services in the context: Instrumentation; Ingestion; Storage
  14. Step 3 – Query and pattern mining: Data Storage; Ingestion; Metrics, events and logs; Threat intel; Budgets; Planned events; Analysis
  15. AWS services in the context: Instrumentation; Ingestion; Storage; Analysis
  16. Step 4 – Alerts and Remediation: Alerts & notifications; Data, tools & patterns; Ingestion; Metrics, events and logs; Threat intel; Budgets; Planned events
  17. AWS services in the context: Instrumentation; Ingestion; Storage; Analysis; Alerts & Actions
  18. Some challenges: Metrics galore; Dashboards fatigue; Manual correlation; Static thresholds
  19. Responsiveness matters: Root cause identification; Dynamic detection; Proactive remediation
  20. Path to autonomous operations: Data, tools & patterns; Ingestion; Metrics, events and logs; Threat intel; Budgets; Planned events; Predictive, actionable and automated remediation
  21. Key techniques: Correlation; Anomaly detection; Forecasting
  22. Amazon SageMaker: Build, train, and deploy machine learning models at scale
  23. [Slide without text]
  24. Artificial Intelligence amplifies the possibilities of human-machine collaboration
  25. Additional considerations: Shift left (CI/CD); Runbook invocation; Knowledge assist
  26. GOOD intentions should have GOALs: Mean time between failures; Proactive actions; # Problems avoided; Time to detect; Time to resolve; Mean time to recover
  27. What we are really trying to achieve …: Infrastructure / Support / Innovation; Infrastructure / Support / Innovation; Innovation / Support ✅
  28. Thank you! Sri (Srichakri) Nadendla, nadendls@amazon.com
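
The slides contrast "Static thresholds" (slide 18) with "Dynamic detection" (slide 19) and list "Anomaly detection" as a key technique (slide 21). The deck does not show how the demo implements this, so the following is only a minimal sketch of the general idea, assuming a simple rolling z-score rather than whatever model the Amazon SageMaker demo used. The function name `detect_anomalies`, its parameters, and the sample CPU values are hypothetical and chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, z_threshold=3.0):
    """Return indices of points that deviate strongly from a rolling baseline.

    Hypothetical helper: the baseline is the mean and sample standard
    deviation of the last `window` non-anomalous points, so the threshold
    moves with the metric instead of being fixed in advance.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat baseline (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                anomalies.append(i)
                continue  # keep the outlier out of the baseline
        history.append(value)
    return anomalies


# Made-up CPU utilisation samples (percent) with one spike.
cpu = [40, 42, 41, 43, 40, 41, 42, 95, 41, 40]
print(detect_anomalies(cpu))  # prints [7]
```

A static threshold of, say, 90% would also catch this particular spike, but it would miss an unusual jump from 10% to 60% on a normally idle host. The rolling baseline flags both cases because it compares each point with recent behaviour of the same metric.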
