SlideShare a Scribd company logo
AWS Observability
made simple
Eóin Shanaghy - Luciano Mammino
AWS Community Day - November 11th 2021
fth.link/o11y-simple
Hi! I’m Eoin 🙂
CTO
aiasaservicebook.com
@eoins
eoins
✉ Get in touch
👋 Hello, I am Luciano
Senior architect
nodejsdesignpatterns.com
Let’s connect:
🌎 loige.co
🐦 @loige
🎥 loige
🧳 lucianomammino
We are business focused technologists
that deliver.
Accelerated Serverless | AI as a Service | Platform Modernisation
We are hiring! Let’s have a chat 🙂
Check out our new Podcast!
awsbites.com
fth.link/o11y-simple
Observability in the cloud
a measure of how well internal states of a
system can be inferred from knowledge of its
external outputs
🪵 🔍 📈 🚨
Structured Logs Tracing Metrics Alarms
“
A typical case study
⚡ Serverless app
● Distributed system (100s of components)
🔌 HTTP APIs using
● Lambda
● DynamoDB
● API Gateway
● Cognito
🧱 Multiple services / stacks
🏁 Using SLIC Starter (fth.link/slic)
173
resources!
A typical case study
⚽ The goal: know about problems before users do
How?
📝 Structured Logs
📐 Metrics
🔔 Alarms
📊 Dashboards
🗺 Traces (X-Ray)
Can we test our observability?
󰝊 We run a stress test
○ Simulate traffic using the integration test
○ Run the test a number of times in parallel (in a loop)
○ Exercises all the APIs with typical use cases (login, CRUD operations, etc.)
🚨 After 10-15 minutes, we started to get alarms...
🚨 Alerts flow!
Making sense of alerts
Initial Hypothesis
🛑 We got throttled (DynamoDB write throttle events)
↪ 🔁 causing AWS SDK retries (in the Lambda function)
↪ ⏱ causing Lambda timeouts
↪ 👎 causing API Gateway 502
🧪 How do we validate this?
1. Check the timeout cause ➡ Lambda metrics/logs
2. Check the Lambda error cause ➡ Lambda logs
3. Identify the source of 5xx errors in API Gateway ➡ X-Ray
4. Check the DynamoDB metrics ➡ Dashboards
Gathering evidence
Checking timeouts
● Check lambda timeouts
○ Duration metrics (aggregated data)
○ Logs (individual requests)
● Logs Insights give us duration for each
individual request. We can use this to
isolate the logs for just that request.
● We use stats to see how many executions
are affected.
Inspecting DynamoDB Capacity
Tracing errors
HTTP 502
HTTP 500
UNEXPECTED! 😱
Lambda CloudWatch
Logs
Conclusions
🌡 Symptom 🐞 Problem 󰟿 Resolution
1 DynamoDB throttles
Table with low provisioned
WCUs (write capacity)
Switch table to
PAY_PER_REQUEST
Add throttling in API Gateway to limit
potential cost impact
2
API 502 Errors
Lambda Timeouts
Throttles caused
DynamoDB retries with
exponential backoff - up to
50 seconds of retry
Change maxRetries to 3 (350ms max
retry)
3 API 500 Errors
Attempt to update a
missing record - problem
with integration test!
Fix the integration test to ensure
deletion occurs after other actions
complete. Also improved the API
design
Before and after
What we have learned so far 󰠅
● We were able to identify, understand and fix these errors quite quickly
● We didn’t have to change the code to do that
● Nor did we run it locally with a debugger
● All of this was possible because we configured observability tools in
AWS in advance
AWS native o11y = CloudWatch
Cloudwatch gives you:
➔ Logs with Insights
➔ Metrics
➔ Dashboards
➔ Alarms
➔ Canaries
➔ Distributed tracing (with X-Ray)
Alternatives outside AWS
Established
New entrants
Roll your own (only for the brave)
CloudWatch out of the box
😍 A toolkit you can use to build
observability
🤩 Metrics are automatically
generated for all services!
😟 Lots of dashboards, but by
service and not by application!
😢 Zero alarms out of the box!
Getting the best out of Cloudwatch
Cloudwatch can be your friend if you...
📚 Research and understand available metrics
📐 Decide thresholds
📊 Write IaC for application dashboards
⏰ Write IaC for service metric alarms
⏪ Update every time your application changes
📋 Copy and paste for each stack in your application
(a.k.a. A LOT OF WORK!)
Best practices
😇 AWS Well Architected Framework
🏛 5 Pillars
⚙ Operational excellence pillar covers observability
🧐 Serverless lens applies these pillars
👍 Good guidance on metrics to observe
👎 More reading and research + you still have to pick thresholds
CloudFormation for CloudWatch Alarms 😬
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"ActionsEnabled": true,
"AlarmActions": [
"arn:aws:sns:eu-west-1:665863320777:FTSLICAlarms"
],
"AlarmName": "LambdaThrottles_serverless-test-project-dev-hello",
"AlarmDescription": "Throttles % for serverless-test-project-dev-hello ..",
"EvaluationPeriods": 1,
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 0,
"TreatMissingData": "notBreaching",
"Metrics": [
{
"Id": "throttles_pc",
"Expression": "(throttles / throttles + invocations) * 100",
"Label": "% Throttles",
"ReturnData": true
},
{
"Id": "throttles",
"MetricStat": {
"Metric": {
"Namespace": "AWS/Lambda",
"MetricName": "Throttles",
"Dimensions": [
{
"Name": "FunctionName",
"Value": "serverless-test-project-dev-hello"
}
]
},
"Period": 60,
"Stat": "Sum"
},
"ReturnData": false
},
{
"Id": "invocations",
"MetricStat": {
"Metric": {
"Namespace": "AWS/Lambda",
"MetricName": "Invocations",
Can we automate this?
Magically
generated alarms
and dashboards for
each application!
fth.link/slic-watch
Introducing
SLIC Watch
How SLIC Watch works 🛠
Your app
serverless.yml
sls deploy
CloudFormation stack
very-big.json
SLIC Watch
👀 🛠
CloudFormation stack ++
even-bigger.json
Deploy ☁
📊📈
Before SLIC Watch
After SLIC Watch
After SLIC Watch
After SLIC Watch
After SLIC Watch
After SLIC Watch
Check out SLIC Slack
Configuration
🎀 SLIC Watch comes with sane defaults
📝 You can configure what you don’t like
🔌 Or disable specific dashboards or alarms
How to get started
📣 Create an SNS Topic as the alarm destination (optional)
📦 ❯ npm install serverless-slic-watch-plugin --save-dev
✍ Update serverless.yml
⚙ Configure (optional)
🚢 ❯ sls deploy
plugins:
- serverless-slic-watch-plugin 💡 Check out
the complete
example project
in the repo!
Wrapping up 🎁
★ If your services are failing you definitely want to know about it!
★ Observability can save you from hundreds of hours of blind debugging!
★ CloudWatch is the go to tool in AWS but you have to configure it!
★ Automation can take most of the configuration pain away
★ SLIC Watch can give you this automation
★ You still have control and flexibility
🔬Try it out! 🗣 Give feedback! 🌈 Let’s make it better!
fth.link/slic-watch
Thank you!
fth.link/o11y-simple
Cover picture by Markus Spiske on Unsplash

More Related Content

What's hot

MySQL Shell - The Best MySQL DBA Tool
MySQL Shell - The Best MySQL DBA ToolMySQL Shell - The Best MySQL DBA Tool
MySQL Shell - The Best MySQL DBA Tool
Miguel Araújo
 
MySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - TutorialMySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - Tutorial
Kenny Gryp
 
How WebLogic 12c Can Boost Your Productivity
How WebLogic 12c Can Boost Your ProductivityHow WebLogic 12c Can Boost Your Productivity
How WebLogic 12c Can Boost Your Productivity
Bruno Borges
 
MySQL InnoDB Cluster and Group Replication - OSI 2017 Bangalore
MySQL InnoDB Cluster and Group Replication - OSI 2017 BangaloreMySQL InnoDB Cluster and Group Replication - OSI 2017 Bangalore
MySQL InnoDB Cluster and Group Replication - OSI 2017 Bangalore
Sujatha Sivakumar
 
Changes in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowChanges in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must Know
Bruno Borges
 
MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017
MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017
MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017
Ivan Ma
 
Java EE 7 for WebLogic 12c Developers
Java EE 7 for WebLogic 12c DevelopersJava EE 7 for WebLogic 12c Developers
Java EE 7 for WebLogic 12c Developers
Bruno Borges
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDB
Mario Beck
 
WebLogic on ODA - Oracle Open World 2013
WebLogic on ODA - Oracle Open World 2013WebLogic on ODA - Oracle Open World 2013
WebLogic on ODA - Oracle Open World 2013
Michel Schildmeijer
 
Oracle OpenWorld 2013 - HOL9737 MySQL Replication Best Practices
Oracle OpenWorld 2013 - HOL9737 MySQL Replication Best PracticesOracle OpenWorld 2013 - HOL9737 MySQL Replication Best Practices
Oracle OpenWorld 2013 - HOL9737 MySQL Replication Best PracticesSven Sandberg
 
Oracle Fusion Middleware on Exalogic Best Practises
Oracle Fusion Middleware on Exalogic Best PractisesOracle Fusion Middleware on Exalogic Best Practises
Oracle Fusion Middleware on Exalogic Best Practises
Michel Schildmeijer
 
Developing Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12c
Developing Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12cDeveloping Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12c
Developing Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12c
Bruno Borges
 
Best Practices - PHP and the Oracle Database
Best Practices - PHP and the Oracle DatabaseBest Practices - PHP and the Oracle Database
Best Practices - PHP and the Oracle Database
Christopher Jones
 
MySQL 5.7: What's New, Nov. 2015
MySQL 5.7: What's New, Nov. 2015MySQL 5.7: What's New, Nov. 2015
MySQL 5.7: What's New, Nov. 2015
Mario Beck
 
MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...
MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...
MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...
Miguel Araújo
 
Why Play Framework is fast
Why Play Framework is fastWhy Play Framework is fast
Why Play Framework is fast
Legacy Typesafe (now Lightbend)
 
MySQL Group Replication - an Overview
MySQL Group Replication - an OverviewMySQL Group Replication - an Overview
MySQL Group Replication - an Overview
Matt Lord
 
MySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell
MySQL InnoDB Cluster: Management and Troubleshooting with MySQL ShellMySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell
MySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell
Miguel Araújo
 
Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.
Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.
Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.Geir Høydalsvik
 
Christo kutrovsky oracle rac solving common scalability problems
Christo kutrovsky   oracle rac solving common scalability problemsChristo kutrovsky   oracle rac solving common scalability problems
Christo kutrovsky oracle rac solving common scalability problems
Christo Kutrovsky
 

What's hot (20)

MySQL Shell - The Best MySQL DBA Tool
MySQL Shell - The Best MySQL DBA ToolMySQL Shell - The Best MySQL DBA Tool
MySQL Shell - The Best MySQL DBA Tool
 
MySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - TutorialMySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - Tutorial
 
How WebLogic 12c Can Boost Your Productivity
How WebLogic 12c Can Boost Your ProductivityHow WebLogic 12c Can Boost Your Productivity
How WebLogic 12c Can Boost Your Productivity
 
MySQL InnoDB Cluster and Group Replication - OSI 2017 Bangalore
MySQL InnoDB Cluster and Group Replication - OSI 2017 BangaloreMySQL InnoDB Cluster and Group Replication - OSI 2017 Bangalore
MySQL InnoDB Cluster and Group Replication - OSI 2017 Bangalore
 
Changes in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowChanges in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must Know
 
MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017
MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017
MySQL InnoDB Cluster and MySQL Group Replication @HKOSC 2017
 
Java EE 7 for WebLogic 12c Developers
Java EE 7 for WebLogic 12c DevelopersJava EE 7 for WebLogic 12c Developers
Java EE 7 for WebLogic 12c Developers
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDB
 
WebLogic on ODA - Oracle Open World 2013
WebLogic on ODA - Oracle Open World 2013WebLogic on ODA - Oracle Open World 2013
WebLogic on ODA - Oracle Open World 2013
 
Oracle OpenWorld 2013 - HOL9737 MySQL Replication Best Practices
Oracle OpenWorld 2013 - HOL9737 MySQL Replication Best PracticesOracle OpenWorld 2013 - HOL9737 MySQL Replication Best Practices
Oracle OpenWorld 2013 - HOL9737 MySQL Replication Best Practices
 
Oracle Fusion Middleware on Exalogic Best Practises
Oracle Fusion Middleware on Exalogic Best PractisesOracle Fusion Middleware on Exalogic Best Practises
Oracle Fusion Middleware on Exalogic Best Practises
 
Developing Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12c
Developing Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12cDeveloping Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12c
Developing Java EE Applications on IntelliJ IDEA with Oracle WebLogic 12c
 
Best Practices - PHP and the Oracle Database
Best Practices - PHP and the Oracle DatabaseBest Practices - PHP and the Oracle Database
Best Practices - PHP and the Oracle Database
 
MySQL 5.7: What's New, Nov. 2015
MySQL 5.7: What's New, Nov. 2015MySQL 5.7: What's New, Nov. 2015
MySQL 5.7: What's New, Nov. 2015
 
MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...
MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...
MySQL InnoDB Cluster / ReplicaSet - Making Provisioning & Troubleshooting as ...
 
Why Play Framework is fast
Why Play Framework is fastWhy Play Framework is fast
Why Play Framework is fast
 
MySQL Group Replication - an Overview
MySQL Group Replication - an OverviewMySQL Group Replication - an Overview
MySQL Group Replication - an Overview
 
MySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell
MySQL InnoDB Cluster: Management and Troubleshooting with MySQL ShellMySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell
MySQL InnoDB Cluster: Management and Troubleshooting with MySQL Shell
 
Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.
Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.
Simplifying MySQL, Pre-FOSDEM MySQL Days, Brussels, January 30, 2020.
 
Christo kutrovsky oracle rac solving common scalability problems
Christo kutrovsky   oracle rac solving common scalability problemsChristo kutrovsky   oracle rac solving common scalability problems
Christo kutrovsky oracle rac solving common scalability problems
 

Similar to AWS Observability Made Simple

AWS Observability (without the Pain)
AWS Observability (without the Pain)AWS Observability (without the Pain)
AWS Observability (without the Pain)
Luciano Mammino
 
Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)
Yan Cui
 
Managing Your Cloud Assets
Managing Your Cloud AssetsManaging Your Cloud Assets
Managing Your Cloud Assets
Amazon Web Services
 
AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)
Yan Cui
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the Trenches
Yan Cui
 
Serverless in production, an experience report (microservices london)
Serverless in production, an experience report (microservices london)Serverless in production, an experience report (microservices london)
Serverless in production, an experience report (microservices london)
Yan Cui
 
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
AWS Germany
 
Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)
Domas Lasauskas
 
Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)
Yan Cui
 
Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)
Yan Cui
 
Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...
Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...
Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...
Amazon Web Services
 
AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)
AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)
AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)
Amazon Web Services
 
Serverless in production, an experience report (codemotion milan)
Serverless in production, an experience report (codemotion milan)Serverless in production, an experience report (codemotion milan)
Serverless in production, an experience report (codemotion milan)
Yan Cui
 
Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017
Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017
Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017
Codemotion
 
Serverless microservices in the wild
Serverless microservices in the wildServerless microservices in the wild
Serverless microservices in the wild
Rotem Tamir
 
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Skillenza Build with Serverless Challenge -  Advanced Serverless ConceptsSkillenza Build with Serverless Challenge -  Advanced Serverless Concepts
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Dhaval Nagar
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
Helen Rogers
 
Build an app on aws for your first 10 million users (2)
Build an app on aws for your first 10 million users (2)Build an app on aws for your first 10 million users (2)
Build an app on aws for your first 10 million users (2)
AWS Vietnam Community
 
Automated Governance of Your AWS Resources
Automated Governance of Your AWS ResourcesAutomated Governance of Your AWS Resources
Automated Governance of Your AWS Resources
Amazon Web Services
 
Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)
Yan Cui
 

Similar to AWS Observability Made Simple (20)

AWS Observability (without the Pain)
AWS Observability (without the Pain)AWS Observability (without the Pain)
AWS Observability (without the Pain)
 
Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)Serverless in production (O'Reilly Software Architecture)
Serverless in production (O'Reilly Software Architecture)
 
Managing Your Cloud Assets
Managing Your Cloud AssetsManaging Your Cloud Assets
Managing Your Cloud Assets
 
AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)AWS Lambda from the trenches (Serverless London)
AWS Lambda from the trenches (Serverless London)
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the Trenches
 
Serverless in production, an experience report (microservices london)
Serverless in production, an experience report (microservices london)Serverless in production, an experience report (microservices london)
Serverless in production, an experience report (microservices london)
 
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
Automatisierte Kontrolle und Transparenz in der AWS Cloud – Autopilot für Com...
 
Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)Serverless in production, an experience report (NDC London, 31 Jan 2018)
Serverless in production, an experience report (NDC London, 31 Jan 2018)
 
Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)Serverless in production, an experience report (NDC London 2018)
Serverless in production, an experience report (NDC London 2018)
 
Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)Serverless in production, an experience report (London js community)
Serverless in production, an experience report (London js community)
 
Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...
Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...
Using AWS CloudTrail and AWS Config to Enhance Governance and Compliance of A...
 
AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)
AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)
AWS re:Invent 2016: Automated Governance of Your AWS Resources (DEV302)
 
Serverless in production, an experience report (codemotion milan)
Serverless in production, an experience report (codemotion milan)Serverless in production, an experience report (codemotion milan)
Serverless in production, an experience report (codemotion milan)
 
Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017
Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017
Yan Cui - Serverless in production, an experience report - Codemotion Milan 2017
 
Serverless microservices in the wild
Serverless microservices in the wildServerless microservices in the wild
Serverless microservices in the wild
 
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Skillenza Build with Serverless Challenge -  Advanced Serverless ConceptsSkillenza Build with Serverless Challenge -  Advanced Serverless Concepts
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
 
Build an app on aws for your first 10 million users (2)
Build an app on aws for your first 10 million users (2)Build an app on aws for your first 10 million users (2)
Build an app on aws for your first 10 million users (2)
 
Automated Governance of Your AWS Resources
Automated Governance of Your AWS ResourcesAutomated Governance of Your AWS Resources
Automated Governance of Your AWS Resources
 
Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)Serverless in production, an experience report (CoDe-Conf)
Serverless in production, an experience report (CoDe-Conf)
 

More from Luciano Mammino

Did you know JavaScript has iterators? DublinJS
Did you know JavaScript has iterators? DublinJSDid you know JavaScript has iterators? DublinJS
Did you know JavaScript has iterators? DublinJS
Luciano Mammino
 
What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...
What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...
What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...
Luciano Mammino
 
Building an invite-only microsite with Next.js & Airtable - ReactJS Milano
Building an invite-only microsite with Next.js & Airtable - ReactJS MilanoBuilding an invite-only microsite with Next.js & Airtable - ReactJS Milano
Building an invite-only microsite with Next.js & Airtable - ReactJS Milano
Luciano Mammino
 
From Node.js to Design Patterns - BuildPiper
From Node.js to Design Patterns - BuildPiperFrom Node.js to Design Patterns - BuildPiper
From Node.js to Design Patterns - BuildPiper
Luciano Mammino
 
Let's build a 0-cost invite-only website with Next.js and Airtable!
Let's build a 0-cost invite-only website with Next.js and Airtable!Let's build a 0-cost invite-only website with Next.js and Airtable!
Let's build a 0-cost invite-only website with Next.js and Airtable!
Luciano Mammino
 
Everything I know about S3 pre-signed URLs
Everything I know about S3 pre-signed URLsEverything I know about S3 pre-signed URLs
Everything I know about S3 pre-signed URLs
Luciano Mammino
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
Luciano Mammino
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
Luciano Mammino
 
JavaScript Iteration Protocols - Workshop NodeConf EU 2022
JavaScript Iteration Protocols - Workshop NodeConf EU 2022JavaScript Iteration Protocols - Workshop NodeConf EU 2022
JavaScript Iteration Protocols - Workshop NodeConf EU 2022
Luciano Mammino
 
Building an invite-only microsite with Next.js & Airtable
Building an invite-only microsite with Next.js & AirtableBuilding an invite-only microsite with Next.js & Airtable
Building an invite-only microsite with Next.js & Airtable
Luciano Mammino
 
Let's take the monolith to the cloud 🚀
Let's take the monolith to the cloud 🚀Let's take the monolith to the cloud 🚀
Let's take the monolith to the cloud 🚀
Luciano Mammino
 
A look inside the European Covid Green Certificate - Rust Dublin
A look inside the European Covid Green Certificate - Rust DublinA look inside the European Covid Green Certificate - Rust Dublin
A look inside the European Covid Green Certificate - Rust Dublin
Luciano Mammino
 
Monoliths to the cloud!
Monoliths to the cloud!Monoliths to the cloud!
Monoliths to the cloud!
Luciano Mammino
 
The senior dev
The senior devThe senior dev
The senior dev
Luciano Mammino
 
Node.js: scalability tips - Azure Dev Community Vijayawada
Node.js: scalability tips - Azure Dev Community VijayawadaNode.js: scalability tips - Azure Dev Community Vijayawada
Node.js: scalability tips - Azure Dev Community Vijayawada
Luciano Mammino
 
A look inside the European Covid Green Certificate (Codemotion 2021)
A look inside the European Covid Green Certificate (Codemotion 2021)A look inside the European Covid Green Certificate (Codemotion 2021)
A look inside the European Covid Green Certificate (Codemotion 2021)
Luciano Mammino
 
Semplificare l'observability per progetti Serverless
Semplificare l'observability per progetti ServerlessSemplificare l'observability per progetti Serverless
Semplificare l'observability per progetti Serverless
Luciano Mammino
 
Finding a lost song with Node.js and async iterators - NodeConf Remote 2021
Finding a lost song with Node.js and async iterators - NodeConf Remote 2021Finding a lost song with Node.js and async iterators - NodeConf Remote 2021
Finding a lost song with Node.js and async iterators - NodeConf Remote 2021
Luciano Mammino
 
Finding a lost song with Node.js and async iterators - EnterJS 2021
Finding a lost song with Node.js and async iterators - EnterJS 2021Finding a lost song with Node.js and async iterators - EnterJS 2021
Finding a lost song with Node.js and async iterators - EnterJS 2021
Luciano Mammino
 
How to send gzipped requests with boto3
How to send gzipped requests with boto3How to send gzipped requests with boto3
How to send gzipped requests with boto3
Luciano Mammino
 

More from Luciano Mammino (20)

Did you know JavaScript has iterators? DublinJS
Did you know JavaScript has iterators? DublinJSDid you know JavaScript has iterators? DublinJS
Did you know JavaScript has iterators? DublinJS
 
What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...
What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...
What I learned by solving 50 Advent of Code challenges in Rust - RustNation U...
 
Building an invite-only microsite with Next.js & Airtable - ReactJS Milano
Building an invite-only microsite with Next.js & Airtable - ReactJS MilanoBuilding an invite-only microsite with Next.js & Airtable - ReactJS Milano
Building an invite-only microsite with Next.js & Airtable - ReactJS Milano
 
From Node.js to Design Patterns - BuildPiper
From Node.js to Design Patterns - BuildPiperFrom Node.js to Design Patterns - BuildPiper
From Node.js to Design Patterns - BuildPiper
 
Let's build a 0-cost invite-only website with Next.js and Airtable!
Let's build a 0-cost invite-only website with Next.js and Airtable!Let's build a 0-cost invite-only website with Next.js and Airtable!
Let's build a 0-cost invite-only website with Next.js and Airtable!
 
Everything I know about S3 pre-signed URLs
Everything I know about S3 pre-signed URLsEverything I know about S3 pre-signed URLs
Everything I know about S3 pre-signed URLs
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
 
Serverless for High Performance Computing
Serverless for High Performance ComputingServerless for High Performance Computing
Serverless for High Performance Computing
 
JavaScript Iteration Protocols - Workshop NodeConf EU 2022
JavaScript Iteration Protocols - Workshop NodeConf EU 2022JavaScript Iteration Protocols - Workshop NodeConf EU 2022
JavaScript Iteration Protocols - Workshop NodeConf EU 2022
 
Building an invite-only microsite with Next.js & Airtable
Building an invite-only microsite with Next.js & AirtableBuilding an invite-only microsite with Next.js & Airtable
Building an invite-only microsite with Next.js & Airtable
 
Let's take the monolith to the cloud 🚀
Let's take the monolith to the cloud 🚀Let's take the monolith to the cloud 🚀
Let's take the monolith to the cloud 🚀
 
A look inside the European Covid Green Certificate - Rust Dublin
A look inside the European Covid Green Certificate - Rust DublinA look inside the European Covid Green Certificate - Rust Dublin
A look inside the European Covid Green Certificate - Rust Dublin
 
Monoliths to the cloud!
Monoliths to the cloud!Monoliths to the cloud!
Monoliths to the cloud!
 
The senior dev
The senior devThe senior dev
The senior dev
 
Node.js: scalability tips - Azure Dev Community Vijayawada
Node.js: scalability tips - Azure Dev Community VijayawadaNode.js: scalability tips - Azure Dev Community Vijayawada
Node.js: scalability tips - Azure Dev Community Vijayawada
 
A look inside the European Covid Green Certificate (Codemotion 2021)
A look inside the European Covid Green Certificate (Codemotion 2021)A look inside the European Covid Green Certificate (Codemotion 2021)
A look inside the European Covid Green Certificate (Codemotion 2021)
 
Semplificare l'observability per progetti Serverless
Semplificare l'observability per progetti ServerlessSemplificare l'observability per progetti Serverless
Semplificare l'observability per progetti Serverless
 
Finding a lost song with Node.js and async iterators - NodeConf Remote 2021
Finding a lost song with Node.js and async iterators - NodeConf Remote 2021Finding a lost song with Node.js and async iterators - NodeConf Remote 2021
Finding a lost song with Node.js and async iterators - NodeConf Remote 2021
 
Finding a lost song with Node.js and async iterators - EnterJS 2021
Finding a lost song with Node.js and async iterators - EnterJS 2021Finding a lost song with Node.js and async iterators - EnterJS 2021
Finding a lost song with Node.js and async iterators - EnterJS 2021
 
How to send gzipped requests with boto3
How to send gzipped requests with boto3How to send gzipped requests with boto3
How to send gzipped requests with boto3
 

Recently uploaded

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

AWS Observability Made Simple

  • 1. AWS Observability made simple Eóin Shanaghy - Luciano Mammino AWS Community Day - November 11th 2021 fth.link/o11y-simple
  • 2. Hi! I’m Eoin 🙂 CTO aiasaservicebook.com @eoins eoins ✉ Get in touch
  • 3. 👋 Hello, I am Luciano Senior architect nodejsdesignpatterns.com Let’s connect: 🌎 loige.co 🐦 @loige 🎥 loige 🧳 lucianomammino
  • 4. We are business focused technologists that deliver. Accelerated Serverless | AI as a Service | Platform Modernisation We are hiring! Let’s have a chat 🙂
  • 5. Check out our new Podcast! awsbites.com
  • 7. Observability in the cloud a measure of how well internal states of a system can be inferred from knowledge of its external outputs 🪵 🔍 📈 🚨 Structured Logs Tracing Metrics Alarms “
  • 8. A typical case study ⚡ Serverless app ● Distributed system (100s of components) 🔌 HTTP APIs using ● Lambda ● DynamoDB ● API Gateway ● Cognito 🧱 Multiple services / stacks 🏁 Using SLIC Starter (fth.link/slic) 173 resources!
  • 9. A typical case study ⚽ The goal: know about problems before users do How? 📝 Structured Logs 📐 Metrics 🔔 Alarms 📊 Dashboards 🗺 Traces (X-Ray)
  • 10. Can we test our observability? 󰝊 We run a stress test ○ Simulate traffic using the integration test ○ Run the test a number of times in parallel (in a loop) ○ Exercises all the APIs with typical use cases (login, CRUD operations, etc.) 🚨 After 10-15 minutes, we started to get alarms...
  • 12. Making sense of alerts
  • 13. Initial Hypothesis 🛑 We got throttled (DynamoDB write throttle events) ↪ 🔁 causing AWS SDK retries (in the Lambda function) ↪ ⏱ causing Lambda timeouts ↪ 👎 causing API Gateway 502 🧪 How do we validate this? 1. Check the timeout cause ➡ Lambda metrics/logs 2. Check the Lambda error cause ➡ Lambda logs 3. Identify the source of 5xx errors in API Gateway ➡ X-Ray 4. Check the DynamoDB metrics ➡ Dashboards
  • 15. Checking timeouts ● Check lambda timeouts ○ Duration metrics (aggregated data) ○ Logs (individual requests) ● Logs Insights give us duration for each individual request. We can use this to isolate the logs for just that request. ● We use stats to see how many executions are affected.
  • 21. Conclusions 🌡 Symptom 🐞 Problem 󰟿 Resolution 1 DynamoDB throttles Table with low provisioned WCUs (write capacity) Switch table to PAY_PER_REQUEST Add throttling in API Gateway to limit potential cost impact 2 API 502 Errors Lambda Timeouts Throttles caused DynamoDB retries with exponential backoff - up to 50 seconds of retry Change maxRetries to 3 (350ms max retry) 3 API 500 Errors Attempt to update a missing record - problem with integration test! Fix the integration test to ensure deletion occurs after other actions complete. Also improved the API design
  • 23. What we have learned so far 󰠅 ● We were able to identify, understand and fix these errors quite quickly ● We didn’t have to change the code to do that ● Nor did we run it locally with a debugger ● All of this was possible because we configured observability tools in AWS in advance
  • 24. AWS native o11y = CloudWatch Cloudwatch gives you: ➔ Logs with Insights ➔ Metrics ➔ Dashboards ➔ Alarms ➔ Canaries ➔ Distributed tracing (with X-Ray)
  • 25. Alternatives outside AWS Established New entrants Roll your own (only for the brave)
  • 26. CloudWatch out of the box 😍 A toolkit you can use to build observability 🤩 Metrics are automatically generated for all services! 😟 Lots of dashboards, but by service and not by application! 😢 Zero alarms out of the box!
  • 27. Getting the best out of Cloudwatch Cloudwatch can be your friend if you... 📚 Research and understand available metrics 📐 Decide thresholds 📊 Write IaC for application dashboards ⏰ Write IaC for service metric alarms ⏪ Update every time your application changes 📋 Copy and paste for each stack in your application (a.k.a. A LOT OF WORK!)
  • 28. Best practices 😇 AWS Well Architected Framework 🏛 5 Pillars ⚙ Operational excellence pillar covers observability 🧐 Serverless lens applies these pillars 👍 Good guidance on metrics to observe 👎 More reading and research + you still have to pick thresholds
  • 29. CloudFormation for CloudWatch Alarms 😬 "Type": "AWS::CloudWatch::Alarm", "Properties": { "ActionsEnabled": true, "AlarmActions": [ "arn:aws:sns:eu-west-1:665863320777:FTSLICAlarms" ], "AlarmName": "LambdaThrottles_serverless-test-project-dev-hello", "AlarmDescription": "Throttles % for serverless-test-project-dev-hello ..", "EvaluationPeriods": 1, "ComparisonOperator": "GreaterThanThreshold", "Threshold": 0, "TreatMissingData": "notBreaching", "Metrics": [ { "Id": "throttles_pc", "Expression": "(throttles / throttles + invocations) * 100", "Label": "% Throttles", "ReturnData": true }, { "Id": "throttles", "MetricStat": { "Metric": { "Namespace": "AWS/Lambda", "MetricName": "Throttles", "Dimensions": [ { "Name": "FunctionName", "Value": "serverless-test-project-dev-hello" } ] }, "Period": 60, "Stat": "Sum" }, "ReturnData": false }, { "Id": "invocations", "MetricStat": { "Metric": { "Namespace": "AWS/Lambda", "MetricName": "Invocations",
  • 30. Can we automate this? Magically generated alarms and dashboards for each application!
  • 32. How SLIC Watch works 🛠 Your app serverless.yml sls deploy CloudFormation stack very-big.json SLIC Watch 👀 🛠 CloudFormation stack ++ even-bigger.json Deploy ☁ 📊📈
  • 38. After SLIC Watch Check out SLIC Slack
  • 39. Configuration 🎀 SLIC Watch comes with sane defaults 📝 You can configure what you don’t like 🔌 Or disable specific dashboards or alarms
  • 40. How to get started 📣 Create an SNS Topic as the alarm destination (optional) 📦 ❯ npm install serverless-slic-watch-plugin --save-dev ✍ Update serverless.yml ⚙ Configure (optional) 🚢 ❯ sls deploy plugins: - serverless-slic-watch-plugin 💡 Check out the complete example project in the repo!
  • 41. Wrapping up 🎁 ★ If your services are failing you definitely want to know about it! ★ Observability can save you from hundreds of hours of blind debugging! ★ CloudWatch is the go to tool in AWS but you have to configure it! ★ Automation can take most of the configuration pain away ★ SLIC Watch can give you this automation ★ You still have control and flexibility 🔬Try it out! 🗣 Give feedback! 🌈 Let’s make it better! fth.link/slic-watch
  • 42. Thank you! fth.link/o11y-simple Cover picture by Markus Spiske on Unsplash