SELF-HEALING SERVERLESS APPLICATIONS
PDX Serverless Meetup June 2018
NATE TAGGART
The Promise:
AWS | LAMBDA FEATURES PAGE
AWS Lambda invokes your code only when needed and automatically scales to support the rate of incoming requests without requiring you to configure anything. There is no limit to the number of requests your code can handle.
SELF-HEALING SERVERLESS APPLICATIONS | PG2
The Reality:
AWS | LAMBDA FEATURES PAGE (suggested edits)
[Slide graphic: the Lambda features quote shown again with handwritten corrections. The corrections that survive extraction are the fragments "sometimes", "certain", "every", "can", "but", "architecture", and "are properly"; where each one attaches to the quote is not recoverable here.]
What to expect when you’re not expecting.
Common Serverless Failures
FOR LAMBDA-BASED ARCHITECTURES
FAILURE TYPES | DESCRIPTION | DEFAULT BEHAVIOR
• Runtime Error:
  • Uncaught Exception
  • Timeout
  • Bad State
• Scaling:
  • Concurrency Limits
  • Spawn Limits
  • Bottlenecking
FAILURE TYPE
Runtime Error: Uncaught Exception
DESCRIPTION
An event triggers your Lambda to run, but your code raises an unhandled exception.
DEFAULT BEHAVIOR
Synchronous invocations:
• Function fails
• Returns error to caller
• Logs timestamp, error message, and stack trace to CloudWatch
Asynchronous invocations:
• Retries, for up to three attempts in total (or more if reading from a stream)
• Caller is unaware of error
• Logs timestamp, error message, and stack trace to CloudWatch
FAILURE TYPE
Runtime Error: Timeout
DESCRIPTION
An event triggers your Lambda to run, but execution does not complete within the configured maximum execution time. (Lambda’s default configuration is a 3-second timeout.)
DEFAULT BEHAVIOR
Synchronous invocations:
• Lambda returns error to caller (if client hasn’t timed out)
• Logs timestamp and error message to CloudWatch
Asynchronous invocations:
• Retries, for up to three attempts in total (or more if reading from a stream)
• Caller is unaware of error
• Logs timestamp and error message to CloudWatch
FAILURE TYPE
Runtime Error: Bad State
DESCRIPTION
An event triggers your Lambda to run, but the message is malformed or state is improperly provided, causing unexpected behavior.
DEFAULT BEHAVIOR
When noisy:
• Behaves as Uncaught Exception
• Visible in CloudWatch, but may be difficult to diagnose without event visibility
When silent:
• Unexpected application behavior
• Can be lost permanently
• Can tank performance and dramatically spike costs
FAILURE TYPE
Scaling: Concurrency Limits
DESCRIPTION
Your application is throttled because it needs more concurrently running Lambda instances than AWS allows for your account. Your compute can’t scale high enough.
DEFAULT BEHAVIOR
Unbuffered invocations:
• Fails to invoke
• No retry
• Visible in CloudWatch metrics, but not in logs
Buffered invocations:
• Initially fails to invoke
• Will eventually continue reading from stream as volume drops
FAILURE TYPE
Scaling: Spawn Limits
DESCRIPTION
Your application is throttled because it needs to start more new Lambda instances than AWS allows your account to spawn. Your compute can’t scale fast enough.
DEFAULT BEHAVIOR
Unbuffered invocations:
• Fails to invoke
• No retry
• Visible in CloudWatch metrics, nothing in logs (but really non-obvious)
Buffered invocations:
• Initially fails to invoke
• Will eventually continue reading from stream as volume drops
FAILURE TYPE
Scaling: Bottlenecking
DESCRIPTION
Your application is throttled due to throughput pressure upstream or downstream of your Lambda. Your architecture can’t scale enough.
DEFAULT BEHAVIOR
Upstream bottlenecks:
• Fails to invoke
• No retry
• Visible in CloudWatch, as long as you know where to look
Downstream bottlenecks:
• Can throw errors, time out, and/or distribute failures to other functions
• Can cause cascading failures
• Can tank performance and dramatically spike costs
Introducing:
Self-Healing Serverless Applications
Self-Healing Design Principles
LEADING PRACTICES FOR RESILIENT SYSTEMS
STANDARDIZE
• Introduce universal instrumentation
• Collect event-centric diagnostics
• Give everyone visibility
PLAN FOR FAILURE
• Identify service limits
• Use self-throttling
• Consider alternative resource types
FAIL GRACEFULLY
• Reroute and unblock
• Automate known solutions
• Notify a human
Learn to fail.
Scenario: Uncaught Exceptions
WHEN THINGS BREAK AND YOU DON’T KNOW WHY
PROBLEM
Lambda periodically fails. Error messages and stack traces are visible in CloudWatch logs. Failing events are lost, making reproduction difficult.
KEY PRINCIPLES
• Introduce universal instrumentation
• Collect event-centric diagnostics
• Give everyone visibility
SOLUTION
• Use function wrapper or decorator pattern
• Capture and log events which fail
Decrease time to resolution by capturing event data.
Event Diagnostics Wrapper Example
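The slide's code is not preserved in this transcript. As a minimal sketch of the wrapper idea, assuming a Python Lambda handler with the usual `(event, context)` signature, a decorator can log the triggering event whenever the handler raises. The names `capture_failed_events` and `diagnostics_log`, and the example handler, are hypothetical and not taken from the talk.

```python
import functools
import json
import logging

# Hypothetical logger name; in Lambda, log output ends up in CloudWatch Logs.
diagnostics_log = logging.getLogger("event_diagnostics")


def capture_failed_events(handler):
    """Log the triggering event whenever the wrapped handler raises."""

    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception:
            # Record the event next to the stack trace so the failure can be
            # reproduced later, then re-raise so Lambda's own error handling
            # (error response or retry) still applies.
            diagnostics_log.exception(
                "Handler failed; request_id=%s event=%s",
                getattr(context, "aws_request_id", None),
                json.dumps(event, default=str),
            )
            raise

    return wrapper


@capture_failed_events
def handler(event, context):
    # Example handler (assumption): fails when an expected key is missing.
    return {"total": event["price"] * event["quantity"]}
```

Because the wrapper re-raises, the function's externally visible behavior is unchanged; the only difference is that the failing event is now in the logs. Events can carry sensitive data, so a real wrapper would probably redact fields before logging.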
Scenario: Upstream bottleneck
WHEN YOUR LAMBDAS AREN’T GETTING INVOKED
PROBLEM
API Gateway hits throughput limits and fails to invoke Lambda for some requests.
KEY PRINCIPLES
• Identify service limits
• Use self-throttling
• Notify a human
SOLUTION
• Implement retries with exponential backoff logic for 429 responses
• Raise alarm on the 4XXError metric
Don’t overlook client-side solutions to backend failures.
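One way a client could retry throttled requests is sketched below. It assumes a hypothetical `send_request` callable that performs the HTTP call and returns a `(status_code, body)` tuple; the delay values are illustrative, and the jitter is an addition that the slide does not mention.

```python
import random
import time


def call_with_backoff(send_request, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call send_request() until it stops returning HTTP 429.

    send_request is a hypothetical zero-argument callable that performs the
    HTTP call and returns a (status_code, body) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send_request()
        # Stop on any non-throttled response, or when attempts are used up.
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff with full jitter: wait a random time between
        # zero and an upper bound that doubles on every attempt.
        upper = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, upper))
```

The `sleep` parameter exists only so the delay can be replaced in tests. After the final attempt the last 429 is returned to the caller rather than raised, which leaves the decision about alerting a human to the calling code.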
Scenario: Timeouts
WHEN EXECUTION TAKES TOO LONG
PROBLEM
Lambda is periodically timing out.
KEY PRINCIPLES
• Introduce universal instrumentation
• Use self-throttling
• Consider alternative resource types
SOLUTION
• Use function wrapper or decorator pattern
• Evaluate Fargate or alternative long-running resources
Enforce your own limits.
Timeout Wrapper Example
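The slide's code is not preserved in this transcript. A possible shape for such a wrapper, assuming a Python handler, is sketched below: it gives the handler slightly less time than Lambda reports as remaining, so the function can log the event and fail on its own terms before Lambda terminates it. `with_timeout`, `HandlerTimeout`, `timeout_log`, and the 500 ms buffer are hypothetical choices; `context.get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object.

```python
import functools
import json
import logging
import threading

timeout_log = logging.getLogger("timeout_wrapper")


class HandlerTimeout(Exception):
    """Raised when the handler exceeds the self-imposed deadline."""


def with_timeout(buffer_ms=500):
    """Fail the handler buffer_ms before Lambda's own timeout would hit."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            remaining_ms = context.get_remaining_time_in_millis()
            budget_s = max(remaining_ms - buffer_ms, 0) / 1000.0
            outcome = {}

            def run():
                try:
                    outcome["result"] = handler(event, context)
                except Exception as exc:
                    outcome["error"] = exc

            # Run the handler in a daemon thread and wait only for the budget.
            worker = threading.Thread(target=run, daemon=True)
            worker.start()
            worker.join(budget_s)
            if worker.is_alive():
                # Log the event while there is still time to do so.
                timeout_log.error(
                    "Handler exceeded %.3fs; event=%s",
                    budget_s, json.dumps(event, default=str),
                )
                raise HandlerTimeout("handler exceeded %.3fs" % budget_s)
            if "error" in outcome:
                raise outcome["error"]
            return outcome["result"]

        return wrapper

    return decorator
```

A thread is used here so the sketch does not depend on signal handling. The trade-off is that the worker thread is abandoned rather than cancelled, so this version only guarantees that the timeout is reported, not that the slow work stops.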
Scenario: Stream processing gets “stuck”
WHEN FAILURES ARE BLOCKING THE REST OF THE STREAM
PROBLEM
Lambda exceptions and/or timeouts are blocking processing of a Kinesis shard.
KEY PRINCIPLES
• Reroute and unblock
• Automate known solutions
• Consider alternative resource types
SOLUTION
• Introduce state machine-type logic
• Move bad messages to alternate stream
• Potentially architect with Fargate or SNS
Small failures are preferable to large ones.
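To illustrate "move bad messages to alternate stream", the sketch below processes a Kinesis batch record by record and hands any record that fails to a `reroute` callable instead of letting the exception block the shard. `process_batch`, `process_record`, and `reroute` are hypothetical names; in a real deployment `reroute` might write to a second Kinesis stream, which is not shown here. The sketch also assumes the record payloads are JSON.

```python
import base64
import json


def process_batch(event, process_record, reroute):
    """Process a Kinesis batch, rerouting records that fail.

    process_record(message) handles one decoded JSON message.
    reroute(raw_bytes, reason) stores a failed record somewhere else.
    """
    rerouted = 0
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            process_record(json.loads(raw))
        except Exception as exc:
            # A malformed or unprocessable record is set aside so that the
            # rest of the batch, and the shard, can keep moving.
            reroute(raw, repr(exc))
            rerouted += 1
    return {
        "processed": len(event["Records"]) - rerouted,
        "rerouted": rerouted,
    }
```

If `reroute` itself raises, the exception still propagates and the batch is retried, so the alternate destination needs to be at least as reliable as the main path.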
Scenario: Downstream bottleneck
WHEN LAMBDA IS OUT-SCALING YOUR DATABASE
PROBLEM
Your Lambdas have scaled up but are depleting your RDS database connection pools.
KEY PRINCIPLES
• Identify service limits
• Automate known solutions
• Give everyone visibility
SOLUTION
• Always close database connections
• Scale your database
• Map your dependencies
Scale dependencies, too.
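As a small illustration of "always close database connections", the sketch below wraps the connection in `contextlib.closing` so it is released even when the handler raises. The standard-library `sqlite3` module stands in for an RDS driver purely to keep the example runnable; the `connect` helper and the query are assumptions, not part of the talk.

```python
import contextlib
import sqlite3


def connect():
    # Stand-in for an RDS driver's connect() call (assumption).
    return sqlite3.connect(":memory:")


def handler(event, context, connect=connect):
    # closing() calls conn.close() on the way out, on success or on error,
    # so a failed invocation does not leave a connection open.
    with contextlib.closing(connect()) as conn:
        row = conn.execute("SELECT ? * 2", (event["value"],)).fetchone()
        return {"doubled": row[0]}
```

Closing on every invocation trades some connection setup time for a bounded number of open connections, which is the concern this scenario describes.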
Want to try?
app.stackery.io/sign-up
• TIMEOUT HANDLING
• UNIVERSAL INSTRUMENTATION
• ERROR RE-ROUTING
• BUILD STANDARDIZATION
• EVENT DIAGNOSTICS
• SHARED HEALTH DASHBOARD
@stackeryio

More Related Content

What's hot

Testing at-cloud-speed sans-app-sec-austin-2013
Testing at-cloud-speed sans-app-sec-austin-2013Testing at-cloud-speed sans-app-sec-austin-2013
Testing at-cloud-speed sans-app-sec-austin-2013Matt Tesauro
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
confluent
 
Client-Server-Kommunikation mit dem Command Pattern
Client-Server-Kommunikation mit dem Command PatternClient-Server-Kommunikation mit dem Command Pattern
Client-Server-Kommunikation mit dem Command Pattern
pgt technology scouting GmbH
 
(DVO204) Monitoring Strategies: Finding Signal in the Noise
(DVO204) Monitoring Strategies: Finding Signal in the Noise(DVO204) Monitoring Strategies: Finding Signal in the Noise
(DVO204) Monitoring Strategies: Finding Signal in the Noise
Amazon Web Services
 
Writing and deploying serverless python applications
Writing and deploying serverless python applicationsWriting and deploying serverless python applications
Writing and deploying serverless python applications
Cesar Cardenas Desales
 
Netflix conductor
Netflix conductorNetflix conductor
Netflix conductor
Viren Baraiya
 
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
Alexander Dean
 
DevOps Days Tel Aviv - Serverless Architecture
DevOps Days Tel Aviv - Serverless ArchitectureDevOps Days Tel Aviv - Serverless Architecture
DevOps Days Tel Aviv - Serverless Architecture
Antons Kranga
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data Platform
LivePerson
 
Top conf serverlezz
Top conf   serverlezzTop conf   serverlezz
Top conf serverlezz
Antons Kranga
 
Technology | Serverless
Technology | ServerlessTechnology | Serverless
Technology | Serverless
Ani Sinanaj
 
Going serverless with aws
Going serverless with awsGoing serverless with aws
Going serverless with aws
Alex Landa
 
Next generation pipelines
Next generation pipelinesNext generation pipelines
Next generation pipelines
Alex Landa
 
Paasta: Application Delivery at Yelp
Paasta: Application Delivery at YelpPaasta: Application Delivery at Yelp
Paasta: Application Delivery at Yelp
C4Media
 
Meetup callback
Meetup callbackMeetup callback
Meetup callback
Wayne Scarano
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
Peter Bakas
 
Fast Deployments to Multiple Golang Lambda Functions
Fast Deployments to Multiple Golang Lambda FunctionsFast Deployments to Multiple Golang Lambda Functions
Fast Deployments to Multiple Golang Lambda Functions
Kp Krishnamoorthy
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
aspyker
 
The future of paas is serverless
The future of paas is serverlessThe future of paas is serverless
The future of paas is serverless
Yan Cui
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
Todd Palino
 

What's hot (20)

Testing at-cloud-speed sans-app-sec-austin-2013
Testing at-cloud-speed sans-app-sec-austin-2013Testing at-cloud-speed sans-app-sec-austin-2013
Testing at-cloud-speed sans-app-sec-austin-2013
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 
Client-Server-Kommunikation mit dem Command Pattern
Client-Server-Kommunikation mit dem Command PatternClient-Server-Kommunikation mit dem Command Pattern
Client-Server-Kommunikation mit dem Command Pattern
 
(DVO204) Monitoring Strategies: Finding Signal in the Noise
(DVO204) Monitoring Strategies: Finding Signal in the Noise(DVO204) Monitoring Strategies: Finding Signal in the Noise
(DVO204) Monitoring Strategies: Finding Signal in the Noise
 
Writing and deploying serverless python applications
Writing and deploying serverless python applicationsWriting and deploying serverless python applications
Writing and deploying serverless python applications
 
Netflix conductor
Netflix conductorNetflix conductor
Netflix conductor
 
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
From Zero to Hadoop: a tutorial for getting started writing Hadoop jobs on Am...
 
DevOps Days Tel Aviv - Serverless Architecture
DevOps Days Tel Aviv - Serverless ArchitectureDevOps Days Tel Aviv - Serverless Architecture
DevOps Days Tel Aviv - Serverless Architecture
 
Growing into a proactive Data Platform
Growing into a proactive Data PlatformGrowing into a proactive Data Platform
Growing into a proactive Data Platform
 
Top conf serverlezz
Top conf   serverlezzTop conf   serverlezz
Top conf serverlezz
 
Technology | Serverless
Technology | ServerlessTechnology | Serverless
Technology | Serverless
 
Going serverless with aws
Going serverless with awsGoing serverless with aws
Going serverless with aws
 
Next generation pipelines
Next generation pipelinesNext generation pipelines
Next generation pipelines
 
Paasta: Application Delivery at Yelp
Paasta: Application Delivery at YelpPaasta: Application Delivery at Yelp
Paasta: Application Delivery at Yelp
 
Meetup callback
Meetup callbackMeetup callback
Meetup callback
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
Fast Deployments to Multiple Golang Lambda Functions
Fast Deployments to Multiple Golang Lambda FunctionsFast Deployments to Multiple Golang Lambda Functions
Fast Deployments to Multiple Golang Lambda Functions
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
The future of paas is serverless
The future of paas is serverlessThe future of paas is serverless
The future of paas is serverless
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
 

Similar to PDX Serverless Meetup - Self-Healing Serverless Applications

Serverless Architecture Patterns
Serverless Architecture PatternsServerless Architecture Patterns
Serverless Architecture Patterns
Amazon Web Services
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
Tensult
 
AWS Jungle - Lambda
AWS Jungle - LambdaAWS Jungle - Lambda
AWS Jungle - Lambda
Squash Apps Pvt Ltd
 
Operating Your Production API
Operating Your Production APIOperating Your Production API
Operating Your Production API
Amazon Web Services
 
Going Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGardenGoing Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGarden
Jay Gandhi
 
Stephen Liedig: Building Serverless Backends with AWS Lambda and API Gateway
Stephen Liedig: Building Serverless Backends with AWS Lambda and API GatewayStephen Liedig: Building Serverless Backends with AWS Lambda and API Gateway
Stephen Liedig: Building Serverless Backends with AWS Lambda and API Gateway
Steve Androulakis
 
Building serverless backends - Tech talk 5 May 2017
Building serverless backends - Tech talk 5 May 2017Building serverless backends - Tech talk 5 May 2017
Building serverless backends - Tech talk 5 May 2017
ARDC
 
Operating your Production API
Operating your Production APIOperating your Production API
Operating your Production API
Amazon Web Services
 
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Skillenza Build with Serverless Challenge -  Advanced Serverless ConceptsSkillenza Build with Serverless Challenge -  Advanced Serverless Concepts
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Dhaval Nagar
 
ServerlessPresentation
ServerlessPresentationServerlessPresentation
ServerlessPresentationRohit Kumar
 
Serverless Architecture Patterns
Serverless Architecture PatternsServerless Architecture Patterns
Serverless Architecture Patterns
Amazon Web Services
 
serverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdfserverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdf
Amazon Web Services
 
Building Resilient Serverless Systems with Non-Serverless Components
Building Resilient Serverless Systems with Non-Serverless ComponentsBuilding Resilient Serverless Systems with Non-Serverless Components
Building Resilient Serverless Systems with Non-Serverless Components
Jeremy Daly
 
AWS Serverless patterns & best-practices in AWS
AWS Serverless  patterns & best-practices in AWSAWS Serverless  patterns & best-practices in AWS
AWS Serverless patterns & best-practices in AWS
Dima Pasko
 
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Amazon Web Services
 
The Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessThe Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of Serverless
Pipedrive
 
What's New with AWS Lambda
What's New with AWS LambdaWhat's New with AWS Lambda
What's New with AWS Lambda
Amazon Web Services
 
Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]
Dhaval Nagar
 
Serverless Architectures and Continuous Delivery
Serverless Architectures and Continuous DeliveryServerless Architectures and Continuous Delivery
Serverless Architectures and Continuous Delivery
Robin Weston
 
.NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ...
.NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ....NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ...
.NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ...
NETFest
 

Similar to PDX Serverless Meetup - Self-Healing Serverless Applications (20)

Serverless Architecture Patterns
Serverless Architecture PatternsServerless Architecture Patterns
Serverless Architecture Patterns
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
 
AWS Jungle - Lambda
AWS Jungle - LambdaAWS Jungle - Lambda
AWS Jungle - Lambda
 
Operating Your Production API
Operating Your Production APIOperating Your Production API
Operating Your Production API
 
Going Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGardenGoing Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGarden
 
Stephen Liedig: Building Serverless Backends with AWS Lambda and API Gateway
Stephen Liedig: Building Serverless Backends with AWS Lambda and API GatewayStephen Liedig: Building Serverless Backends with AWS Lambda and API Gateway
Stephen Liedig: Building Serverless Backends with AWS Lambda and API Gateway
 
Building serverless backends - Tech talk 5 May 2017
Building serverless backends - Tech talk 5 May 2017Building serverless backends - Tech talk 5 May 2017
Building serverless backends - Tech talk 5 May 2017
 
Operating your Production API
Operating your Production APIOperating your Production API
Operating your Production API
 
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
Skillenza Build with Serverless Challenge -  Advanced Serverless ConceptsSkillenza Build with Serverless Challenge -  Advanced Serverless Concepts
Skillenza Build with Serverless Challenge - Advanced Serverless Concepts
 
ServerlessPresentation
ServerlessPresentationServerlessPresentation
ServerlessPresentation
 
Serverless Architecture Patterns
Serverless Architecture PatternsServerless Architecture Patterns
Serverless Architecture Patterns
 
serverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdfserverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdf
 
Building Resilient Serverless Systems with Non-Serverless Components
Building Resilient Serverless Systems with Non-Serverless ComponentsBuilding Resilient Serverless Systems with Non-Serverless Components
Building Resilient Serverless Systems with Non-Serverless Components
 
AWS Serverless patterns & best-practices in AWS
AWS Serverless  patterns & best-practices in AWSAWS Serverless  patterns & best-practices in AWS
AWS Serverless patterns & best-practices in AWS
 
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
 
The Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of ServerlessThe Good, Bad and Ugly of Serverless
The Good, Bad and Ugly of Serverless
 
What's New with AWS Lambda
What's New with AWS LambdaWhat's New with AWS Lambda
What's New with AWS Lambda
 
Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]
 
Serverless Architectures and Continuous Delivery
Serverless Architectures and Continuous DeliveryServerless Architectures and Continuous Delivery
Serverless Architectures and Continuous Delivery
 
.NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ...
.NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ....NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ...
.NET Fest 2019. Stas Lebedenko. Practical serverless use cases in Azure with ...
 

Recently uploaded

Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 

Recently uploaded (20)

Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 

PDX Serverless Meetup - Self-Healing Serverless Applications

  • 1. SELF-HEALING SERVERLESS APPLICATIONS PDX Serverless Meetup June 2018 NATE TAGGART
  • 2. AWS | LAMBDA FEATURES PAGE AWS Lambda invokes your code only when needed and automatically scales to support the rate of incoming requests without requiring you to configure anything. There is no limit to the number of requests your code can handle. The Promise: SELF-HEALING SERVERLESS APPLICATIONS | PG2
  • 3. AWS | LAMBDA FEATURES PAGE The Reality: AWS Lambda invokes your code only when needed and automatically scales to support the rate of incoming requests without requiring you to configure anything. There is no limit to the number of requests your code can handle. (Suggested edits, overlaid on the quote: "s", "architecture", "sometimes", "certain", "s", "es", "every", "can", "but", "are properly".)
  • 4. What to expect when you’re not expecting.
  • 5. Common Serverless Failures for Lambda-Based Architectures (columns: Failure Types, Description, Default Behavior)
      Failure types:
        • Runtime Error: Uncaught Exception, Timeout, Bad State
        • Scaling: Concurrency Limits, Spawn Limits, Bottlenecking
  • 6. Common Serverless Failures for Lambda-Based Architectures: Uncaught Exception (Runtime Error)
      Description: An event triggers your Lambda to run, but raises an unhandled exception in your code.
      Default behavior, synchronous invocations:
        • Function fails
        • Returns error to caller
        • Logs timestamp, error message, and stack trace to CloudWatch
      Default behavior, asynchronous invocations:
        • Retries up to three times (or more if reading from a stream)
        • Caller is unaware of error
        • Logs timestamp, error message, and stack trace to CloudWatch
  • 7. Common Serverless Failures for Lambda-Based Architectures: Timeout (Runtime Error)
      Description: An event triggers your Lambda to run, but execution does not complete within the configured maximum execution time. (Lambda’s default configuration is a 3-second timeout.)
      Default behavior, synchronous invocations:
        • Lambda returns error to caller (if client hasn’t timed out)
        • Logs timestamp and error message to CloudWatch
      Default behavior, asynchronous invocations:
        • Retries up to three times (more if reading from a stream)
        • Caller is unaware of error
        • Logs timestamp and error message to CloudWatch
  • 8. Common Serverless Failures for Lambda-Based Architectures: Bad State (Runtime Error)
      Description: An event triggers your Lambda to run, but the message is malformed or state is improperly provided, causing unexpected behavior.
      Default behavior, when noisy:
        • Behaves as Uncaught Exception
        • Visible in CloudWatch, but may be difficult to diagnose without event visibility
      Default behavior, when silent:
        • Unexpected application behavior
        • Can be lost permanently
        • Can tank performance and dramatically spike costs
  • 9. Common Serverless Failures for Lambda-Based Architectures: Concurrency Limits (Scaling)
      Description: Your application becomes throttled as more Lambda instances are required than AWS allows to run concurrently for your account. Your compute can’t scale high enough.
      Default behavior, unbuffered invocations:
        • Fails to invoke
        • No retry
        • Visible in CloudWatch metrics, but not in logs
      Default behavior, buffered invocations:
        • Initially fails to invoke
        • Will eventually continue reading from stream as volume drops
  • 10. Common Serverless Failures for Lambda-Based Architectures: Spawn Limits (Scaling)
      Description: Your application becomes throttled as more new Lambda instances are required than AWS allows to spawn for your account. Your compute can’t scale fast enough.
      Default behavior, unbuffered invocations:
        • Fails to invoke
        • No retry
        • Visible in CloudWatch metrics, nothing in logs (but really non-obvious)
      Default behavior, buffered invocations:
        • Initially fails to invoke
        • Will eventually continue reading from stream as volume drops
  • 11. Common Serverless Failures for Lambda-Based Architectures: Bottlenecking (Scaling)
      Description: Your application is throttled due to throughput pressure upstream or downstream of your Lambda. Your architecture can’t scale enough.
      Default behavior, upstream bottlenecks:
        • Fails to invoke
        • No retry
        • Visible in CloudWatch, as long as you know where to look
      Default behavior, downstream bottlenecks:
        • Can throw error, timeout, and/or distribute failures to other functions
        • Can cause cascading failures
        • Can tank performance and dramatically spike costs
  • 13. Self-Healing Design Principles: Leading practices for resilient systems. Learn to fail.
      Standardize:
        • Introduce universal instrumentation
        • Collect event-centric diagnostics
        • Give everyone visibility
      Plan for failure:
        • Identify service limits
        • Use self-throttling
        • Consider alternative resource types
      Fail gracefully:
        • Reroute and unblock
        • Automate known solutions
        • Notify a human
  • 15. Scenario: Uncaught Exceptions (when things break and you don’t know why)
      Problem: Lambda periodically fails. Error messages and stack traces are visible in CloudWatch logs. Failing events are lost, making reproduction difficult.
      Key principles:
        • Introduce universal instrumentation
        • Collect event-centric diagnostics
        • Give everyone visibility
      Solution:
        • Use function wrapper or decorator pattern
        • Capture and log events which fail
      Decrease time to resolution by capturing event data.
  • 16. Event Diagnostics Wrapper Example
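
The code shown on this slide did not survive in the transcript. As a rough illustration of the decorator pattern described on slide 15, here is a minimal Python sketch that logs the triggering event whenever a handler raises. The names `capture_failed_events` and `event_diagnostics` and the sample `handler` are hypothetical and are not taken from the talk.

```python
import functools
import json
import logging

logger = logging.getLogger("event_diagnostics")  # hypothetical logger name


def capture_failed_events(handler):
    """Log the triggering event whenever the wrapped handler raises."""

    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception:
            # default=str keeps the logging call from failing on values
            # that are not JSON-serializable.
            logger.exception(
                "Handler %s failed; event=%s",
                handler.__name__,
                json.dumps(event, default=str),
            )
            raise  # re-raise so Lambda's usual error and retry behavior applies

    return wrapper


@capture_failed_events
def handler(event, context):
    # Illustrative business logic only.
    return {"total": event["price"] * event["quantity"]}
```

Because the wrapper re-raises, the failure is still reported to the caller or retried as before. The only change is that the failing event lands in the logs next to the stack trace.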
  • 17. Scenario: Upstream bottleneck (when your Lambdas aren’t getting invoked)
      Problem: API Gateway hits throughput limits and fails to invoke Lambda on every request.
      Key principles:
        • Identify service limits
        • Use self-throttling
        • Notify a human
      Solution:
        • Implement retries with exponential backoff logic for 429 responses
        • Raise alarm on: 4XXError
      Don’t overlook client-side solutions to backend failures.
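
One way a client might retry on 429 responses with exponential backoff is sketched below. The function `call_with_backoff`, the `ThrottledError` class, and the default delays are assumptions made for illustration. The `request` argument is any zero-argument callable returning a `(status_code, body)` pair, so the sketch is not tied to a particular HTTP library.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error raised when every attempt was answered with 429."""


def call_with_backoff(request, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call request() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = request()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Capped exponential delay with full jitter, so that many throttled
        # clients do not all retry at the same instant.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    raise ThrottledError(f"still throttled after {max_attempts} attempts")
```

The `sleep` parameter exists only so the waiting can be replaced in tests. In normal use the default `time.sleep` applies.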
  • 19. Scenario: Timeouts (when execution takes too long)
      Problem: Lambda is periodically timing out.
      Key principles:
        • Introduce universal instrumentation
        • Use self-throttling
        • Consider alternative resource types
      Solution:
        • Use function wrapper or decorator pattern
        • Evaluate Fargate or alternative long-running resources
      Enforce your own limits.
  • 20. Timeout Wrapper Example
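
The code from this slide is not in the transcript either. The sketch below shows one possible way to "enforce your own limits" in Python: run the handler in a worker thread and give up shortly before Lambda's own deadline, so that the function can still report which event was too slow. `enforce_timeout`, `SelfImposedTimeout`, and the 500 ms margin are hypothetical. `get_remaining_time_in_millis()` is the method the Lambda Python context object provides.

```python
import functools
import threading


class SelfImposedTimeout(Exception):
    """Raised when the handler exceeds our own, earlier deadline."""


def enforce_timeout(safety_margin_ms=500):
    """Fail the invocation ourselves shortly before Lambda would end it."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            outcome = {}

            def run():
                try:
                    outcome["result"] = handler(event, context)
                except Exception as exc:
                    outcome["error"] = exc

            budget_ms = context.get_remaining_time_in_millis() - safety_margin_ms
            worker = threading.Thread(target=run, daemon=True)
            worker.start()
            worker.join(max(budget_ms, 0) / 1000.0)
            if worker.is_alive():
                # The worker is abandoned, not stopped: Python threads
                # cannot be killed from outside.
                raise SelfImposedTimeout(
                    f"no result within {budget_ms} ms; event={event!r}"
                )
            if "error" in outcome:
                raise outcome["error"]
            return outcome["result"]

        return wrapper

    return decorator
```

A caveat with this design is that the abandoned worker keeps running until it finishes or the execution environment is frozen, so it suits handlers whose late side effects are harmless. A signal-based timer is a common alternative on the main thread.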
  • 21. Scenario: Stream processing gets “stuck” (when failures are blocking the rest of the stream)
      Problem: Lambda exceptions and/or timeouts are blocking processing of a Kinesis shard.
      Key principles:
        • Reroute and unblock
        • Automate known solutions
        • Consider alternative resource types
      Solution:
        • Introduce state machine-type logic
        • Move bad messages to alternate stream
        • Potentially architect with Fargate or SNS
      Small failures are preferable to large ones.
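
Moving bad messages aside so the shard keeps flowing could look roughly like the following sketch. `process_batch`, `process`, and `reroute` are hypothetical names. `reroute` stands in for whatever writes to the alternate stream, and the record layout (`Records`, `kinesis.data` as base64, `kinesis.partitionKey`) follows the Kinesis event that Lambda delivers. JSON payloads are assumed for the example.

```python
import base64
import json


def process_batch(event, process, reroute):
    """Process each Kinesis record; hand failures to reroute instead of raising.

    process and reroute are caller-supplied callables (hypothetical names).
    Returns the number of records that were rerouted.
    """
    rerouted = 0
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            process(json.loads(payload))
        except Exception as exc:
            # Park the bad message elsewhere so the rest of the shard can
            # continue. In a real function, reroute might write the payload
            # to a second stream or a queue.
            reroute(payload, record["kinesis"]["partitionKey"], repr(exc))
            rerouted += 1
    return rerouted
```

If `reroute` itself raises, the exception propagates and the whole batch is retried, which is the original blocking behavior. That keeps a failure of the alternate stream from silently dropping messages.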
  • 22. Scenario: Downstream bottleneck (when Lambda is out-scaling your database)
      Problem: Your Lambdas have scaled up but are depleting your RDS database connection pools.
      Key principles:
        • Identify service limits
        • Automate known solutions
        • Give everyone visibility
      Solution:
        • Always close database connections
        • Scale your database
        • Map your dependencies
      Scale dependencies, too.
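
"Always close database connections" can be made hard to forget by tying the connection's lifetime to a `with` block. The sketch below uses the standard library's `sqlite3` purely as a stand-in for an RDS driver. The `connect` factory and the query are hypothetical.

```python
import contextlib
import sqlite3


def connect():
    """Stand-in for a real RDS connection factory (hypothetical)."""
    return sqlite3.connect(":memory:")


def handler(event, context, connect=connect):
    # closing() calls conn.close() when the block exits, including when
    # the body raises, so a failed invocation does not leak a connection.
    with contextlib.closing(connect()) as conn:
        row = conn.execute("SELECT ? + 1", (event["value"],)).fetchone()
        return {"result": row[0]}
```

Opening a connection per invocation trades some latency for predictable connection counts. Reusing one connection across warm invocations is another option, but then the number of open connections follows the number of live Lambda instances.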
  • 23. Want to try? app.stackery.io/sign-up
      • Timeout Handling
      • Universal Instrumentation
      • Error Re-Routing
      • Build Standardization
      • Event Diagnostics
      • Shared Health Dashboard