The slides for my talk given at DevOpsDays Edinburgh 2018: what is important when monitoring serverless systems, and how we can leverage distributed tracing to do it better.
37. How much time do you really spend?
Our own code / API calls
38. A Real Life Example
Scanning CloudWatch logs using AWS Lambda – every 5 minutes
A new Lambda is spawned for every customer’s function (async)
Sounds simple and fun!
Diagram labels: Poll, Spawn (async), CloudWatch
Let's first talk about distributed systems.
In recent years we have seen a trend of moving from monolithic applications,
which are applications made of a single component
that holds all of your business logic, features and code,
to microservices.
In the microservice architecture, your system is composed of many small components,
each in charge of one piece of your business logic, or "domain".
This is called a "distributed system" because there is no centralised control: each service can act on its own,
and the services communicate with each other via either sync or async channels.
This has nice advantages:
Scaling
Deployment
When I say serverless you think of FaaS
Most cloud providers have an offering
What is FaaS?
No infrastructure to manage
Serverless ecosystem - DynamoDB
FaaS in the Serverless ecosystem
Let me start off by saying I think Serverless is great.
In this talk I am going to talk about the challenges of monitoring these applications,
but that does not mean that I discourage you from using them.
Advantages:
Development velocity - focus on your own business logic, iterate faster
Autoscaling - usually autoscales pretty easily, because invoking one lambda is the same as invoking 1000
Pay per use - Can be a real cost saver, you pay only for what you use
We call these groups of events transactions
Collecting and connecting these traces is what we call distributed tracing.
This is relevant not only to Serverless but to any system with distributed components above a certain complexity level.
A transaction tells the story of a request or data as it propagates through the distributed system.
It is actually a directed acyclic graph of events.
Each one of them describes a segment of work in the chain.
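To make the "graph of events" idea concrete, here is a minimal sketch of how a transaction could be held in memory. The Event fields, the helper names and the example transaction are all made up for illustration; real tracing systems have their own data models.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    """One segment of work in a transaction (all field names are illustrative)."""
    event_id: str
    name: str
    start_ms: int
    end_ms: int
    parent_ids: list = field(default_factory=list)  # events that triggered this one


def build_graph(events):
    """Connect events into a DAG: parent event_id -> list of child event_ids."""
    children = defaultdict(list)
    for event in events:
        for parent_id in event.parent_ids:
            children[parent_id].append(event.event_id)
    return dict(children)


def roots(events):
    """Events with no parent are where the transaction entered the system."""
    return [e.event_id for e in events if not e.parent_ids]


# A made-up transaction: an API request triggers a function that writes to a
# table and publishes a message, which in turn triggers a second function.
transaction = [
    Event("e1", "api request", 0, 120),
    Event("e2", "function A", 5, 110, ["e1"]),
    Event("e3", "table write", 20, 45, ["e2"]),
    Event("e4", "queue publish", 50, 60, ["e2"]),
    Event("e5", "function B", 70, 100, ["e4"]),
]

graph = build_graph(transaction)
```

Connecting the events is just a matter of following the parent links, which is why every event has to carry them in the first place.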
<talk about the example>
So now that we have the data, we want to display it
2 common ways:
Timeline view. We can see all the events next to each other,
When each started and finished and some extra data on each
This is an example from Jaeger, which is a distributed tracing platform, and this is its UI.
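A rough sketch of what a timeline view boils down to, as plain text. This is not how Jaeger renders anything; the render_timeline function, the (name, start_ms, end_ms) tuple layout and the numbers are assumptions for illustration only.

```python
def render_timeline(events, width=40):
    """Return one text row per event: its name, then a bar placed at its time offset."""
    total = max(end for _, _, end in events)
    rows = []
    for name, start, end in sorted(events, key=lambda e: e[1]):
        left = start * width // total                     # where the bar starts
        length = max(1, (end - start) * width // total)   # how long the bar is
        rows.append(f"{name:<12}|{' ' * left}{'#' * length}  {end - start} ms")
    return rows


# Made-up events as (name, start_ms, end_ms).
events = [
    ("request", 0, 100),
    ("function A", 10, 90),
    ("db write", 20, 40),
]

rows = render_timeline(events)
print("\n".join(rows))
```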
Another common way is to display the transaction as a graph
Let's try to perform root cause analysis here, so we can see how critical this is
The first thing we think of is manual tracing
High maintenance - 2 lines of code become 5
High risk of error - whatever is left for a human to do will have an error
And that does not mean I don't like humans
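A small sketch of what "2 lines of code become 5" looks like in practice. Everything here is hypothetical: TRACE stands in for a real tracing backend, and fetch_user stands in for a call to a remote service.

```python
import time

TRACE = []  # hypothetical in-memory trace sink, stands in for a real tracing backend


def fetch_user(user_id):
    """Placeholder for a call to a remote service."""
    return {"id": user_id}


# Untraced: the business logic is two lines.
def handler(event):
    user = fetch_user(event["user_id"])
    return {"greeting": f"hello {user['id']}"}


# Manually traced: the same logic, plus bookkeeping that someone has to write,
# keep correct, and remember to repeat around every other call.
def traced_handler(event):
    start = time.perf_counter()
    try:
        user = fetch_user(event["user_id"])
    finally:
        TRACE.append({"name": "fetch_user", "duration_s": time.perf_counter() - start})
    return {"greeting": f"hello {user['id']}"}
```

Both handlers return the same result; the only difference is the extra code, and every call that someone forgets to wrap is simply missing from the trace.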
In serverless applications this becomes even more problematic
How can we use the fact that we are running in a Serverless environment to make this process easier?
The Serverless environment has strongly defined, well-known APIs.
We can leverage that to automate the tracing process
This way we can maintain our development speed
While still being able to understand our production environment.
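One way to picture automated tracing is wrapping the client once, instead of touching every call site. This is only a sketch under assumptions: instrument and FakeTableClient are made-up names, and the fake client stands in for a real cloud SDK client with a well-known API.

```python
import functools
import time


def instrument(client, method_names, sink):
    """Wrap the named methods of `client` so each call is recorded in `sink`."""
    for name in method_names:
        original = getattr(client, name)

        def make_wrapper(name, original):
            @functools.wraps(original)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                error = None
                try:
                    return original(*args, **kwargs)
                except Exception as exc:
                    error = repr(exc)
                    raise
                finally:
                    # Runs on success and on failure, so no call goes unrecorded.
                    sink.append({
                        "operation": name,
                        "duration_s": time.perf_counter() - start,
                        "error": error,
                    })
            return wrapper

        setattr(client, name, make_wrapper(name, original))
    return client


class FakeTableClient:
    """Hypothetical stand-in for a cloud SDK client with a well-known API."""

    def __init__(self):
        self.items = {}

    def put_item(self, key, value):
        self.items[key] = value

    def get_item(self, key):
        return self.items[key]


trace = []
table = instrument(FakeTableClient(), ["put_item", "get_item"], trace)

# Business code stays untouched: no tracing calls in sight.
table.put_item("user-1", {"plan": "free"})
item = table.get_item("user-1")
```

Because the API surface is known in advance, the wrapping can be done once by a library, and the 2 lines of business code stay 2 lines.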
The last thing I want to talk about is optimization
In serverless it is especially important because in serverless - <click>
Time is money
Not just a phrase - you pay for every millisecond your code executes.
The question is where we really spend our time. It can be divided into 2 buckets: our own code and API calls.
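A minimal sketch of splitting one invocation into the 2 buckets, assuming we already have the total duration and the duration of each API call from a trace. The function name, the field names and the numbers are made up, and the calls are assumed to run one after another.

```python
def time_breakdown(total_ms, api_calls):
    """Split a function's duration into API-call time and own-code time.

    `api_calls` is a list of (name, duration_ms) pairs; calls are assumed to
    run one after another, so their durations can simply be summed.
    """
    api_ms = sum(duration for _, duration in api_calls)
    own_ms = total_ms - api_ms
    return {
        "own_code_ms": own_ms,
        "api_calls_ms": api_ms,
        "api_share": api_ms / total_ms if total_ms else 0.0,
    }


# Made-up numbers for one invocation.
breakdown = time_breakdown(
    total_ms=800,
    api_calls=[("table read", 350), ("http request", 250)],
)
print(breakdown)
```

With numbers like these, most of the billed time is spent waiting on API calls, so optimizing our own code would barely change the bill.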
<explain>
I want to give you 2 examples of why it is so important to understand the REAL behaviour of your functions
And how they use the APIs