Distributed Tracing in Serverless Systems - DevOpsDays Edinburgh

The slides for my talk given at DevOpsDays Edinburgh 2018. What is important when monitoring serverless systems, and how can we leverage distributed tracing to do it better?

1. Distributed Tracing in Serverless Systems Gal Bashan, Epsagon

2. Let’s talk Distributed systems

3. From a monolith…

4. To microservices!

5. Let’s talk Serverless

6. Serverless is great! Pay per use Autoscaling Development velocity

7. > whoami Gal Bashan (@BashanGal) Distributed tracing @ Epsagon Cyber security @ IDF Tel Aviv

8. Monitoring - why do we need it? Track system health Troubleshoot and fix Optimize performance/cost

9. Slow down! Let’s go one by one…

10. Track system health System == Functions ?

11. The era of APIs We want managed resources Applications become Highly distributed Highly event-driven Without access to any server!

12. theburningmonk.com

13. System != Functions theburningmonk.com

14. Track system health System > Functions ! Functions APIs Resources

15. Troubleshoot and Fix Track system health Troubleshoot and fix Optimize performance/cost

16.

17.

18.

19.

20. Troubleshoot and fix Functions are not enough Need: track asynchronous events

21. Transactions

22. Distributed Tracing

23. Distributed Tracing

24. Distributed Tracing

25. Distributed Tracing

26. Distributed Tracing

27. Distributed Tracing

28. Distributed Tracing

29. Implementing Distributed Tracing Manual tracing •Before/after calls •At the end of each micro-service

30. Implementing Distributed Tracing Manual tracing •Before/after calls •At the end of each micro-service •High maintenance •High potential of errors

31. Serverless apps are very distributed Complex systems have thousands of functions What about the developer velocity?

32. Unique challenges to Serverless Timeouts Out of memory Cold starts Retries Concurrency limit

33. Can it be done differently in Serverless?

34. Automation can help to keep up with the development speed of Serverless

35. Optimizations Track system health Troubleshoot and fix Optimize performance/cost

36. In Serverless Time is Money

37. How much time do you really spend? Our own code API calls

38. A Real Life Example Scanning CloudWatch logs using AWS Lambda – every 5 minutes A new Lambda is spawned for every customer’s function (async) Sounds simple and fun! Poll Spawn (async) CloudWatch

39. $$$$$$$$$$$$$$$$ Then one day...

40. As time flies… CloudWatch became highly throttled, requests took a very long time: 5K concurrent Lambdas, for 5 minutes, every 5 minutes!

41. Another Example 702ms

42. Business Flows Read Post Submit Post

43. Business Flows

44. Summing it up Track system health Troubleshoot and fix Optimize performance/cost

45. Thank you! gal@epsagon.com

Editor's Notes

Let's first talk about distributed systems. In recent years we have seen a trend of moving away from monolithic applications,
which are applications made of a single component that holds all of your business logic, features, and code,
to microservices. In the microservice architecture, your system is composed of many small components, each in charge of one piece of your business logic, or "domain". This is called a "distributed system" because there is no centralised control: each service can act on its own, and the services communicate with each other via either sync or async channels. This has nice advantages: scaling and deployment.
When I say serverless, you think of FaaS. Most cloud providers have an offering. What is FaaS? No infrastructure to manage. The Serverless ecosystem: DynamoDB. FaaS in the Serverless ecosystem.
Let me start off by saying I think Serverless is great. In this talk I am going to talk about the challenges of monitoring these applications, but that does not mean I discourage you from using them. Advantages: Development velocity - focus on your own business logic and iterate faster. Autoscaling - it usually autoscales pretty easily, because invoking one Lambda is the same as invoking 1000. Pay per use - it can be a real cost saver, because you pay only for what you use.
We call these groups of events transactions
Collecting and connecting these traces is what we call distributed tracing. It is relevant not only to Serverless but to any system with distributed components above a certain complexity level. A transaction tells the story of a request or a piece of data as it propagates through the distributed system. It is actually a directed acyclic graph of events, each of which describes a segment of work in the chain. <talk about the example> Now that we have the data, we want to display it. There are 2 common ways:
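To make the "graph of events" idea concrete, here is a minimal sketch of how a transaction could be assembled from individual events. The `Event` schema, the resource names, and the timings are all assumptions made for illustration, not something shown in the talk. Each event keeps a single parent for simplicity, which yields a tree (a special case of a directed acyclic graph).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One segment of work in a transaction (hypothetical schema)."""
    event_id: str
    parent_id: Optional[str]  # None marks the event that started the transaction
    resource: str             # e.g. a function, a queue, or a table
    start_ms: int
    duration_ms: int


def build_graph(events):
    """Connect events into a parent -> children adjacency map."""
    children = defaultdict(list)
    for event in events:
        if event.parent_id is not None:
            children[event.parent_id].append(event.event_id)
    return dict(children)


def find_roots(events):
    """Return the ids of events that have no parent (the transaction entry points)."""
    return [event.event_id for event in events if event.parent_id is None]


# Illustrative transaction: an API call triggers a function, which writes to a
# table and publishes a message that asynchronously triggers another function.
events = [
    Event("e1", None, "api-gateway", 0, 120),
    Event("e2", "e1", "submit-post-lambda", 5, 100),
    Event("e3", "e2", "posts-table", 20, 15),
    Event("e4", "e2", "notifications-topic", 40, 10),
    Event("e5", "e4", "notify-lambda", 60, 45),
]
graph = build_graph(events)
roots = find_roots(events)
```

The important part is the `parent_id` link: without it the events are just unrelated log lines, and with it they can be connected into one story per request.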
Timeline view. We can see all the events next to each other, when each one started and finished, and some extra data on each. This example is from Jaeger, a distributed tracing platform, and this is its UI.
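A timeline view can be approximated in a few lines of text rendering. The sketch below is only an illustration of the idea and is not how any particular tracing UI is implemented; the `(name, start_ms, duration_ms)` tuple format and the sample values are assumptions.

```python
def render_timeline(spans, width=40):
    """Return one text row per span: a bar placed by start time and sized by duration.

    spans is a list of (name, start_ms, duration_ms) tuples (assumed format).
    """
    end_of_trace = max(start + duration for _, start, duration in spans)
    rows = []
    for name, start, duration in sorted(spans, key=lambda span: span[1]):
        # Integer arithmetic keeps the bar positions deterministic.
        offset = start * width // end_of_trace
        length = max(1, duration * width // end_of_trace)
        rows.append(f"{name:<22}|{' ' * offset}{'#' * length}")
    return rows


# Hypothetical spans for a single transaction.
spans = [
    ("api-gateway", 0, 120),
    ("submit-post-lambda", 5, 100),
    ("posts-table", 20, 15),
    ("notify-lambda", 60, 45),
]
for row in render_timeline(spans):
    print(row)
```

Reading the rows top to bottom shows which events overlap and which one dominates the total duration.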
Another common way is to display the transaction as a graph. Let's try to perform root cause analysis here, so we can see how critical this is.
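One simple way to reason about root cause on such a graph is sketched below, under the assumption that errors propagate from a callee back to its caller: a failed node whose own callees all succeeded is a likely origin of the failure. This heuristic, the node names, and the data layout are assumptions for illustration only.

```python
def find_root_causes(nodes, edges):
    """Return failed nodes that have no failed callee.

    nodes: {node_id: {"error": bool}}  (assumed format)
    edges: list of (caller_id, callee_id) pairs
    """
    failed = {node_id for node_id, info in nodes.items() if info["error"]}
    causes = []
    for node_id in failed:
        callees = [callee for caller, callee in edges if caller == node_id]
        if not any(callee in failed for callee in callees):
            causes.append(node_id)
    return sorted(causes)


# Hypothetical transaction in which a throttled table fails the whole request.
nodes = {
    "api-gateway": {"error": True},
    "submit-post-lambda": {"error": True},
    "posts-table": {"error": True},
    "notifications-topic": {"error": False},
}
edges = [
    ("api-gateway", "submit-post-lambda"),
    ("submit-post-lambda", "posts-table"),
    ("submit-post-lambda", "notifications-topic"),
]
root_causes = find_root_causes(nodes, edges)
```

Looking only at the function's own logs would show three failures; the graph narrows them down to one.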
The first thing we think of is manual tracing. High maintenance - 2 lines of code become 5. High risk of error - whatever is left for a human to do will eventually contain an error. And that does not mean I don't like humans.
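The overhead of manual tracing is easy to see side by side. The sketch below assumes a hypothetical `fetch_post` downstream call and an in-memory `TRACE` list standing in for a real trace backend; neither comes from the talk.

```python
import time

TRACE = []  # in-memory stand-in for a real trace backend (assumption)


def record(name, start, end, error=None):
    """Store one manually recorded trace entry."""
    TRACE.append({"name": name, "duration_ms": (end - start) * 1000, "error": error})


def fetch_post(post_id):
    """Hypothetical downstream call, e.g. a database read."""
    return {"id": post_id, "title": "hello"}


def handler_untraced(event):
    # The business logic: two lines.
    post = fetch_post(event["post_id"])
    return post["title"]


def handler_traced(event):
    # The same logic with manual before/after tracing around the call.
    start = time.perf_counter()
    error = None
    try:
        post = fetch_post(event["post_id"])
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        record("fetch_post", start, time.perf_counter(), error)
    return post["title"]
```

Every call site needs the same boilerplate, and forgetting the `finally` branch silently drops the failures that matter most.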
In serverless applications this becomes even more problematic
How can we use the fact that we are running in a Serverless environment to make this process easier?
The Serverless environment has strongly defined, well-known APIs. We can leverage that to automate the tracing process. This way we can maintain our development speed while still being able to understand our production environment.
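One way such automation could look is sketched below: because the set of client calls is known in advance, they can be wrapped once instead of at every call site. This is a minimal sketch under stated assumptions, not the approach of any specific product. `FakeTableClient`, `instrument`, `traced_handler`, and `SPANS` are hypothetical names introduced for the example.

```python
import functools
import time

SPANS = []  # in-memory stand-in for a trace backend (assumption)


def _wrap(name, func):
    """Wrap a callable so every call records its duration and any error."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            SPANS.append({"name": name, "duration_ms": duration_ms, "error": error})
    return wrapper


def instrument(client, method_names):
    """Replace the listed methods on a client instance with traced versions."""
    for method_name in method_names:
        original = getattr(client, method_name)
        traced = _wrap(f"{type(client).__name__}.{method_name}", original)
        setattr(client, method_name, traced)
    return client


def traced_handler(handler):
    """Decorator that records one span for the whole function invocation."""
    return _wrap(f"handler.{handler.__name__}", handler)


class FakeTableClient:
    """Hypothetical stand-in for a managed key-value store client."""
    def __init__(self):
        self.items = {}

    def put_item(self, key, value):
        self.items[key] = value

    def get_item(self, key):
        return self.items[key]


table = instrument(FakeTableClient(), ["put_item", "get_item"])


@traced_handler
def submit_post(event, context=None):
    # Business logic stays free of tracing code.
    table.put_item(event["id"], event["body"])
    return table.get_item(event["id"])
```

The handler body is unchanged from what a developer would write without tracing, which is the point: the instrumentation is applied once, outside the business logic.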
The last thing I want to talk about is optimization. In Serverless it is especially important, because in Serverless - <click> Time is money.
It is not just a phrase - you pay for every millisecond your code executes.
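The arithmetic behind "time is money" can be sketched with the figures from the real-life example in the slides: 5K concurrent Lambdas running for 5 minutes, every 5 minutes. The memory size and the price per GB-second below are illustrative placeholders, not quoted prices, and the calculation assumes every function runs for the full 5 minutes.

```python
def compute_seconds(concurrent_functions, seconds_per_run, runs_per_day):
    """Total function execution time per day, in seconds."""
    return concurrent_functions * seconds_per_run * runs_per_day


def daily_cost(total_seconds, memory_gb, price_per_gb_second):
    """Duration-based cost: seconds x configured memory x price per GB-second."""
    return total_seconds * memory_gb * price_per_gb_second


runs_per_day = 24 * 60 // 5                    # one run every 5 minutes
total_seconds = compute_seconds(5000, 5 * 60, runs_per_day)

# Placeholder values, chosen only to make the arithmetic visible.
ASSUMED_MEMORY_GB = 0.5
ASSUMED_PRICE_PER_GB_SECOND = 0.00001

cost = daily_cost(total_seconds, ASSUMED_MEMORY_GB, ASSUMED_PRICE_PER_GB_SECOND)
print(f"{total_seconds:,} function-seconds per day, cost {cost:,.2f} per day")
```

With these inputs the functions accumulate 432,000,000 execution seconds per day, so even a tiny per-second price adds up when most of that time is spent waiting on a throttled API.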
The question is where we really spend our time. It can be divided into 2 buckets. <explain> I want to give you 2 examples of why it is so important to understand the REAL behaviour of your functions and how they use the APIs.
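Splitting an invocation into the two buckets is simple once the API calls are traced. The sketch below assumes the API calls run sequentially (no overlap), and all names and numbers are hypothetical.

```python
def split_duration(invocation_ms, api_calls_ms):
    """Split one invocation's duration into time in API calls and time in our own code.

    api_calls_ms maps an API call name to its duration in milliseconds.
    Assumes the calls do not overlap in time.
    """
    api_ms = sum(api_calls_ms.values())
    own_code_ms = invocation_ms - api_ms
    return {
        "api_ms": api_ms,
        "own_code_ms": own_code_ms,
        "api_share": api_ms / invocation_ms,
    }


# Hypothetical measurements for a single invocation.
breakdown = split_duration(
    invocation_ms=700,
    api_calls_ms={"table.get_item": 450, "topic.publish": 180},
)
print(breakdown)
```

In this made-up case most of the billed time is spent waiting on API calls, so optimizing our own code would barely change the bill.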
