The document discusses challenges with monitoring microservice environments, including tracing calls between services. It describes how custom implementations can be complex due to different technologies. Commercial solutions like Dynatrace Ruxit provide unified monitoring with call tracing across technologies with minimal setup. They automatically detect issues without thresholds and include client-side monitoring.
Performance monitoring and call tracing in microservice environments
1. Performance Analysis and Call Tracing in Microservice Environments
Martin Gutenbrunner
Dynatrace Innovation Lab
@MartinGoodwell
Microservice Meetup Berlin – 2016-06-30
2. About me
Started with Commodore 8-bit (VC-20 and C-64)
Built Null-Modem connections for playing Doom and WarCraft I
Went on to IPX/SPX networks between MS-DOS 6.22 and WfW 3.11
Did DevOps before it was a thing (mainly Java and Web) for ~10 years
Now at Dynatrace Innovation Lab
Tech Lead for Azure and Microservices
Find me on Twitter: @MartinGoodwell
Passionate about life,
technology and the people
behind both of them.
3. Agenda
Traditional monitoring
What's wrong with it?
Performance in your code
The dramatic dilemma
Happy end
@MartinGoodwell
4. Questions
Please, ask and interrupt anytime!
What's your occupation?
Dev, Ops, BizExec?
What's your technology stack?
Java, .NET
Node.js
Who of you knows what APM is/does?
5. A lil' bit o' history
Traditional monitoring was for Ops only
APM (incl. Call Tracing) is also for devs, debugging, pre-prod
17. Any downsides here?
Basic approaches tend to pollute your code
AOP is the better choice, but requires advanced skills
If you're not using something like statsd, it's hard to have a central spot for all the performance data of your different components
Great for performance insights into single components
What about 3rd parties?
Or distributed systems?
Like microservices, maybe
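A minimal Java sketch of such a "basic approach" (all names here are made up for illustration): note how the timing calls interleave with the business logic, which is exactly the code pollution mentioned above.

```java
import java.util.concurrent.TimeUnit;

public class InlineTiming {
    // Business method with hand-rolled timing woven into it.
    public static long processOrder() {
        long start = System.nanoTime();                 // instrumentation, not business logic
        doBusinessWork();                               // the actual work
        long elapsedMs = TimeUnit.NANOSECONDS
                .toMillis(System.nanoTime() - start);   // instrumentation again
        System.out.println("processOrder took " + elapsedMs + " ms");
        return elapsedMs;
    }

    // Stand-in for real business logic.
    static void doBusinessWork() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        processOrder();
    }
}
```

AOP moves the `start`/`elapsedMs` bookkeeping out of the method body into an aspect, which is why it is the cleaner choice here.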
18. What about components we can't modify?
like databases, message queues, ...
Best case: use readily available APIs or integrations (statsd, JMX, etc.)
For open source: apply the same technique as to your own code
Keeping in sync with the original code can become tedious
Try to make your changes part of the original project
Use dedicated monitoring tools
Very common for databases
BUT even the best tool is an additional tool
How long does it take to get a new team member up to speed?
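To illustrate the statsd route: statsd accepts plain-text metrics over UDP, e.g. a counter line of the form `name:value|c`. A minimal sketch, assuming a statsd daemon on localhost:8125 and a made-up metric name:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class StatsdClient {
    // Builds a statsd counter line, e.g. "orders.processed:1|c"
    public static String counter(String name, int value) {
        return name + ":" + value + "|c";
    }

    // Fire-and-forget UDP send; statsd is connectionless, so this
    // succeeds even if no daemon is listening.
    public static void send(String payload, String host, int port) throws Exception {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(bytes, bytes.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        send(counter("orders.processed", 1), "localhost", 8125);
    }
}
```

Because every component pushes to the same daemon, statsd gives you the "central spot" for performance data that the previous slide asked for.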
21. Microservices vs SOA
Microservices
fit the scope of a single application
Service Oriented Architecture
is scoped to fit enterprises / environments / infrastructures
For a dev, microservices hardly pose any downsides
On the upside, the code size and the scope of the domain become smaller
Any best practices for analyzing the performance of a single microservice are still valid
The real challenge of microservices is proper operation
23. What's the challenge about monitoring microservices?
The big challenge of well-performing microservices lies in the communication between the services
Not in the performance of a single microservice
Tracing calls between services is very difficult
30. Leverage existing tools
https://github.com/ordina-jworks/microservices-dashboard
31. Spring Cloud Sleuth
Sleuth: https://github.com/spring-cloud/spring-cloud-sleuth
Spring Cloud Sleuth implements a distributed tracing solution for Spring Cloud.
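Under the hood, tracing solutions in the Sleuth/Zipkin family propagate a trace ID between services via HTTP headers such as `X-B3-TraceId`. A hand-rolled sketch of that idea in plain Java (the helper names are invented for illustration; Sleuth does this automatically for you):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TraceContext {
    // Header name used by Zipkin-style (B3) propagation.
    public static final String TRACE_HEADER = "X-B3-TraceId";

    // Reuse the trace ID from the incoming request, or start a new trace.
    public static String traceIdFrom(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(TRACE_HEADER);
        return (id != null) ? id : UUID.randomUUID().toString().replace("-", "");
    }

    // Copy the trace ID onto the headers of an outgoing downstream call,
    // so the next service can stitch its spans into the same trace.
    public static Map<String, String> propagate(String traceId) {
        Map<String, String> outgoing = new HashMap<>();
        outgoing.put(TRACE_HEADER, traceId);
        return outgoing;
    }

    public static void main(String[] args) {
        String traceId = traceIdFrom(new HashMap<>());        // no incoming header: new trace
        Map<String, String> downstream = propagate(traceId);  // same ID travels downstream
        System.out.println(TRACE_HEADER + ": " + downstream.get(TRACE_HEADER));
    }
}
```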
34. So, do we have everything we need?
Usually, one tracing solution covers only a single technology
Besides visualization, you'll also want log analysis
The ELK stack does this really well, especially in combination with correlation IDs
But the ELK stack does no visualization
And your visualization does no log analysis
Yet another tool
Don't get me started on integrating all this with host monitoring...
The trace ends where your code ends
No correlation IDs for database calls
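One way to make the correlation-ID approach pay off in the ELK stack is to emit the ID as a structured field in every log line, so Logstash can parse it and Kibana can group entries across services. A minimal sketch; the field names are assumptions, not a fixed schema:

```java
public class CorrelatedLogger {
    // One JSON-ish line per event, easy for Logstash to parse.
    // "traceId" is the correlation ID shared by all services in a request.
    public static String logLine(String traceId, String service, String message) {
        return String.format(
                "{\"traceId\":\"%s\",\"service\":\"%s\",\"msg\":\"%s\"}",
                traceId, service, message);
    }

    public static void main(String[] args) {
        System.out.println(logLine("abc123", "checkout", "order accepted"));
        System.out.println(logLine("abc123", "payment", "charge authorized"));
    }
}
```

Searching Kibana for `traceId:abc123` would then surface the lines from both services in one view, which is the correlation the slide refers to.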
36. Considerations for custom implementations
Multitude of languages
Open-source tools can get expensive
Manual configuration
Often only applicable to a single technology
Keeping pace with new technology
Serverless code (e.g. AWS Lambda, Azure Functions)
38. The Ops' dilemma
How to handle all this in production
How to identify production issues
How to tell the devs what they should look into, without tearing everything down
39. All fine?
While the dev can leverage a huge number of tools, libs and frameworks, it's still up to Ops to integrate it all into a single, unified, well-integrated solution that allows drawing the right conclusions
40. From Dev to Prod
Dev
Single transaction
Deal with a specific problem
No impact on real users and business
Can concentrate on a single component
"Perfect world"
A dev's deadline is made of sprints
A couple of weeks, usually
Ops
100s or 1000s of transactions
No idea what the problem is
Slow or bad requests impact real users and business
Lots of components that might not be under your control
An Ops' deadline is made of SLAs
Hours, maybe just minutes
42. From Prod to Dev
Dev
Single transaction
Deal with a specific problem
No impact on real users and business
Can concentrate on a single component
"Perfect world"
Ops
100s or 1000s of transactions
No idea what the problem is
Slow or bad requests impact real users and business
Lots of components that might not be under your control
[Slide callouts: "Which?", "Which?", "Time!", "Reproduce?"]
46. Set-up in 5 minutes
Install a single monitoring agent per host
Everything is auto-detected
No changes to your source code
No changes to runtime configuration
Supports a wide array of technologies
http://www.dynatrace.com/en/ruxit/technologies/