The document discusses challenges with monitoring microservice environments, including tracing calls between services. It describes how custom implementations can be complex due to different technologies. Commercial solutions like Dynatrace Ruxit provide unified monitoring with call tracing across technologies with minimal setup. They automatically detect issues without thresholds and include client-side monitoring.
Performance monitoring and call tracing in microservice environments
1. Performance Analysis and Call Tracing in Microservice Environments
Martin Gutenbrunner
Dynatrace Innovation Lab
@MartinGoodwell
Microservice Meetup Berlin – 2016-06-30
2. About me
Started with Commodore 8-bit (VC-20 and C-64)
Built Null-Modem connections for playing Doom and WarCraft I
Went on to IPX/SPX networks between MS-DOS 6.22 and WfW 3.11
Did DevOps before it was a thing (mainly Java and Web) for ~10 years
Now at Dynatrace Innovation Lab
Tech Lead for Azure and Microservices
Find me on Twitter: @MartinGoodwell
Passionate about life,
technology and the people
behind both of them.
3. Agenda
Traditional monitoring
What's wrong with it?
Performance in your code
The dramatic dilemma
Happy end
@MartinGoodwell
4. Questions
Please, ask and interrupt anytime!
What's your occupation?
Dev, Ops, BizExec?
What's your technology stack?
Java, .NET
Node.js
Who of you knows what APM is/does?
5. A lil' bit o' history
Traditional monitoring was for Ops only
APM (incl. Call Tracing) is also for devs, debugging, pre-prod
17. Any downsides here?
Basic approaches tend to pollute your code
AOP is the better choice, but requires advanced skills
If you're not using something like statsd, it's hard to have a central spot for all the performance data of your different components
Great for performance insights into single components
What about 3rd parties?
Or distributed systems?
Like microservices, maybe
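A minimal Java sketch of such a "basic approach" (all names here are made up for illustration): note how the timing calls interleave with the business logic, which is exactly the code pollution mentioned above.

```java
import java.util.concurrent.TimeUnit;

public class InlineTiming {
    // Business method with hand-rolled timing woven into it.
    public static long processOrder() {
        long start = System.nanoTime();                 // instrumentation, not business logic
        doBusinessWork();                               // the actual work
        long elapsedMs = TimeUnit.NANOSECONDS
                .toMillis(System.nanoTime() - start);   // instrumentation again
        System.out.println("processOrder took " + elapsedMs + " ms");
        return elapsedMs;
    }

    // Stand-in for real business logic.
    static void doBusinessWork() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        processOrder();
    }
}
```

AOP moves the `start`/`elapsedMs` bookkeeping out of the method body into an aspect, which is why it is the cleaner choice here.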
18. What about components we can't modify?
like databases, message queues, ...
Best case: use readily available APIs or integrations (statsd, JMX, etc.)
For open source: apply the same technique as to your own code
Keeping in sync with the original code can become tedious
Try to make your changes part of the original project
Use dedicated monitoring tools
Very common for databases
BUT even the best tool is an additional tool
How long does it take to get a new team member up to speed?
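To illustrate the statsd route: statsd accepts plain-text metrics over UDP, e.g. a counter line of the form `name:value|c`. A minimal sketch, assuming a statsd daemon on localhost:8125 and a made-up metric name:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class StatsdClient {
    // Builds a statsd counter line, e.g. "orders.processed:1|c"
    public static String counter(String name, int value) {
        return name + ":" + value + "|c";
    }

    // Fire-and-forget UDP send; statsd is connectionless, so this
    // succeeds even if no daemon is listening.
    public static void send(String payload, String host, int port) throws Exception {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(bytes, bytes.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        send(counter("orders.processed", 1), "localhost", 8125);
    }
}
```

Because every component pushes to the same daemon, statsd gives you the "central spot" for performance data that the previous slide asked for.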
21. Microservices vs SOA
Microservices
fit the scope of a single application
Service Oriented Architecture
is scoped to fit enterprises / environments / infrastructures
For a dev, microservices hardly pose any downsides
On the upside, the code size and the scope of the domain become smaller
Any best practices for analyzing the performance of a single microservice are still valid
The real challenge of microservices is proper operation
23. What's the challenge about monitoring microservices?
The big challenge of well-performing microservices lies in the communication between the services
Not in the performance of a single microservice
Tracing calls between services is very difficult
30. Leverage existing tools
https://github.com/ordina-jworks/microservices-dashboard
31. Spring Cloud Sleuth
Sleuth: https://github.com/spring-cloud/spring-cloud-sleuth
Spring Cloud Sleuth implements a distributed tracing solution for Spring Cloud.
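Under the hood, tracing solutions in the Sleuth/Zipkin family propagate a trace ID between services via HTTP headers such as `X-B3-TraceId`. A hand-rolled sketch of that idea in plain Java (the helper names are invented for illustration; Sleuth does this automatically for you):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TraceContext {
    // Header name used by Zipkin-style (B3) propagation.
    public static final String TRACE_HEADER = "X-B3-TraceId";

    // Reuse the trace ID from the incoming request, or start a new trace.
    public static String traceIdFrom(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(TRACE_HEADER);
        return (id != null) ? id : UUID.randomUUID().toString().replace("-", "");
    }

    // Copy the trace ID onto the headers of an outgoing downstream call,
    // so the next service can stitch its spans into the same trace.
    public static Map<String, String> propagate(String traceId) {
        Map<String, String> outgoing = new HashMap<>();
        outgoing.put(TRACE_HEADER, traceId);
        return outgoing;
    }

    public static void main(String[] args) {
        String traceId = traceIdFrom(new HashMap<>());        // no incoming header: new trace
        Map<String, String> downstream = propagate(traceId);  // same ID travels downstream
        System.out.println(TRACE_HEADER + ": " + downstream.get(TRACE_HEADER));
    }
}
```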
34. So, do we have everything we need?
Usually, one tracing solution covers only a single technology
Besides visualization, you'll also want log analysis
The ELK stack does this really well, especially in combination with correlation IDs
But the ELK stack does no visualization
And your visualization does no log analysis
Yet another tool
Don't get me started on integrating all this with host monitoring...
The trace ends where your code ends
No correlation IDs for database calls
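One way to make the correlation-ID approach pay off in the ELK stack is to emit the ID as a structured field in every log line, so Logstash can parse it and Kibana can group entries across services. A minimal sketch; the field names are assumptions, not a fixed schema:

```java
public class CorrelatedLogger {
    // One JSON-ish line per event, easy for Logstash to parse.
    // "traceId" is the correlation ID shared by all services in a request.
    public static String logLine(String traceId, String service, String message) {
        return String.format(
                "{\"traceId\":\"%s\",\"service\":\"%s\",\"msg\":\"%s\"}",
                traceId, service, message);
    }

    public static void main(String[] args) {
        System.out.println(logLine("abc123", "checkout", "order accepted"));
        System.out.println(logLine("abc123", "payment", "charge authorized"));
    }
}
```

Searching Kibana for `traceId:abc123` would then surface the lines from both services in one view, which is the correlation the slide refers to.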
36. Considerations for custom implementations
Multitude of languages
Open-source tools can get expensive
Manual configuration
Often only applicable to a single technology
Keeping pace with new technology
Serverless code (e.g. AWS Lambda, Azure Functions)
38. The Ops' dilemma
How to handle all this in production
How to identify production issues
How to tell the devs what they should look into, without tearing everything down
39. All fine?
While the dev can leverage a huge number of tools, libs and frameworks, it's still up to Ops to integrate it all into a single, unified, well-integrated solution that allows drawing the right conclusions
40. From Dev to Prod
Dev
Single transaction
Deal with a specific problem
No impact on real users and business
Can concentrate on a single component
"Perfect world"
A dev's deadline is made of sprints
A couple of weeks, usually
Ops
100s or 1000s of transactions
No idea what the problem is
Slow or bad requests impact real users and business
Lots of components that might not be under your control
An Ops' deadline is made of SLAs
Hours, maybe just minutes
42. From Prod to Dev
Dev
Single transaction
Deal with a specific problem
No impact on real users and business
Can concentrate on a single component
"Perfect world"
Ops
100s or 1000s of transactions
No idea what the problem is
Slow or bad requests impact real users and business
Lots of components that might not be under your control
[Slide callouts: "Which?", "Which?", "Time!", "Reproduce?"]
46. Set-up in 5 minutes
Install a single monitoring agent per host
Everything is auto-detected
No changes to your source code
No changes to runtime configuration
Supports a wide array of technologies
http://www.dynatrace.com/en/ruxit/technologies/