
Performance monitoring and call tracing in microservice environments



Performance analysis of a single service is easy with the on-board tools of nearly any programming language. In microservice environments, the real challenge is not making single services perform well, but running a complex ecosystem of many services resiliently. This talk introduces open-source tools for performance analysis and call tracing. To conclude, we will briefly look at Dynatrace Ruxit, a commercial alternative. After this session, the audience will know how to get started with performance analysis and call tracing, and which tools can help.

Published in: Technology


  1. Performance Analysis and Call Tracing in Microservice Environments
     Martin Gutenbrunner, Dynatrace Innovation Lab, @MartinGoodwell
     Microservice Meetup Berlin, 2016-06-30
  2. About me
     • Started with Commodore 8-bit (VC-20 and C-64)
     • Built null-modem connections for playing Doom and WarCraft I
     • Went on to IPX/SPX networks between MS-DOS 6.22 and WfW 3.11
     • Did DevOps before it was a thing (mainly Java and Web) for ~10 years
     • Now at Dynatrace Innovation Lab
     • Tech Lead for Azure and Microservices
     • Find me on Twitter: @MartinGoodwell
     Passionate about life, technology and the people behind both of them.
  3. Agenda
     • Traditional monitoring
     • What's wrong with it?
     • Performance in your code
     • The dramatic dilemma
     • Happy end
  4. Questions
     • Please ask and interrupt anytime!
     • What's your occupation? Dev, Ops, BinExec?
     • What's your technology stack? Java, .NET, Node.js?
     • Who of you knows what APM is/does?
  5. A lil' bit o' history
     • Traditional monitoring was for Ops only
     • APM (incl. call tracing) is also for devs, debugging, pre-prod
  6. Monitoring
  7. Host performance (Nagios)
     • CPU usage
     • Memory usage
     • Disk IO
     • Network performance
  8. What's wrong with it?
     • Nothing is wrong
     • Some things might just be out of scope
     • No insight into your application's performance
  9. Performance in your code, a.k.a. Application Performance Management
  10. Add monitoring code
  11. Use statsd
  12. statsd real quick
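The slide content itself is not in the transcript, so the following is only a minimal sketch of what "statsd real quick" can look like from the application side. statsd accepts plain-text datagrams of the form `<name>:<value>|<type>` over UDP, by default on port 8125. The metric names, the `timed_call` helper and the injectable `send` parameter are assumptions made for illustration, not part of the talk.

```python
import socket
import time


def format_metric(name, value, metric_type):
    """Build one statsd line: <name>:<value>|<type>, e.g. b'checkout.time:12|ms'."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def timed_call(name, func, *args, send=send_metric, **kwargs):
    """Run func and report its wall-clock duration as a statsd timer ('ms')."""
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        send(format_metric(name, elapsed_ms, "ms"))
```

A call such as `timed_call("orders.lookup", load_order, 42)` (with `load_order` being your own function) would emit one timer datagram per invocation. Because the transport is UDP, the application is not slowed down or broken when the statsd daemon is unavailable.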
  13. Use JMX
  14. [no transcribed text]
  15. Aspect-oriented programming
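The idea behind the aspect-oriented approach is to keep timing code out of the business logic. The slide's own example is not part of the transcript; as a rough analogue, a Python decorator can play the role of an "around" advice. The `TIMINGS` store, the `timed` decorator and `lookup_order` are hypothetical names, and the in-memory dictionary merely stands in for a real metrics backend.

```python
import functools
import time
from collections import defaultdict

# In-memory store standing in for a real metrics backend (assumption).
TIMINGS = defaultdict(list)


def timed(metric_name):
    """Wrap a function so every call records its duration in milliseconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                TIMINGS[metric_name].append(elapsed_ms)
        return wrapper
    return decorator


@timed("orders.lookup")
def lookup_order(order_id):
    # The business logic contains no monitoring code at all.
    return {"id": order_id, "status": "shipped"}
```

The measurement is attached from the outside, so it can be added or removed without editing the function body, which is the property that makes this style less intrusive than hand-written timing code.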
  16. Graphite visualization
  17. Any downsides here?
      • Basic approaches tend to pollute your code
      • AOP is the better choice, but requires advanced skills
      • Unless you use something like statsd, it's hard to have a central spot for the performance data of all your components
      • Great for performance insights into single components
      • What about 3rd parties?
      • Or distributed systems? Like microservices, maybe
  18. What about components we can't modify? Like databases, message queues, ...
  19. • Best case: use readily available APIs or integrations (statsd, JMX, etc.)
      • For open source: apply the same technique as to your own code
        • Keeping in sync with the original code can become tedious
        • Try to make your changes part of the original project
      • Use dedicated monitoring tools
        • Very common for databases
        • BUT even the best tool is an additional tool
        • How long does it take to get a new team member up to speed?
  20. Microservices
  21. Microservices vs SOA
      • Microservices fit the scope of a single application
      • Service-oriented architecture is scoped to fit enterprises / environments / infrastructures
  22. • For a dev, microservices hardly pose any downsides
      • On the upside, the code size and the scope of the domain become smaller
      • Any best practices for analyzing the performance of a single microservice are still valid
      • The real challenge of microservices is proper operation
  23. What's the challenge in monitoring microservices?
      • The big challenge for well-performing microservices is the communication between them, not the high performance of a single microservice
      • Tracing calls between services is very difficult
  24. [no transcribed text] Source:
  25. Call Tracing
  26. [no transcribed text] Source:
  27. [no transcribed text] Source:
  28. In Java
  29. C# approach-to-track-correlation-ids-through-microservices/
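The Java and C# slides refer to tracking correlation IDs through microservices, but their code is not in the transcript. As a language-neutral illustration of the technique, here is a minimal Python sketch: a service reuses the ID it receives, or creates one at the edge, attaches it to every downstream call, and stamps it on every log line. The header name `X-Correlation-ID` and all function names are assumptions; any name agreed upon by all services would work.

```python
import contextvars
import uuid

HEADER = "X-Correlation-ID"  # assumed header name, not mandated by any standard
_correlation_id = contextvars.ContextVar("correlation_id", default=None)


def begin_request(incoming_headers):
    """Reuse the caller's ID if present, otherwise start a new trace."""
    cid = incoming_headers.get(HEADER) or uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def outgoing_headers(extra=None):
    """Headers for a downstream call, carrying the current ID along."""
    headers = dict(extra or {})
    cid = _correlation_id.get()
    if cid is not None:
        headers[HEADER] = cid
    return headers


def log(message):
    """Prefix a log line with the ID so log search can join services."""
    return f"[{_correlation_id.get()}] {message}"
```

With every service following the same three steps, searching the central log store for one ID yields the full path of a request across services. Note that the ID only travels as far as code that cooperates, which is the limitation the later slides point out for databases and third-party components.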
  30. Leverage existing tools
  31. Spring Cloud Sleuth: implements a distributed tracing solution for Spring Cloud.
  32. Zipkin
  33. Trace
  34. So, do we have everything we need here?
      • Usually, one tracing solution covers only a single technology
      • Besides visualization, you'll also want log analysis
      • The ELK stack does this really well, especially in connection with correlation IDs
      • But the ELK stack does no visualization
      • And your visualization does no log analysis
        → yet another tool
      • Don't get me started about integrating all this with host monitoring...
      • The trace ends where your code ends
      • No correlation IDs for database calls
  35. What's next?
  36. Considerations for custom implementations
      • Multitude of languages
      • Open-source tools can get expensive
      • Manual configuration
      • Often only applicable to a single technology
      • Keeping pace with new technology
      • Serverless code (e.g. AWS Lambda, Azure Functions)
  37. why-you-probably-shouldnt
  38. The Ops' dilemma
      • How to handle all this in production
      • How to identify production issues
      • How to tell the devs what they should look into, without tearing down everything
  39. All fine?
      • While the Dev can draw on a huge number of tools, libs and frameworks, it's still up to Ops to combine them into a single, unified solution that allows drawing the right conclusions
  40. From Dev to Prod
      Dev
      • Single transaction
      • Deal with a specific problem
      • No impact on real users and business
      • Can concentrate on single component
      • "Perfect world"
      • A dev's deadline is made of Sprints: a couple of weeks, usually
      Ops
      • 100s or 1000s of transactions
      • No idea what the problem is
      • Slow or bad requests impact real users and business
      • Lots of components that might not be under your control
      • An Op's deadline is made of SLAs: hours, maybe just minutes
  41. The Dev-Ops-Dev-Ops-Dev-Ops dilemma
      • Dev: Sprint (days / weeks)
      • Ops: SLA (hours / minutes)
  42. From Prod to Dev
      Dev
      • Single transaction
      • Deal with a specific problem
      • No impact on real users and business
      • Can concentrate on single component
      • "Perfect world"
      Ops
      • 100s or 1000s of transactions
      • No idea what the problem is
      • Slow or bad requests impact real users and business
      • Lots of components that might not be under your control
      Which? Which? Time! Reproduce?
  43. Commercial solutions: Dynatrace Ruxit
  44. [no transcribed text]
  45. Dynatrace Ruxit
  46. Set-up in 5 minutes
      • Install a single monitoring agent per host
      • Everything is auto-detected
      • No changes to your source code
      • No changes to runtime configuration
      • Supports a wide array of technologies
  47. Traditional metrics
  48. Service metrics
  49. Does not end at your custom components
  50. Baselining
      • Automatically detects and correlates problems without setting thresholds
  51. Includes the client side
      • Browser auto-injection
      • Includes client-side JavaScript in traces and problem correlation
  52. Visualization
  53. Call Tracing
  54. Solving a dilemma
      • Include this URL in a trouble ticket and the Dev can jump in right away
  55. Supporting most popular technologies
      • Java
      • .NET
      • Node.js
      • PHP
      • Databases via JDBC, ADO.NET, PDO
      • Message Queues
      • Caches
      • Cloud Infrastructure Metrics
      • See more at
  56. Dynatrace Ruxit 2016 hours for free
  57. References
      • execution-time-tutorial/
      • ids-through-microservices/
      • continuous-delivery-pipeline-part-iii-logging/
      • microservices/
      • micro-service-world-c2f3d7549c47#.93r1dj6ah