ContainerDays 2018, Hamburg: Talk by Florian Lautenschlager (@flolaut, Senior Software Engineer at QAware) and Josef Fuchshuber (@fuchshuber, Principal Software Architect at QAware)
This is not just another "Oh look! I'll show you how to use OpenTracing, Prometheus and EFK for distributed Hello World projects" talk. There are tons of great talks on this out there. Instead, we present a case study of an observable, large, real-world cloud native application and share our key findings from a technical, functional and collaborative point of view.

For typical monitoring and observability, Sleuth, Prometheus and the EFK stack are perfect, bulletproof tools. They are the means to collect, store and analyze traces, metrics and logs. For technical monitoring of resources, e.g. memory and CPU consumption, we use the USE method described by Brendan Gregg, and for functional monitoring, e.g. use cases and business services, we use the RED method described by Tom Wilkie. Continuous end-to-end tests deployed along with the software system give us constant feedback about it. All relevant metrics are checked by automated alerts, defined in Grafana, which keep us up to date. In addition, we link all information (traces, logs, metrics) in order to gain as much knowledge as possible, e.g. by adding the trace id to every log event (called contextualized logging or log correlation).

On top of our technical and functional monitoring, we designed what we call collaborative monitoring. This means that our observability tools are integrated into the standard tools of our audience, which is highly heterogeneous: engineers, QA, managers, operations and help desk. The big benefit of such collaborative monitoring is better collaboration between the people around the project, and also between people and machines. For example, it allows us to build chatbots for easy interaction with the software system, and if something bad happens, everyone can jump directly to the traces, logs and metrics of a request and send them to a person who can help.
With these opportunities, observability improves documentation, tickets, bug-fix processes and communication all across the project. Talking about a software system has never been easier (OK, that one was just for fun). We show you our solution (also at code level) and talk about the pros and cons.
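
To make the log correlation idea from the abstract concrete, here is a minimal sketch of adding the trace id to every log event. It is not the code from the talk: it uses plain Python logging rather than Sleuth, and the names `current_trace_id`, `TraceIdFilter`, `build_logger` and `handle_request` are hypothetical, chosen only for illustration. The assumption is that a trace id is known at the start of each request (here it is passed in or generated locally; in a real system it would usually come from the tracing library).

```python
import contextvars
import io
import logging
import uuid

# Hypothetical context variable holding the trace id of the current request.
# "-" marks log events emitted outside of any request.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copies the current trace id onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True  # never drop a record, only enrich it


def build_logger(stream):
    """Returns a logger whose output lines all carry a trace_id field."""
    logger = logging.getLogger("correlation-demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, trace_id=None):
    """Simulated request handler: sets the trace id, logs, then resets it."""
    trace_id = trace_id or uuid.uuid4().hex  # assumption: locally generated id
    token = current_trace_id.set(trace_id)
    try:
        logger.info("request started")
        logger.info("request finished")
    finally:
        current_trace_id.reset(token)
    return trace_id
```

Calling `handle_request(build_logger(io.StringIO()), "abc123")` writes two lines that both contain `trace_id=abc123`. With the id in every log event, a log search for that id returns all events of one request, and the same id can be used to look up the matching trace.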