Distributed systems
Observability
Elastic Stack
Jaeger Tracing
Distributed system
Monolithic systems
Distributed tracing
Netflix – microservices system
Nowadays, all systems are distributed
Distributed system – logical view
Observability
Observability
• Monitoring: dashboards, thresholds, interactive
• Alerting: event-based, trigger actions
• Logging: centralized logs, aggregation, interactive
• Tracing: request-based, debugging, cross-platform
Monitoring
Interactive Tools:
• Graphite & Grafana
• Elastic stack with Kibana UI
• Icinga Dashboards
• Oracle Enterprise Manager
• Kafka Manager
• …
Alerting with Icinga
Alerting
The main tool for alerting is Icinga.
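Icinga runs check plugins and takes the service state from the plugin's exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN (the Nagios / Monitoring Plugins convention). A minimal sketch of a threshold-based check in Java; the ThresholdCheck class, the consumer-lag metric and the threshold values are hypothetical examples, not part of the setup described here.

```java
import java.util.OptionalLong;

/**
 * Hypothetical threshold check following the Icinga / Nagios plugin convention:
 * the exit code is the state, the first output line is "STATE - text | perfdata".
 */
final class ThresholdCheck {
    static final int OK = 0;
    static final int WARNING = 1;
    static final int CRITICAL = 2;
    static final int UNKNOWN = 3;

    /** Maps a measured value to a state; an absent value means the measurement failed. */
    static int evaluate(OptionalLong value, long warn, long crit) {
        if (!value.isPresent()) {
            return UNKNOWN;
        }
        long v = value.getAsLong();
        if (v >= crit) {
            return CRITICAL;
        }
        if (v >= warn) {
            return WARNING;
        }
        return OK;
    }

    static String label(int state) {
        switch (state) {
            case OK:
                return "OK";
            case WARNING:
                return "WARNING";
            case CRITICAL:
                return "CRITICAL";
            default:
                return "UNKNOWN";
        }
    }

    /** Status line with performance data in the form metric=value;warn;crit. */
    static String statusLine(String metric, long value, long warn, long crit) {
        int state = evaluate(OptionalLong.of(value), warn, crit);
        return label(state) + " - " + metric + " is " + value
                + " | " + metric + "=" + value + ";" + warn + ";" + crit;
    }
}
```

A real plugin's main method would print statusLine(...) and finish with System.exit(evaluate(...)), so that Icinga can trigger notifications or other actions on state changes.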
Log aggregation/analytics
Log aggregation
Elastic Stack
Source: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
Several applications log into one big index
Classical simple view
The number of application-specific payload types increases.
A mapping explosion can cause out-of-memory errors and situations that are difficult to recover from.
index.mapping.total_fields.limit
The maximum number of fields in an index. Field and object mappings, as well as field aliases, count towards this limit. The default value is 1000.
Many applications
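To see why one shared index runs into index.mapping.total_fields.limit, the sketch below collects the field paths that dynamic mapping would create for documents coming from different applications: every distinct payload type adds its own fields to the single mapping. It is a rough, hypothetical model (MappingBudget is illustrative); the real mapping can contain more entries than this sketch counts, for example the keyword sub-fields that default dynamic mapping adds to string values.

```java
import java.util.Map;
import java.util.Set;

/** Rough, hypothetical model of how dynamically mapped fields pile up in one shared index. */
final class MappingBudget {
    /** Default of index.mapping.total_fields.limit. */
    static final int DEFAULT_TOTAL_FIELDS_LIMIT = 1000;

    /** Adds the path of every object and leaf field of a JSON-like document to the shared mapping. */
    static void collectPaths(String prefix, Map<String, Object> doc, Set<String> mapping) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            mapping.add(path); // object mappings count toward the limit, not only leaf fields
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                collectPaths(path, child, mapping);
            }
        }
    }

    static boolean exceedsLimit(Set<String> mapping) {
        return mapping.size() > DEFAULT_TOTAL_FIELDS_LIMIT;
    }
}
```

With two applications that log different nested payloads into the same index, the set of paths is the union of both, so it keeps growing with every new payload type until the limit is reached.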
• granular configuration of disk space and history per component
• dashboards are faster
• no mapping explosion
• no conflicts between fields with the same name but different types
Separate Logstash index per component
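With one index per component, the index name carries the component. A minimal sketch of such a naming scheme; the logstash-<component>-yyyy.MM.dd pattern and the ComponentIndexName helper are assumptions for illustration (in a real pipeline the name would typically be set via the index option of the Logstash elasticsearch output). Elasticsearch index names must be lowercase, so the component is normalized first.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical daily index naming with one index series per component. */
final class ComponentIndexName {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    static String forEvent(String component, LocalDate day) {
        // Lowercase and replace anything that is not a letter, digit, '_' or '-'
        String safe = component.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]", "-");
        return "logstash-" + safe + "-" + DAY.format(day);
    }
}
```

A per-component prefix such as logstash-order-service- also gives housekeeping a simple pattern that selects exactly one component's indices.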
One Curator action per component
• delete by total index size for a specific component
• delete by number of indices for some components
• delete by date
Example:
Housekeeping with Curator
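The three deletion strategies correspond to Curator's space, count and age filters. The sketch below only models the selection logic in plain Java so the difference between them is visible; it is not Curator's API, and the Housekeeping class and byte sizes are hypothetical. In all three cases the oldest indices of one component are chosen first.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of per-component index housekeeping (selection logic only). */
final class Housekeeping {
    static final class Index {
        final String name;
        final LocalDate created;
        final long sizeBytes;

        Index(String name, LocalDate created, long sizeBytes) {
            this.name = name;
            this.created = created;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Oldest first, so every strategy removes the oldest indices of the component. */
    private static List<Index> oldestFirst(List<Index> indices) {
        List<Index> sorted = new ArrayList<>(indices);
        sorted.sort(Comparator.comparing((Index i) -> i.created).thenComparing((Index i) -> i.name));
        return sorted;
    }

    /** Delete the oldest indices until the component's total size fits the budget. */
    static List<Index> bySize(List<Index> indices, long maxTotalBytes) {
        List<Index> sorted = oldestFirst(indices);
        long total = 0;
        for (Index i : sorted) {
            total += i.sizeBytes;
        }
        List<Index> toDelete = new ArrayList<>();
        for (Index i : sorted) {
            if (total <= maxTotalBytes) {
                break;
            }
            toDelete.add(i);
            total -= i.sizeBytes;
        }
        return toDelete;
    }

    /** Keep only the newest {@code keep} indices. */
    static List<Index> byCount(List<Index> indices, int keep) {
        List<Index> sorted = oldestFirst(indices);
        int excess = Math.max(0, sorted.size() - keep);
        return new ArrayList<>(sorted.subList(0, excess));
    }

    /** Delete indices created before the cutoff date. */
    static List<Index> byAge(List<Index> indices, LocalDate cutoff) {
        List<Index> toDelete = new ArrayList<>();
        for (Index i : oldestFirst(indices)) {
            if (i.created.isBefore(cutoff)) {
                toDelete.add(i);
            }
        }
        return toDelete;
    }
}
```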
ILM replaces most of the basic Curator functionality.
But! ILM does not support deleting the oldest index of a group of indices matched by a pattern, based on the overall size of the group.
See: https://github.com/elastic/elasticsearch/issues/44001
Index Lifecycle Management (ILM)
Demo
Distributed tracing
Distributed tracing
Distributed tracing takes a request-centric view.
"What happened to my request?"
It captures the detailed execution of important activities performed by the components of a distributed system as it processes a given request.
Tracing infrastructure attaches contextual metadata to each request and ensures that metadata is passed around during the request execution.
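In Jaeger, that contextual metadata travels between services in a carrier such as HTTP headers; the Jaeger clients' native format is a single uber-trace-id header of the form {trace-id}:{span-id}:{parent-span-id}:{flags}, where flag bit 1 marks a sampled trace. The minimal sketch below injects and extracts such a header with a plain Map. The TraceHeader class is hypothetical, it only handles 64-bit ids, and real code would use the tracer's inject/extract methods instead.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of Jaeger-style context propagation (not the Jaeger client API). */
final class TraceHeader {
    static final String HEADER = "uber-trace-id";

    static final class Context {
        final long traceId;
        final long spanId;
        final long parentId; // 0 means "no parent"
        final boolean sampled;

        Context(long traceId, long spanId, long parentId, boolean sampled) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.sampled = sampled;
        }
    }

    /** Caller side: write the context into outgoing headers as hex ids plus flags. */
    static void inject(Context ctx, Map<String, String> headers) {
        String flags = ctx.sampled ? "1" : "0";
        headers.put(HEADER, Long.toHexString(ctx.traceId) + ":" + Long.toHexString(ctx.spanId)
                + ":" + Long.toHexString(ctx.parentId) + ":" + flags);
    }

    /** Callee side: read the context back; empty if the header is missing or malformed. */
    static Optional<Context> extract(Map<String, String> headers) {
        String value = headers.get(HEADER);
        if (value == null) {
            return Optional.empty();
        }
        String[] parts = value.split(":");
        if (parts.length != 4) {
            return Optional.empty();
        }
        try {
            long flags = Long.parseLong(parts[3], 16);
            return Optional.of(new Context(Long.parseUnsignedLong(parts[0], 16),
                    Long.parseUnsignedLong(parts[1], 16), Long.parseUnsignedLong(parts[2], 16),
                    (flags & 1) == 1));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** The callee continues the same trace with a new span whose parent is the caller's span. */
    static Context childOf(Context parent, long newSpanId) {
        return new Context(parent.traceId, newSpanId, parent.spanId, parent.sampled);
    }
}
```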
Vendor-neutral APIs and instrumentation for distributed tracing
opentracing.io
Source: https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736
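The point of a vendor-neutral API is that instrumented code depends only on an interface, while the backend (Jaeger, or a no-op when tracing is disabled) is chosen at startup; the tracer initialization code in this deck does the same with GlobalTracer and NoopTracerFactory. The heavily simplified interfaces below are hypothetical stand-ins, not the real io.opentracing types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, heavily simplified stand-in for a vendor-neutral tracing API. */
interface NeutralTracer {
    SpanHandle startSpan(String operation);
}

/** A started span; closing it finishes the span. No checked exception, unlike AutoCloseable. */
interface SpanHandle extends AutoCloseable {
    @Override
    void close();
}

/** Used when tracing is disabled: instrumentation still runs, but records nothing. */
final class NoopTracer implements NeutralTracer {
    @Override
    public SpanHandle startSpan(String operation) {
        return () -> { };
    }
}

/** Stand-in for a vendor backend such as Jaeger: remembers finished operations. */
final class RecordingTracer implements NeutralTracer {
    final List<String> finished = new ArrayList<>();

    @Override
    public SpanHandle startSpan(String operation) {
        return () -> finished.add(operation);
    }
}

/** Instrumented business code: it only knows the neutral interface. */
final class TracedOrderService {
    private final NeutralTracer tracer;

    TracedOrderService(NeutralTracer tracer) {
        this.tracer = tracer;
    }

    String placeOrder(String item) {
        try (SpanHandle span = tracer.startSpan("placeOrder")) {
            return "ordered " + item;
        }
    }
}
```

Swapping RecordingTracer for NoopTracer changes where spans go without touching TracedOrderService, which is what makes instrumentation libraries reusable across tracing vendors.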
jaegertracing.io
OpenTracing-compatible data model and instrumentation libraries in
• Go, Java, Python, …
Multiple storage backends: Cassandra, Elasticsearch, memory.
Modern Web UI
Cloud Native Deployments
Not a full replacement for an automatic profiler
Not dynamic instrumentation
jaegertracing.io
Span
A span represents a logical unit of work in Jaeger that has an operation name, the start time of the operation, and the duration.
Trace
A trace is a data/execution path through the system.
Terminology
Source: https://www.jaegertracing.io/docs/1.13/architecture/
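A minimal sketch of this data model: each span carries its trace id, its own id, its parent's id, an operation name, a start time and a duration, and a trace is the set of spans sharing one trace id, arranged as a tree through the parent links. TraceModel and its field names are illustrative assumptions, and the single parent link simplifies the span references used in practice; the rendering roughly mimics the nesting of the trace timeline view.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/** Hypothetical, simplified span/trace model. */
final class TraceModel {
    /** One unit of work; parentId is null for the root span of a trace. */
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentId;
        final String operation;
        final long startMicros;
        final long durationMicros;

        Span(String traceId, String spanId, String parentId, String operation,
                long startMicros, long durationMicros) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.operation = operation;
            this.startMicros = startMicros;
            this.durationMicros = durationMicros;
        }
    }

    /** A trace is simply all spans that share one trace id. */
    static List<Span> trace(List<Span> allSpans, String traceId) {
        List<Span> result = new ArrayList<>();
        for (Span s : allSpans) {
            if (s.traceId.equals(traceId)) {
                result.add(s);
            }
        }
        return result;
    }

    /** Renders one trace as an indented call tree, children ordered by start time. */
    static List<String> render(List<Span> trace) {
        List<String> lines = new ArrayList<>();
        for (Span root : children(trace, null)) {
            walk(trace, root, 0, lines);
        }
        return lines;
    }

    private static void walk(List<Span> trace, Span span, int depth, List<String> out) {
        out.add("  ".repeat(depth) + span.operation + " (" + span.durationMicros + " us)");
        for (Span child : children(trace, span.spanId)) {
            walk(trace, child, depth + 1, out);
        }
    }

    private static List<Span> children(List<Span> trace, String parentId) {
        List<Span> result = new ArrayList<>();
        for (Span s : trace) {
            if (Objects.equals(s.parentId, parentId)) {
                result.add(s);
            }
        }
        result.sort(Comparator.comparingLong((Span s) -> s.startMicros));
        return result;
    }
}
```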
Trace Timeline
Trace Graph
jaegertracing architecture
Implementation details
3rd party libraries
• OpenTracing Cassandra Driver Instrumentation (https://github.com/opentracing-contrib/java-cassandra-driver)
• OpenTracing Spring Web Instrumentation (https://github.com/opentracing-contrib/java-spring-web)
• OpenTracing Feign Instrumentation (https://github.com/OpenFeign/feign-opentracing)
• OpenTracing JAX-RS Instrumentation (https://github.com/opentracing-contrib/java-jaxrs)
Custom libraries
• Integration library for applications in Tomcat (extint in DE and INT) and in WebLogic (DE and INT)
https://pb-git.intra.loyaltypartner.com/projects/LIBRARIES/repos/opentracing-jee/browse
• Custom Spring Boot integration library with support for Kafka producers and consumers
(based on https://github.com/opentracing-contrib/java-kafka-client)
Code snippets
Tracer initialization
public static JaegerConfig fromConfiguration(final String service, final Configuration configuration) {
    final boolean enabled = configuration.getBoolean("jaeger.enabled", false);
    if (enabled) {
        return JaegerConfig.enabled( //
                service, configuration.getString("jaeger.endpoint"),
                configuration.getInteger("jaeger.maxPacketSize", null), //
                configuration.getInteger("jaeger.flushInterval", null), //
                configuration.getInteger("jaeger.maxQueueSize", null), //
                configuration.getInteger("jaeger.probabilityPercent", null) //
        );
    } else {
        return JaegerConfig.disabled();
    }
}

final Tracer tracer = JaegerBootstrapUtil.createTracer(jaegerConfig);
GlobalTracer.register(tracer);
public static Tracer createTracer(final JaegerConfig cfg) {
    if (cfg.isEnabled()) {
        final Sender sender = createSender(cfg);
        final RemoteReporter reporter = createRemoteReporter(cfg, sender);
        final Sampler sampler = createSampler(cfg);
        return new JaegerTracer.Builder(cfg.getService()) //
                .withReporter(reporter) //
                .withSampler(sampler) //
                .build();
    } else {
        LOGGER.info("Jaeger is disabled");
        return NoopTracerFactory.create();
    }
}
The sampling decision is propagated with the requests.
Sampling
public static Sampler createSampler(final JaegerConfig cfg) {
    if (cfg.getProbabilityPercent() == 100) {
        LOGGER.info("Sending all spans to jaeger");
        return new ConstSampler(true);
    } else if (cfg.getProbabilityPercent() == 0) {
        LOGGER.info("Sending no spans to jaeger");
        return new ConstSampler(false);
    } else {
        LOGGER.info("Sending {}% of spans to jaeger", cfg.getProbabilityPercent());
        return new ProbabilisticSampler(((double) cfg.getProbabilityPercent()) / 100);
    }
}
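The sampler only decides for spans that start a new trace. A minimal sketch of head-based sampling, assuming the decision is made once at the root and then carried in the propagated context: downstream services copy the caller's sampled flag instead of deciding again, so a trace is either recorded end to end or not at all. HeadSampling is a hypothetical helper that mirrors the percent-based configuration above; it is not part of the Jaeger client.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical head-based sampling helper. */
final class HeadSampling {
    /** Decision for a span that starts a new trace; random is a uniform value in [0, 1). */
    static boolean decideAtRoot(int probabilityPercent, double random) {
        if (probabilityPercent >= 100) {
            return true;  // like ConstSampler(true)
        }
        if (probabilityPercent <= 0) {
            return false; // like ConstSampler(false)
        }
        return random < probabilityPercent / 100.0;
    }

    /**
     * Sampled flag for a new span. parentSampled is null when no context was received,
     * otherwise it is the flag extracted from the caller's propagated context.
     */
    static boolean sampledFlag(Boolean parentSampled, int probabilityPercent) {
        if (parentSampled != null) {
            return parentSampled; // never re-decide inside an existing trace
        }
        return decideAtRoot(probabilityPercent, ThreadLocalRandom.current().nextDouble());
    }
}
```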
Code examples
…
@Interceptors({CompositeOpenTracingInterceptor.class, MethodValidationInterceptor.class, LoggingInterceptor.class})
public class IcmLoyaltyOrderServiceBean {
    …

/**
 * OpenTracing for EJBs that belong to the composite layer.
 */
public class CompositeOpenTracingInterceptor extends OpenTracingInterceptor {
    @Override
    protected void addSpanTags(final InvocationContext ctx, final RequestContext requestContext,
            final SpanContext parent, final Span span) {
        super.addSpanTags(ctx, requestContext, parent, span);
        COMPONENT.set(span, "composite");
    }
}
Interceptor example
public class OpenTracingInterceptor {
    @AroundInvoke
    public Object trace(final InvocationContext ctx) throws Exception {
        final Tracer tracer = GlobalTracer.get();
        final RequestContext requestContext = getRequestContext(ctx);
        // Prefer the span that is already active; otherwise fall back to a
        // span context transported with the request context.
        SpanContext parent = null;
        if (tracer.activeSpan() != null) {
            parent = tracer.activeSpan().context();
        } else if (requestContext instanceof SpanContextTransporter) {
            parent = ((SpanContextTransporter) requestContext).getSpanContext();
        }
        final Tracer.SpanBuilder spanBuilder = tracer.buildSpan(ctx.getMethod().getName());
        if (parent != null) {
            spanBuilder.asChildOf(parent);
        }
        // startActive(true): the span is finished when the scope is closed
        try (final Scope scope = spanBuilder.startActive(true)) {
            final Span span = scope.span();
            CLASS_NAME.set(span, determineClassName(ctx.getTarget()));
            METHOD_NAME.set(span, ctx.getMethod().getName());
            if (parent == null && requestContext != null) {
                REQUEST_CONTEXT_ID.set(span, requestContext.getId());
            }
            try {
                return dispatchTracedCall(ctx);
            } catch (final Exception e) {
                // Mark the span as failed so the error is visible in Jaeger
                Tags.ERROR.set(span, true);
                span.log(e.getMessage());
                throw e;
            }
        }
    }
    .....
Demo
Error information stored in Jaeger
Analyzing errors with Jaeger
Jaeger UI view of two traces, A and B, compared structurally in graph form
Compare traces
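The comparison view marks parts of the call structure that appear only in A, only in B, or a different number of times in each. A rough stand-in for that idea: represent each trace as a list of call paths (root>child>…) and diff their counts. TraceDiff and the path notation are hypothetical; this is not how the Jaeger UI computes its comparison.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical structural diff of two traces given as call paths such as "api>db". */
final class TraceDiff {
    /** Counts how often each call path occurs in one trace. */
    static Map<String, Integer> counts(List<String> paths) {
        Map<String, Integer> c = new TreeMap<>();
        for (String p : paths) {
            c.merge(p, 1, Integer::sum);
        }
        return c;
    }

    /** Per path: count in B minus count in A; paths with equal counts are omitted. */
    static Map<String, Integer> diff(List<String> a, List<String> b) {
        Map<String, Integer> ca = counts(a);
        Map<String, Integer> cb = counts(b);
        Map<String, Integer> result = new TreeMap<>();
        for (String p : ca.keySet()) {
            result.put(p, cb.getOrDefault(p, 0) - ca.get(p)); // negative: fewer (or none) in B
        }
        for (String p : cb.keySet()) {
            result.putIfAbsent(p, cb.get(p)); // only in B
        }
        result.values().removeIf(d -> d == 0);
        return result;
    }
}
```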
An OpenTracing-compatible Elastic APM Java agent exists:
https://github.com/elastic/apm-agent-java
But!
Documentation for the Elastic APM OpenTracing bridge:
Elastic APM
Stay curious
Keep exploring

Microservices observability
