REAL-WORLD APPLICATION OBSERVABILITY
11 PRACTICAL DEVELOPER FOCUSED TIPS
Victor Szoltysek
SHORT TERM
VERSION TRACKING
TIP #1 - MAKE IT EASY TO IDENTIFY CURRENTLY DEPLOYED VERSIONS


▸ Human Readable Build Versions (1.3.101)


▸ Major / Minor (hotfix) / Build Number


▸ CI Build Number: BUILD_NUMBER (Jenkins) / GITHUB_RUN_NUMBER (GitHub Actions)


▸ Date / Time of Build


▸ GIT Hash


▸ GIT Branch


▸ Filename
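plugins {
    id 'org.springframework.boot' version '2.3.1.RELEASE'
    id 'io.spring.dependency-management' version '1.0.9.RELEASE'
    // Generates git.properties (commit hash, branch) at build time
    id "com.gorylenko.gradle-git-properties" version "2.2.2"
    id 'java'
}

// Build number comes from Jenkins (BUILD_NUMBER) or GitHub Actions (GITHUB_RUN_NUMBER);
// local builds fall back to 0-SNAPSHOT
version = "1.0.${System.env.BUILD_NUMBER ?: System.env.GITHUB_RUN_NUMBER ?: '0-SNAPSHOT'}"

springBoot {
    // Generates META-INF/build-info.properties (version, build time)
    buildInfo()
}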
GRADLE CODE
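
The generated files can be read back at runtime to show the deployed version, for example in a startup log line. A minimal JDK-only sketch follows; the class name `VersionBanner` is hypothetical, and the property keys (`build.version`, `build.time`, `git.branch`, `git.commit.id.abbrev`) are the ones these plugins are understood to write, so check them against your own build output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Sketch: build a one-line version banner from the generated property files. */
final class VersionBanner {

    /** Loads a classpath properties file; returns empty properties if it is absent. */
    static Properties load(String resource) {
        Properties props = new Properties();
        try (InputStream in = VersionBanner.class.getClassLoader().getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            // Version info is best-effort; continue with whatever was read.
        }
        return props;
    }

    /** Formats version, build time, branch and short hash; missing keys become "unknown". */
    static String describe(Properties build, Properties git) {
        return String.format("version=%s built=%s branch=%s commit=%s",
                build.getProperty("build.version", "unknown"),
                build.getProperty("build.time", "unknown"),
                git.getProperty("git.branch", "unknown"),
                git.getProperty("git.commit.id.abbrev", "unknown"));
    }

    public static void main(String[] args) {
        System.out.println(describe(load("META-INF/build-info.properties"), load("git.properties")));
    }
}
```

In a Spring Boot application the same data is usually available without custom code, so treat this as an illustration of what the files contain.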
LIGHT-WEIGHT ALERTING
TIP #2 - ADD INCOMING WEBHOOKS


▸ Slack / Teams


▸ Developers Need Access


▸ CI Deployments and Failure Notifications


▸ Separate “Bot” Channel
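
Posting to an incoming webhook is a single HTTP request. The sketch below assumes a Slack-style webhook that accepts a JSON body of the form `{"text": "..."}` and Java 11+ for `java.net.http`; the class name `WebhookNotifier` and the webhook URL passed to `send` are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: post a plain-text message to an incoming webhook. */
final class WebhookNotifier {

    /** Escapes quotes, backslashes and common control characters for a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    // Other control characters are replaced by a space to keep the sketch short.
                    out.append(c < 0x20 ? ' ' : c);
            }
        }
        return out.toString();
    }

    /** Builds the JSON body for a simple text message. */
    static String payload(String text) {
        return "{\"text\":\"" + jsonEscape(text) + "\"}";
    }

    /** Sends the message and returns the HTTP status code. */
    static int send(String webhookUrl, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(webhookUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload(text)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

A CI job could call `WebhookNotifier.send(url, "Deployed 1.0.101 to prod")` after a deployment, with the URL kept in a secret rather than in source control.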
MEAN TIME TO DETECTION (MTTD)
TIP #3 - IMMEDIATELY ALERT ON SERIOUS ERRORS


▸ Logback.xml


▸ Filter on Errors


▸ Use a Custom Slack / Teams / Email Appender


▸ Fix Errors or change them to Warnings
<springProperty scope="context" name="APP_NAME" source="vcap.application.name"
                defaultValue="local_app"/>
<springProperty scope="context" name="APP_SPACE" source="vcap.application.space_name"
                defaultValue="${HOSTNAME}"/>

<appender name="SLACK" class="com.github.maricn.logback.SlackAppender">
    <webhookUri>${SLACK_INCOMING_WEB_HOOK}</webhookUri>
    <layout class="ch.qos.logback.classic.PatternLayout">
        <pattern>%-4relative [%thread] %-5level %class - %msg%n</pattern>
    </layout>
    <username>${APP_NAME}@${APP_SPACE}</username>
    <iconEmoji>:rage:</iconEmoji>
</appender>

<appender name="ASYNC_SLACK" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="SLACK" />
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>ERROR</level>
    </filter>
</appender>
LOGBACK.XML CODE
CLIENT SIDE JAVASCRIPT ISSUES
TIP #4 - LOG JAVASCRIPT UNCAUGHT EXCEPTIONS


▸ Especially important for SPAs (e.g. React)


▸ Send to Backend


▸ Include UserAgent
<script>
  // Report uncaught exceptions to the backend.
  // Assumes jQuery 3+ (for $.post with a settings object).
  window.onerror = function (message, url, lineno, colno, error) {
    $.post({
      url: 'logging/client-error',
      contentType: 'text/plain',
      data: 'Source:' + url + ':' + lineno + ' Error:' + message +
            ' UserAgent:' + navigator.userAgent
    });
  };
</script>
JAVASCRIPT CODE
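
On the backend, the `logging/client-error` endpoint only needs to accept the text body and write it to the normal application log. A minimal sketch using the JDK's built-in `com.sun.net.httpserver` follows (a Spring controller would be the more likely choice in this stack). The class name, the logger name, and the 2000-character cap are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

/** Sketch: receive client-side error reports and write them to the server log. */
final class ClientErrorEndpoint {
    private static final Logger LOG = Logger.getLogger("client-error");
    static final int MAX_LENGTH = 2000; // hypothetical cap on a single report

    /** Collapses control characters so a report stays on one log line, then truncates. */
    static String sanitize(String body) {
        String oneLine = body.replaceAll("\\p{Cntrl}+", " ").trim();
        return oneLine.length() <= MAX_LENGTH ? oneLine : oneLine.substring(0, MAX_LENGTH);
    }

    /** Starts an HTTP server that logs every POST to /logging/client-error. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/logging/client-error", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1);
                    return;
                }
                // Reads the whole body; a production endpoint should also limit request size.
                byte[] raw = exchange.getRequestBody().readAllBytes();
                LOG.warning("client_error " + sanitize(new String(raw, StandardCharsets.UTF_8)));
                exchange.sendResponseHeaders(204, -1);
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }
}
```

Sanitizing matters here because the body is supplied by the browser: without it, a crafted report could inject fake log lines.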
3RD PARTY FAILURES
TIP #5 - NOTIFY RELEVANT TEAMS OF 3RD PARTY DEPENDENCY FAILURES


▸ e.g. Communication Failures, Database Failures, etc.


▸ Filter on Specific Error and create a Custom Appender


▸ Automatically Email the Team Manager
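
Filtering on a specific error usually means walking the exception's cause chain, since dependency failures tend to arrive wrapped in application exceptions. The sketch below shows one possible classifier that a custom appender or filter could call. The class name, the choice of exception types, and the `DATABASE_FAILURE` label are assumptions (`COMM_FAILURE` mirrors the error code used later in these slides).

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.sql.SQLException;
import java.util.Optional;

/** Sketch: decide whether an exception was caused by a 3rd party dependency. */
final class DependencyFailures {

    /** Returns a label for the failing dependency type, or empty if none is recognised. */
    static Optional<String> classify(Throwable error) {
        int depth = 0;
        // Bound the walk in case of a cyclic cause chain.
        for (Throwable t = error; t != null && depth < 20; t = t.getCause(), depth++) {
            if (t instanceof ConnectException || t instanceof SocketTimeoutException) {
                return Optional.of("COMM_FAILURE");
            }
            if (t instanceof SQLException) {
                return Optional.of("DATABASE_FAILURE");
            }
        }
        return Optional.empty();
    }
}
```

An appender could route events with a non-empty label to the owning team's channel or mailbox and leave everything else to the general error alert.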
MEDIUM TERM
SCALING OBSERVABILITY
TIP #6 - ADD CENTRALIZED LOGGING


▸ Prefer SaaS Solutions


▸ Warning on DIY (ELK)


▸ Give Team Access / Ownership


▸ Consider Appenders instead of “Agents”
BETTER LOG SEARCHABILITY
TIP #7 - PREFER STRUCTURED LOGS


▸ Indexed Key-Value Pairs


▸ Examples:


▸ env=prod


▸ error_code=COMM_FAILURE


▸ user_id=123


▸ trace_id=234234
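
A small helper can keep key-value pairs consistent across log statements. This is a hand-rolled sketch (the class and method names are hypothetical); in practice a structured or JSON encoder for the logging framework would normally do this job.

```java
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: append key=value pairs to a log message in a consistent format. */
final class StructuredLog {

    /** Renders "message key1=value1 key2=value2", preserving the map's iteration order. */
    static String format(String message, Map<String, Object> fields) {
        String pairs = fields.entrySet().stream()
                .map(e -> e.getKey() + "=" + quoteIfNeeded(String.valueOf(e.getValue())))
                .collect(Collectors.joining(" "));
        return pairs.isEmpty() ? message : message + " " + pairs;
    }

    /** Quotes values containing whitespace, quotes or '=' so they stay a single token. */
    static String quoteIfNeeded(String value) {
        boolean plain = !value.isEmpty()
                && value.chars().noneMatch(c -> Character.isWhitespace(c) || c == '"' || c == '=');
        return plain ? value : "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Passing a `LinkedHashMap` keeps the fields in insertion order, which makes lines easier to scan by eye as well as to index.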
DISTRIBUTED LOGGING
TIP #8 - LINK RELATED DISTRIBUTED CALLS WITH TRACEIDS


▸ Also known as Correlation IDs


▸ Spring Cloud Sleuth
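
The underlying idea is simple: reuse the ID that arrived with the request, create one if there is none, attach it to every log line, and forward it on outgoing calls. A minimal sketch of the first two steps follows. The header name `X-Trace-Id` and the class name are hypothetical; libraries such as Spring Cloud Sleuth handle this, including propagation, with their own header conventions.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Sketch: reuse an incoming trace ID or create a new one, and tag log lines with it. */
final class TraceIds {
    static final String HEADER = "X-Trace-Id"; // hypothetical header name

    /** Returns the incoming ID if present, otherwise a new random 16-character hex ID. */
    static String resolve(String incoming) {
        if (incoming != null && !incoming.isBlank()) {
            return incoming.trim();
        }
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    /** Prefixes a log message with the trace ID as a key-value pair. */
    static String tag(String traceId, String message) {
        return "trace_id=" + traceId + " " + message;
    }
}
```

Each service would call `resolve` on the incoming header value, use `tag` (or the logging framework's context map) for its own log lines, and send the same ID in the header of any downstream request.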
BETTER LOG SEARCHABILITY
TIP #9 - IMPROVE “SIGNAL-TO-NOISE” RATIO


▸ Logging can get slow and expensive quickly!


▸ Prune unused logging


▸ e.g. “Entering method” / “Exiting method” messages


▸ Focus on Business Value / Exceptions


▸ Consider alternatives to logging for high-volume, aggregatable, and numerical data …
LONG TERM
SCALING OBSERVABILITY FURTHER
TIP #10 - USE CENTRALIZED METRICS


▸ Use Micrometer (like Logback, but for metrics)


▸ Out of the Box Metrics


▸ Prefer SaaS Solutions (Datadog, Humio, etc)


▸ Warning on DIY (Prometheus)


▸ humio.io has a “free” version


▸ Give Developers Access / Ownership


▸ Understand the difference between Logging and Metrics


▸ Not Meant for “High Cardinality”
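import io.micrometer.core.instrument.Metrics;

// Increment a named counter
Metrics.counter("sample.counter").increment();
// Example event (from a timer named sample_timer, not the counter above) -
// { "avg": 168.837394, "max": 213.738641, "name": "sample_timer",
//   "count": 5, "sum": 844.186968 } (after 5 calls)

// Record an amount in a distribution summary, tagged with the product name
Metrics.summary("purchase", "product_name", getRandomPurchaseName())
        .record(getRandomPurchaseAmount());
// Example event - { "avg": 40, "max": 61, "name": "purchase", "count": 2,
//   "sum": 80, "product_name": "House" } (after 2 random calls)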
METRICS JAVA CODE
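
The example events carry only aggregates (count, sum, avg, max), not individual records, which is the key difference from logging. The sketch below is a hand-rolled stand-in that shows how such an event is derived; it is not Micrometer's implementation, and the class name is hypothetical.

```java
/** Sketch: keep running aggregates instead of storing every recorded value. */
final class SimpleSummary {
    private long count;
    private double sum;
    private double max;

    /** Records one observation; only the aggregates are updated. */
    synchronized void record(double amount) {
        count++;
        sum += amount;
        max = (count == 1) ? amount : Math.max(max, amount);
    }

    synchronized long count() { return count; }

    synchronized double sum() { return sum; }

    synchronized double max() { return max; }

    /** Average of all recorded values, or 0 when nothing has been recorded. */
    synchronized double avg() { return count == 0 ? 0.0 : sum / count; }
}
```

Recording purchases of 19 and 61 yields count 2, sum 80, avg 40 and max 61, the same shape as the `purchase` event above. Each distinct tag value needs its own set of aggregates, which is why tags such as user IDs ("high cardinality") are a poor fit for metrics.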
END
