
Developers and Observability - Icinga Camp Stockholm 2019


Talk by Anders Håål:
The foundation for application observability and monitoring starts with how we develop our applications and services. This talk is about how developers need to start thinking about patterns and design to enable observability.

Published in: Technology

  1. 1. Developers & observability
  2. 2. Who am I Anders Håål software engineer and developer CTO @ opsdis @thenodon anders@opsdis.com www.bischeck.org “dynamic and adaptive thresholds”
  3. 3. Shifting focus
  4. 4. Application monitoring
  5. 5. An observable service Common metrics and events - Number of incoming requests by API - Number of outgoing requests by destination - Response time Domain specific - Number of bookings - Number of failed bookings - Abandoned booking carts - Aborted bookings - Booking value
  6. 6. Observability (monitoring) capabilities must be part of the application development
  7. 7. Observability == Monitoring ? Monitoring ● Checks ● Thresholds ● Alarms ● Notifications Observability adds the dimension of analytics on events and metrics
  8. 8. 3 pillars of observability ● Logging ○ 2016-10-11 13:14:22 Transaction id 12398 failed on update ○ 2016-10-11 13:14:22 /product - status:200 - response_ms:46 ● Metrics ○ Number of calls to endpoint /product ○ Number of failed calls to endpoint /product ○ Response time for /product at p99 is 100 ms over the last hour ○ Rate is 145 tps for transaction “order” over the last 5 minutes ● Tracing ○ Request id 732427234 took 168 ms, broken down by service: ■ Service A took 28 ms ■ Service B took 99 ms ■ Service C took 41 ms [Peter Bourgon @peterbourgon]
  10. 10. Get started Metric endpoints Health checks PUSH PULL Log events Metrics reporter
  11. 11. Aggregation complexity PUSH PULL
  12. 12. /health ● An API endpoint (e.g. HTTP /health) that returns the health of the service. ● Reports the status of: ○ infrastructure services used by the service instance ○ its own internal state and logic ○ other services it depends on, if required ● Polled periodically by ○ monitoring service ○ service registry ○ load balancer
  13. 13. Log events that make sense - not 2016-10-10 23:10:12,670 WARN [com.foo.CheckAddress] Not a valid address 2016-10-11 12:11:17,424 ERROR [com.bar.Transaction] Should not have happened 2018-09-12 11:01:07,344 INFO [com.bar.Transaction] Check address Kungsgatan 10 - okay 23 ms Lack of format, lack of context
  14. 14. Event domain language Log: { "@timestamp": "2016-10-10T23:10:12.670Z", "application": "customer", "xrequestid": "602b-4084-4329-b874-7c65203004af", "operation": "checkAddress", "countryCode": "SE", "city": "Kalmar", "streetNumber": "25", "postalCode": "39232", "streetName": "Storgatan", "ServiceProviderId": "6", "operation_status": "failed", "cause": "Not a valid address", "level": "INFO", "endpoint": "addressinfo", "version": "v1", "responsetime_ms": 113, …………. } Metrics: { "name": "api_response", "tags": [ {"endpoint": "addressinfo"}, {"version": "v1"}, {"application": "customer"}, {"region": "eu-west"}, {"method": "GET"} ], "unit": "seconds", "value": 0.064, "timestamp": 14340555620000 }
  15. 15. Start simple - Engage development with observability ● Health check APIs ● Track and set request id ● Metrics reporting / APIs - what is key for the service ● Structured log formats that operations can work with ● Engage in an event domain language
  16. 16. Monitoring antipatterns ● Tool obsession ● Not a job title - it’s a skill ● Manual configuration ● We are not Netflix ● Not available to everybody
  17. 17. Summary ● Shift from infrastructure to application monitoring ● Observable applications are the responsibility of the developers ● Metrics/events - what is the purpose of the service ● Work on a common “domain language” for logging and metrics ● Mission driven and not tool driven ● Observability “is the new black”
  18. 18. Your expert on observability Here at Opsdis, our main focus is helping our clients get a good overview of their operations. We have decades of experience providing companies with knowledge of the state of their operations as well as their business. Feel free to contact us; we are happy to tell you more!
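
The common metrics listed on the "An observable service" and "3 pillars of observability" slides (calls per endpoint, failed calls, p99 response time) could be collected in-process roughly as follows. This is a minimal sketch, not something shown in the talk: the class name `ServiceMetrics`, the rule that status codes of 500 and above count as failures, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class ServiceMetrics:
    """Hypothetical in-memory registry for per-endpoint request metrics."""

    def __init__(self):
        self.incoming = Counter()               # number of calls per endpoint
        self.failed = Counter()                 # number of failed calls per endpoint
        self.response_ms = defaultdict(list)    # raw response times per endpoint

    def record(self, endpoint, status, elapsed_ms):
        """Register one handled request."""
        self.incoming[endpoint] += 1
        if status >= 500:  # assumption: only server errors count as failures
            self.failed[endpoint] += 1
        self.response_ms[endpoint].append(elapsed_ms)

    def percentile(self, endpoint, p):
        """Nearest-rank percentile (p in 1..100) of the recorded response times."""
        samples = sorted(self.response_ms.get(endpoint, []))
        if not samples:
            return None
        rank = max(1, math.ceil(p * len(samples) / 100))
        return samples[rank - 1]
```

A real service would normally hand this job to a metrics library and expose the values on a pull endpoint or push them through a reporter, as the "Get started" slide outlines; the sketch only shows what is being counted.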
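
The "/health" slide describes an endpoint that reports the status of the service's dependencies and its own state. One possible shape, using only the Python standard library, is sketched below. The check names, the "up"/"down" vocabulary, and the choice of HTTP 503 for an unhealthy service are assumptions for illustration and are not taken from the talk.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_report(checks):
    """Run every check and return (HTTP status code, JSON body).

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is usable. A check that raises is
    reported as "down" instead of crashing the endpoint.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception:
            results[name] = "down"
    healthy = all(state == "up" for state in results.values())
    body = {"status": "up" if healthy else "down", "checks": results}
    return (200 if healthy else 503), json.dumps(body)


# Hypothetical checks; replace the lambdas with real probes (database ping, etc.).
CHECKS = {
    "database": lambda: True,
    "internal_state": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /health for monitoring, service registry, and load balancer."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_report(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` and requesting `/health` returns the aggregated JSON. Keeping `health_report` separate from the HTTP handler makes the aggregation logic testable without opening a socket.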
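
The "Event domain language" slide shows a log event and a metric as JSON documents with shared field names. A rough way to produce both shapes from Python is sketched here. The formatter class, the `event` attribute passed through `extra`, the `build_metric` helper, and the use of epoch milliseconds for the metric timestamp are assumptions; only the field names are taken from the slide.

```python
import json
import logging
from datetime import datetime, timezone


class JsonEventFormatter(logging.Formatter):
    """Hypothetical formatter that renders a log record as one JSON event."""

    def format(self, record):
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        event = {
            "@timestamp": timestamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "cause": record.getMessage(),
        }
        # Domain fields (operation, endpoint, xrequestid, ...) arrive through
        # logger.info(..., extra={"event": {...}}).
        event.update(getattr(record, "event", {}))
        return json.dumps(event)


def build_metric(name, value, unit, tags, timestamp_ms):
    """Build a metric document in the shape shown on the slide."""
    return {
        "name": name,
        "tags": [{key: val} for key, val in tags.items()],
        "unit": unit,
        "value": value,
        "timestamp": timestamp_ms,  # assumption: epoch milliseconds
    }
```

With the formatter attached to a handler, a call such as `logger.info("Not a valid address", extra={"event": {"operation": "checkAddress", "operation_status": "failed"}})` yields one JSON line per event. Log events and metrics then share tag names like `endpoint` and `version`, which is the point of a common domain language.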
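
"Track and set request id" from the "Start simple" slide can be approached by reusing the caller's id when one arrives and creating one otherwise, then forwarding it on every outgoing call. The sketch below assumes the id travels in an `X-Request-ID` HTTP header; the talk only shows an `xrequestid` log field, so the header name and both helper functions are hypothetical.

```python
import contextvars
import uuid

# Holds the id of the request currently being handled (per thread / async task).
_request_id = contextvars.ContextVar("request_id", default=None)

HEADER = "X-Request-ID"  # assumption: the header name is not given in the talk


def ensure_request_id(incoming_headers):
    """Reuse the caller's request id if present, otherwise create a new one."""
    normalized = {key.lower(): value for key, value in incoming_headers.items()}
    request_id = normalized.get(HEADER.lower()) or str(uuid.uuid4())
    _request_id.set(request_id)
    return request_id


def outgoing_headers(extra=None):
    """Headers for a downstream call, carrying the current request id."""
    headers = dict(extra or {})
    request_id = _request_id.get()
    if request_id is not None:
        headers[HEADER] = request_id
    return headers
```

Writing the same id into every log event (the `xrequestid` field) is what lets a single request be followed across services.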
