Logitoring: 
log-driven monitoring 
and the Rocket science 
Eugene Istomin 
IT Architect 
e.istomin@edss.ee 
Cone Center, Tallinn 
Topic goal: 
talking about a common way of 
delivering, storing and analyzing 
monitoring/log/trace data flows. 
In brief: 
Does log-driven monitoring 
fit all needs?
TOC 
Metrics in monitoring and logging 
From log-driven monitoring to 
dataflow-driven services 
Rsyslog event transport 
Let's try. Live test case
Metrics in monitoring and logging 
IT monitoring: the sum of methods used to 
collect defined metrics using checks. 
Monitoring ~ protocol/agent, 
desired data description, centralized 
storage, notifications = Reactive 
What is the classical 
monitoring metric? 
Numeric! (int/bool/etc.)
Metrics in monitoring and logging 
Classical monitoring interfaces 
● Zabbix 
● Nagios
Metrics in monitoring and logging 
Monitoring is usually used for: 
● Server status dashboards 
● Notifying IT administrators 
● Visualizing numeric data 
● IT inventory 
Monitoring in brief: 
● Schema-based 
● Uses common network 
protocols or agents 
● Stored data is not reusable 
● Needed by IT technicians/admins
Metrics in monitoring and logging 
IT logging: the sum of methods used to 
collect information about pass-through flows. 
Logs ~ syslog, pid/severity/program, 
transport, centralized storage = Proactive 
What is the classical 
log metric? 
String!
Metrics in monitoring and logging 
Classical logging interfaces 
● LogAnalyzer 
● Graylog
Metrics in monitoring and logging 
Logging is usually used for: 
● Problem resolution 
● Debugging & development 
● Storing security access violation 
events 
Logging in brief: 
● Schema-less 
● Uses syslog or API/REST 
● Stored data is reusable 
● Needed by developers
Metrics in monitoring and logging 
Is the data in monitoring the same 
as the data in logging? Yes, if: 
● the data is well tokenized (known keys) 
A=2, TO=me@z.com... 
● the data has a common syntax 
(JSON/CSV) 
{ "A":"2", "TO":["me@z.com",...] }
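The two conditions on this slide (known keys, common syntax) can be sketched in a few lines of Python. This is only an illustration: the `parse_kv` name, the space-separated input line, the second address, and the comma as a multi-value separator are all assumptions, not part of the talk.

```python
import json

def parse_kv(line):
    """Split a 'KEY=value KEY=value' line into a dict of known keys."""
    event = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip tokens that are not key=value pairs
        # Assumption: a comma separates multiple values for one key.
        event[key] = value.split(",") if "," in value else value
    return event

# Illustrative input only; the second address is made up for the example.
event = parse_kv("A=2 TO=me@z.com,you@z.com")
print(json.dumps(event))
```

Once every producer emits the same keys in the same syntax, a monitoring value and a log field are indistinguishable to the storage layer.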
Metrics in monitoring and logging 
Is the data in monitoring the same 
as the data in logging? Yes, if: 
● the data has a known type mapping 
("field TO = string", "field ID = int") 
{ "A":"int", "TO":"array" } 
● the storage layer uses efficient 
token/key/hash-based algorithms 
"me@z.com" => "me","@","z",".","com"
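A rough Python sketch of both bullets: applying a type mapping to an event, then tokenizing a string field and building a tiny token index over it. The `MAPPING` table, the function names, and the regular expression are assumptions chosen to reproduce the split shown on the slide; a real storage engine uses its own analyzers.

```python
import re

# Hypothetical type mapping, in the spirit of { "A":"int", "TO":"array" }.
MAPPING = {"A": int, "TO": list}

def apply_mapping(event, mapping):
    """Coerce each known field to its mapped type (unknown fields become str)."""
    typed = {}
    for key, value in event.items():
        caster = mapping.get(key, str)
        if caster is list:
            typed[key] = value if isinstance(value, list) else [value]
        else:
            typed[key] = caster(value)
    return typed

def tokenize(text):
    """Split text into word runs and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)

def build_index(events, field):
    """Map each token to the set of event positions that contain it."""
    index = {}
    for pos, event in enumerate(events):
        for value in event[field]:
            for token in tokenize(value):
                index.setdefault(token, set()).add(pos)
    return index

events = [apply_mapping({"A": "2", "TO": "me@z.com"}, MAPPING)]
index = build_index(events, "TO")
print(tokenize("me@z.com"))  # ['me', '@', 'z', '.', 'com']
```

Looking up a token in `index` is a dictionary access, which is what makes token-based storage cheap to query.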
Metrics in monitoring and logging 
Is the data in monitoring the same 
as the data in logging? Yes, if: 
● the user UI/API can query content 
across mixed fields using complex 
expressions 
(regexp/ranges/sorts) 
"query_string" : { 
  "fields" : ["TO.*"], 
  "query" : "a AND com OR z" 
}
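A small Python sketch that wraps the clause above in a full request body. It assumes an Elasticsearch-style search API where the clause sits under a top-level "query" key; the `query_string_body` helper is a made-up name, and sending the request is left out.

```python
import json

def query_string_body(fields, query):
    """Wrap a query_string clause in a search request body."""
    return {"query": {"query_string": {"fields": fields, "query": query}}}

body = query_string_body(["TO.*"], "a AND com OR z")
print(json.dumps(body, indent=2))
```

Mixed AND/OR expressions are easy to misread, so grouping them with parentheses, for example "(a AND com) OR z", makes the intent explicit.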
Metrics in monitoring and logging 
Does the data in monitoring have the same 
nature as the data in logging? Yes: 
● Monitoring and logging are 
subsets of events 
● Monitoring is mainly reactive 
● Logging is mainly proactive 
Events are the set that includes all 
possible types of messages 
(monitoring, logging, JSON data 
exchange over HTTP or TCP, etc.)
TOC 
Metrics in monitoring and logging 
From log-driven monitoring to 
dataflow-driven services 
Rsyslog event transport 
Let's try. Live test case
Log-driven monitoring 
Log-driven monitoring: 
a set of techniques that gives a 
high-level decision-making application 
access to log event fields through 
complex expressions. 
1. { haproxy_status=503 },... { jmx_app1_status=500 } 
2. "query_string" : { 
  "fields" : ["haproxy_status","jmx_*_status"], 
  "query" : ">=500" 
} 
3. action → notify admins
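The three steps can be sketched in plain Python, with in-memory events standing in for the log store. Field patterns and the threshold come from the slide; `failing_fields`, `notify_admins`, and the third sample event are hypothetical, and a real decision-maker would run the query against the datastore instead.

```python
from fnmatch import fnmatch

FIELD_PATTERNS = ["haproxy_status", "jmx_*_status"]
THRESHOLD = 500

def failing_fields(event, patterns=FIELD_PATTERNS, threshold=THRESHOLD):
    """Return the fields matching any pattern whose value is >= threshold."""
    return {
        name: value
        for name, value in event.items()
        if any(fnmatch(name, p) for p in patterns)
        and isinstance(value, int) and value >= threshold
    }

def notify_admins(alerts):
    """Hypothetical action: a real system would mail or page someone."""
    return ["ALERT %s=%d" % (name, value) for name, value in sorted(alerts.items())]

# Step 1: events arrive (the last one is an added healthy sample).
events = [{"haproxy_status": 503}, {"jmx_app1_status": 500}, {"haproxy_status": 200}]

# Step 2: select the fields that satisfy the expression.
alerts = {}
for event in events:
    alerts.update(failing_fields(event))

# Step 3: act on the result.
messages = notify_admins(alerts)
print(messages)
```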
Log-driven monitoring 
LLD is superb
Log-driven monitoring 
LLD is superb 
Items are created by 
JSON POST using an 
external CMDB 
database 
All items are 
trappers
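The slide does not show the JSON itself. Assuming LLD here means Zabbix low-level discovery, a discovery document is a list of macro-to-value objects, which older Zabbix versions expect under a "data" key. A Python sketch under that assumption, where the CMDB rows and the macro names `{#APPNAME}` and `{#APPPORT}` are invented for illustration:

```python
import json

def lld_payload(cmdb_rows, macro_map):
    """Build a Zabbix-style low-level discovery document from CMDB rows."""
    data = [
        {macro: row[column] for macro, column in macro_map.items()}
        for row in cmdb_rows
    ]
    # Assumption: the discovery rule expects the list under a "data" key.
    return json.dumps({"data": data})

# Hypothetical CMDB export.
rows = [{"name": "app1", "port": "8080"}, {"name": "app2", "port": "8081"}]
payload = lld_payload(rows, {"{#APPNAME}": "name", "{#APPPORT}": "port"})
print(payload)
```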
Log-driven monitoring 
Fast-as-light Erlang Zabbix sender 
Systemd-based 
service
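The talk's sender is written in Erlang and is not shown. As a rough illustration of what any Zabbix sender has to put on the wire, here is a Python sketch of the trapper packet as I understand the protocol: a "ZBXD" signature, a flags byte, an 8-byte little-endian length, then a JSON body. The host and key names are placeholders, and the socket handling is left out.

```python
import json
import struct

def build_sender_packet(host, items):
    """Frame trapper values as header + length + JSON payload."""
    payload = json.dumps({
        "request": "sender data",
        "data": [
            {"host": host, "key": key, "value": str(value)}
            for key, value in items.items()
        ],
    }).encode("utf-8")
    # Signature "ZBXD", flags byte 0x01, then payload length as uint64 LE.
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload

# Placeholder host and item key.
packet = build_sender_packet("mail01", {"haproxy_status": 503})
print(len(packet), packet[:13])
```

Because every item is a trapper, the monitoring server only receives values; all collection logic stays on the sending side.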
Log-driven monitoring 
But what about the datastore?
Log-driven monitoring 
Mail count per second, in normal and percentage scale
Log-driven monitoring 
Derivative of mail count per second coerced to 
spamscore SUM & MAX
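The graphs themselves are not reproduced here. For readers unfamiliar with the two series, a Python sketch of a per-second count and its first difference; the timestamps are made up, and the datastore would normally compute both on the server side.

```python
from collections import Counter

def per_second_counts(timestamps):
    """Count events per whole second, filling empty seconds with zero."""
    counts = Counter(int(ts) for ts in timestamps)
    first, last = min(counts), max(counts)
    return [counts.get(sec, 0) for sec in range(first, last + 1)]

def derivative(series):
    """First difference: change in the rate from one second to the next."""
    return [b - a for a, b in zip(series, series[1:])]

# Made-up arrival times in seconds.
stamps = [0.1, 0.5, 1.2, 1.3, 1.9, 3.0]
rates = per_second_counts(stamps)
print(rates, derivative(rates))  # [2, 3, 0, 1] [1, -3, 1]
```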
TOC 
Metrics in monitoring and logging 
From log-driven monitoring to 
dataflow-driven services 
Rsyslog event transport 
Let's try. Live test case
Rsyslog event transport 
Rsyslog templates for CEE logging 
● CEE (Lumberjack) JSON messages 
● "timestamp" field in ES format (microseconds)
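The template itself is not shown on the slide. A Python sketch of what an application could emit so that little templating is needed: a JSON payload behind the "@cee:" cookie, with a microsecond-precision timestamp. Reading "ES format" as ISO 8601 is an assumption, and the `program` and `severity` values are placeholders.

```python
import json
from datetime import datetime, timezone

CEE_COOKIE = "@cee:"

def cee_message(fields, when):
    """Serialize an event as a CEE-tagged JSON syslog payload."""
    event = dict(fields)
    # ISO 8601 with microseconds; assumed to be what "ES format" means here.
    event["timestamp"] = when.astimezone(timezone.utc).isoformat(timespec="microseconds")
    return CEE_COOKIE + " " + json.dumps(event, sort_keys=True)

when = datetime(2014, 1, 2, 3, 4, 5, 678901, tzinfo=timezone.utc)
msg = cee_message({"program": "mta", "severity": "info"}, when)
print(msg)
```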
Rsyslog event transport 
Rulesets & actions 
All logs must be tokenized: 
● Before Rsyslog (in the application) 
● Using Rsyslog (mmnormalize)
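mmnormalize works from liblognorm rulebases, which are not shown in the talk. As a stand-in for the idea of turning a free-form line into named fields, a Python sketch using a regular expression with named groups; the line format and field names are hypothetical.

```python
import re

# Hypothetical line format: "<action> from=<addr> score=<float>".
LINE = re.compile(
    r"(?P<action>\w+) from=(?P<sender>\S+) score=(?P<score>-?\d+(?:\.\d+)?)"
)

def normalize(line):
    """Return the named fields of a matching line, or None if it does not match."""
    match = LINE.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["score"] = float(fields["score"])  # numeric fields stay numeric
    return fields

print(normalize("accepted from=me@z.com score=1.5"))
print(normalize("some untokenized text"))  # None
```

Lines that do not match stay as opaque strings, which is the case the last live test measures.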
Rsyslog event transport 
Rulesets & actions 
● Every ruleset has its own queue 
● Queues are disk-backed (watermark-based) 
● Shutdowns and restarts are safe
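A toy Python model of a watermark-based, disk-assisted queue, loosely following the behaviour named on the slide: once the in-memory part reaches the high watermark, new items spill over until memory drains to the low watermark. The class name is made up, a second deque stands in for the on-disk files, and none of this is Rsyslog's actual implementation.

```python
from collections import deque

class WatermarkQueue:
    """FIFO queue that spills to 'disk' between a high and a low watermark."""

    def __init__(self, high, low):
        assert 0 <= low < high
        self.high, self.low = high, low
        self.memory = deque()
        self.disk = deque()  # stand-in for on-disk queue files
        self.spilling = False

    def put(self, item):
        if not self.spilling and len(self.memory) >= self.high:
            self.spilling = True  # high watermark reached
        (self.disk if self.spilling else self.memory).append(item)

    def get(self):
        item = self.memory.popleft()  # raises IndexError when empty
        if self.spilling and len(self.memory) <= self.low:
            # Low watermark reached: refill memory from disk, oldest first.
            while self.disk and len(self.memory) < self.high:
                self.memory.append(self.disk.popleft())
            if not self.disk:
                self.spilling = False
        return item

queue = WatermarkQueue(high=3, low=1)
for n in range(6):
    queue.put(n)
spilled = len(queue.disk)
drained = [queue.get() for _ in range(6)]
print(spilled, drained)
```

The point of the watermarks is that a slow or stopped consumer costs disk space rather than memory or messages.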
“Talk is cheap. Show me the code.”
TOC 
Metrics in monitoring and logging 
From log-driven monitoring to 
dataflow-driven services 
Rsyslog event transport 
Let's try. Live test case
Let's try. Live test case 
Application – MTA + spamfilter, scope: 12 h 
Show all messages: ~1.7M messages, time to retrieve – 400 ms 
Show accepted mails: 2.2K messages, time to retrieve – 92 ms
Let's try. Live test case 
Application – HTTP(S) gate, scope: 12 h (08:00-18:00) 
Show all HTTPS traffic per hour: 
- ~1.9M messages 
- time to retrieve – 210 ms
Let's try. Live test case 
Application – all app-specific logs (no per-field tokenizing): 
- ~4M messages 
- time to retrieve – 8 s
Mail your CV: 
info@cone.ee 
“Talk is cheap. Show me the code.” L. Torvalds 
Eugene Istomin 
IT Architect 
e.istomin@edss.ee 
Cone Center, Tallinn 
Thanks!

