Logs/Metrics Gathering
With OpenShift EFK Stack
DevConf, Brno, January 27 2018
Josef Karásek, Software Engineer
Jan Wozniak, Software Engineer
@Pepe_CZ
ONE YEAR AGO
ADDITIONS TO THE TEAM
WE HAVE GROWN
● The project was officially added to Group 2 in the OpenShift organisation
● The Dev team grew in size:
○ Rich Megginson
○ Noriko Hosoi
○ Lukáš Vlček
○ Jeff Cantrill
○ Eric Wolinetz
○ Jan Wozniak
○ Josef Karásek
MAIN OBJECTIVES
WHAT WE WANT TO ACHIEVE
● Collecting Distributed Logs
● Common Data Model
● Security model - Multi-Tenancy
● Integration with Red Hat products and their upstream projects
● Scalability
● Enable “Big Data” Analysis
● All Open Source
Watch the talk on YouTube!
LOGGING SYSTEM - ABSTRACT COMPONENTS
[Diagram: on each host, guests, containers, services, and applications write to log files, the journal, Tlog, and syslog; a per-host collector gathers these and ships them through a load balancer into the logging system's data warehouse (cluster), which feeds visualization and monitoring.]
CURRENT OPENSHIFT LOGGING
[Diagram: in the OpenShift cluster, Fluentd on each node collects from journald and /var/log/containers/*.log (docker/CRI, OS, and audit logs) for project and openshift pods, optionally forwarding through Mux (Fluentd)*; data lands in the Elasticsearch cluster in the logging namespace, fronted by the ES service and a reencrypt route, and is consumed by Kibana (browser), Curator, Kopf, Prometheus, and ManageIQ.]
FLUENTD - COLLECTOR AND NORMALIZER
RUBY BASED LOG AGENT
● Configuration - Apache like, ruby based
● Scalable, secure: msgpack, secure_forward
● Hundreds of plugins
● Easy to write ruby plugins
● Kubernetes metadata plugin
● OpenStack reference architecture
● Use rsyslog via RELP plugin

<filter kubernetes.journal.container**>
  @type record_transformer
  enable_ruby
  <record>
    time ${Time.at((record["_SOURCE_REALTIME_TIMESTAMP"] || record["__REALTIME_TIMESTAMP"]).to_f / 1000000.0).utc.to_datetime.rfc3339(6)}
    ...
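The time expression in the filter converts journald's microsecond epoch timestamps to an RFC 3339 UTC string. A minimal sketch of the same conversion, written in Go for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// journald timestamps (__REALTIME_TIMESTAMP) are microseconds since
// the Unix epoch; the record_transformer filter above turns them into
// RFC 3339 UTC strings with microsecond precision. The same in Go:
func realtimeToRFC3339(usec int64) string {
	t := time.Unix(usec/1000000, (usec%1000000)*1000).UTC()
	return t.Format("2006-01-02T15:04:05.000000Z07:00")
}

func main() {
	// 1484689541000000 us = 2017-01-17 21:45:41 UTC,
	// the timestamp from the Elasticsearch document example
	fmt.Println(realtimeToRFC3339(1484689541000000))
}
```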
ELASTICSEARCH - DATA WAREHOUSE
WIDELY USED, JAVA BASED SEARCH ENGINE
● Based on Apache Lucene
● Great for full text log searching
● Very good for TSD (time series data)
● SearchGuard for security, authz
● OpenShift Elasticsearch plugin
● OpenStack, oVirt reference architecture
● Curator for log trimming

{
  "_id": "AVm4sS7SHNq31gLBPp4-",
  "_index": ".operations.2017.01.18",
  "_score": 1.0,
  "_source": {
    "@timestamp": "2017-01-17T21:45:41.000000-00:00",
    "Hostname": "os.rmeggins.test",
    "message": "Journal stopped",
    "systemd": {
      "t": {
        "PID": "109",
        ...
  },
  "_type": "com.redhat.viaq.common"
}
KIBANA - VISUALIZATION
Node.js Based - Tightly Coupled with Elasticsearch
ARCHITECTURE - LOGGING DETAIL
[Diagram: Fluentd enriches logs with K8s metadata from the OpenShift API and ships them to the Elasticsearch cluster (running the OpenShift ES plugin and SearchGuard plugin), exposed via an ES service/externalIP; the Kibana pod pairs the Kibana container with an auth proxy container, which adds token and userid headers after the browser authenticates against OpenShift OAuth; user project and role information comes from the OpenShift API.]
QUICKSTART - oc cluster up --logging
● Deploy OpenShift with oc cluster up
● Shutdown cluster
● Restart docker
● Bring cluster back up with existing configuration
There is currently a bug where pods cannot reach each other over the network, e.g. Fluentd cannot talk to Elasticsearch, unless docker is restarted while the cluster is down:
$ sudo oc cluster down
$ sudo systemctl restart docker
$ sudo oc cluster up --use-existing-config …
QUICKSTART - minishift start --logging
● Set up minishift [1]
● Start it with minishift start --logging

[1] https://github.com/MiniShift/minishift
ViaQ - LOGGING THE HARD WAY
● Follow directions on GitHub
● Uses openshift-ansible to set up an all-in-one cluster
● Configures logging for external access - similar to how oVirt uses
logging
● Extensible for more complex deployments
EXAMPLE ANSIBLE INVENTORY FILES
● deploy_cluster.yml playbook to deploy OpenShift and logging
● All-in-one inventory based on OpenShift Origin 3.7.1
# Make sure to set version and to install logging
[OSEv3:vars]
openshift_release=v3.7.1
openshift_logging_install_logging=true
openshift_image_tag=v3.7.1
openshift_logging_es_allow_external=true
TROUBLESHOOTING
● logging-dump.sh - an “sosreport” for logging [1],[2]
○ Contains pod logs, config
○ Look at the pod log files for errors
○ Good for attaching to bug reports
[1] https://github.com/openshift/origin-aggregated-logging/blob/master/hack/README-dump.md
[2] https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh
TROUBLESHOOTING
● Query Elasticsearch from command line - es_util

oc get pods | grep logging-es # get the pod name
espod=logging-es-.....
oc exec -c elasticsearch $espod -- es_util --query \
  "project.*/_search?sort=@timestamp:desc&q=<query>" \
  | python -mjson.tool | more

Where <query> could be something like level:error
Instead of project.* use .operations.* for system logs

● Get the list of indices

oc exec -c elasticsearch $espod -- indices
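The query string handed to es_util can also be assembled programmatically. A sketch in Go - note that url.Values percent-encodes ':' and '@', which the shell invocation above passes literally; the server decodes both forms:

```go
package main

import (
	"fmt"
	"net/url"
)

// esQuery builds the _search URL used with es_util above. Pass
// ".operations.*" instead of "project.*" as the pattern for system logs.
func esQuery(indexPattern, sortBy, query string) string {
	params := url.Values{}
	params.Set("sort", sortBy)
	params.Set("q", query)
	// Encode sorts keys alphabetically and percent-escapes values.
	return indexPattern + "/_search?" + params.Encode()
}

func main() {
	fmt.Println(esQuery("project.*", "@timestamp:desc", "level:error"))
}
```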
USING WITH oVirt
● oVirt uses Collectd to gather metrics and monitoring data
● Collectd writes to Fluentd using http input
● Fluentd also gathers oVirt engine logs
● Fluentd sends data to external Elasticsearch endpoint
● Logging is configured with ovirt-metrics-engine and
ovirt-logs-engine projects
● Links:
https://www.ovirt.org/blog/2017/12/ovirt-metrics-store/
https://www.ovirt.org/develop/release-management/features/metrics/metrics-store/
USING WITH OpenStack
● OpenStack can be configured with a Fluentd client
● OpenStack uses secure_forward to send logs to mux
● Upstream documentation is here[1]
● Downstream documentation is here[2]
[1] http://opstools-ansible.readthedocs.io/en/latest/tripleo_integration.html
[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/advanced_overcloud_customization/sect-monitoring_tools_configuration
LOGGING CUSTOM APPLICATION DATA
BEST PRACTICES
● Have a clear definition of fields in log messages
● Send logs to stdout
● Configure the application to output single-line JSON

{
  "hostname":"myhost.test",
  "level":"info",
  "message":"Server listening on 0.0.0.0:8080",
  "time":"2018-01-24T17:35:10+01:00"
}
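A minimal sketch of emitting such a record from Go with only the standard library (a logging framework such as logrus, shown on a later slide, does this for you):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logLine marshals a flat record as single-line JSON; json.Marshal
// never emits newlines, so each log record stays on one line.
func logLine(rec map[string]string) string {
	b, err := json.Marshal(rec)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Field values from the example record above.
	fmt.Println(logLine(map[string]string{
		"hostname": "myhost.test",
		"level":    "info",
		"message":  "Server listening on 0.0.0.0:8080",
		"time":     "2018-01-24T17:35:10+01:00",
	}))
}
```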
LOGGING CUSTOM APPLICATION DATA
BEST PRACTICES
● Or even:

{
  "application": {
    "accounts": {
      "hostname":"myhost.test",
      "level":"info",
      "message":"Server listening on 0.0.0.0:8080",
      "time":"2018-01-24T17:35:10+01:00"
    }
  }
}
LOGGING CUSTOM APPLICATION DATA
BEST PRACTICES
These things are easy...

import (
	"os"

	log "github.com/sirupsen/logrus"
)

func initLogger() *log.Entry {
	// JSONFormatter must be instantiated: &log.JSONFormatter{}
	log.SetFormatter(&log.JSONFormatter{})
	log.SetOutput(os.Stdout)
	return log.WithFields(log.Fields{
		"hostname": os.Getenv("HOSTNAME"),
	})
}
LOGGING CUSTOM APPLICATION DATA
JSON FORMATTED MESSAGE FIELD
Log line:

INFO[0000] 2018-01-24T17:35:10+01:00 message="{"level":"warn","message":"Function deprecated", "some_field":"some_value"}"

Becomes:

{
  "level":"warn",
  "some_field":"some_value",
  "message":"Function deprecated",
  ...
}
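What the pipeline does with such a line can be sketched as: parse the message field as JSON and promote the embedded fields into the record. An illustration in Go (a sketch only; in the real stack Fluentd's parsing does this, not application code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeJSONMessage parses a message field that holds single-line JSON
// and promotes its fields into the record; embedded fields override
// outer ones, matching the Becomes example above.
func mergeJSONMessage(record map[string]string, msg string) error {
	var embedded map[string]string
	if err := json.Unmarshal([]byte(msg), &embedded); err != nil {
		return err // not JSON: leave the record unchanged
	}
	for k, v := range embedded {
		record[k] = v
	}
	return nil
}

func main() {
	record := map[string]string{"level": "info"}
	msg := `{"level":"warn","message":"Function deprecated","some_field":"some_value"}`
	if err := mergeJSONMessage(record, msg); err != nil {
		panic(err)
	}
	fmt.Println(record["level"], record["message"], record["some_field"])
}
```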
LOGGING CUSTOM APPLICATION DATA
WORST PRACTICE
● Plain text messages
○ ...the default for most loggers
○ Searching such logs becomes a real CSI crime scene investigation

{
  "level":"info",
  "message":"ERROR[0000] 2018-01-24T17:35:10+01:00 NullPointerException in ...",
  ...
}
DEMO
FUTURE DIRECTIONS
● Support CRI log format - not docker json-file compatible
● Fluentd does not scale well - look for alternatives: rsyslog,
fluent-bit, Elastic Beats
● Fluentd RELP input - rsyslog to fluentd[1]
● More integration with Prometheus - fluentd metrics, other metrics
● Elasticsearch 5 (OpenShift 3.10), Elasticsearch 6 (OpenShift 3.11 or
later)
● Grafana - display metrics and log data on same dashboard -
aggregate from different sources
● Message Queue integration
[1] https://github.com/ViaQ/fluent-plugin-relp
ARCHITECTURE USING QUEUE
[Diagram: collectors on each host gather from log sources and publish raw records to a message queue; Mux normalizers consume the raw topic and publish normalized records back onto separate topics for raw and normalized data; Elasticsearch (cluster) with Kibana, “Big Data” analysis, archival, “tailing”, and monitoring all consume from the queue.]
WHERE TO FIND THE CODE?
SOURCE CODE & MAILING LIST
● OpenShift Aggregated Logging
○ https://github.com/openshift/origin-aggregated-logging
○ #openshift-dev FreeNode IRC
● ViaQ
○ https://github.com/ViaQ
○ #viaq FreeNode IRC
● CentOS OpsTools SIG
○ https://wiki.centos.org/SpecialInterestGroup/OpsTools
○ #centos-devel FreeNode IRC
○ centos-devel mailing list
Q & A
THANK YOU
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews
