Your logging stack runs as an app inside your OpenShift platform. When something goes wrong within the platform itself, how do you make sure you can still access the logs to diagnose the problem? We take you through our journey running OpenShift on AWS and how we arrived at a good answer to that question.
1. Hands-off logging for OpenShift in AWS
Lessons learned in Deloitte Platform Engineering
Amir Moghimi, Lead Platform Engineer
Jason Howard, Lead Platform Engineer
2. Who runs the logging stack?
If your logging stack is running as apps inside your OpenShift platform and something goes wrong within the platform itself, how do you make sure you can still access the logs to diagnose the problem?
3. How do you guarantee you can access the logs and not miss entries due to load or maintenance issues?
Tip
Elasticsearch is not trivial software to run and maintain in production.
4. Meet James.
He recently started doing a bit of operations as part of a DevOps team. He loved the ideas behind the DevOps movement but had little operations experience.
5. A support ticket
He gets a call to look into why OpenShift is not able to deploy any new containers. The first thing he checks is the logs in Kibana, but he only finds that there have been no new log entries for hours.
6. This operational immaturity left James frustrated, and he quit the DevOps team.
Tip
Always remember the impact of developers not considering the operational complexities they introduce in their code.
7. Don’t run the full logging stack!
Let someone else do it for you. (With a little help from AWS)
Tip
OpenShift’s built-in logging stack (EFK) can help you ship the logs to another logging stack.
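As a rough sketch, the cluster-side Fluentd can forward logs out of the platform with a secure_forward output. The hostname, shared key, and certificate path below are placeholders, not values from our setup:

```
# Hypothetical secure_forward output on the OpenShift Fluentd side.
<match **>
  @type secure_forward
  self_hostname "#{ENV['HOSTNAME']}"
  shared_key my_shared_secret            # assumption: replace with your own key
  secure yes
  ca_cert_path /etc/fluent/keys/ca.crt   # CA that signed the aggregator's cert
  <server>
    host fluentd-aggregator.example.com  # hypothetical external aggregator
    port 24284
  </server>
</match>
```

The point of this indirection is that the receiving Fluentd lives outside the cluster, so it keeps shipping logs even when the platform itself is unhealthy.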
8. OpenShift documentation:
“Sending logs directly to an AWS Elasticsearch instance is not supported. Use Fluentd Secure Forward to direct logs to an instance of Fluentd that you control and that is configured with the fluent-plugin-aws-elasticsearch-service plug-in.”
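On the receiving side, the standalone Fluentd accepts the secure_forward traffic and writes it to AWS ES using the plug-in the documentation names. A minimal sketch, assuming a hypothetical endpoint URL and region:

```
# Hypothetical aggregator config: receive from the cluster, write to AWS ES.
<source>
  @type secure_forward
  self_hostname fluentd-aggregator.example.com  # placeholder hostname
  shared_key my_shared_secret                   # must match the sender's key
  secure yes
  cert_path /etc/fluentd/keys/server.crt
  private_key_path /etc/fluentd/keys/server.key
</source>

<match **>
  @type aws-elasticsearch-service   # fluent-plugin-aws-elasticsearch-service
  logstash_format true              # daily logstash-YYYY.MM.DD indices
  <endpoint>
    url https://search-mydomain.eu-west-1.es.amazonaws.com  # placeholder
    region eu-west-1
    # With no static keys, the plug-in signs requests using the
    # instance's IAM role credentials.
  </endpoint>
</match>
```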
10. Then, James would find the logs in AWS ES
He could get to the error message from the registry pod showing that the disk was full. He was then able to go and free some disk space, while wondering why the OpenShift registry is not able to garbage-collect images when the disk is full!
11. It’s no surprise we use AWS Elasticsearch regularly.
There are many things that can go wrong with ES.
Tip
Have you heard of split brain, where each node elects itself as the new master (thinking that the other master-eligible node has died) and the result is two separate clusters?
12. Make sure you configure the following parameters for the Fluentd log shipper appropriately:
buffer_chunk_limit
buffer_queue_limit
num_threads
flush_interval
Also, the default 200M memory limit is too low; 400–500M is more reasonable under load.
Tip
AWS ES has a bulk upload (HTTP payload) limit that depends on the instance size. Fluentd ships each buffer chunk as a single bulk request, so buffer_chunk_limit should not exceed the bulk upload limit of your ES instance; buffer_chunk_limit times buffer_queue_limit bounds the total data Fluentd will buffer.
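Putting those parameters together, a buffer section might look like the sketch below. The numbers are illustrative assumptions, not recommendations; size them against your own log volume and your ES instance's payload limit:

```
# Hypothetical buffer tuning for the AWS ES output.
<match **>
  @type aws-elasticsearch-service
  buffer_chunk_limit 8m    # keep each chunk under the ES bulk/payload limit
  buffer_queue_limit 32    # 8m x 32 = up to 256m of buffered log data
  num_threads 2            # flush chunks in parallel
  flush_interval 5s        # how often chunks are flushed to ES
  <endpoint>
    url https://search-mydomain.eu-west-1.es.amazonaws.com  # placeholder
    region eu-west-1
  </endpoint>
</match>
```

Larger chunks mean fewer, bigger bulk requests; a longer queue absorbs ES slowdowns at the cost of memory, which is why the shipper's memory limit also needs raising.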