In recent months at ENA we have been using Graylog, a log aggregation system.
We use it to keep track of our interactive and programmatic submission systems.
In this talk I want to share what we have learnt so far and to encourage you to try it with your own projects.
So why did we start using Graylog?
Our submission systems have been around for a long time, and we get errors.
More specifically our submitters get errors and report them to us through our helpdesk.
Two types of submitters tend to go to the trouble of reporting:
The most helpful submitters
The most frustrated submitters
There is also a group of submitters who experience errors but don’t report them.
We don’t know how many of these there are, but they are the ones more likely to give up and submit through NCBI instead.
Our helpdesk creates a JIRA for each reported error and we work through them.
In recent years we have moved away from prioritising based on who shouts the loudest and now attempt to prioritise by potential impact.
However, we are still reactive and tend to deal with a single error in isolation.
When we receive an error, the detective work starts.
An error report can be detailed, but it can also be as simple as “I got error when submitting”.
I first spend time finding out the submitter’s details. I then delve into the logs on each server, looking for stack traces and other logging messages.
Once I have pieced together what has happened, it usually does not take too long to get a fix deployed.
We inform the submitter and I move on to the next JIRA in the queue.
Sometimes the submitter will come back and let us know that the problem has been resolved; sometimes they don’t.
There are limitations with this approach:
It relies on submitters reporting errors
It focuses on individual errors
It is hard to verify that an error has been fixed once and for all
The most important point is that submitters experience errors at all.
It is very easy to fall into thinking that, because we fix errors promptly and submitters are grateful, we are doing a good job.
I don’t believe we are.
Our submission systems should be like air conditioning. You probably have not thought about the air conditioning in this room today as it is working fine. It is boring. It has faded into the background.
You have forgotten that a complicated system is there. Our submission systems should be like that. They should work so smoothly that submitters think they are trivial.
This is why we started looking at Graylog. We wanted to move away from reacting to individual error reports.
Instead we wanted to move towards detecting trends, resolving classes of error and preventing errors before submitters notice.
Graylog provides a central destination to receive and store all logging messages from our applications. It then provides:
search
alerting
analysis
With search, when a submitter reports an error we don’t need to go digging around in server logs any more.
We search Graylog.
When we have found the error we can then see all occurrences. We can also see all logging information around the errors giving us the context we need to understand them.
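For example (the exact field names depend on what your own applications send), a single query such as “source:submission-api AND level:3” pulls up every error-level message from a hypothetical submission-api service; GELF uses the syslog severity codes, and 3 means error.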
By setting up monitors to track the error we get real-time counts and graphs of when it is occurring, which means that when we deploy a fix we can see the graph drop to zero.
We are still reactive to error reports but we are much more effective in resolving them.
We can also define baseline indicators and display them on a dashboard.
If we normally get 10 submissions per hour and it drops to 0 we know something is likely to be wrong.
Likewise, if failed logins rise from 10 per hour to 1000 per hour, we know something is amiss.
We can set up alerts to flag this unusual activity to Slack or email.
We can deal with it before submitters start contacting the helpdesk.
We become proactive.
For me, where Graylog comes into its own though is with analysis. We can model the journey of submitters through the system.
By creating and logging messages that act as checkpoints we can see where submitters are getting stuck and this can be very enlightening.
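As a hypothetical sketch (the class and field names here are illustrative, not our actual code), a checkpoint in a Java application using SLF4J might look like the following; most GELF appenders can be configured to forward MDC values as extra message fields that Graylog can then group and count:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    public final class SubmissionCheckpoints {

        private static final Logger log =
                LoggerFactory.getLogger(SubmissionCheckpoints.class);

        // Logs one message per step of the submission journey. The step
        // name goes into the MDC so it arrives in Graylog as a separate
        // field, which makes grouping and counting by checkpoint trivial.
        public static void checkpoint(String submissionId, String step) {
            MDC.put("submission_id", submissionId);
            MDC.put("checkpoint", step);
            try {
                log.info("Checkpoint '{}' reached for submission {}", step, submissionId);
            } finally {
                MDC.remove("submission_id");
                MDC.remove("checkpoint");
            }
        }
    }

A call such as SubmissionCheckpoints.checkpoint(id, "spreadsheet_uploaded") at each stage then lets us chart how many submitters reach each step.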
We can spot trends and use that to target our efforts on error prevention.
For example, one option in our submission process is for the submitter to download a template spreadsheet for entering sample data.
They then return and upload the completed spreadsheet.
With logging in place we found that over half the attempts to upload a spreadsheet were failing.
Submitters saw a message telling them the spreadsheet was not in the right format.
They were not reporting it as an error.
However, it was certainly a source of problems: we could see from the logs that in many cases it prevented them from continuing with their submission.
When we looked at the screen in question we saw there was simply a button labelled “Upload Spreadsheet”.
There was no indication of what type of spreadsheet we were expecting.
In Graylog we were able to group failures into three categories:
Unexpected spreadsheet format - submitters were uploading Excel instead of CSV
Unexpected type of file - submitters were uploading non-spreadsheet files. This issue first came to our attention when a submitter tried to upload a 150MB fasta file. The server ran out of memory trying to parse it.
Unexpected content - the spreadsheets were the submitters’ own spreadsheets and not based on our templates
This feature was causing a very high failure rate but the fix was simple:
Restrict the file extensions that can be uploaded (a sketch of this check follows below)
Provide clear explanatory text about what the upload spreadsheet function is for.
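As an illustration only (a sketch of this kind of guard, not our actual code), the extension check might look like:

    import java.util.Locale;
    import java.util.Set;

    public final class SpreadsheetUploadGuard {

        // We only expect CSV files saved from our template.
        private static final Set<String> ALLOWED_EXTENSIONS = Set.of("csv");

        // Rejects unexpected files by name before any parsing happens,
        // so a 150MB fasta file never reaches the spreadsheet parser.
        public static void checkFilename(String filename) {
            int dot = filename.lastIndexOf('.');
            String ext = dot < 0 ? ""
                    : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
            if (!ALLOWED_EXTENSIONS.contains(ext)) {
                throw new IllegalArgumentException(
                        "Please upload a .csv file based on our template (got: "
                                + filename + ")");
            }
        }
    }

Rejecting by filename is deliberately crude but cheap; the clearer explanatory text then deals with the content failures that a filename check cannot catch.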
When we applied this change we were able to use Graylog to monitor the failure rate.
We could prove that this change eliminated file related failures.
Content failures were also reduced to a fraction of what they were before, avoiding a lot of submitter confusion.
These are some of the reasons we like Graylog.
In this talk I wanted to concentrate more on why Graylog was useful to us rather than how it works.
There is much better documentation and video material online than I could create, but I will give a very brief overview.
Graylog is a standalone server that uses Elasticsearch and MongoDB. Log messages are collected in a JSON-based format called GELF and indexed in Elasticsearch, which provides the search engine; MongoDB stores Graylog’s configuration and metadata.
Our instance is installed on a standard VM provided by technical services. It took me a morning to set up manually.
If you can use Docker or Amazon Web Services it appears much more straightforward, as you can just download pre-configured images.
Graylog itself then provides a web interface that allows setting up of inputs, streams, alerts and dashboards.
There are several different options for getting data into Graylog. We are just using a single input that takes GELF messages on a UDP port.
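For reference, a GELF message is just a small JSON document. Under the GELF 1.1 specification it looks roughly like this (the host and field values here are made up; keys prefixed with an underscore are custom fields):

    {
      "version": "1.1",
      "host": "submission-api-01",
      "short_message": "Checkpoint 'spreadsheet_uploaded' reached",
      "timestamp": 1528109386.507,
      "level": 6,
      "_submission_id": "ERA123456",
      "_checkpoint": "spreadsheet_uploaded"
    }

The level field reuses the syslog severity codes, so 6 is informational and 3 is error.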
We use a library for Logback that provides an appender that converts logging messages into GELF format and delivers them to a specified host and port.
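One such library is the open-source logback-gelf; as an illustration, its configuration looks along these lines (the host and port are placeholders, and details may differ between versions):

    <configuration>
      <!-- Send every log event as a GELF message over UDP to Graylog -->
      <appender name="GELF" class="de.siegmar.logbackgelf.GelfUdpAppender">
        <graylogHost>graylog.example.org</graylogHost>
        <graylogPort>12201</graylogPort>
        <encoder class="de.siegmar.logbackgelf.GelfEncoder"/>
      </appender>

      <root level="INFO">
        <appender-ref ref="GELF"/>
      </root>
    </configuration>

With something like this in place, nothing else in the application code has to change: existing log statements start arriving in Graylog.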
We then use Slack for our alerts.
There is a library of official and third-party plugins, from big-screen visualisation to automatic creation of JIRAs, but we are using Graylog out of the box at the moment.
It is very early days for us with Graylog. We are still learning, still exploring what is possible. We realise we are currently using a small fraction of its potential.
For example, it can cope with 100k messages per second, while we are sending at most 100 per second.
We have not even started looking at plugins, so we have only scratched the surface of what it can do.
However, it already shows real promise, as it enables a different way of working that could make a big difference to submitters’ experience of our services.
We hope they will first notice a more responsive and efficient service when they do encounter errors, and in time find that they are not experiencing errors at all.
I encourage anyone who would like to try Graylog in their team to do so and to share their experience. We have limited time to spend on Graylog in our group, so help will be invaluable.
If enough people are using it and finding it useful it may be something we can ask to be managed centrally.