FluentD vs. Logstash


Your Speaker
Justin Reock
Chief Evangelist and Field CTO
Gradle Enterprise
Justin has over 20 years of experience working in various
software roles including JEE work. He is an outspoken free
software evangelist, delivering enterprise solutions, technical
leadership, various publications and community education on
databases, architectures, and integration projects.
@justinreock

FluentD vs. Logstash: How to Decide
⬢ Good apps are chatty and tell you a lot about what they are doing behind the scenes
⬢ Being able to read and interpret that information at scale is a leading challenge faced by enterprises,
and that problem is only going to get bigger
⬢ AI/ML, blockchain, edge computing and 5G, IoT, and similar trends are all going to increase the
amount of log data that our applications generate by yet another order of magnitude
⬢ Enterprises often have thousands of apps creating log data
⬢ The best logging data in the world is useless when diluted by application noise from other areas of
the infrastructure
⬢ We need to collect, organize, visualize, and index our log data
⬢ Without a good system to do so we are constantly looking for needles in haystacks
⬢ A good enterprise log management system will allow us to parse, persist, and pinpoint specific events
from widely distributed heterogeneous systems quickly and easily
The Need for Enterprise Log Management

⬢ Adding to the challenge of scale, just the act of collecting logs can be tedious when dealing
with the reality of an enterprise application landscape devoid of standards
⬢ Some applications produce multi-line logs such as stack traces without clear delimiters
⬢ Attempts at standards have arisen, but these standards vary across languages – for instance
Java has log4j which formats differently than winston for Node.js
⬢ Bottom line, not all applications produce logs in the same format:
Challenges with Logging
2020-02-03 13:32:12 (info) [SalesCRM] 00230 Invalid password attempt for user ‘gmhopper’
2019-06-14T3:56:16.000+0000 [ERROR] AUTH: (ServiceApp) Failed login attempt user ‘gmhopper’
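
A minimal sketch of what normalizing these two formats involves, written in Python purely for illustration (neither FluentD nor Logstash is configured this way). The regular expressions, the field names, and the `normalize` function are assumptions made for this example, derived only from the two sample entries above.

```python
import re

# Hypothetical patterns, one per sample format shown above.
SALES_CRM = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<level>\w+)\) \[(?P<app>[^\]]+)\] (?P<code>\d+) (?P<message>.*)$"
)
SERVICE_APP = re.compile(
    r"^(?P<timestamp>\S+) \[(?P<level>\w+)\] (?P<component>\w+): "
    r"\((?P<app>[^)]+)\) (?P<message>.*)$"
)


def normalize(line):
    """Return a common event dict, or None if no known format matches."""
    for pattern in (SALES_CRM, SERVICE_APP):
        match = pattern.match(line)
        if match:
            fields = match.groupdict()
            return {
                "timestamp": fields["timestamp"],  # kept as text; the formats differ
                "level": fields["level"].upper(),  # "info" and "ERROR" share one casing
                "app": fields["app"],
                "message": fields["message"],
            }
    return None


samples = [
    "2020-02-03 13:32:12 (info) [SalesCRM] 00230 Invalid password attempt for user ‘gmhopper’",
    "2019-06-14T3:56:16.000+0000 [ERROR] AUTH: (ServiceApp) Failed login attempt user ‘gmhopper’",
]
events = [normalize(sample) for sample in samples]
```

Even with only two applications, each format needs its own pattern before the events can share one schema, which is the by-hand effort a log collector is meant to absorb.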

⬢ As we move towards 12-factor practices as part of an industry shift to microservices, we will
hopefully begin to treat log messages as system events
⬢ Until then enterprises are left with a galaxy of applications that are still producing text logs
⬢ So, we have created log management frameworks that can emulate this pattern for us by parsing our
log files and centrally persisting them as timeseries events so that they can be searched and analyzed
easily
⬢ An effective centralized log manager should consider all of the necessary enterprise patterns such as
readiness and disaster recovery
⬢ In terms of functionality, the solution should provide:
⬢ An approachable and universal means of collecting, parsing, and preparing log data for storage
⬢ A reliable and consistent place to index and store logs for as long as they will need to be
retained and analyzed
⬢ A presentation layer capable of creating customizable and pragmatic dashboards for our data
which are hopefully visually appealing as well
Considerations for Centralized Log Management

⬢ The ELK Stack comprises three technologies that
together perform central log management:
⬢ ElasticSearch stores log data
⬢ Logstash parses and ships logs to ElasticSearch
⬢ Kibana searches ElasticSearch and visualizes
data
⬢ Arguably the most widely used and popular
modern open source central log manager
available
⬢ Enterprise features and options are available
for a license cost
Contenders
⬢ FluentD provides a sophisticated engine for parsing
and shipping log data
⬢ Wide plugin base ensures ability to ‘fluently’
interpret event data from many endpoints including
logs
⬢ Log data can be intelligently routed to output
endpoints with tagging and routing rules
⬢ Not a full solution, must be combined with
persistence and visualization externally

⬢ Of the three basic layers of log management (collection, storage, and visualization), collection is
arguably the most complicated
⬢ With so much variance possible in the logging format as well as the widespread location of log files,
just getting all of the details such as parsing rules and coordinates is a significant effort for
enterprises
⬢ Persistence and visualization are well-abstracted and commoditized – i.e. we know how to store
normalized data easily, and we know how to visually interpret normalized data
⬢ So we will focus on the components of these two solutions that are responsible for collecting,
parsing and shipping logs
⬢ FluentD is a collector and shipper by design, and recall that Logstash is the component of the ELK
stack that is responsible for the same thing
⬢ In our presentation then, we will narrow our focus to exploring FluentD and Logstash and their
approach to the common business problem of ingesting heterogeneous log data
Log Collection is Hard

⬢ Note that the ELK Stack’s basic functionality has been extended through the addition of
its “Beats” plugins
⬢ This presentation will not consider that extended functionality because not all of the Beats
code is free software; parts of it are covered by the Elastic License
The Beats Plugin Debacle
Source code in this repository is variously licensed under the Apache License
Version 2.0, an Apache compatible license, or the Elastic License. Outside of
the "x-pack" folder, source code in a given file is licensed under the Apache
License Version 2.0, unless otherwise noted at the beginning of the file or a
LICENSE file present in the directory subtree declares a separate license.
Within the "x-pack" folder, source code in a given file is licensed under the
Elastic License, unless otherwise noted at the beginning of the file or a
LICENSE file present in the directory subtree declares a separate license.
The build produces two sets of binaries - one set that falls under the Elastic
License and another set that falls under Apache License Version 2.0. The
binaries that contain `-oss` in the artifact name are licensed under the Apache
License Version 2.0.
https://github.com/elastic/beats/blob/master/LICENSE.txt

⬢ FluentD’s plugins carry permissive licenses such as
Apache License 2.0 and MIT
⬢ Over 1100 plugins available of varying types:
⬢ Input/Output – For either ingesting log/event data or
outputting data to an endpoint
⬢ Filter – For normalizing or modifying data in-flight
⬢ Parser – Native parsers for specific data payload
formats
⬢ Formatter – Output plugins for converting stored data
to formats like JSON
FluentD Plugins
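
How these plugin types chain together can be sketched in a few lines of Python. This is only an illustration of the input, parser, filter, and formatter roles listed above, not FluentD's plugin API; every function name, the `"LEVEL message"` line format, and the `web-01` host value are assumptions invented for the example.

```python
import json


def read_input(lines):
    """Input stage: yield raw lines with the trailing newline removed."""
    for line in lines:
        yield line.rstrip("\n")


def parse(line):
    """Parser stage: split a hypothetical 'LEVEL message' line into fields."""
    level, _, message = line.partition(" ")
    return {"level": level, "message": message}


def add_host(record, host="web-01"):
    """Filter stage: enrich the record in flight (host name is made up)."""
    return {**record, "host": host}


def to_json(record):
    """Formatter stage: serialize the record for the output endpoint."""
    return json.dumps(record, sort_keys=True)


def run_pipeline(lines, sink):
    """Push every line through all stages and append the result to the sink."""
    for line in read_input(lines):
        sink.append(to_json(add_host(parse(line))))


shipped = []
run_pipeline(["ERROR disk full\n", "INFO backup finished\n"], shipped)
```

Here `shipped` stands in for an output endpoint; each stage can be swapped without touching the others, which is the appeal of a plugin-per-role design.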

⬢ Now that we know a little about the problem we are trying to solve, let’s vet a couple of
candidates
⬢ We will create a typical enterprise scenario, logging data from individual components of a full
web application stack
⬢ These log files will be in different areas of the system and be written in various formats and at
various frequencies
⬢ Logstash and FluentD will be utilized to collect and ship log data
⬢ We will pay attention to the individual setup of both solutions
⬢ Afterwards, we will draw some conclusions about the strengths and weaknesses of both
solutions
Summary – Let’s Put Them To The Test

Summary Comparison
⬢ So, we have seen two approaches to log collection and shipping
⬢ Logstash offers a simple architecture and setup, but parsing using Grok has limitations and
there aren’t as many plugins
⬢ Routing in Logstash is done via a simple queue, which can become overwhelmed
⬢ FluentD’s plugin library makes it easier and more standard to parse logging data
⬢ But FluentD’s many moving parts can make the initial configuration and setup more challenging
⬢ Both solutions were able to ingest log data from all sources, but Logstash required more
by-hand work to achieve accurate parsing
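
To give a feel for the by-hand work involved, Grok expressions use `%{SYNTAX:SEMANTIC}` tokens that expand into named regular-expression groups. The Python sketch below imitates that expansion; the tiny `PATTERNS` table holds simplified stand-ins rather than Grok's real pattern library, and the expression itself is an assumption written to fit the SalesCRM sample line shown earlier.

```python
import re

# Simplified stand-ins for a handful of Grok pattern names (not the real definitions).
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"\d+",
    "NOTSPACE": r"\S+",
    "GREEDYDATA": r".*",
}


def grok_to_regex(expression):
    """Expand each %{NAME:field} token into a named regex group."""
    def expand(token):
        name, field = token.group(1), token.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


# Hypothetical expression for the SalesCRM sample line.
expression = (
    r"%{NOTSPACE:date} %{NOTSPACE:time} \(%{WORD:level}\) "
    r"\[%{WORD:app}\] %{INT:code} %{GREEDYDATA:message}"
)
line = "2020-02-03 13:32:12 (info) [SalesCRM] 00230 Invalid password attempt for user ‘gmhopper’"
fields = grok_to_regex(expression).match(line).groupdict()
```

Every new log format needs another expression of this kind, written and tested by hand.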

⬢ This presentation focused on the user experience of ingesting and shipping logs, but, in a
production class enterprise system other factors should be considered as well:
High Availability
⬢ Logstash provides a protocol called Lumberjack which allows active/passive failover between multiple
instances. Active-active can be achieved through beats, but the aforementioned licensing issues exist
⬢ FluentD, by contrast, provides both native active-active and active-passive deployments with the ability to
forward-on-fail and ensure idempotency where necessary, which also allows for weighted load balancing
Other Considerations
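
The forward-on-fail idea can be sketched as follows. This is a generic Python illustration of active-passive failover, not the behavior of either tool; the `forward` function, the node names, and the use of `ConnectionError` to signal an unreachable node are all assumptions for the example.

```python
def forward(event, nodes):
    """Try each (name, send) pair in order; return the name that accepted the event."""
    for name, send in nodes:
        try:
            send(event)
            return name
        except ConnectionError:
            continue  # this node is unreachable, fall through to the next one
    raise RuntimeError("no node accepted the event")


received = []


def primary(event):
    """Hypothetical active node that is currently down."""
    raise ConnectionError("primary is down")


def standby(event):
    """Hypothetical passive node that stores whatever it receives."""
    received.append(event)


delivered_to = forward(
    {"message": "hello"},
    [("primary", primary), ("standby", standby)],
)
```

In this run the event lands on the standby node because the primary refuses the connection.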

Interoperability
⬢ Our use case just called for a single output endpoint, i.e. ElasticSearch, but what if we want to broadcast to
multiple endpoints?
⬢ Logstash allows us to achieve this with somewhat clunky conditional statements
⬢ FluentD provides sophisticated tagging and routing of log data to multiple endpoints
Flexible inputs
⬢ Logstash focuses primarily on text log ingestion, but FluentD provides input for messaging systems like Kafka,
or JMS-compliant ones like ActiveMQ, direct-from-JMX ingestion, RDBMS inspection à la the pg_stat plugin, TCP
forwarding, HTTP/REST ingress, UNIX sockets, etc.
⬢ The Fluent Bit project allows you to build lightweight forwarders into FluentD, which helps achieve a better
distributed pattern
Other Considerations
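
Tag-based routing to multiple endpoints can be sketched like this. The Python below is an illustration of the concept only: it uses shell-style globbing from `fnmatch` as a stand-in for a real matcher (its `*` also matches across dots), and the tags, patterns, and output names are all invented for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: (tag pattern, list of output names).
ROUTES = [
    ("app.auth.*", ["elasticsearch", "security_archive"]),
    ("app.*", ["elasticsearch"]),
]


def route(tag):
    """Return the outputs of the first rule whose pattern matches the tag."""
    for pattern, outputs in ROUTES:
        if fnmatchcase(tag, pattern):
            return outputs
    return []  # no rule matched, so the event is not shipped anywhere


auth_outputs = route("app.auth.login")
web_outputs = route("app.web.access")
unmatched = route("db.postgres")
```

Because rules are checked in order, the more specific `app.auth.*` rule must come before the broader `app.*` rule for authentication events to reach both outputs.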

⬢ In the end, consider what you want to do with your log management today and in the future
⬢ Will you be moving to a 12-factor standard or a microservices architecture eventually?
⬢ Is your enterprise becoming more fragmented or less fragmented as you grow?
⬢ What is more important to you? Sophistication or simplicity?
⬢ Both Logstash and FluentD provide exceptional functionality for ingesting logs
⬢ Logstash focuses on simplicity, but often lacks native parsing functionality
⬢ FluentD is highly sophisticated, but may be more challenging to configure initially
⬢ Elastic’s gravitation towards open-core with the Elastic license may be of concern to those who want to
avoid lock-in
⬢ Determining the best fit for your business will depend on having a concrete understanding of the current
and future state of your infrastructure
Wrap-Up

https://github.com/jreock/logdemo-webapp
Demos Available
