FluentD vs. Logstash
Enterprise
Your Speaker
Justin Reock
Chief Evangelist and Field CTO
Gradle Enterprise
Justin has over 20 years of experience working in various
software roles including JEE work. He is an outspoken free
software evangelist, delivering enterprise solutions, technical
leadership, various publications and community education on
databases, architectures, and integration projects.
@justinreock
Diving Right In…
4 | FluentD vs. Logstash: How to Decide
The Need for Enterprise Log Management
⬢ Good apps are chatty and tell you a lot about what they are doing behind the scenes
⬢ Being able to read and interpret that information at scale is a leading challenge for enterprises, and that problem is only going to get bigger
⬢ AI/ML, blockchain, edge computing, 5G, IoT, and similar trends are going to increase the amount of log data our applications generate by yet another order of magnitude
⬢ Enterprises often have thousands of apps creating log data
⬢ The best logging data in the world is useless when diluted by application noise from other areas of the infrastructure
⬢ We need to collect, organize, visualize, and index our log data
⬢ Without a good system for doing so, we are constantly looking for needles in haystacks
⬢ A good enterprise log management system lets us parse, persist, and pinpoint specific events from widely distributed, heterogeneous systems quickly and easily

Challenges with Logging
⬢ Adding to the challenge of scale, simply collecting logs can be tedious in a real-world enterprise application landscape that lacks standards
⬢ Some applications produce multi-line logs, such as stack traces, without clear delimiters
⬢ Attempts at standards have arisen, but they vary across languages; for instance, Java's log4j formats output differently than winston for Node.js
⬢ Bottom line: not all applications produce logs in the same format, as these two entries show:
2020-02-03 13:32:12 (info) [SalesCRM] 00230 Invalid password attempt for user ‘gmhopper’
2019-06-14T3:56:16.000+0000 [ERROR] AUTH: (ServiceApp) Failed login attempt user ‘gmhopper’
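To make the problem concrete, here is a minimal Python sketch that normalizes the two sample lines above into one common event shape. The regular expressions, the field names (`timestamp`, `level`, `app`, `message`), and the `normalize` helper are hypothetical illustrations, not the parsing rules used by either FluentD or Logstash. Note that every new source format needs its own rule, which is exactly the tedium described above.

```python
import re
from typing import Optional

# Hypothetical parsing rules, one per source format shown above.
# Each rule maps a raw line onto the same normalized field names.
RULES = [
    # e.g. 2020-02-03 13:32:12 (info) [SalesCRM] 00230 Invalid password attempt ...
    re.compile(
        r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
        r"\((?P<level>\w+)\) \[(?P<app>[^\]]+)\] (?P<message>.*)$"
    ),
    # e.g. 2019-06-14T3:56:16.000+0000 [ERROR] AUTH: (ServiceApp) Failed login ...
    # The "AUTH:" category is matched but discarded in this sketch.
    re.compile(
        r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{1,2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) "
        r"\[(?P<level>\w+)\] \w+: \((?P<app>[^)]+)\) (?P<message>.*)$"
    ),
]


def normalize(line: str) -> Optional[dict]:
    """Return a normalized event dict, or None if no rule matches."""
    for rule in RULES:
        match = rule.match(line)
        if match:
            event = match.groupdict()
            # Apps spell severity differently ("info" vs "ERROR"); unify it.
            event["level"] = event["level"].upper()
            return event
    return None


if __name__ == "__main__":
    samples = [
        "2020-02-03 13:32:12 (info) [SalesCRM] 00230 Invalid password attempt for user 'gmhopper'",
        "2019-06-14T3:56:16.000+0000 [ERROR] AUTH: (ServiceApp) Failed login attempt user 'gmhopper'",
    ]
    for line in samples:
        print(normalize(line))
```

Both lines end up with identical keys, so downstream storage and dashboards no longer need to know which application produced them.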

Considerations for Centralized Log Management
⬢ As we move toward 12-factor practices as part of the industry shift to microservices, we will hopefully begin to treat log messages as system events
⬢ Until then, enterprises are left with a galaxy of applications that still produce text logs
⬢ So we have created log management frameworks that emulate this pattern for us by parsing our log files and persisting them centrally as time-series events that can be searched and analyzed easily
⬢ An effective centralized log manager should account for the necessary enterprise patterns, such as readiness and disaster recovery
⬢ In terms of functionality, the solution should provide:
  ⬢ An approachable and universal means of collecting, parsing, and preparing log data for storage
  ⬢ A reliable and consistent place to index and store logs for as long as they need to be retained and analyzed
  ⬢ A presentation layer capable of creating customizable, pragmatic, and ideally visually appealing dashboards for our data
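The 12-factor approach mentioned above asks each application to write one structured event per line to stdout and leave collection to the platform. A minimal sketch using Python's standard `logging` module follows; the `JsonFormatter` and `build_logger` names and the chosen field names are hypothetical, not a standard schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one single-line JSON event.

    Exception tracebacks are not included in this sketch.
    """

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "app": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed via logger.x(..., extra={"fields": {...}}).
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def build_logger(app: str) -> logging.Logger:
    logger = logging.getLogger(app)
    handler = logging.StreamHandler(sys.stdout)  # 12-factor: write events to stdout
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("SalesCRM")
    log.warning("Invalid password attempt", extra={"fields": {"user": "gmhopper"}})
```

Because every line is already a self-describing event, a collector such as FluentD or Logstash only has to decode JSON instead of maintaining per-application parsing rules.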

Contenders
ELK Stack
⬢ Composed of three technologies that together perform central log management:
  ⬢ ElasticSearch stores log data
  ⬢ Logstash parses and ships logs to ElasticSearch
  ⬢ Kibana searches ElasticSearch and visualizes data
⬢ Arguably the most widely used modern open source central log manager available
⬢ Enterprise features and options are available for a license fee
FluentD
⬢ Provides a sophisticated engine for parsing and shipping log data
⬢ A wide plugin base lets it ‘fluently’ interpret event data from many endpoints, including logs
⬢ Log data can be intelligently routed to output endpoints with tagging and routing rules
⬢ Not a full solution: it must be combined with external persistence and visualization

Log Collection is Hard
⬢ Of the three basic layers of log management (collection, storage, and visualization), collection is arguably the most complicated
⬢ With so much possible variance in logging formats, and with log files spread widely across systems, just working out details such as parsing rules and file locations is a significant effort for enterprises
⬢ Persistence and visualization are well abstracted and commoditized: we know how to store normalized data easily, and we know how to visually interpret normalized data
⬢ So we will focus on the components of these two solutions that are responsible for collecting, parsing, and shipping logs
⬢ FluentD is a collector and shipper by design, and Logstash is the component of the ELK stack responsible for the same job
⬢ In this presentation, then, we narrow our focus to FluentD and Logstash and their approaches to the common business problem of ingesting heterogeneous log data
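One parsing rule that collection has to get right is reassembling multi-line entries, such as stack traces, into a single event. A common approach, similar in spirit to the multiline options both tools offer, is to treat any line that does not begin with a recognized start pattern as a continuation of the previous event. In this Python sketch the start pattern (a leading `YYYY-MM-DD` date) and the `group_multiline` helper are assumptions for illustration.

```python
import re
from typing import Iterable, Iterator

# Hypothetical start-of-event rule: a new entry begins with a YYYY-MM-DD date.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}")


def group_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Merge continuation lines (e.g. stack-trace frames) into the preceding event."""
    buffer = []
    for line in lines:
        line = line.rstrip("\n")
        if EVENT_START.match(line) and buffer:
            # A new event starts, so emit the one accumulated so far.
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        # Flush the final event at end of input.
        yield "\n".join(buffer)


if __name__ == "__main__":
    raw = [
        "2020-02-03 13:32:12 (info) [SalesCRM] Starting sync",
        "2020-02-03 13:32:13 (error) [SalesCRM] Sync failed",
        "java.lang.IllegalStateException: connection closed",
        "    at com.example.Sync.run(Sync.java:42)",
        "2020-02-03 13:32:14 (info) [SalesCRM] Retrying",
    ]
    for event in group_multiline(raw):
        print(repr(event))
```

The sample input is invented for illustration; the trade-off is that the start pattern must be known per source, which is one more rule to maintain for every application format.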

The Beats Plugin Debacle
⬢ Note that the ELK Stack’s basic functionality has been extended through the addition of its Enterprise “beats” plugins
⬢ This presentation will not consider that extended functionality because the beats plugins are not entirely free software; as their license file states, part of the code falls under the Elastic License:
Source code in this repository is variously licensed under the Apache License
Version 2.0, an Apache compatible license, or the Elastic License. Outside of
the "x-pack" folder, source code in a given file is licensed under the Apache
License Version 2.0, unless otherwise noted at the beginning of the file or a
LICENSE file present in the directory subtree declares a separate license.
Within the "x-pack" folder, source code in a given file is licensed under the
Elastic License, unless otherwise noted at the beginning of the file or a
LICENSE file present in the directory subtree declares a separate license.
The build produces two sets of binaries - one set that falls under the Elastic
License and another set that falls under Apache License Version 2.0. The
binaries that contain `-oss` in the artifact name are licensed under the Apache
License Version 2.0.
https://github.com/elastic/beats/blob/master/LICENSE.txt

FluentD Plugins
⬢ FluentD’s plugins carry permissive licenses such as the Apache License 2.0 (ASF 2.0) and MIT
⬢ Over 1,100 plugins are available, of several types:
  ⬢ Input/Output: for ingesting log/event data or outputting data to an endpoint
  ⬢ Filter: for normalizing or modifying data in flight
  ⬢ Parser: native parsers for specific data payload formats
  ⬢ Formatter: used by output plugins to convert data to formats like JSON
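The plugin roles above compose into a pipeline: an input produces raw records, a parser structures them, filters modify them in flight, and a formatter serializes them for an output. The following Python sketch models that flow with plain functions. The stage names and signatures are hypothetical and are not FluentD's plugin API (FluentD plugins are written in Ruby).

```python
import json
from typing import Callable, Iterable, Iterator, List

Record = dict
Parser = Callable[[str], Record]
Filter = Callable[[Record], Record]
Formatter = Callable[[Record], str]


def kv_parser(line: str) -> Record:
    """Hypothetical parser stage: 'key=value key=value' -> dict."""
    return dict(pair.split("=", 1) for pair in line.split())


def add_env(record: Record) -> Record:
    """Hypothetical filter stage: enrich each record in flight."""
    return {**record, "env": "prod"}


def json_formatter(record: Record) -> str:
    """Hypothetical formatter stage: serialize a record as JSON."""
    return json.dumps(record, sort_keys=True)


def run_pipeline(source: Iterable[str], parser: Parser,
                 filters: List[Filter], formatter: Formatter) -> Iterator[str]:
    """Input -> parser -> filters -> formatter; the caller acts as the output."""
    for line in source:
        record = parser(line)
        for apply_filter in filters:
            record = apply_filter(record)
        yield formatter(record)


if __name__ == "__main__":
    lines = ["app=SalesCRM level=info msg=login", "app=ServiceApp level=error msg=auth"]
    for out in run_pipeline(lines, kv_parser, [add_env], json_formatter):
        print(out)
```

Keeping each stage swappable is the point: supporting a new payload format means plugging in a different parser, not rewriting the pipeline.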

Summary: Let’s Put Them To The Test
⬢ Now that we know a little about the problem we are trying to solve, let’s vet a couple of candidates
⬢ We will create a typical enterprise scenario, logging data from individual components of a full web application stack
⬢ These log files will live in different areas of the system and be written in various formats and at various frequencies
⬢ Logstash and FluentD will be used to collect and ship the log data
⬢ We will pay attention to the individual setup of each solution
⬢ Afterwards, we will draw some conclusions about the strengths and weaknesses of both solutions

Summary Comparison
⬢ So, we have seen two approaches to log collection and shipping
⬢ Logstash offers a simple architecture and setup, but parsing with Grok has limitations, and fewer plugins are available
⬢ Routing in Logstash is done via a simple queue, which can become overwhelmed
⬢ FluentD’s plugin library makes parsing log data easier and more standardized
⬢ But FluentD’s many moving parts can make the initial configuration and setup more challenging
⬢ Both solutions were able to ingest log data from all sources, but Logstash required more manual work to achieve accurate parsing
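Grok, the Logstash parsing approach referred to above, works by expanding named patterns written as `%{SYNTAX:SEMANTIC}` into a regular expression with named capture groups. The Python sketch below imitates that expansion to show where the hand-tuning comes from. The small pattern table and the `grok_compile` helper are simplified assumptions, not Logstash's actual pattern library.

```python
import re

# A tiny, simplified stand-in for a Grok pattern library (assumed definitions).
PATTERNS = {
    "DATE": r"\d{4}-\d{2}-\d{2}",
    "TIME": r"\d{2}:\d{2}:\d{2}",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "GREEDYDATA": r".*",
}

# Matches %{SYNTAX} or %{SYNTAX:SEMANTIC}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_compile(expression: str) -> re.Pattern:
    """Expand %{SYNTAX:SEMANTIC} tokens into named groups, then compile."""
    def expand(m: re.Match) -> str:
        syntax, semantic = m.group(1), m.group(2)
        body = PATTERNS[syntax]
        # %{SYNTAX} without a field name matches but captures nothing.
        return f"(?P<{semantic}>{body})" if semantic else f"(?:{body})"

    return re.compile(TOKEN.sub(expand, expression))


if __name__ == "__main__":
    salescrm = grok_compile(
        r"%{DATE:date} %{TIME:time} \(%{WORD:level}\) \[%{NOTSPACE:app}\] %{GREEDYDATA:message}"
    )
    m = salescrm.match("2020-02-03 13:32:12 (info) [SalesCRM] 00230 Invalid password attempt")
    print(m.groupdict() if m else None)
```

In this sketch the same expression does not match the ServiceApp sample line from earlier, because that line's layout differs; each format needs its own hand-written expression.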

Other Considerations
⬢ This presentation focused on the user experience of ingesting and shipping logs, but in a production-class enterprise system, other factors should be considered as well:
High Availability
⬢ Logstash provides a protocol called Lumberjack, which allows active/passive failover between multiple instances. Active-active can be achieved through beats, but the licensing issues discussed earlier apply
⬢ FluentD, by contrast, natively supports both active-active and active-passive deployments, with the ability to forward on failure and ensure idempotency where necessary; this also allows for weighted load balancing
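Forward-on-fail with weighted load balancing and a standby node can be illustrated in a few lines. The Python sketch below picks active destinations in proportion to their weights, falls back to standby nodes only when every active node fails, and marks failed nodes as unhealthy. The `Node` class, the weights, and the `send` callback are hypothetical and do not reproduce FluentD's `out_forward` implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """A hypothetical downstream aggregator."""
    name: str
    weight: int = 1
    standby: bool = False  # active-passive: used only after all active nodes fail
    healthy: bool = True


def _weighted_shuffle(pool: List[Node], rng: random.Random) -> List[Node]:
    """Order nodes by repeated weighted random choice without replacement."""
    pool = list(pool)
    order: List[Node] = []
    while pool:
        idx = rng.choices(range(len(pool)), weights=[n.weight for n in pool], k=1)[0]
        order.append(pool.pop(idx))
    return order


def candidates(nodes: List[Node], rng: random.Random) -> List[Node]:
    """Healthy active nodes first (weighted order), then healthy standby nodes."""
    usable = [n for n in nodes if n.healthy and n.weight > 0]
    active = [n for n in usable if not n.standby]
    standby = [n for n in usable if n.standby]
    return _weighted_shuffle(active, rng) + _weighted_shuffle(standby, rng)


def forward(event: dict, nodes: List[Node],
            send: Callable[[Node, dict], bool],
            rng: random.Random) -> Optional[str]:
    """Send to the first candidate that accepts; mark failed nodes unhealthy."""
    for node in candidates(nodes, rng):
        if send(node, event):
            return node.name
        node.healthy = False  # forward-on-fail: skip this node from now on
    return None  # every node failed; a real shipper would buffer and retry


if __name__ == "__main__":
    rng = random.Random(7)
    nodes = [Node("agg-a", weight=60), Node("agg-b", weight=40), Node("agg-standby", standby=True)]
    # Pretend agg-a is down: traffic shifts to agg-b, never to the standby.
    print(forward({"msg": "login"}, nodes, lambda n, e: n.name != "agg-a", rng))
```

A real deployment would also run health checks that eventually mark recovered nodes healthy again; that recovery step is omitted here.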

Other Considerations
Interoperability
⬢ Our use case called for only a single output endpoint (ElasticSearch), but what if we want to broadcast to multiple endpoints?
⬢ Logstash allows us to achieve this with somewhat clunky conditional statements
⬢ FluentD provides sophisticated tagging and routing of log data to multiple endpoints
Flexible inputs
⬢ Logstash focuses primarily on text log ingestion, but FluentD provides inputs for messaging systems like Kafka or JMS-compliant brokers like ActiveMQ, direct-from-JMX ingestion, RDBMS inspection via plugins such as pg_stat, TCP forwarding, HTTP/REST ingress, UNIX sockets, and more
⬢ The Fluent Bit project lets you build lightweight forwarders that feed into FluentD, which helps achieve a better distributed pattern
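FluentD's routing is driven by tags: each event carries a dot-separated tag, and match patterns select outputs, where `*` matches exactly one tag part and `**` matches zero or more parts. The Python sketch below imitates that matching rule with a first-match-wins route table; one route lists two outputs to mimic broadcasting (FluentD itself uses its `copy` output for that). The tags, the route table, and the output names are hypothetical.

```python
from typing import List, Sequence, Tuple


def tag_matches(pattern: str, tag: str) -> bool:
    """Glob-style tag match: '*' = exactly one part, '**' = zero or more parts."""
    def match(p: Sequence[str], t: Sequence[str]) -> bool:
        if not p:
            return not t
        head, rest = p[0], p[1:]
        if head == "**":
            # Let '**' absorb 0, 1, 2, ... tag parts and try the rest of the pattern.
            return any(match(rest, t[i:]) for i in range(len(t) + 1))
        if t and (head == "*" or head == t[0]):
            return match(rest, t[1:])
        return False

    return match(pattern.split("."), tag.split("."))


# Hypothetical route table, evaluated top to bottom; the first match wins.
ROUTES: List[Tuple[str, List[str]]] = [
    ("app.*.auth", ["elasticsearch", "security_archive"]),  # broadcast to two outputs
    ("app.**", ["elasticsearch"]),
    ("**", ["null"]),  # catch-all: discard everything else
]


def route(tag: str) -> List[str]:
    """Return the outputs for a tag according to ROUTES."""
    for pattern, outputs in ROUTES:
        if tag_matches(pattern, tag):
            return outputs
    return []


if __name__ == "__main__":
    for tag in ["app.salescrm.auth", "app.salescrm.web", "system.kernel"]:
        print(tag, "->", route(tag))
```

Compared with chains of conditional statements, the routing decision here lives entirely in the ordered pattern table, which is the design property the slide credits to FluentD.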

Wrap-Up
⬢ In the end, consider what you want to do with your log management today and in the future
⬢ Will you eventually move to a 12-factor standard or a microservices architecture?
⬢ Is your enterprise becoming more or less fragmented as you grow?
⬢ Which matters more to you: sophistication or simplicity?
⬢ Both Logstash and FluentD provide exceptional functionality for ingesting logs
⬢ Logstash focuses on simplicity, but often lacks native parsing functionality
⬢ FluentD is highly sophisticated, but may be more challenging to configure initially
⬢ Elastic’s gravitation toward an open-core model with the Elastic License may concern those who want to avoid lock-in
⬢ Determining the best fit for your business depends on a concrete understanding of the current and future state of your infrastructure

Demos Available
https://github.com/jreock/logdemo-webapp
Questions?
Thank you!
jreock@gradle.com
