1. AN ETL TOOLS REVIEW & COMPARISON
DAVID V. P.
LINKEDIN.COM/IN/DAVID-V-P-74046759
EDUARDO PRIETO VALDIVIESO
LINKEDIN.COM/IN/EDUARDOPRIETOVALDIVIESO
4. INTRODUCTION (2)
• LOG COLLECTION: HAVING ALL LOGS IN ONE PLACE MAKES IT EASIER TO SEARCH ACROSS MULTIPLE SERVERS AND
SIMPLIFIES LOG ANALYSIS AND CORRELATION TASKS.
• SECURITY: KEEPING COPIES OF THE LOGS AWAY FROM THE SERVERS THAT PRODUCE THEM PROVIDES FAULT TOLERANCE, WHICH
INCREASES SECURITY.
• PREVENT DISK OVERLOAD: MOVING LOGS OFF THE CORE SERVERS LEAVES THEM FREER TO
PERFORM THEIR CORE TASKS.
• HIGH AVAILABILITY: BETTER DISTRIBUTION OF TASKS IMPROVES SYSTEM AVAILABILITY.
• BETTER CONTROL OF RESOURCES: HAVING A UNIFIED LOGGING LAYER MAKES IT POSSIBLE TO PROPERLY MONITOR
THE USE OF THE INFRASTRUCTURE.
UNIFIED LOGGING LAYER: KEY PIECE FOR A CENTRALIZED LOGGING SYSTEM
6. USE CASES (2)
• STORE DATA IN MULTIPLE SYSTEMS SUCH AS MONGODB, AMAZON S3, ELASTICSEARCH, HDFS AND MANY MORE FOR MANY
PURPOSES, INCLUDING FLEXIBLE ANALYTICS, ARCHIVING AND FULL-TEXT SEARCH.
• BUILD A SIMPLE COMPLEX-EVENT-PROCESSING SYSTEM, SIMILAR TO SPLUNK, TO SUPPORT EXPLORATORY DATA ANALYSIS ACROSS THE
ORGANIZATION.
• DEVELOP AN OPEN SOURCE REAL-TIME MONITORING SYSTEM THAT, UPON DETECTING A PROBLEM IN THE SYSTEM BY
FILTERING LOGS, RAISES AN ALERT AND SENDS AN INCIDENT REPORT EMAIL TO THE TEAM VIA TWILIO.
• CREATE A COST-EFFECTIVE, MORE FLEXIBLE ALTERNATIVE TO WEB ANALYTICS TOOLS BY TURNING FLUENTD INTO A
CUSTOMIZABLE REST API ENDPOINT.
• COLLECT AND CORRELATE WEB SERVER ACCESS LOGS, APPLICATION ERROR LOGS AND WINDOWS EVENTS TO HELP
UNDERSTAND WHETHER SERVER/SYSTEM ISSUES COME FROM THE WEB SERVER/OS ITSELF OR AN APPLICATION-LEVEL BUG.
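AS AN ILLUSTRATION OF THE REST API ENDPOINT USE CASE ABOVE, THE FOLLOWING PYTHON SKETCH BUILDS AN HTTP POST THAT DELIVERS ONE EVENT TO FLUENTD'S HTTP INPUT PLUGIN (in_http). IT ASSUMES THE PLUGIN IS ENABLED AND LISTENING ON localhost:9880. THE TAG web.pageview, THE RECORD FIELDS AND THE FUNCTION NAMES build_event_request AND send_event ARE HYPOTHETICAL, CHOSEN ONLY FOR THIS EXAMPLE.

```python
import json
import urllib.request


def build_event_request(tag, record, base_url="http://localhost:9880"):
    """Build (but do not send) an HTTP POST carrying one JSON event.

    Assumption: a Fluentd in_http source is listening at base_url.
    The URL path is used as the event tag.
    """
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{tag}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(tag, record, base_url="http://localhost:9880", timeout=5):
    """Send the event and return the HTTP status code.

    Requires a reachable Fluentd HTTP input; raises URLError otherwise.
    """
    request = build_event_request(tag, record, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical page-view event; only builds the request, sends nothing.
    req = build_event_request("web.pageview", {"path": "/pricing", "user": "anon"})
    print(req.get_method(), req.full_url)
```

BECAUSE THE REQUEST PATH BECOMES THE EVENT TAG, A WEB PAGE OR BACKEND CAN CHOOSE THE DESTINATION PIPELINE SIMPLY BY POSTING TO A DIFFERENT PATH.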
7. GOOD NEWS FROM FLUENTD
1. EASY INSTALLATION: SMALL (< 10 MB) AND DISTRIBUTED AS RUBYGEMS, RPM AND DEB PACKAGES.
2. TAG-BASED ROUTING: SIMPLIFIES AND SCALES DATA PIPELINE MANAGEMENT.
3. UNLIMITED CONNECTIVITY: TAKE ADVANTAGE OF 500+ COMMUNITY-CONTRIBUTED PLUGINS TO CONNECT IT TO
MANY DATA SOURCES AND DATA OUTPUTS. SOME FLUENTD USERS COLLECT DATA FROM THOUSANDS OF MACHINES IN REAL TIME. THANKS TO ITS
SMALL MEMORY FOOTPRINT (30~40 MB), YOU CAN SAVE A LOT OF MEMORY AT SCALE.
PROVEN RELIABILITY AND PERFORMANCE: 2,000+ DATA-DRIVEN COMPANIES RELY ON FLUENTD TO DIFFERENTIATE THEIR PRODUCTS AND SERVICES. IT
IS ONE OF THE DATA COLLECTION TOOLS RECOMMENDED BY AMAZON WEB SERVICES, WHICH NOTED IN ITS 2013 WHITE PAPER THAT WHILE ITS
ARCHITECTURE IS VERY SIMILAR TO APACHE FLUME OR SCRIBE, FLUENTD HAD BETTER DOCUMENTATION AND SUPPORT AND WAS EASIER TO BOTH
INSTALL AND MAINTAIN. GOOGLE CLOUD PLATFORM'S BIGQUERY RECOMMENDS FLUENTD AS THE DEFAULT REAL-TIME DATA-INGESTION TOOL, AND USES
GOOGLE'S CUSTOMIZED VERSION OF FLUENTD, CALLED GOOGLE-FLUENTD, AS A DEFAULT LOGGING AGENT.
FLUENTD IS APACHE 2.0 LICENSED, FULLY OPEN SOURCE SOFTWARE. THAT MEANS YOUR IMAGINATION, NOT LICENSE RESTRICTIONS, IS THE LIMIT OF
WHAT YOU CAN ACHIEVE WITH FLUENTD. THE SOURCE CODE IS AVAILABLE ON GITHUB. THE MOST PERFORMANCE-SENSITIVE PARTS OF FLUENTD ARE
WRITTEN IN C. THE RUBY CODE ACTS AS A WRAPPER THAT PROVIDES FLEXIBILITY TO THE OVERALL SOLUTION.
500+ COMMUNITY-CONTRIBUTED PLUGINS & 2,000+ DATA-DRIVEN COMPANIES. THE SOURCE CODE IS AVAILABLE ON GITHUB AND FLUENTD IS APACHE
2.0 LICENSED, FULLY OPEN SOURCE SOFTWARE.
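TO MAKE THE TAG-BASED ROUTING IDEA MORE CONCRETE, HERE IS A MINIMAL PYTHON SKETCH OF HOW DOT-SEPARATED TAGS CAN BE MATCHED AGAINST ORDERED PATTERNS. IT IS NOT FLUENTD'S IMPLEMENTATION: IT ONLY MODELS THE TWO WILDCARDS "*" (EXACTLY ONE TAG PART) AND "**" (ZERO OR MORE TAG PARTS) AND A FIRST-MATCH-WINS RULE. THE ROUTE TABLE, THE OUTPUT NAMES AND THE FUNCTION NAMES tag_matches AND route ARE HYPOTHETICAL.

```python
def _match_parts(pattern_parts, tag_parts):
    """Recursively match a list of pattern parts against a list of tag parts."""
    if not pattern_parts:
        return not tag_parts  # pattern exhausted: match only if tag is too
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # "**" may consume zero or more tag parts.
        return any(_match_parts(rest, tag_parts[i:])
                   for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        # "*" consumes exactly one part; a literal must be equal.
        return _match_parts(rest, tag_parts[1:])
    return False


def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches a simplified pattern."""
    return _match_parts(pattern.split("."), tag.split("."))


# Hypothetical route table: patterns are tried in order, first match wins.
ROUTES = [
    ("app.error.*", "alert"),
    ("app.**", "elasticsearch"),
    ("**", "archive"),
]


def route(tag, routes=ROUTES):
    """Return the output name of the first matching pattern, or None."""
    for pattern, output in routes:
        if tag_matches(pattern, tag):
            return output
    return None


if __name__ == "__main__":
    for tag in ("app.error.db", "app.access", "nginx.access"):
        print(tag, "->", route(tag))
```

BECAUSE EVERY EVENT CARRIES A TAG, ADDING A NEW DESTINATION IN THIS MODEL MEANS ADDING ONE PATTERN TO THE TABLE RATHER THAN CHANGING THE PRODUCERS, WHICH IS THE SENSE IN WHICH TAG-BASED ROUTING SIMPLIFIES PIPELINE MANAGEMENT.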