Mar 21, 2014
www.treasuredata.com/
Fluentd & AWS!
Masahiro Nakagawa
Treasure Data, Inc
Who are you?
• Masahiro Nakagawa
• @repeatedly
• Treasure Data, Inc
• Senior Software Engineer
• Fluentd, td-agent, etc...
• Dlang, MessagePack, ...
Treasure Data on AWS
Frontend, Queue, Worker, Hadoop, Fluentd
Applications push metrics to Fluentd (via local Fluentd)
Librato Metrics for realtime analysis
Treasure Data for historical analysis
Fluentd sums up data per minute (partial aggregation)
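The per-minute partial aggregation can be sketched roughly as follows. This is a minimal illustration in plain Python, not Fluentd code: the event layout `(unix_seconds, metric_name, value)` and the metric name `api.requests` are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_per_minute(events):
    """Sum metric values into (minute_start, metric_name) buckets.

    `events` is an iterable of (unix_seconds, metric_name, value) tuples.
    This tuple layout is hypothetical, chosen only for the sketch.
    """
    sums = defaultdict(float)
    for ts, name, value in events:
        minute_start = int(ts) - int(ts) % 60  # floor the timestamp to its minute
        sums[(minute_start, name)] += value
    return dict(sums)


# Three raw events collapse into two per-minute partial sums.
events = [
    (1395360001, "api.requests", 1),
    (1395360030, "api.requests", 2),
    (1395360065, "api.requests", 5),
]
partial = aggregate_per_minute(events)
```

Only the partial sums would then be forwarded to the realtime and historical backends, which keeps the number of records sent downstream small.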
Backend overview
Impala
Presto
Hadoop
Used AWS products
• RDS
  • Store service data
  • Queue / Scheduler
• S3
  • Columnar storage
• EC2
  • Clusters: Hadoop, Workers, APIs, etc…
Separate Storage and Processor!
Classmethod use case!
Fluentd
(Treasure Agent)
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
Collect Store Process Visualize
Data source
Reporting
Monitoring
Data Processing
Related Products
Store Process
Cloudera
Hortonworks
Treasure Data
Collect Visualize
Tableau
Excel
R
easier & shorter time
???
Before…
Application
・・・
Server2
Application
・・・
Server3
Application
・・・
Server1
Log Server
High Latency!
must wait for a day...
Divide & Conquer & Retry
error retry
error retry retry
retry
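The "Divide & Conquer & Retry" idea can be sketched as follows: split the data into small chunks and retry only the chunk that failed. This is a minimal illustration, not Fluentd's implementation. The helper names, the `send` callback, and the doubling wait between attempts are assumptions made for the example.

```python
import time


def split_into_chunks(records, chunk_size):
    """Divide: cut the record list into chunks of at most `chunk_size`."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def send_with_retry(chunk, send, max_retries=3, base_wait=1.0, sleep=time.sleep):
    """Retry: call `send(chunk)` until it succeeds or retries run out.

    `send` is a hypothetical callback that raises IOError on failure.
    The wait doubles after each failed attempt (illustrative choice).
    """
    for attempt in range(max_retries + 1):
        try:
            send(chunk)
            return True
        except IOError:
            if attempt == max_retries:
                return False
            sleep(base_wait * (2 ** attempt))
    return False


# Demo: the second call to the destination fails once, then recovers.
calls = {"n": 0}
delivered = []


def flaky_send(chunk):
    calls["n"] += 1
    if calls["n"] == 2:
        raise IOError("temporary failure")
    delivered.extend(chunk)


chunks = split_into_chunks(list(range(5)), 2)
results = [send_with_retry(c, flaky_send, sleep=lambda seconds: None) for c in chunks]
```

Because only the failed chunk is sent again, one error costs a small retry rather than a re-transfer of the whole day's log.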
After!
Application
・・・
Server2
Application
・・・
Server3
Application
・・・
Server1
Fluentd Fluentd Fluentd
Fluentd Fluentd
In streaming!
Lambda Architecture
http://www.drdobbs.com/database/applying-the-big-data-lambda-architectur/240162604
In short
• Open source log collector written in Ruby
• Customization is essential
  • small core + many plugins
Fluentd is a robust log collector
designed for processing data streams
Core
• Divide & Conquer
• Buffering & Retrying
• Error handling
• Message routing
• Parallelize
Plugins
• read / receive data
• write / send data
M x N → M + N
Nagios
MongoDB
Hadoop
Alerting
Amazon S3
Analysis
Archiving
MySQL
Apache
Frontend
Access logs
syslogd
App logs
System logs
Backend
Databases
filter / buffer / routing
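The M x N → M + N claim can be checked with a small count. The sketch below takes source and destination names from the diagram labels; grouping them into these two lists is an assumption made for the example.

```python
from itertools import product

# Names taken from the slide; the grouping into lists is an assumption.
sources = ["Access logs", "App logs", "System logs", "Databases"]       # M = 4
destinations = ["Nagios", "MongoDB", "MySQL", "Hadoop", "Amazon S3"]     # N = 5

# Without a hub, every source needs its own integration with every destination.
direct_links = list(product(sources, destinations))

# With a hub in the middle, each source and each destination connects once.
hub_links = [(s, "Fluentd") for s in sources] + [("Fluentd", d) for d in destinations]

print(len(direct_links), len(hub_links))
```

With 4 sources and 5 destinations this gives 20 direct integrations versus 9 through the hub, and adding one more destination costs one new link instead of four.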
Pluggable Architecture
Input
> Forward
> HTTP
> File tail
> dstat
> ...
Output
> Forward
> File
> MongoDB
> ...
Buffer
> File
> Memory
Engine
Output
> rewrite
> ...
Pluggable Pluggable
Next release
• Fluentd v0.10.45
  • in_tail supports multiline and * (wildcard) watch
  • in_exec supports json / msgpack
  • several fixes
• td-agent 1.1.19
AWS use cases
Collecting instance logs
• A sign of Immutable Infrastructure
• Hard to manage stateful instances
• Almost all instances should be disposable
• Excluding DB, Master, etc...
• How to manage such instance logs?
• Common problem in cloud environments
• Start Fluentd at the launch phase
• It is also useful for Docker / other containers
• Include metadata or the hostname to identify the instance
Collecting using Fluentd
Collector Aggregator
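Attaching the host and other metadata to each record, so that logs from disposable instances stay identifiable after the instance is gone, might look like the sketch below. This is plain Python rather than a Fluentd plugin, and the function name and the `host` / `role` field names are assumptions made for the example.

```python
import socket


def add_instance_metadata(record, hostname=None, extra=None):
    """Return a copy of `record` with identifying metadata attached.

    The `host` key and the `extra` fields are hypothetical names; use
    whatever your collector and aggregator agree on.
    """
    enriched = dict(record)  # copy so the caller's record is not modified
    enriched["host"] = hostname or socket.gethostname()
    if extra:
        enriched.update(extra)
    return enriched


original = {"message": "worker started"}
enriched = add_instance_metadata(original, hostname="web-01", extra={"role": "frontend"})
```

Doing this on the collector side at launch means the aggregator can still tell records apart once many short-lived instances share the same stream.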
AWS Plugins
http://fluentd.org/plugin/
• s3
• dynamodb
• redshift
• rds
• elb
• cloudwatch
• sns
• sqs
• ses
• kinesis (soon!)

Fluentd and AWS at classmethod