Masahiro Nakagawa
Apr 18, 2015
Game Server meetup #4
Fluentd /
Embulk
For reliable transfer
Who are you?
> Masahiro Nakagawa
> github/twitter: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> Living at OSS :)
> D language - Phobos committer
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of several meetups (Presto, DTM, etc…)
> etc…
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
What’s Fluentd?
> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Written in Ruby
> Various gem-based plugins
> http://www.fluentd.org/plugins
> Working in production
> http://www.fluentd.org/testimonials
Background
Data Analytics Flow
Collect Store Process Visualize
Data source
Reporting
Monitoring
Data Analytics Flow
Store Process
Cloudera
Hortonworks
Treasure Data
Collect Visualize
Tableau
Excel
R
easier & shorter time
???
TD Service Architecture
Time to Value
Send query result 
Result Push
Acquire
 Analyze
Store
Plazma DB
Flexible, Scalable,
Columnar Storage
Web Log
App Log
Sensor
CRM
ERP
RDBMS
Treasure Agent(Server)
SDK(JS, Android, iOS, Unity)
Streaming Collector
Batch / Reliability
Ad-hoc / Low latency
KPIs
KPI Dashboard
BI Tools
Other Products
RDBMS, Google Docs,
AWS S3, FTP Server, etc.
Metric Insights 
Tableau, 
Motion Board etc. 
POS
REST API
ODBC / JDBC
SQL, Pig 
Bulk Uploader
Embulk,

TD Toolbelt
SQL-based query
@AWS or @IDCF
Connectivity
Economy & Flexibility Simple & Supported
Dive into Concept
Divide & Conquer & Retry
error retry
error retry retry
retry
Batch
Stream
Other stream
Application
・・・
Server2
Application
・・・
Server3
Application
・・・
Server1
FluentLog Server
High Latency!
must wait for a day...
Before…
Application
・・・
Server2
Application
・・・
Server3
Application
・・・
Server1
Fluentd Fluentd Fluentd
Fluentd Fluentd
In streaming!
After…
Why JSON / MessagePack? (1)
> Schema on Write (traditional MPP DB)
> Write data using a schema to improve query performance
> Pros
> minimum query overhead
> Cons
> Need to design the schema and workload in advance
> Data load is an expensive operation
Why JSON / MessagePack? (2)
> Schema on Read (Hadoop)
> Write data without a schema and map the schema at query time
> Pros
> Robust against schema and workload changes
> Data load is a cheap operation
> Cons
> High overhead at query time
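For example (hypothetical records), schema-on-read lets these two events land in the same table even though the second adds a new field; a schema is applied only when a query reads them:
{"status": 200, "path": "/"}
{"status": 500, "path": "/api", "error": "timeout"}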
Features
Core (common concerns):
> Divide & Conquer
> Buffering & Retrying
> Error handling
> Message routing
> Parallelism
Plugins (use-case specific):
> Read / receive data
> Parse data
> Filter data
> Buffer data
> Format data
> Write / send data
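To make the plugin side concrete, here is a rough sketch of a minimal non-buffered output plugin using the v0.10/v0.12-era Ruby API (the plugin name and the printing behaviour are invented for illustration):

# lib/fluent/plugin/out_stdout_sample.rb
require 'fluent/output'

module Fluent
  class StdoutSampleOutput < Output
    # usable as "@type stdout_sample" (or "type stdout_sample") in a <match> section
    Plugin.register_output('stdout_sample', self)

    def configure(conf)
      super
      # read plugin parameters from the <match> section here
    end

    # called for each event stream routed to this <match>
    def emit(tag, es, chain)
      es.each do |time, record|
        puts "#{tag} #{Time.at(time)} #{record}"  # print events just for illustration
      end
      chain.next  # acknowledge so upstream can continue
    end
  end
end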
Event structure (log message)
✓ Time
> second unit by default
> comes from the data source
✓ Tag
> used for message routing
> where is it from?
✓ Record
> JSON format
> MessagePack internally
> schema-free
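Putting the three parts together, one event looks roughly like this (values are invented for illustration):
tag:    apache.access
time:   2015-04-18 12:34:56
record: {"host": "127.0.0.1", "method": "GET", "path": "/"}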
Architecture (v0.12 or later)
Engine (not pluggable): Input → Filter → Output, with Buffer
Input
> Forward
> File tail
> ...
Filter
> grep
> record_transformer
> ...
Output
> Forward
> File
> ...
Buffer (attached to Output)
> File
> Memory
Parser / Formatter
Configuration and operation
> No central / master node
> @include helps with configuration sharing
> Operation depends on your environment
> Use your own daemon / deploy tools
> We use Chef at Treasure Data
> Apache-like config syntax
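As a minimal sketch of configuration sharing with @include (the paths and URL below are hypothetical):
# /etc/td-agent/td-agent.conf
@include /etc/td-agent/conf.d/*.conf          # pull in shared <source>/<match> snippets
@include http://conf.example.com/common.conf  # remote includes also work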
How to use
Setup fluentd (e.g. Ubuntu)
$ apt-get install ruby
$ gem install fluentd
$ edit fluent.conf
$ fluentd -c fluent.conf
http://docs.fluentd.org/articles/faq#w-what-version-of-ruby-does-fluentd-support
Treasure Agent (td-agent)
> Treasure Data distribution of Fluentd
> includes Ruby, popular plugins, etc.
> Treasure Agent 2 is the current stable release
> Recommended: use v2, not v1
> rpm, deb and dmg packages
> Latest version is 2.2.0, bundling Fluentd v0.12
Setup td-agent
$ curl -L http://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh
$ edit /etc/td-agent/td-agent.conf
$ sudo service td-agent start
See: http://docs.fluentd.org/categories/installation
Apache to Mongo
tail
insert
event
buffering
routing
127.0.0.1 - - [11/Dec/2014:07:26:27] "GET / ...
127.0.0.1 - - [11/Dec/2014:07:26:30] "GET / ...
127.0.0.1 - - [11/Dec/2014:07:26:32] "GET / ...
127.0.0.1 - - [11/Dec/2014:07:26:40] "GET / ...
127.0.0.1 - - [11/Dec/2014:07:27:01] "GET / ...
...
Fluentd
Web Server
2014-02-04 01:33:51 apache.log {
  "host": "127.0.0.1",
  "method": "GET",
  ...
}
Plugins - use rubygems
$ fluent-gem search -rd fluent-plugin
$ fluent-gem search -rd fluent-mixin
$ fluent-gem install fluent-plugin-mongo

In td-agent:
$ /usr/sbin/td-agent-gem install fluent-plugin-mongo
# receive events via HTTP
<source>
  @type http
  port 8888
</source>

# read logs from a file
<source>
  @type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  @type mongo
  database apache
  collection log
</match>
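With the http source above you can emit a test event from the shell; the URL path becomes the tag (the tag and record below are just examples):
$ curl -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/debug.test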
# save alerts to a file
<match alert.**>
  @type file
  path /var/log/fluent/alerts
</match>

# forward other logs to servers
<match **>
  @type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

@include http://example.com/conf
Filter
> Apply filtering routines to the event stream
> No more tag tricks!
Before (v0.10): re-tag with an output plugin
<match access.**>
  @type record_reformer
  tag reformed.${tag}
</match>

<match reformed.**>
  @type growthforecast
</match>

After (v0.12): use a filter, no re-tagging
<filter access.**>
  @type record_transformer
  …
</filter>

<match access.**>
  @type growthforecast
</match>
or Embulk
Nagios
MongoDB
Hadoop
Alerting
Amazon S3
Analysis
Archiving
MySQL
Apache
Frontend
Access logs
syslogd
App logs
System logs
Backend
Databases
buffering / processing / routing
M x N → M + N
Roadmap
> v0.10 (old stable)
> v0.12 (current stable)
> Filter / Label / At-least-once
> v0.14 (spring - early summer, 2015)
> New plugin APIs, ServerEngine, Time…
> v1 (summer - fall, 2015)
> Fix new features / APIs
https://github.com/fluent/fluentd/wiki/V1-Roadmap
Use-cases
Simple forwarding
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2014-12-11 07:56:01 myapp.login {"user":38}
Client libraries
> Ruby
> Java
> Perl
> PHP
> Python
> D
> Scala
> ...
Less Simple Forwarding
- At-most-once / At-least-once
- HA (failover)
- Load-balancing
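A minimal sketch of such a setup (hosts and tag pattern are hypothetical): out_forward load-balances by weight, fails over to a standby server, and require_ack_response (v0.12) switches delivery from at-most-once to at-least-once:
<match backend.**>
  @type forward
  require_ack_response         # at-least-once delivery (v0.12)
  <server>
    host 192.168.0.11
    weight 60
  </server>
  <server>
    host 192.168.0.12
    weight 20
  </server>
  <server>
    host 192.168.0.13
    standby                    # used only when the other servers are unavailable
  </server>
</match>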
All data
Near realtime and batch combo!
Hot data
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
CEP for Stream Processing
Norikra is a SQL-based CEP engine: http://norikra.github.io/
Container Logging
Fluentd on Kubernetes / GCE
> Kubernetes
> Google Compute Engine
> https://cloud.google.com/logging/docs/install/compute_install
Treasure Data
Frontend
Job Queue
Worker
Hadoop
Presto
Fluentd
Applications push metrics to Fluentd (via local Fluentd)
Datadog for realtime monitoring
Treasure Data for historical analysis
Fluentd sums up data every few minutes (partial aggregation)
hundreds of app servers
sends event logs
sends event logs
sends event logs
Rails app td-agent
td-agent
td-agent
Google
Spreadsheet
Treasure Data
MySQL
Logs are available after several mins.
Daily/Hourly Batch
KPI visualization
Feedback rankings
Rails app
Rails app
Unlimited scalability
Flexible schema
Realtime
Less performance impact
Cookpad
✓ Over 100 RoR servers (2012/2/4)
Slideshare
http://engineering.slideshare.net/2014/04/skynet-project-monitor-scale-and-auto-heal-a-system-in-the-cloud/
Log Analysis System And its designs in LINE Corp. 2014 early
Line BusinessConnect
http://developers.linecorp.com/blog/?p=3386
Eco-system
fluent-bit
> Made for Embedded Linux
> OpenEmbedded & Yocto Project
> Intel Edison, Raspberry Pi & BeagleBone Black boards
> https://github.com/fluent/fluent-bit 
> Standalone application or Library mode
> Built-in plugins
> input: cpu, kmsg, output: fluentd
> First release at the end of Mar 2015
fluentd-forwarder
> Forwarding agent written in Go
> Focused on log forwarding to Fluentd
> Works on Windows
> Bundles TCP input/output and TD output
> No flexible plugin mechanism
> We plan to add more inputs/outputs
> Similar products
> fluent-agent-lite, fluent-agent-hydra, ik
fluentd-ui
> Manage Fluentd instances via a Web UI
> https://github.com/fluent/fluentd-ui
Bulk loading
Parallel processing
Pluggable architecture
http://embulk.org/
The problems at Treasure Data
> Treasure Data Service on the Cloud
> Customers want to try Treasure Data, but
> SEs write scripts to bulk load their data. Hard work :(
> Customers want to migrate their big data, but
> Hard work :(
> Fluentd solved streaming data collection, but
> bulk data loading is another problem.
Embulk
> Bulk Loader version of Fluentd
> Pluggable architecture
> JRuby, JVM languages
> High performance parallel processing
> Share your script as a plugin
> https://github.com/embulk
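Embulk plugins are distributed as Ruby gems and installed with the bundled gem command (the plugin name below is just an example):
$ embulk gem install embulk-output-elasticsearch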
The problems of bulk load
> Data cleaning (normalization)
> How to normalize broken records?
> Error handling
> How to remove broken records?
> Idempotent retrying
> How to retry without duplicated loading?
> Performance optimization
HDFS
MySQL
Amazon S3
Embulk
CSV Files
SequenceFile
Salesforce.com
Elasticsearch
Cassandra
Hive
Redis
✓ Parallel execution
✓ Data validation
✓ Error recovery
✓ Deterministic behaviour
✓ Idempotent retrying
Plugins Plugins
bulk load
http://www.embulk.org/plugins/
How to use
Setup embulk (e.g. Linux/Mac)
$ curl --create-dirs -o ~/.embulk/bin/embulk -L "http://dl.embulk.org/embulk-latest.jar"
$ chmod +x ~/.embulk/bin/embulk
$ echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc
Try example
$ embulk example ./try1
$ embulk guess ./example.yml -o config.yml
$ embulk preview config.yml
$ embulk run config.yml
Guess format & schema

# install
$ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
$ chmod 755 embulk.jar

# guess
$ vi example.yml
$ ./embulk guess example.yml -o config.yml

Seed config:
in:
  type: file
  path_prefix: /path/to/sample_
out:
  type: stdout

Guessed config (filled in by guess plugins):
in:
  type: file
  path_prefix: /path/to/sample_
  decoders:
  - {type: gzip}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    skip_header_lines: 1
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
out:
  type: stdout
Preview & fix config

# install
$ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
$ chmod 755 embulk.jar

# guess
$ vi example.yml
$ ./embulk guess example.yml -o config.yml

# preview
$ ./embulk preview config.yml
$ vi config.yml # if necessary

+-------------------------+----------+-------------+
| time:timestamp          | uid:long | word:string |
+-------------------------+----------+-------------+
| 2015-01-27 19:23:49 UTC |   32,864 | embulk      |
| 2015-01-27 19:01:23 UTC |   14,824 | jruby       |
| 2015-01-28 02:20:02 UTC |   27,559 | plugin      |
| 2015-01-29 11:54:36 UTC |   11,270 | fluentd     |
+-------------------------+----------+-------------+
Deterministic run

# install
$ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
$ chmod 755 embulk.jar

# guess
$ vi example.yml
$ ./embulk guess example.yml -o config.yml

# preview
$ ./embulk preview config.yml
$ vi config.yml # if necessary

# run
$ ./embulk run config.yml -o config.yml

exec: {}
in:
  type: file
  path_prefix: /path/to/sample_
  decoders:
  - {type: gzip}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    skip_header_lines: 1
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
  last_path: /path/to/sample_001.csv.gz
out:
  type: stdout
Repeat

# install
$ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
$ chmod 755 embulk.jar

# guess
$ vi example.yml
$ ./embulk guess example.yml -o config.yml

# preview
$ ./embulk preview config.yml
$ vi config.yml # if necessary

# run
$ ./embulk run config.yml -o config.yml

# repeat
$ ./embulk run config.yml -o config.yml
$ ./embulk run config.yml -o config.yml

exec: {}
in:
  type: file
  path_prefix: /path/to/sample_
  decoders:
  - {type: gzip}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    skip_header_lines: 1
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
  last_path: /path/to/sample_01.csv.gz
out:
  type: stdout
Use-cases
Quipper from GDS slide
Other cases
> Treasure Data
> Embulk worker for automatic import
> Web services
> Send existing logs to Elasticsearch
> Business / Batch systems
> Database to Database
> etc…
Check: treasuredata.com
Cloud service for the entire data pipeline
