The Wonderful Relationship Between Fluentd and Redshift

18th AWS User Group - Japan Tokyo Meetup
@just_do_neet

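Redshift is convenient
• The other speakers will no doubt cover this with enthusiasm, so I will skip it...
• That said, there are some pain points
• How do you get the data to S3/Redshift in the first place?
• Loading a large amount of data in one batch takes a long time
• Loading it in small pieces, on the other hand, is tedious
• Controlling all of this yourself takes effort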

Fluentd
• An open-source log collector
• Easy to adopt, with good performance, reliability, and extensibility
• A rich set of plugins
• fluent-plugin-s3
• fluent-plugin-redshift

fluent-plugin-redshift
• https://github.com/hapyrus/fluent-plugin-redshift/
• A Fluentd plugin that loads data into Redshift
• Supports CSV, TSV, JSON, and other formats
• The timing of loads into Redshift is tunable (buffer_chunk_limit / flush_interval)
• Each chunk is saved to S3, then loaded into Redshift with the COPY command
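The two buffer settings named above can be pictured as "flush when the chunk gets too big, or when enough time has passed, whichever comes first". The following is a minimal Python sketch of that idea only. The class `ChunkBuffer` and its methods are hypothetical names for illustration and are not Fluentd's actual buffer implementation.

```python
class ChunkBuffer:
    """Toy buffer that flushes on a size limit or a time interval, whichever comes first."""

    def __init__(self, chunk_limit_bytes, flush_interval_sec, on_flush):
        self.chunk_limit_bytes = chunk_limit_bytes    # plays the role of buffer_chunk_limit
        self.flush_interval_sec = flush_interval_sec  # plays the role of flush_interval
        self.on_flush = on_flush                      # callback that receives one chunk (a list of records)
        self.records = []
        self.size = 0
        self.last_flush = 0.0

    def append(self, record, now):
        # If this record would push the chunk past the limit, flush the current chunk first.
        if self.records and self.size + len(record) > self.chunk_limit_bytes:
            self.flush(now)
        self.records.append(record)
        self.size += len(record)

    def tick(self, now):
        # Meant to be called periodically; flushes once the interval has elapsed.
        if self.records and now - self.last_flush >= self.flush_interval_sec:
            self.flush(now)

    def flush(self, now):
        self.on_flush(self.records)
        self.records = []
        self.size = 0
        self.last_flush = now


flushed = []
buf = ChunkBuffer(chunk_limit_bytes=10, flush_interval_sec=900, on_flush=flushed.append)
buf.append(b"aaaaaa", now=0)  # 6 bytes buffered
buf.append(b"bbbbbb", now=1)  # 6 + 6 > 10, so the first chunk is flushed before appending
buf.tick(now=100)             # interval not reached yet, nothing happens
buf.tick(now=901)             # 901 - 1 >= 900, so the second chunk is flushed
```

A larger limit and a longer interval mean fewer, bigger loads into Redshift; smaller values mean fresher data at the cost of more frequent loads.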

redshift plugin: example configuration

<match my.tag>
  type redshift

  # s3 (for copying data to redshift)
  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket YOUR_S3_BUCKET
  s3_endpoint YOUR_S3_BUCKET_END_POINT
  path YOUR_S3_PATH
  timestamp_key_format year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M

  # redshift
  redshift_host YOUR_AMAZON_REDSHIFT_CLUSTER_END_POINT
  redshift_port YOUR_AMAZON_REDSHIFT_CLUSTER_PORT
  redshift_dbname YOUR_AMAZON_REDSHIFT_CLUSTER_DATABASE_NAME
  redshift_user YOUR_AMAZON_REDSHIFT_CLUSTER_USER_NAME
  redshift_password YOUR_AMAZON_REDSHIFT_CLUSTER_PASSWORD
  redshift_schemaname YOUR_AMAZON_REDSHIFT_CLUSTER_TARGET_SCHEMA_NAME
  redshift_tablename YOUR_AMAZON_REDSHIFT_CLUSTER_TARGET_TABLE_NAME
  file_type [tsv|csv|json|msgpack]

  # buffer
  buffer_type file
  buffer_path /var/log/fluent/redshift
  flush_interval 15m
  buffer_chunk_limit 1g
</match>
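The `timestamp_key_format` value in the configuration is a strftime-style pattern. As a rough illustration, assuming standard strftime semantics, the Python sketch below expands it for one point in time. The function name `timestamp_key` is hypothetical, and how the plugin combines this key with the bucket, the path, and a file suffix is not shown here.

```python
from datetime import datetime

# Pattern copied from the example configuration.
TIMESTAMP_KEY_FORMAT = "year=%Y/month=%m/day=%d/hour=%H/%Y%m%d-%H%M"


def timestamp_key(ts: datetime) -> str:
    """Expand the strftime-style pattern for the given time."""
    return ts.strftime(TIMESTAMP_KEY_FORMAT)


key = timestamp_key(datetime(2013, 10, 4, 20, 33))
print(key)  # year=2013/month=10/day=04/hour=20/20131004-2033
```

Partition-style prefixes such as `year=.../month=...` keep the objects for each hour grouped under one S3 prefix.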

Example 1: nginx logs
• Store nginx access logs in Redshift via Fluentd
• in_tail (reads the log file) → out_redshift

nginx log format

log_format ltsv 'time:$time_local\t'
                'host:$remote_addr\t'
                'req:$request\t'
                'status:$status\t'
                'size:$body_bytes_sent\t'
                'referer:$http_referer\t'
                'ua:$http_user_agent\t';

Sample log line (the fields are tab-separated on a single line; shown here one field per line):

time:02/Oct/2013:20:32:31 +0900
host:xxx.xxx.xxx.xxx
req:GET /musicians/famous/ HTTP/1.1
status:200
size:2172
referer:http://www.sada.co.jp/index.html
ua:Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53
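LTSV is simple enough that parsing it fits in a few lines: split the line on tabs, then split each field on its first colon. The Python sketch below illustrates the format only. The function `parse_ltsv` is a hypothetical helper, not the parser Fluentd uses internally.

```python
def parse_ltsv(line: str) -> dict:
    """Parse one LTSV line into a dict of label -> value (all values stay strings)."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        if not field:
            continue  # skip empty fields, e.g. the one after a trailing tab
        # Split on the first colon only, because values such as timestamps contain colons.
        label, _, value = field.partition(":")
        record[label] = value
    return record


line = "\t".join([
    "time:02/Oct/2013:20:32:31 +0900",
    "host:xxx.xxx.xxx.xxx",
    "req:GET /musicians/famous/ HTTP/1.1",
    "status:200",
    "size:2172",
])
record = parse_ltsv(line)
```

Because every field carries its own label, fields can be added or reordered in the nginx configuration without breaking the parser.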

Table definition on Redshift

-- Redshift DDL
create table access_log(
  time varchar(255),
  host varchar(255),
  req varchar(255),
  status integer,
  size integer,
  referer varchar(255),
  ua varchar(255)
);

Fluentd configuration (in_tail → out_redshift)

# from access_log
<source>
  type tail
  tag nginx.access
  format ltsv
  path /var/log/nginx/access.log
  pos_file /var/log/fluentd/nginx_access.log.pos
</source>

# to Redshift
<match nginx.access>
  type jsonbucket
  out_tag redshift.nginx.access
  json_key log
</match>

<match redshift.nginx.access>
  type redshift

  # s3 (for copying data to redshift)
  (snip.)

  # redshift
  (snip.)
  redshift_tablename access_log
  file_type json

  # buffer
  (snip.)
</match>
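The middle `jsonbucket` step sits between the tail input and the redshift output. The slides do not spell out what it does. The Python sketch below assumes it serializes the whole record to a JSON string, stores that string under `json_key`, and re-emits the event with `out_tag`. Treat that behavior as an assumption. The function `json_bucket` is a hypothetical stand-in, not the plugin's code.

```python
import json


def json_bucket(tag, record, out_tag="redshift.nginx.access", json_key="log"):
    """Assumed behavior: wrap the entire record as one JSON string under json_key."""
    # The incoming tag is replaced by out_tag, so the next <match> block picks the event up.
    return out_tag, {json_key: json.dumps(record)}


new_tag, bucketed = json_bucket(
    "nginx.access",
    {"host": "xxx.xxx.xxx.xxx", "status": "200", "size": "2172"},
)
```

Under this assumption the redshift output receives a single `log` field holding the original record as JSON, which matches `file_type json` in the last block.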

Fluentd log (written when a load into Redshift succeeds)

# Fluentd log_file
2013-10-04 20:33:16 +0900 [info]: completed copying to redshift. s3_uri=s3://xxxxxx/redshift/access_log/year=2013/month=10/day=04/hour=20/20131004-2033_01.gz
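The log line reports that a gzipped object on S3 was copied into Redshift. As a rough illustration of what such a load looks like, the Python sketch below builds a Redshift COPY statement for that object. The function `build_copy_sql`, the tab delimiter, and the exact option list are assumptions for illustration. The statement the plugin actually issues may differ, and the sketch does no escaping of its inputs.

```python
def build_copy_sql(table, s3_uri, aws_key_id, aws_sec_key, delimiter=r"\t"):
    """Build a COPY statement that loads one gzipped, delimited S3 object into a table."""
    credentials = f"aws_access_key_id={aws_key_id};aws_secret_access_key={aws_sec_key}"
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"CREDENTIALS '{credentials}' "
        f"DELIMITER '{delimiter}' GZIP;"
    )


sql = build_copy_sql(
    table="access_log",
    s3_uri="s3://xxxxxx/redshift/access_log/year=2013/month=10/day=04/hour=20/20131004-2033_01.gz",
    aws_key_id="YOUR_AWS_KEY_ID",
    aws_sec_key="YOUR_AWS_SECRET_KEY",
)
print(sql)
```

One COPY per chunk is what makes the chunk size and flush interval the main knobs for load frequency.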

Contents of the Redshift table

redshift=# select * from access_log limit 1;
 time | host | req | status | size | referer | ua
------+------+-----+--------+------+---------+----
 04/Oct/2013:20:32:31 +0900 | xxx.xxx.xxx.xxx | GET /musicians/famous/ HTTP/1.1 | 200 | 2172 | http://www.sada.co.jp/index.html | Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53

Example 2: Adding location information
• Leave the source data untouched, and transform it inside Fluentd before saving it to Redshift
• fluent-plugin-record-modifier
• fluent-plugin-time_parser
• fluent-plugin-reassemble
• fluent-plugin-geoip

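• https://github.com/y-ken/fluent-plugin-geoip
• Uses the GeoIP database provided by MaxMind to look up location information (latitude, longitude, city name) from an IP address and attach it to each record
• Works with both the paid and the free database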

Table definition on Redshift (extended with location information)

-- Redshift DDL
create table access_log(
  time varchar(255),
  host varchar(255),
  req varchar(255),
  status integer,
  size integer,
  referer varchar(255),
  ua varchar(255),
  city varchar(100),
  latitude real,
  longitude real
);

Fluentd configuration (out_geoip)

# add location info
<match nginx.access>
  type geoip
  geoip_lookup_key host
  enable_key_city city
  enable_key_latitude latitude
  enable_key_longitude longitude
  add_tag_prefix geoip.
</match>
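Conceptually, this step looks up the IP address found in the `host` field, copies the city, latitude, and longitude into the record, and prefixes the tag with `geoip.`. The Python sketch below mimics that flow with an in-memory dictionary. `FAKE_GEOIP_DB` and `add_location` are hypothetical, the address and coordinates are made-up sample values, and no real MaxMind database or plugin code is involved.

```python
# Hypothetical stand-in for a GeoIP database: IP address -> location (sample values only).
FAKE_GEOIP_DB = {
    "203.0.113.10": {"city": "Tokyo", "latitude": 35.68, "longitude": 139.69},
}


def add_location(tag, record, lookup_key="host", tag_prefix="geoip."):
    """Return a new tag and a copy of the record with city/latitude/longitude added."""
    enriched = dict(record)  # leave the original record untouched
    location = FAKE_GEOIP_DB.get(record.get(lookup_key), {})
    for key in ("city", "latitude", "longitude"):
        enriched[key] = location.get(key)  # None when the address is not found
    return tag_prefix + tag, enriched


original = {"host": "203.0.113.10", "status": "200"}
new_tag, enriched = add_location("nginx.access", original)
_, unknown = add_location("nginx.access", {"host": "198.51.100.1"})
```

The three added keys line up with the `city`, `latitude`, and `longitude` columns in the extended table definition.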

Visualizing with Tableau

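Summary
• This talk covered Fluentd x Redshift.
• With Fluentd, you can load and transform data however you like.
• It is convenient, so give it a try.
• For those who find the environment setup tedious, there is apparently a handy service called "flydata".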
