Amebaサービスのログ解析基盤 (Log Analysis Platform for Ameba Services)

Slides from the first mixi × CyberAgent joint study session.


Amebaサービスのログ解析基盤: Presentation Transcript

  • Ameba Neutral Technology Group
  • About me: 27 years old; Neutral Technology Group; Twitter: @brfrn169, Hatena: brfrn169
  • Agenda: Patriot (Hadoop/Hive) and Stinger (Flume + HBase)
  • Hadoop/Hive: Patriot
  • Patriot timeline: [2009] development started → [2010] went live (GO) → WebUI
  • Patriot: the log analysis platform for Ameba services
  • Components: HDFS (storage), Hive (Map/Reduce), Patriot WebUI, Hue
  • Patriot
  • Hadoop: HDFS, a distributed file system that stores data in large blocks (default 64MB) with replication; MapReduce, a programming model that splits processing into a map phase (emit key/value pairs) and a reduce phase (aggregate them)
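The map/reduce split described above can be sketched in plain Python: a map phase turns each record into key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is an illustrative sketch of the model, not Hadoop's actual API; the sample records follow the login-log format shown later in the deck.

```python
from collections import defaultdict

# Sketch of the MapReduce flow: map -> shuffle (group by key) -> reduce.
# Counts logins per user from "time<TAB>ameba_id" records.

def map_phase(record):
    """Map: emit (key, value) pairs -- here (ameba_id, 1) per login."""
    time, ameba_id = record.split("\t")
    yield ameba_id, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate the values for one key -- here a login count."""
    return key, sum(values)

records = [
    "2011-05-13 00:12:34\tyamada_taro",
    "2011-05-13 02:23:45\tsuzuki_ichiro",
    "2011-05-13 04:56:34\tyamada_taro",
]
pairs = [kv for r in records for kv in map_phase(r)]
result = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(result)  # {'yamada_taro': 2, 'suzuki_ichiro': 1}
```

In real Hadoop the map and reduce functions run in parallel across TaskTrackers and the shuffle happens over the network; only the programming model is the same.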
  • Hadoop architecture: the NameNode holds HDFS metadata (with a Secondary NameNode as backup) and DataNodes store the blocks; a JobClient submits MapReduce jobs to the JobTracker, which schedules map/reduce tasks on TaskTrackers colocated with the DataNodes, reading from and writing to HDFS
  • Hive: a data warehouse layer on top of Hadoop, originally developed at Facebook; HiveQL, a SQL-like language, is compiled into MapReduce jobs (cf. Pig)
  • Hive features: ‣ HiveQL: SQL-like queries compiled to MapReduce ‣ Metastore: Derby by default; Patriot uses MySQL ‣ Partitions: prune scans by partition key
  • Hive example input: pigg_login.log
    2011-05-13 00:12:34 yamada_taro
    2011-05-13 02:23:45 suzuki_ichiro
    2011-05-13 03:34:56 brfrn169
    2011-05-13 04:56:34 yamada_taro
    2011-05-13 05:23:45 suzuki_ichiro
    2011-05-13 06:45:56 yamada_taro
    2011-05-13 07:56:23 yamada_hanako
    2011-05-13 08:45:56 yamada_taro
    2011-05-13 09:12:34 yamada_hanako
  • Hive DDL:
    CREATE TABLE pigg_login (
      time STRING,
      ameba_id STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    LOAD DATA INPATH '/path/pigg_login.log' INTO TABLE pigg_login
    PARTITION (dt='2011-05-13');
  • HiveQL examples:
    ‣ Daily UU (unique users):
      SELECT count(distinct ameba_id) FROM pigg_login WHERE dt='2011-05-13';
    ‣ Monthly UU:
      SELECT count(distinct ameba_id) FROM pigg_login WHERE dt LIKE '2011-05-__';
    ‣ UU by age (JOIN, GROUP BY):
      SELECT p.age, count(distinct l.ameba_id)
      FROM pigg_login l JOIN profile p ON (l.ameba_id = p.ameba_id)
      WHERE l.dt='2011-05-13' GROUP BY p.age;
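What the UU queries compute can be checked by hand against the sample pigg_login.log shown earlier. This Python sketch reproduces just the SELECT count(distinct ameba_id) logic, treating the partition filter as a date-prefix test on the timestamp:

```python
# Sample records from pigg_login.log (time, ameba_id), as in the slides.
rows = [
    ("2011-05-13 00:12:34", "yamada_taro"),
    ("2011-05-13 02:23:45", "suzuki_ichiro"),
    ("2011-05-13 03:34:56", "brfrn169"),
    ("2011-05-13 04:56:34", "yamada_taro"),
    ("2011-05-13 05:23:45", "suzuki_ichiro"),
    ("2011-05-13 06:45:56", "yamada_taro"),
    ("2011-05-13 07:56:23", "yamada_hanako"),
    ("2011-05-13 08:45:56", "yamada_taro"),
    ("2011-05-13 09:12:34", "yamada_hanako"),
]

def daily_uu(rows, dt):
    """count(distinct ameba_id) WHERE dt='...': daily unique users."""
    return len({ameba_id for time, ameba_id in rows if time.startswith(dt)})

def monthly_uu(rows, month):
    """count(distinct ameba_id) WHERE dt LIKE 'YYYY-MM-__'."""
    return len({ameba_id for time, ameba_id in rows if time.startswith(month)})

print(daily_uu(rows, "2011-05-13"))  # 4 distinct users
print(monthly_uu(rows, "2011-05"))   # 4
```

The four distinct users on 2011-05-13 are yamada_taro, suzuki_ichiro, brfrn169, and yamada_hanako; in Hive the same distinct-count runs as a MapReduce job over the matching partitions.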
  • Patriot flow: Hive jobs run on Hadoop; aggregated results are loaded into a MySQL DB that backs the view layer
  • Patriot hardware:
    namenode: 2-core CPU, 16GB RAM
    secondary namenode: 2-core CPU, 16GB RAM
    jobtracker: 4-core CPU, 24GB RAM
    datanode/tasktracker × 18: 4-core CPU, 16GB RAM, 1TB HDD × 4
  • Patriot software: CDH3u0 (Hadoop 0.20, Hive 0.7, Hue 1.2.0); Puppet; Nagios, Ganglia; ExtJS 3.2.1; Hinemos 3.2
  • Patriot data import: logs are collected from DB and application servers over SCP into Hadoop, compressed (gzip, SequenceFile) onto HDFS, then loaded into Hive
  • Patriot import DSL (1):
    import {
      service "gyaos"
      backup_dir "/data/log/gyaos"
      data {
        type "scp"    # mysql and hdfs are also supported
        servers ["172.xxx.yyy.zzz", "172.xxx.yyy.zzz"]
        user "cy_httpd"
        path "/home/cy_httpd/logs/tomcat/lifelog/*.#{$dt}*"
        limit 10000
      }
    }
  • Patriot import DSL (2):
    load {
      type "hive"    # mysql is also supported
      table {
        name "game_login"
        regexp "^[^\t]*\t([^\t]*)\tlogin"
        output "$1"
        partition :dt => "#{$dt}", :service => "gyaos"
      }
      table {
        name "game_user"
        regexp "^([^\t]*)\t([^\t]*)\tregist_game"
        output "$2\t$1"
        partition :dt => "#{$dt}", :service => "gyaos"
      }
    }
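The regexp/output pairs in the load DSL rewrite each raw log line into the tab-separated fields Hive expects, with "$1"-style placeholders standing for capture groups. A sketch of the "game_login" rule's extraction, assuming tab-separated source lines:

```python
import re

# The "game_login" rule from the DSL: capture the second tab-separated
# field of lines whose third field starts with "login", emit it as "$1".
pattern = re.compile(r"^[^\t]*\t([^\t]*)\tlogin")

def extract(line):
    """Apply the rule; return the Hive-bound output row, or None."""
    m = pattern.match(line)
    if m is None:
        return None      # line does not belong to this table
    return m.group(1)    # output "$1"

print(extract("2011-05-13 00:12:34\tyamada_taro\tlogin"))   # yamada_taro
print(extract("2011-05-13 00:12:34\tyamada_taro\tlogout"))  # None
```

Because each table block has its own regexp, one raw log stream can fan out into several Hive tables (here game_login and game_user) in a single load pass.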
  • Patriot aggregation: batch Hive jobs run on Hadoop and write their results to the MySQL DB
  • Patriot batch DSL:
    mysql {
      host "localhost"
      port 3306
      username "patriot-batch"
      password "xxx"
      database "gyaos"
    }
    analyze {
      name "gyaos_new_user_num_daily"
      primary "dt"
      hive_ql "select count(1), '#{$dt}' from game_user
               where dt='#{$dt}' and service='gyaos'"
    }
    analyze {
      name "gyaos_unregist_user_num_daily"
      primary "dt"
      hive_ql "select count(1), '#{$dt}' from game_user g
               join ameba_member a on (g.ameba_id = a.ameba_id)
               where a.unregist_date <> '' and to_date(a.unregist_date)='#{$dt}'
               and g.service='gyaos'"
    }
  • Patriot notes: aggregation logic is written in HiveQL; data volume is roughly 20GB per day
  • Patriot WebUI (screenshots)
  • Hue: a WebUI for running ad-hoc HiveQL queries
  • Hue (screenshot)
  • Patriot next steps: move log collection to Flume; improve the DSL
  • Flume + HBase: Stinger
  • Stinger: real-time log analysis built on Flume + HBase
  • Flume: a log collection framework from Cloudera; components: ‣ Flume Agent ‣ Flume Collector ‣ Flume Master
  • Flume (architecture diagram)
  • HBase: a distributed database modeled on Google BigTable, built on top of HDFS
  • HBase data model: rows sorted by Row Key; columns grouped into column families
  • HBase (diagram slides)
  • Stinger architecture: logs flow from Flume Agents through Flume Collectors (coordinated by the Flume Master) into HBase via increment operations; a Node.js + socket.io server polls HBase and pushes updates to browsers over WebSocket
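The collectors' "increment" step can be mimicked with a counter table: HBase's increment operation is atomic per cell, which is what lets many collectors bump the same metric concurrently. A toy sketch, with a plain dict standing in for HBase and illustrative row/column names:

```python
from collections import defaultdict

# Toy stand-in for HBase counter cells: {(row_key, column): count}.
# In HBase the increment operation is atomic, so concurrent Flume
# collectors can safely update the same cell.
counters = defaultdict(int)

def increment(row_key, column, amount=1):
    """Bump one counter cell and return its new value."""
    counters[(row_key, column)] += amount
    return counters[(row_key, column)]

# Each login event bumps several metrics for the same time bucket.
for _ in range(3):
    increment("pigg#2011-05-13", "daily:login")
increment("pigg#2011-05-13", "daily:male")

print(counters[("pigg#2011-05-13", "daily:login")])  # 3
print(counters[("pigg#2011-05-13", "daily:male")])   # 1
```

On the read side, the Node.js process would periodically scan these cells and push the deltas to connected browsers over socket.io.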
  • Stinger HBase schema: Row Key = md5(service ID + date) + service ID + date; two column families, hourly (12am, 1am, ...) and daily, holding counter columns such as total, login, male, 20's (e.g. 100, 35, 10, 12)
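Prefixing the row key with an md5 hash, as in the schema above, spreads what would otherwise be sequential date-ordered writes across HBase regions instead of hammering a single region. A sketch, assuming the hash covers the concatenated ID and date:

```python
import hashlib

def make_row_key(service_id, date):
    """md5(ID + date) + ID + date, per the Stinger schema slide.

    The md5 prefix randomizes the sort order of row keys, so writes
    for consecutive dates land on different HBase regions; the plain
    ID + date suffix keeps the key decodable for reads.
    """
    raw = service_id + date
    prefix = hashlib.md5(raw.encode("utf-8")).hexdigest()
    return prefix + service_id + date

key = make_row_key("pigg", "2011-05-13")
print(key[:32])  # 32-hex-character md5 prefix
print(key[32:])  # pigg2011-05-13
```

The trade-off is that range scans over dates are no longer possible on the key alone; Stinger's access pattern (point reads of known service/date cells) does not need them.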
  • Stinger (screenshots)
  • Summary: Patriot, batch log analysis with Hadoop/Hive; Stinger, real-time log analysis with Flume + HBase