Log Analysis Infrastructure for Ameba Services

Presentation slides from the first mixi × CyberAgent joint study session.

Published in: Technology

  1. Ameba NeutralTechnologyGroup
  2. • 27
     • NeutralTechnologyGroup (4)
     • Twitter: @brfrn169
     • Hatena: brfrn169
  3. • Hadoop/Hive: Patriot
     • Flume + HBase: Stinger
  4. Hadoop/Hive: Patriot
  5. Patriot [2009] 11 → → GO [2010] 3711 WebUI
  6. [slide text not preserved]
  7. Patriot
     • Ameba
  8. • HDFS
     • Hive (Map/Reduce)
     • Patriot WebUI
     • Hue
  9. Patriot
  10. Hadoop
      • HDFS
        - Blocks (64 MB)
      • MapReduce
        - map, reduce
  11. Hadoop
      [Architecture diagram: HDFS API client; NameNode and Secondary NameNode; DataNodes, each paired with a TaskTracker running map/reduce tasks; JobTracker; JobClient]
  12. Hive
      • Runs on Hadoop
      • Developed at Facebook
      • HiveQL: SQL-like queries translated into MapReduce jobs
      • Pig
  13. Hive
      ‣ HiveQL
        - SQL-like queries translated into MapReduce jobs
      ‣ Metastore
        - Derby
        - Patriot uses MySQL
      ‣ Partition
  14. Hive
      • pigg_login.log
        2011-05-13 00:12:34 yamada_taro
        2011-05-13 02:23:45 suzuki_ichiro
        2011-05-13 03:34:56 brfrn169
        2011-05-13 04:56:34 yamada_taro
        2011-05-13 05:23:45 suzuki_ichiro
        2011-05-13 06:45:56 yamada_taro
        2011-05-13 07:56:23 yamada_hanako
        2011-05-13 08:45:56 yamada_taro
        2011-05-13 09:12:34 yamada_hanako
  15. Hive
      • DDL
        CREATE TABLE pigg_login (
          time STRING,
          ameba_id STRING)
        PARTITIONED BY (dt STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        STORED AS TEXTFILE;
      • Data load
        LOAD DATA INPATH '/path/pigg_login.log'
        INTO TABLE pigg_login PARTITION (dt='2011-05-13');
  16. Hive
      • HiveQL
        ‣ UU (unique users), daily and monthly
          - SELECT count(distinct ameba_id) FROM pigg_login WHERE dt='2011-05-13';
          - SELECT count(distinct ameba_id) FROM pigg_login WHERE dt LIKE '2011-05-__';
        ‣ UU by age (JOIN, GROUP BY)
          - SELECT p.age, count(distinct l.ameba_id) FROM pigg_login l JOIN profile p ON (l.ameba_id = p.ameba_id) WHERE l.dt = '2011-05-13' GROUP BY p.age;
  17. Patriot
      • 4
  18. Patriot
      [Diagram: Hive, Hive Job, Hadoop, View, DB (MySQL)]
  19. Patriot
      • namenode: 2-core CPU, 16 GB RAM
      • secondary namenode: 2-core CPU, 16 GB RAM
      • jobtracker: 4-core CPU, 24 GB RAM
      • datanode/tasktracker × 18: 4-core CPU, 16 GB RAM, 1 TB HDD × 4
  20. Patriot
      - CDH3u0 (Hadoop 0.20, Hive 0.7, Hue 1.2.0)
      - Puppet
      - Nagios, Ganglia
      - ExtJS 3.2.1
      - Hinemos 3.2
  21. Patriot
      • DB, SCP, Hadoop
      • gzip, SequenceFile, HDFS
      • Hive
  22. Patriot
      • DSL (1)
        import {
          service "gyaos"
          backup_dir "/data/log/gyaos"
          data {
            type "scp"    ← mysql, hdfs
            servers ["172.xxx.yyy.zzz", "172.xxx.yyy.zzz"]
            user "cy_httpd"
            path "/home/cy_httpd/logs/tomcat/lifelog/*.#{$dt}*"
            limit 10000
          }
  23. Patriot
      • DSL (2)
          load {
            type "hive"    ← mysql
            table {
              name "game_login"
              regexp "^[^\t]*\t([^\t]*)\tlogin"
              output "$1"
              partition :dt => "#{$dt}", :service => "gyaos"
            }
            table {
              name "game_user"
              regexp "^([^\t]*)\t([^\t]*)\tregist_game"
              output "$2\t$1"
              partition :dt => "#{$dt}", :service => "gyaos"
            }
          }
        }
  24. Patriot
      [Diagram: Hadoop, Hive Job, Batch, DB (MySQL)]
  25. Patriot
      • DSL
        mysql {
          host "localhost"
          port 3306
          username "patriot-batch"
          password "xxx"
          database "gyaos"
        }
        analyze {
          name "gyaos_new_user_num_daily"
          primary "dt"
          hive_ql "select count(1), '#{$dt}' from game_user where dt='#{$dt}' and service='gyaos'"
        }
        analyze {
          name "gyaos_unregist_user_num_daily"
          primary "dt"
          hive_ql "select count(1), '#{$dt}' from game_user g join ameba_member a on (g.ameba_id = a.ameba_id) where a.unregist_date <> '' and to_date(a.unregist_date)='#{$dt}' and g.service='gyaos'"
        }
  26. Patriot
      ‣ HiveQL
      ‣ 20 GB
  27. Patriot
      • WebUI
  28. Patriot
      • WebUI
  29. Patriot
      • WebUI
  30. Patriot
      • Hue
        ‣ HiveQL WebUI
  31. Patriot
      • Hue
  32. Patriot
      • Flume
      • DSL
  33. Flume + HBase: Stinger
  34. Stinger
      • Flume + HBase
  35. Flume
      ‣ Cloudera
      ‣ Flume Agent
      ‣ Flume Collector
      ‣ Flume Master
  36. Flume
  37. HBase
      ‣ Google BigTable
      ‣ HDFS
  38. HBase
      ‣ Row Key
  39. HBase
  40. HBase
  41. HBase
  42. Stinger
      [Architecture diagram: log, flume master, flume agent, flume collector, increment, HBase, polling, push, websocket, node + socket.io]
  43. Stinger
      • HBase
        ‣ Row Key
          - md5( ID + ) + ID +
        [Table: Column Family "hourly" and Column Family "daily"; columns under 12 am: total, login, male, 20's; example values: 100, 35, 10, 12]
  44. Stinger
  45. Stinger
  46. • Patriot
        • Hadoop/Hive
      • Stinger
        • Flume + HBase
  47. [slide text not preserved]
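
The HiveQL on slide 16 counts unique users (UU) per date partition. As a rough illustration of what `count(distinct ameba_id)` computes, and of the map and reduce split mentioned on slide 10, the Python sketch below runs the same count over the sample rows from slide 14. The tab delimiter, the names `map_phase` and `reduce_phase`, and the in-memory grouping are assumptions made for illustration; this is not how Hive actually plans or executes the query.

```python
from collections import defaultdict

# Sample rows from slide 14 (pigg_login.log): timestamp and ameba_id.
# A tab delimiter is assumed, matching the table definition on slide 15.
LOG_LINES = [
    "2011-05-13 00:12:34\tyamada_taro",
    "2011-05-13 02:23:45\tsuzuki_ichiro",
    "2011-05-13 03:34:56\tbrfrn169",
    "2011-05-13 04:56:34\tyamada_taro",
    "2011-05-13 05:23:45\tsuzuki_ichiro",
    "2011-05-13 06:45:56\tyamada_taro",
    "2011-05-13 07:56:23\tyamada_hanako",
    "2011-05-13 08:45:56\tyamada_taro",
    "2011-05-13 09:12:34\tyamada_hanako",
]


def map_phase(lines, dt):
    """Emit one (dt, ameba_id) pair per log line, as a mapper might."""
    for line in lines:
        _time, ameba_id = line.split("\t")
        yield dt, ameba_id


def reduce_phase(pairs):
    """Group the pairs by dt and count the distinct ameba_ids in each group."""
    users_by_dt = defaultdict(set)
    for dt, ameba_id in pairs:
        users_by_dt[dt].add(ameba_id)
    return {dt: len(users) for dt, users in users_by_dt.items()}


daily_uu = reduce_phase(map_phase(LOG_LINES, "2011-05-13"))
print(daily_uu)  # {'2011-05-13': 4}
```

The nine sample rows contain four distinct IDs, so the daily UU for the `dt='2011-05-13'` partition is 4.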

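
The import DSL on slide 23 routes each log line to a Hive table with a `regexp` and builds the row from an `output` template. The sketch below imitates that step in Python, assuming the patterns use tab escapes that the transcript dropped. The two log lines and the `route` helper are hypothetical; the actual log layout and the way Patriot evaluates the DSL are not shown in the slides.

```python
import re

# Patterns and output templates from slide 23, with tab escapes assumed.
# "$1" and "$2\t$1" are written as Python templates r"\1" and r"\2\t\1".
TABLES = {
    "game_login": (re.compile(r"^[^\t]*\t([^\t]*)\tlogin"), r"\1"),
    "game_user": (re.compile(r"^([^\t]*)\t([^\t]*)\tregist_game"), r"\2\t\1"),
}


def route(line):
    """Return a (table, row) pair for every table whose pattern matches."""
    rows = []
    for table, (pattern, template) in TABLES.items():
        match = pattern.search(line)
        if match:
            rows.append((table, match.expand(template)))
    return rows


# Hypothetical log lines: time, ameba_id, event name.
login_line = "2011-05-13 10:00:00\tyamada_taro\tlogin"
regist_line = "2011-05-13 10:05:00\tsuzuki_ichiro\tregist_game"

print(route(login_line))   # [('game_login', 'yamada_taro')]
print(route(regist_line))  # [('game_user', 'suzuki_ichiro\t2011-05-13 10:05:00')]
```

With these inputs the `game_user` template swaps the first two fields, so the ID comes first in the stored row.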
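
Slide 43 shows a Stinger row key of the shape md5(ID + ...) + ID + ..., with counters kept in `hourly` and `daily` column families, and slide 42 labels the write path "increment". The sketch below shows one way such a scheme could look. The key fields (`service_id`, `date`), the qualifier naming, and the in-memory `CounterTable` are all hypothetical, since the transcript does not preserve the real fields; a hash prefix of this kind is commonly used to spread rows across HBase regions, which may be the intent here.

```python
import hashlib
from collections import defaultdict


def row_key(*parts):
    """Build md5(parts) + parts, the general shape shown on slide 43."""
    plain = "".join(parts)
    return hashlib.md5(plain.encode("utf-8")).hexdigest() + plain


class CounterTable:
    """In-memory stand-in for an HBase table with counter columns."""

    def __init__(self):
        # (row key, column family, qualifier) -> running count
        self.cells = defaultdict(int)

    def increment(self, key, family, qualifier, amount=1):
        self.cells[(key, family, qualifier)] += amount
        return self.cells[(key, family, qualifier)]


def record_event(table, service_id, date, hour, qualifiers):
    """Bump the hourly and daily counters for each qualifier."""
    key = row_key(service_id, date)
    for name in qualifiers:
        table.increment(key, "hourly", f"{hour:02d}:{name}")
        table.increment(key, "daily", name)


table = CounterTable()
record_event(table, "gyaos", "20110513", 0, ["total", "male"])
record_event(table, "gyaos", "20110513", 0, ["total"])

key = row_key("gyaos", "20110513")
print(table.cells[(key, "daily", "total")])     # 2
print(table.cells[(key, "hourly", "00:male")])  # 1
```

Because every event only increments counters, a reader can poll the same row for current totals without scanning the raw logs.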