This document summarizes the Ameba and Neutral Technology Group's Patriot and Stinger big data platforms. Patriot is based on Hadoop and Hive and allows analyzing log data through a web UI and Hue. It loads data from MySQL into HDFS and analyzes it with HiveQL. Stinger collects log data with Flume agents and stores it incrementally in HBase for real-time analytics of hourly and daily metrics through a node.js web application. Both platforms aim to enable analyzing large-scale log data.
15. Hive
• DDL
CREATE TABLE pigg_login (
  time STRING,
  ameba_id STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
• Load data
LOAD DATA INPATH '/path/pigg_login.log'
INTO TABLE pigg_login
PARTITION (dt='2011-05-13');
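To make the table layout concrete, here is a minimal Python sketch of how one line of the log file maps onto a row of `pigg_login`. It assumes each line is `time<TAB>ameba_id`, as the DDL implies. The helper `parse_line` and the sample line are hypothetical and only for illustration. Note that `dt` is a partition column: its value comes from the `PARTITION` clause of the load, not from the file contents.

```python
from typing import NamedTuple


class LoginRow(NamedTuple):
    time: str
    ameba_id: str
    dt: str  # partition column; supplied at load time, not stored in the log file


def parse_line(line: str, dt: str) -> LoginRow:
    """Split one tab-delimited log line into the two declared columns."""
    time, ameba_id = line.rstrip("\n").split("\t")
    return LoginRow(time, ameba_id, dt)


# Hypothetical sample line in the assumed format.
sample = "2011-05-13 10:00:00\tuser_a\n"
row = parse_line(sample, "2011-05-13")
```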
16. Hive
• HiveQL
‣ UU (unique users; daily and monthly)
- SELECT count(distinct ameba_id) FROM pigg_login WHERE dt='2011-05-13';
- SELECT count(distinct ameba_id) FROM pigg_login WHERE dt LIKE '2011-05-__';
‣ UU (JOIN, GROUP BY)
- SELECT p.age, count(distinct l.ameba_id) FROM pigg_login l JOIN profile p
on (l.ameba_id=p.ameba_id) WHERE l.dt='2011-05-13' GROUP BY p.age;
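What these three queries compute can be sketched in plain Python on a few in-memory rows. This is only an illustration of the semantics (distinct count, `LIKE` with `_` wildcards, inner join plus `GROUP BY`), not how Hive executes them. The toy data and function names are made up.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical toy data: (dt, ameba_id) login rows and an ameba_id -> age profile.
logins = [
    ("2011-05-13", "alice"),
    ("2011-05-13", "alice"),  # repeated login, counted once
    ("2011-05-13", "bob"),
    ("2011-05-14", "carol"),
    ("2011-06-01", "dave"),
]
profile = {"alice": 20, "bob": 30, "carol": 20}


def daily_uu(rows, dt):
    """count(distinct ameba_id) WHERE dt = <dt>"""
    return len({uid for d, uid in rows if d == dt})


def monthly_uu(rows, month):
    """count(distinct ameba_id) WHERE dt LIKE '<month>-__' (each _ is one character)"""
    return len({uid for d, uid in rows if fnmatchcase(d, month + "-??")})


def daily_uu_by_age(rows, ages, dt):
    """Inner join on ameba_id, then distinct users per age group."""
    groups = defaultdict(set)
    for d, uid in rows:
        if d == dt and uid in ages:  # users without a profile drop out of the join
            groups[ages[uid]].add(uid)
    return {age: len(ids) for age, ids in groups.items()}
```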
25. Patriot
• DSL
mysql {
  host "localhost"
  port 3306
  username "patriot-batch"
  password "xxx"
  database "gyaos"
}
analyze {
  name "gyaos_new_user_num_daily"
  primary "dt"
  hive_ql "select count(1), '#{$dt}' from game_user where dt='#{$dt}' and service='gyaos'"
}
analyze {
  name "gyaos_unregist_user_num_daily"
  primary "dt"
  hive_ql "select count(1), '#{$dt}' from game_user g join ameba_member a on (g.ameba_id = a.ameba_id) where a.unregist_date <> '' and to_date(a.unregist_date)='#{$dt}' and g.service='gyaos'"
}
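The `'#{$dt}'` placeholders look like Ruby string interpolation of a global `$dt`, so each `analyze` block presumably turns into one concrete HiveQL statement per target date. How Patriot actually sets that value is not shown in the slides. The following Python sketch only mimics the substitution, and `render_hive_ql` is a hypothetical helper.

```python
def render_hive_ql(template: str, dt: str) -> str:
    """Replace every '#{$dt}' placeholder with the target date string."""
    return template.replace("#{$dt}", dt)


# Query text copied from the first analyze block above.
template = (
    "select count(1), '#{$dt}' from game_user "
    "where dt='#{$dt}' and service='gyaos'"
)

query = render_hive_ql(template, "2011-05-13")
```

With the date filled in, the statement selects the count together with the date literal, which would match `primary "dt"` if the result is stored keyed by date. That reading is an assumption.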
43. Stinger
• HBase
‣ Row Key
- md5(… ID + …) + … ID + …
Column Family: hourly
  12 am total | 12 am login | 12 am male | 12 am 20's
  100         | 35          | 10         | 12
Column Family: daily
  total | login | male | 20's
  100   | 35    | 10   | 12
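The row key pattern on this slide, an MD5 hash followed by the plain values, can be sketched as follows. The actual key components are not recoverable from the slide, so `part_a` and `part_b` are hypothetical stand-ins. A hashed prefix is commonly used in HBase to spread otherwise sequential keys across regions, and that is presumably the intent here.

```python
import hashlib


def make_row_key(part_a: str, part_b: str) -> str:
    """Build md5(plain) + plain, where plain is the concatenated key parts.

    part_a and part_b are hypothetical; the slide does not name the real fields.
    """
    plain = part_a + part_b
    digest = hashlib.md5(plain.encode("utf-8")).hexdigest()  # 32 hex characters
    return digest + plain


# Example with made-up values.
key = make_row_key("gyaos", "20110513")
```

Because the hash is derived from the same values that follow it, the key can be rebuilt for a point lookup, while the readable suffix keeps the original values visible in the key.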