Data Analytics with Hadoop/Hive on Multiple Data Centers

  1. Data Analytics with Hadoop/Hive on Multiple Data Centers
     Hirotaka Niisato, GMO Internet, Inc.
  2. About myself
     ● Hirotaka Niisato (@hirotakaster)
     ● Programmer
     ● GMO Internet, SIProp Project
     ● Work: Robotics, Kinect, Android, Networking, MAKE:, Solr, Volunteer ...
  3. Data Analytics System
     ● KPI reporting system for a cloud system
     ● GMO Apps Cloud
     ● Over 500 titles: mobage, gree, mixi, Hangame, facebook, nikoniko, etc.
     ● Data centers: Japan, US (west coast)
  4. Analytics Specification
     ● Social game KPIs: DAU/PV, play time, sales, A/B testing, conversion, etc.
     ● Reporting periods: hourly, daily, weekly, monthly
     ● In operation since 2010/06
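The Analytics Specification slide lists DAU among the KPIs and reports at hourly, daily, weekly and monthly granularity. The following is a minimal Python sketch, not the system's actual implementation, of counting distinct active users per reporting period. It assumes events arrive as (userid, datetime) pairs, that DAU means distinct user IDs per calendar day, and that weeks follow ISO numbering; the function names and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def period_key(ts: datetime, period: str) -> str:
    """Map a timestamp to the reporting bucket it belongs to."""
    if period == "hourly":
        return ts.strftime("%Y-%m-%d %H")
    if period == "daily":
        return ts.strftime("%Y-%m-%d")
    if period == "weekly":
        # ISO week numbering is an assumption; the slides do not define weeks.
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "monthly":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period}")


def active_users(events, period):
    """Count distinct user IDs per bucket (DAU when period == 'daily')."""
    buckets = defaultdict(set)
    for userid, ts in events:
        buckets[period_key(ts, period)].add(userid)
    return {key: len(users) for key, users in sorted(buckets.items())}


# Hypothetical sample events: (userid, timestamp).
events = [
    ("u1", datetime(2012, 7, 26, 13, 5)),
    ("u2", datetime(2012, 7, 26, 13, 40)),
    ("u1", datetime(2012, 7, 26, 20, 0)),
    ("u1", datetime(2012, 7, 27, 9, 0)),
]
print(active_users(events, "daily"))  # {'2012-07-26': 2, '2012-07-27': 1}
```

Using a set per bucket means a user who plays several times in one period is counted once, which is what distinguishes DAU from raw PV counts.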
  5. System Architecture
     [Architecture diagram. Components shown: SNS game users, SNS platform, Master Cloud System, Management System, Monitoring System, Cloud Server (Game Server), Logging Server, Scheduler, Hadoop/Hive, MySQL (for Hive); repeated for Data Center A through Data Center N]
  6. Specification, Statistics
     ● Multiple NameNodes per data center
     ● Hardware specification: CPU: 8 to 16 CPUs (HT); memory: 12 to 64 GB; disk: RAID 1, 5, or 1+0
     ● Statistics: 6,000,000 blocks; 44,000 jobs/day; over 1,000 AP servers logging
  7. Data Flow
     [Diagram components: Cloud Server (Game Server), Logging Server, Management System, Scheduler, HiveDriver, Hadoop/Hive]
     ● Load raw access logs into a partition:
         LOAD DATA LOCAL INPATH 'hogehoge-access_log.*.log.gz'
         OVERWRITE INTO TABLE original_logs
         PARTITION (log_date='2012-07-26', log_number=13);
     ● original_logs columns: host, identity, user, time, method, request, status, size, referer, agent (all string, from deserializer); log_date string; log_number tinyint
     ● Filtered table columns: host string, time string, method string, request string, userid string, log_date string, log_number tinyint
     ● Filter → Hourly, Daily, Weekly, Monthly Report (A/B testing, conversion, DAU, etc.)
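The original_logs columns on the Data Flow slide (host, identity, user, time, method, request, status, size, referer, agent) resemble an Apache combined access log split into fields by a deserializer. As a hedged illustration of that parsing step, here is a Python sketch using a regular expression. The log format, the pattern, the parse_line name, and the sample line are all assumptions, since the slides do not show the SerDe definition; splitting the request line into method and URL while dropping the protocol is also an assumption.

```python
import re

# Hypothetical pattern for the Apache combined log format. Group names mirror
# the original_logs columns; the real deserializer configuration is not shown
# in the slides.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+)(?: [^"]*)?" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of string fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# Hypothetical sample line.
sample = ('192.0.2.1 - - [26/Jul/2012:13:05:00 +0900] '
          '"GET /click?convid=abc123&convflg=A HTTP/1.1" 200 512 "-" "Mozilla/5.0"')
record = parse_line(sample)
print(record["method"], record["request"], record["status"])
# GET /click?convid=abc123&convflg=A 200
```

Every field stays a string, matching the "string from deserializer" column types; type conversion, if any, would happen in the later filter step.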
  8. Conversion Count HQL
       INSERT OVERWRITE TABLE conversion_click
         PARTITION (log_date = :logDate, log_number = :logNumber)
       SELECT
         regexp_extract(request, 'convid=([a-zA-Z0-9%]+)', 1),
         regexp_extract(request, 'convflg=(A|B){1}', 1),
         count(1),
         :logMonth,
         :logWeek
       FROM parsed_log
       WHERE request RLIKE 'convid=[a-zA-Z0-9%]+'
         AND request RLIKE 'convflg=(A|B){1}'
         AND log_date = :logDate
         AND log_number = :logNumber
       GROUP BY
         regexp_extract(request, 'convid=([a-zA-Z0-9%]+)', 1),
         regexp_extract(request, 'convflg=(A|B){1}', 1);
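To make the logic of the Conversion Count HQL easier to follow, here is a small Python sketch that emulates it on an in-memory list of request strings: keep requests matching both patterns, extract the conversion ID and the A/B flag, and count per pair. It assumes the convid value is one or more characters (`+`) and drops the redundant `{1}` quantifier; count_conversions and the sample requests are hypothetical, and the partition and :logMonth/:logWeek columns are left out.

```python
import re
from collections import Counter

# Patterns from the HQL, assuming the convid value is one or more characters.
CONVID = re.compile(r"convid=([a-zA-Z0-9%]+)")
CONVFLG = re.compile(r"convflg=(A|B)")


def count_conversions(requests):
    """Count requests per (convid, convflg), mirroring WHERE ... GROUP BY."""
    counts = Counter()
    for request in requests:
        # re.search, like RLIKE, matches anywhere in the string.
        convid = CONVID.search(request)
        convflg = CONVFLG.search(request)
        if convid and convflg:  # both RLIKE filters must pass
            counts[(convid.group(1), convflg.group(1))] += 1
    return counts


# Hypothetical request values from the filtered log table.
requests = [
    "/click?convid=abc123&convflg=A",
    "/click?convid=abc123&convflg=A",
    "/click?convid=abc123&convflg=B",
    "/top",
]
print(dict(count_conversions(requests)))
# {('abc123', 'A'): 2, ('abc123', 'B'): 1}
```

Without the `+`, the extracted ID would be only its first character, so distinct conversion IDs sharing a first character would be merged into one count.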
  9. Monitoring/Management (Zabbix)
  10. Memory Management
      ● NameNode memory: files, blocks, directories
      ● Hadoop Archive
      ● Server memory
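The NameNode keeps every file, directory and block of the namespace in memory, which is why the Memory Management slide lists all three. A commonly cited rule of thumb is roughly 150 bytes of heap per object; the Python sketch below applies it as a back-of-the-envelope estimate. The constant, the estimate_namenode_heap name, and using the 6,000,000 blocks from the Statistics slide as the only known input are assumptions, so the result is a lower bound rather than a measured figure.

```python
# Rule-of-thumb heap cost of one namespace object (file, directory or block);
# real usage depends on the Hadoop version and JVM settings.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap(files, directories, blocks, bytes_per_object=BYTES_PER_OBJECT):
    """Approximate NameNode heap needed for the namespace, in bytes."""
    return (files + directories + blocks) * bytes_per_object


# Lower bound from the block count alone; file and directory counts are not
# given in the slides, so they are left at zero here.
lower_bound = estimate_namenode_heap(files=0, directories=0, blocks=6_000_000)
print(f"{lower_bound:,} bytes (~{lower_bound / 2**30:.2f} GiB)")
# 900,000,000 bytes (~0.84 GiB)
```

Because each small file costs at least one file object and one block object, packing many small files into a Hadoop Archive shrinks the file term of this sum, which is the usual reason for using HAR to relieve NameNode memory.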
  11. Trouble
      ● Re-analytics
      ● Backup and recovery
      ● NameNode HA
      ● Hive vs. MapReduce
  12. Thank you
