Log Analysis Platform for Ameba Services (Amebaサービスのログ解析基盤)




NeutralTechnologyGroup
•               (                )

•   27

•
    NeutralTechnologyGroup(              4       )



•

•   Twitter: @brfrn169   Hatena: brfrn169
•   Hadoop/Hive     Patriot



•   Flume + HBase
    Stinger
Hadoop/Hive
              Patriot
Patriot
[2009   ]
11
        →

                →   GO
[2010   ]
3
7
11      WebUI
•
•
    -
•
    -
    -
Patriot
•   Ameba



•
•
•
    -   HDFS



•
    -   Hive(Map/Reduce)



•
    -   Patriot WebUI



•
    -   Hue
Patriot
•
•

•
Hadoop
•   HDFS
    -                           (            64M)

    -


•   MapReduce
    -
    -
    -   map   (=   )   reduce       (=   )
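The map and reduce roles named above can be sketched in plain Python. This is only an illustration of the programming model, not Hadoop code: the function names are made up for the example, and the input is assumed to be tab-separated login lines (timestamp, then Ameba ID).

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit one (ameba_id, 1) pair per tab-separated log line."""
    for line in lines:
        _timestamp, ameba_id = line.rstrip("\n").split("\t")
        yield ameba_id, 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_logins(lines):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(lines)))
```

In a real cluster the map and reduce functions run in parallel on many TaskTrackers; here they run sequentially so the data flow is easy to follow.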
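Hadoop
[Figure: Hadoop cluster architecture. Clients access storage through the HDFS API and submit jobs through a JobClient. On the HDFS side the master nodes are the NameNode and the Secondary NameNode; on the MapReduce side the master is the JobTracker. Each worker node runs a DataNode (HDFS) together with a TaskTracker (map/reduce tasks).]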
Hive
•   Hadoop

•   Facebook



•   HiveQL       SQL        MapReduce

•                Pig(   )
Hive
•
    ‣   HiveQL
        -  SQL                   MapReduce



    ‣
        -
        -                Derby
        -   Patriot     MySQL



    ‣
        -   Partition
Hive
•   pigg_login.log

       2011-05-13 00:12:34	yamada_taro
       2011-05-13 02:23:45	suzuki_ichiro
       2011-05-13 03:34:56	brfrn169
       2011-05-13 04:56:34	yamada_taro
       2011-05-13 05:23:45	suzuki_ichiro
       2011-05-13 06:45:56	yamada_taro
       2011-05-13 07:56:23	yamada_hanako
       2011-05-13 08:45:56	yamada_taro
       2011-05-13 09:12:34	yamada_hanako
Hive
•   DDL
      CREATE TABLE pigg_login (
        time STRING,
        ameba_id STRING)
      PARTITIONED BY (dt STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      STORED AS TEXTFILE;


•
      LOAD DATA INPATH '/path/pigg_login.log'
      INTO TABLE pigg_login
      PARTITION (dt='2011-05-13');
Hive
•   HiveQL

    ‣   UU (unique users)
        -   SELECT count(DISTINCT ameba_id) FROM pigg_login
            WHERE dt = '2011-05-13';

        -   SELECT count(DISTINCT ameba_id) FROM pigg_login
            WHERE dt LIKE '2011-05-__';


    ‣   UU (JOIN, GROUP BY)
        -   SELECT p.age, count(DISTINCT l.ameba_id)
            FROM pigg_login l JOIN profile p ON (l.ameba_id = p.ameba_id)
            WHERE l.dt = '2011-05-13' GROUP BY p.age;
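As a rough cross-check of what the first query computes, the same unique-user count can be reproduced in plain Python over the sample pigg_login.log rows. This is a sketch only: the function name is invented, and it filters on the timestamp prefix, whereas Hive filters on the `dt` partition assigned at load time.

```python
SAMPLE_LOG = """\
2011-05-13 00:12:34\tyamada_taro
2011-05-13 02:23:45\tsuzuki_ichiro
2011-05-13 03:34:56\tbrfrn169
2011-05-13 04:56:34\tyamada_taro
2011-05-13 05:23:45\tsuzuki_ichiro
2011-05-13 06:45:56\tyamada_taro
2011-05-13 07:56:23\tyamada_hanako
2011-05-13 08:45:56\tyamada_taro
2011-05-13 09:12:34\tyamada_hanako
"""


def unique_users(log_text, dt):
    """Count distinct Ameba IDs among rows whose timestamp starts with dt."""
    users = set()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        timestamp, ameba_id = line.split("\t")
        if timestamp.startswith(dt):
            users.add(ameba_id)
    return len(users)


print(unique_users(SAMPLE_LOG, "2011-05-13"))  # 4 distinct IDs in the sample
```

Passing a shorter prefix such as "2011-05" plays the role of the `LIKE '2011-05-__'` variant for a monthly count.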
Patriot

•
    -     4



    -

    -
    -
Patriot

•

[Figure: Patriot system overview. Components shown: Hive, Hive Job, Hadoop, View, and a MySQL DB.]
Patriot

    •
        -   namenode: 2-core CPU, 16GB RAM
        -   secondary namenode: 2-core CPU, 16GB RAM
        -   jobtracker: 4-core CPU, 24GB RAM
        -   datanode/tasktracker × 18: 4-core CPU, 16GB RAM, 1TB HDD × 4
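A back-of-the-envelope capacity estimate for the 18 worker nodes can be written as follows. The replication factor of 3 is an assumption (it is the usual HDFS default and is not stated on the slide), and the estimate ignores space reserved for the OS and intermediate MapReduce output.

```python
DATANODES = 18        # worker nodes listed on the slide
DISKS_PER_NODE = 4    # 1TB HDD x 4 per node
DISK_TB = 1
REPLICATION = 3       # assumption: HDFS default replication, not given in the slides


def raw_capacity_tb(nodes=DATANODES, disks=DISKS_PER_NODE, disk_tb=DISK_TB):
    """Total disk capacity across all worker nodes, in TB."""
    return nodes * disks * disk_tb


def usable_capacity_tb(replication=REPLICATION):
    """Approximate capacity left for data once every block is replicated."""
    return raw_capacity_tb() / replication


print(raw_capacity_tb(), usable_capacity_tb())
```

Under these assumptions the cluster has 72 TB of raw disk and roughly 24 TB of space for replicated data.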
Patriot

•
    -   CDH3u0 (Hadoop 0.20, Hive 0.7, Hue 1.2.0)

    -   Puppet

    -   Nagios, Ganglia


    -   Ext JS 3.2.1

    -   Hinemos 3.2
Patriot

•
[Figure: data import flow. Labels shown: DB, SCP, Hadoop.]

          •
          • gzip, SequenceFile   HDFS
          • Hive
Patriot

•                             DSL (1)

    import {
      service "gyaos"
      backup_dir "/data/log/gyaos"
      data {
        type "scp"   # slide annotation: ← mysql, hdfs
        servers ["172.xxx.yyy.zzz", "172.xxx.yyy.zzz"]
        user "cy_httpd"
        path "/home/cy_httpd/logs/tomcat/lifelog/*.#{$dt}*"
        limit 10000
      }
Patriot

•                          DSL (2)
    load {
      type "hive"   # slide annotation: ← mysql
      table {
        name "game_login"
        regexp "^[^\t]*\t([^\t]*)\tlogin"
        output "$1"
        partition :dt => "#{$dt}", :service => "gyaos"
      }
      table {
        name "game_user"
        regexp "^([^\t]*)\t([^\t]*)\tregist_game"
        output "$2\t$1"
        partition :dt => "#{$dt}", :service => "gyaos"
      }
    }   # end load
    }   # end import
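The `regexp` / `output` pairs in the load block can be tried out with a small Python sketch. It assumes the `t` characters in the slide's patterns stand for tab escapes (`\t`) and that `$1`, `$2` refer to capture groups; the sample log lines and the `extract` helper are hypothetical and not part of Patriot.

```python
import re

# Patterns from the slide, with the tab escapes written out (assumption).
GAME_LOGIN = re.compile(r"^[^\t]*\t([^\t]*)\tlogin")
GAME_USER = re.compile(r"^([^\t]*)\t([^\t]*)\tregist_game")


def extract(pattern, output, line):
    """Apply a pattern to one log line and fill the $N placeholders in output.

    Returns None when the line does not match, i.e. the row is skipped.
    """
    match = pattern.match(line)
    if match is None:
        return None
    # Replace each "$N" with the text of capture group N.
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), output)


# Hypothetical input lines: timestamp, Ameba ID, event name.
print(extract(GAME_LOGIN, "$1", "2011-05-13 00:12:34\tyamada_taro\tlogin"))
print(extract(GAME_USER, "$2\t$1", "2011-05-13 00:12:34\tyamada_taro\tregist_game"))
```

With these inputs, the first table would receive the Ameba ID alone, and the second would receive the Ameba ID followed by the timestamp, because `$2` comes before `$1` in its output template.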
Patriot

•
[Figure: batch aggregation flow. Labels shown: Batch, Hive Job, Hadoop, DB (MySQL).]
Patriot

  •                                     DSL
mysql {
  host "localhost"
  port 3306
  username "patriot-batch"
  password "xxx"
  database "gyaos"
}
analyze {
  name "gyaos_new_user_num_daily"
  primary "dt"
  hive_ql "select count(1), '#{$dt}' from game_user where dt='#{$dt}' and service='gyaos'"
}
analyze {
  name "gyaos_unregist_user_num_daily"
  primary "dt"
  hive_ql "select count(1), '#{$dt}' from game_user g join ameba_member a on (g.ameba_id = a.ameba_id) where a.unregist_date <> '' and to_date(a.unregist_date)='#{$dt}' and g.service='gyaos'"
}
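The `#{$dt}` placeholders look like Ruby string interpolation of a global `$dt`, which suggests the batch fills in the target date before submitting each query. The sketch below imitates that substitution in Python; the `render_hive_ql` helper is hypothetical and only shows what the first `hive_ql` string would expand to for one date.

```python
NEW_USER_QL = (
    "select count(1), '#{$dt}' from game_user "
    "where dt='#{$dt}' and service='gyaos'"
)


def render_hive_ql(template, dt):
    """Replace every #{$dt} placeholder with the target date string."""
    return template.replace("#{$dt}", dt)


print(render_hive_ql(NEW_USER_QL, "2011-05-13"))
```

Each rendered query returns a single row (a count plus the date), which fits the `primary "dt"` setting if the result is stored in MySQL keyed by date; that storage detail is an inference from the DSL, not something the slide states.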
Patriot

•
    ‣ HiveQL
      -
      -
    ‣               20GB

    ‣
Patriot

• WebUI(         )
Patriot

• WebUI(         )
Patriot

• WebUI(         )
Patriot

• Hue
 ‣ HiveQL    WebUI

 ‣
Patriot

• Hue
Patriot
•
•
    -   Flume

•   DSL

•
•
    -
Flume + HBase


      Stinger
Stinger

•
•

•   Flume + HBase
Flume

•
    ‣
    ‣ Cloudera

    ‣ Flume Agent
    ‣ Flume Collector
    ‣ Flume Master
Flume
HBase

•
    ‣   Google BigTable

    ‣   HDFS

    ‣   HDFS              /
HBase
•
    ‣        Row Key(   )

    ‣
        -
        -
        -
    ‣
        -
HBase
•
HBase

•
HBase
•
    ‣
    ‣
        -
    ‣
        -           /
Stinger

•
[Figure: Stinger architecture. Labels shown: log, flume agent, flume collector, flume master, increment, HBase, polling, node + socket.io, websocket push.]
Stinger

• HBase
 ‣ Row Key
  - md5(   ID +   ) +   ID +

  Column Family: hourly
    12 am total | 12 am login | 12 am male | 12 am 20's
    100         | 35          | 10         | 12

  Column Family: daily
    total | login | male | 20's
    100   | 35    | 10   | 12
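The row key and counter layout can be illustrated with an in-memory stand-in. Everything specific here is an assumption: the slide's row key components are not recoverable, so `service_id` and `date` are hypothetical, the qualifier names are invented, and `CounterTable` is a plain dictionary wrapper rather than a real HBase client.

```python
import hashlib
from collections import defaultdict


def row_key(service_id, date):
    """Build an md5-prefixed row key; both components are hypothetical."""
    raw = f"{service_id}{date}"
    prefix = hashlib.md5(raw.encode("utf-8")).hexdigest()
    # The hash prefix spreads keys across the key space; the suffix keeps them readable.
    return f"{prefix}{service_id}{date}"


class CounterTable:
    """In-memory stand-in for a table of counters (not an HBase API)."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment(self, key, family, qualifier, amount=1):
        """Add amount to the counter at family:qualifier and return the new value."""
        column = f"{family}:{qualifier}"
        self.rows[key][column] += amount
        return self.rows[key][column]


def record_login(table, service_id, date, hour):
    """Bump the hourly and daily login counters for one login event."""
    key = row_key(service_id, date)
    table.increment(key, "hourly", f"{hour:02d}_login")
    table.increment(key, "daily", "login")
    return key
```

Keeping one counter per column means a dashboard can read a whole day of hourly and daily figures for a service with a single row lookup.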
Stinger
•
    ‣
Stinger
•
•
•   Patriot

    •
    •   Hadoop/Hive


•   Stinger

    •
    •   Flume + HBase
•
