CHAPTER 2. MAPREDUCE BASICS

[Figure: for each input split, mapper -> combiner -> partitioner; Shuffle and Sort: aggregate values by keys; reducer -> output]
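The dataflow in the figure can be simulated in plain Python. This is a minimal sketch that assumes a word-count job. Each input split goes to one mapper. A combiner sums that mapper's counts locally, and a partitioner chooses a reducer from a hash of the key. The names `mapper`, `combiner`, `partition`, `reducer` and `run_job` are hypothetical. `zlib.crc32` is used only because Python's built-in `hash()` for strings is randomized per process. Hadoop's default `HashPartitioner` instead takes the key's `hashCode()` modulo the number of reduce tasks.

```python
import zlib
from collections import defaultdict

def mapper(line):
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.split():
        yield word, 1

def combiner(pairs):
    """Combine: sum one mapper's counts per key before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partition(key, num_reducers):
    """Partition: a deterministic hash sends each key to exactly one reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    """Reduce: total count for one key."""
    return key, sum(values)

def run_job(splits, num_reducers=2):
    # One shuffle buffer per reducer, mapping key -> list of values
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split is processed by its own mapper
        pairs = [kv for line in split for kv in mapper(line)]
        for key, value in combiner(pairs):
            buckets[partition(key, num_reducers)][key].append(value)
    output = {}
    for bucket in buckets:  # shuffle and sort: each reducer sees its keys in order
        for key in sorted(bucket):
            k, total = reducer(key, bucket[key])
            output[k] = total
    return output

print(sorted(run_job([["a b a", "c"], ["b a"]]).items()))
# [('a', 3), ('b', 2), ('c', 1)]
```

Addition is associative and commutative, so the combiner leaves the final counts unchanged. Its only effect is to shrink the data that crosses the shuffle.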



[Figure: Input Splits -> Map (<K, V>) -> Shuffle (<K, list(V)>) -> Reduce (<list(V)>) -> Output Files]
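Reduced to their types, the three stages fit in one generic driver. In this sketch, the function names and the inverted-index job are illustrative assumptions. It makes the step from `<K, V>` to `<K, list(V)>` explicit: the shuffle does nothing more than group intermediate values by key.

```python
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn):
    # Map: each record becomes zero or more <K, V> pairs
    intermediate = [kv for split in splits for record in split for kv in map_fn(record)]
    # Shuffle: group the values by key, giving <K, list(V)>
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: one call per key, keys visited in sorted order
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}

# Example job: an inverted index from each word to the documents containing it
def index_map(record):
    doc_id, text = record
    for word in set(text.split()):
        yield word, doc_id

def index_reduce(word, doc_ids):
    return sorted(set(doc_ids))

splits = [[(1, "map reduce"), (2, "stream processing")], [(3, "map stream")]]
print(run_mapreduce(splits, index_map, index_reduce))
# {'map': [1, 3], 'processing': [2], 'reduce': [1], 'stream': [2, 3]}
```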
[Figure: Log Servers -> Distributed File System (HDFS) -> Data Processing Framework (Hadoop MapReduce); Framework Users send a Query and receive the Result.]
Figure 1.1: Log processing with the store-first-query-later model. Apache Hadoop [3] is used as an example.

[Figure: Data Processing Framework (Continuous MapReduce) on Cloud Servers with Logs; Framework Users send a Query and receive Results.]

...frameworks in a traditional store-first-query-later model [17]. Companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault tolerance. Once the data is in the file system, users can run queries with bulk-processing frameworks and retrieve the results from the distributed file system. Figure 1.1 illustrates this model.
Figure 1: The in-situ MapReduce architecture avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources.

... speed of social network updates or accuracy of ad targeting. The in-situ MapReduce (iMR) architecture builds on previous work in stream processing [5, 7, 9] to support ...

each input record, and the reduc...
... of values, v[], that share the same ...
... for queries that are either highly ...
... duce functions that are distribu...
... gates [14]. Thus we expect that u...
... MapReduce combiner, allowing ...
... to merge values of a single key to ...
... and distribute processing overhe...
... biner allows iMR to process win...
... further reduce data volumes throu...
... tion. The only non-standard (but ...
... MapReduce jobs may impleme...
... we describe in Section 2.3.2.
However, the primary way in w...
... that they emit a stream of results ...
... uous input, e.g., server log files ...
... cessors [7], iMR bounds comp...
... haps infinite) data streams by pr...
Map Reduce and Stream Processing

Figure 3: iMR nodes process local log files to produce sub-windows or panes. The system assumes log records have a logical timestamp and arrive in order.
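Assigning in-order records to panes needs only integer division of the logical timestamp. This sketch assumes a fixed pane length. Following the panes technique for sliding-window aggregation, it sets that length to gcd(range, slide) so that every window boundary lines up with a pane boundary; that choice is an assumption, not something the slide states. A pane is emitted as soon as a record from a later pane arrives, and the in-order assumption in the caption is what makes this safe. `pane_size` and `count_per_pane` are hypothetical names.

```python
from math import gcd

def pane_size(range_len, slide):
    # Assumption: panes are gcd(range, slide) long, so window starts and
    # ends always fall on pane boundaries.
    return gcd(range_len, slide)

def count_per_pane(timestamps, size):
    """Yield (pane_id, record_count) for in-order logical timestamps.

    A pane is complete as soon as a record from a later pane arrives.
    Panes that receive no records are not emitted.
    """
    current, count = None, 0
    for ts in timestamps:
        pane = ts // size
        if current is not None and pane < current:
            raise ValueError("records must arrive in timestamp order")
        if pane != current:
            if current is not None:
                yield current, count  # the previous pane is complete
            current, count = pane, 0
        count += 1
    if current is not None:
        yield current, count  # flush the last pane at end of input

size = pane_size(60, 20)  # 60-unit window sliding every 20 units
print(list(count_per_pane([0, 5, 19, 20, 45, 59, 61], size)))
# [(0, 3), (1, 1), (2, 2), (3, 1)]
```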

Figure: iMR aggregates individual panes P_i in the network. To produce a result, the root may either combine ...
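In-network pane aggregation works because counting is distributive. A node can add its own pane counts to those of its children and forward one partial result per pane. This recursive sketch assumes the aggregation tree is given as a parent-to-children mapping. `aggregate`, the tree layout, and the example counts are illustrative.

```python
from collections import Counter

def aggregate(node, tree, local_counts):
    """Return {pane_id: count} for the subtree rooted at `node`.

    The node adds its own pane counts to the partial results returned by
    its children and passes a single combined result upstream.
    """
    total = Counter(local_counts.get(node, {}))
    for child in tree.get(node, []):
        total.update(aggregate(child, tree, local_counts))  # adds counts per pane
    return dict(total)

# Hypothetical tree: c -> a -> root and b -> root; local counts keyed by pane id
tree = {"root": ["a", "b"], "a": ["c"]}
local_counts = {"a": {0: 2}, "b": {0: 1, 1: 4}, "c": {1: 3}}
print(aggregate("root", tree, local_counts))
# {0: 3, 1: 7}
```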
# Hooks in the style of the iMR API. lookupPane, notify, emitIntermediate,
# SlideValue and RangeValue are assumed to be provided by the framework.

# map: called for each hit record
def map(k1, hitRecord):
    timestamp = hitRecord.time
    # Look up the pane this timestamp belongs to
    paneId = lookupPane(timestamp)
    if paneId.endFlag:
        # Notify downstream that all data for this pane has been sent
        notify(paneId)
    emitIntermediate(paneId, 1, timestamp)

# combine: on every node, sum the partial counts for one pane
def combine(paneId, countList):
    hitCount = sum(countList)
    # Send the partial count to the downstream node
    emitIntermediate(paneId, hitCount)

# reduce: runs only on the root of the aggregation tree
def reduce(paneId, countList):
    hitCount = sum(countList)
    sv = SlideValue(paneId)
    sv.hitCount = hitCount
    return sv

# init: create an empty result for the whole window (range)
def init(slide):
    rangeValue = RangeValue()
    rangeValue.hitCount = 0
    return rangeValue

# merge: add a newly reduced slide (pane) result to the window
def merge(rangeValue, slideValue):
    rangeValue.hitCount += slideValue.hitCount

# unmerge: subtract the result of a slide (pane) that has left the window
def unmerge(rangeValue, slideValue):
    rangeValue.hitCount -= slideValue.hitCount
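The init/merge/unmerge hooks let the root update a sliding window incrementally instead of re-reducing every pane it contains. This runnable sketch defines minimal stand-ins for `SlideValue` and `RangeValue`, which are hypothetical because the slide does not show them. It also assumes the window advances by exactly one pane at a time. `sliding_counts` is a hypothetical driver.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class SlideValue:  # hypothetical stand-in: reduce() result for one pane
    paneId: int
    hitCount: int = 0

@dataclass
class RangeValue:  # hypothetical stand-in: running result for the window
    hitCount: int = 0

def init(slide=None):  # slide is unused, as in the pseudocode
    return RangeValue(hitCount=0)

def merge(rangeValue, slideValue):
    rangeValue.hitCount += slideValue.hitCount

def unmerge(rangeValue, slideValue):
    rangeValue.hitCount -= slideValue.hitCount

def sliding_counts(slide_values, panes_per_window):
    """Yield (paneId, window hitCount) each time a new pane arrives."""
    window, rangeValue = deque(), init()
    for sv in slide_values:
        merge(rangeValue, sv)  # newest pane enters the window
        window.append(sv)
        if len(window) > panes_per_window:
            unmerge(rangeValue, window.popleft())  # oldest pane leaves
        yield sv.paneId, rangeValue.hitCount

panes = [SlideValue(i, c) for i, c in enumerate([3, 1, 4, 1, 5])]
print(list(sliding_counts(panes, panes_per_window=3)))
# [(0, 3), (1, 4), (2, 8), (3, 6), (4, 10)]
```

This approach works only for aggregates that can be inverted, such as sums and counts. A maximum or minimum cannot be unmerged, so for those aggregates the window must be recomputed from the panes it still contains.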
K-Means Clustering in Map Reduce
Figure 2: MapReduce Classifier Training and Evaluation Procedure
A Comparison of Approaches for Large-Scale Data Mining
Google Pregel Graph Processing
Map Reduce ~Continuous Map Reduce Design~ (Takahiro Inoue)

Presentation on how to chat with PDF using ChatGPT code interpreter
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Map Reduce ~Continuous Map Reduce Design~

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. CHAPTER 2. MAPREDUCE BASICS [Figure: six input key-value pairs are processed by four mappers, which emit (a, 1), (b, 2) | (c, 3), (c, 6) | (a, 5), (c, 2) | (b, 7), (c, 8). Each mapper's output passes through a combiner, which merges (c, 3) and (c, 6) into (c, 9), and then through a partitioner. Shuffle and sort aggregates values by keys: a → [1, 5], b → [2, 7], c → [2, 9, 8]. Three reducers consume the grouped values.]
  • 7.
  • 8. [Figure: input splits → map tasks emit <K, V> → shuffle groups them into <K, list(V)> → reduce tasks emit <list(V)> → output files]
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. [Figure 1.1: Log processing with the store-first-query-later model. Apache Hadoop [3] is used as an example. Panel labels: log servers → HDFS (distributed file system) → Hadoop MapReduce data processing framework ↔ users (query, result); a second panel labeled "Data Processing Framework (Continuous MapReduce)", "Cloud Servers with Logs", "Users", "Query" and "Results".] …frameworks in a traditional store-first-query-later model [17]. Companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault tolerance. Once the data is placed in the file system, users can execute queries using bulk-processing frameworks and retrieve results from the distributed file system. Figure 1.1 illustrates this model.
  • 15.
  • 16.
  • 17.
  • 18. [Figure 1: The in-situ MapReduce architecture avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources.] …speed of social network updates or accuracy of ad targeting. The in-situ MapReduce (iMR) architecture builds on previous work in stream processing [5, 7, 9] to sup… (Adjacent column cropped on the slide; legible fragments: "each input record, and the reduc… of values, v[], that share the same…"; "…MapReduce combiner, allowing… to merge values of a single key…"; "…further reduce data volumes…"; "…MapReduce jobs may impleme… describe in Section 2.3.2."; "…that they emit a stream of results… uous input, e.g., server log files…"; "…iMR bounds comp… haps infinite) data streams by pr…")
  • 19.
  • 20.
  • 21. Map Reduce and Stream Processing
  • 22.
  • 23.
  • 24.
  • 25. [Diagram: the in-situ MapReduce architecture, as on slide 18.]
  • 26. [Figure 3: iMR nodes process local log files to produce sub-windows or panes. The system assumes log records have a logical timestamp and arrive in order.] [Figure: iMR aggregates individual panes P_i in the network. To produce a result, the root may either combine…]
  • 27.
  • 28.
  • 29.
      # Called for each hit record
      map(k1, hitRecord) {
        timestamp = hitRecord.time
        # Look up the pane ID from the timestamp
        paneId = lookupPane(timestamp)
        if (paneId.endFlag == True) {
          # Notify downstream that all data for this pane has been sent
          notify(paneId)
        }
        emitIntermediate(paneId, 1, timestamp)
      }
  • 30.
      # Sum the partial hit counts for one pane
      combine(paneId, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        # Send the partial count to the downstream node
        emitIntermediate(paneId, hitCount)
      }
  • 31.
      # Called only when this node is the root of the aggregation tree
      reduce(paneId, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        sv = SlideValue.new(paneId)
        sv.hitCount = hitCount
        return sv
      }
  • 32.
      # Create the window (range) value, starting at zero hits
      init(slide) {
        rangeValue = RangeValue.new
        rangeValue.hitCount = 0
        return rangeValue
      }
      # Merge: add a pane's hit count into the window
      merge(rangeValue, slideValue) {
        rangeValue.hitCount += slideValue.hitCount
      }
      # Unmerge: subtract an expired pane's hit count when the window slides
      unmerge(rangeValue, slideValue) {
        rangeValue.hitCount -= slideValue.hitCount
      }
  • 33.
  • 34.
  • 35.
  • 36. K-Means Clustering in Map Reduce
  • 37. A Comparison of Approaches for Large-Scale Data Mining [Figure 2: MapReduce classifier training and evaluation procedure]
  • 38. Google Pregel Graph Processing
  • 39.
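The MapReduce dataflow in the slide 6 figure (mappers, combiners, partitioners, shuffle and sort, reducers) can be traced with a small single-process Python sketch that replays the figure's mapper outputs. The figure names neither the reduce function nor the partitioning scheme, so the summing reducer and the character-code partitioner are assumptions made for illustration, and every function name is hypothetical rather than part of Hadoop's API.

```python
from collections import defaultdict

# Mapper outputs copied from the figure: one list of (key, value) pairs per mapper.
MAPPER_OUTPUTS = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("c", 6)],
    [("a", 5), ("c", 2)],
    [("b", 7), ("c", 8)],
]


def combine(pairs):
    """Combiner: locally sum the values for each key in one mapper's output."""
    sums = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
    return list(sums.items())


def partition(key, num_reducers):
    """Partitioner (hypothetical): a deterministic hash of the key's characters."""
    return sum(ord(ch) for ch in key) % num_reducers


def shuffle(combined_outputs, num_reducers):
    """Shuffle and sort: route each pair to its reducer and group values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in combined_outputs:
        for key, value in pairs:
            partitions[partition(key, num_reducers)][key].append(value)
    # Each reducer receives its keys in sorted order.
    return [sorted(groups.items()) for groups in partitions]


def reduce_sum(key, values):
    """Reducer (assumed): sum all values for a key."""
    return key, sum(values)


def run(num_reducers=3):
    combined = [combine(pairs) for pairs in MAPPER_OUTPUTS]
    grouped = shuffle(combined, num_reducers)
    results = dict(
        reduce_sum(key, values)
        for reducer_input in grouped
        for key, values in reducer_input
    )
    return combined, grouped, results


if __name__ == "__main__":
    combined, grouped, results = run()
    print(combined)
    print(grouped)
    print(results)
```

The combiner turns the second mapper's (c, 3), (c, 6) into (c, 9), as in the figure, and the reducers produce a = 6, b = 9, c = 19. The sketch groups c's values as [9, 2, 8] while the figure shows [2, 9, 8]: MapReduce sorts keys, not the values within a key's list, so reducers should not rely on value order.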
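The pane-based hit counter in the iMR pseudocode on slides 29-32 can be sketched as a runnable, single-process Python simulation. Assumptions: panes are fixed-length time buckets (`pane_seconds`), a window covers the most recent `window_panes` panes, each node's log is a list of in-order integer timestamps, and all panes are processed after the logs are read, so the `notify`/end-of-pane signal from `map` is not modeled. All names are hypothetical stand-ins for the slides' `lookupPane`, `SlideValue` and `RangeValue`.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class SlideValue:
    """Per-pane result produced at the root (stand-in for SlideValue)."""
    pane_id: int
    hit_count: int


@dataclass
class RangeValue:
    """Running window total; the default of 0 plays the role of init()."""
    hit_count: int = 0


def lookup_pane(timestamp, pane_seconds):
    # Assumed pane assignment: fixed-length buckets of pane_seconds.
    return timestamp // pane_seconds


def map_hits(timestamps, pane_seconds):
    # map: emit (pane_id, 1) for every hit record.
    for ts in timestamps:
        yield lookup_pane(ts, pane_seconds), 1


def combine(pane_id, counts):
    # combine: sum partial counts for one pane before sending them downstream.
    return pane_id, sum(counts)


def reduce_pane(pane_id, counts):
    # reduce (root only): total hit count for one pane.
    return SlideValue(pane_id, sum(counts))


def merge(range_value, slide_value):
    range_value.hit_count += slide_value.hit_count


def unmerge(range_value, slide_value):
    range_value.hit_count -= slide_value.hit_count


def sliding_hit_counts(node_logs, pane_seconds, window_panes):
    """Return (pane_id, hits in the window ending at that pane) for each pane."""
    # Each node maps and combines its own log, sending one count per pane.
    partials = defaultdict(list)
    for timestamps in node_logs:
        local = defaultdict(list)
        for pane_id, one in map_hits(timestamps, pane_seconds):
            local[pane_id].append(one)
        for pane_id, counts in local.items():
            pid, total = combine(pane_id, counts)
            partials[pid].append(total)
    if not partials:
        return []
    # The root reduces each pane in order and slides the window pane by pane.
    window_total = RangeValue()
    window = deque()
    results = []
    for pane_id in range(min(partials), max(partials) + 1):
        pane = reduce_pane(pane_id, partials.get(pane_id, []))
        merge(window_total, pane)
        window.append(pane)
        if len(window) > window_panes:
            unmerge(window_total, window.popleft())
        results.append((pane_id, window_total.hit_count))
    return results
```

With two nodes logging hits at [0, 1, 5, 12] and [3, 7, 11, 14], 5-second panes and a two-pane window, the panes hold 3, 2 and 3 hits, so the window totals are 3, 5 and 5. Keeping one `SlideValue` per pane is what makes `unmerge` cheap: sliding the window subtracts the oldest pane's count instead of recounting raw records.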
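Slide 36 names K-Means clustering as a MapReduce application but shows no code. A common formulation runs one MapReduce job per iteration: map assigns each point to its nearest centroid, a combiner sums points per centroid within a split, and reduce divides the sums by the counts to get the new centroids. This Python sketch simulates that in one process; the function names, the input layout (a list of splits, each a list of coordinate tuples) and the stopping rule (centroids unchanged, capped at `max_iters`) are assumptions, not taken from the slides.

```python
import math
from collections import defaultdict


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_and_combine(split, centroids):
    """Map each point to its nearest centroid, then combine per centroid.

    Returns {centroid_index: (coordinate_sums, point_count)} for this split.
    """
    partial = {}
    for point in split:
        cid = nearest_centroid(point, centroids)
        sums, count = partial.get(cid, ([0.0] * len(point), 0))
        partial[cid] = ([s + x for s, x in zip(sums, point)], count + 1)
    return partial


def reduce_centroid(partials):
    """Reduce: merge partial sums for one centroid and return the new mean."""
    dims = len(partials[0][0])
    totals = [0.0] * dims
    n = 0
    for sums, count in partials:
        totals = [t + s for t, s in zip(totals, sums)]
        n += count
    return tuple(t / n for t in totals)


def kmeans_iteration(splits, centroids):
    """One MapReduce job: group partials by centroid index, then reduce."""
    grouped = defaultdict(list)
    for split in splits:
        for cid, partial in map_and_combine(split, centroids).items():
            grouped[cid].append(partial)
    new_centroids = list(centroids)  # a centroid with no points keeps its position
    for cid, partials in grouped.items():
        new_centroids[cid] = reduce_centroid(partials)
    return new_centroids


def kmeans(splits, centroids, max_iters=20):
    """Driver: rerun the job until the centroids stop changing."""
    centroids = list(centroids)
    for _ in range(max_iters):
        updated = kmeans_iteration(splits, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

The combiner matters here for the same reason as in word count: each split ships one (sum, count) pair per centroid instead of every point, so shuffle traffic is bounded by the number of clusters rather than the data size.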
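Slides 38-39 introduce Google Pregel. Pregel computes in supersteps: in each one, every active vertex reads the messages sent to it in the previous superstep, may update its value, and sends messages along its out-edges; a vertex votes to halt when it has nothing to do and is reactivated only by an incoming message, and the computation ends when all vertices have halted and no messages are in flight. A single-process Python sketch of that model, using maximum-value propagation as the example, follows. It is not Pregel's C++ API; the function name and the graph representation (a dict of vertex values and a dict of out-neighbor lists, with every edge target present in the values) are hypothetical.

```python
def pregel_max(values, edges, max_supersteps=100):
    """Propagate the maximum vertex value through a directed graph.

    values: vertex -> initial value; edges: vertex -> list of out-neighbors.
    """
    value = dict(values)
    # Superstep 0: every vertex is active and sends its value to its neighbors.
    inbox = {v: [] for v in value}
    for v in value:
        for w in edges.get(v, []):
            inbox[w].append(value[v])
    for _ in range(1, max_supersteps):
        if not any(inbox.values()):
            break  # no messages in flight and every vertex has voted to halt
        outbox = {v: [] for v in value}
        for v, messages in inbox.items():
            if not messages:
                continue  # a halted vertex with no messages stays inactive
            best = max(messages)
            if best > value[v]:
                value[v] = best
                # Only a vertex whose value changed sends messages onward.
                for w in edges.get(v, []):
                    outbox[w].append(best)
            # The vertex then votes to halt; a new message would reactivate it.
        inbox = outbox
    return value
```

On the undirected chain A-B-C-D (edges in both directions) with values 3, 6, 2, 1, every vertex ends with 6 after a few supersteps. Because only changed vertices send messages, traffic shrinks as values converge, which is the vertex-centric pattern Pregel is designed around.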