SlideShare a Scribd company logo
1 of 67
961
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
•          (@sasata299)

• 2009 8                  JOIN

•
• Hadoop
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
•      961

• 30         3   1

•
   -
   -
•
•       (       ,   , ...)

    -
    -       (   ,       ,    , …)

    -
•         Hadoop     MySQL

•
    -   GROUP BY

•                   7000      …

•                  (´Д`   )
• MySQL
   -
•
   -
  -
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
Hadoop
• Google   MapReduce   OSS

•
   -
   -
   -
   -
Hadoop
master (   )

               slave (   )
Hadoop
master (   )

               slave (   )




Map
Hadoop
master (    )

                       slave (   )



           <key,value>
Map
           Shuffle & Sort
Hadoop
master (    )

                       slave (            )



           <key,value>
Map                              Reduce
           Shuffle & Sort
• Hadoop Streaming (Ruby       )

• EC2 Cloudera Hadoop
   - Cloudera CDH1
   - Hadoop           0.18.3

•                S3
MySQL → Hadoop
•
• GROUP BY        MapReduce
    -       (          )
    -           key

• JOIN   MapReduce
    •
(1)   master



(2)   S3
(1)   master



(2)   S3
(1)   master
               master   slave scp


(2)   S3
(1)        master
                    master      slave scp


(2)        S3


      S3            slave scp
MySQL vs Hadoop


   7000




    MySQL    Hadoop
   MySQL    Hadoop
MySQL vs Hadoop

            ( Д )
   7000
            30

    MySQL    Hadoop
   MySQL    Hadoop
Hadoop++
   ←Hadoop


        ↓MySQL
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
• Hadoop
   -
• Hadoop                        (HADOOP-6254)
   -   S3
   -   SocketTimeoutException
• EMR (Elastic MapReduce)
   -   Amazon               Hadoop

• Cloudera   CDH2
   -
AMI
            (Amazon Machine
       UP       Image)




EMR


CDH2
AMI
            (Amazon Machine
       UP       Image)




EMR


CDH2
EMR




Job Flow (   )
EMR
             BootStrap Action




Job Flow (               )
EMR
             BootStrap Action


             Step (Hadoop Job)




Job Flow (               )
EMR
             BootStrap Action


             Step (Hadoop Job)




Job Flow (               )
•
    -
    - --alive
• AMI
   -            AMI

    - BootStrap Action
Created job flow j-8IXS98OW1WEE
                         ID
Hadoop
•
    - mapred.child.java.opts
    -   streaming

•
    -
    -   ElasticMapReduce-master 5100
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
• Map
   -
• Reduce
   -       key   Reduce
   -
UU


      Map


     Reduce
UU


           Map
     ID

          Reduce
UU


           Map


          Reduce
     ID
UU


           Map


          Reduce
     ID
Map


Reduce
Map
ID key

         Reduce
Map


               Reduce
key   Reduce
Map
100
                 100

                       Reduce
  key   Reduce
×
                        Map
100
                 ×
                 100

                       Reduce
  key   Reduce
×
                           Map
100
                   ×100

                          Reduce
      Reduce   key sort
×
                           Map
100
                   ×100

                          Reduce
      Reduce   key sort
Hadoop

 •
     -
         Hadoop
Hadoop

 •
     -
         Hadoop
Hadoop

 •
     -
         Hadoop
Hadoop

 •
     -
         Hadoop
Hadoop

 •
     -
         Hadoop
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
•                Hadoop
    -
    -
    -   Reduce
Hadoop and EMR Options for Processing Large Data

More Related Content

Similar to Hadoop and EMR Options for Processing Large Data

COOKPADでのHadoop利用
COOKPADでのHadoop利用COOKPADでのHadoop利用
COOKPADでのHadoop利用Tatsuya Sasaki
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドTatsuya Sasaki
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Toshihiro Suzuki
 
Hadoop入門とクラウド利用
Hadoop入門とクラウド利用Hadoop入門とクラウド利用
Hadoop入門とクラウド利用Naoki Yanai
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderDmitry Makarchuk
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)npinto
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkThoughtWorks
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceJoydeep Sen Sarma
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 FallHadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 FallRyu Kobayashi
 
データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)Takumi Asai
 
サンプルから見るMap reduceコード
サンプルから見るMap reduceコードサンプルから見るMap reduceコード
サンプルから見るMap reduceコードShinpei Ohtani
 
サンプルから見るMapReduceコード
サンプルから見るMapReduceコードサンプルから見るMapReduceコード
サンプルから見るMapReduceコードShinpei Ohtani
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 

Similar to Hadoop and EMR Options for Processing Large Data (20)

COOKPADでのHadoop利用
COOKPADでのHadoop利用COOKPADでのHadoop利用
COOKPADでのHadoop利用
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッド
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
 
Hadoop入門とクラウド利用
Hadoop入門とクラウド利用Hadoop入門とクラウド利用
Hadoop入門とクラウド利用
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris Schneider
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software Framework
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 FallHadoop Conference Japan 2011 Fall
Hadoop Conference Japan 2011 Fall
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)データ解析技術入門(Hadoop編)
データ解析技術入門(Hadoop編)
 
サンプルから見るMap reduceコード
サンプルから見るMap reduceコードサンプルから見るMap reduceコード
サンプルから見るMap reduceコード
 
サンプルから見るMapReduceコード
サンプルから見るMapReduceコードサンプルから見るMapReduceコード
サンプルから見るMapReduceコード
 
Hadoop
HadoopHadoop
Hadoop
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
Rug hogan-10-03-2012
Rug hogan-10-03-2012Rug hogan-10-03-2012
Rug hogan-10-03-2012
 
Hadoop and MapReduce
Hadoop and MapReduceHadoop and MapReduce
Hadoop and MapReduce
 
Azure Data Warehouse
Azure Data WarehouseAzure Data Warehouse
Azure Data Warehouse
 

More from Tatsuya Sasaki

からあげエンジニアについて
からあげエンジニアについてからあげエンジニアについて
からあげエンジニアについてTatsuya Sasaki
 
クックパッドでのemr利用事例
クックパッドでのemr利用事例クックパッドでのemr利用事例
クックパッドでのemr利用事例Tatsuya Sasaki
 
からあげとビーチと私
からあげとビーチと私からあげとビーチと私
からあげとビーチと私Tatsuya Sasaki
 
メタプログラミングでDSLを書こう
メタプログラミングでDSLを書こうメタプログラミングでDSLを書こう
メタプログラミングでDSLを書こうTatsuya Sasaki
 
NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴Tatsuya Sasaki
 
Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法Tatsuya Sasaki
 

More from Tatsuya Sasaki (7)

からあげエンジニアについて
からあげエンジニアについてからあげエンジニアについて
からあげエンジニアについて
 
クックパッドでのemr利用事例
クックパッドでのemr利用事例クックパッドでのemr利用事例
クックパッドでのemr利用事例
 
からあげとビーチと私
からあげとビーチと私からあげとビーチと私
からあげとビーチと私
 
メタプログラミングでDSLを書こう
メタプログラミングでDSLを書こうメタプログラミングでDSLを書こう
メタプログラミングでDSLを書こう
 
NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴
 
Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法
 
YUI
YUIYUI
YUI
 

Recently uploaded

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Hadoop and EMR Options for Processing Large Data

Editor's Notes