Hadoop and EMR Options for Processing Large Data
Tatsuya Sasaki
Slides used for my talk at JJUG CCC 2010 Fall on 2010/10/18.
1.
961
2.
• Hadoop (Cloudera)
• Elastic MapReduce
4.
• (@sasata299)
• 2009 8 JOIN
• Hadoop
7.
• 961
• 30 3 1
14.
• Hadoop MySQL
• GROUP BY
• 7000 …
• (´Д` )
15.
• MySQL
18.
Hadoop
• OSS implementation of Google's MapReduce
19.
Hadoop: master, slave
20.
Hadoop: master, slave
Map
21.
Hadoop: master, slave
<key,value>: Map → Shuffle & Sort
22.
Hadoop: master, slave
<key,value>: Map → Shuffle & Sort → Reduce
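The flow on these slides (Map turns records into <key,value> pairs, Shuffle & Sort groups them by key, Reduce aggregates each group) can be sketched in a single process. This is an illustrative Python simulation, not Hadoop code and not taken from the talk; the function names and the word-count job are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterable[KV]]) -> Iterator[KV]:
    """Map: turn every input record into zero or more <key, value> pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_and_sort(pairs: Iterable[KV]) -> List[Tuple[str, List[int]]]:
    """Shuffle & Sort: collect the values of each key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped: Iterable[Tuple[str, List[int]]],
                 reducer: Callable[[str, List[int]], KV]) -> List[KV]:
    """Reduce: collapse the values of each key into one result."""
    return [reducer(key, values) for key, values in grouped]


def word_mapper(line: str) -> List[KV]:
    # Hypothetical job: emit <word, 1> for every word in the line.
    return [(word, 1) for word in line.split()]


def sum_reducer(key: str, values: List[int]) -> KV:
    return key, sum(values)


def run_job(records: Iterable[str]) -> List[KV]:
    pairs = map_phase(records, word_mapper)
    return reduce_phase(shuffle_and_sort(pairs), sum_reducer)


if __name__ == "__main__":
    for key, total in run_job(["hadoop emr hadoop", "emr s3"]):
        print(key, total)
```

On a real cluster the map and reduce calls run on the slave nodes and the grouping step moves data across the network; the sketch only preserves the order of the three phases.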
23.
• Hadoop Streaming (Ruby)
• Cloudera Hadoop on EC2
  - Cloudera CDH1
  - Hadoop 0.18.3
• S3
24.
MySQL → Hadoop
• GROUP BY → MapReduce
  - key
• JOIN → MapReduce
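One way to read this slide: a GROUP BY becomes "emit the grouping column as the key, aggregate in Reduce", and a JOIN becomes "emit the join column as the key from both tables, pair the rows in Reduce". The sketch below follows the Hadoop Streaming convention of tab-separated `key<TAB>value` lines arriving at the reducer sorted by key. It is written in Python rather than the Ruby used in the talk, and the record layout, the `L`/`R` tags, and all function names are assumptions made for illustration.

```python
from itertools import groupby
from typing import Iterable, Iterator


def group_by_mapper(lines: Iterable[str]) -> Iterator[str]:
    """SELECT key, COUNT(*) ... GROUP BY key: emit 'key<TAB>1' per row."""
    for line in lines:
        key, _rest = line.rstrip("\n").split("\t", 1)
        yield f"{key}\t1"


def count_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of each key; input must be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def join_mapper(lines: Iterable[str], tag: str) -> Iterator[str]:
    """Tag every row with its source table so the reducer can tell them apart."""
    for line in lines:
        key, rest = line.rstrip("\n").split("\t", 1)
        yield f"{key}\t{tag}\t{rest}"


def join_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Inner join: pair every 'L' row with every 'R' row sharing the key."""
    rows = (line.split("\t", 2) for line in sorted_lines)
    for key, group in groupby(rows, key=lambda row: row[0]):
        left, right = [], []
        for _, tag, rest in group:
            (left if tag == "L" else right).append(rest)
        for l_row in left:
            for r_row in right:
                yield f"{key}\t{l_row}\t{r_row}"


if __name__ == "__main__":
    logs = ["u1\tsearch", "u2\tview", "u1\tview"]   # hypothetical log rows
    users = ["u1\talice", "u2\tbob"]                 # hypothetical user rows

    # sorted() stands in for Hadoop's Shuffle & Sort.
    print(list(count_reducer(sorted(group_by_mapper(logs)))))
    tagged = list(join_mapper(users, "L")) + list(join_mapper(logs, "R"))
    print(list(join_reducer(sorted(tagged))))
```

The reducer buffers all rows of one key in memory, which is acceptable for a sketch but would need care for heavily skewed keys.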
25.
(1) master
(2) S3
27.
(1) master: master, slave, scp
(2) S3
28.
(1) master: master, slave, scp
(2) S3: S3, slave, scp
29.
MySQL vs Hadoop
7000 [chart: MySQL, Hadoop]
30.
MySQL vs Hadoop
( Д ) 7000, 30 [chart: MySQL, Hadoop]
31.
Hadoop++
←Hadoop ↓MySQL
33.
• Hadoop
• Hadoop (HADOOP-6254)
  - S3
  - SocketTimeoutException
34.
• EMR (Elastic MapReduce)
  - Amazon Hadoop
• Cloudera CDH2
35.
AMI (Amazon Machine Image)
UP: EMR, CDH2
37.
EMR: Job Flow
38.
EMR: Job Flow
• Bootstrap Action
39.
EMR: Job Flow
• Bootstrap Action
• Step (Hadoop Job)
41.
• --alive
• AMI
  - AMI
  - Bootstrap Action
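The structure on the preceding slides (a Job Flow that holds Bootstrap Actions and Steps, optionally kept running with `--alive`) can be pictured as a nested request. The sketch below only builds a plain dictionary. Its key names follow my understanding of the parameters of boto3's `run_job_flow`, which postdates the command-line client used in the talk, so treat every field as an assumption to check against the current documentation. The bucket, script names, and instance count are hypothetical, and a real request needs further fields (release label, instance types, IAM roles).

```python
from typing import Any, Dict, List


def streaming_step(name: str, mapper: str, reducer: str,
                   input_uri: str, output_uri: str) -> Dict[str, Any]:
    """One Step: a single Hadoop Streaming job inside the job flow."""
    return {
        "Name": name,
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-mapper", mapper,
                "-reducer", reducer,
                "-input", input_uri,
                "-output", output_uri,
            ],
        },
    }


def job_flow_request(name: str, bootstrap_script: str,
                     steps: List[Dict[str, Any]],
                     keep_alive: bool = True) -> Dict[str, Any]:
    """A Job Flow: bootstrap actions run on every node, then the steps run."""
    return {
        "Name": name,
        "Instances": {
            "InstanceCount": 3,  # hypothetical cluster size
            # Assumed counterpart of the CLI's --alive flag.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "BootstrapActions": [{
            "Name": "setup",
            "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
        }],
        "Steps": steps,
    }


if __name__ == "__main__":
    step = streaming_step(
        "count", "mapper.rb", "reducer.rb",
        "s3://example-bucket/input/", "s3://example-bucket/output/",
    )
    request = job_flow_request("sketch", "s3://example-bucket/setup.sh", [step])
    print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

Keeping the job flow alive lets further steps be added to the same cluster instead of paying the startup cost for each job.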
43.
Created job flow j-8IXS98OW1WEE (job flow ID)
44.
Hadoop
46.
• mapred.child.java.opts
• streaming
• ElasticMapReduce-master 5100
48.
• Map
• Reduce
  - key Reduce
49.
UU
Map Reduce
50.
UU
Map ID Reduce
51.
UU
Map Reduce ID
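The surviving slide text does not show how the UU count is split between Map and Reduce, so the following is one common arrangement, assuming UU means unique users: Map emits the user ID together with the grouping key, and Reduce counts each ID once. The log layout (`day<TAB>user_id`) and the function names are hypothetical, and `sorted()` stands in for Hadoop's Shuffle & Sort.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def uu_mapper(log_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Emit <day, user_id> for every access-log line."""
    for line in log_lines:
        day, user_id = line.rstrip("\n").split("\t")[:2]
        yield day, user_id


def uu_reducer(sorted_pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Count distinct user IDs per day; input must be sorted by (day, user_id)."""
    for day, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        distinct = 0
        previous = None
        for _, user_id in group:
            # Sorted input puts equal IDs side by side, so one comparison
            # with the previous ID is enough to skip repeats.
            if user_id != previous:
                distinct += 1
                previous = user_id
        yield day, distinct


if __name__ == "__main__":
    logs = [
        "2010-10-18\tu1",
        "2010-10-18\tu2",
        "2010-10-18\tu1",
        "2010-10-19\tu1",
    ]
    for day, count in uu_reducer(sorted(uu_mapper(logs))):
        print(day, count)
```

Because the reducer relies on sorted input instead of holding a set of IDs, its memory use stays constant even for days with many users.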
53.
Map Reduce
54.
Map ID key
Reduce
55.
Map
Reduce key Reduce
56.
Map 100
100 Reduce key Reduce
57.
×
Map 100 × 100 Reduce key Reduce
58.
×
Map 100 ×100 Reduce Reduce key sort
60.
Hadoop •
- Hadoop
66.
• Hadoop
  - Reduce