SlideShare a Scribd company logo
1 of 43
COOKPAD
Hadoop
http://cookpad.com/
@sasata299 (   )
@sasata299 (   )
@sasata299 (   )
•          Hadoop

• Hadoop
•
•          Hadoop

• Hadoop
•
COOKPAD
               896
          30         3   1
COOKPAD
               896
          30         3   1




                             etc..
”   ”
GROUP BY




                 MySQL
(   3.5   )
GROUP BY




                 MySQL
(   3.5   )




          7000
…
…
…
Hadoop


• Google MapReduce OSS
•
•
Hadoop
  •
  •
  •
master(             )




                             slave(        )




             key   Reducer


Mapper                           Reducer
(2009/10   )



•
•
•   MapReduce
•               etc..
Hadoop
•      Hadoop

• EC2 S3
                            ver. 0.18.3




• Cloudera & Hadoop Streaming
S3 Native FileSystem
  • Hadoop
  •                    5GB
  •            s3n:// ← ”n”



S3 Block FileSystem
  • Hadoop
      • HDFS
  •
  •            s3://
7000
  1/223




 30
Hadoop++
   ←Hadoop


      ↓MySQL
•          Hadoop

• Hadoop
•
cat hoge.csv | ruby mapper.rb | ruby reducer.rb



Reducer
 Mapper                               Reducer
 Mapper→Reducer              key       Reducer
master

1) -file
               master   slave                   scp

      hadoop jar xxx.jar
       -mapper hoge.rb -reducer fuga.rb
       -file hoge.rb -file fuga.rb
       -file

2) mapper, reducer
          File.open(‘            ’) {|f| ...}
S3

1) -cacheFile
            S3                    slave

      hadoop jar xxx.jar
       -mapper hoge.rb -reducer fuga.rb
       -file hoge.rb -file fuga.rb
       -cacheFile s3n://path/to/        #


2) mapper, reducer
      File.open(‘         ’) {|f| ...}
p target_ids.size # 50000

   ARGF.each do |log|
    log.chomp!
    id, type, ... = log.split(/,/)
    next if target_ids.include?(id)
   end

target_ids   5
                                      …
[13930, 29011, 39291, ...] # 50000

                  1000


{
    ‘139’ => [13930, 13989, 13991, ...], # 50
    ‘290’ => [29011, 29098, 29076, ...], # 50
    ‘392’ => [39291, 39244, 39251, ...], # 50
}
50
hash = Hash.new {|h,k| h[k] = []}
target_ids.each do |id|
  hash[ id.to_s[0,3] ] << id
end

ARGF.each do |log|
 log.chomp!
 id, type, ... = log.split(/,/)
 next if hash[ id[0,3] ].include?(id)
end
•          Hadoop

• Hadoop
•
8              7                                        8      …




                                                  http://ow.ly/2bdW1



S3 Native FileSystem
java.net.SocketTimeoutException: Read timed out
8Amazon7 Elastic             MapReduce                  8      …




                                                  http://ow.ly/2bdW1



S3 Native FileSystem
java.net.SocketTimeoutException: Read timed out
8Amazon7 Elastic             MapReduce                  8      …




                                                  http://ow.ly/2bdW1


Amazon Elastic MapReduce
S3 Native FileSystem
java.net.SocketTimeoutException: Read timed out
   Hadoop 0.21
Amazon Elastic MapReduce
COOKPADでのHadoop利用
COOKPADでのHadoop利用

More Related Content

What's hot

Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
Apache spark session
Apache spark sessionApache spark session
Apache spark sessionknowbigdata
 
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Skills Matter
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkThoughtWorks
 
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Jeremy Hanna
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGAdam Kawa
 
A Hands-on Introduction to MapReduce (in Python)
A Hands-on Introduction to MapReduce (in Python)A Hands-on Introduction to MapReduce (in Python)
A Hands-on Introduction to MapReduce (in Python)David Massart
 
Checkupload1 140213043220-phpapp01
Checkupload1 140213043220-phpapp01Checkupload1 140213043220-phpapp01
Checkupload1 140213043220-phpapp01Nitish Bhardwaj
 
apache pig performance optimizations talk at apachecon 2010
apache pig performance optimizations talk at apachecon 2010apache pig performance optimizations talk at apachecon 2010
apache pig performance optimizations talk at apachecon 2010Thejas Nair
 
Hadoop 130419075715-phpapp02(1)
Hadoop 130419075715-phpapp02(1)Hadoop 130419075715-phpapp02(1)
Hadoop 130419075715-phpapp02(1)Nitish Bhardwaj
 
ニュースパスのクローラーアーキテクチャとマイクロサービス
ニュースパスのクローラーアーキテクチャとマイクロサービスニュースパスのクローラーアーキテクチャとマイクロサービス
ニュースパスのクローラーアーキテクチャとマイクロサービスmosa siru
 
Go, memcached, microservices
Go, memcached, microservicesGo, memcached, microservices
Go, memcached, microservicesmosa siru
 
1004 kmz file on google earth from google drive
1004 kmz file on google earth from google drive1004 kmz file on google earth from google drive
1004 kmz file on google earth from google driveSatoru Hoshiba
 
Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010 Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010 Skills Matter
 
Unix cmd on_free_bsd
Unix cmd on_free_bsdUnix cmd on_free_bsd
Unix cmd on_free_bsd小均 張
 

What's hot (18)

Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Apache spark session
Apache spark sessionApache spark session
Apache spark session
 
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software Framework
 
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
 
A Hands-on Introduction to MapReduce (in Python)
A Hands-on Introduction to MapReduce (in Python)A Hands-on Introduction to MapReduce (in Python)
A Hands-on Introduction to MapReduce (in Python)
 
Checkupload1 140213043220-phpapp01
Checkupload1 140213043220-phpapp01Checkupload1 140213043220-phpapp01
Checkupload1 140213043220-phpapp01
 
apache pig performance optimizations talk at apachecon 2010
apache pig performance optimizations talk at apachecon 2010apache pig performance optimizations talk at apachecon 2010
apache pig performance optimizations talk at apachecon 2010
 
Hadoop 130419075715-phpapp02(1)
Hadoop 130419075715-phpapp02(1)Hadoop 130419075715-phpapp02(1)
Hadoop 130419075715-phpapp02(1)
 
Pptx present
Pptx presentPptx present
Pptx present
 
ニュースパスのクローラーアーキテクチャとマイクロサービス
ニュースパスのクローラーアーキテクチャとマイクロサービスニュースパスのクローラーアーキテクチャとマイクロサービス
ニュースパスのクローラーアーキテクチャとマイクロサービス
 
Go, memcached, microservices
Go, memcached, microservicesGo, memcached, microservices
Go, memcached, microservices
 
1004 kmz file on google earth from google drive
1004 kmz file on google earth from google drive1004 kmz file on google earth from google drive
1004 kmz file on google earth from google drive
 
2012 apache hadoop_map_reduce_windows_azure
2012 apache hadoop_map_reduce_windows_azure2012 apache hadoop_map_reduce_windows_azure
2012 apache hadoop_map_reduce_windows_azure
 
Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010 Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010
 
Unix cmd on_free_bsd
Unix cmd on_free_bsdUnix cmd on_free_bsd
Unix cmd on_free_bsd
 
Hadoop
HadoopHadoop
Hadoop
 

Similar to COOKPADでのHadoop利用

マーケティングのためのHadoop利用
マーケティングのためのHadoop利用マーケティングのためのHadoop利用
マーケティングのためのHadoop利用Tatsuya Sasaki
 
Hadoopを業務で使ってみた
Hadoopを業務で使ってみたHadoopを業務で使ってみた
Hadoopを業務で使ってみたTatsuya Sasaki
 
961万人の食卓を支えるデータ解析
961万人の食卓を支えるデータ解析961万人の食卓を支えるデータ解析
961万人の食卓を支えるデータ解析Tatsuya Sasaki
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Toshihiro Suzuki
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"Giivee The
 
Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Jesang Yoon
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msJodok Batlogg
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceJoydeep Sen Sarma
 
Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivAmazon Web Services
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...HostedbyConfluent
 
Scaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesScaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesAndrew Turner
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...Amazon Web Services
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraAndrey Kudryavtsev
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at FacebookS S
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 

Similar to COOKPADでのHadoop利用 (20)

マーケティングのためのHadoop利用
マーケティングのためのHadoop利用マーケティングのためのHadoop利用
マーケティングのためのHadoop利用
 
Hadoopを業務で使ってみた
Hadoopを業務で使ってみたHadoopを業務で使ってみた
Hadoopを業務で使ってみた
 
961万人の食卓を支えるデータ解析
961万人の食卓を支えるデータ解析961万人の食卓を支えるデータ解析
961万人の食卓を支えるデータ解析
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"
 
Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기Amazon Aurora로 안전하게 migration 하기
Amazon Aurora로 안전하게 migration 하기
 
You know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900msYou know, for search. Querying 24 Billion Documents in 900ms
You know, for search. Querying 24 Billion Documents in 900ms
 
Failing gracefully
Failing gracefullyFailing gracefully
Failing gracefully
 
Hadoop
HadoopHadoop
Hadoop
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
 
Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel Aviv
 
Introduction to HCFS
Introduction to HCFSIntroduction to HCFS
Introduction to HCFS
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
 
Scaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web ServicesScaling Mapufacture on Amazon Web Services
Scaling Mapufacture on Amazon Web Services
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at Facebook
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 

More from Tatsuya Sasaki

からあげエンジニアについて
からあげエンジニアについてからあげエンジニアについて
からあげエンジニアについてTatsuya Sasaki
 
クックパッドでのemr利用事例
クックパッドでのemr利用事例クックパッドでのemr利用事例
クックパッドでのemr利用事例Tatsuya Sasaki
 
からあげとビーチと私
からあげとビーチと私からあげとビーチと私
からあげとビーチと私Tatsuya Sasaki
 
メタプログラミングでDSLを書こう
メタプログラミングでDSLを書こうメタプログラミングでDSLを書こう
メタプログラミングでDSLを書こうTatsuya Sasaki
 
NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴Tatsuya Sasaki
 
Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法Tatsuya Sasaki
 
Hadoopを業務で使ってみました
Hadoopを業務で使ってみましたHadoopを業務で使ってみました
Hadoopを業務で使ってみましたTatsuya Sasaki
 

More from Tatsuya Sasaki (8)

からあげエンジニアについて
からあげエンジニアについてからあげエンジニアについて
からあげエンジニアについて
 
クックパッドでのemr利用事例
クックパッドでのemr利用事例クックパッドでのemr利用事例
クックパッドでのemr利用事例
 
からあげとビーチと私
からあげとビーチと私からあげとビーチと私
からあげとビーチと私
 
メタプログラミングでDSLを書こう
メタプログラミングでDSLを書こうメタプログラミングでDSLを書こう
メタプログラミングでDSLを書こう
 
NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴NoSQLデータベースが登場した背景と特徴
NoSQLデータベースが登場した背景と特徴
 
Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法Hadoopをemr経由で利用する方法
Hadoopをemr経由で利用する方法
 
Hadoopを業務で使ってみました
Hadoopを業務で使ってみましたHadoopを業務で使ってみました
Hadoopを業務で使ってみました
 
YUI
YUIYUI
YUI
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

COOKPADでのHadoop利用

  • 6. Hadoop • Hadoop •
  • 7. Hadoop • Hadoop •
  • 8. COOKPAD 896 30 3 1
  • 9. COOKPAD 896 30 3 1 etc..
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. GROUP BY MySQL ( 3.5 )
  • 15. GROUP BY MySQL ( 3.5 ) 7000
  • 16.
  • 17.
  • 18.
  • 20. Hadoop • • •
  • 21. master( ) slave( ) key Reducer Mapper Reducer
  • 22. (2009/10 ) • • • MapReduce • etc..
  • 23. Hadoop • Hadoop • EC2 S3 ver. 0.18.3 • Cloudera & Hadoop Streaming
  • 24. S3 Native FileSystem • Hadoop • 5GB • s3n:// ← ”n” S3 Block FileSystem • Hadoop • HDFS • • s3://
  • 25.
  • 27. Hadoop++ ←Hadoop ↓MySQL
  • 28. Hadoop • Hadoop •
  • 29. cat hoge.csv | ruby mapper.rb | ruby reducer.rb Reducer Mapper Reducer Mapper→Reducer key Reducer
  • 30.
  • 31. master 1) -file master slave scp hadoop jar xxx.jar -mapper hoge.rb -reducer fuga.rb -file hoge.rb -file fuga.rb -file 2) mapper, reducer File.open(‘ ’) {|f| ...}
  • 32. S3 1) -cacheFile S3 slave hadoop jar xxx.jar -mapper hoge.rb -reducer fuga.rb -file hoge.rb -file fuga.rb -cacheFile s3n://path/to/ # 2) mapper, reducer File.open(‘ ’) {|f| ...}
  • 33.
  • 34. p target_ids.size # 50000 ARGF.each do |log| log.chomp! id, type, ... = log.split(/,/) next if target_ids.include?(id) end target_ids 5 …
  • 35. [13930, 29011, 39291, ...] # 50000 1000 { ‘139’ => [13930, 13989, 13991, ...], # 50 ‘290’ => [29011, 29098, 29076, ...], # 50 ‘392’ => [39291, 39244, 39251, ...], # 50 }
  • 36. 50 hash = Hash.new {|h,k| h[k] = []} target_ids.each do |id| hash[ id.to_s[0,3] ] << id end ARGF.each do |log| log.chomp! id, type, ... = log.split(/,/) next if hash[ id[0,3] ].include?(id) end
  • 37. Hadoop • Hadoop •
  • 38. 8 7 8 … http://ow.ly/2bdW1 S3 Native FileSystem java.net.SocketTimeoutException: Read timed out
  • 39. 8Amazon7 Elastic MapReduce 8 … http://ow.ly/2bdW1 S3 Native FileSystem java.net.SocketTimeoutException: Read timed out
  • 40. 8Amazon7 Elastic MapReduce 8 … http://ow.ly/2bdW1 Amazon Elastic MapReduce S3 Native FileSystem java.net.SocketTimeoutException: Read timed out Hadoop 0.21

Editor's Notes