Introduction to Data Analysis Techniques (Hadoop Edition)

Published in: Technology, Business

Transcript

  • 1. (… & Hadoop) April 12, 2013, Takumi Asai
  • 2. (26 )–– H21 H23 NTT Communications IP– H23 NTT– twitter:@p_i_o4545– blog:http://pioneerinocean.hatenablog.com/ • • R Hadoop ( )– •
  • 3. ( :4/12) Hadoop( : ) R Ruby R
  • 4. / / /⇒wikipedia
  • 5. =
  • 6. 21 ( )⇒Google,Facebook
  • 7. 1000 D R
  • 8. VS IT RDBMSSPSS R IT
  • 9. VSFSP Web FSP TESCO
  • 10. VSWinWin
  • 11. Hadoop – An Apache project written in Java – Modeled on Google's MapReduce and Google File System (GFS)
  • 12. Hadoop components – HDFS and MapReduce (plus HBase, which runs on HDFS) – HDFS: counterpart of Google GFS – MapReduce: counterpart of Google MapReduce – Key-Value data, Java
  • 13. HDFS: 3 kinds of nodes: Namenode, Secondary Namenode (2Namenode), Datanode [Diagram: one Name Node, one Secondary Name Node, six Data Nodes]
  • 14. HDFS • HDFS splits a file into blocks (64MB each). Example: a 150MB file (abcdefg hijklmn opqrstu vwxyz) becomes #Block1 (64MB), #Block2 (64MB), and #Block3 (22MB)
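
The block arithmetic on slide 14 can be checked with a few lines of Ruby. This is only a sketch: `split_into_blocks` is a hypothetical helper written for illustration, not part of Hadoop, and the 64MB default is taken from the slide.

```ruby
# Hypothetical helper (not a Hadoop API): returns the sizes, in MB, of the
# blocks that a file of the given size would be split into.
def split_into_blocks(file_size_mb, block_size_mb = 64)
  full_blocks, remainder = file_size_mb.divmod(block_size_mb)
  sizes = Array.new(full_blocks, block_size_mb)
  sizes << remainder if remainder > 0 # the last block may be smaller
  sizes
end

# The 150MB file from slide 14.
split_into_blocks(150).each_with_index do |size, index|
  puts "#Block#{index + 1} (#{size}MB)"
end
```

Only the last block is smaller than the block size; a file that is an exact multiple of 64MB produces no partial block.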
  • 15. HDFS: blocks #Block1 (64MB), #Block2 (64MB), #Block3 (22MB) of the file (abcdefg hijklmn opqrstu vwxyz) are placed on several datanodes – Data Node:A has 1,2 – Data Node:B has 2,3 – Data Node:C has 1,3 – Data Node:D has 1 – Data Node:E has 2,3
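
To see what the placement on slide 15 implies, the datanode-to-block listing can be inverted into a block-to-datanode index. The sketch below is illustrative only: `replicas_by_block` and `lost_blocks` are hypothetical helpers, and the placement table is copied from the slide. In that listing each block happens to sit on three datanodes.

```ruby
# Placement copied from slide 15: datanode name => block numbers it holds.
PLACEMENT = {
  "A" => [1, 2],
  "B" => [2, 3],
  "C" => [1, 3],
  "D" => [1],
  "E" => [2, 3]
}.freeze

# Hypothetical helper: block number => sorted list of datanodes holding a replica.
def replicas_by_block(placement)
  index = Hash.new { |hash, block| hash[block] = [] }
  placement.each do |node, blocks|
    blocks.each { |block| index[block] << node }
  end
  index.transform_values(&:sort)
end

# Hypothetical helper: blocks with no replica left if the given datanodes fail.
def lost_blocks(placement, failed_nodes)
  replicas_by_block(placement)
    .select { |_block, nodes| (nodes - failed_nodes).empty? }
    .keys
    .sort
end

replicas_by_block(PLACEMENT).each do |block, nodes|
  puts "#Block#{block}: #{nodes.join(', ')}"
end
```

With this placement, losing any two datanodes leaves every block readable; a block is lost only when all three of its holders fail, for example `lost_blocks(PLACEMENT, %w[A B E])` reports block 2.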
  • 16. Namenode(NN)– Namenode– HDFS––Datanode(DN)–– blk_xxxxxx– Secondary Data Node Name Node Name Node
  • 17. Secondary NamenodeSecondary Namenode(2NN)– 2NN Namenode– Namenode– • 32NN NN– Namenode– Namenode •– 2NN
  • 18. Namenode !– Namenode HDFS– NN 2NN– HDFS––
  • 19. HDFS [Diagram: Name Node (Active), Name Node (Standby), six Data Nodes] Standby, 2NN
  • 20. HDFS HDFS – Datanode – Datanode Namenode – Namenode – Namenode ⇔Datanode Datanode⇔Datanode • • • Linux – ls,cat – rwx • x HDFS
  • 21. MapReduce MapReduce – – – Map/Reduce 2 – Map/Reduce ,Mapper/Reducer – Map,Reduce Shuffle
  • 22. MapReduce [Diagram: one Job Tracker and six Task Trackers] Node types: JobTracker, TaskTracker
  • 23. [Diagram: Name Node, Job Tracker, Secondary Name Node, and six worker nodes, each running a Data Node and a Task Tracker] ※ HDFS: Data Node ※ Mapreduce: Task Tracker
  • 24. Mapreduce YARN – HDFS Mapreduce – YARN(Mapreduce Ver2) – Mapreduce – YARN – YARN
  • 25. MapReduce: WordCount – the "Hello World" of MapReduce
        Input: Hello Hadoop Goodbye World Hello Goodbye World World Hadoop
        Map: <Hello,1> <Hadoop,1> <Goodbye,1> <World,1> <Hello,1> <Goodbye,1> <World,1> <World,1> <Hadoop,1>
        Shuffle: <Goodbye,[1,1]> <Hadoop,[1,1]> <Hello,[1,1]> <World,[1,1,1]>
        Reduce: <Goodbye,2> <Hadoop,2> <Hello,2> <World,3>
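
The three stages on slide 25 can be traced in plain Ruby, without Hadoop. This is a minimal in-memory sketch: the function names are made up for illustration, and the line breaks in `LINES` are an assumption, since only the word order survives in the transcript.

```ruby
# Input words in slide order; how they were split into lines is an assumption.
LINES = [
  "Hello Hadoop Goodbye World",
  "Hello Goodbye World",
  "World Hadoop"
].freeze

# Map: emit a [word, 1] pair for every word of every line.
def map_phase(lines)
  lines.flat_map { |line| line.split.map { |word| [word, 1] } }
end

# Shuffle: collect the values of each key, with keys in sorted order.
def shuffle_phase(pairs)
  pairs.group_by(&:first)
       .sort_by { |word, _group| word }
       .map { |word, group| [word, group.map(&:last)] }
end

# Reduce: sum the collected values of each key.
def reduce_phase(grouped)
  grouped.map { |word, counts| [word, counts.sum] }
end

shuffled = shuffle_phase(map_phase(LINES))
reduce_phase(shuffled).each { |word, total| puts "<#{word},#{total}>" }
```

On a real cluster the map and reduce steps run on many Task Trackers and the framework performs the shuffle; here everything runs in one process purely to make the data flow visible.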
  • 26. MapReduce Mapper Reducer – – – – HDFS ” ” Map reduce Map reduce Map
  • 27. MapReduce – WordCount – Map Reduce – • fizz buzz fizzbuzz fizz – Ruby Ruby – Map #{ }\t1 OK – Reduce
  • 28. MapReduce
        Input: hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred
        Result: Hadoop 3, hdfs 1, Mapred 4
        – OK • #{ }\t#{ }
        – cat test.txt | ruby map.rb | sort | ruby reduce.rb
        • Hadoop
  • 29. MapReduce: Map
        Input: hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred
        Map output: hdfs 1, Hadoop 1, Hadoop 1, Mapred 1, Mapred 1
  • 30. Map
        #!/usr/bin/env ruby
        # Emit "word<TAB>1" for every word read from standard input.
        STDIN.each_line do |line|
          line.split.each do |word|
            puts "#{word}\t1"
          end
        end
  • 31. Reduce
        # Sum the counts of each word read from standard input.
        wordhash = {}
        STDIN.each_line do |line|
          word, count = line.strip.split
          if wordhash.has_key?(word)
            wordhash[word] += count.to_i
          else
            wordhash[word] = count.to_i
          end
        end
        wordhash.each { |record, count| puts "#{record}\t#{count}" }
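
The reducer on slide 31 keeps every word in a hash until the input ends. Because the local pipeline on slide 28 puts `sort` between the mapper and the reducer, and Hadoop Streaming likewise hands a reducer its lines sorted by key, a reducer can instead keep only a running total for the current word. The following is a sketch of that alternative, not the author's code; `reduce_sorted` and `MAPPED` are names introduced here for illustration.

```ruby
# Hypothetical helper: sums counts from "word\tcount" lines that are already
# sorted by word, holding only the current word and its running total.
def reduce_sorted(lines)
  results = []
  current_word = nil
  total = 0
  lines.each do |line|
    word, count = line.chomp.split("\t")
    next if word.nil? || count.nil? # skip blank or malformed lines
    if word != current_word
      # A new key starts: flush the finished one.
      results << "#{current_word}\t#{total}" unless current_word.nil?
      current_word = word
      total = 0
    end
    total += count.to_i
  end
  results << "#{current_word}\t#{total}" unless current_word.nil?
  results
end

# Mapper output for the sample text on slide 28.
MAPPED = %w[hdfs Hadoop Hadoop Mapred Mapred Mapred Hadoop Mapred]
         .map { |word| "#{word}\t1" }
         .freeze

puts reduce_sorted(MAPPED.sort)
```

In a streaming script the last line could instead read `puts reduce_sorted(STDIN.each_line)`. Ruby's `sort` orders strings bytewise, so uppercase words come before `hdfs` here; the counts are the same as on slide 28, only the order of the output lines may differ from a locale-aware `sort`.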
  • 32. Hadoop Hadoop – – Java OK – • .
  • 33. Hadoop
  • 34. Hadoop– • Pig • Hive– • Sqoop– • Mahout– Hadoop • whirr etc…
  • 35. Hadoop– HDFS • RAID •– HDFS Mapreduce • Amazon S3– •– •– •
  • 36. (Hadoop)– RDB–– Hive Pig––
  • 37. (Hadoop)–– HDD– Mapreduce–– Hadoop
