Data Analysis Supporting the Dinner Tables of 9.61 Million People

Slides used for the talk at JJUG CCC 2010 Fall on 2010/10/18.

Published in: Technology, Spiritual

  1. 961
  2. • • • Hadoop (Cloudera) • Elastic MapReduce • •
  3. • • • Hadoop (Cloudera) • Elastic MapReduce • •
  4. • (@sasata299) • 2009 8 JOIN • • Hadoop
  5. • • • Hadoop (Cloudera) • Elastic MapReduce • •
  6. • 961 • 30 3 1 • - -
  7. • • ( , , ...) - - ( , , , …) -
  8. • Hadoop MySQL • - GROUP BY • 7000 … • (´Д` )
  9. • MySQL - • - -
  10. • • • Hadoop (Cloudera) • Elastic MapReduce • •
  11. Hadoop • Google MapReduce OSS • - - - -
  12. Hadoop master ( ) slave ( )
  13. Hadoop master ( ) slave ( ) Map
  14. Hadoop master ( ) slave ( ) <key,value> Map Shuffle & Sort
  15. Hadoop master ( ) slave ( ) <key,value> Map Reduce Shuffle & Sort
  16. • Hadoop Streaming (Ruby ) • EC2 Cloudera Hadoop - Cloudera CDH1 - Hadoop 0.18.3 • S3
  17. MySQL → Hadoop • • GROUP BY MapReduce - ( ) - key • JOIN MapReduce •
  18. (1) master (2) S3
  19. (1) master (2) S3
  20. (1) master master slave scp (2) S3
  21. (1) master master slave scp (2) S3 S3 slave scp
  22. MySQL vs Hadoop 7000 MySQL Hadoop MySQL Hadoop
  23. MySQL vs Hadoop ( Д ) 7000 30 MySQL Hadoop MySQL Hadoop
  24. Hadoop++ ←Hadoop ↓MySQL
  25. • • • Hadoop (Cloudera) • Elastic MapReduce • •
  26. • Hadoop - • Hadoop (HADOOP-6254) - S3 - SocketTimeoutException
  27. • EMR (Elastic MapReduce) - Amazon Hadoop • Cloudera CDH2 -
  28. AMI (Amazon Machine Image) UP EMR CDH2
  29. AMI (Amazon Machine Image) UP EMR CDH2
  30. EMR Job Flow ( )
  31. EMR BootStrap Action Job Flow ( )
  32. EMR BootStrap Action Step (Hadoop Job) Job Flow ( )
  33. EMR BootStrap Action Step (Hadoop Job) Job Flow ( )
  34. • - - --alive • AMI - AMI - BootStrap Action
  35. Created job flow j-8IXS98OW1WEE ID
  36. Hadoop
  37. • - mapred.child.java.opts - streaming • - - ElasticMapReduce-master 5100
  38. • • • Hadoop (Cloudera) • Elastic MapReduce • •
  39. • Map - • Reduce - key Reduce -
  40. UU Map Reduce
  41. UU Map ID Reduce
  42. UU Map Reduce ID
  43. UU Map Reduce ID
  44. Map Reduce
  45. Map ID key Reduce
  46. Map Reduce key Reduce
  47. Map 100 100 Reduce key Reduce
  48. × Map 100 × 100 Reduce key Reduce
  49. × Map 100 ×100 Reduce Reduce key sort
  50. × Map 100 ×100 Reduce Reduce key sort
  51. Hadoop • - Hadoop
  52. Hadoop • - Hadoop
  53. Hadoop • - Hadoop
  54. Hadoop • - Hadoop
  55. Hadoop • - Hadoop
  56. • • • Hadoop (Cloudera) • Elastic MapReduce • •
  57. • Hadoop - - - Reduce
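
Slides 12 to 15 name the phases of a Hadoop job (Map, Shuffle & Sort, Reduce over <key,value> pairs), slide 17 mentions moving GROUP BY queries from MySQL to MapReduce, and slides 40 to 43 deal with counting UU (unique users) from IDs emitted by Map. The following is a minimal in-memory sketch of that flow, not the speaker's code. The talk ran Ruby scripts through Hadoop Streaming, whereas this sketch uses plain Python for illustration only. The log records, field names, and every function here (`run_mapreduce`, `count_mapper`, `sum_reducer`, `user_mapper`, `distinct_reducer`) are hypothetical, and using the grouping column as the key is an assumption about what the slides describe.

```python
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, mapper, reducer):
    """Run Map, Shuffle & Sort, and Reduce in memory (hypothetical helper)."""
    # Map: each input record yields zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle & Sort: order the pairs by key so equal keys become adjacent.
    pairs.sort(key=itemgetter(0))
    # Reduce: each key is passed to the reducer once, with all of its values.
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        results.extend(reducer(key, [value for _, value in group]))
    return results


# Hypothetical access-log records: (user_id, search_keyword).
LOG = [
    ("u1", "curry"),
    ("u2", "curry"),
    ("u1", "pasta"),
    ("u1", "curry"),
    ("u3", "pasta"),
]


def count_mapper(record):
    # Roughly "SELECT keyword, COUNT(*) ... GROUP BY keyword":
    # the grouping column becomes the key.
    _user_id, keyword = record
    yield keyword, 1


def sum_reducer(key, values):
    yield key, sum(values)


def user_mapper(record):
    # For unique users, emit the user ID as the key.
    user_id, _keyword = record
    yield user_id, 1


def distinct_reducer(key, values):
    # Each distinct ID reaches the reducer exactly once, so emit it once.
    yield key


searches_per_keyword = run_mapreduce(LOG, count_mapper, sum_reducer)
unique_users = len(run_mapreduce(LOG, user_mapper, distinct_reducer))

print(searches_per_keyword)  # [('curry', 3), ('pasta', 2)]
print(unique_users)          # 3
```

On a real cluster the sort and grouping are done by the framework across slave nodes. Under Hadoop Streaming, the mapper and reducer would instead be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output.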
