Hadoop at Rakuten, 2011/07/06


Published in: Technology

  1. Hadoop at Rakuten. Rakuten Inc. Architect Group. Hamba Mitsuharu & Nakagawa Gen. 2011/07/06 (Wed)
  2. Hadoop at Rakuten. Today’s Agenda. 1. Our Profile. 2. What is Hadoop? 3. Our Current Hadoop System Overview. 4. Our Hadoop Usage. 5. Our Challenge. 6. Our Future Plan.
  3. Hadoop at Rakuten. Our Profile.
  4. Our Profile. From ACT Group: Nakagawa Gen, Hamba Mitsuharu.
  5. Our Profile. Our Mission: Enhancing Hadoop at Rakuten.
  6. Our Profile. Our Latest Tasks. Done: 1. Implementing Ganglia. 2. Implementing HA.
  7. Our Profile. Our Latest Tasks. Now Handing Over: 1. Maintaining Our Hadoop Cluster. 2. Modifying Our Hadoop Configurations. 3. Implementing Scripts for Daily Chores.
  8. Our Profile. Our Latest Tasks. Now Concentrating On: 1. Evaluating Related Products.
  9. Hadoop at Rakuten. What is Hadoop?
  10. What is Hadoop? One of the Most Powerful Distributed Processing Frameworks for Large Data Sets.
  11. What is Hadoop? Distributions.
  12. What is Hadoop? Ecosystem. Etc.
  13. What is Hadoop? HDFS & MapReduce Constitute Hadoop. HDFS: Hadoop Distributed File System. MapReduce: Map & Reduce (Includes Shuffle & Sort).
  14. What is Hadoop? Input from HDFS. Process by MapReduce. Output to HDFS. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  15. What is Hadoop? Simple Example. Source: http://techblog.yahoo.co.jp/cat207/cat209/hadoop/
  16. What is Hadoop? In the Common Case, Several Simple Jobs Are Combined. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  17. What is Hadoop? NameNode & DataNode Constitute HDFS. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  18. What is Hadoop? Read & Write on HDFS. Source: http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes
  19. What is Hadoop? JobTracker & TaskTracker Constitute MapReduce. Source: http://horicky.blogspot.com/2008_11_01_archive.html
  20. What is Hadoop? Good & Bad Points of Hadoop. Good: Easy to Scale Out the System. Easy to Implement Distributed Processing. Bad: The NameNode Is a SPoF (Single Point of Failure).
  21. Hadoop at Rakuten. Our Current Hadoop System Overview.
  22. Our Current Hadoop System Overview. The Cluster Infrastructure #1. For Instance. Source: http://www.ibm.com/developerworks/linux/library/l-hadoop/
  23. Our Current Hadoop System Overview. The Cluster Infrastructure #2. In Our Case. 3 Masters & 69 Slaves. [Diagram: a client connects through 1Gbps switches to racks holding an active NN&JT, an SNN, a standby NN&JT, DN&TT slaves, and other machines.]
  24. Our Current Hadoop System Overview. The Monitoring System. Using Ganglia (& MRTG). We Can Easily Check Resource Usage at Any Time, Not Only for Each Machine but Also for the Cluster as a Whole.
  25. Our Current Hadoop System Overview. High Availability. Using DRBD & Heartbeat. NN: NameNode. JT: JobTracker. [Diagram: the client reaches v-host.rakuten.co.jp; the active and standby hosts (each with eth0 and eth1) both hold NN and JT on /foo/drbd0 and /foo/drbd1, and DRBD syncs the changes between them.] Source: Gen.
  26. Hadoop at Rakuten. Our Hadoop Usage.
  27. Our Hadoop Usage. Who Is Using Our Hadoop. 1. Generating Recommendation Engine Index. 2. Analyzing Redirect Log. 3. Calculating AD Targeting Index. 4. Measuring AD Effects. 5. Analyzing Ichiba Merchandise & Order Info. 6. Calculating Ichiba Product Ranking. 7. Analyzing Search Log. 8. Analyzing Rakuten Travel’s Access Log. (Coming Soon...) 9. Analyzing Search Word N-gram. (Coming Soon...)
  28. Our Hadoop Usage. The Issues of the Previous System. 1. Keeping Up the RDBMS Is Costly. 2. Storage Space Requirements Keep Growing. 3. The System Cannot Handle Many Job Requests Due to Low Performance. [Diagram: Previous System. Labels: Batch Server, Purchase, Marketing, Manipulate, Shop, Intermediate, Utility, Unload, Load, File, Category, NFS, ITEM, Mail.]
  29. Our Hadoop Usage. The Effect of the New System. 1. A Scalable System at Very Low Cost. (80% Reduction for Storage.) 2. Transaction Time Is Dramatically Improved. (Reduced by 50-75%.) [Diagram: New System, 1st Step. Labels: Batch Server, Purchase, Marketing, Manipulate, Shop, Utility, Unload, Load, File, Category, NFS, Intermediate, ITEM, Mail.]
  30. Our Hadoop Usage. The Remaining Issues of the New System. 1. Still Halfway to the Target DWH. 2. Negative Effects of Migrating from a Dedicated Environment to a Shared Environment: 1. Security. 2. Sharing Cluster Resources.
  31. Hadoop at Rakuten. Our Challenge.
  32. Our Challenge. The Issues with Our Hadoop. 1. Likely to Use Up the HDFS Space. 2. Needs Much Electric Power. 3. Need to Share the Cluster Resources Efficiently. 4. Needs More Network Bandwidth.
  33. Hadoop at Rakuten. Our Future Plan.
  34. Our Future Plan. Considering a New Slave Machine. Now Looking for a Machine Which Has: Low Electric Power Consumption, About 6-Core CPU x2, About 10TB HDD, About 96GB Memory, & Naturally Compatible With Our Data Center.
  35. Our Future Plan. Upgrade from Apache to CDH3. Mr. Eric Sammer (Solution Architect at Cloudera) Described the Advantages of Hadoop from Cloudera on Quora. 1. A version of Hadoop that has frequent releases (quarterly) that include bug fixes and back ported features (append for HBase, Kerberos security from Y!, etc.). 2. Related projects (Hive, Pig, Oozie, HBase, Flume, Sqoop, etc.) tested together and work as a cohesive system. 3. Simplified installation via Yum / Apt repositories. 4. Tighter integration with the OS (init scripts for daemons, installation of things in common paths, logs in their proper location). 5. A fixed release schedule. 6. Support available from Cloudera with SLAs. Source: http://www.quora.com/What-are-the-advantages-of-getting-Apache-Hadoop-from-Cloudera-rather-than-the-Apache-Software-Foundation
  36. Our Future Plan. Evaluating HBase Using AWS. Constructing an HBase Cluster on Amazon EC2. Doing Evaluation & Verification This Summer!
  37. Hadoop at Rakuten. Need Your Help! We Need Many More Hadoopers! Come With Us!
  38. Hadoop at Rakuten. Thank You.
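
The "What is Hadoop?" slides describe MapReduce as Map & Reduce with Shuffle & Sort in between, and note that the common case combines several simple jobs. The following is a minimal single-process sketch of that flow in plain Python, not Hadoop's actual API. The word-count job, the second grouping job, and every function name are assumptions chosen for illustration (the slides do not say which example they used), and in-memory lists stand in for HDFS input and output.

```python
from collections import defaultdict
from itertools import chain


def shuffle_and_sort(pairs):
    """Group mapped (key, value) pairs by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items(), key=lambda item: item[0])


def run_job(records, mapper, reducer):
    """Run one job: map every record, shuffle & sort, then reduce each key."""
    mapped = chain.from_iterable(mapper(record) for record in records)
    return [reducer(key, values) for key, values in shuffle_and_sort(mapped)]


# Job 1 (hypothetical): count how often each word occurs.
def word_mapper(line):
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(word, ones):
    return word, sum(ones)


# Job 2 (hypothetical): group words by their count, fed by job 1's output.
def invert_mapper(word_and_count):
    word, count = word_and_count
    return [(count, word)]


def group_reducer(count, words):
    return count, sorted(words)


# In-memory stand-in for input files stored on HDFS.
lines = [
    "hadoop at rakuten",
    "hadoop hdfs",
    "hdfs and mapreduce constitute hadoop",
]
word_counts = run_job(lines, word_mapper, count_reducer)
words_by_count = run_job(word_counts, invert_mapper, group_reducer)
print(words_by_count)
```

The second call to `run_job` takes the first job's output as its input, which is the sense in which simple jobs are chained. On a real cluster the map and reduce tasks would run in parallel on different machines and each job would read from and write to HDFS.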