[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba

Rakuten Technology Conference 2013
"DWH/Hadoop in Rakuten Ichiba"
Mitsuo Hangai (Rakuten)

Published in: Technology

  1. 1. DWH/Hadoop in Rakuten Ichiba Vol.01 Oct/26/2013 Mitsuo Hangai Sendai Development Group New Service Development Department, Rakuten, Inc. http://www.rakuten.co.jp/
  2. 2. Self-introduction @bangucs Mitsuo Hangai (半谷 充生) Rakuten, Inc. Service Development Sendai Group Chinese (汉语) 2
  3. 3. Agenda: About Sendai Branch; What is our data warehouse; Now and future 3
  4. 4. About Sendai Branch 4
  5. 5. Do you know Sendai? 5
  6. 6. About Sendai: Sendai City. Map source: "Free blank maps, world maps, and Japan maps [Blank Map Specialty Shop]" http://www.freemap.jp/japan/ja_kouiki_japan_big_scale_3.html 6
  7. 7. About Sendai 7
  8. 8. About Sendai branch 8
  9. 9. History of Sendai Development Group: 2007: Foundation for Pro-sports. 2008: Start of Ichiba Business Support and Infoseek operations. 2009: Growth and start of Advertisement development. 2010: Start of Marriage operations. 2011: Hit by the huge earthquake… Move to a new office. 2012: GM changed to Nanjo (he organizes Satellite!). (Chart: years 2007 to 2013, vertical axis 0 to 30; series: Advertisement, Auction, Marriage, Infoseek, Ichiba, Pro-sports.) 9
  10. 10. Current work of Rakuten Sendai Our team! International Ichiba Development & Operation Central Data WareHouse Development & Operation Development & Operation Development & System replacement Operation Development & Operation 10
  11. 11. About the usage of Rakuten Ichiba’s Data 11
  12. 12. How and for what we use Rakuten Ichiba’s data: purchase history; ranking; Ichiba GMS (Gross Merchandise Sales); reporting for 500 EC consultants; the marketing department; accounting and granting points; detecting fraud; and so forth… 12
  13. 13. By the way…. Do you know …. 13
  14. 14. How many orders does Rakuten Ichiba receive per day? 14
  15. 15. A: About 2,000,000 transactions (※ counted per order, not per item) 15
  16. 16. How much data does our data warehouse handle per day? 16
  17. 17. A: About 100 GB (this is not all of the data, only what is needed) 17
  18. 18. How many items does Rakuten Ichiba have? 18
  19. 19. A: About 1,400,000,000 items (as of 2013/10/08) 19
  20. 20. Some items have long and funny names 20
  21. 21. This is the name of this item!! http://item.rakuten.co.jp/wakamaru/sale-2908-50offcp/ 21
  22. 22. This is the name of this item!! http://item.rakuten.co.jp/pascoshop/4901820354426/ 22
  23. 23. This is the name of this item!! http://item.rakuten.co.jp/e-cha/hd-sakusakuwakame/ 23
  24. 24. We have such huge (and funny) data. 24
  25. 25. We must finish processing all this data by morning… 25
  26. 26. Like this(1): This table has about 200,000,000 records 26
  27. 27. Like this (2): About 2 meters. Each table has about 200,000,000 records 27
  28. 28. How tough… But it is necessary… 28
  29. 29. A few years ago (until May 2011): old SelectDB. 107 tables, 378 interface files. (Diagram labels: Purchase, ITEM, Shops, Unload, load, RDB1, RDB2, Batch Server, Scheduler, SQL, Perl, Interface File.) 29
  30. 30. Problem: the RDBMS had problems such as: poor performance…; not enough disk space…; difficult to expand…; servers are expensive!! 30
  31. 31. Really poor… For example: 31
  32. 32. 32
  33. 33. How do we solve it? 33
  34. 34. 34
  35. 35. Sweet points of Hadoop: Good performance! For batch processing, it performs extremely well. Easy to expand! Just add data nodes. No need for high-performance servers! Commodity servers are enough, so we can reduce costs! 35
  36. 36. Bitter points of Hadoop: MapReduce is not easy… We decided to use Hive (which runs MapReduce via an SQL-like query language called HiveQL). Hive has no “delete” or “insert into” statement, and HiveQL differs from SQL in many ways… We need to think the design through carefully before development. Hive has high latency… It is suitable only for batch processing. 36
  37. 37. Then we decided to use Hadoop. 37
  38. 38. Rakuten’s Shared Hadoop Cluster. 2009: 15 nodes, 50TB (Recommend, Ranking, Item data analysis, Behavior analysis). 2011: 69 nodes, 300TB (Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis). 2013: 30 nodes, 1PB (Recommend, Ranking, Item data analysis, Behavior analysis, Japan Ichiba DWH, Access log analysis, Personalize, Suggest, Granting Point). 38
  39. 39. Plan: new SelectDB (called Ichiba DWH). 107 tables, 378 interface files. (Diagram labels: Purchase, ITEM, Shops, Unload, load, Hadoop Cluster 69 nodes, Batch Server, Scheduler, HiveQL, Shell/Java, Interface File.) 39
  40. 40. We had to transfer data from the old system to Hadoop: 107 tables!! We had to check every difference between the old system and the new system: 378 files!! (= 378 HiveQL scripts) 40
  41. 41. The project ran from October 2010 to May 2011 41
  42. 42. Can we Beat it? http://www.pakutaso.com/20130900245post-3233.html 42
  43. 43. Moreover... 43
  44. 44. 2011- March 44
  45. 45. We were hit by a huge earthquake on March 11, 2011… just as the project was reaching its climax… A hole in the wall… 45
  46. 46. But we did it. 46
  47. 47. At a temporary office (like a Tako-beya) 47
  48. 48. 2011- May 48
  49. 49. We released the new data warehouse!! 49
  50. 50. Hadoop was great. Result, in total processing time: RDB1 161:29:38 vs. Hadoop 99:54:39. Hadoop beat the RDBMS by about 40%!!!! 50
  51. 51. No problem at all!!! 51
  52. 52. Detail of the architecture of our DWH. (Diagram labels: Purchase, Shops, ITEM, Unload, File, Batch Server with Scheduler (client node of Hadoop), HiveQL, Perl/Shell/Java, Job tracker/Name Node, Data Nodes, Rakuten Shared Hadoop Cluster.) 52
  53. 53. Now and future 53
  54. 54. Current situation of our DWH. Tables: 202, HiveQLs: 701. It doubled, and keeps growing!!! Total processing time: 80:04:04!! (was 99:54:39). (Diagram labels: Purchase, Shops, ITEM, Review, …, Data Nodes, Rakuten Shared Hadoop Cluster; two sources are marked “New!”.) http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131 54
  55. 55. Future. (Diagram labels: BI, Customer Support tool, Purchase, Shops, ITEM, Review, …, Data Nodes, Rakuten Shared Hadoop Cluster; four items are marked “New!”.) http://model.foto.ne.jp/free/product_info.php/cPath/24_251_243/products_id/302131 55
  56. 56. We will expand our service and usage of data!! Currently, we act like a platform team, but our mission is “analytics”. We are going to focus more and more on analyzing data! And we are going to expand and develop other services which use Rakuten Ichiba’s exciting data! 56
  57. 57. Exciting!! http://www.pakutaso.com/20130926245post-3235.html 57
  58. 58. We are waiting for you!! 58
  59. 59. Join us! 59
  60. 60. Thank you for listening! Contact me via: @bangucs Mitsuo.hangai mitsuo.hangai@mail.rakuten.com English is OK, of course. Japanese is OK too. 60
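
The processing-time figures on slides 50 and 54 can be checked with a little arithmetic. The sketch below is not from the talk; it assumes the times are written as hours:minutes:seconds and reads slide 54 as 99:54:39 at release versus 80:04:04 in 2013. The function and constant names are made up for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def reduction_percent(before: str, after: str) -> float:
    """Percentage by which the total processing time shrank."""
    before_s = to_seconds(before)
    after_s = to_seconds(after)
    return (before_s - after_s) / before_s * 100


# Figures quoted on the slides (names are hypothetical).
RDB_TOTAL = "161:29:38"          # old RDBMS, slide 50
HADOOP_AT_RELEASE = "99:54:39"   # Hadoop at release, slide 50
HADOOP_IN_2013 = "80:04:04"      # assumed to be the current value on slide 54

print(f"{reduction_percent(RDB_TOTAL, HADOOP_AT_RELEASE):.1f}%")       # 38.1%
print(f"{reduction_percent(HADOOP_AT_RELEASE, HADOOP_IN_2013):.1f}%")  # 19.9%
```

So the "40%" on slide 50 is a rounding of roughly 38% less total processing time, and under the reading above the total dropped by a further 20% or so even though the number of tables and HiveQL scripts roughly doubled.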

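Slide 40 mentions checking the differences between the old and new systems for 378 interface files. The talk does not say how this was done; the following is only a minimal sketch of one possible approach, assuming each interface file is a plain text file with one record per line. Rows are compared as a multiset because a Hive query without an explicit ORDER BY does not guarantee output order. The file names and sample rows are invented.

```python
import tempfile
from collections import Counter
from pathlib import Path


def load_rows(path):
    """Read a text file into a multiset of rows (row -> occurrence count)."""
    with open(path, encoding="utf-8") as handle:
        return Counter(line.rstrip("\n") for line in handle)


def diff_rows(old_path, new_path):
    """Return (rows only in the old output, rows only in the new output)."""
    old_rows = load_rows(old_path)
    new_rows = load_rows(new_path)
    # Counter subtraction keeps only positive counts, so duplicates are respected.
    return old_rows - new_rows, new_rows - old_rows


# Demo with invented rows: same records in a different order, one value differs.
with tempfile.TemporaryDirectory() as workdir:
    old_file = Path(workdir) / "old_system.tsv"
    new_file = Path(workdir) / "new_system.tsv"
    old_file.write_text("1\tshopA\t100\n2\tshopB\t250\n", encoding="utf-8")
    new_file.write_text("2\tshopB\t250\n1\tshopA\t101\n", encoding="utf-8")
    only_old, only_new = diff_rows(old_file, new_file)

print(sorted(only_old))  # rows produced only by the old system
print(sorted(only_new))  # rows produced only by the new system
```

Loading whole files into memory is fine for a sketch; for files with hundreds of millions of records, sorting both outputs and comparing them as streams would be the more realistic choice.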