Applications of Machine Learning Algorithms at Rakuten

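At Rakuten Institute of Technology, we conduct research in machine learning, data mining, natural language processing, and related fields on the many kinds of large-scale data held by the Rakuten Group. This presentation introduces three projects: structuring product data on Rakuten Ichiba, extracting prospective customers from Rakuten Ichiba user behavior data, and a machine translation algorithm specialized for TV dramas, built from Rakuten Viki's parallel corpus of TV drama translations.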

  1. Applications of Machine Learning Algorithms at Rakuten. Yu Hirate, Dr. Eng., Rakuten Institute of Technology Tokyo, Rakuten, Inc.
  2. Yu Hirate (平手勇宇): Principal Scientist, Rakuten Institute of Technology; Manager, Intelligence Domain Research Group. Bio: • 2005-2008: CS Division, Graduate School of Science and Engineering, Waseda University. • 2006-2009: Research Associate, Media Network Center, Waseda University. • 2009-present: Rakuten Institute of Technology, working on projects that extract knowledge from large-scale data using data mining and machine learning technologies.
  3. Strategic R&D organization for the Rakuten Group. Masaya Mori, Global Head. • Established in 2006. • Launched R.I.T. NY in 2010. • Launched R.I.T. Paris in 2014. • Launched R.I.T. Singapore / Boston in 2015.
  4. 100+ researchers in 5 locations
  5. 3 research groups to keep pace with Internet growth: Reality, Intelligence, and Power. Research areas: • HCI • AR / VR • Image Processing • Distributed Computing • HPC • IoT • Machine Learning • Deep Learning • NLP • Data Mining
  6. Optimizing A/B Testing • Item Classification • User Segmentation • AI Coupon Distribution • Recommender System • Economy Prediction / Demand Prediction • Review Analysis • Anomaly Detection / Fraud Detection • Image Recognition
  7. Huge! Unstructured! 241 million items. [Chart: number of items (million) over time.] https://item.rakuten.co.jp/kawahara/345812/
  8. Product catalog business. Why are we working on this problem? (Key benefits) ‣ To organize our catalog in accordance with customer expectations ‣ To precisely search our catalog for products and their variants ‣ To measure and enforce merchant KPIs. What are we doing? (Key tasks) ‣ Product genre classification ‣ Attribute extraction from product information ‣ Merchant and item review analysis. How are we doing it? (Key technologies) ‣ Large-scale gradient boosted decision trees ‣ Deep learning (RNNs, CNNs, others) ‣ Computing a massive number of NLP features
  9. Each product can be assigned a category and attributes. For instance: Category: Grocery & Food; Subcategory: Wine. Each (sub)category has a number of relevant attributes, each with a list of valid values. Challenge: this structured information is not always present or correct. Goal: automatically predict the category and attributes from text and/or images. https://item.rakuten.co.jp/kawahara/345812/
  10. Two classifiers based on a deep learning algorithm (CNN): • Text data (item title, item description) → word extraction → CNN classifier: Prec@1 92%, Prec@10 99% (tested on Ichiba L3 categories, 1.5K categories) • Image data → CNN classifier: Prec@1 57%, Prec@3 75% (tested on PriceMinister data)
  11. Hobby and Entertainment > Books and Magazine > Business • Electronics > Audio > Earphone / Headphone • Electronics > Smartphone > AC Adaptor / Battery
  14. Detect prospective applicants for a finance service among Ichiba purchasers by using their purchase trends and demographics: extract prospective applicants from Ichiba active users.
  15. The overlap between Ichiba active users and the fintech service's contract holders gives 7,413 positive samples; 7,417 negative samples are also used. About 50% of the fintech service's contract holders were Ichiba active users.
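  16. Extracted user behavior model: important factors. [Bar chart: importance of the top 20 factors selected from 141 factors; importance axis from 0 to 0.3.] Factors in the chart: genre_41_100890_ / Flowers, Garden & DIY / DIY & Tools • genre_72_111078_ / Kids, Baby & Maternity / Kids • genre_50_110983_ / Shoes / Men's Shoes • Age-05-[35-40] • genre_93_101077_ / Sports & Outdoors / Golf • Area-01-Kanto • Area-00-Others • genre_113_101126_ / Car & Motorbike Accessories / Car Accessories • Age-08-[50-*] • Age-03-[25-30] • Age-00-none • gms • Gender-00-none • basket_max_price • frequency • basket_average_price • average_unit_price • Age-02-[20-25] • Gender-02-female • Gender-01-male. Highlighted factors: Ichiba genre Cars & Motorbikes / Car & Motorbike Accessories; Ichiba genre Sports & Outdoors / Golf Equipment; Ichiba genre Shoes / Men's Shoes; Ichiba genre Kids, Baby & Maternity / Kids; Ichiba genre Garden, DIY & Tools / DIY & Tools; average unit price of purchased items (average_unit_price); average purchase amount per order (basket_average_price); purchase frequency (frequency); maximum purchase amount per order (basket_max_price); total purchase amount (gms).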
  17. Send the Ichiba mail magazine to two groups: • Prospective users: score >= 0.8, about 300,000 users • Control group: randomly selected, about 300,000 users
  18. Funnel: mail delivered → mail opened (+3.52%) → contents clicked, i.e. service page visited (+49.23%). The click rate rose by 49.23% compared with the control group.
  20. Machine translation. Example subtitle: 我们真的很有诚意了。你说我一个老总都亲自跑了好几趟了。 ("We really are sincere. Look, I'm the boss and I've come here in person several times.") Viki (https://www.viki.com/?locale=ja) is a Rakuten group company that provides a video streaming service. Volunteers edit subtitles and translated subtitles.
  21. Task: translate Chinese sentences into English. • Extracted 10,000 Chinese-English sentence pairs to evaluate commercial APIs and IBot, e.g., 我一个老总都亲自跑了好几趟了 → "I'm a director and yet I've made so many trips" • Extracted another 2.1 million sentence pairs to train IBot's model
  22. Applying attentional recurrent neural networks (RNNs): • Neural Machine Translation by Jointly Learning to Align and Translate [Bahdanau, Cho & Bengio, ICLR 2015] • 658 citations (Google Scholar) • Trained the RNN on 2.1 million Chinese-English sentence pairs
  23. Evaluated on 10,000 Chinese-English sentence pairs:
      System           BLEU (%)   METEOR (%)
      Google API             12           20
      Microsoft API          12           20
      IBM Watson API          3           12
      RIT (Aug 24)           10           15
      RIT (Sep 7)            14           19
      RIT (Sep 21)           22           24
      RIT (Nov 28)           36           30
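Slide 9 notes that each (sub)category has relevant attributes with lists of valid values, and that this structured information is often missing or wrong. A minimal sketch of flagging such items against a schema follows. The schema contents, attribute names, and the `validate_attributes` helper are hypothetical illustrations, not Rakuten's catalog format.

```python
# Hypothetical schema: subcategory -> attribute -> set of valid values.
# Attribute names and values are illustrative assumptions only.
SCHEMA = {
    "Wine": {
        "color": {"red", "white", "rose", "sparkling"},
        "country": {"France", "Italy", "Japan", "Chile"},
    },
}


def validate_attributes(subcategory, attributes):
    """Return (missing, invalid) attribute names for one item.

    missing: attributes the schema expects but the item lacks.
    invalid: attributes whose value is not in the valid list, or that
             are not defined for this subcategory at all.
    """
    expected = SCHEMA.get(subcategory, {})
    missing = sorted(name for name in expected if name not in attributes)
    invalid = sorted(
        name
        for name, value in attributes.items()
        if name not in expected or value not in expected[name]
    )
    return missing, invalid


if __name__ == "__main__":
    print(validate_attributes("Wine", {"color": "red", "country": "Narnia"}))
    # prints ([], ['country'])
```

Items flagged this way are the ones where predicting categories and attributes from text or images, as the slide proposes, would fill the gaps.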
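Slide 10 reports Prec@1, Prec@3, and Prec@10. The slides do not define the metric. With one correct category per item, one plausible reading is the fraction of items whose true category appears among the classifier's top-k predictions (top-k accuracy), and the sketch below assumes that interpretation. Per-item scores are taken as plain dictionaries, and `precision_at_k` is a hypothetical helper name.

```python
def precision_at_k(scores, labels, k):
    """Fraction of items whose true label is among the k highest-scoring labels.

    Assumption: "Prec@k" means a top-k hit rate; the slides do not define it.
    scores: list of dicts mapping category -> classifier score, one per item.
    labels: list of true categories, aligned with `scores`.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and aligned")
    hits = 0
    for item_scores, label in zip(scores, labels):
        # Rank categories by descending score and keep the top k.
        top_k = sorted(item_scores, key=item_scores.get, reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


scores = [
    {"Wine": 0.7, "Sake": 0.2, "Beer": 0.1},
    {"Wine": 0.5, "Sake": 0.3, "Beer": 0.2},
]
labels = ["Wine", "Sake"]
print(precision_at_k(scores, labels, 1))  # 0.5
print(precision_at_k(scores, labels, 2))  # 1.0
```

This reading explains why Prec@k rises with k, as in the slide's 92% at k=1 versus 99% at k=10.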
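Slide 15 builds training data from the overlap between Ichiba active users and the fintech service's contract holders (7,413 positives), together with 7,417 negative samples. The slides do not say how the negatives were chosen. The sketch below assumes they are sampled uniformly at random from active users outside the overlap. `build_training_set` and the user-ID sets are hypothetical.

```python
import random


def build_training_set(active_users, contract_holders, n_negative, seed=0):
    """Label overlapping users 1 and a random sample of the remaining users 0.

    Hypothetical helper. Assumption: negatives are drawn uniformly from
    active users who are not contract holders; the slides do not say.
    """
    # Sort the sets so sampling is reproducible for a fixed seed.
    positives = sorted(active_users & contract_holders)
    candidates = sorted(active_users - contract_holders)
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(n_negative, len(candidates)))
    return [(user, 1) for user in positives] + [(user, 0) for user in negatives]
```

Drawing roughly as many negatives as positives keeps the classes balanced, which matches the near-equal counts on the slide.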
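Slide 16 lists purchase-based factors (gms, frequency, basket_average_price, basket_max_price, average_unit_price), and the Japanese callouts describe them as total purchase amount, purchase frequency, average and maximum amount per order, and average unit price. The sketch below computes these aggregates for one user under assumed definitions: a basket is one order, given as a list of `(unit_price, quantity)` lines, and average unit price is quantity-weighted. It illustrates the feature names and does not reproduce Rakuten's pipeline.

```python
def purchase_features(baskets):
    """Aggregate one user's baskets into purchase-behavior features.

    baskets: list of baskets; each basket is a list of (unit_price, quantity).
    Feature names follow slide 16; the exact definitions are assumptions.
    """
    basket_totals = [sum(price * qty for price, qty in basket) for basket in baskets]
    units = sum(qty for basket in baskets for _, qty in basket)
    gms = sum(basket_totals)  # total purchase amount
    frequency = len(baskets)  # number of orders
    return {
        "gms": gms,
        "frequency": frequency,
        "basket_average_price": gms / frequency if frequency else 0.0,
        "basket_max_price": max(basket_totals, default=0),
        # Quantity-weighted average price per purchased unit (assumption).
        "average_unit_price": gms / units if units else 0.0,
    }


print(purchase_features([[(1000, 2), (500, 1)], [(3000, 1)]]))
```

Demographic factors such as `Age-05-[35-40]` or `Gender-02-female` would be added alongside these as one-hot indicators.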
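Slide 17 compares prospective users (model score >= 0.8) against a randomly selected control group of about the same size. A minimal sketch of that split follows, assuming scores are available as a user-to-score mapping. `split_campaign_groups` is hypothetical. Drawing the control group only from users below the threshold is also an assumption, since the slides say only "randomly selected".

```python
import random


def split_campaign_groups(scores, threshold=0.8, seed=0):
    """Return (treatment, control) lists of user IDs.

    Hypothetical helper.
    treatment: users whose model score is at least `threshold`.
    control:   an equally sized random sample of the remaining users
               (assumption; the slides only say "randomly selected").
    """
    treatment = sorted(user for user, s in scores.items() if s >= threshold)
    others = sorted(user for user, s in scores.items() if s < threshold)
    rng = random.Random(seed)
    control = rng.sample(others, min(len(treatment), len(others)))
    return treatment, control
```

Keeping the two groups the same size and disjoint makes their open and click rates directly comparable.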
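Slide 18 reports that the click rate rose by 49.23% compared with the control group. If this is a relative lift, it equals (treatment rate - control rate) / control rate. The pooled two-proportion z-test below is an added illustration of how significance could be checked; the slides do not report one. The counts in the usage lines are made up.

```python
import math


def relative_lift(clicks_t, n_t, clicks_c, n_c):
    """Relative change of the treatment click rate over the control rate."""
    rate_t, rate_c = clicks_t / n_t, clicks_c / n_c
    return (rate_t - rate_c) / rate_c


def two_proportion_z_test(clicks_t, n_t, clicks_c, n_c):
    """Return (z, two-sided p-value) from the pooled two-proportion z-test."""
    rate_t, rate_c = clicks_t / n_t, clicks_c / n_c
    pooled = (clicks_t + clicks_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (rate_t - rate_c) / se
    # Standard normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Made-up counts: 1.5% vs 1.0% click rate on 10,000 users each.
print(relative_lift(150, 10000, 100, 10000))  # about 0.5, i.e. a 50% lift
print(two_proportion_z_test(150, 10000, 100, 10000))
```

With groups of about 300,000 users each, even modest relative lifts tend to be statistically detectable.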
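Slide 21 uses 10,000 sentence pairs for evaluation and a separate 2.1 million pairs for training. A minimal sketch of carving a held-out evaluation set out of a parallel corpus so that the two sets never share a pair follows. `split_parallel_corpus` is hypothetical, and removing exact duplicate pairs before splitting is an added assumption.

```python
import random


def split_parallel_corpus(pairs, n_eval, seed=0):
    """Split (source, target) pairs into disjoint eval and train lists.

    Hypothetical helper. Exact duplicate pairs are removed first
    (assumption), so no evaluation pair also appears in training.
    """
    unique = sorted(set(pairs))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return unique[:n_eval], unique[n_eval:]
```

Subtitle corpora often repeat short lines, so removing duplicates before splitting helps keep the BLEU and METEOR scores on the held-out set honest.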
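Slide 22 applies the attentional RNN of Bahdanau, Cho & Bengio (ICLR 2015). In that model, the decoder scores each encoder annotation h_j against the previous decoder state s_{i-1} with an additive score e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j). A softmax turns the scores into weights, and the weighted sum of annotations becomes the context vector c_i. The pure-Python sketch below computes one such step for small vectors. The sizes and values are illustrative; a real system learns W_a, U_a, and v_a and uses a tensor library.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def additive_attention(s_prev, annotations, W_a, U_a, v_a):
    """One Bahdanau-style (additive) attention step.

    s_prev:      previous decoder state s_{i-1}
    annotations: encoder annotations h_1..h_T
    W_a, U_a:    projection matrices; v_a: scoring vector
    Returns (attention weights, context vector).
    """
    projected_s = matvec(W_a, s_prev)
    scores = []
    for h in annotations:
        hidden = [math.tanh(a + b) for a, b in zip(projected_s, matvec(U_a, h))]
        scores.append(sum(v * x for v, x in zip(v_a, hidden)))
    # Softmax over source positions (subtract the max for numerical stability).
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the annotations.
    dim = len(annotations[0])
    context = [sum(w * h[d] for w, h in zip(weights, annotations)) for d in range(dim)]
    return weights, context


identity = [[1.0, 0.0], [0.0, 1.0]]
w, c = additive_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                          identity, identity, [1.0, 1.0])
```

The weights tell the decoder which source words to focus on while emitting each target word, which is what lets the model align, for example, 老总 with "director".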
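Slide 23 compares systems by BLEU and METEOR. To show what BLEU measures, here is a minimal corpus-level sketch of the standard definition. It computes clipped n-gram precisions for n = 1..4, combines them with a geometric mean, and multiplies by a brevity penalty. It assumes one reference per sentence, whitespace tokenization, and no smoothing. The slides do not state the tokenization, smoothing, or tool behind their scores, so numbers from this sketch are not directly comparable to the table.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus BLEU in [0, 1] with one reference per candidate.

    candidates, references: aligned lists of whitespace-tokenized strings.
    No smoothing: if any n-gram order has zero matches, the score is 0.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c_tokens, r_tokens = cand.split(), ref.split()
        cand_len += len(c_tokens)
        ref_len += len(r_tokens)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c_tokens, n), ngrams(r_tokens, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            totals[n - 1] += sum(c_counts.values())
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the references.
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)
```

Multiplying the result by 100 gives the percentage scale used in the table.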
