
An approach to semantics word extraction by applying language phonology in Rakuten


In its daily business activities, Rakuten has seen massive and diverse growth in the quantity of its data, such as product information, merchant information, and customer activities (e.g., queries, emails, reviews, and helpdesk contacts). Identifying each entity is therefore vitally important if an e-commerce search engine is to find exactly what users want. Matching semantic information is very useful for handling the variety of representations and meanings of those entities, since the variations of user-created entities cannot be resolved in all cases. One very basic but powerful approach is therefore to consolidate a dictionary and to update it through the data-driven extraction of entities. A phonetic-based approach is a useful method of identifying these entities. Our method consists of three parts: preprocessing, measuring phonetic similarity, and postprocessing. Our model is trained on phonetic representations of unlabeled data using an unsupervised EM algorithm and is tested on the extraction of entity-word pairs. On average, our method achieved the highest level of performance in testing.
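The three-part pipeline described above (preprocessing, measuring phonetic similarity, postprocessing) can be sketched roughly as follows. This is only an illustrative stand-in, not the authors' model: it replaces the EM-trained similarity with a plain edit distance over normalized katakana, on the assumption that kana spelling is close enough to pronunciation for a toy example. The function names, the choice to drop the long-vowel mark, and the 0.8 threshold are all hypothetical.

```python
import unicodedata


def preprocess(word: str) -> str:
    """Preprocessing (assumed): NFKC-normalize, then drop whitespace
    and the katakana long-vowel mark so spelling variants collapse."""
    text = unicodedata.normalize("NFKC", word)
    return "".join(ch for ch in text if not ch.isspace() and ch != "ー")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: one minus the length-normalized distance
    between the preprocessed forms (a stand-in for a learned score)."""
    pa, pb = preprocess(a), preprocess(b)
    longest = max(len(pa), len(pb))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pa, pb) / longest


def extract_pairs(words, threshold=0.8):
    """Postprocessing (assumed): keep pairs scoring at or above the threshold."""
    pairs = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a, b))
    return pairs
```

Calling `extract_pairs(["ヘアーアイロン", "ヘアアイロン", "バスケット"])` pairs the two spellings of "hair iron" and leaves the unrelated word out. A model trained on phonetic representations, as the abstract describes, would learn which substitutions are cheap instead of treating every character edit as equally costly.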

Published in: Technology

An approach to semantics word extraction by applying language phonology in Rakuten

  2. ヘアーアイロン ヘアーアイロン (hair iron)
  3. ヘアーアイロン ヘアアイロン ヘアーアイロ ン ヘアア イロン (spelling and spacing variants of "hair iron")
  4. 籠 篭 バスケット ヘアーアイロン (basket, in three written forms; hair iron)
  6. ワンピース デザイン ボタニカル柄 ミセス ファッション キャロルグレイ … (dress, design, botanical pattern, Mrs., fashion, キャロルグレイ)
  7. … ワンピース デザイン ボタニカル柄 ミセス ファッション キャロルグレイ … (dress, design, botanical pattern, Mrs., fashion, キャロルグレイ)
  10. https://search.rakuten.co.jp/search/mall/hair+iron/ (Access Date: 2018/10/24)
  11. Thank you. ohnmar.htun@rakuten.com
  12. A part of CLSG
