An approach to semantics word extraction by applying language phonology in rakuten

In its daily business activities, Rakuten has seen massive and diverse growth in the quantity of data, for example, product information, merchant information, and customer activity (queries, emails, reviews, helpdesk requests, etc.). Identifying each entity is therefore vitally important for an e-commerce search engine to find exactly what users want. Matching on semantic information helps to resolve the variety of representations and meanings of those entities, since the variations of user-created entities cannot be resolved in all cases. One very basic but powerful approach is therefore to consolidate a dictionary and to update it through the data-driven extraction of entities. A phonetic-based approach is a useful method of identifying these entities. Our method consists of three parts: preprocessing, measuring phonetic similarity, and postprocessing. Our model is trained on phonetic representations of unlabeled data using an unsupervised EM algorithm and is tested on the extraction of entity-word pairs. On average, our method achieved the highest level of performance in testing.
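The three-stage pipeline named in the abstract (preprocessing, phonetic similarity, postprocessing) can be illustrated with a minimal sketch. This is not the authors' EM-trained model: it assumes a crude phonetic key that ignores the katakana long-vowel mark, and it substitutes the `difflib.SequenceMatcher` ratio for the learned similarity. The function names and the 0.9 threshold are hypothetical choices made for illustration only.

```python
import unicodedata
from difflib import SequenceMatcher

LONG_VOWEL = "ー"  # katakana prolonged sound mark (assumed to be phonetically negligible)


def preprocess(text: str) -> str:
    """Preprocessing: NFKC-normalize and drop all whitespace."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in normalized if not ch.isspace())


def phonetic_key(text: str) -> str:
    """Crude phonetic key (an assumption): the cleaned string without long-vowel marks."""
    return preprocess(text).replace(LONG_VOWEL, "")


def phonetic_similarity(a: str, b: str) -> float:
    """Stand-in similarity: SequenceMatcher ratio over phonetic keys, in [0, 1]."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def group_variants(words, threshold=0.9):
    """Postprocessing: greedily group words whose similarity to a group's
    first member reaches the (hypothetical) threshold."""
    groups = []
    for word in words:
        for group in groups:
            if phonetic_similarity(word, group[0]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


# Spelling and spacing variants of "hair iron", plus an unrelated word ("basket").
variants = ["ヘアーアイロン", "ヘアアイロン", "ヘアーアイロ ン", "ヘアア イロン", "バスケット"]
groups = group_variants(variants)
for group in groups:
    print(group)
```

In this sketch the four "hair iron" spellings collapse into one group and バスケット stays separate. Kanji variants such as 籠 and 篭 would not be matched by this key, since comparing them phonetically first requires converting each word to its reading, a step this sketch leaves out.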

  2. ヘアーアイロン (hair iron)
  3. ヘアーアイロン, ヘアアイロン, ヘアーアイロ ン, ヘアア イロン (spelling and spacing variants of "hair iron")
  4. 籠, 篭, バスケット (written variants of "basket"); ヘアーアイロン (hair iron)
  6. ワンピース (dress), デザイン (design), ボタニカル柄 (botanical pattern), ミセス (Mrs., i.e. mature women's), ファッション (fashion), キャロルグレイ …
  7. … ワンピース (dress), デザイン (design), ボタニカル柄 (botanical pattern), ミセス (Mrs., i.e. mature women's), ファッション (fashion), キャロルグレイ …
  10. https://search.rakuten.co.jp/search/mall/hair+iron/ (Access Date: 2018/10/24)
  11. Thank you very much. ohnmar.htun@rakuten.com
  12. A part of CLSG