In its daily business activities, Rakuten has seen a massive and increasingly diverse growth in data such as product information, merchant information, and customer activity (e.g., queries, emails, reviews, and helpdesk requests). Identifying each entity in this data is therefore vitally important if an e-commerce search engine is to find exactly what users want. Matching on semantic information is very useful for handling the varied representations and meanings of these entities, but it cannot resolve the variations in user-created entities in all cases. One basic but powerful approach is therefore to consolidate a dictionary and to keep it updated through data-driven extraction of entities. A phonetic-based approach is a useful way to identify these entities. Our method consists of three parts: preprocessing, measuring phonetic similarity, and postprocessing. Our model is trained on phonetic representations of unlabeled data using an unsupervised EM algorithm and is tested on the extraction of entity-word pairs. On average, it achieved the highest level of performance in testing.
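The three-part structure described above (preprocessing, measuring phonetic similarity, postprocessing) can be illustrated with a minimal sketch. It is not the authors' model: the EM-trained phonetic model is replaced here by standard American Soundex codes, and the function names (`preprocess`, `soundex`, `phonetic_similarity`, `extract_pairs`), the position-wise similarity score, and the 0.75 threshold are all assumptions made for illustration.

```python
import re

# Standard American Soundex letter-to-digit table.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def preprocess(text):
    """Preprocessing: lowercase and keep only ASCII letters."""
    return re.sub(r"[^a-z]", "", text.lower())


def soundex(word):
    """Return the 4-character Soundex code (a stand-in phonetic form)."""
    word = preprocess(word)
    if not word:
        return ""
    out = []
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            out.append(code)
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if ch not in "hw":
            prev = code
    return (word[0].upper() + "".join(out) + "000")[:4]


def phonetic_similarity(a, b):
    """Similarity: fraction of matching positions in the two codes."""
    code_a, code_b = soundex(a), soundex(b)
    if not code_a or not code_b:
        return 0.0
    return sum(x == y for x, y in zip(code_a, code_b)) / 4


def extract_pairs(dictionary, candidates, threshold=0.75):
    """Postprocessing: keep (entity, word) pairs at or above the threshold."""
    return [(entity, word)
            for entity in dictionary
            for word in candidates
            if phonetic_similarity(entity, word) >= threshold]


if __name__ == "__main__":
    print(extract_pairs(["Rakuten"], ["rakutan", "amazon"]))
```

Soundex targets English-like spellings and is much coarser than a learned model, so this sketch shows only how the three stages fit together, not the quality one would expect from the method itself.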