CompanyDepot: Employer Name Normalization in the Online Recruitment Industry
In the recruitment domain, employer name normalization, the task of linking employer names in job postings or resumes to entities in an employer knowledge base (KB), is important for many business applications. The task poses several unique challenges: handling employer names from both job postings and resumes, leveraging the accompanying location and URL context, and coping with name variations, irrelevant input data, and noise in the KB. In this talk, we present CompanyDepot, a system that uses machine learning techniques to address these challenges. Across multiple real-world datasets, CompanyDepot achieves 2.5% to 21.4% higher coverage at the same precision level than a legacy system used at CareerBuilder. After applying it to several applications at CareerBuilder, we faced a new challenge: how to avoid duplicate normalization results when the KB is noisy and contains many duplicate entities. To address this, we extend CompanyDepot to normalize employer names not only at the entity level but also at the cluster level, mapping each query to the KB cluster that best matches it. The extended system performs efficient graph-based clustering using external knowledge from five mapping sources. We also propose a new metric, based on success rate and diversity reduction ratio, for evaluating cluster-level normalization. Through experiments and applications, we demonstrate a large improvement in normalization quality when moving from entity-level to cluster-level normalization.
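The abstract does not specify the clustering algorithm or the five mapping sources. The following is therefore only a minimal sketch under stated assumptions. It assumes that each mapping source yields pairs of KB entity IDs asserted to refer to the same employer, and it groups entities into clusters as connected components using union-find. It then shows how ranked entity-level hits could be collapsed into deduplicated cluster-level results. All names (`cluster_entities`, `cluster_level_results`, the source names, and the `E1`-style IDs) are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over hashable entity IDs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps trees shallow as we walk to the root.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_entities(entity_ids, mapping_sources):
    """Group KB entities into connected components.

    Assumption: mapping_sources maps a (hypothetical) source name to an
    iterable of (id_a, id_b) pairs that the source says are the same employer.
    Returns {cluster_id: set_of_entity_ids}, keyed by the smallest member ID.
    """
    uf = UnionFind()
    for e in entity_ids:
        uf.find(e)
    for pairs in mapping_sources.values():
        for a, b in pairs:
            uf.union(a, b)
    groups = defaultdict(set)
    for e in entity_ids:
        groups[uf.find(e)].add(e)
    return {min(members): members for members in groups.values()}


def cluster_level_results(entity_hits, clusters):
    """Map ranked entity-level hits to cluster IDs, dropping duplicates in rank order."""
    entity_to_cluster = {e: cid for cid, members in clusters.items() for e in members}
    seen, results = set(), []
    for e in entity_hits:
        cid = entity_to_cluster.get(e, e)  # unclustered entities stand alone
        if cid not in seen:
            seen.add(cid)
            results.append(cid)
    return results


if __name__ == "__main__":
    entities = ["E1", "E2", "E3", "E4"]
    sources = {
        "hypothetical_url_map": [("E1", "E2")],
        "hypothetical_parent_map": [("E2", "E3")],
    }
    clusters = cluster_entities(entities, sources)
    print(clusters)  # E1, E2, E3 form one cluster keyed "E1"; E4 stays alone
    print(cluster_level_results(["E2", "E3", "E4"], clusters))  # ['E1', 'E4']
```

Connected components are the simplest graph-based grouping. A noisy mapping edge can merge two distinct employers, so a production system might weight or filter edges by source reliability. The abstract does not say which approach CompanyDepot takes.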