Workday is a leading provider of cloud-based enterprise software products such as Human Capital Management, Talent, Finance, Student, and Planning. These products produce a wealth of natural language data. However, this data is unstructured and denormalized, and retrieving relevant information from it is challenging; simple index-based search methods can only take us so far. The Data Science team at Workday is determined to apply Machine Learning and AI to make search better across Workday's products.
In this session, we present how we use word embeddings to normalize the data and add structure to it. We will also talk about using word representations to make search intelligent. The specific use cases we will discuss are synonym detection and entity recommendation.
In this talk, we will focus on the word-embedding techniques explored, the metrics used to evaluate Natural Language Processing models, the tools built, and future work, all as part of improving search.
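To make the embedding idea concrete, here is a minimal synonym-detection sketch with gensim's word2vec; the toy corpus and all names are invented for illustration and are not Workday's pipeline:

```python
# A minimal sketch of embedding-based synonym detection with gensim.
# The toy corpus below is invented; it is not Workday's data or pipeline.
from gensim.models import Word2Vec

sentences = [
    ["employee", "salary", "compensation", "review"],
    ["worker", "pay", "compensation", "bonus"],
    ["student", "enrollment", "course", "grade"],
] * 200  # repeat so the toy corpus has enough co-occurrence signal

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=42)

# Nearest neighbours in the embedding space become synonym candidates:
print(model.wv.most_similar("compensation", topn=3))
```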
Speakers
Namrata Ghadi, Workday Inc., Software Development Engineer (Data Science)
Adam Baker, Workday Inc., Senior Software Engineer
During the OpenStack Tokyo Summit we provided an overview of how Workday started the production deployment with a very robust and efficient CI/CD process, which is explained here.
Workday has built one of the largest OpenStack-based private clouds in the world, hosting a workload of over a million physical cores on over 16,000 compute nodes in 5 data centers for over ten years. However, there was a growing need for a newer, more maintainable deployment model that would closely follow the upstream community. We would like to share our new architecture and deployment approach as well as lessons learned from our experience.
We've converted many of our technologies in the process:
Migrating from Mitaka to Victoria
Converting from OpenContrail to pure L3 Calico with BGP on the host
Deploying with Chef to deploying with Ansible
Building home-grown container images to Kolla
Monitoring with Sensu and Wavefront to Prometheus and Grafana
CI/CD in Jenkins to Zuul
CentOS 7 to CentOS 8 Stream
We'll also talk about some internal tools we wrote that, while Workday-specific, may inspire you to see what value you can add for your own customers.
Computing the Square Roots of Unity to break RSA using Quantum Algorithms - Dharmalingam Ganesan
We study the problem of finding the square roots of unity in a finite group in order to factor the composite numbers used in RSA. We implemented Peter Shor's algorithm to find the square roots of unity. Experimental results showed that finding the square roots of unity in a finite multiplicative group is "hard".
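For context on why these roots matter for factoring (standard number theory, not the authors' code), a nontrivial square root of unity modulo N yields a factor of N directly:

```python
from math import gcd

# If x is a nontrivial square root of unity mod N (x^2 ≡ 1, x ≢ ±1),
# then N divides (x - 1)(x + 1) without dividing either factor alone,
# so gcd(x - 1, N) is a proper factor of N.
def factor_from_root_of_unity(x, N):
    if x % N in (1, N - 1):
        raise ValueError("trivial root of unity; no factor recovered")
    f = gcd(x - 1, N)
    return f, N // f

# Example: 4^2 = 16 ≡ 1 (mod 15), and gcd(3, 15) = 3 factors N = 15.
print(factor_from_root_of_unity(4, 15))  # -> (3, 5)
```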
Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows - DoKC
https://go.dok.community/slack
https://dok.community/
ABSTRACT OF THE TALK
Complex computational workloads in Python are a common sight these days, especially in the context of processing large and complex datasets. Battle-hardened modules such as Numpy, Pandas, and Scikit-Learn can perform low-level tasks, while tools like Dask make it easy to parallelize these workloads across distributed computational environments. Meanwhile, Argo Workflows offers a Kubernetes-native solution for provisioning cloud resources and triggering workflows on a regular schedule. Being Kubernetes-native, Argo Workflows also meshes nicely with other Kubernetes tools. This talk discusses the combination of these two worlds by showcasing a set-up for Argo-managed workflows that schedule and automatically scale out Dask-powered data pipelines in Python.
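As a rough sketch of the Dask half of this pairing (assuming the dask-kubernetes operator is installed; the cluster name, image, and data path are invented, and this is not the speaker's actual set-up):

```python
# A minimal sketch of an adaptively scaling Dask cluster on Kubernetes;
# cluster name, image, and the parquet path are illustrative assumptions.
from dask_kubernetes.operator import KubeCluster
from dask.distributed import Client
import dask.dataframe as dd

cluster = KubeCluster(name="pipeline", image="ghcr.io/dask/dask:latest")
cluster.adapt(minimum=1, maximum=20)  # scale workers out and in with load
client = Client(cluster)

df = dd.read_parquet("s3://example-bucket/measurements/")  # hypothetical path
print(df.groupby("device_id").value.mean().compute())
```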
BIO
A former academic in the field of renewable energy simulation and energy systems analysis, currently responsible for architecting and maintaining the cloud and data strategy at ACCURE Battery Intelligence.
KEY TAKE-AWAYS FROM THE TALK
Argo Workflows + Dask is a nice combination for data-processing pipelines. There are a few "gotchas" to be on the lookout for, but this is nevertheless a generally applicable and powerful combination.
https://github.com/sevberg
The top 3 challenges running multi-tenant Flink at scale - Flink Forward
Apache Flink is the foundation for Decodable's real-time SaaS data platform. Flink runs critical data processing jobs with strong security requirements. In addition, Decodable has to scale to thousands of tenants, power various use cases, provide an intuitive user experience and maintain cost-efficiency. We've learned a lot of lessons while building and maintaining the platform. In this talk, I'll share the top 3 toughest challenges building and operating this platform with Flink, and how we solved them.
Video and slides synchronized, mp3 and slide download available at http://bit.ly/28XnVtb.
Felix Klock describes the core concepts of the Rust language (ownership, borrowing, and lifetimes), as well as the tools beyond the compiler for open source software component distribution (cargo, crates.io). Filmed at qconlondon.com.
Felix Klock is a research engineer at Mozilla, where he works on the Rust compiler, runtime libraries, and language design. He previously worked on the ActionScript Virtual Machine for the Adobe Flash runtime. Klock is one of the developers of the Larceny Scheme language runtime.
SyScan Singapore 2010 - Returning Into The PHP-Interpreter - Stefan Esser
Among web application security experts there is a popular belief that low-level vulnerabilities like buffer overflows and other kinds of memory corruption do not matter for web application security. In addition, the increasing use of exploit mitigation techniques on modern web servers makes many believe that exploiting remote memory corruption in webserver software is a thing of the past. But is it really?
This talk will introduce the idea of returning into the PHP interpreter from memory corruption vulnerabilities and discuss the requirements and feasibility of different ways to do that. This idea will then be applied to a yet undisclosed PHP vulnerability, which is exposed to remote attackers in several widespread PHP applications. Different aspects of this vulnerability will be analyzed and it will be explained how they can be abused in remote information leak and memory corruption exploits. The creation of such a remote code execution exploit will then be detailed step by step.
Alphorm.com Training - Elastic: Mastering the Fundamentals - Alphorm
Searching for information in logs has always been time-consuming, both for people and for machines: connecting to the server, locating the file, choosing the right tool, recalling the syntax, running the command, and so on.
Elastic, the company behind the Elasticsearch search engine, now publishes a product stack dedicated to log processing, summed up by the motto "All the answers to your questions are in your logs!".
This introductory course aims to teach you how to deploy the Elastic monitoring stack and how to understand and configure its components (Beats, Logstash, and Kibana).
The Elastic Stack, which today consists of Elasticsearch, Logstash, Kibana, APM, and Beats, is mainly used to build search engines, but also to aggregate and manipulate log data.
In this Elastic Stack course, we will cover all the features needed to put a complete monitoring solution in place.
Strengths of the course
- 80% hands-on training.
- Practical training that gives you skills you can apply in the field.
- Training designed around market needs.
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh and Thamme Gowda - Spark Summit
A web crawler is a bot program that fetches resources from the web for the sake of building applications like search engines, knowledge bases, etc. In this talk, Karanjeet Singh and Thamme Gowda will describe a new crawler called Sparkler (a contraction of Spark-Crawler) that makes use of recent advancements in the distributed computing and information retrieval domains by conglomerating various Apache projects like Spark, Kafka, Lucene/Solr, Tika, and Felix. Sparkler is an extensible, highly scalable, and high-performance web crawler that is an evolution of Apache Nutch and runs on an Apache Spark cluster.
https://github.com/USCDataScience/sparkler
Change data capture with MongoDB and Kafka - Dan Harvey
In any modern web platform you end up with a need to store different views of your data in many different datastores. I will cover how we have coped with doing this in a reliable way at State.com across a range of different languages, tools and datastores.
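As a rough sketch of the change-data-capture pattern in play here (not State.com's implementation; the hosts, database, and topic names are invented), MongoDB change streams can be forwarded to Kafka like this:

```python
# A rough CDC sketch: tail a MongoDB change stream and forward to Kafka.
# Hosts, database, collection, and topic names are illustrative.
import json
from pymongo import MongoClient
from kafka import KafkaProducer

client = MongoClient("mongodb://localhost:27017")  # requires a replica set
producer = KafkaProducer(
    value_serializer=lambda v: json.dumps(v, default=str).encode())

with client.appdb.users.watch() as stream:  # change streams: MongoDB 3.6+
    for change in stream:
        producer.send("users-changes", {
            "op": change["operationType"],
            "key": str(change["documentKey"]["_id"]),
            "doc": change.get("fullDocument"),
        })
```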
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ... - Spark Summit
Apache Spark 2.1.0 boosted the performance of Apache Spark SQL thanks to Project Tungsten software improvements. A further 16x speedup has been achieved by using Oracle's innovations for Apache Spark SQL, made possible by Oracle's Software in Silicon accelerator offload technologies.
Apache Spark SQL in-memory performance is becoming more important for many reasons. Users are now performing more advanced SQL processing on multi-terabyte workloads. In addition, on-prem and cloud servers are getting larger physical memory, enabling these huge workloads to be stored in memory. In this talk we will look at using Spark SQL for feature creation and feature generation within pipelines for Spark ML.
This presentation will explore workloads at scale and with complex interactions. We also provide best practices and tuning suggestions to support these kinds of workloads on real applications in cloud deployments. Ideas for the next-generation Tungsten project will also be discussed.
Geospatial data appears to be simple right up until the point when it becomes intractable. There are many gotcha moments with geospatial data in Spark, and we will break those down in our talk. Users who are new to geospatial analysis in Spark will find this portion useful, as projections, geometry types, indices, and geometry storage can all cause issues.
Tree-like data relationships are common, but working with trees in SQL usually requires awkward recursive queries. This talk describes alternative solutions in SQL, including:
- Adjacency List
- Path Enumeration
- Nested Sets
- Closure Table
Code examples will show using these designs in PHP, and offer guidelines for choosing one design over another.
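As a small illustration of one of these designs (in Python with sqlite3 rather than the talk's PHP; the schema and rows are invented), a Closure Table stores every ancestor/descendant pair so a subtree can be fetched without recursion:

```python
# Minimal Closure Table sketch; table and column names are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
-- one row per ancestor/descendant pair, including each node's self-pair
CREATE TABLE category_paths (ancestor INT, descendant INT, depth INT);
INSERT INTO category VALUES (1, 'root'), (2, 'child'), (3, 'grandchild');
INSERT INTO category_paths VALUES
  (1, 1, 0), (2, 2, 0), (3, 3, 0),  -- self paths
  (1, 2, 1), (2, 3, 1), (1, 3, 2);  -- ancestor links
""")

# Fetch the whole subtree under 'root' without any recursive query:
rows = db.execute("""
  SELECT c.name, p.depth FROM category c
  JOIN category_paths p ON p.descendant = c.id
  WHERE p.ancestor = 1 ORDER BY p.depth
""").fetchall()
print(rows)  # [('root', 0), ('child', 1), ('grandchild', 2)]
```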
Neha Narkhede talks about the experience at LinkedIn moving from batch-oriented ETL to real-time streams using Apache Kafka, and how the design and implementation of Kafka was driven by this goal of acting as a real-time platform for event data. She covers some of the challenges of scaling Kafka to hundreds of billions of events per day at LinkedIn, supporting thousands of engineers, etc.
Nowadays REST APIs are behind every mobile app and nearly all web applications. As such, they bring a wide range of possibilities for communication and integration with a given system. But with great power comes great responsibility. This talk aims to provide general guidance related to API security assessment and covers common API vulnerabilities. We will look at an API interface from the perspective of a potential attacker.
I will show:
how to find hidden API interfaces
ways to detect available methods and parameters
fuzzing and pentesting techniques for API calls
typical problems
I will share several interesting cases from public bug bounty reports and personal experience, for example:
* how I got various credentials with one API call
* how to cause a DoS by running the Garbage Collector from the API
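As a flavour of the fuzzing techniques mentioned above, a naive parameter-fuzzing sketch (the URL and payloads are invented, and real testing requires the owner's authorization):

```python
# Naive API parameter fuzzing; endpoint and payloads are illustrative.
import requests

payloads = ["'", "<script>alert(1)</script>", "../../etc/passwd", "A" * 10000]
for p in payloads:
    r = requests.get("https://api.example.com/v1/users",
                     params={"q": p}, timeout=5)
    if r.status_code >= 500:  # server errors often hint at unhandled input
        print(f"potential issue: payload={p!r} -> HTTP {r.status_code}")
```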
Querying Elasticsearch with Deep Learning to Answer Natural Language Questions - Sebastian Blank
Natural language is gaining more and more relevance as an interface between man and machine. Already today, we are able to carry out simple tasks by talking to our smartphone or a smart speaker like Google Home or Alexa. An important challenge for any kind of dialog agent or chatbot is to include external knowledge in the conversation with the user. Therefore, such systems need to be able to interact with resources like relational databases or unstructured resources like search engines. However, the complexity of natural language makes it hard to capture diverse utterances with a set of pre-defined rules. Instead, we present an approach that leverages Deep Learning to learn how to query Elasticsearch given natural language questions. As our model learns to follow the inherent logic of querying, it is even possible to switch to other systems and query languages. This carries great potential for future applications of Elasticsearch and related NoSQL solutions.
[This is work presented at SIGMOD'13.]
The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich and active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the "last mile" issues in providing a rich developer ecosystem. This includes easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a 1-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digesting to descriptive analytical dashboards for our members.
PHP is the most commonly used server-side programming language, deployed on more than 80% of web servers all over the world. However, PHP is a 'grown' language rather than a deliberately engineered one, making writing insecure PHP applications far too easy and common. If you want to use PHP securely, then you should be aware of all its pitfalls.
Dice.com Bay Area Search - Beyond Learning to Rank Talk - Simon Hughes
This talk describes how to implement conceptual search (semantic search) within a modern search engine using the word2vec algorithm to learn concepts. We also cover how to auto-tune the search engine parameters using black box optimization techniques, and the problems of feedback loops encountered when building machine learning systems that modify the user behavior used to train the system.
Vectors in Search - Towards More Semantic Matching - Simon Hughes
With the advent of deep learning and algorithms like word2vec and doc2vec, vector-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, such as learning sparse representations of vectors, clustering, and learning binary vectors. Finally, I will discuss some of the pitfalls of vector-based search, and how to get the best of both worlds by combining vector-based scoring with traditional relevancy metrics such as BM25.
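To make one of these indexing ideas concrete, here is a generic sketch of random-hyperplane hashing (not necessarily the exact variant covered in the talk): dense vectors become short binary codes that an inverted index can treat as tokens:

```python
# Random-hyperplane LSH sketch: dense vectors -> token-friendly binary codes.
import numpy as np

rng = np.random.default_rng(0)
planes = rng.normal(size=(16, 300))  # 16 bits for 300-dimensional vectors

def lsh_code(v):
    bits = (planes @ v > 0).astype(int)  # sign pattern across hyperplanes
    return "".join(map(str, bits))       # indexable binary signature

doc_vec = rng.normal(size=300)
near_dup = doc_vec + rng.normal(scale=0.01, size=300)
# Near-duplicate vectors usually share most (often all) bits:
print(lsh_code(doc_vec), lsh_code(near_dup))
```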
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w... - Dr. Haxel Consult
In 2013 we witnessed an evolutionary change in the NLP field thanks to the introduction of vector-space embeddings, which, combined with deep learning architectures, achieved human-level performance on many NLP tasks. With the introduction of the Attention mechanism in 2017 the results were further improved, and as a result embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. Moreover, I will provide some initial results from a paper currently under review that offers insight into hyperparameter tuning during the generation of embeddings.
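As a minimal sketch of the embedding-based retrieval being compared here (the model name and corpus are illustrative, not the presenter's setup):

```python
# Embedding-based retrieval sketch; model choice and corpus are invented.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["patent landscaping report", "battery degradation study",
        "clinical trial registry entry"]
doc_emb = model.encode(docs, convert_to_tensor=True)

query_emb = model.encode("prior-art search", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)[0]  # cosine similarity per doc
best = int(scores.argmax())
print(docs[best], float(scores[best]))
```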
Relations play a vital role in knowledge construction and its maintenance. They connect domain-type entities to range-type entities; for example, the relation "born in" connects Persons to Places. Over any dataset, domain-range information is used to maintain data consistency. We therefore see knowledge construction frameworks sometimes engage costly Knowledge Engineers to define domain-range information in the form of a schema or an ontology. We also see that frameworks that hold such defined domain-range information often do not follow it strictly; in the worst case, some frameworks do not even allow a domain-range to be defined and simply gather the knowledge entries. One reason for not defining domain-range information is that it is costly. The reason for not enforcing domain-range constraints, on the other hand, is that most approaches are either manual or semi-automatic and therefore hard to adapt. In this research, we propose a relation-wise machine learning model that can define and validate domain-range information automatically. Initial experiments show that the proposed framework performs promisingly.
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
In the talk I describe two approaches for improving the recall and precision of an enterprise search engine using machine learning techniques. The main focus is improving relevancy with ML while using your existing search stack, be that Lucene, Solr, Elasticsearch, Endeca or something else.
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction-based systems. Typically, most of them architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However, this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limits during the query, as well as the presence of some inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model that is trained through Spark, Weka, or R that is loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but it will also walk through practical examples from loading a dataset into Elasticsearch to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
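A stripped-down sketch of the two-phase pattern described above (the index, field names, and stand-in model are invented; ML-Scoring itself moves Phase II inside the engine as a plugin):

```python
# Two-phase retrieve-then-rerank sketch; index/fields/model are invented.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Phase I: the search engine returns the top-k candidates.
hits = es.search(index="items", size=100,
                 query={"match": {"title": "data engineer"}})["hits"]["hits"]

# Phase II: re-rank candidates with a trained model (stubbed out here).
def model_score(doc):
    return len(doc.get("skills", []))  # stand-in for a supervised model

reranked = sorted(hits, key=lambda h: model_score(h["_source"]), reverse=True)
print([h["_id"] for h in reranked[:10]])
```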
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... - S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types such as numerical or categorical values. Therefore, in domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn't suffice, so relevance ranking is performed as a two-phase approach: 1) regular search, 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users' response to the items served. The predicted selection rates that arise in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve for a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (Solr/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and loaded as a plugin used at query time to compute custom scores.
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract—This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco... - Aman Grover
Modern-day social media search and recommender systems require complex query formulation that incorporates both user context and explicit search queries. Users expect these systems to be fast and to provide relevant results for their query and context. With millions of documents to choose from, these systems utilize a multi-pass scoring function to narrow the results and provide the most relevant ones to users. Candidate selection is required to sift through all the documents in the index and select a relevant few to be ranked by subsequent scoring functions. It becomes crucial to narrow down the document set while keeping relevant documents in the resulting set. In this tutorial we survey various candidate selection techniques and deep dive into case studies on a large-scale social media platform. In the latter half we provide a hands-on tutorial where we explore building these candidate selection models on a real-world dataset and see how to balance the trade-off between relevance and latency.
GITHUB : https://github.com/candidate-selection-tutorial-sigir2017/candidate-selection-tutorial
An overview of some core concepts in natural language processing, some example (experimental for now!) use cases, and a brief survey of some tools I have explored.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced that users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick, hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, know what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. The labs will be done in the cloud; no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hour in). Basic knowledge of Python is highly recommended.
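A sketch of the kind of scikit-learn exercise the workshop walks through (the dataset and model choice here are illustrative, not necessarily the workshop's exact labs):

```python
# Train a classifier on a popular dataset and evaluate it with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```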
Floating on a RAFT: HBase Durability with Apache Ratis - DataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirements of HBase's write-ahead log (WAL), which HDFS provides correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase that provides a sufficient level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library-implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi - DataWorks Summit
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data utilizing Apache Zeppelin against Phoenix tables, as well as Hive external tables over HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
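For a feel of the Phoenix side, here is a rough sketch using the phoenixdb driver against the Phoenix Query Server; the table and column names are invented stand-ins for the crime tables described above:

```python
# Upsert into and query a Phoenix table via the Phoenix Query Server;
# table and column names are illustrative, not the talk's actual schema.
import phoenixdb

conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cur = conn.cursor()
cur.execute("""UPSERT INTO CRIME (EVENT_ID, OFFENSE, TS)
               VALUES (?, ?, CURRENT_TIME())""", ("e-1001", "theft"))
cur.execute("SELECT OFFENSE, COUNT(*) FROM CRIME GROUP BY OFFENSE")
print(cur.fetchall())
```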
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati... - DataWorks Summit
While HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, it is not trivial to design applications that make the most of it, nor is it the simplest system to operate. Since it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) or external systems (Kerberos, LDAP), and its distributed nature requires a "Swiss clockwork" infrastructure, many variables must be considered when observing anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions currently in use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last 5 years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... - DataWorks Summit
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world’s library collection. This talk will provide an overview of how HBase is structured to provide this information and some of the challenges they have encountered to scale to support the world catalog and how they have overcome them.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
HBase Global Indexing to support large-scale data ingestion at Uber - DataWorks Summit
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, in its most basic form, is about organizing data to balance efficient reading and writing of newer data. Organizing data for efficient reading involves factoring in query patterns to partition the data so that read amplification stays low. Organizing data for efficient writing involves factoring in the nature of the input data: whether it is append-only or updatable.
At Uber we ingest terabytes of many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the entire company. Datasets such as trips constantly receive updates to the data apart from inserts. To ingest such datasets we need a critical component that is responsible for bookkeeping the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records are treated as inserts and re-written to HDFS instead of being updated, which duplicates data and breaks data correctness and user queries. This component is key to scaling our jobs: we now handle more than 500 billion writes a day in our current ingestion systems. It needs strong consistency and must provide high throughput for index writes and reads.
At Uber, we have chosen HBase as the backing store for the Global Indexing component, and it is critical in allowing us to scale our jobs to the more than 500 billion writes a day handled in our current ingestion systems. In this talk, we will discuss data@Uber and expound on why we built the global index using Apache HBase and how it helps scale our cluster usage. We'll give details on why we chose HBase over other storage systems, how and why we came up with a creative solution to automatically load HFiles directly into the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints, as well as other learnings we had bringing this system into production at the scale of data Uber encounters daily.
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix - DataWorks Summit
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi - DataWorks Summit
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real-time. This is a challenging endeavor when considering the variety of data sources which need to be collected and analyzed. Everything from application logs, network events, authentications systems, IOT devices, business events, cloud service logs, and more need to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything Engine - DataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail, as well as discuss the best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... - DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results will be logged automatically as a byproduct of those lines of code being added, even if the party doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo MLflow Tracking, Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
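The "few lines of code" tracking pattern looks roughly like this (the parameter and metric names are invented for illustration):

```python
# Minimal MLflow tracking sketch; names and values are illustrative.
import mlflow

with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", 0.93)
    mlflow.log_artifact("confusion_matrix.png")  # any existing local file
```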
Extending Twitter's Data Platform to Google Cloud - DataWorks Summit
Twitter's Data Platform is built using multiple complex open source and in-house projects to support Data Analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery and management, and various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we deep-dive into in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi - DataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger - DataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One of the challenges companies face is securing data across hybrid environments while having an easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise and in cloud environments. We will go into the details of the challenges of hybrid environments and how Ranger can solve them. We will also talk through how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud, and de-anonymize it dynamically using Apache Hive, Apache Spark, or when accessing data from cloud storage systems. We will also deep dive into Ranger's integration with AWS S3, AWS Redshift and other cloud-native systems. We will wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of Non-Volatile Memory (NVM) and NVM express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies and enabling real-time customer engagement
● Enhancing loss prevention capabilities and response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, discuss its implications for the broader Consumer Goods industry, and share the business drivers, use cases and benefits that are unfolding as an integral component of the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years, producing human-level or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, and how to collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways a retail store of the near future could operate: a deep learning system attached to a camera stream can identify various storefront situations, such as item stocks on shelves, a shelf in need of organization, or a wandering customer in need of assistance.
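As a rough illustration of "a deep learning system attached to a camera stream," here is a hedged Python sketch pairing OpenCV with an off-the-shelf torchvision detector; the pretrained COCO model and camera index stand in for a purpose-built retail model and feed.

```python
# Sketch only: a generic detector on a camera stream (assumes torchvision >= 0.13).
import cv2
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
cap = cv2.VideoCapture(0)  # placeholder for an in-store camera feed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        det = model([to_tensor(rgb)])[0]
    # Downstream logic would map boxes/labels to shelf state, e.g. flag
    # low stock, a disorganized shelf, or a customer dwelling in an aisle.
    for box, score in zip(det["boxes"], det["scores"]):
        if score > 0.8:
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
```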
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to the full inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, along with the key use cases, techniques and considerations that leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
Whole-genome shotgun-based next-generation transcriptomics and metagenomics studies often generate 100 to 1,000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembly of these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieves near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and that Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
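The following is not SpaRC itself, but a toy PySpark sketch of the core idea: index reads by the k-mers they contain, so reads sharing k-mers (and hence likely sharing a molecule of origin) can be grouped for independent downstream assembly. The reads and the k-mer length are illustrative.

```python
# Toy sketch of k-mer-based read grouping; not the SpaRC implementation.
from pyspark.sql import SparkSession

K = 21  # typical k-mer length for short reads (assumption)

spark = SparkSession.builder.appName("read-clustering-sketch").getOrCreate()
reads = spark.sparkContext.parallelize([
    ("read1", "ACGTACGTACGTACGTACGTACGTACGT"),
    ("read2", "CGTACGTACGTACGTACGTACGTACGTA"),
    ("read3", "TTTTGGGGCCCCAAAATTTTGGGGCCCC"),
])

def kmers(read):
    rid, seq = read
    return [(seq[i:i + K], rid) for i in range(len(seq) - K + 1)]

# Invert the index: k-mer -> reads containing it. Reads co-occurring under
# a k-mer are cluster candidates; a real pipeline would then run connected
# components over the resulting read-read graph.
kmer_index = reads.flatMap(kmers).groupByKey().mapValues(set)
print(kmer_index.values().filter(lambda rids: len(rids) > 1).collect())
```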
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy," how this fancy AI technology gets managed from an infrastructure and operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we will discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, the aspects they look for in a new TV, and their TV buying preferences.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
3. 1. User data is very noisy
• Abbreviations and misspellings are common
• Synonyms are common, e.g. “word vectors” vs. “word embeddings”
2. Classification
• A search for “data science” should also return docs mentioning “deep learning” or R
3. Recommendation
• “deep” → “deep learning”
• “nlp” → “machine learning”
Problems looking for NLP
4. • Why Word Vectors
• Word Vectors Explained
• Word Representation Use Cases
• Model Evaluation
Agenda
5. • Captures semantics: Athens − Greece ≈ Oslo − Norway
• Captures syntax: dollars − dollar ≈ mice − mouse
• e.g. dollars − dollar + mouse ≈ mice
Why Might Word Vectors Help?
6. • Similar terms point in similar direction: cosine similarity
• Rank suggestions, given existing profile
• Find similar terms in taxonomy
• Cluster for folksonomy, synonyms
• Cheap to train
• No labelled data needed
• Very efficient training algorithms
Why Might Word Vectors Help?
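A minimal sketch of the cosine-similarity ranking described on this slide, with random vectors standing in for trained embeddings:

```python
# Rank vocabulary terms against a query term by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["data science", "machine learning", "deep learning", "payroll"]
vectors = {w: rng.normal(size=100) for w in vocab}  # stand-ins for trained vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = vectors["data science"]
print(sorted(vocab, key=lambda w: cosine(query, vectors[w]), reverse=True))
```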
7. Steps for using fastText:
1. Get a corpus (news feeds, web scrapes, social media posts, Wikipedia)
2. Minimize the difference between predictions and corpus
3. Tune hyperparameters
Where Do Word Vectors Come From?
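A hedged sketch of these steps using gensim's FastText implementation; the corpus file name and hyperparameter values are placeholders that step 3 would tune.

```python
# Train skip-gram-with-negative-sampling fastText vectors (gensim flavor).
from gensim.models import FastText
from gensim.models.word2vec import LineSentence

corpus = LineSentence("corpus.txt")  # placeholder: one preprocessed doc per line
model = FastText(
    corpus,
    vector_size=100,  # dimensionality
    window=5,         # context window
    min_count=5,
    sg=1,             # skip-gram
    negative=5,       # negative samples (SGNS)
    epochs=5,
)
model.save("fasttext.model")
print(model.wv.most_similar("data"))
```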
9. SGNS in fastText in Detail
• Let v_w be the canonical vector for word w and c_w be the context vector for word w
• σ(x) = 1 / (1 + e^(−x))
• p(w | f) = σ(v_f · c_w)
• Maximize p(c | f) for c in the context of focus word f
• Minimize p(n | f) for n randomly sampled from the vocabulary [1, 2]
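A toy numpy rendering of one SGNS update under the definitions above; real fastText additionally sums character n-gram vectors into v_f, which this sketch omits.

```python
# One skip-gram-negative-sampling step: pull contexts toward the focus
# word, push randomly sampled negatives away.
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size, lr = 50, 1000, 0.05
V = rng.normal(scale=0.1, size=(vocab_size, dim))  # canonical vectors v_w
C = rng.normal(scale=0.1, size=(vocab_size, dim))  # context vectors  c_w

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(f, context, negatives):
    for w, label in [(c, 1.0) for c in context] + [(n, 0.0) for n in negatives]:
        p = sigmoid(V[f] @ C[w])  # p(w | f) = sigma(v_f . c_w)
        g = lr * (label - p)      # gradient of the log-likelihood
        vf_old = V[f].copy()
        V[f] += g * C[w]
        C[w] += g * vf_old

sgns_step(f=3, context=[7, 12], negatives=rng.integers(0, vocab_size, size=5))
```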
10. • “You shall know a word by the company it keeps” - J.R. Firth
• Context window
• Narrow window → functional, syntactic vectors
• Wide window → topical, semantic vectors
• Dimensionality
• Character n-gram model for OOV terms
• Phrases
• “data science” → “data_science”
• Compositional vectors (e.g. [2, 3])
The Art of Training Word Vectors
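A small sketch of the phrase step using gensim's Phrases, so that "data science" becomes the single token "data_science" before training; the toy sentences and thresholds are illustrative.

```python
# Join frequently co-occurring word pairs into single phrase tokens.
from gensim.models.phrases import Phrases, Phraser

sentences = [
    ["we", "apply", "data", "science", "to", "search"],
    ["data", "science", "and", "machine", "learning"],
    ["machine", "learning", "improves", "ranking"],
]
bigram = Phraser(Phrases(sentences, min_count=1, threshold=0.1))
print(bigram[sentences[0]])  # ['we', 'apply', 'data_science', 'to', 'search']
```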
11. • Search at the core
• Talent: Candidate search and assignment
• HCM: Job Title, Job Qualification
• Having clean entities is paramount to the success of Workday’s products
Why improve search?
14. • 7 different data sources
• Rechunking
• Bag-of-sentences
• Task-specific “intrinsic evaluation”
Word representations
15. • Usage
• Search on broad term
• How?
• Hierarchy
• How are word vectors useful?
• Add new entity
• Clustering
• Implementation details
• Cosine similarity
Word Representations - Use Case 1: Broad Search
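A hypothetical sketch of how these pieces could combine: recommend where a new entity sits in the hierarchy by letting its nearest neighbors in embedding space vote for their parent category. The toy taxonomy and the k-nearest-neighbor vote are assumptions for illustration, not Workday's implementation.

```python
# Place a new entity by a cosine-similarity nearest-neighbor vote.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_category(new_vec, entity_vecs, parent_of, k=3):
    nearest = sorted(entity_vecs,
                     key=lambda e: cosine(new_vec, entity_vecs[e]),
                     reverse=True)[:k]
    votes = {}
    for e in nearest:
        votes[parent_of[e]] = votes.get(parent_of[e], 0) + 1
    return max(votes, key=votes.get)

rng = np.random.default_rng(1)
entity_vecs = {e: rng.normal(size=50) for e in ["Pixlr", "SQL", "Excel"]}
parent_of = {"Pixlr": "Graphics Software", "SQL": "Databases", "Excel": "Spreadsheets"}
print(recommend_category(rng.normal(size=50), entity_vecs, parent_of))
```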
16. [Diagram: new entities such as Adobe Photoshop, Affinity Photo, Pixlr, SQL, Data Modeling, and Relational Databases being placed into a category taxonomy whose nodes include Productivity Software, Graphics Software, Office Software, Illustration Software, Photo Editing Software, Photo Management Software, Presentation Software, and Spreadsheet Software]
Word Representations - Use Case 1: New Entity Ingestion
17. • Quality of category recommendation
• Parent, grandparent or sibling
• 48.5% increase in successfully ingesting new entities into the hierarchy
Word Vector Evaluation: New Entity Ingestion Score
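A loosely hedged sketch of how a "parent, grandparent or sibling" check could be scored; the partial-credit weights are assumptions, since the slide does not specify them.

```python
# Score a category recommendation for a newly ingested entity.
def ingestion_score(recommended, true_parent, parent_of):
    if recommended == true_parent:
        return 1.0  # exact parent
    if recommended == parent_of.get(true_parent):
        return 0.5  # grandparent (assumed partial credit)
    if parent_of.get(recommended) == parent_of.get(true_parent):
        return 0.5  # sibling of the true parent (assumed partial credit)
    return 0.0

# Toy hierarchy: Photo Editing / Illustration Software -> Graphics Software
parent_of = {"Photo Editing Software": "Graphics Software",
             "Illustration Software": "Graphics Software",
             "Graphics Software": "Software"}
print(ingestion_score("Illustration Software", "Photo Editing Software", parent_of))  # 0.5
```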
18. • Usage
• Recommend related entities
• Search results: Exact Match + Related Skills
• Siblings vs Related entities
• How are word vectors used?
• Cosine similarity
Word Representations - Use Case 2: Related Entities Recommendation
19. [image-only slide]
20. • Quality of related entity recommendations
• Skill co-occurrences in resumes
• 49.5% additive increase in the recommendations
Word Vectors Evaluation: Co-occurrence Score
21. • Usage
• Disambiguation
• Query Expansion
• How are word vectors used?
• Subword model
• Cosine similarity
• Additionally
• Similar meaning
• e.g. Software Developer -> Software Engineer
• Abbreviation
• e.g. MS Excel -> Microsoft Excel
• Partial matches
• e.g. Microsoft Excel 2008 -> Microsoft Excel
• Spelling Errors
Word Representations - Use Case 3 - Synonyms
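A brief sketch of why the subword model helps with spelling errors and partial matches: fastText can embed out-of-vocabulary strings from their character n-grams, so a misspelled query still lands near the intended term. This assumes the gensim model trained in the earlier sketch; the example words are illustrative.

```python
# Subword (character n-gram) vectors tolerate misspellings and OOV terms.
from gensim.models import FastText

model = FastText.load("fasttext.model")  # from the earlier training sketch

# "micorsoft" never appeared in the corpus, but its n-grams overlap heavily
# with "microsoft", so their vectors are close.
print(model.wv.similarity("micorsoft", "microsoft"))

# Nearest neighbours double as synonym candidates for human review.
print(model.wv.most_similar("developer", topn=5))
```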
22. • Synonyms
• Polysemy
• Compositional phrase vectors
• Document vectors
Ongoing and Future Work
23. 1. P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
2. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In NIPS, pages 3111–3119, 2013.
3. Minh-Thang Luong, Richard Socher, and Christopher D. Manning. Better Word Representations with Recursive Neural Networks for Morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113, 2013.
References
24. • Chief Data Scientists
• Joseph Turian
• Parag Namjoshi
• Engineering Manager
• Harikrishna Naraynan
• Tech Lead
• Saumil Shah
• Dev Team
• Adam Baker
• Namrata Ghadi
• Sergei Wintzki
• Rohit Kumar
Team