SlideShare a Scribd company logo
Improving Search in Workday Products
using Natural Language Processing
Adam Baker
Namrata Ghadi
About Workday
1. User data is very noisy
• Abbreviations and misspellings are common
• Synonyms are common, eg “word vectors”, “word embeddings”
2. Classification
• Search “data science”, return docs with “deep learning”/R
3. Recommendation
• “deep” → deep learning
• nlp → machine learning
Problems looking for NLP
• Why Word Vectors
• Word Vectors Explained
• Word Representation Use Cases
• Model Evaluation
Agenda
• Captures Semantics Captures Syntax
Athens
Greece
Norway Oslo
dollars dollar
mousemice
Athens - Greece
Why Might Word Vectors Help?
𝑑𝑜𝑙𝑙𝑎𝑟𝑠 − 𝑑𝑜𝑙𝑙𝑎𝑟 + 𝑚𝑜𝑢𝑠𝑒 ≈ 𝑚𝑖𝑐𝑒
• Similar terms point in similar direction: cosine similarity
• Rank suggestions, given existing profile
• Find similar terms in taxonomy
• Cluster for folksonomy, synonyms
• Cheap to train
• No labelled data needed
• Very efficient training algorithms
Why Might Word Vectors Help?
Steps for using fastText:
1. Get a corpus (news feeds, web scrapes, social media posts,
Wikipedia)
2. Minimize the difference between predictions and corpus
3. Tune hyperparameters
Where Do Word Vectors Come From?
backed
Skip Gram with Negative Sampling (SGNS)
𝑣 𝑏𝑎𝑐𝑘𝑒𝑑
𝑐 𝑎
𝑐 𝑛𝑙𝑝
𝑐 𝑟𝑒𝑐𝑜𝑚𝑚𝑒𝑛𝑑𝑎𝑡𝑖𝑜𝑛
𝑐 𝑠𝑦𝑠𝑡𝑒𝑚
“a nlp backed recommendation system”
SGNS in fastText in Detail
• Let 𝑣 𝑤 be canonical vector for word 𝑤
𝑐 𝑤 be context vector for word 𝑤
𝜎 𝑥 =
1
1−𝑒−𝑥
• 𝑝 𝑤 𝑓 = 𝜎(𝑣 𝑓∙ 𝑐 𝑤)
• Maximize 𝑝 𝑐 𝑓 for c in context of focus word f
Minimize 𝑝 𝑛 𝑓 for n randomly sampled from the vocabulary [1, 2]
• “You shall know a word by the company it keeps” - J.R. Firth
• Context window
• Narrow window → functional, syntactic vectors
• Wide window → topical, semantic vectors
• Dimensionality
• Character n-gram model for OOV terms
• Phrases
• “data science” → “data_science”
• compositional vectors (eg [2, 3])
The Art of Training Word Vectors
• Search at the core
• Talent: Candidate search and assignment
• HCM: Job Title, Job Qualification
• Having clean entities is paramount to success of Workday’s products
Why improve search?
Denormalized Entities
Senior Software Engineer, 2009-Present
Software Engineer, 2006 - 2009
Programming Languages: Scala, Java, C#
Database: SQL, MySQL
Senior Software Engineer
Software Engineer
Scala, Java, C#
SQL, MySQL
Senior Software Engineer
Software Engineer
Scala
Java
C#
SQL
MySQL
Senior Software Engineer, 2009-Present
Software Engineer, 2006 – 2009
Senior Software Engineer
Software Engineer
Programming Languages: Scala, Java, C#
Database: SQL, MySQL
Scala, Java, C#
SQL, MySQL
Scala
Java
C#
SQL
• Regex Based Approaches
• Partial Match
• Multiple Entities
• True Casing
• Synonyms Canonicalization
Entity Cleanup
Databases: SQL SQL
Scala, Java, VB.Net Scala
Java
VB.Net
ios
mongodb
iOS
MongoDB
• 7 different data sources
• Rechunking
• Bag-of-sentences
• Task specific “Intrinsic Evaluation”
Word representations
• Usage
• Search on broad term
• How?
• Hierarchy
• How are word vectors useful?
• Add new entity
• Clustering
• Implementation details
• Cosine similarity
Word Representations - Use Case 1: Broad Search
Affinity photo
Pixlr
….
Sql
Data Modeling
Relational Databases
…
Productivity
Software
Graphics
Software
Office
Software
Illustration
Software
Photo editing
Software
Photo
Management
Software
Presentation
Software
Spreadsheet
Software
Adobe Photoshop
Adobe Photoshop
Adobe Photoshop
Affinity photo
Data Modeling
Pixlr
Relational Databases
SQL
Word Representations - Use Case 1: New Entity Ingestion
• Quality of category recommendation
• Parent, grandparent or sibling
• 48.5% increase in ingesting new entity in hierarchy
Word Vector Evaluation: New Entity Ingestion Score
• Usage
• Recommend related entities
• Search results: Exact Match + Related Skills
• Siblings vs Related entities
• How are word vectors used?
• Cosine similarity
Word Representations - Use Case 2: Related Entities Recommendation
• Quality of related entity recommendations
• Skills co-occurrences in Resume
• 49.5% additive increase in the recommendations
Word Vectors Evaluation: Co-occurrence Score
• Usage
• Disambiguation
• Query Expansion
• How are word vectors used?
• Subword model
• Cosine similarity
• Additionally
• Similar meaning
• e.g. Software Developer -> Software Engineer
• Abbreviation
• e.g. MS Excel -> Microsoft Excel
• Partial matches
• e.g. Microsoft Excel 2008 -> Microsoft Excel
• Spelling Errors
Word Representations - Use Case 3 - Synonyms
• Synonyms
• Polysemy
• Compositional phrase vectors
• Document vectors
Ongoing and Future Work
1. P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov. Enriching Word
Vectors with Subword Information. Transactions of the Association
for Computational Linguistics, 5: 135-146, 2017.
2. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey
Dean. Distributed representations of words and phrases and their
compositionality. In NIPS, pages 3111–3119, 2013
3. Minh-Thang Luong, Richard Socher, and Christopher D. Manning.
Better Word Representations with Recursive Neural Networks for
Morphology. In Proceedings of the Seventeenth Conference on
Computational Natural Language Learning, pages 104–113, 2013
References
• Chief Data Scientists
• Joseph Turian
• Parag Namjoshi
• Engineering Manager
• Harikrishna Naraynan
• Tech Lead
• Saumil Shah
• Dev Team
• Adam Baker
• Namrata Ghadi
• Sergei Wintzki
• Rohit Kumar
Team

More Related Content

What's hot

Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo WorkflowsDok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
DoKC
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Flink Forward
 
Rust: Systems Programming for Everyone
Rust: Systems Programming for EveryoneRust: Systems Programming for Everyone
Rust: Systems Programming for Everyone
C4Media
 
SyScan Singapore 2010 - Returning Into The PHP-Interpreter
SyScan Singapore 2010 - Returning Into The PHP-InterpreterSyScan Singapore 2010 - Returning Into The PHP-Interpreter
SyScan Singapore 2010 - Returning Into The PHP-Interpreter
Stefan Esser
 
Alphorm.com Formation Elastic : Maitriser les fondamentaux
Alphorm.com Formation Elastic : Maitriser les fondamentauxAlphorm.com Formation Elastic : Maitriser les fondamentaux
Alphorm.com Formation Elastic : Maitriser les fondamentaux
Alphorm
 
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Spark Summit
 
Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.
Dan Harvey
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark Summit
 
Geospatial Options in Apache Spark
Geospatial Options in Apache SparkGeospatial Options in Apache Spark
Geospatial Options in Apache Spark
Databricks
 
Models for hierarchical data
Models for hierarchical dataModels for hierarchical data
Models for hierarchical data
Karwin Software Solutions LLC
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
Spark Summit
 
Web and Mobile Application Security
Web and Mobile Application SecurityWeb and Mobile Application Security
Web and Mobile Application Security
Prateek Jain
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
Traian Rebedea
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
confluent
 
PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"
Uptime Technologies LLC
 
REST API Pentester's perspective
REST API Pentester's perspectiveREST API Pentester's perspective
REST API Pentester's perspective
SecuRing
 
Querying Elasticsearch with Deep Learning to Answer Natural Language Questions
Querying Elasticsearch with Deep Learning to Answer Natural Language QuestionsQuerying Elasticsearch with Deep Learning to Answer Natural Language Questions
Querying Elasticsearch with Deep Learning to Answer Natural Language Questions
Sebastian Blank
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
Sam Shah
 
ELK in Security Analytics
ELK in Security Analytics ELK in Security Analytics
ELK in Security Analytics
nullowaspmumbai
 
Secure PHP Coding
Secure PHP CodingSecure PHP Coding
Secure PHP Coding
Narudom Roongsiriwong, CISSP
 

What's hot (20)

Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo WorkflowsDok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Rust: Systems Programming for Everyone
Rust: Systems Programming for EveryoneRust: Systems Programming for Everyone
Rust: Systems Programming for Everyone
 
SyScan Singapore 2010 - Returning Into The PHP-Interpreter
SyScan Singapore 2010 - Returning Into The PHP-InterpreterSyScan Singapore 2010 - Returning Into The PHP-Interpreter
SyScan Singapore 2010 - Returning Into The PHP-Interpreter
 
Alphorm.com Formation Elastic : Maitriser les fondamentaux
Alphorm.com Formation Elastic : Maitriser les fondamentauxAlphorm.com Formation Elastic : Maitriser les fondamentaux
Alphorm.com Formation Elastic : Maitriser les fondamentaux
 
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
 
Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
Geospatial Options in Apache Spark
Geospatial Options in Apache SparkGeospatial Options in Apache Spark
Geospatial Options in Apache Spark
 
Models for hierarchical data
Models for hierarchical dataModels for hierarchical data
Models for hierarchical data
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
Web and Mobile Application Security
Web and Mobile Application SecurityWeb and Mobile Application Security
Web and Mobile Application Security
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 
PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"
 
REST API Pentester's perspective
REST API Pentester's perspectiveREST API Pentester's perspective
REST API Pentester's perspective
 
Querying Elasticsearch with Deep Learning to Answer Natural Language Questions
Querying Elasticsearch with Deep Learning to Answer Natural Language QuestionsQuerying Elasticsearch with Deep Learning to Answer Natural Language Questions
Querying Elasticsearch with Deep Learning to Answer Natural Language Questions
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
ELK in Security Analytics
ELK in Security Analytics ELK in Security Analytics
ELK in Security Analytics
 
Secure PHP Coding
Secure PHP CodingSecure PHP Coding
Secure PHP Coding
 

Similar to Improving Search in Workday Products using Natural Language Processing

Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Simon Hughes
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Lucidworks
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
Simon Hughes
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
OpenSource Connections
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Lucidworks
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
Simon Hughes
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
Dr. Haxel Consult
 
Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...
Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...
Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...
National Inistitute of Informatics (NII), Tokyo, Japann
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Simon Hughes
 
Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017Debanjan Mahata
 
Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017
Debanjan Mahata
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
Designing and Implementing Search Solutions
Designing and Implementing Search SolutionsDesigning and Implementing Search Solutions
Designing and Implementing Search Solutions
Findwise
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
Trey Grainger
 
Collaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna WorkflowsCollaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna Workflows
Andrea Wiggins
 
Candidate selection tutorial
Candidate selection tutorialCandidate selection tutorial
Candidate selection tutorial
Yiqun Liu
 
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
Aman Grover
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and search
Nathan McMinn
 

Similar to Improving Search in Workday Products using Natural Language Processing (20)

Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...
Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...
Relation-wise Automatic Domain-Range Information Management for Knowledge Ent...
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 
394 wade word2007-ssp2008
394 wade word2007-ssp2008394 wade word2007-ssp2008
394 wade word2007-ssp2008
 
Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017
 
Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Designing and Implementing Search Solutions
Designing and Implementing Search SolutionsDesigning and Implementing Search Solutions
Designing and Implementing Search Solutions
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Collaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna WorkflowsCollaborative Data Analysis with Taverna Workflows
Collaborative Data Analysis with Taverna Workflows
 
Candidate selection tutorial
Candidate selection tutorialCandidate selection tutorial
Candidate selection tutorial
 
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and search
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 

Improving Search in Workday Products using Natural Language Processing

  • 1. Improving Search in Workday Products using Natural Language Processing Adam Baker Namrata Ghadi
  • 3. 1. User data is very noisy • Abbreviations and misspellings are common • Synonyms are common, eg “word vectors”, “word embeddings” 2. Classification • Search “data science”, return docs with “deep learning”/R 3. Recommendation • “deep” → deep learning • nlp → machine learning Problems looking for NLP
  • 4. • Why Word Vectors • Word Vectors Explained • Word Representation Use Cases • Model Evaluation Agenda
  • 5. • Captures Semantics Captures Syntax Athens Greece Norway Oslo dollars dollar mousemice Athens - Greece Why Might Word Vectors Help? 𝑑𝑜𝑙𝑙𝑎𝑟𝑠 − 𝑑𝑜𝑙𝑙𝑎𝑟 + 𝑚𝑜𝑢𝑠𝑒 ≈ 𝑚𝑖𝑐𝑒
  • 6. • Similar terms point in similar direction: cosine similarity • Rank suggestions, given existing profile • Find similar terms in taxonomy • Cluster for folksonomy, synonyms • Cheap to train • No labelled data needed • Very efficient training algorithms Why Might Word Vectors Help?
  • 7. Steps for using fastText: 1. Get a corpus (news feeds, web scrapes, social media posts, Wikipedia) 2. Minimize the difference between predictions and corpus 3. Tune hyperparameters Where Do Word Vectors Come From?
  • 8. backed Skip Gram with Negative Sampling (SGNS) 𝑣 𝑏𝑎𝑐𝑘𝑒𝑑 𝑐 𝑎 𝑐 𝑛𝑙𝑝 𝑐 𝑟𝑒𝑐𝑜𝑚𝑚𝑒𝑛𝑑𝑎𝑡𝑖𝑜𝑛 𝑐 𝑠𝑦𝑠𝑡𝑒𝑚 “a nlp backed recommendation system”
  • 9. SGNS in fastText in Detail • Let 𝑣 𝑤 be canonical vector for word 𝑤 𝑐 𝑤 be context vector for word 𝑤 𝜎 𝑥 = 1 1−𝑒−𝑥 • 𝑝 𝑤 𝑓 = 𝜎(𝑣 𝑓∙ 𝑐 𝑤) • Maximize 𝑝 𝑐 𝑓 for c in context of focus word f Minimize 𝑝 𝑛 𝑓 for n randomly sampled from the vocabulary [1, 2]
  • 10. • “You shall know a word by the company it keeps” - J.R. Firth • Context window • Narrow window → functional, syntactic vectors • Wide window → topical, semantic vectors • Dimensionality • Character n-gram model for OOV terms • Phrases • “data science” → “data_science” • compositional vectors (eg [2, 3]) The Art of Training Word Vectors
  • 11. • Search at the core • Talent: Candidate search and assignment • HCM: Job Title, Job Qualification • Having clean entities is paramount to success of Workday’s products Why improve search?
  • 12. Denormalized Entities Senior Software Engineer, 2009-Present Software Engineer, 2006 - 2009 Programming Languages: Scala, Java, C# Database: SQL, MySQL Senior Software Engineer Software Engineer Scala, Java, C# SQL, MySQL Senior Software Engineer Software Engineer Scala Java C# SQL MySQL Senior Software Engineer, 2009-Present Software Engineer, 2006 – 2009 Senior Software Engineer Software Engineer Programming Languages: Scala, Java, C# Database: SQL, MySQL Scala, Java, C# SQL, MySQL Scala Java C# SQL
  • 13. • Regex Based Approaches • Partial Match • Multiple Entities • True Casing • Synonyms Canonicalization Entity Cleanup Databases: SQL SQL Scala, Java, VB.Net Scala Java VB.Net ios mongodb iOS MongoDB
  • 14. • 7 different data sources • Rechunking • Bag-of-sentences • Task specific “Intrinsic Evaluation” Word representations
  • 15. • Usage • Search on broad term • How? • Hierarchy • How are word vectors useful? • Add new entity • Clustering • Implementation details • Cosine similarity Word Representations - Use Case 1: Broad Search
  • 16. Affinity photo Pixlr …. Sql Data Modeling Relational Databases … Productivity Software Graphics Software Office Software Illustration Software Photo editing Software Photo Management Software Presentation Software Spreadsheet Software Adobe Photoshop Adobe Photoshop Adobe Photoshop Affinity photo Data Modeling Pixlr Relational Databases SQL Word Representations - Use Case 1: New Entity Ingestion
  • 17. • Quality of category recommendation • Parent, grandparent or sibling • 48.5% increase in ingesting new entity in hierarchy Word Vector Evaluation: New Entity Ingestion Score
  • 18. • Usage • Recommend related entities • Search results: Exact Match + Related Skills • Siblings vs Related entities • How are word vectors used? • Cosine similarity Word Representations - Use Case 2: Related Entities Recommendation
  • 19.
  • 20. • Quality of related entity recommendations • Skills co-occurrences in Resume • 49.5% additive increase in the recommendations Word Vectors Evaluation: Co-occurrence Score
  • 21. • Usage • Disambiguation • Query Expansion • How are word vectors used? • Subword model • Cosine similarity • Additionally • Similar meaning • e.g. Software Developer -> Software Engineer • Abbreviation • e.g. MS Excel -> Microsoft Excel • Partial matches • e.g. Microsoft Excel 2008 -> Microsoft Excel • Spelling Errors Word Representations - Use Case 3 - Synonyms
  • 22. • Synonyms • Polysemy • Compositional phrase vectors • Document vectors Ongoing and Future Work
  • 23. 1. P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5: 135-146, 2017. 2. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013 3. Minh-Thang Luong, Richard Socher, and Christopher D. Manning. Better Word Representations with Recursive Neural Networks for Morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104–113, 2013 References
  • 24. • Chief Data Scientists • Joseph Turian • Parag Namjoshi • Engineering Manager • Harikrishna Naraynan • Tech Lead • Saumil Shah • Dev Team • Adam Baker • Namrata Ghadi • Sergei Wintzki • Rohit Kumar Team

Editor's Notes

  1. New category?