SlideShare a Scribd company logo
1 of 18
Benchmarking of a Novel POS
Tagging Based Semantic Similarity
Approach for Job Description
Similarity Computation
Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu
Shekhar Singh, Gyana Parija
IBM Research Lab, India
Presenter: Joydeep
What exactly it is trying to solve?
Job Description document1 Job Description document2
Semantic Similarity Score
Business Application (Where & Why
it is needed?)
• IBM Watson Recruitment (IWR) : https://www.ibm.com/talent-
management/hr-solutions/recruiting-software
Mapping requisition jobs to the available job
taxonomy without using sate of the art document
similarity methods.
Reason: Almost all state of the art
document similarity methods use
data to train their model. Clients
generally are hesitated to share
their data for training purposes.
How the problem has been solved?
Represent each job
description (Job
Responsibility) into
object-action-
attribute triplet set
by using Stanford
POS Tagger
dependency tree
Calculate semantic
similarity between
each of the
elements of these
categories of triplet
sets of Doc1 with
that of Doc2 and
create a similarity
matrix (symmetric
strongly connected
graph)
Map each category
of one job
responsibility to
those of the other
job responsibility
Calculate the
overall similarity
Example
• Job responsibility1:
• Determines operational feasibility by evaluating analysis, problem definition, requirements, solution
development, and proposed solutions
• Job responsibility2:
• Regulate operational viability. Evaluates survey, requirements and resolution development.
• Representation of Job responsibility1 :
• action: determines, object: feasibility, attributes: [operational ]
• action: evaluating , object: analysis, attributes: []
• action: evaluating , object: problem definition, attributes: []
• action: evaluating , object: requirements, attributes: []
• action: evaluating , object: solution development, attributes: []
• action: evaluating , object: solutions, attributes: [proposed]
• Representation of Job responsibility1 :
• action: regulate , object: viability, attributes: [operational ]
• action: evaluates, object: survey, attributes: []
• action: evaluates, object: requirements, attributes: []
• action: evaluates, object: resolution development, attributes: []
8
VERB NOUN ADJECTIVE
DETERMINE FEASIBIITY OPERATIONAL
VERB NOUN ADJECTIVE
EVALUATING
ANALYSIS
PROBLEM DEFINITION
REQUIREMENTS
SOLUTION DEVELOPMENT
SOLUTIONS PROPOSED
STANFORD POS
TAGGER
DEPENDECY
GRAPH PARSING
Example (Continue…)
• action Similarity:
• action Similarity Matrix :
• action Assignments: (VA) : Hungarian method to solve imbalanced assignment problem
• Regulate -> determine = 0.30
• evaluate-> evaluate = 1.0
• object Similarity:
• object similarity matrices:
Determine evaluate
regulate 0.30 0.25
evaluate 0.31 1.0
feasibility
viability 0.60
9
survey requirements Resolution
development
analysis 0.36 0.30 0.15
Problem
definition
0.16 0.13 0.26
requirements 0.24 1.0 0.15
Solution
development
0.17 0.15 0.65
Solution 0.34 0.31 0.15
Example (Continue…)
• object Assignments: (NA ) Hungarian method to solve imbalanced assignment problem
• feasibility -> viability = 0.60
• analysis -> survey = 0.36
• requirements -> requirements = 1.0
• solution development -> resolution development = 0.65
• attribute Similarity:
• attribute similarity Matrix:
• attribute Assignment: (AA) Hungarian method to solve imbalanced assignment problem
• Operational -> operational ….. (1) = 1.0
• Total similarity score :
• ((VA1 ( 1+ NA1 (1+AA1) /3)+( VA2 (1+ (NA2 + NA3 + NA4 ) /3) )/2)/2 = ((0.30 (1+0.60 * (1+1.0)/3 )) + (1.0
*(1+ (0.36 + 1.0 + 0.65)/3)))/2 =0.5275
Operational
operational 1.0
10Importance order: Action, Object, Attribute
System Diagram
SENTENCE TOKENIZER
d
[s1,s2,s3…]
WORD TOKENIZER + POS TAGGER
SEMANTIC SIMILARITY SCORE CALCULATON
[v1, n1, a1]
[v1, n2, a1]
s1
s2
.
.
Triplet Formation
MULTILEVEL IMBALANCED CLASSICAL
ASSIGNMENT PROBLEM
1
2
3
4
[v2, n3, a2]
T
T T’
uses precomputed
memorized scores
0.767
Similarity Score
PART 1
PART 2
11
Experiment Setup and Evaluation
– F = {F1, F2, ..., Fn} be the set of all job families in the test set.
– Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi.
– Intrai be the average similarity between all pairs of jobs within Fi
– Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi
And B∈Fj for all j ≠ i
– Ri = Intrai / Interi
S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 )
• N1 = 56, N2 = 129 and N3 = 430 documents used for training.
• POSDC does not require any training corpus, therefore the corpus varying experiments are valid
only for doc2vec and LDA.
• test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent
frameworks
Results
Comparison of S value Across Methods
Results
Core Novelty
• A system to represent a Job Description Document as a collection of
action, object and attribute triplets.
• A method to find similarity score between two such triplets
representations by hierarchically matching of triplets across
documents. It is done as an imbalanced assignment problem to find
the best match (non-greedy, to maximize the overall match score) of
all triplets in the two documents.
Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation
Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

More Related Content

Similar to Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - SlidesAditya Joshi
 
powerpoint
powerpointpowerpoint
powerpointbutest
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to peopleFabian Abel
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxRuby Shrestha
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsAlejandro Bellogin
 
Learning with Relative Attributes
Learning with Relative AttributesLearning with Relative Attributes
Learning with Relative AttributesVikas Jain
 
Matching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web SearchMatching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web Searchceya
 
ClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLGeorge Simov
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET Journal
 
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxDIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxlynettearnold46882
 
Part 1
Part 1Part 1
Part 1butest
 
316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasexabhaysonone0
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionShruti Mohan
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksFranco Maria Nardini
 
Empirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxEmpirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxghufranullah25
 

Similar to Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation (20)

Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - Slides
 
powerpoint
powerpointpowerpoint
powerpoint
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to people
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptx
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Learning with Relative Attributes
Learning with Relative AttributesLearning with Relative Attributes
Learning with Relative Attributes
 
Learning how to learn
Learning how to learnLearning how to learn
Learning how to learn
 
Matching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web SearchMatching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web Search
 
Nl201609
Nl201609Nl201609
Nl201609
 
Job evaluation
Job evaluationJob evaluation
Job evaluation
 
ClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureML
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
assia2015sakai
assia2015sakaiassia2015sakai
assia2015sakai
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
 
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxDIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
 
Part 1
Part 1Part 1
Part 1
 
316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search Tasks
 
Empirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxEmpirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptx
 

Recently uploaded

Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jNeo4j
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch TuesdayIvanti
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 

Recently uploaded (20)

Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4j
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

  • 1. Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu Shekhar Singh, Gyana Parija IBM Research Lab, India Presenter: Joydeep
  • 2. What exactly it is trying to solve?
  • 3. Job Description document1 Job Description document2 Semantic Similarity Score
  • 4. Business Application (Where & Why it is needed?)
  • 5. • IBM Watson Recruitment (IWR) : https://www.ibm.com/talent- management/hr-solutions/recruiting-software Mapping requisition jobs to the available job taxonomy without using sate of the art document similarity methods. Reason: Almost all state of the art document similarity methods use data to train their model. Clients generally are hesitated to share their data for training purposes.
  • 6. How the problem has been solved?
  • 7. Represent each job description (Job Responsibility) into object-action- attribute triplet set by using Stanford POS Tagger dependency tree Calculate semantic similarity between each of the elements of these categories of triplet sets of Doc1 with that of Doc2 and create a similarity matrix (symmetric strongly connected graph) Map each category of one job responsibility to those of the other job responsibility Calculate the overall similarity
  • 8. Example • Job responsibility1: • Determines operational feasibility by evaluating analysis, problem definition, requirements, solution development, and proposed solutions • Job responsibility2: • Regulate operational viability. Evaluates survey, requirements and resolution development. • Representation of Job responsibility1 : • action: determines, object: feasibility, attributes: [operational ] • action: evaluating , object: analysis, attributes: [] • action: evaluating , object: problem definition, attributes: [] • action: evaluating , object: requirements, attributes: [] • action: evaluating , object: solution development, attributes: [] • action: evaluating , object: solutions, attributes: [proposed] • Representation of Job responsibility1 : • action: regulate , object: viability, attributes: [operational ] • action: evaluates, object: survey, attributes: [] • action: evaluates, object: requirements, attributes: [] • action: evaluates, object: resolution development, attributes: [] 8 VERB NOUN ADJECTIVE DETERMINE FEASIBIITY OPERATIONAL VERB NOUN ADJECTIVE EVALUATING ANALYSIS PROBLEM DEFINITION REQUIREMENTS SOLUTION DEVELOPMENT SOLUTIONS PROPOSED STANFORD POS TAGGER DEPENDECY GRAPH PARSING
  • 9. Example (Continue…) • action Similarity: • action Similarity Matrix : • action Assignments: (VA) : Hungarian method to solve imbalanced assignment problem • Regulate -> determine = 0.30 • evaluate-> evaluate = 1.0 • object Similarity: • object similarity matrices: Determine evaluate regulate 0.30 0.25 evaluate 0.31 1.0 feasibility viability 0.60 9 survey requirements Resolution development analysis 0.36 0.30 0.15 Problem definition 0.16 0.13 0.26 requirements 0.24 1.0 0.15 Solution development 0.17 0.15 0.65 Solution 0.34 0.31 0.15
  • 10. Example (Continue…) • object Assignments: (NA ) Hungarian method to solve imbalanced assignment problem • feasibility -> viability = 0.60 • analysis -> survey = 0.36 • requirements -> requirements = 1.0 • solution development -> resolution development = 0.65 • attribute Similarity: • attribute similarity Matrix: • attribute Assignment: (AA) Hungarian method to solve imbalanced assignment problem • Operational -> operational ….. (1) = 1.0 • Total similarity score : • ((VA1 ( 1+ NA1 (1+AA1) /3)+( VA2 (1+ (NA2 + NA3 + NA4 ) /3) )/2)/2 = ((0.30 (1+0.60 * (1+1.0)/3 )) + (1.0 *(1+ (0.36 + 1.0 + 0.65)/3)))/2 =0.5275 Operational operational 1.0 10Importance order: Action, Object, Attribute
  • 11. System Diagram SENTENCE TOKENIZER d [s1,s2,s3…] WORD TOKENIZER + POS TAGGER SEMANTIC SIMILARITY SCORE CALCULATON [v1, n1, a1] [v1, n2, a1] s1 s2 . . Triplet Formation MULTILEVEL IMBALANCED CLASSICAL ASSIGNMENT PROBLEM 1 2 3 4 [v2, n3, a2] T T T’ uses precomputed memorized scores 0.767 Similarity Score PART 1 PART 2 11
  • 12.
  • 13. Experiment Setup and Evaluation – F = {F1, F2, ..., Fn} be the set of all job families in the test set. – Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi. – Intrai be the average similarity between all pairs of jobs within Fi – Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi And B∈Fj for all j ≠ i – Ri = Intrai / Interi S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ) • N1 = 56, N2 = 129 and N3 = 430 documents used for training. • POSDC does not require any training corpus, therefore the corpus varying experiments are valid only for doc2vec and LDA. • test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent frameworks
  • 14. Results Comparison of S value Across Methods
  • 16. Core Novelty • A system to represent a Job Description Document as a collection of action, object and attribute triplets. • A method to find similarity score between two such triplets representations by hierarchically matching of triplets across documents. It is done as an imbalanced assignment problem to find the best match (non-greedy, to maximize the overall match score) of all triplets in the two documents.