SlideShare a Scribd company logo
Benchmarking of a Novel POS
Tagging Based Semantic Similarity
Approach for Job Description
Similarity Computation
Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu
Shekhar Singh, Gyana Parija
IBM Research Lab, India
Presenter: Joydeep
What exactly it is trying to solve?
Job Description document1 Job Description document2
Semantic Similarity Score
Business Application (Where & Why
it is needed?)
• IBM Watson Recruitment (IWR) : https://www.ibm.com/talent-
management/hr-solutions/recruiting-software
Mapping requisition jobs to the available job
taxonomy without using sate of the art document
similarity methods.
Reason: Almost all state of the art
document similarity methods use
data to train their model. Clients
generally are hesitated to share
their data for training purposes.
How the problem has been solved?
Represent each job
description (Job
Responsibility) into
object-action-
attribute triplet set
by using Stanford
POS Tagger
dependency tree
Calculate semantic
similarity between
each of the
elements of these
categories of triplet
sets of Doc1 with
that of Doc2 and
create a similarity
matrix (symmetric
strongly connected
graph)
Map each category
of one job
responsibility to
those of the other
job responsibility
Calculate the
overall similarity
Example
• Job responsibility1:
• Determines operational feasibility by evaluating analysis, problem definition, requirements, solution
development, and proposed solutions
• Job responsibility2:
• Regulate operational viability. Evaluates survey, requirements and resolution development.
• Representation of Job responsibility1 :
• action: determines, object: feasibility, attributes: [operational ]
• action: evaluating , object: analysis, attributes: []
• action: evaluating , object: problem definition, attributes: []
• action: evaluating , object: requirements, attributes: []
• action: evaluating , object: solution development, attributes: []
• action: evaluating , object: solutions, attributes: [proposed]
• Representation of Job responsibility1 :
• action: regulate , object: viability, attributes: [operational ]
• action: evaluates, object: survey, attributes: []
• action: evaluates, object: requirements, attributes: []
• action: evaluates, object: resolution development, attributes: []
8
VERB NOUN ADJECTIVE
DETERMINE FEASIBIITY OPERATIONAL
VERB NOUN ADJECTIVE
EVALUATING
ANALYSIS
PROBLEM DEFINITION
REQUIREMENTS
SOLUTION DEVELOPMENT
SOLUTIONS PROPOSED
STANFORD POS
TAGGER
DEPENDECY
GRAPH PARSING
Example (Continue…)
• action Similarity:
• action Similarity Matrix :
• action Assignments: (VA) : Hungarian method to solve imbalanced assignment problem
• Regulate -> determine = 0.30
• evaluate-> evaluate = 1.0
• object Similarity:
• object similarity matrices:
Determine evaluate
regulate 0.30 0.25
evaluate 0.31 1.0
feasibility
viability 0.60
9
survey requirements Resolution
development
analysis 0.36 0.30 0.15
Problem
definition
0.16 0.13 0.26
requirements 0.24 1.0 0.15
Solution
development
0.17 0.15 0.65
Solution 0.34 0.31 0.15
Example (Continue…)
• object Assignments: (NA ) Hungarian method to solve imbalanced assignment problem
• feasibility -> viability = 0.60
• analysis -> survey = 0.36
• requirements -> requirements = 1.0
• solution development -> resolution development = 0.65
• attribute Similarity:
• attribute similarity Matrix:
• attribute Assignment: (AA) Hungarian method to solve imbalanced assignment problem
• Operational -> operational ….. (1) = 1.0
• Total similarity score :
• ((VA1 ( 1+ NA1 (1+AA1) /3)+( VA2 (1+ (NA2 + NA3 + NA4 ) /3) )/2)/2 = ((0.30 (1+0.60 * (1+1.0)/3 )) + (1.0
*(1+ (0.36 + 1.0 + 0.65)/3)))/2 =0.5275
Operational
operational 1.0
10Importance order: Action, Object, Attribute
System Diagram
SENTENCE TOKENIZER
d
[s1,s2,s3…]
WORD TOKENIZER + POS TAGGER
SEMANTIC SIMILARITY SCORE CALCULATON
[v1, n1, a1]
[v1, n2, a1]
s1
s2
.
.
Triplet Formation
MULTILEVEL IMBALANCED CLASSICAL
ASSIGNMENT PROBLEM
1
2
3
4
[v2, n3, a2]
T
T T’
uses precomputed
memorized scores
0.767
Similarity Score
PART 1
PART 2
11
Experiment Setup and Evaluation
– F = {F1, F2, ..., Fn} be the set of all job families in the test set.
– Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi.
– Intrai be the average similarity between all pairs of jobs within Fi
– Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi
And B∈Fj for all j ≠ i
– Ri = Intrai / Interi
S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 )
• N1 = 56, N2 = 129 and N3 = 430 documents used for training.
• POSDC does not require any training corpus, therefore the corpus varying experiments are valid
only for doc2vec and LDA.
• test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent
frameworks
Results
Comparison of S value Across Methods
Results
Core Novelty
• A system to represent a Job Description Document as a collection of
action, object and attribute triplets.
• A method to find similarity score between two such triplets
representations by hierarchically matching of triplets across
documents. It is done as an imbalanced assignment problem to find
the best match (non-greedy, to maximize the overall match score) of
all triplets in the two documents.
Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation
Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

More Related Content

Similar to Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - Slides
Aditya Joshi
 
powerpoint
powerpointpowerpoint
powerpoint
butest
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to people
Fabian Abel
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptx
Ruby Shrestha
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
Alejandro Bellogin
 
Learning with Relative Attributes
Learning with Relative AttributesLearning with Relative Attributes
Learning with Relative Attributes
Vikas Jain
 
Learning how to learn
Learning how to learnLearning how to learn
Learning how to learn
Joaquin Vanschoren
 
Matching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web SearchMatching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web Search
ceya
 
Nl201609
Nl201609Nl201609
Nl201609
Tetsuya Sakai
 
Job evaluation
Job evaluationJob evaluation
Job evaluation
Dr. Saswat Barpanda
 
ClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureML
George Simov
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
assia2015sakai
assia2015sakaiassia2015sakai
assia2015sakai
Tetsuya Sakai
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET Journal
 
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxDIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
lynettearnold46882
 
Part 1
Part 1Part 1
Part 1
butest
 
316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex
abhaysonone0
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
Shruti Mohan
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search Tasks
Franco Maria Nardini
 
Empirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxEmpirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptx
ghufranullah25
 

Similar to Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation (20)

Analytics Boot Camp - Slides
Analytics Boot Camp - SlidesAnalytics Boot Camp - Slides
Analytics Boot Camp - Slides
 
powerpoint
powerpointpowerpoint
powerpoint
 
Recommending job ads to people
Recommending job ads to peopleRecommending job ads to people
Recommending job ads to people
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptx
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Learning with Relative Attributes
Learning with Relative AttributesLearning with Relative Attributes
Learning with Relative Attributes
 
Learning how to learn
Learning how to learnLearning how to learn
Learning how to learn
 
Matching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web SearchMatching Task Profiles And User Needs In Personalized Web Search
Matching Task Profiles And User Needs In Personalized Web Search
 
Nl201609
Nl201609Nl201609
Nl201609
 
Job evaluation
Job evaluationJob evaluation
Job evaluation
 
ClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureMLClassifyingIssuesFromSRTextAzureML
ClassifyingIssuesFromSRTextAzureML
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
assia2015sakai
assia2015sakaiassia2015sakai
assia2015sakai
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
 
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docxDIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
DIRECTIONS READ THE FOLLOWING STUDENT POST AND RESPOND EVALUATE I.docx
 
Part 1
Part 1Part 1
Part 1
 
316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex316_16SCCCS4_2020052505222431.pptdatabasex
316_16SCCCS4_2020052505222431.pptdatabasex
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Learning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search TasksLearning To Rank User Queries to Detect Search Tasks
Learning To Rank User Queries to Detect Search Tasks
 
Empirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptxEmpirical Study on Collaborative Software in the field of Machine learning.pptx
Empirical Study on Collaborative Software in the field of Machine learning.pptx
 

Recently uploaded

5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
maazsz111
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 

Recently uploaded (20)

5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

  • 1. Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation Authors: Joydeep Mondal, Sarthak Ahuja, Kushal Mukherjee, Sudhanshu Shekhar Singh, Gyana Parija IBM Research Lab, India Presenter: Joydeep
  • 2. What exactly it is trying to solve?
  • 3. Job Description document1 Job Description document2 Semantic Similarity Score
  • 4. Business Application (Where & Why it is needed?)
  • 5. • IBM Watson Recruitment (IWR) : https://www.ibm.com/talent- management/hr-solutions/recruiting-software Mapping requisition jobs to the available job taxonomy without using sate of the art document similarity methods. Reason: Almost all state of the art document similarity methods use data to train their model. Clients generally are hesitated to share their data for training purposes.
  • 6. How the problem has been solved?
  • 7. Represent each job description (Job Responsibility) into object-action- attribute triplet set by using Stanford POS Tagger dependency tree Calculate semantic similarity between each of the elements of these categories of triplet sets of Doc1 with that of Doc2 and create a similarity matrix (symmetric strongly connected graph) Map each category of one job responsibility to those of the other job responsibility Calculate the overall similarity
  • 8. Example • Job responsibility1: • Determines operational feasibility by evaluating analysis, problem definition, requirements, solution development, and proposed solutions • Job responsibility2: • Regulate operational viability. Evaluates survey, requirements and resolution development. • Representation of Job responsibility1 : • action: determines, object: feasibility, attributes: [operational ] • action: evaluating , object: analysis, attributes: [] • action: evaluating , object: problem definition, attributes: [] • action: evaluating , object: requirements, attributes: [] • action: evaluating , object: solution development, attributes: [] • action: evaluating , object: solutions, attributes: [proposed] • Representation of Job responsibility1 : • action: regulate , object: viability, attributes: [operational ] • action: evaluates, object: survey, attributes: [] • action: evaluates, object: requirements, attributes: [] • action: evaluates, object: resolution development, attributes: [] 8 VERB NOUN ADJECTIVE DETERMINE FEASIBIITY OPERATIONAL VERB NOUN ADJECTIVE EVALUATING ANALYSIS PROBLEM DEFINITION REQUIREMENTS SOLUTION DEVELOPMENT SOLUTIONS PROPOSED STANFORD POS TAGGER DEPENDECY GRAPH PARSING
  • 9. Example (Continue…) • action Similarity: • action Similarity Matrix : • action Assignments: (VA) : Hungarian method to solve imbalanced assignment problem • Regulate -> determine = 0.30 • evaluate-> evaluate = 1.0 • object Similarity: • object similarity matrices: Determine evaluate regulate 0.30 0.25 evaluate 0.31 1.0 feasibility viability 0.60 9 survey requirements Resolution development analysis 0.36 0.30 0.15 Problem definition 0.16 0.13 0.26 requirements 0.24 1.0 0.15 Solution development 0.17 0.15 0.65 Solution 0.34 0.31 0.15
  • 10. Example (Continue…) • object Assignments: (NA ) Hungarian method to solve imbalanced assignment problem • feasibility -> viability = 0.60 • analysis -> survey = 0.36 • requirements -> requirements = 1.0 • solution development -> resolution development = 0.65 • attribute Similarity: • attribute similarity Matrix: • attribute Assignment: (AA) Hungarian method to solve imbalanced assignment problem • Operational -> operational ….. (1) = 1.0 • Total similarity score : • ((VA1 ( 1+ NA1 (1+AA1) /3)+( VA2 (1+ (NA2 + NA3 + NA4 ) /3) )/2)/2 = ((0.30 (1+0.60 * (1+1.0)/3 )) + (1.0 *(1+ (0.36 + 1.0 + 0.65)/3)))/2 =0.5275 Operational operational 1.0 10Importance order: Action, Object, Attribute
  • 11. System Diagram SENTENCE TOKENIZER d [s1,s2,s3…] WORD TOKENIZER + POS TAGGER SEMANTIC SIMILARITY SCORE CALCULATON [v1, n1, a1] [v1, n2, a1] s1 s2 . . Triplet Formation MULTILEVEL IMBALANCED CLASSICAL ASSIGNMENT PROBLEM 1 2 3 4 [v2, n3, a2] T T T’ uses precomputed memorized scores 0.767 Similarity Score PART 1 PART 2 11
  • 12.
  • 13. Experiment Setup and Evaluation – F = {F1, F2, ..., Fn} be the set of all job families in the test set. – Ji = {Ji,1, Ji,2, ..., Ji,ni } be the set of all jobs in family Fi. – Intrai be the average similarity between all pairs of jobs within Fi – Interi be the average similarity between all pairs (A, B) of jobs such that A ∈ Fi And B∈Fj for all j ≠ i – Ri = Intrai / Interi S = 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ∗ 𝑅𝑖 / 𝑖 𝐶𝑎𝑟𝑑𝑖𝑛𝑎𝑙𝑖𝑡𝑦(𝐹𝑖 ) • N1 = 56, N2 = 129 and N3 = 430 documents used for training. • POSDC does not require any training corpus, therefore the corpus varying experiments are valid only for doc2vec and LDA. • test set consisted of 500 randomly chosen jobs out of the 2344 available in IBM Kenexa talent frameworks
  • 14. Results Comparison of S value Across Methods
  • 16. Core Novelty • A system to represent a Job Description Document as a collection of action, object and attribute triplets. • A method to find similarity score between two such triplets representations by hierarchically matching of triplets across documents. It is done as an imbalanced assignment problem to find the best match (non-greedy, to maximize the overall match score) of all triplets in the two documents.