Similarity Computation Exploiting
the Semantic and Syntactic
Inherent Structure Among Job
Titles
Authors: Sarthak Ahuja1, Joydeep Mondal1, Sudhanhsu Shekhar Singh1 and
David Glenn George2
1 IBM Research Lab, India
2 IBM Talent Management Solutions, Portsmouth, UK
Presenter: Joydeep
What exactly is it trying to solve?
List of Available Job Titles
• System Engineer
• Software Developer
• Senior Software Engineer
• Junior Network Engineer
• Junior Software Tester
Query Job Title
• Junior Software Engineer
No other information (job descriptions or any details
other than the TITLE) is available for these jobs
[Diagram: Similarity Computation between the query job title and each available job title -> Best Match]
Business Application (Where & Why
it is needed?)
• IBM Watson Recruitment (IWR): https://www.ibm.com/talent-management/hr-solutions/recruiting-software
Mapping requisition jobs to the available job
taxonomy without using computation-intensive and
time-consuming state-of-the-art document similarity
methods, by narrowing down the search space
How has the problem been solved?
Job Title Matching
Split Title keywords
into Three categories
(Domain, Functional,
Attribute)
Map each category of
one job title to those
of the other title
Example
• Title = “Junior Software Engineer”
• Domain keywords Set = [“Software”]
• Functional keywords Set = [“Engineer”]
• Attribute Keywords set = [“Junior”]
Title = “Junior Software Engineer”
Map Domain, Functional, Attribute keyword sets of one title to those of the
other title
Methods
• Objective: Any job title can be split into the attribute, functional and core descriptor/domain words.
• Input:
• Job Title (T)
• Output:
• Three sets: attribute words set (SA), functional words set (SF) and core descriptor/domain words set (SD)
• Resources/existing techniques used:
• Acronym dictionary (DictA), spell checker technique (TechS), classifier model (Mclass)
• Algorithm:
• Step 1: SWord = split the title T into separate words
• Step 2: for each word in SWord
• Step 2.1: word = resolve acronyms of word using DictA
• Step 2.2: word = resolve spelling mistakes in word using TechS
• Step 2.3: classify word using Mclass as an attribute word (A), a functional word (F) or a core descriptor/domain word (D)
• Step 2.4: append word to the corresponding set (SA, SF, SD) depending on its class label (A, F, D)
• Feature vector used in the classifier model (Mclass):
• [POS (part of speech) of the word, position of the word in job title (T) (first word/last word/in-between
word), POS of the root word of the word, whether the word ends with “er”/“or”/“ar”]
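The algorithm steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the acronym entries, the attribute vocabulary, the pass-through spell checker and the rule-based `classify` function are all hypothetical stand-ins for DictA, TechS and the trained Mclass.

```python
import re

# Hypothetical acronym dictionary (DictA); the entries are illustrative only.
ACRONYMS = {"sr": "senior", "jr": "junior", "sw": "software", "dev": "developer"}

# Hypothetical attribute vocabulary used by the stand-in classifier.
ATTRIBUTE_WORDS = {"senior", "junior", "lead", "principal", "associate"}


def resolve_acronym(word):
    """Step 2.1: expand a known acronym, otherwise return the word unchanged."""
    return ACRONYMS.get(word, word)


def correct_spelling(word):
    """Step 2.2: placeholder for TechS; a real spell checker would go here."""
    return word


def classify(word):
    """Step 2.3: rule-based stand-in for Mclass, returning 'A', 'F' or 'D'."""
    if word in ATTRIBUTE_WORDS:
        return "A"
    if word.endswith(("er", "or", "ar")):
        return "F"
    return "D"


def split_title(title):
    """Return the three word sets (SA, SF, SD) for a job title."""
    sets = {"A": [], "F": [], "D": []}
    # Step 1: split the title into lower-cased alphabetic words.
    for raw in re.findall(r"[A-Za-z]+", title.lower()):
        word = correct_spelling(resolve_acronym(raw))
        # Step 2.4: append the word to the set matching its class label.
        sets[classify(word)].append(word)
    return sets["A"], sets["F"], sets["D"]


print(split_title("Junior Software Engineer"))
print(split_title("Sr. SW Dev"))
```

The attribute check runs before the suffix check because words such as "senior" and "junior" also end in "or".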
• Why did we use these features?
• POS (part of speech) of the word: We found that most attribute words are adjectives (e.g. Senior, Junior), most
functional words are nouns (e.g. developer, tester, teacher) and most core descriptor/domain words are also nouns (e.g.
Software, Network).
• Position of the word in job title (T) (first word/last word/in-between word): We found that attribute words are generally the first or
last word of the title, functional words mostly appear as an in-between or last word, and core descriptor/domain words mostly
appear as an in-between or first word. Examples: Senior software developer, Network administrator junior.
• POS of the root word of each word: Our analysis showed that the root word of a functional word is usually a verb,
e.g. in Senior software developer, the root word of developer is “develop”, which is a verb. We used the online dictionary
https://www.vocabulary.com/dictionary/ to get the root words.
• Whether the word ends with “er”/“or”/“ar”: We also found that most functional words end with one of these three substrings,
e.g. teacher, developer, engineer.
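A small sketch of how the four features might be assembled for each word of a title. The two POS lookup tables are hypothetical stand-ins: the slides do not name a POS tagger, and the root-word lookup replaces the online dictionary mentioned above.

```python
# Hypothetical POS lookups standing in for a tagger and a root-word dictionary.
WORD_POS = {"senior": "ADJ", "software": "NOUN", "developer": "NOUN"}
ROOT_POS = {"developer": "VERB"}  # root word "develop" is a verb


def position_label(index, length):
    """Position of a word in the title: first, last or in-between."""
    if index == 0:
        return "first"
    if index == length - 1:
        return "last"
    return "in-between"


def title_features(title):
    """Build one feature dictionary per word of the title."""
    words = title.lower().split()
    rows = []
    for i, word in enumerate(words):
        pos = WORD_POS.get(word, "UNK")
        rows.append({
            "word": word,
            "pos": pos,
            "position": position_label(i, len(words)),
            # Fall back to the word's own POS when no root entry is known.
            "root_pos": ROOT_POS.get(word, pos),
            "ends_er_or_ar": word.endswith(("er", "or", "ar")),
        })
    return rows


for row in title_features("Senior Software Developer"):
    print(row)
```

Note that "senior" also ends in "or", so the suffix feature alone cannot separate functional words from attribute words; it only helps in combination with the other three features.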
Functional classifier output -> input of Attribute classifier
Functional classifier output + Attribute classifier output -> input of Domain classifier
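One possible reading of this cascade, sketched with hypothetical vocabulary lookups in place of the trained classifiers: each later stage receives the outputs of the earlier stages as extra inputs. The slides do not say how the three outputs are resolved into a final label, so the "first positive stage wins" rule here is an assumption.

```python
# Hypothetical vocabularies standing in for the trained classifiers.
FUNCTIONAL_WORDS = {"developer", "engineer", "tester", "administrator"}
ATTRIBUTE_WORDS = {"senior", "junior", "lead"}


def is_functional(word):
    """Stage 1: functional classifier (stand-in)."""
    return word in FUNCTIONAL_WORDS


def is_attribute(word, functional):
    """Stage 2: attribute classifier, given the functional output as input."""
    return (not functional) and word in ATTRIBUTE_WORDS


def is_domain(word, functional, attribute):
    """Stage 3: domain classifier, given both earlier outputs as input."""
    return not functional and not attribute


def cascade_label(word):
    """Run the three stages in order and return 'F', 'A' or 'D'."""
    word = word.lower()
    f = is_functional(word)
    a = is_attribute(word, f)
    d = is_domain(word, f, a)
    # Assumption: the first stage that fires decides the label.
    return next(label for label, fired in (("F", f), ("A", a), ("D", d)) if fired)


print([cascade_label(w) for w in "Senior Software Developer".split()])
```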
Methods
Objective: Map the three category sets of words (attribute, functional and core descriptor/domain) of the two
titles to each other by solving a classical imbalanced assignment problem. The mapping scores are then
combined using a weighted or hierarchical scoring scheme to generate the job title similarity.
• Input:
• Job Title 1 (T1), Job Title 2 (T2)
• Output:
• Similarity score (S) between T1 and T2
• Resources/existing techniques used:
• WordNet dictionary API (W), Hungarian method to solve the imbalanced assignment problem (TH)
• Algorithm:
• Step 1: extract (SA1, SF1, SD1) from T1 and (SA2, SF2, SD2) from T2 using the previous method
• Step 2: get the mappings MA(SA1 : SA2), MF(SF1 : SF2) and MD(SD1 : SD2) using TH
• Step 3: calculate the mapping similarity scores simA, simF and simD for MA, MF and MD respectively
• Step 4: S = simD * (1 + simF * (1 + simA)) / (IndicatorD + IndicatorF + IndicatorA) // importance order: D, F and A respectively
• We used the WordNet dictionary API (W) to calculate the semantic similarity between two words. We built a
semantic similarity score matrix for each pair of sets (SA1 : SA2), (SF1 : SF2) and (SD1 : SD2) and provided this
matrix to TH as input. We also used the same matrix to calculate simA, simF and simD for MA, MF and MD.
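The matching and scoring steps can be sketched as below, under several labeled assumptions. The word-pair scores in `SIM` are made-up stand-ins for WordNet similarities. The assignment is solved by brute force over permutations, which is a simple swap-in for the Hungarian method and gives the same optimum on small sets. The slides do not define how a mapping score is normalized or what the Indicator terms are: here a mapping score is the matched total divided by the size of the larger set, and an indicator is 1 when the category is non-empty in at least one title.

```python
from itertools import permutations

# Hypothetical word-pair similarities standing in for WordNet scores (W).
SIM = {
    frozenset(("junior", "senior")): 0.5,
    frozenset(("software", "network")): 0.5,
    frozenset(("engineer", "developer")): 0.8,
}


def word_sim(a, b):
    """Similarity of two words: 1.0 if identical, else a table lookup."""
    return 1.0 if a == b else SIM.get(frozenset((a, b)), 0.0)


def best_mapping(s1, s2):
    """Best one-to-one mapping between two word sets of possibly unequal size.

    Brute force over permutations (swap-in for the Hungarian method).
    Returns (pairs, score); pairs are (smaller-set word, larger-set word).
    Assumption: score = matched total / size of the larger set.
    """
    if not s1 or not s2:
        return [], 0.0
    small, large = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    best_pairs, best_total = [], -1.0
    for perm in permutations(large, len(small)):
        total = sum(word_sim(a, b) for a, b in zip(small, perm))
        if total > best_total:
            best_pairs, best_total = list(zip(small, perm)), total
    return best_pairs, best_total / len(large)


def title_similarity(t1, t2):
    """Step 4 score for two titles given as (SA, SF, SD) triples."""
    (a1, f1, d1), (a2, f2, d2) = t1, t2
    sim_a = best_mapping(a1, a2)[1]
    sim_f = best_mapping(f1, f2)[1]
    sim_d = best_mapping(d1, d2)[1]
    # Assumption: an indicator is 1 if the category occurs in either title.
    indicators = sum(1 for x, y in ((d1, d2), (f1, f2), (a1, a2)) if x or y)
    if indicators == 0:
        return 0.0
    return sim_d * (1 + sim_f * (1 + sim_a)) / indicators


query = (["junior"], ["engineer"], ["software"])
candidate = (["senior"], ["engineer"], ["software"])
print(title_similarity(query, candidate))
```

With these assumptions, two identical titles score 1 regardless of how many categories they contain, and a mismatch in the domain set lowers the score more than a mismatch in the attribute set, which matches the stated importance order D, F, A.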
System Architecture Diagram
System Architecture Diagram + Example
Results
Core Novelty
1. Any job title can be split into three categories of words: attribute, functional and core
descriptor/domain words.
2. Job title similarity is calculated by mapping these three categories of words
for the two titles to each other, posed as a classical imbalanced
assignment problem. The mapping scores are then combined using a
weighted or hierarchical scoring scheme to generate the job title similarity.

