SlideShare a Scribd company logo
1 of 14
Download to read offline
Job
Recommendation
Engine
Harshendra
Mahak Gambhir
Vishal Gupta
Introduction
● Most job applications are through resumes
● Short Listing resumes is a painful task
● Especially when there are thousands of them
● Manual checking is not feasible
Semantic Search
● What is not a semantic search ?
● Search based on keywords ?
Ex: grep “icpc”
● Can we make it better ?
○ Weightage to sections
○ To be able to search in a particular section
○ Recommend resumes similar to a particular resume
○ NLP vs Natural Language Processing
Challenges
● Structure of a resume
● Content Segregation - How well is it written ?
● Extraction of entities
○ Names
○ Marks/Grades
○ Previous Organizations
○ Years of experience
○ Skills
○ Projects
● Ranking among the shortlisted resumes
Problems addressed
● Creation of info box for each resume.
○ Name: XYZ
○ Email : abc@gmail.com
○ Phone: xx-xxxxxxxxx
○ Skills
○ Projects
● A search interface
○ Query> Machine Learning, ACM ICPC, Scikit Learn
● Recommend resumes similar to a given resume
○ Resume Path> /path/to/resume
The Process Flow
Resumes in
PDF format
PDF to text
Extract
features
Ranking
Model
Top K
Candidates
Resumes in
Text format
Preprocessing
Parsing Infoboxes
JSON files
& Indexes
Used to display
a brief summary
of the candidate.
Approach
● Each resume is a text document
● Extract the sections in each resume
● NER for names
● Index the resumes according to sections
● Extract features from each resume
Technical Details
● Named Entity Recognition for extracting names
○ Used Stanford NER
● Parse the sections using most common section names
○ Project/Projects/Academic Project/Major Projects/Minor Projects
○ Skills/Technical Skills/Technical Expertise and so on
● Stanford temporal tagger for extracting the temporal
information
○ Jan 2012 - Jan 2014 (2 years)
○ Useful in extracting the years of experience
Technical Details
● Bag of words approach
● Vector space model
● Extract feature for each resume
○ Boolean
■ A term is present or not d
○ Tf-Idf
■ Tf-idf weight of a term w.r.t the document d
● Rank the resumes based on distance functions
○ Cosine Similarity
○ Euclidean distance
Features
Let N be the |V|, d1 and d2 be two resumes
d1
: v1 <v11
v12
. . . v1N
>
d2
: v2 <v21
v22
. . . v2N
>
where,
vij
: Tf-Idf(di
,tj
)
tj
is the jth resume
Tf-Idf(d,t) = tft
x log(N/dft
)
tft
- Term frequency of term t in document d
dft
- document frequency of term t
Ranking
● Convert the test query/document into tf-idf feature
vector t
Ex: test = “acm icpc, machine learning”
● Calculate cosine similarity with each document and rank
them
cos-sim(d1
,t), cos-sim(d2
,t) and so on
● Rank the resumes in non-decreasing order
● Choose top K
Tools/Frameworks
● Language: python 2.7
● Frameworks:
○ Scikit-learn
○ Nltk
○ Stanford NER tagger
○ Stanford temporal tagger
● Intuition!
Demo
Links
● Code: https://github.com/gvishal/Sematic-Job-Recommendation-Engine
● Deliverables: https://www.dropbox.
com/home/IRE_Major_Project/deliverables

More Related Content

Viewers also liked

Viewers also liked (8)

ACM Winter 2016 General Meeting
ACM Winter 2016 General Meeting ACM Winter 2016 General Meeting
ACM Winter 2016 General Meeting
 
Semantic job recommendation engine
Semantic job recommendation engineSemantic job recommendation engine
Semantic job recommendation engine
 
STAFF ACM-ICPC
STAFF  ACM-ICPCSTAFF  ACM-ICPC
STAFF ACM-ICPC
 
VSCO: an easy way to replicate film photography
VSCO: an easy way to replicate film photographyVSCO: an easy way to replicate film photography
VSCO: an easy way to replicate film photography
 
ICPC
ICPCICPC
ICPC
 
resume
resumeresume
resume
 
[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제
[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제
[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제
 
Quic을 이용한 네트워크 성능 개선
 Quic을 이용한 네트워크 성능 개선 Quic을 이용한 네트워크 성능 개선
Quic을 이용한 네트워크 성능 개선
 

Similar to Semantic job recommendation engine

3. Google Advanced Search.pdf
3. Google Advanced Search.pdf3. Google Advanced Search.pdf
3. Google Advanced Search.pdf
relsmantab
 

Similar to Semantic job recommendation engine (20)

Context Aware Recommendations at Netflix
Context Aware Recommendations at NetflixContext Aware Recommendations at Netflix
Context Aware Recommendations at Netflix
 
Balancing PM & Software Development Practices by Splunk Sr PM
Balancing PM & Software Development Practices by Splunk Sr PMBalancing PM & Software Development Practices by Splunk Sr PM
Balancing PM & Software Development Practices by Splunk Sr PM
 
Programming Languages of Importance in Modern Academics & Industries
Programming Languages of Importance in Modern Academics & IndustriesProgramming Languages of Importance in Modern Academics & Industries
Programming Languages of Importance in Modern Academics & Industries
 
How to become Industry ready engineers.pdf
How to become  Industry ready engineers.pdfHow to become  Industry ready engineers.pdf
How to become Industry ready engineers.pdf
 
Computer Science Career Guidance
Computer Science Career GuidanceComputer Science Career Guidance
Computer Science Career Guidance
 
Level-based Resume Classification on Nursing Positions
Level-based Resume Classification on Nursing PositionsLevel-based Resume Classification on Nursing Positions
Level-based Resume Classification on Nursing Positions
 
Curtain call of zooey - what i've learned in yahoo
Curtain call of zooey - what i've learned in yahooCurtain call of zooey - what i've learned in yahoo
Curtain call of zooey - what i've learned in yahoo
 
Getting a Data Science Job
Getting a Data Science JobGetting a Data Science Job
Getting a Data Science Job
 
How to become a data scientist
How to become a data scientist How to become a data scientist
How to become a data scientist
 
Instant Question Answering System
Instant Question Answering SystemInstant Question Answering System
Instant Question Answering System
 
3. Google Advanced Search.pdf
3. Google Advanced Search.pdf3. Google Advanced Search.pdf
3. Google Advanced Search.pdf
 
Developing Korean Chatbot 101
Developing Korean Chatbot 101Developing Korean Chatbot 101
Developing Korean Chatbot 101
 
Cepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptxCepstrum Placement Talk 2022.pptx
Cepstrum Placement Talk 2022.pptx
 
Resume Writing Workshop (Part I)
Resume Writing Workshop (Part I)Resume Writing Workshop (Part I)
Resume Writing Workshop (Part I)
 
Software Craftmanship - Cours Polytech
Software Craftmanship - Cours PolytechSoftware Craftmanship - Cours Polytech
Software Craftmanship - Cours Polytech
 
Software Engineering Primer
Software Engineering PrimerSoftware Engineering Primer
Software Engineering Primer
 
Data quality in Real Estate
Data quality in Real EstateData quality in Real Estate
Data quality in Real Estate
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
 
Shilpa shukla processing_text
Shilpa shukla processing_textShilpa shukla processing_text
Shilpa shukla processing_text
 
Technical Debt Management
Technical Debt ManagementTechnical Debt Management
Technical Debt Management
 

Recently uploaded

Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Recently uploaded (20)

Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 

Semantic job recommendation engine

  • 2. Introduction ● Most job applications are through resumes ● Short Listing resumes is a painful task ● Especially when there are thousands of them ● Manual checking is not feasible
  • 3. Semantic Search ● What is not a semantic search ? ● Search based on keywords ? Ex: grep “icpc” ● Can we make it better ? ○ Weightage to sections ○ To be able to search in a particular section ○ Recommend resumes similar to a particular resume ○ NLP vs Natural Language Processing
  • 4. Challenges ● Structure of a resume ● Content Segregation - How well is it written ? ● Extraction of entities ○ Names ○ Marks/Grades ○ Previous Organizations ○ Years of experience ○ Skills ○ Projects ● Ranking among the shortlisted resumes
  • 5. Problems addressed ● Creation of info box for each resume. ○ Name: XYZ ○ Email : abc@gmail.com ○ Phone: xx-xxxxxxxxx ○ Skills ○ Projects ● A search interface ○ Query> Machine Learning, ACM ICPC, Scikit Learn ● Recommend resumes similar to a given resume ○ Resume Path> /path/to/resume
  • 6. The Process Flow Resumes in PDF format PDF to text Extract features Ranking Model Top K Candidates Resumes in Text format Preprocessing Parsing Infoboxes JSON files & Indexes Used to display a brief summary of the candidate.
  • 7. Approach ● Each resume is a text document ● Extract the sections in each resume ● NER for names ● Index the resumes according to sections ● Extract features from each resume
  • 8. Technical Details ● Named Entity Recognition for extracting names ○ Used Stanford NER ● Parse the sections using most common section names ○ Project/Projects/Academic Project/Major Projects/Minor Projects ○ Skills/Technical Skills/Technical Expertise and so on ● Stanford temporal tagger for extracting the temporal information ○ Jan 2012 - Jan 2014 (2 years) ○ Useful in extracting the years of experience
  • 9. Technical Details ● Bag of words approach ● Vector space model ● Extract feature for each resume ○ Boolean ■ A term is present or not d ○ Tf-Idf ■ Tf-idf weight of a term w.r.t the document d ● Rank the resumes based on distance functions ○ Cosine Similarity ○ Euclidean distance
  • 10. Features Let N be the |V|, d1 and d2 be two resumes d1 : v1 <v11 v12 . . . v1N > d2 : v2 <v21 v22 . . . v2N > where, vij : Tf-Idf(di ,tj ) tj is the jth resume Tf-Idf(d,t) = tft x log(N/dft ) tft - Term frequency of term t in document d dft - document frequency of term t
  • 11. Ranking ● Convert the test query/document into tf-idf feature vector t Ex: test = “acm icpc, machine learning” ● Calculate cosine similarity with each document and rank them cos-sim(d1 ,t), cos-sim(d2 ,t) and so on ● Rank the resumes in non-decreasing order ● Choose top K
  • 12. Tools/Frameworks ● Language: python 2.7 ● Frameworks: ○ Scikit-learn ○ Nltk ○ Stanford NER tagger ○ Stanford temporal tagger ● Intuition!
  • 13. Demo
  • 14. Links ● Code: https://github.com/gvishal/Sematic-Job-Recommendation-Engine ● Deliverables: https://www.dropbox. com/home/IRE_Major_Project/deliverables