www.synerzip.com
Building an Augmented IT Recruiter
Cloud Native Containerized Micro-service
Powered by NLP and ML
About the Speaker
Vinayak Joglekar
Founder & CTO, Synerzip
- Hires and mentors Agile software development teams
- Over 3 decades of experience in software product development
- Hands-on practitioner of Agile and Lean techniques
- Speaker at the 2008 Agile Conference in Toronto
- Hands-on experience in QA automation, DevOps, UX design and CD
- Blogs about trends in software development
LinkedIn profile:
https://www.linkedin.com/in/vinayak-joglekar-b95329/
Confidential
Problem statement
• Recruiters in software companies who hire professionals with 3 to 5 years of experience in popular technologies like JavaScript are inundated with resumes. Build an app that magically parses and ranks hundreds of resumes in a jiffy.
• Build an engaging UX so that recruiters return time and again to use the app. The app should empower recruiters by snugly augmenting their routine tasks.
Hook Model to build an engaging UX
• External trigger: the recruiter receives a job requirement
• Action: about 20 to 50 freshly sourced resumes are submitted
• Variable reward: download an Excel tracker with all the resumes ranked in order of suitability
• Investment: repeat use of the application gives more accurate results
Concept courtesy: Nir Eyal
Smooth experience - Action
• Point to a folder containing all the resumes received
• The parser extracts important information such as contact details, education, technical expertise and relevant project experience in less than 1 minute
Execution Challenges - GATE Server
• The GATE server is single-threaded. How do we build a web application on top of it?
• The GATE server crashes after parsing a few hundred resumes. Ops must restart it to bring the service back up.
• A rogue resume can take very long to parse and eventually bring down the GATE server.
• Each resume takes a few seconds to parse, so parallel processing is needed to speed up parsing.
Challenge - GATE is single-threaded
[Diagram labels: a GATE + Ontology instance per user (User 1, User 2); User 1, User 2, … User n; Web Server; Queue; Singleton]
Challenge - Parsing is inherently slow
[Diagram labels: Input Queue; Preprocessing (DOCX to HTML); GATE Document Parser; Timeout 60 seconds; Timeout 90 seconds; Error Message; Dead Letter Queue; Output Queue; Instance 1; Instance 2; multiple pods marked P and G]
Solution
• Create a singleton GATE server that works on multiple requests serially, using RabbitMQ as the queue
• Create multiple instances of this server by putting each one in a Docker container
• Use AWS to host and K8s to orchestrate
• Circuit breaker for consistent performance
• A container is killed if it times out. The document being processed is put on the dead letter queue.
• Containers restart after servicing a fixed number of requests
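The worker lifecycle in the bullets above can be sketched in a few lines. This is only an illustration under stated assumptions: in-process `queue.Queue` objects stand in for the RabbitMQ queues, a worker thread stands in for the GATE parser, and the names `run_worker`, `parse`, `timeout_s` and `max_requests` are hypothetical. In the real system the container itself is killed and restarted by the orchestrator; here the function simply returns the reason it stopped.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as ParseTimeout


def run_worker(parse, inbox, outbox, dead_letters, timeout_s=60.0, max_requests=100):
    """Serve requests one at a time, like a singleton parser behind a queue.

    Returns why the worker stopped: "timeout" (a document exceeded the time
    limit and was dead-lettered), "quota" (the fixed request budget is used
    up, so the worker should be recycled) or "idle" (the input queue is empty).
    """
    served = 0
    while served < max_requests:
        try:
            doc = inbox.get_nowait()
        except queue.Empty:
            return "idle"
        # Run the parse in a separate thread so the wait can be bounded.
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(parse, doc)
        try:
            outbox.put(future.result(timeout=timeout_s))
        except ParseTimeout:
            # The rogue document goes to the dead letter queue and the
            # worker gives up, mimicking a container that gets killed.
            dead_letters.put(doc)
            return "timeout"
        finally:
            pool.shutdown(wait=False)
        served += 1
    return "quota"
```

An outer loop (the orchestrator's job in the real deployment) would start a fresh worker whenever `run_worker` returns "timeout" or "quota", so one slow resume never blocks the documents queued behind it for long.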
Rewarding experience
• Quickly rank the resumes in order of suitability. Suitability score = sum of the weighted scores of various criteria such as education, technical expertise, relevant experience, proximity, notice period and expected compensation.
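The weighted-sum ranking described above can be illustrated with a minimal sketch. The criterion names, the 0-to-1 score scale and the functions `suitability` and `rank` are assumptions made for the example; the slides do not specify how individual criteria are scored.

```python
def suitability(scores, weights):
    """Sum of weight * score over all criteria; a missing criterion counts as 0."""
    return sum(weight * scores.get(criterion, 0.0)
               for criterion, weight in weights.items())


def rank(candidates, weights):
    """Return candidate names ordered from most to least suitable."""
    return sorted(candidates,
                  key=lambda name: suitability(candidates[name], weights),
                  reverse=True)


# Hypothetical weights and per-criterion scores on a 0..1 scale.
weights = {"education": 0.2, "technical_expertise": 0.5, "relevant_experience": 0.3}
candidates = {
    "A": {"education": 0.9, "technical_expertise": 0.4, "relevant_experience": 0.5},
    "B": {"education": 0.6, "technical_expertise": 0.9, "relevant_experience": 0.7},
    "C": {"education": 0.5, "technical_expertise": 0.5},  # experience not found
}
ranking = rank(candidates, weights)
```

Treating a missing criterion as zero is one simple choice; it also shows why information the parser fails to extract directly lowers a candidate's rank.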
Challenge - Missing Information
Typical resume layout:
Name
Contact Details
Objective: Target Designation/Role
Overview: Experience
Skill Set: List of Technologies
Education: Institute | Degree | Branch | Year (two rows shown)
Experience: Company | Designation | From | To (two rows shown)
Project 1: Client | Project Description | From | To | URL | Responsibilities | Technologies
Project 2: Client | Project Description | From | To | URL | Responsibilities | Technologies
Awards and Certifications
Sports and Hobbies
Footnote: Address
All the terms were getting annotated correctly, but they were not getting grouped under the correct heading.
Solution
• We had 3 annotators manually annotate more than 1000 resumes
• We modeled this as an n-way classification problem, with each heading as a class and the terms inside the headings and their relative location in the resume as the features
• We achieved 98% accuracy and 95% recall
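To make the n-way classification framing concrete, here is a toy sketch. The slides do not say which classifier was used, so a small naive Bayes model is swapped in purely for illustration. The `features` function, the four position buckets and the training rows are all assumptions; only the idea of using the terms in a block plus its relative location as features comes from the slide.

```python
import math
from collections import Counter, defaultdict


def features(terms, rel_pos, buckets=4):
    """Lowercased terms plus one token for the block's relative position
    in the resume (0.0 = top, 1.0 = bottom)."""
    bucket = min(int(rel_pos * buckets), buckets - 1)
    return [t.lower() for t in terms] + [f"pos:{bucket}"]


class HeadingClassifier:
    """Multinomial naive Bayes with add-one smoothing; one class per heading."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for terms, rel_pos, heading in examples:
            self.class_counts[heading] += 1
            for tok in features(terms, rel_pos):
                self.token_counts[heading][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, terms, rel_pos):
        total = sum(self.class_counts.values())

        def log_score(heading):
            counts = self.token_counts[heading]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[heading] / total)
            for tok in features(terms, rel_pos):
                score += math.log((counts[tok] + 1) / denom)
            return score

        return max(self.class_counts, key=log_score)


# Tiny hypothetical training set: (terms in the block, relative position, heading).
training = [
    (["B.E.", "Computer", "University", "2014"], 0.30, "Education"),
    (["M.Tech", "Institute", "University", "2016"], 0.35, "Education"),
    (["Company", "Senior", "Developer", "2016", "2018"], 0.50, "Experience"),
    (["Company", "Engineer", "2014", "2016"], 0.55, "Experience"),
    (["Client", "Responsibilities", "Technologies", "React"], 0.80, "Project"),
    (["Client", "Responsibilities", "Technologies", "Node"], 0.85, "Project"),
]
model = HeadingClassifier().fit(training)
```

With only six training rows this says nothing about the accuracy reported above; it only shows how annotated terms and their position in the document can jointly vote for a heading.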
Challenge - Weightages?
• Ranking depended largely on the weightages assigned to suitability criteria such as education, relevant experience, notice period, compensation and required technical skills.
• The importance of each factor depended on the seniority of the position, the company and specific project needs.
Solution
We collected historical information about resumes that were shortlisted for interview in a company for specific projects, and modeled it as a logistic regression problem with the vector of weightages being theta in the sigmoid function $h_\theta(x) = 1 / (1 + e^{-\theta^T x})$.
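As an illustration of learning the weightages from shortlisting history, the sketch below fits theta by plain batch gradient descent on the logistic loss. The feature names, the sample rows, the learning rate and the epoch count are assumptions made for the example, not values from the project.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Probability that a resume with feature vector x gets shortlisted."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def fit(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic loss."""
    theta = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(rows, labels):
            err = predict(theta, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        theta = [t - lr * g / len(rows) for t, g in zip(theta, grad)]
    return theta


# Hypothetical history: [bias, experience_match, skills_match] -> shortlisted?
rows = [
    [1.0, 0.9, 0.8], [1.0, 0.8, 0.9], [1.0, 0.7, 0.9],
    [1.0, 0.2, 0.3], [1.0, 0.3, 0.1], [1.0, 0.1, 0.2],
]
labels = [1, 1, 1, 0, 0, 0]
theta = fit(rows, labels)
```

The learned components of `theta` play the role of the weightages. Fitting one model per company or project would capture the point that the importance of each criterion varies with context.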
Challenge
• Resumes will contain new words: technologies and technical terms that the parser does not recognize.
• Candidates will learn new skills that do not exist today (big data analytics, cloud computing and mobile programming didn’t exist 5 years back).
• Recruiters will feel powerless and bored with the application if they can’t teach it to work smarter. They want to achieve mastery.
Solution
• Created a “training set” from manually annotated resumes. More resumes processed = bigger training set = smarter parsing of new resumes.
• Offset locations in the training set are modeled as features and annotations are modeled as their values
• A CNN built with TensorFlow automatically annotates resumes -> user empowerment!
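One possible reading of "offset locations as features, annotations as values" is sketched below. The `Annotation` record, its field names and the row layout are assumptions for illustration; the slides do not describe the actual training-set format or the CNN input encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the annotated span begins
    end: int     # character offset one past the end of the span
    label: str   # annotation type, e.g. "Skill" or "Degree"


def to_training_rows(text, annotations):
    """Turn manual annotations on one resume into training rows:
    the offsets are the features, the annotation label is the target value."""
    rows = []
    for ann in sorted(annotations, key=lambda a: a.start):
        rows.append({
            "start": ann.start,
            "end": ann.end,
            "text": text[ann.start:ann.end],
            "label": ann.label,
        })
    return rows


def extend_training_set(training_set, text, annotations):
    """Every manually corrected resume grows the training set."""
    training_set.extend(to_training_rows(text, annotations))
    return training_set
```

Each resume a recruiter corrects adds rows through `extend_training_set`, which is the mechanism behind "more resumes processed = bigger training set".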
Challenge
• The most suitable candidate as per the suitability score and quiz score doesn’t always get selected. Sometimes candidate no. 2 or 3 is found to be better than no. 1.
• Suitability scores are calculated using weightages assigned to various attributes. These weightages are based on a “hunch”.
Smooth experience - Action
As frequency of use and number of users increase:
• More terms get added to the ontology and fewer terms need manual annotation
• Accuracy and recall in parsing headings improve
• Weightages used in computing the suitability score become more accurate
Un-annotated terms reduce
As more words get added to the ontology, more than 95% of the words in a resume are found in the ontology. The drop in unrecognized terms is exponential.
[Chart: number of unrecognized words (y-axis) against number of resumes parsed (x-axis)]
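The feedback loop behind this curve can be sketched as follows. The ontology is reduced to a plain term-to-concept dictionary, and `tokenize`, `unrecognized_terms` and `teach` are hypothetical helpers; the real system uses a GATE ontology, whose API is not shown in the slides.

```python
def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    tokens = []
    for word in text.split():
        cleaned = word.strip(".,;:()").lower()
        if cleaned:
            tokens.append(cleaned)
    return tokens


def unrecognized_terms(text, ontology):
    """Terms in a skills line that the ontology does not know yet, in order."""
    missing = []
    for tok in tokenize(text):
        if tok not in ontology and tok not in missing:
            missing.append(tok)
    return missing


def teach(ontology, term, concept):
    """A recruiter's manual annotation adds the term to the ontology."""
    ontology[term.lower()] = concept


ontology = {"javascript": "Language", "react": "Framework"}
skills_line = "JavaScript, React, Svelte"
before = unrecognized_terms(skills_line, ontology)
teach(ontology, "Svelte", "Framework")
after = unrecognized_terms(skills_line, ontology)
```

Once a term has been taught, it never needs manual annotation again, so the number of unrecognized terms per resume falls as usage grows.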
Accuracy and Recall improve
As more resumes are parsed, with corrections done manually wherever required, they get added to the training set, and recall and accuracy improve.
[Chart: accuracy and recall % (y-axis) against number of resumes parsed (x-axis), with one curve each for accuracy and recall]
Conclusion - Hi-tech for engaging UX
• Machine learning models become smarter with continued use, which keeps users invested in the application. The history of usage is the investment in this case.
• Cloud native containerized micro-services provide an opportunity to build magically fast, consistent and reliable responses
THANK YOU!

Agile India 2018 experience report
