TEXT ANALYTICS APPLIED TO HR
How to transform HR
via computational linguistics
Presentation written and designed by Hendrik Feddersen and Raja Sengupta
"How can computational linguistics
help businesses create value, drive
action and have impact without the hassle
of going through intensive HR
reporting?"
Hendrik Feddersen
Head of HRIS
European Medicines Agency
My journey so far
• SAP HCM implementation, problem solving,
data cleaning, reporting, predictions and
training colleagues
• Chairman of more than 100 open selection
procedures of about 150 candidates each
• Continuous business improvements
(Lean Six Sigma Green Belt certified)
• Self-learning “R language” and Data Science
• Passion for HR Analytics, writing posts on
LinkedIn, tweets and connecting
internationally
Building
the case
for Text
Analytics
in HR
Text, or natural language, is omnipresent. Human
communication inherently involves language; even
our deepest internal thoughts can be documented
via natural language. This talk itself could be
transcribed and text-processed for meaningful
insights. The magnitude and scope of text analytics
applications are practically limitless.
This enables NLP analysis to yield unique
insights, not limited to business alone but reaching
human intent and disposition itself, which is key for HR.
The History and Challenges
of Text Analytics
LATE STARTER
Natural language originated with the genesis of
human civilization itself, yet text
analytics has been a relatively slow
starter.
RESOURCE INTENSIVE
Inherent variation and the sheer volume of content
make analytics on human language an
extremely resource-intensive process.
SEMANTIC DISTRIBUTION
The mathematical challenge lies in mapping
the stochastic characteristics of language onto adequately
deterministic models, based on Zipf's law.
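Zipf's law says that a word's frequency is roughly inversely proportional to its frequency rank. A minimal sketch of how a rank-frequency table could be built to inspect this; the sample text and the function name rank_frequency are invented for illustration and are not taken from the presentation.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(counts, start=1)]

# Hypothetical sample text, invented for illustration only.
sample = (
    "the candidate led the team and the team met the target "
    "and the candidate trained the staff"
)

table = rank_frequency(sample)
for rank, word, count in table[:3]:
    # Under Zipf's law, rank * count stays roughly constant in large corpora;
    # a text this small can only hint at the pattern.
    print(rank, word, count, rank * count)
```

On a real HR corpus the same table would be much longer, and the long tail of rare words is what makes the analysis resource intensive.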
Text Analytics Today
THRUST OF INFORMATION TECHNOLOGY
The thrust of information technology in the
early part of this century has immensely
helped the cause of text analytics.
R&D IN STATISTICAL TAGGING OF TEXT
Taking advantage of improved computing
capabilities, vast repositories of libraries
dedicated to statistical text modeling
were compiled, NoSQL-type databases were
developed and extensive R&D in text
analytics has been carried out.
SEMANTIC CONTENT OF HR DATA
Most HR processes are textual or
unstructured, thereby making HR a key
beneficiary of NLP research.
Text Data generated by HR
TEXT DATA FROM CORE HR PROCESSES
Most core HR operations, including application
evaluation, selection processes, appraisal
forms and management, 360 assessments,
staff engagement surveys and employee
feedback, generate a wealth of
semantic data.
OTHER TEXT DATA AVAILABLE TO HR
In addition, professional and personal
social networking records such as LinkedIn,
Facebook and Twitter give HR implicit
access to the potential capabilities and
behavioural trends of individuals.
TRANSCRIBED TEXT / VOICE TO TEXT
Ongoing research of key relevance for HR.
Applications of Text
Analytics to HR process
TEXT ANALYTICS MODELS ON HR
PROCESSES DATA HELP IMPROVE
• Applicant hiring & employee monitoring
• Employee appraisals and feedback
management
• Employee welfare initiatives and
complaint management systems
• Creative HR based Insights via text
based surveys
• Early warnings and insights on potential
behavioural trends and legal issues.
Skepticism of operational HR
towards Text Analytics
• Traditional HRIS reporting systems suffice
• Issues of integration with existing HRIS
systems
• Data protection and compliance
• Learning, training and support
• New technology skepticism
• Complexity and potential failure
• Potential job loss via automated text scoring
models
• Challenges of a multilingual text environment
• Inherent limitations of machines in
understanding delicate nuances of human
communication and behaviour
Countering Skepticism of
operational HR
• Traditional HRIS reporting does not cater to
predictive and prescriptive modeling
• Modern HR Analytics systems integrate seamlessly
with existing legacy HRIS systems
• Step-wise, evidence-based production
implementation
• HR Analytics caters for all existing European
data protection and compliance standards
• Fully integrated video-based training and support
along with implementation
• Text scoring models are a capable decision
improvement system; however, they are not a
replacement for human expertise
• POS tagging in German, French and Italian
caters seamlessly for multilingual text analytics
What is Text Analytics?
Brief Technical Overview
Natural Language Processing
NLP involves mathematical operations on
linguistic units, i.e. words, expressions,
sentences, paragraphs in various
permutations and combinations in order to
derive meaningful insights via their statistical
tagging.
Statistical Tagging
Statistical tagging includes vectorizing, tokenizing,
filtering, stemming, lemmatizing, n-gram extraction
and topic discovery, etc.
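A minimal sketch of a few of these steps (tokenizing, stop-word filtering, stemming and n-grams) using only the Python standard library. The stop-word list, the crude suffix stripper and the sample sentence are illustrative assumptions; a production pipeline would more likely rely on a dedicated NLP library.

```python
import re

# Tiny illustrative stop-word list; real projects would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def filter_stop_words(tokens):
    """Drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    """Strip a few common suffixes (a rough stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical CV fragment, invented for illustration.
tokens = filter_stop_words(tokenize("Managed the reporting of HR projects"))
stems = [crude_stem(t) for t in tokens]
bigrams = ngrams(stems, 2)
print(stems)
print(bigrams)
```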
Objective
The objective is for machines to decipher the
thought and intent behind human
language via tests of statistical significance on
these tags and provide us with insights
relevant to our interests.
Basic Text Analytics routines
relevant to HR process data
Supervised
Naive Bayes, SVM, Neural Networks,
Decision Trees, Various ensembles
Unsupervised
Principal component analysis,
Clustering via distance matrices,
Multivariate Semantic Outliers,
Association Rules & FP-Growth
Semi-Supervised
Conditional Rules Based Dictionary Taggers
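As an illustration of the supervised family listed above, here is a minimal multinomial Naive Bayes sketch with add-one smoothing in plain Python. The labels "fit" and "no_fit", the training tokens and the function names are hypothetical; this is not the model used in the presentation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (tokens, label). Returns the fitted model as a dict."""
    label_counts = Counter(label for _, label in labelled_docs)
    word_counts = defaultdict(Counter)
    for tokens, label in labelled_docs:
        word_counts[label].update(tokens)
    vocabulary = {t for tokens, _ in labelled_docs for t in tokens}
    return {"labels": label_counts, "words": word_counts, "vocab": vocabulary}

def predict(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(model["words"][label].values())
        for t in tokens:
            # Smoothed log likelihood of the token given the label.
            score += math.log((model["words"][label][t] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical scored applications (the "learner base"), invented for illustration.
training = [
    (["sql", "reporting", "analytics"], "fit"),
    (["python", "analytics"], "fit"),
    (["sales", "retail"], "no_fit"),
    (["retail", "cashier"], "no_fit"),
]
model = train_naive_bayes(training)
print(predict(model, ["analytics", "sql"]))
```

A classifier like this needs enough scored examples per label, which is exactly the constraint the live example later in the deck runs into.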
Hidden Markov Models
An HMM can be considered an
application of a Bayesian joint
probability distribution to text analytics, i.e.
inferring unobservable states (such as POS tags) from the
probabilistic ranking of the observable
terms, hence the use of the term "hidden".
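A minimal sketch of how hidden states can be recovered from observed terms, using Viterbi decoding on a toy two-tag model. The tags, words and all probabilities below are invented for illustration; a real tagger would estimate them from an annotated corpus.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Toy model: every number here is an assumption made up for the example.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit_p = {
    "NOUN": {"recruiters": 0.5, "score": 0.1, "resumes": 0.4},
    "VERB": {"recruiters": 0.05, "score": 0.9, "resumes": 0.05},
}

tags = viterbi(["recruiters", "score", "resumes"], states, start_p, trans_p, emit_p)
print(tags)
```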
Latent Dirichlet Allocation
LDA is a generative probabilistic topic model:
each document is treated as a mixture of
topics and each topic as a distribution over
words in the vocabulary. Fitting the model
by statistical inference is meant to unravel
synonymy and polysemy relationships
between words and help discover topics.
Live
example
of Text
Analytics
in HR
Automated Resume Scoring –
A real life HR Analytics example
Business Problem
A. Recruiters typically estimate job applicant fitment via their
specific process expertise,
B. This is often a tedious, manual and time-consuming process,
C. Quality of fitment estimations can be inconsistent and
subject to bias.
Solution
Statistical text classification models can
A. Improve the absolute accuracy of application fitment estimations,
B. Improve the overall consistency of correct application estimations,
C. Improve the skills of the recruiter and assist in fine-tuning
searches,
D. Greatly reduce time and effort.
Success Parameters of the Model:
A. At least 80% of the accuracy of manual forecasting,
in a fraction of the time required,
B. In addition, statistical visualizations and insights
via multivariate maps that help improve the technical skills of the recruiter.
Challenges (of this specific business case):
A. Inadequacy of learner base (scored data) for the supervised
classification model,
B. A scattered data-set that required significant pre-processing in order
to unify for analytics,
C. Significant variation among the different job orders.
Methodology Employed - Technology
The severe inadequacy of the learner base meant that the supervised
classification models typically used could not be employed. More than 15 text-based
information modules are scored for each candidate. Given the learner base
inadequacy, cross-validation of the models would not have been reliable either.
An iterative semi-supervised conditional rules dictionary had to be built. Once built,
the model is reusable for job orders with the same (or similar) specifications. Our
model can typically score 500 applications in less than 5 minutes!
The estimated accuracy is 80%, corroborated by an HR domain expert.
The following step-wise approach was used for building the model.
Step 1: Structured Query Language
Purpose : Consolidate text (string) data
stored across multiple tables
Output : Single unified XLS report
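A minimal sketch of what such a consolidation query could look like, run here against an in-memory SQLite database. The table names (applicants, motivation, experience), column names and rows are all hypothetical; the presentation does not describe the actual schema or database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, invented for illustration.
conn.executescript("""
    CREATE TABLE applicants (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE motivation (applicant_id INTEGER, body TEXT);
    CREATE TABLE experience (applicant_id INTEGER, body TEXT);
    INSERT INTO applicants VALUES (1, 'Applicant A'), (2, 'Applicant B');
    INSERT INTO motivation VALUES (1, 'Keen on HR analytics'), (2, 'Interested in recruitment');
    INSERT INTO experience VALUES (1, 'Five years of SQL reporting'), (2, 'Two years in payroll');
""")

# Join the text fields so that each applicant ends up on a single row.
rows = conn.execute("""
    SELECT a.id, a.name, m.body || ' ' || e.body AS full_text
    FROM applicants AS a
    JOIN motivation AS m ON m.applicant_id = a.id
    JOIN experience AS e ON e.applicant_id = a.id
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

The resulting rows could then be exported to a single spreadsheet for the next step.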
Step 2: Near-codeless ETL (with some JavaScript)
Data: Multiple XLS reports
Purpose : ETL - Extract, Transform and Load
to preprocess data for text analysis
Output : Single unified XLS report
Step 3: Co-occurrence network of maps [illustration only]
Part-of-speech tagging : Noun, proper nouns, bi and tri gram tags
Distance Metric : Jaccard’s similarity coefficient
Frequency : Term frequency–inverse document frequency
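Jaccard's similarity coefficient is the size of the intersection of two term sets divided by the size of their union. A minimal sketch; the term lists are invented, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard_similarity(terms_a, terms_b):
    """Size of the intersection divided by size of the union of two term sets."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)

# Hypothetical tagged terms from a job order and from one application.
job_terms = ["python", "sql", "reporting", "hr analytics"]
cv_terms = ["sql", "reporting", "excel"]
similarity = jaccard_similarity(job_terms, cv_terms)
print(round(similarity, 2))
```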
Step 4: Self organising maps
POS tagging: Noun, proper nouns, bi and tri gram tags
Distance Metric : Euclidean distance
Frequency : TF and IDF
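A minimal sketch of TF-IDF weighting followed by a Euclidean distance between documents, in plain Python. It uses the basic tf × log(N/df) variant, which is one of several in common use, and assumes every document is a non-empty token list; the documents themselves are invented.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per tokenised document."""
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([
            (counts[term] / len(doc)) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Hypothetical tokenised applications, invented for illustration.
docs = [["hr", "sql"], ["hr", "python"], ["hr", "sql"]]
vocabulary, vectors = tf_idf_vectors(docs)
print(vocabulary)
print(euclidean_distance(vectors[0], vectors[1]))
```

A term that occurs in every document (here "hr") gets weight zero, so it does not influence the distances.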
Step 5: Kruskal Nonmetric Multidimensional analysis
POS tagging: Noun, proper nouns, bi and tri gram tags
Distance Metric : Mahalanobis distance
Frequency : TF and IDF
Step 6: Kruskal Non-metric Multi Dimensional Scaling
POS tagging: Noun, proper nouns, bi and tri gram tags
Distance Metric : Jaccard coefficient
Frequency : TF and IDF
Step 7: Latent Dirichlet allocation (LDA) Subtopic Discovery
POS tagging: Noun, proper nouns, bi and tri gram tags
Method : Parallel LDA
Frequency : TF and IDF
Step 8: Hierarchical Cluster Analysis
POS tagging: Noun, proper nouns, bi and tri gram tags
Method : Ward and Mahalanobis distance
Frequency : TF and IDF
Step 9: Conditional Rules Dictionary via Python wrappers
Data: Computational Linguistics from multidimensional maps
Purpose : Creating coding dictionary to score application forms
Output : Single unified XLS report of scored applications
[for illustration only]
Step 10 OUTPUT: Single unified XLS report of scored
applications arranged in descending order of score
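A minimal sketch of how a conditional rules dictionary could score applications and rank them in descending order. The patterns, point values and application texts are invented for illustration; the actual dictionary described in the presentation is built iteratively from the multidimensional maps and is not shown in the slides.

```python
import re

# Hypothetical coding dictionary: regular expression -> points.
RULES = {
    r"\bsql\b": 3,
    r"\bhr analytics\b": 5,
    r"\breport(ing|s)?\b": 2,
}

def score_application(text, rules=RULES):
    """Add up the points of every rule whose pattern occurs in the text."""
    lowered = text.lower()
    return sum(points for pattern, points in rules.items() if re.search(pattern, lowered))

# Hypothetical application texts keyed by an applicant reference.
applications = {
    "A": "Built SQL reports for payroll",
    "B": "HR analytics lead with SQL and reporting experience",
    "C": "Customer service background",
}

scores = {ref: score_application(text) for ref, text in applications.items()}
# Rank the applicant references from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
for ref in ranked:
    print(ref, scores[ref])
```

Because each rule is an explicit pattern with a visible weight, a recruiter can inspect and adjust the dictionary, which fits the decision support role described in the deck.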
Further info
Raja Sengupta provided technical support for this
presentation.
Raja is a Computational Linguistics Professional with
research background in applying text analytics to
operational HR.
One of the many solutions he is working on is
KNOHR: a scoring system for job applicants,
appraisals and text surveys, which can be adopted for
semantic data analysis when processing operational
HR data.
The system, based on the Python Natural Language Toolkit (NLTK),
integrates seamlessly with existing HRIS systems and
provides efficient decision support reports via a single
click or periodically via batch process, with over 80%
cross-validation accuracy.
Raja Sengupta
HR Analytics Developer
Summary
The intent here was to provide an overview of the body
of work around natural language processing, including
its relevance and application for HR Analytics.
The predominantly semantic nature of data generated
by HR processes was emphasised, making the case for
strong relevance of text analytics in the HR domain.
A live production example was demonstrated to
emphasise the real-time constraints and the creative
semi-supervised approaches required, often under
supervision of operational HR, in order to achieve an
acceptable level of model accuracy.
This is the very reason why text analytics models will
continue to evolve as a very efficient decision support
system; however, they cannot replace human expertise
and capabilities in understanding the nuances of
human language.
Text
Analytics
in HR

Text Analytics applied to HR

  • 1. TEXT ANALYTICS APPLIED TO HR How to transform HR via computational linguistics Presentation written and designed by Hendrik Feddersen and Raja Sengupta Frequency : TF and IDF
  • 2. xv "How can computational linguistics help business create value, drive action and impact without the hassle of going through intensive HR reporting? Presentation written and designed by Hendrik Feddersen and Raja Sengupta
  • 3. Hendrik Feddersen Head of HRIS European Medicines Agency My journey so far • SAP HCM implementation, problem solving, data cleaning, reporting, predictions and training colleagues • Chairman to more than 100 open selection procedures of about 150 candidates each • Continuous business improvements (Lean Six Sigma Green Belt certified) • Self-learning “R language” and Data Science • Passion for HR Analytics, writing posts on LinkedIn, tweets and connecting internationally
  • 4. Building the case for Text Analytics in HR Text or natural language is omnipresent. Human communication inherently involves languages, even our deepest internal thoughts can be documented via natural language. This talk itself could be transcribed and text processed for meaningful insights, the magnitude and scope of text analytics application is limitless. This enables NLP analysis with unique, unparalleled insights not limited to business alone but to the human intent and disposition itself, a key for HR.
  • 5. The History and Challenges of Text Analytics LATE STARTER Natural Language originates with genesis of Human civilization itself, yet text analytics has been a relatively slow- starter. RESOURCE INTENSIVE Inherent variation and volumetric content, makes analytics on human language an extremely resource intensive process. SEMANTIC DISTRIBUTION The mathematic challenge lies in mapping stochastic characteristics to adequately deterministic, based on the ZIPF’s law.
  • 6. Text Analytics Today THRUST OF INFORMATION TECHNOLOGY The thrust of information technology in the early part of this century has helped immensely the cause of text analytics. R&D IN STATISTICAL TAGGING OF TEXT Taking advantage of improved computing capabilities, vast repositories of libraries dedicated to statistical text modeling text were compiled, nosql type databases developed and extensive R&D in text analytics has been carried out. SEMANTIC CONTENT OF HR DATA Most HR processes are textual or unstructured, making thereby HR a key beneficiary of NLP research.
  • 7. Text Data generated by HR TEXT DATA FROM CORE HR PROCESSES Most core HR operations including application evaluation, selection process, appraisal form and management, 360 assessments, staff engagement surveys, employee feedbacks, etc., all generate a wealth of semantic data. OTHER TEXT DATA AVAILABLE TO HR In addition the professional and personnel social networking records like LinkedIn, Facebook, Twitter provide HR implicit access to potential capabilities and behavioural trends of individuals. TRANSCRIBED TEXT / VOICE TO TEXT Ongoing research of key relevance for HR.
  • 8. Applications of Text Analytics to HR process TEXT ANALYTICS MODELS ON HR PROCESSES DATA HELP IMPROVE • Applicant hiring & employee monitoring • Employee appraisals and feedback management • Employee welfare initiatives and complaint management systems • Creative HR based Insights via text based surveys • Early warnings and insights on potential behavioural trends and legal issues.
  • 9. Skepticism of operational HR towards Text Analytics • Traditional HRIS reporting systems suffice • Issues of integration with existing HRIS systems • Data protection and compliance • Learning, training and support • New technology skepticism • Complexity and potential failure • Potential job loss via automated text scoring models • Challenges of Multi lingual text environment • Inherent limitations of machines in understanding delicate nuances of human communication and behaviour
  • 10. Countering Skepticism of operational HR Traditional HRIS reporting does not cater to predictive and prescriptive modeling Modern HR Analytics systems seamlessly integrate with existing legacy HRIS systems Step wise evidence based production implementation HR Analytics cater for all existing European standard data protection and compliance Full integrated video based training and support along with implementation Text scoring models are a capable decision improvement system, however they are not a replacement for human expertise POS tagging in German, French and Italian seamlessly cater for Multi lingual text analytics
  • 11. What is Text Analytics? Brief Technical Overview Natural Language Processing NLP involves mathematical operations on linguistic units, i.e. words, expressions, sentences, paragraphs in various permutations and combinations in order to derive meaningful insights via their statistical tagging. Statistical Tagging Statistical tags include vectoring, tokenizing, filtering, stemming, lemmatization, n-grams and topicalisation, etc. Objective The objective is for machines to decipher the thought and intent vector behind human language via test of statistical significance of statistical tags and provide us with insights relevant to our interests.
  • 12. Basic Text Analytics routines relevant to HR process data Supervised Naive Bayes, SVM, Neural Networks, Decision Trees, Various ensembles Unsupervised Principle component analysis, Clustering via distance matrix’s Multivariate Semantic Outliers, Association Rules & FP-Growth Semi-Supervised Conditional Rules Based Dictionary Taggers
  • 13. Hidden Markov Models HMM can be considered to be an implementation of Bayesian joint probability distribution in text analytics, i.e. mapping the unobservable based on probabilistic ranking of the observable term, hence the use of the term hidden Latent Dirichlet Allocation LDA is a adoption of SVM, essentially a vector based decomposition method typically using a cosine distance matrix between words in the document. This is supposed to unravel the synonymy and polysemous relationships between words via statistical inference and help discover topics. Basic Text Analytics routines relevant to HR process data
  • 14. Live example of Text Analytics in HR Automated Resume Scoring – A real life HR Analytics example Business Problem A. Recruiters typically estimate job applicant fitment via their specific process expertise, B. This is often a tedious, manual time and consuming process, C. Quality of fitment estimations can be inconsistent and subject to bias. Solution Statistical text classification models, can improve upon A. Absolute accuracy of application fitment estimations, B. Overall consistency of correct application estimations, C. Improve the skills of recruiter, and assist in fine tuning searches, D. Greatly reduce time and effort.
  • 15. Automated Resume Scoring – A real life HR Analytics example Success Parameters of the Model: A. At least 80% of the accuracy of manual forecasting, taking a fraction of the time required, B. In addition to the above providing statistical visualization and insights via multivariate maps helping improve technical skills of the recruiter. Challenges (of this specific business case): A. Inadequacy of learner base (scored data) for the supervised classification model, B. A scattered data-set that required significant pre-processing in order to unify for analytics, C. Significant variation among the different job orders.
  • 16. Automated Resume Scoring – A real life HR Analytics example. Methodology Employed - Technology: The severe inadequacy of the learner base meant that the typically used supervised classification models could not be employed, and cross validation of such models would not have been reliable. More than 15 text based information modules are scored for each candidate. An iterative semi-supervised conditional rules dictionary therefore had to be built. Once built, the model is reusable for job orders with the same (or similar) specifications. Our model can typically score 500 applications in less than 5 minutes! The estimated accuracy is 80%, corroborated by an HR domain expert. The following stepwise approach was used for building the model.
  • 17. Step 1: Structured Query Language. Purpose: Consolidate text (string) data stored across multiple tables. Output: Single unified XLS report
  • 18. Step 2: Near codeless (some JavaScript) ETL. Data: Multiple XLS reports. Purpose: ETL (Extract, Transform and Load) to preprocess data for text analysis. Output: Single unified XLS report
  • 19. Step 3: Co-occurrence network of maps [illustration only] Part-of-speech tagging : Noun, proper nouns, bi and tri gram tags Distance Metric : Jaccard’s similarity coefficient Frequency : Term frequency–inverse document frequency
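The two measures named in this step can be sketched in a few lines of plain Python. This is a minimal illustration, not the tooling used in the presentation: the token lists are invented, and the TF-IDF variant shown (relative term frequency times the natural log of N over document frequency, without smoothing) is only one of several common definitions.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity coefficient: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

# Hypothetical noun tags extracted from three applications
DOCS = [
    ["python", "sql", "reporting"],
    ["python", "statistics", "reporting"],
    ["sales", "retail", "reporting"],
]
WEIGHTS = tf_idf(DOCS)
```

Here `jaccard(DOCS[0], DOCS[1])` is 0.5 (two shared terms out of four distinct ones), and "reporting" receives a TF-IDF weight of zero because it appears in every document and so does not discriminate between applications.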
  • 20. Step 4: Self organising maps POS tagging: Noun, proper nouns, bi and tri gram tags Distance Metric : Euclidean distance Frequency : TF and IDF
  • 21. Step 5: Kruskal Non-metric Multidimensional Scaling POS tagging: Noun, proper nouns, bi and tri gram tags Distance Metric : Mahalanobis distance Frequency : TF and IDF
  • 22. Step 6: Kruskal Non-metric Multi Dimensional Scaling POS tagging: Noun, proper nouns, bi and tri gram tags Distance Metric : Jaccard coefficient Frequency : TF and IDF
  • 23. Step 7: Latent Dirichlet allocation (LDA) Subtopic Discovery POS tagging: Noun, proper nouns, bi and tri gram tags Method : Parallel LDA Frequency : TF and IDF
  • 24. Step 8: Hierarchical Cluster Analysis POS tagging: Noun, proper nouns, bi and tri gram tags Method : Ward and Mahalanobis distance Frequency : TF and IDF
  • 25. Step 9: Conditional Rules Dictionary via python wrappers Data: Computational Linguistics from multidimensional maps Purpose : Creating coding dictionary to score application forms Output : Single unified XLS report of scored applications [for illustration only]
  • 26. Step 10 OUTPUT: Single unified XLS report of scored applications, arranged in descending order of score
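Steps 9 and 10 can be pictured with a small sketch of a conditional rules dictionary that scores application text and sorts the results in descending order. The rules, point values and applicant texts below are hypothetical placeholders; the actual dictionary in the presentation was built iteratively from the multidimensional maps and is not reproduced here.

```python
# Hypothetical rules: (terms that must all be present, points awarded)
RULES = [
    ({"python"}, 2),
    ({"sql"}, 1),
    ({"text", "analytics"}, 3),
]

def score_application(text, rules=RULES):
    """Sum the points of every rule whose terms all occur in the text."""
    tokens = set(text.lower().split())
    return sum(points for terms, points in rules if terms <= tokens)

def rank_applications(applications, rules=RULES):
    """Return (applicant, score) pairs sorted by score, highest first."""
    scored = [(name, score_application(text, rules)) for name, text in applications.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented example applications
APPLICATIONS = {
    "applicant_a": "Python developer with SQL and text analytics experience",
    "applicant_b": "Retail manager with SQL reporting skills",
    "applicant_c": "Python scripting for finance",
}
RANKED = rank_applications(APPLICATIONS)
```

Because the rules are plain data, the same dictionary can be reused for job orders with similar specifications, and a domain expert can review or adjust individual rules without retraining a model.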
  • 27. Further info: Raja Sengupta provided technical support for this presentation. Raja is a Computational Linguistics professional with a research background in applying text analytics to operational HR. One of the many solutions he is working on is KNOHR: a scoring system for job applicants, appraisals and text surveys, which can be adopted for semantic data analysis while processing operational HR data. The system, based on the Python Natural Language Toolkit, integrates seamlessly with existing HRIS systems and provides efficient decision support reports via a single click or periodically via batch process, with over 80% cross validation accuracy. Raja Sengupta, HR Analytics Developer
  • 28. Summary: Text Analytics in HR. The intent here was to provide an overview of the body of work around natural language processing, including its relevance and application for HR Analytics. The predominantly semantic nature of data generated by HR processes was emphasised, making the case for the strong relevance of text analytics in the HR domain. A live production example was demonstrated to emphasise the real-time constraints and the creative semi-supervised approaches required, often under the supervision of operational HR, in order to achieve an acceptable level of model accuracy. This is the very reason why text analytics models will continue to evolve as a very efficient decision support system; however, they cannot replace human expertise and capabilities in understanding the nuances of human language.