2. Overview
The case study: Extracting job information from vacancies
• The problem: Modernizing job analysis
• The data: 500,000 online vacancies
• The use of a framework: knowledge from the job analysis field
• The techniques: feature extraction
• The results: Successful automatic categorization of job information
The review: text mining techniques and tasks in organizational research
• The task: Invitation for a special issue on big data in ORM
• The paper: Our structure so far
• The question: Feedback
3. The case study: Extracting job information from vacancies
The problem: Modernizing job analysis
Jobs are changing, but job analysis is lagging behind
• Seen as a tedious and expensive but necessary task
• Not up to speed with the changes in work
• Accuracy of job analysis using job incumbents as a source is questioned
• Not taking advantage of the ‘big data’ opportunities
4. The case study: Extracting job information from vacancies
The data: 500,000 English online vacancies
An often overlooked rich source of job information
Could facilitate scaling up the amount of data used in job analysis
5. The case study: Extracting job information from vacancies
The use of a framework: knowledge from the job analysis field
Skills can be extracted from job advertisements (Sodhi & Son, 2009; Smith & Ali, 2014)
These studies were conducted in the field of information technology, with a focus on the use of technologies
Need for a more deductive approach (George, Haas, & Pentland, 2014)
We go beyond this research by using knowledge from the job analysis field
We categorize job information based on the basic distinction between job attributes
and job activities (Sackett & Laczo, 2003)
First step toward the extraction of finer grained job information
6. The case study: Extracting job information from vacancies
The use of a framework: knowledge from the job analysis field
Categorization into job attributes and job activities
Use of manual labelling of 300 random vacancies (3,921 labelled sentences)
Based on definitions of the finer grained job features (either attribute or activity), such
as knowledge, abilities, tasks, responsibilities etc.
7. The case study: Extracting job information from vacancies
The techniques: Feature extraction
Pipeline: Job vacancies → text preprocessing → preprocessed vacancies → text encoding → feature matrix (sentences × features)
Text preprocessing
• Sentence and word tokenization
• Lower case transformation
• Stopword removal (e.g. "the", "and")
• Extra whitespace removal
• Lemmatization
Text encoding
• Linguistic preprocessing, e.g. part-of-speech (POS) tagging
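The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the pipeline used in the case study: the stopword list, the lemma lookup table, and the function names are all hypothetical, and a real project would more likely rely on an NLP library for tokenization, lemmatization, and POS tagging.

```python
import re

# Tiny illustrative resources (assumptions, not the authors' word lists).
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "for", "with", "is", "are"}
LEMMAS = {"manages": "manage", "developers": "developer", "skills": "skill"}


def split_sentences(text):
    """Naive sentence tokenization: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def preprocess_sentence(sentence):
    """Lowercase, tokenize into words, drop stopwords, lemmatize by table lookup."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in kept]


def preprocess_vacancy(text):
    """Collapse extra whitespace, then preprocess the vacancy sentence by sentence."""
    text = re.sub(r"\s+", " ", text)
    return [preprocess_sentence(s) for s in split_sentences(text)]


vacancy = "Manages   a team of developers.  Excellent communication skills  are required."
processed = preprocess_vacancy(vacancy)
print(processed)
```

Each vacancy becomes a list of sentences, and each sentence a list of cleaned tokens, which is the unit later turned into one row of the feature matrix.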
8. The case study: Extracting job information from vacancies
Feature list
• Sentence Length (after removing certain words)
• POS of first word (job activity sentences usually start with a verb)
• First word (both kinds of sentences often start with certain words)
• Last word (job attribute sentences commonly end with certain words)
• Proportion of nouns and adjectives
• Proportion of verbs and "to" (POS tag TO)
• Proportion of verbs followed by a noun, verb, adjective, or adverb
• Frequent words
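As a rough sketch, most of the features in this list can be computed from a POS-tagged sentence as follows. The code assumes the input is already a list of (word, tag) pairs with Penn Treebank tags and that any word removal has been done beforehand. The function name and the reading of "proportion of verbs followed by ..." as a share of all verbs in the sentence are assumptions; the frequent-word features are left out.

```python
def sentence_features(tagged):
    """Compute per-sentence features from (word, Penn Treebank tag) pairs."""
    if not tagged:
        raise ValueError("sentence must contain at least one token")
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    n = len(tags)

    # Nouns (NN*) and adjectives (JJ*) as a share of all tokens.
    noun_adj = sum(t.startswith(("NN", "JJ")) for t in tags)
    # Verbs (VB*) and the particle "to" (tag TO) as a share of all tokens.
    verb_to = sum(t.startswith("VB") or t == "TO" for t in tags)
    # Verbs immediately followed by a noun, verb, adjective, or adverb,
    # as a share of all verbs (one possible interpretation of the feature).
    n_verbs = sum(t.startswith("VB") for t in tags)
    verb_followed = sum(
        1
        for cur, nxt in zip(tags, tags[1:])
        if cur.startswith("VB") and nxt.startswith(("NN", "VB", "JJ", "RB"))
    )

    return {
        "length": n,
        "first_pos": tags[0],
        "first_word": words[0],
        "last_word": words[-1],
        "prop_noun_adj": noun_adj / n,
        "prop_verb_to": verb_to / n,
        "prop_verb_followed": verb_followed / n_verbs if n_verbs else 0.0,
    }


# Hand-tagged toy sentences (made up for illustration).
activity = [("Develop", "VB"), ("new", "JJ"), ("software", "NN"), ("features", "NNS")]
attribute = [("Excellent", "JJ"), ("communication", "NN"), ("skills", "NNS"), ("required", "VBN")]

activity_feats = sentence_features(activity)
attribute_feats = sentence_features(attribute)
print(activity_feats)
print(attribute_feats)
```

Stacking one such dictionary per sentence gives the feature matrix, with categorical entries such as the first word still needing an encoding (for example one-hot) before most classifiers can use them.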
9. The case study: Extracting job information from vacancies
Application of Data Mining Techniques to the Feature Matrix
• Naïve Bayes
• Support Vector Machines
• Random Forest
The results: Successful automatic categorization of job information
At least 95% mean accuracy based on 10-fold cross-validation,
compared with the base classifier accuracy of 55%
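The evaluation scheme can be illustrated with a small standard-library sketch of k-fold cross-validation. The slides do not say how the base classifier works, so the majority-class baseline below is an assumption, and the labels are toy data with a 60/40 split, not the study's data. Any of the classifiers listed above could be plugged in through the hypothetical `fit` argument, which takes training data and returns a prediction function.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_majority(X_train, y_train):
    """Assumed baseline: always predict the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority


def cross_val_accuracy(X, y, fit, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy labels (made up): 60 attribute sentences and 40 activity sentences.
y = ["attribute"] * 60 + ["activity"] * 40
X = [None] * len(y)  # the baseline ignores the features

baseline = cross_val_accuracy(X, y, fit_majority, k=10)
print(f"baseline mean accuracy: {baseline:.2f}")
```

A baseline of this kind scores roughly the share of the larger class, which is why it is a useful reference point when judging the accuracy of a trained classifier.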
10. The case study: Extracting job information from vacancies
Future work
• Semi-supervised labelling
• Finer classification
• Consideration of more features
11. The review: Text mining techniques in organizational research
The task: Invitation for a special issue on big data in ORM
Introduce the methods of text analysis to organizational scientists
Review of various techniques for mining textual data
The pros and cons of different approaches (best practices)
Illustrations from the current project on job analysis showing how
these procedures can be applied to a substantive area
12. The review: Text mining techniques in organizational research
The paper: Our structure so far
1. Introduction
Text data in organizational research and issues that could be solved with text mining
Introduce the case study on text mining in job analysis
2. Review of text mining techniques
Definitions and terminology
Text preprocessing
Three tasks in text mining: classification, feature construction, and feature selection
Evaluating text mining results
13. The review: Text mining techniques in organizational research
The paper: Our structure so far
2. Review of text mining techniques
For each task
a) Text mining techniques applied to perform the tasks
b) Possibilities for applying Organizational frameworks
c) Advantages and disadvantages of these techniques illustrated with
examples from Organizational Research and other fields
d) Illustration from our case study
14. The review: Text mining techniques in organizational research
The paper: Our structure so far
3. Discussion of opportunities and challenges of text mining in Organizational Research
Opportunities such as extending the application of text mining to other problems in
Organizational Research (input?)
Challenges such as dealing with data size, access and protection of data, language
issues etc.
4. Conclusion
15. The review: Text mining techniques in organizational research
The question: Feedback
What problems are you dealing with right now (or have you dealt with in the past) that make use of text
data?
What are the opportunities that you see for text mining?
Which part of text mining would you like to learn more about?
Do you have experience in submitting a manuscript to ORM?
16. References
George, G., Haas, M.R., & Pentland, A. (2014). From the editors: Big Data and Management.
Academy of Management Journal, 57 (2), 321-326.
Sackett, P.R., & Laczo, R.M. (2003). Job and Work Analysis. In Comprehensive Handbook of
Psychology: Industrial and Organizational Psychology, vol. 12, ed. W.C. Borman, D.R. Ilgen,
& R.J. Klimoski, pp. 21-37. New York: Wiley.
Smith, D., & Ali, A. (2014). Analysing Computer Programming Job Trend Using Web Data Mining.
Issues in Informing Science and Information Technology, 11, 203-214.
Sodhi, M.S., & Son, B-G. (2009). Content Analysis of O.R. Job Advertisements to Infer Required Skills.
Journal of the Operational Research Society, 61, 1315-1327.