Session 2.2 ontology-guided job market demand analysis: a cross-sectional study for the data science field

Ontology-guided Job Market Demand Analysis:
A Cross-Sectional Study for the Data Science field
Elisa Margareth Sibarani, Simon Scerri, Camilo Morales, Sören Auer, Diego Collarana
12/09/2017 - 13/09/2017
SEMANTiCS 2017, Amsterdam, Netherlands

2
Motivation (1)
• Rapid changes in the job market
• Continuous increase in new skills and new job profiles
• New challenges for
  • Job seekers → less informed about the skills in demand
  • Educators → unable to offer courses that meet expectations
Examples of new skills: Deep learning, scikit-learn, TensorFlow, Caffe, Theano, Torch, MXNet, Puppet, Ansible, Vagrant, …

4
Goals
• Identify the most needed technical skills; use case: the data science field
• Perform a cross-sectional analysis that focuses on a snapshot of demand (at a point in time)
• Target user groups:
  • job seekers and applicants
  • educators and training providers
• Use co-word analysis to identify and structure relationships among concepts
• Serve as a basis for our future work on time-series analysis

5
Methodology
[Diagram: methodology pipeline. Box labels:]
• Preprocessing (e.g., ontology development)
• Ontology-based Information Extraction (OBIE)
• Quantitative Technique (Co-word Analysis based on Co-occurrence Matrix)
• Keywords' structure and Strategic Diagram (Trend Detection)
• Evaluation (F-measure, Precision, Recall)
• Ontology Population

6
Literature Review (relevant state of the art)
• Research on co-word analysis [1, 5, 7, 17]
• The potential of co-word analysis for investigating job adverts [8, 12, 13, 9, 10]
• A review of the research methodology of 70 studies in LIS [6]
• OBIE applications [14, 18]
• IT skill analysis [11, 15, 16, 19]

7
Novel contribution
• Little co-word analysis research explores OBIE for keyword extraction
• Provide the SARO ontology, a knowledge representation that serves as a reusable model
• Implement an OBIE methodology
  • a critical step in this study to cope with the “indexer effect” of co-word analysis
• Use co-word analysis for knowledge discovery
  • reveal skill demands together with their internal and external correlations with other skills

9
Skills and Recruitment Ontology (SARO)
• Extends two relevant models:
  • ESCO (labor market and its skills)
  • Schema.org (job openings in organizations)
• Goal: enable analysis and reuse of related tasks to interpret job postings in the context of skills
• Top-level view of SARO:
  • saro:JobPosting extends the so:JobPosting concept in Schema.org and defines essential attributes, including so:datePosted, so:jobLocation, and so:hiringOrganization, in addition to saro:describes to state the saro:jobRole
  • saro:Skill extends esco:Concept and categorizes skills as job-specific or transversal (cross-sector)

11
Ontology-based Information Extraction (1)
• Purpose: build an index/keyword list as a basic pillar for co-word analysis
• Guided by an ontology (a model that specifies the objective of the search)
  • extracting pre-defined concepts and instances
  • annotating text using concepts and instances
• Why GATE?
  • enables developers to implement flexible IE systems
  • supplies robust evaluation tools for NLP
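
The idea of annotating text with ontology instances can be illustrated with a minimal dictionary lookup. This is only a sketch in plain Python, not the GATE pipeline used in the study; the `SKILL_LABELS` dictionary and the sample advert are made up for illustration.

```python
import re

# Hypothetical stand-in for skill instances from an ontology: label -> concept.
SKILL_LABELS = {
    "SQL": "Skill",
    "Python": "Skill",
    "TensorFlow": "Skill",
    "Excel": "Skill",
}

def annotate(text, labels=SKILL_LABELS):
    """Return (start, end, matched_text, label, concept) for each label found."""
    annotations = []
    for label, concept in labels.items():
        # Whole-word, case-insensitive match, similar in spirit to a gazetteer lookup.
        pattern = re.compile(r"(?<!\w)" + re.escape(label) + r"(?!\w)", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                (match.start(), match.end(), match.group(0), label, concept)
            )
    return sorted(annotations)

# Made-up advert text.
advert = "We need Python and SQL skills. If you could excel in this role, apply."
for annotation in annotate(advert):
    print(annotation)
```

Note that the verb "excel" is annotated as the skill Excel here. A plain lookup cannot tell the two apart, which is the kind of ambiguity reported in the evaluation of the OBIE method.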

12
Ontology-based Information Extraction (2)

13
Correlation Matrix Construction
• Build a symmetric co-word matrix
  • query triples to retrieve occurrence and co-occurrence frequencies
• Transform the co-word matrix into a correlation matrix
  • calculate the equivalence index to measure the strength of association between two keywords:
    $E_{ij} = \frac{C_{ij}^2}{C_i \cdot C_j}$, with $0 \le E_{ij} \le 1$
  • $C_{ij}$ = number of job adverts where the skill pair appears
  • $C_i$ = number of times that keyword i is used to index a document
  • $C_j$ = number of times that keyword j is used to index a document
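
As a worked illustration of the equivalence index, the following sketch counts occurrences and co-occurrences from a small list of adverts and computes the index for each skill pair. The input format (one set of skills per advert) and the toy data are assumptions; the study itself retrieves these frequencies by querying triples.

```python
from collections import Counter
from itertools import combinations

def equivalence_index(adverts):
    """adverts: iterable of sets of skill keywords, one set per job advert.

    Returns {(skill_i, skill_j): E_ij}, with each pair in alphabetical order.
    """
    occurrences = Counter()    # C_i: adverts indexed with skill i
    cooccurrences = Counter()  # C_ij: adverts containing both i and j
    for skills in adverts:
        skills = set(skills)
        occurrences.update(skills)
        cooccurrences.update(combinations(sorted(skills), 2))
    return {
        (i, j): c_ij ** 2 / (occurrences[i] * occurrences[j])
        for (i, j), c_ij in cooccurrences.items()
    }

# Made-up toy data, not taken from the study.
adverts = [
    {"Python", "SQL"},
    {"Python", "SQL", "Hadoop"},
    {"Python"},
    {"SQL", "Hadoop"},
]
e = equivalence_index(adverts)
for pair, value in sorted(e.items()):
    print(pair, round(value, 3))
```

For the pair (Python, SQL) in this toy data, the pair count is 2 and each skill occurs 3 times, so the index is 4/9. The index reaches 1 only when two skills always appear together.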

14
Co-word Analysis for Job Adverts (1)

15
Co-word Analysis for Job Adverts (2)
[Flowchart: two-pass clustering. Box labels:]
• Pass-1: Link with highest (>>) index; Choose corresponding nodes; Add other links; Check if links > threshold (NO / YES)
  • constraints: co-occurrence threshold, max Pass-1 links
• Pass-2: Choose Pass-1 node with highest (>>) index; Add links to other Pass-1 nodes; Choose link from Pass-1 cluster; Check if links > threshold (NO / YES)
  • constraints: co-occurrence threshold, max Pass-1 and Pass-2 links

16
1st Study and Evaluation: OBIE Method (1)
• The objective is to evaluate:
  • the adequacy of the OBIE method
  • its performance compared to manual human extraction
• Data collection and pre-processing:
  • 872 job adverts from August to November 2015, crawled from Adzuna.com
  • 184 keywords (skills), ranging from 1 to 26 per advert
  • 20% of job adverts have at least 10 keywords; 95% have more than 1 keyword

17
1st Study and Evaluation: OBIE Method (2)
• Lowest F-score: SkillTool annotation
  • 86 annotations in total: 37 consistent, 16 unique to the IAA set, and 33 unique to the OBIE method
• Investigating the results shows possibilities for improvement:
  • ambiguity remains a challenge, e.g. Hudson in “Hudson Global Resources Limited offers the services of an employment agency …” and excel in “If this role sounds like something which you could excel in, please do not hesitate …”
  • skills unknown to SARO instances are not annotated, e.g. only SQL is marked up in Microsoft SQL Server
  • missing synonyms result in incomplete annotations, e.g. MsOffice and MsAccess are marked up, but Microsoft Office and Access are missed
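
To make the evaluation measures concrete, the sketch below computes precision, recall, and F-measure from the SkillTool counts given on this slide. It assumes that the 37 consistent annotations are true positives, the 33 annotations unique to the OBIE method are false positives, and the 16 unique to the IAA set are false negatives. The slides do not state how the reported scores were derived, so the numbers are illustrative only.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and balanced F-measure (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Assumed mapping of the slide's counts: consistent -> tp,
# unique to OBIE -> fp, unique to the IAA set -> fn.
p, r, f1 = precision_recall_f1(tp=37, fp=33, fn=16)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# prints: precision=0.529 recall=0.698 F1=0.602
```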

18
2nd Study: Co-word for Skill Analysis (1)
• The objectives:
  • provide a proof of concept ahead of the next research stage (time-series analysis to detect trends)
  • identify its potential to yield useful insights
• Some generic skills were dropped
  • four frequently occurring field words are ignored (Analyst, Analysis, Analytics, Analytic)
  • 180 skills remain under consideration

19
2nd Study: Co-word for Skill Analysis (2)
• The observed co-occurrence counts range from:
  • 1 (1087 skill pairs were observed only once in the job post sample)
  • to 167 (1 skill pair was observed in 167 job posts)
• To generate the clusters, we heuristically define 2 variants:
  • 1st variant: co-occurrence threshold ≥ 10, Pass1Links = 3, and MaxLinks = 5
  • 2nd variant: co-occurrence threshold ≥ 15, Pass1Links = 5, and MaxLinks = 8
• Why those numbers?
  • to generate clusters that are not incomprehensibly large
  • and that are not bound by very weak links
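
The role of the three parameters can be sketched with a simplified two-pass clustering routine. This is a hedged approximation and not the authors' implementation: it assumes that Pass1Links caps the internal links added in the first pass and that MaxLinks caps internal plus external links per cluster. The function name, input format, and toy data are hypothetical.

```python
def two_pass_clusters(links, min_cooc, pass1_links, max_links):
    """links: {(a, b): (cooccurrence_count, equivalence_index)}.

    Returns a list of clusters, each a dict with 'nodes', 'internal', 'external'.
    """
    # Keep only links at or above the co-occurrence threshold, strongest first.
    strong = sorted(
        ((e, a, b) for (a, b), (c, e) in links.items() if c >= min_cooc),
        reverse=True,
    )
    assigned = set()
    clusters = []
    # Pass 1: seed each cluster with the strongest remaining link, then grow it.
    for _, a, b in strong:
        if a in assigned or b in assigned:
            continue
        nodes, internal = {a, b}, [(a, b)]
        grown = True
        while grown and len(internal) < pass1_links:
            grown = False
            for _, x, y in strong:
                new_nodes = {x, y} - nodes
                # Strongest link joining the cluster to one unassigned node.
                if len(new_nodes) == 1 and not (new_nodes & assigned):
                    nodes |= new_nodes
                    internal.append((x, y))
                    grown = True
                    break
        assigned |= nodes
        clusters.append({"nodes": nodes, "internal": internal, "external": []})
    # Pass 2: add links to nodes of other clusters, up to max_links in total.
    for cluster in clusters:
        for _, x, y in strong:
            if len(cluster["internal"]) + len(cluster["external"]) >= max_links:
                break
            crosses = (x in cluster["nodes"]) != (y in cluster["nodes"])
            if crosses and {x, y} <= assigned:
                cluster["external"].append((x, y))
    return clusters

# Made-up toy data: (co-occurrence count, equivalence index) per skill pair.
toy_links = {
    ("Python", "SQL"): (20, 0.6),
    ("Python", "R"): (15, 0.5),
    ("SQL", "Hadoop"): (12, 0.4),
    ("Hadoop", "Spark"): (18, 0.7),
    ("R", "Spark"): (5, 0.3),
    ("SQL", "Spark"): (11, 0.2),
}
clusters = two_pass_clusters(toy_links, min_cooc=10, pass1_links=2, max_links=3)
for cluster in clusters:
    print(sorted(cluster["nodes"]), cluster["internal"], cluster["external"])
```

Raising the co-occurrence threshold removes weak links before clustering starts, while smaller link limits keep each cluster small enough to read, which matches the rationale given above.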

22
Conclusion
• OBIE method, guided by a domain vocabulary (SARO)
• Evaluation shows that the F-measure of the automated extraction fares very well
• The co-word analysis study serves as a proof of concept for identifying skill demand composition and trends
• All raw elicited information, higher-level abstractions, and results are available in a standard format (RDF)

23
Future Work
• OBIE method improvement
  • ontology population
    • skill synonyms
    • new skill instances
  • add more JAPE grammar rules
    • entities written entirely in upper case (e.g. CVS, GO), starting in lower case and followed by digits (e.g. k8s), or consisting of a mix of upper-case and lower-case letters (e.g. LeSS, SaaS)
• Perform time-series analysis and identify trends and shifts
• Provide a Web-based user interface

24
Contact
Elisa Margareth Sibarani
sibarani@cs.uni-bonn.de
