Ontology-guided Job Market Demand Analysis:
A Cross-Sectional Study for the Data Science field
Elisa Margareth Sibarani, Simon Scerri, Camilo Morales, Sören Auer, Diego Collarana
12/09/2017 - 13/09/2017
SEMANTiCS 2017, Amsterdam, Netherlands
2
Motivation (1)
• Rapid changes in the job market
• Continuous increase in new skills and new job profiles
• New challenges for
• Job seekers → less informed about the skills in demand
• Educators → unable to offer courses that meet expectations
Example skills: Deep learning, Scikit-learn, TensorFlow, Caffe, Theano, Torch, MXNet, Puppet, Ansible, Vagrant, …
3
Motivation (2)
4
Goals
• Identify the most in-demand technical skills; use case: the data science field
• A cross-sectional analysis that focuses on a snapshot of demand (at a point in time)
• Target user group:
• job seekers and applicants
• educators and training providers
• Use co-word analysis to identify and structure relationships among concepts
• Serve as a basis for our future work on time-series analysis
5
Methodology
• Preprocessing (e.g., ontology development)
• Ontology-based Information Extraction (OBIE)
• Quantitative Technique (Co-word Analysis based on Co-occurrence Matrix)
• Keywords’ structure and Strategic Diagram (Trend Detection)
• Evaluation (F Measure, Precision, Recall)
• Ontology Population
6
Literature Review (relevant state of the art)
• Research on co-word analysis [1, 5, 7, 17]
• The potential of utilizing co-word analysis for investigating job
adverts [8, 12, 13, 9, 10]
• A review of the research methodology of 70 studies in LIS [6]
• OBIE applications [14, 18]
• IT skill analysis [11, 15, 16, 19]
7
Novel contribution
• little co-word analysis research explores OBIE for keyword extraction
• provide the SARO ontology, a knowledge representation that serves as a reusable model
• implement the OBIE methodology
• a critical step in this study to cope with the “indexer effect” of co-word analysis
• utilize co-word analysis for knowledge discovery
• reveal skill demands together with their internal and external correlations with other skills
8
TOBIE Architecture
9
Skills and Recruitment Ontology (SARO)
• extends two relevant models
• ESCO (labor market and its skills)
• Schema.org (job openings in organizations)
• goal: enable analysis and reuse of related tasks to interpret job
postings in the context of skills
• top-level view of SARO
• saro:JobPosting extends the so:JobPosting concept in Schema.org and defines essential attributes, including so:datePosted, so:jobLocation and so:hiringOrganization, in addition to saro:describes, which states the saro:jobRole
• saro:Skill, extends the esco:Concept, categorizes skills as job-specific or
transversal (cross-sector)
10
SARO
11
Ontology-based Information Extraction (1)
• purpose: build an index/keyword list as a basic pillar for co-word analysis
• guided by an ontology (a certain model that specifies the objective
of the search)
• extracting pre-defined concepts and instances
• annotating text using concepts and instances
• why GATE?
• enables developers to implement flexible IE systems
• supplies robust evaluation tools for NLP
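The annotation step can be illustrated with a minimal gazetteer-style sketch in plain Python. It is not GATE and not the study's pipeline: the vocabulary, the `annotate` helper and the sample advert are hypothetical, and in the study the labels would come from SARO concepts and instances.

```python
import re

# Hypothetical mini-vocabulary: label -> concept. Illustrative entries only;
# the real labels would be taken from the ontology.
VOCABULARY = {
    "python": "Skill",
    "sql": "Skill",
    "hadoop": "Skill",
    "machine learning": "Skill",
}


def annotate(text, vocabulary=VOCABULARY):
    """Return (start, end, matched_text, concept) for every vocabulary hit."""
    annotations = []
    for label, concept in vocabulary.items():
        # Match the label only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), match.group(0), concept))
    return sorted(annotations)


advert = "We need a data scientist with Python, SQL and machine learning skills."
keywords = sorted({a[2].lower() for a in annotate(advert)})
print(keywords)
```

The resulting keyword list per advert is what a co-word analysis would then count. A pure label lookup like this one cannot tell a skill from a homonym in running text, which is why context-aware rules are needed on top of it.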
12
Ontology-based Information Extraction (2)
13
Correlation Matrix Construction
• build a symmetric co-word matrix
• querying triples to retrieve occurrence and co-occurrence frequency
• transform co-word matrix into correlation matrix
• calculate equivalence index to measure strength of association between two
keywords
• $E_{ij} = \dfrac{C_{ij}^{2}}{C_i \cdot C_j}$
• $0 \le E_{ij} \le 1$
• $C_{ij}$ = number of job adverts where the skill pair appears
• $C_i$ = number of times that the keyword i is used to index a document
• $C_j$ = number of times that the keyword j is used to index a document
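As a worked illustration of the equivalence index, the sketch below counts occurrences and co-occurrences over a handful of made-up adverts and applies the formula to each pair. The slides retrieve these frequencies by querying triples; the in-memory sets, the `equivalence_index` helper and the sample data here are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def equivalence_index(adverts):
    """Map each skill pair (i, j) to E_ij = C_ij^2 / (C_i * C_j)."""
    occurrences = Counter()      # C_i: adverts indexed with skill i
    co_occurrences = Counter()   # C_ij: adverts containing both i and j
    for skills in adverts:
        unique = sorted(set(skills))
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))
    return {
        (i, j): c_ij ** 2 / (occurrences[i] * occurrences[j])
        for (i, j), c_ij in co_occurrences.items()
    }


# Hypothetical adverts, each reduced to its set of extracted skills.
adverts = [
    {"python", "sql"},
    {"python", "sql", "hadoop"},
    {"python", "hadoop"},
    {"sql"},
]
E = equivalence_index(adverts)
for pair, value in sorted(E.items()):
    print(pair, round(value, 3))
```

Because a pair can never co-occur more often than either of its keywords occurs, every value falls between 0 and 1, matching the bound stated on the slide.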
14
Co-word Analysis for Job Adverts (1)
15
Co-word Analysis for Job Adverts (2)
[Figure: two-pass clustering flowchart.
Pass-1 boxes: link with the highest index; add other links; choose corresponding nodes; check if links > threshold (NO / YES). Limits: co-occurrence threshold, max Pass-1 links.
Pass-2 boxes: choose link from Pass-1 cluster; add links to other Pass-1 nodes; choose Pass-1 node with the highest index; check if links > threshold (NO / YES). Limits: co-occurrence threshold, max Pass-1 and Pass-2 links.]
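The Pass-1 / Pass-2 procedure on this slide can be approximated with the simplified sketch below. It is an interpretation, not the authors' implementation: it assumes Pass-1 grows each cluster from the strongest unused link and only adds nodes not yet clustered, and that Pass-2 adds links between nodes of different Pass-1 clusters. The function name, the data layout and the sample links are hypothetical.

```python
def two_pass_clusters(links, min_cooc, pass1_links, max_links):
    """links maps (a, b) -> (co-occurrence count, equivalence index)."""
    # Keep only links that meet the co-occurrence threshold, strongest first.
    strong = sorted(
        ((e, a, b) for (a, b), (cooc, e) in links.items() if cooc >= min_cooc),
        reverse=True,
    )
    assigned = {}   # node -> index of its Pass-1 cluster
    clusters = []

    # Pass 1: seed with the strongest link between two unclustered nodes,
    # then attach unclustered neighbours until pass1_links is reached.
    while True:
        seed = next(((a, b) for _, a, b in strong
                     if a not in assigned and b not in assigned), None)
        if seed is None:
            break
        cid = len(clusters)
        nodes, internal = set(seed), [seed]
        for node in seed:
            assigned[node] = cid
        while len(internal) < pass1_links:
            grow = next(((a, b) for _, a, b in strong
                         if (a in nodes and b not in assigned)
                         or (b in nodes and a not in assigned)), None)
            if grow is None:
                break
            internal.append(grow)
            for node in grow:
                nodes.add(node)
                assigned[node] = cid
        clusters.append({"nodes": nodes, "internal": internal, "external": []})

    # Pass 2: add links to nodes of other Pass-1 clusters, up to max_links.
    for cid, cluster in enumerate(clusters):
        for _, a, b in strong:
            if len(cluster["internal"]) + len(cluster["external"]) >= max_links:
                break
            ca, cb = assigned.get(a), assigned.get(b)
            if ca is None or cb is None or ca == cb:
                continue
            if cid in (ca, cb):
                cluster["external"].append((a, b))
    return clusters


# Hypothetical links: (skill, skill) -> (co-occurrence count, equivalence index).
links = {
    ("python", "sql"): (20, 0.8),
    ("python", "r"): (15, 0.6),
    ("hadoop", "spark"): (18, 0.7),
    ("hadoop", "hive"): (12, 0.5),
    ("hadoop", "python"): (11, 0.3),
    ("excel", "sql"): (3, 0.1),
}
clusters = two_pass_clusters(links, min_cooc=10, pass1_links=2, max_links=3)
for cluster in clusters:
    print(sorted(cluster["nodes"]), cluster["internal"], cluster["external"])
```

In this toy input the weak excel/sql link is dropped by the threshold, two clusters form in Pass-1, and the hadoop/python link appears only in Pass-2 as a connection between them.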
16
1st Study and Evaluation: OBIE Method (1)
• The objective is to evaluate:
• the adequacy of the OBIE method
• its performance compared to manual human extraction
• Data collection and pre-processing:
• 872 job adverts posted between August and November 2015, crawled from Adzuna.com
• 184 keywords (skills), ranging from 1 to 26 per advert
• 20% of job adverts had at least 10 keywords; 95% had more than 1 keyword
17
1st Study and Evaluation: OBIE Method (2)
• lowest F-score: SkillTool annotation
• 86 total: 37 consistent, 16 unique to the IAA set, and 33 unique to the OBIE method
• Investigation of the results and possibilities for improvement:
• ambiguity remains a challenge
• skills unknown to SARO instances are not annotated
• missing synonyms result in incomplete annotations
• Examples of ambiguity: Hudson in “Hudson Global Resources Limited offers the services of an employment agency …”; excel in “If this role sounds like something which you could excel in, please do not hesitate …”
• Examples of incomplete annotation: only SQL is marked up in Microsoft SQL Server; MsOffice and MsAccess are marked up, but Microsoft Office and Access are missed
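For readers unfamiliar with the measures, the sketch below shows how precision, recall and F1 follow from three counts of this kind. It assumes the IAA set is treated as the gold standard and that only exact agreement counts as a match; the slide does not state how its F-score was computed, so the output is an illustration of the arithmetic rather than the reported result.

```python
def precision_recall_f1(consistent, only_gold, only_system):
    """Strict precision, recall and F1 from agreement counts."""
    precision = consistent / (consistent + only_system)
    recall = consistent / (consistent + only_gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the slide, read under the assumptions stated above:
# 37 consistent, 16 only in the IAA set, 33 only from the OBIE method.
p, r, f = precision_recall_f1(consistent=37, only_gold=16, only_system=33)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```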
18
2nd Study: Co-word for Skill Analysis (1)
• The objectives:
• proof-of-concept, ahead of next stage research (time-series analysis to
detect trends)
• identify its potential to yield useful insights
• some generic skills were dropped
• ignore four frequently-occurring field words (Analyst, Analysis, Analytics,
Analytic)
• 180 skills under consideration
19
2nd Study: Co-word for Skill Analysis (2)
• The final observed co-occurrence counts range from:
• 1 (1087 skill pairs only observed once in the job post sample)
• 167 (1 skill pair was observed in 167 job posts)
• To generate the clusters, we heuristically define 2 variants:
• 1st variant: co-occurrence threshold ≥ 10, Pass1Links = 3, and MaxLinks = 5
• 2nd variant: co-occurrence threshold ≥ 15, Pass1Links = 5, and MaxLinks = 8
• Why those numbers?
• to generate clusters that are not incomprehensibly large
• and that are not bound by very weak links
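The two variants can be written down as small parameter sets, which makes the effect of the co-occurrence threshold easy to inspect. The `ClusterVariant` class, the `eligible_pairs` helper and the pair counts below are hypothetical; only the three parameter values per variant come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterVariant:
    min_cooccurrence: int   # co-occurrence threshold
    pass1_links: int        # Pass1Links
    max_links: int          # MaxLinks


VARIANT_1 = ClusterVariant(min_cooccurrence=10, pass1_links=3, max_links=5)
VARIANT_2 = ClusterVariant(min_cooccurrence=15, pass1_links=5, max_links=8)


def eligible_pairs(cooccurrences, variant):
    """Keep only the skill pairs that meet the variant's threshold."""
    return {
        pair: count
        for pair, count in cooccurrences.items()
        if count >= variant.min_cooccurrence
    }


# Made-up co-occurrence counts, for illustration only.
cooccurrences = {
    ("python", "sql"): 40,
    ("hadoop", "spark"): 14,
    ("r", "sas"): 10,
    ("excel", "vba"): 1,
}
for name, variant in (("variant 1", VARIANT_1), ("variant 2", VARIANT_2)):
    print(name, sorted(eligible_pairs(cooccurrences, variant)))
```

Raising the threshold removes weakly supported pairs before clustering, while the two link limits bound how large any single cluster can grow.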
20
21
22
Conclusion
• OBIE method, guided by a domain vocabulary (SARO)
• Evaluation shows the resulting F-measure of automated extraction
fares very well
• the co-word analysis study serves as a proof-of-concept to identify
skill demand composition and trends
• all raw elicited information, higher-level abstractions and results,
available in a standard format (RDF)
23
Future Work
• OBIE method improvement
• Ontology Population
• skills synonym
• new skill instance
• add more JAPE grammar rules
• entities written in whole upper case (e.g. CVS, GO), starting in lower case and followed by digits (e.g. k8s), or consisting of a mix of upper case and lower case letters (e.g. LeSS, SaaS)
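The three token shapes named above can be prototyped with regular expressions before being written as JAPE rules. The sketch below is plain Python, not JAPE, and the exact patterns are assumptions about how each shape might be defined.

```python
import re

# Assumed definitions of the three shapes listed on the slide.
ALL_UPPER = re.compile(r"^[A-Z]{2,}$")                     # e.g. CVS, GO
LOWER_THEN_DIGITS = re.compile(r"^[a-z]+\d+[a-z]*$")       # e.g. k8s
# Mixed case: a lower-case letter directly followed by an upper-case one,
# so ordinary capitalised words such as "London" do not match.
MIXED_CASE = re.compile(r"^[A-Za-z]*[a-z][A-Z][A-Za-z]*$")  # e.g. LeSS, SaaS


def token_shape(token):
    """Return the name of the first matching shape, or None."""
    if ALL_UPPER.match(token):
        return "all-upper"
    if LOWER_THEN_DIGITS.match(token):
        return "lower-then-digits"
    if MIXED_CASE.match(token):
        return "mixed-case"
    return None


for token in ["CVS", "GO", "k8s", "LeSS", "SaaS", "London"]:
    print(token, token_shape(token))
```

Shape alone is a weak signal, since "GO" also matches ordinary upper-case words, so rules of this kind would still need surrounding context to avoid false annotations.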
• perform time-series analysis and identify trends and shifts
• provide a Web-based User Interface
24
Contact
Elisa Margareth Sibarani
sibarani@cs.uni-bonn.de
25
Questions & Suggestions

 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 

Session 2.2 ontology-guided job market demand analysis: a cross-sectional study for the data science field

  • 1. Ontology-guided Job Market Demand Analysis: A Cross-Sectional Study for the Data Science field Elisa Margareth Sibarani, Simon Scerri, Camilo Morales, Sören Auer, Diego Collarana 12/09/2017 - 13/09/2017 SEMANTiCS 2017, Amsterdam, Netherlands
  • 2. Motivation (1) • Rapid changes in the job market • Continuous increase in new skills and new job profiles • New challenges for: • Job seekers → less informed about the skills in demand • Educators → unable to offer courses that meet expectations • Example skills: Deep learning, Scikit-learn, TensorFlow, Caffe, Theano, Torch, MXNet, Puppet, Ansible, Vagrant, ...
  • 4. Goals • Identify the most in-demand technical skills (use case: the data science field) • A cross-sectional analysis that focuses on a snapshot of demand at a single point in time • Target user groups: • job seekers and applicants • educators and training providers • Use co-word analysis to identify and structure relationships among concepts • Serve as a basis for our future work on time-series analysis
  • 5. Methodology • Preprocessing (e.g., ontology development) • Ontology-based Information Extraction (OBIE) • Quantitative technique (co-word analysis based on a co-occurrence matrix) • Keywords' structure and strategic diagram (trend detection) • Evaluation (F-measure, precision, recall) • Ontology population
  • 6. Literature Review (relevant state of the art) • Research on co-word analysis [1, 5, 7, 17] • The potential of co-word analysis for investigating job adverts [8, 12, 13, 9, 10] • A review of the research methodology of 70 studies in LIS [6] • OBIE applications [14, 18] • IT skill analysis [11, 15, 16, 19]
  • 7. Novel contribution • Little co-word analysis research has explored OBIE for keyword extraction • Provide the SARO ontology, a knowledge representation that serves as a reusable model • Implement an OBIE methodology • a critical step in this study to cope with the “indexer effect” of co-word analysis • Use co-word analysis for knowledge discovery • reveal skill demands together with their internal and external correlations with other skills
  • 9. Skills and Recruitment Ontology (SARO) • An extension of two relevant models: • ESCO (the labor market and its skills) • Schema.org (job openings in organizations) • Goal: enable analysis and reuse of related tasks to interpret job postings in the context of skills • Top-level view of SARO: • saro:JobPosting extends the so:JobPosting concept in Schema.org and defines essential attributes, including so:datePosted, so:jobLocation, and so:hiringOrganization, plus saro:describes to state the saro:jobRole • saro:Skill extends esco:Concept and categorizes skills as job-specific or transversal (cross-sector)
  • 11. Ontology-based Information Extraction (1) • Purpose: build an index/keyword list as a basic pillar for co-word analysis • Guided by an ontology (a model that specifies the objective of the search) • extracting pre-defined concepts and instances • annotating text using concepts and instances • Why GATE? • enables developers to implement flexible IE systems • supplies robust evaluation tools for NLP
  • 13. Correlation Matrix Construction • Build a symmetric co-word matrix • query triples to retrieve occurrence and co-occurrence frequencies • Transform the co-word matrix into a correlation matrix • calculate the equivalence index to measure the strength of association between two keywords • $E_{ij} = \frac{C_{ij}^2}{C_i \cdot C_j}$ • $0 \le E_{ij} \le 1$ • $C_{ij}$ = number of job adverts in which the skill pair (i, j) appears • $C_i$ = number of times keyword i is used to index a document • $C_j$ = number of times keyword j is used to index a document
  • 14. Co-word Analysis for Job Adverts (1)
  • 15. Co-word Analysis for Job Adverts (2) [Flowchart of the two-pass clustering. Pass-1: take the link with the highest index, add other links, choose the corresponding nodes, and check whether the links exceed the threshold (YES/NO); parameters: co-occurrence threshold, max Pass-1 links. Pass-2: choose a link from a Pass-1 cluster, add links to other Pass-1 nodes, choose the Pass-1 node with the highest index, and check whether the links exceed the threshold (YES/NO); parameters: co-occurrence threshold, max Pass-1 and Pass-2 links.]
  • 16. 1st Study and Evaluation: OBIE Method (1) • The objective is to evaluate: • the adequacy of the OBIE method • its performance compared to manual human extraction • Data collection and pre-processing: • 872 job adverts between August and November 2015, crawled from Adzuna.com • 184 keywords (skills), ranging from 1 to 26 per advert • 20% of job adverts have at least 10 keywords; 95% have more than 1 keyword
  • 17. 1st Study and Evaluation: OBIE Method (2) • Lowest F-score: the SkillTool annotation • 86 total: 37 consistent, 16 unique to the IAA set, and 33 unique to the OBIE method • Investigating the results suggests possibilities for improvement: • ambiguity remains a challenge, e.g. Hudson in “Hudson Global Resources Limited offers the services of an employment agency …” and excel in “If this role sounds like something which you could excel in, please do not hesitate …” • skills unknown to SARO instances are not annotated, e.g. only SQL is marked up in Microsoft SQL Server • missing synonyms result in incomplete annotations, e.g. MsOffice and MsAccess are marked up, but Microsoft Office and Access are missed
  • 18. 2nd Study: Co-word for Skill Analysis (1) • The objectives: • proof of concept ahead of the next research stage (time-series analysis to detect trends) • identify its potential to yield useful insights • Some generic skills were dropped: • four frequently occurring field words are ignored (Analyst, Analysis, Analytics, Analytic) • 180 skills under consideration
  • 19. 2nd Study: Co-word for Skill Analysis (2) • The final observed co-occurrences range from: • 1 (1087 skill pairs were observed only once in the job post sample) to • 167 (1 skill pair was observed in 167 job posts) • To generate the clusters, we heuristically define 2 variants: • 1st variant: co-occurrence threshold ≥ 10, Pass1Links = 3, and MaxLinks = 5 • 2nd variant: co-occurrence threshold ≥ 15, Pass1Links = 5, and MaxLinks = 8 • Why those numbers? • to generate clusters that are not incomprehensibly large • and are not bound by very weak links
  • 22. Conclusion • The OBIE method is guided by a domain vocabulary (SARO) • The evaluation shows that the F-measure of the automated extraction fares very well • The co-word analysis study serves as a proof of concept for identifying skill demand composition and trends • All raw elicited information, higher-level abstractions, and results are available in a standard format (RDF)
  • 23. Future Work • OBIE method improvement • Ontology population • skill synonyms • new skill instances • Add more JAPE grammar rules for • entities written entirely in upper case (e.g. CVS, GO), starting in lower case and followed by digits (e.g. k8s), or consisting of a mix of upper-case and lower-case letters (e.g. LeSS, SaaS) • Perform time-series analysis and identify trends and shifts • Provide a Web-based user interface
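
As a rough illustration of the equivalence index from the Correlation Matrix Construction slide and of a Pass-1 style cluster growth, the following Python sketch may help. It is not the authors' implementation: the function names, the in-memory list-of-skills input (the slides retrieve the counts by querying RDF triples), and the greedy growth rule, which simplifies the BFS-based Pass-1 described in the notes, are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(adverts):
    """Count occurrences C_i and pair co-occurrences C_ij over job adverts.

    `adverts` is a hypothetical input: one list of skill keywords per advert.
    """
    occ = Counter()
    co = Counter()
    for skills in adverts:
        unique = sorted(set(skills))        # count each skill once per advert
        occ.update(unique)
        co.update(combinations(unique, 2))  # pairs stored as sorted tuples
    return occ, co


def equivalence_index(occ, co):
    """E_ij = C_ij^2 / (C_i * C_j); the value always lies in [0, 1]."""
    return {
        pair: c * c / (occ[pair[0]] * occ[pair[1]])
        for pair, c in co.items()
    }


def pass1_cluster(eq, co, min_cooc, max_links):
    """Simplified Pass-1 (assumption): seed with the strongest link, then
    repeatedly add the strongest link that attaches one new node, until no
    candidate remains or `max_links` links have been collected."""
    candidates = {p: e for p, e in eq.items() if co[p] >= min_cooc}
    if not candidates:
        return []
    seed = max(candidates, key=candidates.get)
    links, nodes = [seed], set(seed)
    while len(links) < max_links:
        frontier = {
            p: e for p, e in candidates.items()
            if (p[0] in nodes) != (p[1] in nodes)  # exactly one endpoint is new
        }
        if not frontier:
            break
        best = max(frontier, key=frontier.get)
        links.append(best)
        nodes.update(best)
    return links
```

With the slide-19 settings, the first variant would correspond to something like `pass1_cluster(eq, co, min_cooc=10, max_links=3)`. Pass-2 (linking nodes across Pass-1 clusters) is left out of this sketch.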

Editor's Notes

  1. Scope: portrays the skills in demand at a single point in time. Co-word analysis: when two skills often appear together, there is a high chance that they are strongly related; it relies on the occurrence and co-occurrence frequencies of skill pairs. How are keywords extracted prior to the co-word analysis? → Ontology-based Information Extraction (OBIE). The OBIE pipeline exploits an ontology: the Skills and Recruitment Ontology (SARO).
  2. Research on co-word analysis [1, 5, 7, 17]: reduces and projects data into a specific visual representation. The potential of co-word analysis for investigating job adverts: library and information science (LIS) [8, 12, 13]; information systems (IS) [9, 10]. A review of the research methodology of 70 studies in LIS [6]: 3 out of 70 used automatic text analysis; 18 out of 70 used inferential statistics. OBIE applications [14, 18]. IT skill analysis [11, 15, 16, 19].
  3. What is TOBIE? A framework for semantically extracting skills and information related to a job posting and for analyzing trends and changes. TOBIE relies on a vocabulary to guide the information extraction process (http://vocol.iais.fraunhofer.de/saro/). TOBIE employs co-word analysis to dynamically map skill demand and its structure (job role inquiry).
  4. The OBIE pipeline accepts as input: a job adverts dataset (JSON or XML); the SARO ontology. The OBIE pipeline consists of: linguistic analysis components (pre-processing); a named-entity recognizer based on the ontology; a transducer with JAPE grammar rules. Using the SILK framework: convert the OBIE result from XML to RDF; load the RDF triples into the KB.
  5. Co-word analysis is a technique that exploits co-occurrence phenomena: skills that occur together frequently are assumed to be related; the strength of that relationship is related to the co-occurrence frequency; networks of these co-occurring phenomena are constructed. It uses the Paris/Keele method for cluster analysis: well suited to large and heterogeneous datasets; for a large number of keywords, this method is simpler and easier to present; the assumption is that there is a cluster-type structure, without an obligation to specify the number of clusters or to include all keywords in the cluster construction. Maps of evolving skill sets are then generated using the link-node values of the networks. With these maps of skill structure and evolution, labour market policy analysts and educators can develop a deeper understanding of the interrelationships among the different skill sets and the impacts of external intervention, and can recommend new directions for more desirable curricula. Source: Kostoff, Ronald N. (1993). “Co-Word Analysis.” In Barry Bozeman and Julia Melkers (eds.), Evaluating R&D Impacts: Methods and Practice, pp. 63-78. Springer US, Boston, MA. ISBN 978-1-4757-5182-6. https://doi.org/10.1007/978-1-4757-5182-6_4
  6. The algorithm is based on a threshold on co-occurrence frequency and on the number of links in one cluster (sub-network). Pass-1: start from linked nodes by choosing the link with the highest coefficient; add other links and their corresponding nodes in decreasing order of the coefficient, based on Breadth First Search (BFS); stop when no more links exceed the threshold or the maximum number of Pass-1 links is reached. Pass-2: extend each Pass-1 sub-network by adding links between nodes in different clusters; stop when no link meets the co-occurrence threshold or the total number of Pass-2 links is reached. Result: a keyword structure/map and a strategic diagram.
  7. Gold standard and Inter-Annotator Agreement (IAA): two annotators annotated a random sample of 50 job adverts; the strict IAA F-score was 94%. Two additional, different samples of 50 job posts were then manually annotated to generate the gold standard of manually annotated job posts. OBIE evaluation: OBIE was executed on the 150 posts, containing 1,760 sentences and 53,577 tokens, using the Corpus Quality Assurance tool in GATE to calculate precision, recall, and the F1-score.
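
The precision, recall, and F1 calculation mentioned in the last note can be sketched as follows. This is a minimal illustration of the standard strict-match formulas, not GATE's Corpus Quality Assurance code. The function name is hypothetical, and reading the slide-17 SkillTool counts (37 consistent, 16 only in the manual set, 33 only from OBIE) as correct, missing, and spurious annotations is an assumption, so the resulting value need not match the score the authors report.

```python
def prf_from_counts(correct, missing, spurious):
    """Strict precision, recall and F1 from annotation counts.

    correct  -- annotations found by both the gold standard and the system
    missing  -- annotations only in the gold standard
    spurious -- annotations only produced by the system
    """
    predicted = correct + spurious  # everything the system annotated
    gold = correct + missing        # everything the annotators marked
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed reading of the SkillTool counts from slide 17.
precision, recall, f1 = prf_from_counts(correct=37, missing=16, spurious=33)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Partial matches, which GATE can score leniently, are ignored here; under the strict reading they would count as errors.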