Validator and preview 
for the JobPosting data model 
of Schema.org 
Jindřich Mynarz 
Department of Information and Knowledge 
Engineering, 
University of Economics, Prague 
EC-WEB 2014, September 2, 2014
Motivation 
● Improve usability of vocabularies 
● Provide feedback on the use of 
vocabularies 
● Make vocabulary specification executable 
● Help ensure basic level of data quality 
● Capture application-specific requirements 
for data in validation rules
DámePráci.eu project 
“Matching jobs with unemployed 
through semantic data” 
Data model using Schema.org with 
an extension for the job market. 
Application for searching through job postings 
aggregated from distinct sources: 
www.damepraci.cz (in Czech)
Validation method 
● Rule-based, schema-aware 
validation 
● Operates in the RDF data model 
● Focuses on semantic errors, beyond well-formed 
markup 
● Partial open world assumption 
● Implemented as SPARQL 1.1 CONSTRUCT 
queries 
● Error reporting via SPIN RDF vocabulary
Background knowledge 
schema.org 
+ extension for job market (RDFS) 
+ external enumerations: 
● ISO 4217 currency codes (SKOS) 
● ISO 639-1 language codes (SKOS) 
Loaded in separate named graphs that the 
validation rules can reference.
Validation rules 
● Data completeness 
● Distinction between datatype and object 
properties 
● Conflicting data 
● Datatype violations 
● Invalid codes
Data completeness 
● At least 1 instance 
of schema:JobPosting 
● Other type information (class membership, 
datatypes) left optional 
● Empty literals 
● Conditionally required data (e.g., 
compensation + currency)
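A minimal Python sketch of the three completeness checks above. This is an illustration only: the validator itself expresses its rules as SPARQL 1.1 CONSTRUCT queries. The IRI marker class, the error labels, and the use of schema:baseSalary and schema:salaryCurrency as stand-ins for "compensation + currency" are assumptions made for this example.

```python
class IRI(str):
    """Marks a string as a resource identifier rather than a literal (assumed helper)."""


SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def completeness_errors(triples):
    """Return (rule, subject, predicate) tuples for completeness problems.

    Each triple is (subject, predicate, object); literal objects are plain
    str values, resource objects are IRI instances.
    """
    errors = []

    # Rule 1: at least one instance of schema:JobPosting must be present.
    if not any(p == RDF_TYPE and o == SCHEMA + "JobPosting" for _, p, o in triples):
        errors.append(("missing-job-posting", None, None))

    # Rule 2: literal values must not be empty or whitespace-only.
    for s, p, o in triples:
        if not isinstance(o, IRI) and o.strip() == "":
            errors.append(("empty-literal", s, p))

    # Rule 3: conditionally required data. A subject that states a
    # compensation must also state a currency (property names assumed).
    with_salary = {s for s, p, _ in triples if p == SCHEMA + "baseSalary"}
    with_currency = {s for s, p, _ in triples if p == SCHEMA + "salaryCurrency"}
    for s in sorted(with_salary - with_currency):
        errors.append(("missing-currency", s, SCHEMA + "salaryCurrency"))

    return errors
```

Type information other than schema:JobPosting is deliberately not checked here, which mirrors the point that it is left optional.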
Distinction between datatype 
and object properties 
● Object properties with literal objects instead 
of URIs or blank nodes (and vice versa for 
datatype properties) 
● Simpler syntax of datatype 
properties 
○ Avoiding nested objects or difficulties with finding an 
object's URI 
● May be a symptom of incorrectly nested 
HTML elements
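The check described above could be sketched in Python as follows. The two property sets are hand-written assumptions for this example; a real rule would more plausibly derive them from the vocabulary (for instance from schema:rangeIncludes), and the validator itself uses SPARQL rather than Python. The IRI class and error labels are likewise hypothetical.

```python
class IRI(str):
    """Marks a string as a URI or blank node identifier, not a literal (assumed helper)."""


SCHEMA = "http://schema.org/"

# Hand-picked examples; assumed for illustration only.
OBJECT_PROPERTIES = {SCHEMA + "jobLocation", SCHEMA + "hiringOrganization"}
DATATYPE_PROPERTIES = {SCHEMA + "title", SCHEMA + "addressLocality"}


def property_kind_errors(triples):
    """Flag literals used with object properties and resources used with datatype properties."""
    errors = []
    for s, p, o in triples:
        is_resource = isinstance(o, IRI)
        if p in OBJECT_PROPERTIES and not is_resource:
            errors.append(("literal-for-object-property", s, p))
        elif p in DATATYPE_PROPERTIES and is_resource:
            errors.append(("resource-for-datatype-property", s, p))
    return errors
```

Properties outside both sets are ignored, so unknown terms produce no error in this sketch.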
Conflicting data 
● Mutually-exclusive properties 
○ schema:jobLocation 
+ schema:isRemoteWork true 
● Cardinality violation for functional properties 
with > 1 object 
○ schema:startDate, schema:currency, 
schema:availableVacancies 
● Incompatible class membership inferences 
○ schema:domainIncludes, schema:rangeIncludes 
○ Incompatible class membership: a resource instantiates 2+ 
distinct classes that are not related 
by rdfs:subClassOf.
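A rough Python sketch of the first two kinds of conflict (mutually exclusive properties and functional properties with more than one object). It is an illustration under assumptions, not the validator's SPARQL rules: the error labels and the plain "true" literal for schema:isRemoteWork are chosen for this example. Class membership inference is left out.

```python
from collections import defaultdict

SCHEMA = "http://schema.org/"

# Properties treated as functional, taken from the list above.
FUNCTIONAL = {
    SCHEMA + "startDate",
    SCHEMA + "currency",
    SCHEMA + "availableVacancies",
}


def conflict_errors(triples):
    """Return (rule, subject, predicate) tuples for conflicting statements."""
    # Group distinct objects by (subject, predicate).
    values = defaultdict(set)
    for s, p, o in triples:
        values[(s, p)].add(o)

    errors = []

    # Cardinality: a functional property may have at most one object.
    for s, p in sorted(values):
        if p in FUNCTIONAL and len(values[(s, p)]) > 1:
            errors.append(("cardinality", s, p))

    # Mutual exclusion: a remote job should not also state a job location.
    for s in sorted({s for s, _ in values}):
        remote = "true" in values.get((s, SCHEMA + "isRemoteWork"), set())
        if remote and (s, SCHEMA + "jobLocation") in values:
            errors.append(("mutually-exclusive", s, SCHEMA + "jobLocation"))

    return errors
```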
Datatype violations 
● Detected via regular expressions and casting errors 
raised by XPath datatype constructor functions 
● Date and time formats (xsd:date, 
xsd:duration) 
○ Not conforming to regular expressions 
○ Non-existent dates 
○ Dates from the future 
● Interval limits 
○ Positive integers for schema:availableVacancies
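The date and interval checks above can be approximated in Python as below. This is a simplified sketch: the pattern covers only plain YYYY-MM-DD values (xsd:date also permits a timezone suffix), the function names and return labels are made up for this example, and the validator itself relies on SPARQL with XPath constructor functions instead.

```python
import re
from datetime import date

# Simplified lexical form of xsd:date without a timezone (assumption).
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_date(value, today=None):
    """Return an error label for a date string, or None if it passes."""
    today = today or date.today()
    if not DATE_PATTERN.fullmatch(value):
        return "malformed"        # does not conform to the regular expression
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return "non-existent"     # e.g. 30 February
    if parsed > today:
        return "future"           # date lies after the reference day
    return None


def check_vacancies(value):
    """Return an error label unless value is a positive integer."""
    try:
        number = int(value)
    except ValueError:
        return "not-an-integer"
    return None if number > 0 else "not-positive"
```

Passing today explicitly keeps the future-date check reproducible, for example in tests.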
Invalid codes 
● Based on lookup in code lists enumerating 
every valid code 
● Includes language codes (ISO 639-1) and 
currency codes (ISO 4217)
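A code-list lookup of this kind reduces to a set membership test. The Python sketch below uses a small hand-picked excerpt of each code list, purely for illustration; the validator loads the full lists as SKOS data in named graphs, and the function name is an assumption.

```python
# Tiny excerpts of the real code lists; a full check needs every valid code.
CURRENCY_CODES = frozenset({"CZK", "EUR", "GBP", "USD"})   # ISO 4217
LANGUAGE_CODES = frozenset({"cs", "de", "en", "sk"})       # ISO 639-1


def invalid_codes(values, code_list):
    """Return the values that do not appear in the given code list.

    The comparison is exact, so wrong character case counts as invalid.
    """
    return [value for value in values if value not in code_list]
```

For instance, invalid_codes(["EUR", "usd"], CURRENCY_CODES) reports "usd" because the lookup is case-sensitive.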
Implementation 
Ruby on Rails web application 
backed by a Jena Fuseki SPARQL 1.1 endpoint. 
● Validates both RDFa and HTML5 Microdata 
● Czech and English localization 
● Validation results in HTML or JSON-LD 
● RSpec tests for each validation rule 
● Open source: https://github.com/OPLZZ/job-posting-validator
Demo: bit.ly/broken-job-posting
Preview
Experimental validation 
of a JobPosting corpus 
● 1332 seed URLs from 752 distinct 
pay-level domains obtained via Google 
Custom Search Engine restricted to 
schema:JobPosting 
● Sample of 42 872 web pages obtained 
by crawling seed URLs 
● Each page validated, validation results 
in JSON-LD loaded to Elasticsearch 
for exploration
Most common errors
Datatype property used 
as object property 
Most common path to error: schema:title 
Possible cause: incorrect understanding of 
markup precedence rules: 
<a property="title" href="#title">SEO guru</a> 
[] schema:title <#title> . 
[] schema:title "SEO guru" .
Empty literal value 
Most common path to error: 
schema:addressRegion 
Possible cause: incomplete data used to 
generate HTML from fixed templates 
Less common in manually marked-up HTML
Incorrect character case 
in schema:Postaladdress 
Both RDFa and HTML5 Microdata are case-sensitive. 
Spread across 116 unique PLDs. 
“The default mode of authoring [Schema.org 
markup] is copy and edit.” — R.V. Guha
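One way to detect this kind of error is to compare a term against the known vocabulary terms while ignoring case. The Python sketch below is an assumption-laden illustration: the list of known terms is a short hand-picked excerpt, and whether the validator reports such suggestions is not stated in the slides.

```python
# Short excerpt of Schema.org class names, assumed for this example.
KNOWN_TERMS = ("JobPosting", "Organization", "Place", "PostalAddress")


def suggest_case(term, known=KNOWN_TERMS):
    """Return the correctly cased term if only the character case differs.

    Returns None when the term is already correct or is not known at all.
    """
    if term in known:
        return None
    lowered = term.lower()
    for candidate in known:
        if candidate.lower() == lowered:
            return candidate
    return None
```

With this excerpt, suggest_case("Postaladdress") yields "PostalAddress".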
Object property used 
as datatype property 
Most common path to error: schema:jobLocation 
Common cause: simpler markup without intermediate 
resources 
Simpler markup, literal used directly (error): 
<p property="jobLocation"> 
  Munich 
</p> 
Markup with intermediate resources (expected): 
<p rel="jobLocation"> 
  <p rel="address"> 
    <p property="addressLocality"> 
      Munich 
    </p> 
  </p> 
</p>
Unsuccessful experiments 
Web Data Commons 
● Errors smoothed by extraction to RDF 
● Not suitable as a source of seed URLs: job 
postings disappear quickly 
Veterans Job Bank 
● Data from few PLDs, lacks variety 
● Severe restrictions on automated downloads 
through its API
Questions? 
Acknowledgements: 
The presented research was partially supported by the project 
of Operational Programme Human Resources and Employment 
no. CZ.1.04/5.1.01/77.00440. 
Image credits: 
Check List designed by Arthur Shlain from thenounproject.com 
Puzzle designed by John from thenounproject.com

Data Integration Basics: Merging & Joining Data
 

EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org

  • 1. Validator and preview for the JobPosting data model of Schema.org Jindřich Mynarz Department of Information and Knowledge Engineering, University of Economics, Prague EC-WEB 2014, September 2, 2014
  • 2. Motivation ● Improving usability of vocabularies ● Provide feedback on the use of vocabularies ● Make vocabulary specification executable ● Help ensure basic level of data quality ● Capture application-specific requirements for data in validation rules
  • 3. DámePráci.eu project “Matching jobs with unemployed through semantic data” Data model using Schema.org with an extension for the job market. Application for searching through job postings aggregated from distinct sources: www.damepraci.cz (in Czech)
  • 4. Validation method ● Rule-based, schema-aware validation ● Operates in the RDF data model ● Focuses on semantic errors, beyond well-formed markup ● Partial open world assumption ● Implemented as SPARQL 1.1 CONSTRUCT queries ● Error reporting via SPIN RDF vocabulary
  • 5. Background knowledge schema.org + extension for job market (RDFS) + external enumerations: ● ISO 4217 currency codes (SKOS) ● ISO 639-1 language codes (SKOS) Loaded in separate named graphs that the validation rules can reference.
  • 6. Validation rules ● Data completeness ● Distinction between datatype and object properties ● Conflicting data ● Datatype violations ● Invalid codes
  • 7. Data completeness ● At least 1 instance of schema:JobPosting ● Other type information (class membership, datatypes) left optional ● Empty literals ● Conditionally required data (e.g., compensation + currency)
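The completeness checks on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the validator's SPARQL implementation: triples are plain tuples, the `Literal` wrapper is hypothetical, and `schema:baseSalary` is assumed as the compensation property that requires a currency.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as a literal value."""
    value: str


SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def completeness_errors(triples):
    """Return human-readable completeness errors for (s, p, o) triples."""
    errors = []
    # Rule 1: at least one schema:JobPosting instance must be present.
    if not any(p == RDF_TYPE and o == SCHEMA + "JobPosting" for _s, p, o in triples):
        errors.append("no schema:JobPosting instance")
    # Rule 2: literals must not be empty or whitespace-only.
    for _s, p, o in triples:
        if isinstance(o, Literal) and o.value.strip() == "":
            errors.append(f"empty literal for {p}")
    # Rule 3: compensation requires a currency (property names are assumed).
    with_salary = {s for s, p, _o in triples if p == SCHEMA + "baseSalary"}
    with_currency = {s for s, p, _o in triples if p == SCHEMA + "currency"}
    for s in sorted(with_salary - with_currency):
        errors.append(f"{s}: compensation without currency")
    return errors
```

Calling `completeness_errors([])` reports only the missing `schema:JobPosting` instance, since the other two rules have nothing to inspect.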
  • 8. Distinction between datatype and object properties ● Object properties with literal objects instead of URIs or blank nodes (and vice versa for datatype properties) ● Simpler syntax of datatype properties ○ Avoiding nested objects or difficulties with finding an object's URI ● May be a symptom of incorrectly nested HTML elements
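A minimal Python sketch of this distinction follows. It is an illustration, not the validator's SPARQL rules: the `Literal` wrapper and the two property sets are assumptions made for the example, and a schema-aware validator would presumably derive the property kinds from the vocabulary itself.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as a literal value."""
    value: str


# Illustrative excerpts; not the full Schema.org property lists.
OBJECT_PROPERTIES = {"schema:jobLocation", "schema:hiringOrganization"}
DATATYPE_PROPERTIES = {"schema:title", "schema:addressRegion"}


def property_kind_errors(triples):
    """Flag literals on object properties and resources on datatype properties."""
    errors = []
    for _subject, predicate, obj in triples:
        is_literal = isinstance(obj, Literal)
        if predicate in OBJECT_PROPERTIES and is_literal:
            errors.append((predicate, "object property used as datatype property"))
        elif predicate in DATATYPE_PROPERTIES and not is_literal:
            errors.append((predicate, "datatype property used as object property"))
    return errors
```

In this sketch any object that is not wrapped in `Literal` is treated as a URI or blank node, which keeps the example short at the cost of some type safety.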
  • 9. Conflicting data ● Mutually-exclusive properties ○ schema:jobLocation + schema:isRemoteWork true ● Cardinality violation for functional properties with > 1 object ○ schema:startDate, schema:currency, schema:availableVacancies ● Incompatible class membership inferences ○ schema:domainIncludes, schema:rangeIncludes ○ Incompatible class membership is the instantiation of 2+ distinct classes that are not in an rdfs:subClassOf relation.
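The first two kinds of conflict can be sketched in Python as follows. This is a simplified illustration rather than the validator's own rules: triples are plain string tuples, the boolean is compared as the string `"true"`, and class membership inference is left out.

```python
from collections import defaultdict

# Functional properties named on the slide: at most one object per subject.
FUNCTIONAL = {"schema:startDate", "schema:currency", "schema:availableVacancies"}


def conflict_errors(triples):
    """Return (subject, property, message) tuples for conflicting data."""
    objects = defaultdict(set)
    for s, p, o in triples:
        objects[(s, p)].add(o)

    errors = []
    for (s, p), objs in sorted(objects.items()):
        # Cardinality violation: a functional property with several objects.
        if p in FUNCTIONAL and len(objs) > 1:
            errors.append((s, p, "more than one value"))
        # Mutual exclusion: remote work combined with a job location.
        if (
            p == "schema:isRemoteWork"
            and "true" in objs
            and (s, "schema:jobLocation") in objects
        ):
            errors.append((s, p, "conflicts with schema:jobLocation"))
    return errors
```

Grouping objects per subject and property first means each rule becomes a simple test on a set, which mirrors how a grouped query would count distinct objects.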
  • 10. Datatype violations ● Regular expressions, casting errors of XPath datatype constructor functions ● Date and time formats (xsd:date, xsd:duration) ○ Not conforming to regular expressions ○ Non-existent dates ○ Dates from the future ● Interval limits ○ Positive integers for schema:availableVacancies
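The three date checks and the interval limit can be sketched with the Python standard library. This is an assumption-laden simplification, not the validator's SPARQL: the pattern covers only plain `YYYY-MM-DD` values (xsd:date also allows time zones), and the function names are hypothetical.

```python
import re
from datetime import date

# Simplified lexical form of xsd:date, without time zone or negative years.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_date(value, today):
    """Return an error message for a date string, or None if it passes."""
    if not DATE_PATTERN.fullmatch(value):
        return "malformed date"
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        # Well-formed but impossible, e.g. February 30.
        return "non-existent date"
    if parsed > today:
        return "date from the future"
    return None


def check_vacancies(value):
    """Return an error unless the string is a positive integer."""
    if not re.fullmatch(r"\+?[0-9]+", value) or int(value) < 1:
        return "not a positive integer"
    return None
```

Passing `today` explicitly keeps the future-date rule deterministic, so the same input always yields the same result in tests.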
  • 11. Invalid codes ● Based on lookup in code lists enumerating every valid code ● Includes language codes (ISO 639-1) and currency codes (ISO 4217)
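A code-list lookup of this kind reduces to set membership. The Python sketch below uses tiny excerpts in place of the full ISO lists, and the mapping of `schema:inLanguage` to language codes is an assumption made for illustration.

```python
# Tiny excerpts standing in for the full ISO 4217 and ISO 639-1 lists.
CURRENCY_CODES = frozenset({"CZK", "EUR", "USD"})
LANGUAGE_CODES = frozenset({"cs", "de", "en"})

# Which code list governs which property (property names are assumptions).
CODE_LISTS = {
    "schema:currency": CURRENCY_CODES,
    "schema:inLanguage": LANGUAGE_CODES,
}


def invalid_codes(triples):
    """Return (property, value) pairs whose value is not in its code list."""
    return [
        (predicate, value)
        for _subject, predicate, value in triples
        if predicate in CODE_LISTS and value not in CODE_LISTS[predicate]
    ]
```

The match here is exact and case-sensitive, so `EN` is rejected even though `en` is listed; whether a validator should normalize case first is a separate design decision.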
  • 12. Implementation Ruby on Rails web application backed by Jena Fuseki SPARQL 1.1 endpoint. ● Validates both RDFa and HTML5 Microdata ● Czech and English localization ● Validation results in HTML or JSON-LD ● RSpec tests for each validation rule ● Open source: https://github.com/OPLZZ/job-posting-validator
  • 15. Experimental validation of a JobPosting corpus ● 1332 seed URLs from 752 distinct pay-level domains (PLDs) obtained via Google Custom Search Engine restricted to schema:JobPosting ● Sample of 42 872 web pages obtained by crawling seed URLs ● Each page validated, validation results in JSON-LD loaded into Elasticsearch for exploration
  • 17. Datatype property used as object property Most common path to error: schema:title Possible cause: incorrect understanding of markup precedence rules: <a property="title" href="#title">SEO guru</a> is parsed as [] schema:title <#title> . rather than the intended [] schema:title "SEO guru" .
  • 18. Empty literal value Most common path to error: schema:addressRegion Possible cause: incomplete data used to generate HTML from fixed templates Less common in manually marked-up HTML
  • 19. Incorrect character case in schema:Postaladdress Both RDFa and HTML5 Microdata are case-sensitive. Spread across 116 unique PLDs. “The default mode of authoring [Schema.org markup] is copy and edit.” — R.V. Guha
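One way such a case mismatch could be detected is a case-insensitive lookup against the known vocabulary terms. The Python sketch below is a guess at the general technique, not the validator's actual rule, and the term list is a small illustrative excerpt.

```python
# Illustrative excerpt of Schema.org term names; not the full vocabulary.
KNOWN_TERMS = {"JobPosting", "PostalAddress", "Place", "jobLocation", "addressRegion"}

# Index the terms by their lowercase form for case-insensitive lookup.
_BY_LOWER = {term.lower(): term for term in KNOWN_TERMS}


def suggest_case(term):
    """Return the correctly cased term if only the case differs, else None."""
    if term in KNOWN_TERMS:
        return None  # Already correct, nothing to suggest.
    return _BY_LOWER.get(term.lower())
```

Returning the correctly cased term rather than a bare error lets a validator tell the author exactly what to change.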
  • 20. Object property used as datatype property Most common path to error: schema:jobLocation Common cause: simpler markup without intermediate resources
      Simpler markup: <p property="jobLocation">Munich</p>
      Markup with intermediate resources: <p rel="jobLocation"><p rel="address"><p property="addressLocality">Munich</p></p></p>
  • 21. Unsuccessful experiments Web Data Commons ● Errors smoothed by extraction to RDF ● Not suitable as a source of seed URLs: job postings disappear quickly Veterans Job Bank ● Data from few PLDs, lacks variety ● Severe restrictions on automated downloads through its API
  • 22. Questions? Acknowledgements: The presented research was partially supported by the project of Operational Programme Human Resources and Employment no. CZ.1.04/5.1.01/77.00440. Image credits: Check List designed by Arthur Shlain from thenounproject.com; Puzzle designed by John from thenounproject.com