Validator and preview 
for the JobPosting data model 
of Schema.org 
Jindřich Mynarz 
Department of Information and Knowledge 
Engineering, 
University of Economics, Prague 
EC-WEB 2014, September 2, 2014
Motivation 
● Improve usability of vocabularies 
● Provide feedback on the use of 
vocabularies 
● Make vocabulary specification executable 
● Help ensure basic level of data quality 
● Capture application-specific requirements 
for data in validation rules
DámePráci.eu project 
“Matching jobs with unemployed 
through semantic data” 
Data model using Schema.org with 
an extension for the job market. 
Application for searching through job postings 
aggregated from distinct sources: 
www.damepraci.cz (in Czech)
Validation method 
● Rule-based, schema-aware 
validation 
● Operates in the RDF data model 
● Focuses on semantic errors, beyond well-formed 
markup 
● Partial open world assumption 
● Implemented as SPARQL 1.1 CONSTRUCT 
queries 
● Error reporting via SPIN RDF vocabulary
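The rule-and-report structure above can be pictured with a minimal Python sketch. It is not the validator's code, which uses SPARQL 1.1 CONSTRUCT queries. The tuple-based triples, the `Literal` and `Violation` types, and the function names are assumptions made for illustration; only the spin: property names in the comments come from the SPIN vocabulary.

```python
from typing import NamedTuple

SCHEMA = "http://schema.org/"


class Literal(NamedTuple):
    """A plain RDF literal; IRIs and blank nodes are plain strings here."""
    value: str


class Violation(NamedTuple):
    """Mirrors the shape of a spin:ConstraintViolation resource."""
    root: str   # spin:violationRoot, the resource at fault
    path: str   # spin:violationPath, the property at fault
    label: str  # rdfs:label, a human-readable message


def empty_literals(triples):
    """Rule: report every literal that is empty or only whitespace."""
    return [Violation(s, p, "Empty literal value")
            for s, p, o in triples
            if isinstance(o, Literal) and not o.value.strip()]


def validate(triples, rules):
    """Run each rule over the data and collect all violations."""
    return [v for rule in rules for v in rule(triples)]


data = [
    ("_:job", SCHEMA + "title", Literal("SEO guru")),
    ("_:address", SCHEMA + "addressRegion", Literal("  ")),
]
report = validate(data, [empty_literals])
```

Each rule is an independent function, so rules can be added or tested one at a time, much like separate CONSTRUCT queries.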
Background knowledge 
schema.org 
+ extension for job market (RDFS) 
+ external enumerations: 
● ISO 4217 currency codes (SKOS) 
● ISO 639-1 language codes (SKOS) 
Loaded in separate named graphs that the 
validation rules can reference.
Validation rules 
● Data completeness 
● Distinction between datatype and object 
properties 
● Conflicting data 
● Datatype violations 
● Invalid codes
Data completeness 
● At least 1 instance 
of schema:JobPosting 
● Other type information (class membership, 
datatypes) left optional 
● Empty literals 
● Conditionally required data (e.g., 
compensation + currency)
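Two of the completeness checks above, sketched in Python under stated assumptions: triples are plain tuples, and the compensation property IRI is a placeholder because the slides do not name it. The validator itself expresses such rules in SPARQL.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"

# Placeholder IRI: the slides do not name the compensation property.
COMPENSATION = "http://example.com/compensation"


def has_job_posting(triples):
    """True if at least one resource is typed as schema:JobPosting."""
    return any(p == RDF_TYPE and o == SCHEMA + "JobPosting"
               for _, p, o in triples)


def missing_dependents(triples, if_present, then_required):
    """Subjects that use `if_present` without also using `then_required`."""
    with_trigger = {s for s, p, _ in triples if p == if_present}
    with_dependent = {s for s, p, _ in triples if p == then_required}
    return sorted(with_trigger - with_dependent)


data = [
    ("_:a", RDF_TYPE, SCHEMA + "JobPosting"),
    ("_:a", COMPENSATION, "30000"),
    ("_:b", COMPENSATION, "45000"),
    ("_:b", SCHEMA + "currency", "CZK"),
]
```

Here `missing_dependents(data, COMPENSATION, SCHEMA + "currency")` flags `_:a`, which states a compensation but no currency.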
Distinction between datatype 
and object properties 
● Object properties with literal objects instead 
of URIs or blank nodes (and vice versa for 
datatype properties) 
● Simpler syntax of datatype 
properties 
○ Avoiding nested objects or difficulties with finding an 
object's URI 
● May be a symptom of incorrectly nested 
HTML elements
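A minimal Python sketch of this check, assuming a hand-written classification of two properties. The `Literal` type, the two sets, and the function name are illustrative and not taken from the validator.

```python
from typing import NamedTuple

SCHEMA = "http://schema.org/"


class Literal(NamedTuple):
    """A plain RDF literal; IRIs and blank nodes are plain strings here."""
    value: str


# Assumed classification for illustration only.
DATATYPE_PROPERTIES = {SCHEMA + "title"}
OBJECT_PROPERTIES = {SCHEMA + "jobLocation"}


def misused_properties(triples):
    """Yield (subject, property, problem) for objects of the wrong kind."""
    for s, p, o in triples:
        is_literal = isinstance(o, Literal)
        if p in OBJECT_PROPERTIES and is_literal:
            yield s, p, "object property used as datatype property"
        elif p in DATATYPE_PROPERTIES and not is_literal:
            yield s, p, "datatype property used as object property"
```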
Conflicting data 
● Mutually-exclusive properties 
○ schema:jobLocation 
+ schema:isRemoteWork true 
● Cardinality violation for functional properties 
with > 1 object 
○ schema:startDate, schema:currency, schema:availableVacancies 
● Incompatible class membership inferences 
○ schema:domainIncludes, schema:rangeIncludes 
○ Class membership is incompatible when a resource is inferred to be an instance of 2+ 
distinct classes that are not related by rdfs:subClassOf.
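The first two kinds of conflict can be sketched in Python as follows. This is an illustration, not the validator's SPARQL: triples are plain tuples, and the boolean is simplified to the string "true".

```python
from collections import defaultdict

SCHEMA = "http://schema.org/"
FUNCTIONAL = {
    SCHEMA + "startDate",
    SCHEMA + "currency",
    SCHEMA + "availableVacancies",
}


def cardinality_violations(triples, functional=FUNCTIONAL):
    """Return (subject, property) pairs with more than one distinct object."""
    objects = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            objects[(s, p)].add(o)
    return sorted(key for key, values in objects.items() if len(values) > 1)


def remote_but_located(triples):
    """Subjects with both a job location and isRemoteWork set to true."""
    located = {s for s, p, _ in triples if p == SCHEMA + "jobLocation"}
    remote = {s for s, p, o in triples
              if p == SCHEMA + "isRemoteWork" and o == "true"}
    return sorted(located & remote)
```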
Datatype violations 
● Detected via regular expressions and casting errors 
raised by XPath datatype constructor functions 
● Date and time formats (xsd:date, xsd:duration) 
○ Not conforming to regular expressions 
○ Non-existent dates 
○ Dates from the future 
● Interval limits 
○ Positive integers for schema:availableVacancies
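The date and interval checks above can be approximated in Python. This sketch is simplified: it accepts only the plain YYYY-MM-DD form of xsd:date (no time zone, no years beyond four digits), and it treats any future date as an error, which makes sense only for some properties. The function names are illustrative.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def check_date(value, today):
    """Return an error message for a date string, or None if it is valid."""
    if not DATE_PATTERN.fullmatch(value):
        return "does not match YYYY-MM-DD"
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return "non-existent date"
    if parsed > today:
        return "date from the future"
    return None


def check_positive_integer(value):
    """Return an error message unless the value is an integer above zero."""
    if not INTEGER_PATTERN.fullmatch(value):
        return "not an integer"
    return None if int(value) > 0 else "not a positive integer"
```

Passing `today` explicitly keeps the future-date check deterministic and easy to test.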
Invalid codes 
● Based on lookup in code lists enumerating 
every valid code 
● Includes language codes (ISO 639-1) and 
currency codes (ISO 4217)
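A code list lookup amounts to set membership. The Python sketch below uses tiny excerpts of the two code lists purely for illustration; the validator looks codes up in SKOS code lists loaded into named graphs.

```python
# Tiny excerpts for illustration; real code lists enumerate every valid code.
CURRENCY_CODES = {"CZK", "EUR", "USD"}   # ISO 4217
LANGUAGE_CODES = {"cs", "de", "en"}      # ISO 639-1


def invalid_codes(values, code_list):
    """Return the values that are absent from the code list, in input order."""
    return [v for v in values if v not in code_list]
```

The lookup in this sketch is case-sensitive, so `"usd"` is reported as invalid alongside free-text values such as `"Kč"`.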
Implementation 
Ruby on Rails web application 
backed by Jena Fuseki SPARQL 1.1 endpoint. 
● Validates both RDFa and HTML5 Microdata 
● Czech and English localization 
● Validation results in HTML or JSON-LD 
● RSpec tests for each validation rule 
● Open source: https://github.com/OPLZZ/job-posting-validator
Demo: bit.ly/broken-job-posting
Preview
Experimental validation 
of a JobPosting corpus 
● 1332 seed URLs from 752 distinct 
pay-level domains obtained via Google 
Custom Search Engine restricted to schema:JobPosting 
● Sample of 42 872 web pages obtained 
by crawling seed URLs 
● Each page validated, validation results 
in JSON-LD loaded to Elasticsearch 
for exploration
Most common errors
Datatype property used 
as object property 
Most common path to error: schema:title 
Possible cause: incorrect understanding of 
markup precedence rules (in RDFa, href takes precedence over the element's text): 
<a property="title" href="#title">SEO guru</a> 
Extracted: [] schema:title <#title> . 
Intended: [] schema:title "SEO guru" .
Empty literal value 
Most common path to error: schema:addressRegion 
Possible cause: incomplete data used to 
generate HTML from fixed templates 
Less common in manually marked-up HTML
Incorrect character case: 
schema:Postaladdress instead of schema:PostalAddress 
Both RDFa and HTML5 Microdata are case-sensitive. 
Spread across 116 unique PLDs. 
“The default mode of authoring [Schema.org 
markup] is copy and edit.” — R.V. Guha
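Because the syntaxes are case-sensitive, a term that differs from a vocabulary term only in case is simply unknown. A minimal Python sketch of how such near-misses could be spotted, assuming a small hand-picked set of terms; this illustrates the idea and does not describe the validator.

```python
# A handful of Schema.org terms, listed here only for illustration.
KNOWN_TERMS = {"JobPosting", "PostalAddress", "addressRegion", "jobLocation"}
_BY_LOWERCASE = {term.lower(): term for term in KNOWN_TERMS}


def case_correction(term):
    """Return the correctly cased term if `term` differs only in case, else None."""
    if term in KNOWN_TERMS:
        return None
    return _BY_LOWERCASE.get(term.lower())
```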
Object property used 
as datatype property 
Most common path to error: schema:jobLocation 
Common cause: simpler markup without intermediate resources 
Simpler markup (error): 
<p property="jobLocation"> 
Munich 
</p> 
Markup with intermediate resources: 
<p rel="jobLocation"> 
<p rel="address"> 
<p property="addressLocality"> 
Munich 
</p> 
</p> 
</p>
Unsuccessful experiments 
Web Data Commons 
● Errors smoothed by extraction to RDF 
● Not suitable as a source of seed URLs: job 
postings disappear quickly 
Veterans Job Bank 
● Data from few PLDs, lacks variety 
● Severe restrictions on automated downloads 
through its API
Questions? 
Acknowledgements: 
The presented research was partially supported by the project 
of Operational Programme Human Resources and Employment no. CZ.1.04/5.1.01/77.00440. 
Image credits: 
Check List designed by Arthur Shlain from thenounproject.com 
Puzzle designed by John from thenounproject.com

 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 

EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org

  • 1. Validator and preview for the JobPosting data model of Schema.org Jindřich Mynarz Department of Information and Knowledge Engineering, University of Economics, Prague EC-WEB 2014, September 2, 2014
  • 2. Motivation ● Improving usability of vocabularies ● Provide feedback on the use of vocabularies ● Make vocabulary specification executable ● Help ensure basic level of data quality ● Capture application-specific requirements for data in validation rules
  • 3. DámePráci.eu project “Matching jobs with unemployed through semantic data” Data model using Schema.org with an extension for the job market. Application for searching through job postings aggregated from distinct sources: www.damepraci.cz (in Czech)
  • 4. Validation method ● Rule-based, schema-aware validation ● Operates in the RDF data model ● Focuses on semantic errors, beyond well-formed markup ● Partial open world assumption ● Implemented as SPARQL 1.1 CONSTRUCT queries ● Error reporting via SPIN RDF vocabulary
  • 5. Background knowledge schema.org + extension for job market (RDFS) + external enumerations: ● ISO 4217 currency codes (SKOS) ● ISO 639-1 language codes (SKOS) Loaded in separate named graphs that the validation rules can reference.
  • 6. Validation rules ● Data completeness ● Distinction between datatype and object properties ● Conflicting data ● Datatype violations ● Invalid codes
  • 7. Data completeness ● At least 1 instance of schema:JobPosting ● Other type information (class membership, datatypes) left optional ● Empty literals ● Conditionally required data (e.g., compensation + currency)
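
As a rough illustration of two of the completeness checks above (empty literals and conditionally required data), here is a minimal Python sketch. It is not the validator's SPARQL implementation: triples are modelled as plain tuples, every object is treated as a literal, and the property names `schema:baseSalary` and `schema:salaryCurrency` are assumptions chosen for the example.

```python
# Triples are (subject, property, object) tuples; objects are literal strings.
# The two property names below are assumptions made for this example only.
COMPENSATION = "schema:baseSalary"
CURRENCY = "schema:salaryCurrency"


def empty_literals(triples):
    """Return (subject, property) pairs whose literal is empty or whitespace."""
    return [(s, p) for s, p, o in triples if isinstance(o, str) and not o.strip()]


def missing_currency(triples):
    """Return subjects that state a compensation but no currency."""
    with_compensation = {s for s, p, _ in triples if p == COMPENSATION}
    with_currency = {s for s, p, _ in triples if p == CURRENCY}
    return sorted(with_compensation - with_currency)


triples = [
    ("_:job1", "schema:title", "SEO guru"),
    ("_:job1", "schema:baseSalary", "30000"),
    ("_:job2", "schema:title", "   "),
]
print(empty_literals(triples))    # [('_:job2', 'schema:title')]
print(missing_currency(triples))  # ['_:job1']
```
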
  • 8. Distinction between datatype and object properties ● Object properties with literal objects instead of URIs or blank nodes (and vice versa for datatype properties) ● Simpler syntax of datatype properties ○ Avoiding nested objects or difficulties with finding an object's URI ● May be a symptom of incorrectly nested HTML elements
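
The check described on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the validator's code: the `Literal` and `Resource` types and the two hard-coded property sets are hypothetical stand-ins for the RDF terms and the schema knowledge a real rule would consult.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A literal value, e.g. plain text."""
    value: str


class Resource(NamedTuple):
    """A URI or blank node."""
    id: str


# Hard-coded for illustration only (assumption).
OBJECT_PROPERTIES = {"schema:jobLocation", "schema:hiringOrganization"}
DATATYPE_PROPERTIES = {"schema:title", "schema:datePosted"}


def misused_properties(triples):
    """Report properties whose object is of the wrong kind."""
    errors = []
    for s, p, o in triples:
        if p in OBJECT_PROPERTIES and isinstance(o, Literal):
            errors.append((s, p, "object property used as datatype property"))
        elif p in DATATYPE_PROPERTIES and isinstance(o, Resource):
            errors.append((s, p, "datatype property used as object property"))
    return errors


triples = [
    ("_:job1", "schema:jobLocation", Literal("Munich")),
    ("_:job1", "schema:title", Resource("#title")),
    ("_:job1", "schema:datePosted", Literal("2014-09-02")),
]
for error in misused_properties(triples):
    print(error)
```
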
  • 9. Conflicting data ● Mutually-exclusive properties ○ schema:jobLocation + schema:isRemoteWork true ● Cardinality violation for functional properties with > 1 object ○ schema:startDate, schema:currency, schema:availableVacancies ● Incompatible class membership inferences ○ schema:domainIncludes, schema:rangeIncludes ○ Incompatible class membership is the instantiation of 2+ distinct classes that are not in an rdfs:subClassOf relation.
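
The first two kinds of conflict listed on this slide can be sketched in Python. The sketch assumes triples are plain tuples and that the boolean value appears as the string "true"; it stands in for, and does not reproduce, the SPARQL rules.

```python
from collections import defaultdict

# Functional properties named on the slide.
FUNCTIONAL = {"schema:startDate", "schema:currency", "schema:availableVacancies"}


def cardinality_violations(triples):
    """Return (subject, property) pairs where a functional property has > 1 object."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in FUNCTIONAL:
            values[(s, p)].add(o)
    return sorted(key for key, objects in values.items() if len(objects) > 1)


def remote_with_location(triples):
    """Return subjects that are remote work yet also state a job location."""
    located = {s for s, p, _ in triples if p == "schema:jobLocation"}
    remote = {s for s, p, o in triples if p == "schema:isRemoteWork" and o == "true"}
    return sorted(located & remote)


triples = [
    ("_:job1", "schema:startDate", "2014-10-01"),
    ("_:job1", "schema:startDate", "2014-11-01"),
    ("_:job1", "schema:currency", "CZK"),
    ("_:job2", "schema:jobLocation", "_:place1"),
    ("_:job2", "schema:isRemoteWork", "true"),
]
print(cardinality_violations(triples))  # [('_:job1', 'schema:startDate')]
print(remote_with_location(triples))    # ['_:job2']
```
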
  • 10. Datatype violations ● Regular expressions, casting errors of XPath datatype constructor functions ● Date and time formats (xsd:date, xsd:duration) ○ Not conforming to regular expressions ○ Non-existent dates ○ Dates from the future ● Interval limits ○ Positive integers for schema:availableVacancies
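
The three date checks and the interval limit can be approximated in Python as below. This is a simplified sketch: the pattern covers only the plain `YYYY-MM-DD` form (no time zones or negative years, which xsd:date permits), and the function names and error labels are hypothetical.

```python
import re
from datetime import date

# Simplified pattern for the YYYY-MM-DD form of xsd:date (assumption).
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_date(value, today=None):
    """Return an error label for a date string, or None if it passes."""
    today = today or date.today()
    if not DATE_PATTERN.fullmatch(value):
        return "malformed date"
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return "non-existent date"
    if parsed > today:
        return "date from the future"
    return None


def check_vacancies(value):
    """Return an error label unless the value is a positive integer."""
    if not re.fullmatch(r"[0-9]+", value) or int(value) < 1:
        return "not a positive integer"
    return None


reference_day = date(2014, 9, 2)
print(check_date("2.9.2014", reference_day))    # malformed date
print(check_date("2014-02-30", reference_day))  # non-existent date
print(check_date("2015-01-01", reference_day))  # date from the future
print(check_vacancies("0"))                     # not a positive integer
```

Passing `today` explicitly keeps the future-date check reproducible in tests.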
  • 11. Invalid codes ● Based on lookup in code lists enumerating every valid code ● Includes language codes (ISO 639-1) and currency codes (ISO 4217)
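
A code-list lookup of this kind reduces to a set membership test. The Python sketch below uses a few sample codes only, not the full ISO lists, and assumes an exact, case-sensitive comparison.

```python
# Small excerpts for illustration; real code lists enumerate every valid code.
CURRENCY_CODES = frozenset({"CZK", "EUR", "USD"})  # ISO 4217 samples
LANGUAGE_CODES = frozenset({"cs", "de", "en"})     # ISO 639-1 samples


def invalid_codes(values, code_list):
    """Return the values that do not appear in the given code list."""
    return [v for v in values if v not in code_list]


print(invalid_codes(["CZK", "Kč", "eur"], CURRENCY_CODES))  # ['Kč', 'eur']
print(invalid_codes(["cs", "cz"], LANGUAGE_CODES))          # ['cz']
```
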
  • 12. Implementation Ruby on Rails web application backed by a Jena Fuseki SPARQL 1.1 endpoint. ● Validates both RDFa and HTML5 Microdata ● Czech and English localization ● Validation results in HTML or JSON-LD ● RSpec tests for each validation rule ● Open source: https://github.com/OPLZZ/job-posting-validator
  • 15. Experimental validation of a JobPosting corpus ● 1332 seed URLs from 752 distinct pay-level domains obtained via Google Custom Search Engine restricted to schema:JobPosting ● Sample of 42 872 web pages obtained by crawling seed URLs ● Each page validated, validation results in JSON-LD loaded into Elasticsearch for exploration
  • 17. Datatype property used as object property Most common path to error: schema:title Possible cause: incorrect understanding of markup precedence rules: <a property="title" href="#title">SEO guru</a> yields [] schema:title <#title> . instead of the intended [] schema:title "SEO guru" .
  • 18. Empty literal value Most common path to error: schema:addressRegion Possible cause: incomplete data used to generate HTML from fixed templates Less common in manually marked-up HTML
  • 19. Incorrect character case in schema:Postaladdress Both RDFa and HTML5 Microdata are case-sensitive. Spread across 116 unique PLDs. “The default mode of authoring [Schema.org markup] is copy and edit.” — R.V. Guha
  • 20. Object property used as datatype property Most common path to error: schema:jobLocation Common cause: simpler markup without intermediate resources: <p property="jobLocation">Munich</p> instead of <p rel="jobLocation"><p rel="address"><p property="addressLocality">Munich</p></p></p>
  • 21. Unsuccessful experiments Web Data Commons ● Errors smoothed by extraction to RDF ● Not suitable as a source of seed URLs: job postings disappear quickly Veterans Job Bank ● Data from few PLDs, lacks variety ● Severe restrictions on automated downloads through its API
  • 22. Questions? Acknowledgements: The presented research was partially supported by the project of Operational Programme Human Resources and Employment no. CZ.1.04/5.1.01/77.00440. Image credits: Check List designed by Arthur Shlain from thenounproject.com Puzzle designed by John from thenounproject.com