SlideShare a Scribd company logo
Mapping and RDF Dataset Quality Assessment
with RDFUnit SPARQL-based test cases applied to [R2]RML mappings
Test-driven Assessment of [R2]RML
Mappings to Improve Dataset Quality
http://rml.io • http:// rdfunit.aksw.org
Anastasia Dimou1, Dimitris Kontokostas2, Markus Freudenberg2, Ruben Verborgh1,
Jens Lehmann2, Erik Mannens1, Sebastian Hellmann2, Rik Van de Walle1
…WHERE { ?resource %%P1%% ?c.
FILTER (DATATYPE(?c) != %%D1%%) }
<#Mapping>
rr:predicateObjectMap [
rr:predicate foaf:age; rr:objectMap [rml:reference “Age " ] ].
[R2]RML Mapping
Name Surname Age
Anastasia Dimou 12
Dimitris Kontokostas 15
http://example.com/
{Name}_{Surname}
foaf:Project
foaf:age "Age"
xsd:floathttp://example.com/
Anastasia_Dimou
foaf:Project
foaf:age "12"
xsd:float
http://example.com/
Dimitris_Kontokostas
foaf:Project
foaf:age "15"
xsd:float
RDF Dataset
for [R2]RML mappingsfor RDF dataset
… WHERE { ?resource foaf:age ?c.
FILTER (DATATYPE(?c) != xsd:int) }
… WHERE {
?resource rr:predicateObjectMap ?poMap.
?poMap rr:predicate %%P1%%;
rr:objectMap ?objM.
?objM rr:datatype ?c.
FILTER (?c != %%D1%%) }
size time #fail test cases #violations
DBPedia EN 115K 11s 1 160
DBPedia NL 53K 6s 1 124
DBLP 368 12s 2 8
Quality Assessment results for [R2]RML mappings
size time #fail test cases #violations
DBPedia EN 62M 16h 1,128 3.2M
DBPedia NL 21M 1.5h 683 815K
DBLP 12M 12h 7 8.1M
Quality Assessment results for RDF dataset
The number of errors grows
linearly in function of the number of iterations and
geometrically if multiple references and returned values
Shift the Quality Assessment from the RDF dataset
also to the Mapping definitions that generate the dataset
The number of errors grows
linearly in function of the number of applicable Term Maps.
The time to execute the assessment is significantly reduced
Same violations appear repeatedly over distinct entities.
Quality Assessment (QA)
(-) is not incorporated into the publishing workflow
(-) is performed by third parties
Dataset Quality Assessment (DQA)
(-) results are not incorporated into the dataset
(-) adjustments are manually, rarely applied & not at the root
(-) adjustments are overwritten when
a new version of the original data is mapped & published
Mapping Quality Assessment (MQA)
(+) discover violations before they are even generated
(+) specify the origin of the violation
(+) structural adjustments can still be applied easily
(+) reduce the effort required to act upon QA results
(+) prevents same violations to appear repeatedly
within the dataset and over distinct entities
(+) prevents the generation of low quality RDF datasets
(+) uniform Mapping and Dataset Quality Assessment
The violations derive from mapping definitions
that specify how the RDF dataset will be generated
1Ghent University – iMinds – Multimedia Lab 2AKSW, University of Leipzig

More Related Content

What's hot

Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
Dr Muhammad Adnan
 
1.3 introduction to R language, importing dataset in r, data exploration in r
1.3 introduction to R language, importing dataset in r, data exploration in r1.3 introduction to R language, importing dataset in r, data exploration in r
1.3 introduction to R language, importing dataset in r, data exploration in r
Simple Research
 
Graphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platformsGraphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platforms
Graph-TA
 
Pricipal Component Analysis Using R
Pricipal Component Analysis Using RPricipal Component Analysis Using R
Pricipal Component Analysis Using RKarthi Keyan
 
Making the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the worldMaking the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the world
Aileen Buckley
 
Making the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the worldMaking the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the world
Aileen Buckley
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
Sergey Enin
 
CBS CEDAR Presentation
CBS CEDAR PresentationCBS CEDAR Presentation
CBS CEDAR Presentation
Albert Meroño-Peñuela
 
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
Rich Harris
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysisAbhiram Kanigolla
 

What's hot (10)

Real Time Geodemographics
Real Time GeodemographicsReal Time Geodemographics
Real Time Geodemographics
 
1.3 introduction to R language, importing dataset in r, data exploration in r
1.3 introduction to R language, importing dataset in r, data exploration in r1.3 introduction to R language, importing dataset in r, data exploration in r
1.3 introduction to R language, importing dataset in r, data exploration in r
 
Graphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platformsGraphalytics: A big data benchmark for graph processing platforms
Graphalytics: A big data benchmark for graph processing platforms
 
Pricipal Component Analysis Using R
Pricipal Component Analysis Using RPricipal Component Analysis Using R
Pricipal Component Analysis Using R
 
Making the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the worldMaking the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the world
 
Making the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the worldMaking the most of raster data from the arcgis living atlas of the world
Making the most of raster data from the arcgis living atlas of the world
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
CBS CEDAR Presentation
CBS CEDAR PresentationCBS CEDAR Presentation
CBS CEDAR Presentation
 
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
An Introduction to Mapping, GIS and Spatial Modelling in R (presentation)
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysis
 

Viewers also liked

Towards an Interface for User-Friendly Linked Data Generation Administration
Towards an Interface for User-Friendly Linked Data Generation AdministrationTowards an Interface for User-Friendly Linked Data Generation Administration
Towards an Interface for User-Friendly Linked Data Generation Administration
andimou
 
Extraction and Semantic Annotation of Workshop Proceedings in HTML using RML
Extraction and Semantic Annotation of Workshop Proceedings in HTML using RMLExtraction and Semantic Annotation of Workshop Proceedings in HTML using RML
Extraction and Semantic Annotation of Workshop Proceedings in HTML using RML
andimou
 
A Generic Language for Integrated RDF Mappings of Heterogeneous Data
A Generic Language for Integrated RDF Mappings of Heterogeneous DataA Generic Language for Integrated RDF Mappings of Heterogeneous Data
A Generic Language for Integrated RDF Mappings of Heterogeneous Data
andimou
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Language
andimou
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Quality
andimou
 
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
andimou
 
DBpedia Mappings Quality Assessment
DBpedia Mappings Quality AssessmentDBpedia Mappings Quality Assessment
DBpedia Mappings Quality Assessment
andimou
 
Mappings Validation
Mappings ValidationMappings Validation
Mappings Validation
andimou
 

Viewers also liked (8)

Towards an Interface for User-Friendly Linked Data Generation Administration
Towards an Interface for User-Friendly Linked Data Generation AdministrationTowards an Interface for User-Friendly Linked Data Generation Administration
Towards an Interface for User-Friendly Linked Data Generation Administration
 
Extraction and Semantic Annotation of Workshop Proceedings in HTML using RML
Extraction and Semantic Annotation of Workshop Proceedings in HTML using RMLExtraction and Semantic Annotation of Workshop Proceedings in HTML using RML
Extraction and Semantic Annotation of Workshop Proceedings in HTML using RML
 
A Generic Language for Integrated RDF Mappings of Heterogeneous Data
A Generic Language for Integrated RDF Mappings of Heterogeneous DataA Generic Language for Integrated RDF Mappings of Heterogeneous Data
A Generic Language for Integrated RDF Mappings of Heterogeneous Data
 
Mapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping LanguageMapping Hierarchical Sources into RDF using the RML Mapping Language
Mapping Hierarchical Sources into RDF using the RML Mapping Language
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Quality
 
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
 
DBpedia Mappings Quality Assessment
DBpedia Mappings Quality AssessmentDBpedia Mappings Quality Assessment
DBpedia Mappings Quality Assessment
 
Mappings Validation
Mappings ValidationMappings Validation
Mappings Validation
 

Similar to Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality

microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
BAINIDA
 
Stream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsStream processing: The Matrix Revolutions
Stream processing: The Matrix Revolutions
RomanaPernischov
 
Rangeland hydrology and erosion model
Rangeland hydrology and erosion modelRangeland hydrology and erosion model
Rangeland hydrology and erosion model
Soil and Water Conservation Society
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
Vijay Srinivas Agneeswaran, Ph.D
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
Sascha Dittmann
 
Mastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisMastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and Analysis
Teradata Aster
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
LDBC council
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
Impetus Technologies
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
Giorgos Santipantakis
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
Carol McDonald
 
seminar100326a.pdf
seminar100326a.pdfseminar100326a.pdf
seminar100326a.pdf
ShrutiPanda12
 
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
MongoDB
 
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015Ioan Toma
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
University of Washington
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
Debraj GuhaThakurta
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
Alejandro Llaves
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
Alejandro Llaves
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide training
Spark Summit
 
1-5 ADS Notes.pdf
1-5 ADS Notes.pdf1-5 ADS Notes.pdf
1-5 ADS Notes.pdf
RameshBabuKellamapal
 

Similar to Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality (20)

microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
Stream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsStream processing: The Matrix Revolutions
Stream processing: The Matrix Revolutions
 
Rangeland hydrology and erosion model
Rangeland hydrology and erosion modelRangeland hydrology and erosion model
Rangeland hydrology and erosion model
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
 
Mastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisMastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and Analysis
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
seminar100326a.pdf
seminar100326a.pdfseminar100326a.pdf
seminar100326a.pdf
 
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
 
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide training
 
1-5 ADS Notes.pdf
1-5 ADS Notes.pdf1-5 ADS Notes.pdf
1-5 ADS Notes.pdf
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 

Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality

  • 1. Mapping and RDF Dataset Quality Assessment with RDFUnit SPARQL-based test cases applied to [R2]RML mappings Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality http://rml.io • http:// rdfunit.aksw.org Anastasia Dimou1, Dimitris Kontokostas2, Markus Freudenberg2, Ruben Verborgh1, Jens Lehmann2, Erik Mannens1, Sebastian Hellmann2, Rik Van de Walle1 …WHERE { ?resource %%P1%% ?c. FILTER (DATATYPE(?c) != %%D1%%) } <#Mapping> rr:predicateObjectMap [ rr:predicate foaf:age; rr:objectMap [rml:reference “Age " ] ]. [R2]RML Mapping Name Surname Age Anastasia Dimou 12 Dimitris Kontokostas 15 http://example.com/ {Name}_{Surname} foaf:Project foaf:age "Age" xsd:floathttp://example.com/ Anastasia_Dimou foaf:Project foaf:age "12" xsd:float http://example.com/ Dimitris_Kontokostas foaf:Project foaf:age "15" xsd:float RDF Dataset for [R2]RML mappingsfor RDF dataset … WHERE { ?resource foaf:age ?c. FILTER (DATATYPE(?c) != xsd:int) } … WHERE { ?resource rr:predicateObjectMap ?poMap. ?poMap rr:predicate %%P1%%; rr:objectMap ?objM. ?objM rr:datatype ?c. FILTER (?c != %%D1%%) } size time #fail test cases #violations DBPedia EN 115K 11s 1 160 DBPedia NL 53K 6s 1 124 DBLP 368 12s 2 8 Quality Assessment results for [R2]RML mappings size time #fail test cases #violations DBPedia EN 62M 16h 1,128 3.2M DBPedia NL 21M 1.5h 683 815K DBLP 12M 12h 7 8.1M Quality Assessment results for RDF dataset The number of errors grows linearly in function of the number of iterations and geometrically if multiple references and returned values Shift the Quality Assessment from the RDF dataset also to the Mapping definitions that generate the dataset The number of errors grows linearly in function of the number of applicable Term Maps. The time to execute the assessment is significantly reduced Same violations appear repeatedly over distinct entities. Quality Assessment (QA) (-) is not incorporated into the publishing workflow (-) is performed by third parties Dataset Quality Assessment (DQA) (-) results are not incorporated into the dataset (-) adjustments are manually, rarely applied & not at the root (-) adjustments are overwritten when a new version of the original data is mapped & published Mapping Quality Assessment (MQA) (+) discover violations before they are even generated (+) specify the origin of the violation (+) structural adjustments can still be applied easily (+) reduce the effort required to act upon QA results (+) prevents same violations to appear repeatedly within the dataset and over distinct entities (+) prevents the generation of low quality RDF datasets (+) uniform Mapping and Dataset Quality Assessment The violations derive from mapping definitions that specify how the RDF dataset will be generated 1Ghent University – iMinds – Multimedia Lab 2AKSW, University of Leipzig