Nikolaos Konstantinou
Nikos Houssos
Anastasia Manta
Exposing Bibliographic Information as Linked
Open Data using Standards-based Mappings:
Methodology and Results
3rd International Conference on Integrated Information (IC-ININFO’13)
Prague, Czech Republic, September 5-9, 2013
National Technical University of Athens
School of Electrical and Computer Engineering
Multimedia, Communications & Web Technologies
09-Sep-13
Introduction
 Linked Open Data (LOD) paradigm constantly
gaining worldwide acceptance
 Examples in various domains include:
 Government data
 http://www.data.gov.uk
 Financial data
 http://www.openspending.org
 News data
 http://www.guardian.co.uk/data
 Cultural heritage
 http://www.europeana.eu
 Bibliographic information
 http://data.ekt.gr
Image source: http://lod-cloud.net
Why Linked Open Data (LOD)?
 Mature technological background
 W3C Recommendations, i.e. Web standards
 RDF, OWL, SPARQL, R2RML, but also HTTP, XML, etc.
 LOD benefits (indicatively)
 Integration
 With data models from other domains
 Expressiveness
 In describing information
 Query answering
 Graphs: beyond keyword-based searches
The EKT case (1/3)
 National Documentation Centre (EKT)
 Part of the National Hellenic Research Foundation
(NHRF)
 Mission-critical digital preservation
 Numerous repositories, maintained by teams of
software engineers, librarians and domain experts
 A living organism is created around these repositories
 Problem statement: How to benefit from semantic
technologies while:
 Keeping existing practices unaltered (as far as possible)
 Respecting nationwide responsibility
 Ensuring viability and durability of the result
The EKT case (2/3)
 The national archive of PhD theses
(http://phdtheses.ekt.gr)
 29,284 theses
 21,793 full text records
 35,925 downloads from 68 countries
 14,742 registered users from 97 countries
 173,610 online views
 The Helios repository (http://helios-eie.ekt.gr)
 5,735 records by researchers
affiliated with the NHRF
 1,930 full text records
 700 videos
The EKT case (3/3)
 Suggested methodology and approach
 Maintain LOD repositories side-by-side with existing
bibliographic content repositories
 Respect standards to the maximum degree possible
 Regarding technologies and vocabularies involved
 Use open-source tools
 R2RML Parser
 Export database contents as RDF
 Biblio-Transformation-Engine (BTE)
 Process authority files
The R2RML Parser (1/3)
 An R2RML implementation
 A tool that can export relational database contents
as RDF graphs, based on an R2RML mapping
document
 See http://www.w3.org/2001/sw/wiki/R2RML_Parser
 R2RML
 RDB to RDF Mapping Language
 W3C Recommendation, as of Sept. 2012
 Reusable mapping definitions
 Supported by numerous tools
 db2triples, D2RQ, Capsenta's Ultrawrap, OpenLink's Virtuoso, etc.
The R2RML Parser (2/3)
 Command-line tool
 Fully written in Java
 Open-source
 Publicly available at
https://github.com/nkons/r2rml-parser
 Tested against MySQL and PostgreSQL
 Output can be written in RDF/OWL
 N3, Turtle, N-Triple, TTL, RDF/XML notation
 Relational database (Jena SDB backend)
The R2RML Parser (3/3)
 Covers most of the R2RML constructs
 See https://github.com/nkons/r2rml-parser/wiki
 Allows arbitrary SQL queries to be used as logical
views (rr:sqlQuery construct)
 Allows SQL functions and function nesting
 Allows foreign keys
 Limitations
 No query nesting, union, intersection or difference
 No multiple graphs from a single execution
 No support for rr:defaultGraph, rr:graph, rr:graphMap
 Does not offer SPARQL-to-SQL translations
The Big Picture
DSpace field        Values
dc.creator          Kollia, Zoe
                    Sarantopoulou, Evangelia
                    Cefalas, Alciviadis Constantinos
                    Kobe, S.
                    Samardzija, Z.
dc.date             2004
dc.format.extent    379-382
dc.identifier.uri   http://hdl.handle.net/10442/7055
dc.language         eng
dc.publisher        Springer
dc.title            Nanometric size control and treatment of historic paper manuscript and prints with laser light at 157 nm
dc.type             Article
dc.subject          Printmaking and Engraving
Resulting RDF snippet in Turtle syntax
<http://data.ekt.gr/helios/item/10442/7055>
    a dcterms:BibliographicResource ;
    dcterms:creator "Kobe, S." ,
        <http://data.ekt.gr/person/48> ,
        <http://data.ekt.gr/person/14> ,
        "Samardzija, Z." ,
        <http://data.ekt.gr/person/112> ;
    dcterms:date "2004" ;
    dcterms:extent "379-382" ;
    dcterms:identifier "http://hdl.handle.net/10442/7055" ;
    dcterms:language <http://www.lexvo.org/page/iso639-3/eng> ;
    dcterms:publisher "Springer" ;
    dcterms:title "Nanometric size control and treatment of historic paper manuscript and prints with laser light at 157 nm" ;
    dcterms:type "Article" ;
    dcterms:subject <http://id.loc.gov/authorities/classification/NE1-NE978> .
 From DSpace (http://dspace.org) records to RDF
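The record-to-RDF step above can be sketched in a few lines of Python. This is only an illustration, not the R2RML Parser itself: the FIELD_TO_PROPERTY table, the helper names and the choice of N-Triples output are assumptions made for the example, and fields that need linking (such as dc.language) are deliberately skipped.

```python
# Sketch: serialise one DSpace-style record as N-Triples lines.
# FIELD_TO_PROPERTY and the function names are illustrative assumptions.

DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from DSpace field names to Dublin Core terms.
FIELD_TO_PROPERTY = {
    "dc.date": DCTERMS + "date",
    "dc.format.extent": DCTERMS + "extent",
    "dc.identifier.uri": DCTERMS + "identifier",
    "dc.publisher": DCTERMS + "publisher",
    "dc.title": DCTERMS + "title",
    "dc.type": DCTERMS + "type",
}


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def record_to_ntriples(handle, record):
    """Return one N-Triples line per mapped field value, plus the type triple."""
    subject = f"<http://data.ekt.gr/helios/item/{handle}>"
    lines = [f"{subject} <{RDF_TYPE}> <{DCTERMS}BibliographicResource> ."]
    for field, values in record.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # unmapped fields are ignored in this sketch
        for value in values:
            lines.append(f"{subject} <{prop}> {literal(value)} .")
    return lines


record = {
    "dc.date": ["2004"],
    "dc.format.extent": ["379-382"],
    "dc.publisher": ["Springer"],
    "dc.type": ["Article"],
    "dc.language": ["eng"],  # not in the table above, so it is skipped here
}
triples = record_to_ntriples("10442/7055", record)
for line in triples:
    print(line)
```

Keeping the field-to-property table as data rather than code mirrors the idea of a reusable mapping definition: new fields can be exported by extending the table.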
R2RML Mapping Definition Example
@prefix map: <#>.
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix dcterms: <http://purl.org/dc/terms/>.

# Every row of the item view becomes a dcterms:BibliographicResource
map:items
    rr:logicalTable <#item-view>;
    rr:subjectMap [
        rr:template 'http://data.ekt.gr/helios/item/{"handle"}';
        rr:class dcterms:BibliographicResource;
    ].

# Attach each abstract to the item with the same subject IRI
map:dc-description-abstract
    rr:logicalTable <#dc-description-abstract-view>;
    rr:subjectMap [
        rr:template 'http://data.ekt.gr/helios/item/{"handle"}';
    ];
    rr:predicateObjectMap [
        rr:predicate dcterms:abstract;
        rr:objectMap [ rr:column '"text_value"' ];
    ].

# Logical view: an SQL query returning (handle, text_value) pairs
<#dc-description-abstract-view>
    rr:sqlQuery """
    SELECT h.handle AS handle, mv.text_value AS text_value
    FROM handle AS h, item AS i, metadatavalue AS mv,
         metadataschemaregistry AS msr, metadatafieldregistry AS mfr
    WHERE i.in_archive=TRUE AND
          h.resource_id=i.item_id AND
          h.resource_type_id=2 AND
          msr.metadata_schema_id=mfr.metadata_schema_id AND
          mfr.metadata_field_id=mv.metadata_field_id AND
          mv.text_value is not null AND
          i.item_id=mv.item_id AND
          msr.namespace = 'http://dublincore.org/documents/dcmi-terms/' AND
          mfr.element='description' AND
          mfr.qualifier='abstract' """.
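To make the mapping semantics concrete, the following Python sketch mimics how a processor might evaluate one triples map: run the logical view, expand the subject template for every row, and emit one triple per non-NULL column value. It is a simplified illustration under stated assumptions, not the R2RML Parser's code: the abstract_view table is a stand-in for the DSpace join, the sample rows are invented, and the percent-encoding of template values described in the R2RML specification is left out.

```python
import re
import sqlite3

# Values taken from the mapping above.
TEMPLATE = 'http://data.ekt.gr/helios/item/{"handle"}'
PREDICATE = "http://purl.org/dc/terms/abstract"
OBJECT_COLUMN = "text_value"


def expand_template(template, row):
    """Replace each {"column"} placeholder with that column's value.

    Assumption: no IRI-safe percent-encoding is applied in this sketch.
    """
    return re.sub(
        r'\{"?([^"}]+)"?\}', lambda m: str(row[m.group(1)]), template
    )


# Hypothetical stand-in for the result of the rr:sqlQuery logical view.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute("CREATE TABLE abstract_view (handle TEXT, text_value TEXT)")
conn.executemany(
    "INSERT INTO abstract_view VALUES (?, ?)",
    [("10442/7055", "First abstract."), ("10442/7056", None)],
)

triples = []
for row in conn.execute("SELECT handle, text_value FROM abstract_view"):
    if row[OBJECT_COLUMN] is None:
        continue  # a NULL column value yields no triple
    subject = expand_template(TEMPLATE, row)
    triples.append((subject, PREDICATE, row[OBJECT_COLUMN]))

for triple in triples:
    print(triple)
```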
Biblio-Transformation-Engine (BTE)
 An open-source Java framework
https://code.google.com/p/biblio-transformation-engine/
 Part of the core DSpace distribution (release 3.0)
 Enables importing Items via basic bibliographic
formats
 EndNote, BibTeX, RIS, TSV, CSV
Authority files
 Using BTE, a graph with researcher records is
exported
 Input
 MADS*-based XML
 Output
 MADS/RDF
 Subjects of the form
http://data.ekt.gr/persons/{researcher_id}
* Metadata Authority Description Schema: http://www.loc.gov/standards/mads/
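The authority-file transformation can be pictured with a small Python sketch. BTE itself is a Java framework and its API is not shown here; the code below only illustrates the input and output shapes. The sample record is invented, and reading the researcher id from a mads:identifier element is an assumption about how the source XML is organised.

```python
import xml.etree.ElementTree as ET

MADS_NS = {"mads": "http://www.loc.gov/mads/v2"}
MADSRDF = "http://www.loc.gov/mads/rdf/v1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON_BASE = "http://data.ekt.gr/persons/"

# Invented sample record; the real authority files are not reproduced here.
SAMPLE_XML = """<mads:madsCollection xmlns:mads="http://www.loc.gov/mads/v2">
  <mads:mads>
    <mads:authority>
      <mads:name type="personal">
        <mads:namePart>Doe, Jane</mads:namePart>
      </mads:name>
    </mads:authority>
    <mads:identifier>1</mads:identifier>
  </mads:mads>
</mads:madsCollection>"""


def mads_to_triples(xml_text):
    """Turn each MADS record into a typed, labelled MADS/RDF resource."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.findall("mads:mads", MADS_NS):
        researcher_id = record.findtext("mads:identifier", namespaces=MADS_NS)
        label = record.findtext(
            "mads:authority/mads:name/mads:namePart", namespaces=MADS_NS
        )
        if not researcher_id or not label:
            continue  # incomplete records are skipped in this sketch
        subject = PERSON_BASE + researcher_id.strip()
        triples.append((subject, RDF_TYPE, MADSRDF + "PersonalName"))
        triples.append((subject, MADSRDF + "authoritativeLabel", label.strip()))
    return triples


person_triples = mads_to_triples(SAMPLE_XML)
for triple in person_triples:
    print(triple)
```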
The L in LOD
 Open Data is Linked when it contains links to
other URIs
 Allows the user to discover more things
 In the EKT case, we linked fields
 dc.language to lexvo.org (language-related
concepts)
 E.g. “eng” to http://www.lexvo.org/page/iso639-3/eng
 dc.subject to LCC terms (Library of Congress
Classification)
 E.g. “Printmaking and Engraving” to
http://id.loc.gov/authorities/classification/NE1-NE978
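A minimal Python sketch of the two linking rules follows. The lookup table holds only the single subject example from the slide, and the function names and the "unmatched" list are assumptions for illustration; a production mapping would need a far larger table and manual review of whatever does not match.

```python
# Sketch: replace literal field values with links to external datasets.

LEXVO_PAGE = "http://www.lexvo.org/page/iso639-3/"

# Hypothetical lookup table; only the example from the slide is included.
SUBJECT_TO_LCC = {
    "Printmaking and Engraving":
        "http://id.loc.gov/authorities/classification/NE1-NE978",
}


def link_language(code):
    """Map a three-letter language code to a lexvo.org URI, else None."""
    code = code.strip().lower()
    if len(code) == 3 and code.isascii() and code.isalpha():
        return LEXVO_PAGE + code
    return None


def link_subject(label):
    """Map a subject label to an LCC URI when the table knows it, else None."""
    return SUBJECT_TO_LCC.get(label.strip())


def link_record(record):
    """Return (links, unmatched) for the fields this sketch can link."""
    links, unmatched = {}, []
    for field, linker in (
        ("dc.language", link_language),
        ("dc.subject", link_subject),
    ):
        value = record.get(field)
        if value is None:
            continue
        uri = linker(value)
        if uri is None:
            unmatched.append((field, value))  # left for manual curation
        else:
            links[field] = uri
    return links, unmatched


links, unmatched = link_record(
    {"dc.language": "eng", "dc.subject": "Printmaking and Engraving"}
)
print(links)
print(unmatched)
```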
System Architecture
 Virtuoso-backed quadstore
 Hosts RDF dumps from repository contents
 Integrated query capabilities
 Exposes a SPARQL endpoint and a faceted browser
[Architecture diagram: repository metadata from the Greek PhD theses repository and the NHRF Helios repository passes through mapping definitions into http://data.ekt.gr, which offers a SPARQL endpoint and faceted browsing]
Virtuoso – data.ekt.gr
 SPARQL endpoint
 http://data.ekt.gr/sparql
 Allows arbitrary SPARQL queries on all graphs
 Results in HTML, JSON, RDF/XML, CSV etc.
 Allows programmatic access
 Faceted view
 http://data.ekt.gr/fct
 Full-text search capabilities
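Programmatic access to a SPARQL endpoint usually means an HTTP request carrying the query in a `query` parameter, as defined by the SPARQL Protocol. The Python sketch below only builds such a request and does not send it; the example query and the choice of JSON results via the Accept header are assumptions for illustration, and whether the endpoint is reachable today is not verified here.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://data.ekt.gr/sparql"

# Illustrative query: a handful of item titles.
QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item ?title
WHERE { ?item dcterms:title ?title }
LIMIT 5"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)

# To execute against a live endpoint, one could pass `request` to
# urllib.request.urlopen() and parse the response body with json.load().
```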
Discussion – Benefits (1/2)
 Semantic annotation
 Data is unambiguously interpreted and understood
by humans and software clients
 Query simplification
 Complex SQL queries can be mapped to concepts
SPARQL Query: Article abstracts
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?id ?abstract
FROM <http://data.ekt.gr/helios>
FROM <http://data.ekt.gr/phdtheses>
WHERE {
  ?a rdf:type dcterms:BibliographicResource .
  ?a dcterms:identifier ?id .
  ?a dcterms:abstract ?abstract }
SQL Query: Article abstracts
SELECT h.handle AS handle, mv.text_value AS text_value
FROM handle AS h, item AS i, metadatavalue AS mv,
     metadataschemaregistry AS msr, metadatafieldregistry AS mfr
WHERE i.in_archive=TRUE AND
      h.resource_id=i.item_id AND
      h.resource_type_id=2 AND
      msr.metadata_schema_id=mfr.metadata_schema_id AND
      mfr.metadata_field_id=mv.metadata_field_id AND
      mv.text_value is not null AND
      i.item_id=mv.item_id AND
      msr.namespace = 'http://dublincore.org/documents/dcmi-terms/' AND
      mfr.element='description' AND
      mfr.qualifier='abstract'
Discussion – Benefits (2/2)
 Increased discoverability
 Through interconnections to other datasets
 Reduced effort required for schema modifications
 New concepts can be created without altering the
source schema
 Synthesis
 Integration, fusion, mashups
 Inference
 Reasoning is possible over the result
 Reusability
 Third parties can reuse the data
Discussion – Challenges (1/2)
 Multidisciplinarity
 Computer Science, Library Science
 Contributions from both domains are required
 The technological barrier
 No advanced mapping tools exist yet
 Presence of a technical expert is required
 Result is prone to errors
 Even after the resulting graph is produced
 Lack of validation or automation can let errors or
bad practices go unnoticed
Discussion – Challenges (2/2)
 Concept mismatch
 RDB fields and values may not be exact matches to
RDF concepts and instances
 Identical mappings will not always be present
 Exceptions to general mapping rules
 Automated curation procedures will apply to the
majority but not to all metadata fields and values
 Post-transformation manual interventions will be
required
Synchronous vs. Asynchronous access
 Asynchronous: persistent RDF views
 Data is exposed periodically
 RDF graph is materialized
 Data does not change as frequently as it does in e.g. sensor
or social network data
 More viable option in the case of digital repositories
 Synchronous: transient views
 Real-time SPARQL-to-SQL translation
 RDF data is not materialized (as in SQL views)
 Queries are round-trips to the database
 Higher cost in terms of computational burden
 Small benefit (since data does not change frequently)
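The difference between the two access modes can be shown with a toy Python example. It is a sketch only: the item table, its two rows and the export_triples helper are invented for illustration, and a real synchronous setup would translate SPARQL to SQL rather than re-run a fixed export.

```python
import sqlite3

DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def export_triples(conn):
    """Build the set of title triples from the current database state."""
    rows = conn.execute("SELECT handle, title FROM item")
    return {
        (f"http://data.ekt.gr/helios/item/{handle}", DCTERMS_TITLE, title)
        for handle, title in rows
    }


# Hypothetical miniature repository database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (handle TEXT, title TEXT)")
conn.execute("INSERT INTO item VALUES ('10442/1', 'First item')")

# Asynchronous access: the graph is materialized once, at export time.
materialized = export_triples(conn)

# The repository changes after the export has run.
conn.execute("INSERT INTO item VALUES ('10442/2', 'Second item')")

# Synchronous access: the view is computed when the query arrives,
# at the cost of a database round-trip for every request.
transient = export_triples(conn)

print(len(materialized), len(transient))
```

The materialized graph stays one item behind until the next periodic export, which is acceptable when repository content changes slowly.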
Conclusions – Future Work
 Conclusions
 Balance between
 Experimenting with state-of-the-art technologies
 Initial investment pays off in numerous ways
 Carrying the responsibility of maintaining national archives
 Ensuring the dataset's high value and, most importantly, its viability
 Future work
 Put more effort into R2RML Parser development
 Cover more R2RML functionality, offer more related services
 Improve dataset
 Quantity: Map and export more database fields, and more
datasets as RDF graphs in http://data.ekt.gr
 Quality: Denser links to other datasets
Thank you!
Questions?

 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 

Exposing Bibliographic Information as Linked Open Data using Standards-based Mappings: Methodology and Results

  • 1. Exposing Bibliographic Information as Linked Open Data using Standards-based Mappings: Methodology and Results
      Nikolaos Konstantinou, Nikos Houssos, Anastasia Manta
      3rd International Conference on Integrated Information (IC-ININFO’13), Prague, Czech Republic, September 5-9, 2013
      National Technical University of Athens, School of Electrical and Computer Engineering, Multimedia, Communications & Web Technologies
      09-Sep-13
  • 2. Introduction
      - The Linked Open Data (LOD) paradigm is constantly gaining worldwide acceptance
      - Examples in various domains include:
          - Government data: http://www.data.gov.uk
          - Financial data: http://www.openspending.org
          - News data: http://www.guardian.co.uk/data
          - Cultural heritage: http://www.europeana.eu
          - Bibliographic information: http://data.ekt.gr
      - Image source: http://lod-cloud.net
  • 3. Why Linked Open Data (LOD)?
      - Mature technological background
          - W3C Recommendations, i.e. Web standards
          - RDF, OWL, SPARQL, R2RML, but also HTTP, XML, etc.
      - LOD benefits (indicatively)
          - Integration: with data models from other domains
          - Expressiveness: in describing information
          - Query answering: graphs go beyond keyword-based searches
  • 4. The EKT case (1/3)
      - National Documentation Centre (EKT)
          - Part of the National Hellenic Research Foundation (NHRF)
          - Mission-critical digital preservation
          - Numerous repositories, maintained by teams of software engineers, librarians and domain experts
          - A living organism is created around these repositories
      - Problem statement: how to benefit from semantic technologies while:
          - Keeping existing practices unaltered (as far as possible)
          - Respecting nationwide responsibility
          - Ensuring viability and durability of the result
  • 5. The EKT case (2/3)
      - The national archive of PhD theses (http://phdtheses.ekt.gr)
          - 29,284 theses
          - 21,793 full text records
          - 35,925 downloads from 68 countries
          - 14,742 registered users from 97 countries
          - 173,610 online views
      - The Helios repository (http://helios-eie.ekt.gr)
          - 5,735 records by researchers affiliated with the NHRF
          - 1,930 full text records
          - 700 videos
  • 6. The EKT case (3/3)
      - Suggested methodology and approach
          - Maintain LOD repositories side-by-side with existing bibliographic content repositories
          - Respect standards to the maximum degree possible, regarding the technologies and vocabularies involved
          - Use open-source tools
              - R2RML Parser: export database contents as RDF
              - Biblio-Transformation-Engine (BTE): process authority files
  • 7. The R2RML Parser (1/3)
      - An R2RML implementation
          - A tool that can export relational database contents as RDF graphs, based on an R2RML mapping document
          - See http://www.w3.org/2001/sw/wiki/R2RML_Parser
      - R2RML
          - RDB to RDF Mapping Language
          - W3C Recommendation, as of Sept. 2012
          - Reusable mapping definitions
          - Supported by numerous tools: db2triples, D2RQ, Capsenta's Ultrawrap, OpenLink's Virtuoso, etc.
  • 8. The R2RML Parser (2/3)
      - Command-line tool
      - Fully written in Java
      - Open-source
      - Publicly available at https://github.com/nkons/r2rml-parser
      - Tested against MySQL and PostgreSQL
      - Output can be written in RDF/OWL
          - N3, Turtle, N-Triples, TTL, RDF/XML notation
          - Relational database (Jena SDB backend)
  • 9. The R2RML Parser (3/3)
      - Covers most of the R2RML constructs
          - See https://github.com/nkons/r2rml-parser/wiki
      - Allows arbitrary SQL queries to be used as logical views (rr:sqlQuery construct)
          - Allows SQL functions and function nesting
          - Allows foreign keys
      - Limitations
          - No query nesting, union, intersection or difference
          - No multiple graphs from a single execution
          - No support for rr:defaultGraph, rr:graph, rr:graphMap
          - Does not offer SPARQL-to-SQL translations
  • 10. The Big Picture
      - From DSpace (http://dspace.org) records to RDF
      - DSpace record (field: values)
          dc.creator:        Kollia, Zoe; Sarantopoulou, Evangelia; Cefalas, Alciviadis Constantinos; Kobe, S.; Samardzija, Z.
          dc.date:           2004
          dc.format.extent:  379-382
          dc.identifier.uri: http://hdl.handle.net/10442/7055
          dc.language:       eng
          dc.publisher:      Springer
          dc.title:          Nanometric size control and treatment of historic paper manuscript and prints with laser light at 157 nm
          dc.type:           Article
          dc.subject:        Printmaking and Engraving
      - Resulting RDF snippet in Turtle syntax
          <http://data.ekt.gr/helios/item/10442/7055>
              a dcterms:BibliographicResource ;
              dcterms:creator "Kobe, S." ,
                  <http://data.ekt.gr/person/48> ,
                  <http://data.ekt.gr/person/14> ,
                  "Samardzija, Z." ,
                  <http://data.ekt.gr/person/112> ;
              dcterms:date "2004" ;
              dcterms:extent "379-382" ;
              dcterms:identifier "http://hdl.handle.net/10442/7055" ;
              dcterms:language <http://www.lexvo.org/page/iso639-3/eng> ;
              dcterms:publisher "Springer" ;
              dcterms:title "Nanometric size control and treatment of historic paper manuscript and prints with laser light at 157 nm" ;
              dcterms:type "Article" ;
              dcterms:subject <http://id.loc.gov/authorities/classification/NE1-NE978> .
  • 11. R2RML Mapping Definition Example
          @prefix map: <#>.
          @prefix rr: <http://www.w3.org/ns/r2rml#>.
          @prefix dcterms: <http://purl.org/dc/terms/>.

          map:items
              rr:logicalTable <#item-view>;
              rr:subjectMap [
                  rr:template 'http://data.ekt.gr/helios/item/{"handle"}';
                  rr:class dcterms:BibliographicResource;
              ].

          map:dc-description-abstract
              rr:logicalTable <#dc-description-abstract-view>;
              rr:subjectMap [
                  rr:template 'http://data.ekt.gr/helios/item/{"handle"}';
              ];
              rr:predicateObjectMap [
                  rr:predicate dcterms:abstract;
                  rr:objectMap [ rr:column '"text_value"' ];
              ].

          # SQL query used as the logical view
          <#dc-description-abstract-view>
              rr:sqlQuery """
                  SELECT h.handle AS handle, mv.text_value AS text_value
                  FROM handle AS h, item AS i, metadatavalue AS mv,
                       metadataschemaregistry AS msr, metadatafieldregistry AS mfr
                  WHERE i.in_archive=TRUE
                    AND h.resource_id=i.item_id
                    AND h.resource_type_id=2
                    AND msr.metadata_schema_id=mfr.metadata_schema_id
                    AND mfr.metadata_field_id=mv.metadata_field_id
                    AND mv.text_value is not null
                    AND i.item_id=mv.item_id
                    AND msr.namespace = 'http://dublincore.org/documents/dcmi-terms/'
                    AND mfr.element='description'
                    AND mfr.qualifier='abstract'
              """.
  • 12. Biblio-Transformation-Engine (BTE)
      - An open-source Java framework: https://code.google.com/p/biblio-transformation-engine/
      - Part of the core DSpace distribution (release 3.0)
      - Enables importing Items via basic bibliographic formats
          - Endnote, BibTeX, RIS, TSV, CSV
  • 13. Authority files
      - Using BTE, a graph with researcher records is exported
          - Input: MADS*-based XML
          - Output: MADS/RDF
          - Subjects of the form http://data.ekt.gr/persons/{researcher_id}
      * Metadata Authority Description Schema: http://www.loc.gov/standards/mads/
  • 14. The L in LOD
      - Open Data is Linked when it contains links to other URIs
          - Allows the user to discover more things
      - In the EKT case, we linked the following fields:
          - dc.language to lexvo.org (language-related concepts)
              - E.g. “eng” to http://www.lexvo.org/page/iso639-3/eng
          - dc.subject to LCC terms (Library of Congress Classification)
              - E.g. “Printmaking and Engraving” to http://id.loc.gov/authorities/classification/NE1-NE978
  • 15. System Architecture
      - Virtuoso-backed quadstore
          - Hosts RDF dumps from repository contents
          - Integrated query capabilities
          - Exposes a SPARQL endpoint and a faceted browser
      - [Architecture diagram; labels: Greek PhD theses repository, NHRF Helios repository, repository metadata, mapping definition, http://data.ekt.gr, SPARQL endpoint, faceted browsing]
  • 16. Virtuoso – data.ekt.gr
      - SPARQL endpoint
          - http://data.ekt.gr/sparql
          - Allows arbitrary SPARQL queries on all graphs
          - Results in HTML, JSON, RDF/XML, CSV etc.
          - Allows programmatic access
      - Faceted view
          - http://data.ekt.gr/fct
          - Full-text search capabilities
  • 17. Discussion – Benefits (1/2)
      - Semantic annotation
          - Data is unambiguously interpreted and understood by humans and software clients
      - Query simplification
          - Complex SQL queries can be mapped to concepts
      - SPARQL query: article abstracts
          SELECT ?id ?abstract
          FROM <http://data.ekt.gr/helios>
          FROM <http://data.ekt.gr/phdtheses>
          WHERE {
              ?a rdf:type dcterms:BibliographicResource .
              ?a dcterms:identifier ?id .
              ?a dcterms:abstract ?abstract
          }
      - SQL query: article abstracts
          SELECT h.handle AS handle, mv.text_value AS text_value
          FROM handle AS h, item AS i, metadatavalue AS mv,
               metadataschemaregistry AS msr, metadatafieldregistry AS mfr
          WHERE i.in_archive=TRUE
            AND h.resource_id=i.item_id
            AND h.resource_type_id=2
            AND msr.metadata_schema_id=mfr.metadata_schema_id
            AND mfr.metadata_field_id=mv.metadata_field_id
            AND mv.text_value is not null
            AND i.item_id=mv.item_id
            AND msr.namespace = 'http://dublincore.org/documents/dcmi-terms/'
            AND mfr.element='description'
            AND mfr.qualifier='abstract'
  • 18. Discussion – Benefits (2/2)
      - Increased discoverability
          - Through interconnections to other datasets
      - Reduced effort required for schema modifications
          - New concepts can be created without altering the source schema
      - Synthesis
          - Integration, fusion, mashups
      - Inference
          - Reasoning is possible over the result
      - Reusability
          - Third parties can reuse the data
  • 19. Discussion – Challenges (1/2)
      - Multidisciplinarity
          - Computer Science, Library Science
          - Contributions from both domains are required
      - The technological barrier
          - No advanced mapping tools exist yet
          - Presence of a technical expert is required
      - Result is prone to errors
          - Even after the resulting graph is produced
          - Lack of validation or automation can let errors or bad practices go unnoticed
  • 20. Discussion – Challenges (2/2)
      - Concept mismatch
          - RDB fields and values may not be exact matches to RDF concepts and instances
          - Identical mappings will not always be present
      - Exceptions to general mapping rules
          - Automated curation procedures will apply to the majority but not to all metadata fields and values
          - Post-transformation manual interventions will be required
  • 21. Synchronous vs. Asynchronous access
      - Asynchronous: persistent RDF views
          - Data is exposed periodically
          - RDF graph is materialized
          - Data does not change as frequently as it does in e.g. sensor or social network data
          - More viable option in the case of digital repositories
      - Synchronous: transient views
          - Real-time SPARQL-to-SQL translation
          - RDF data is not materialized (as in SQL views)
          - Queries are round-trips to the database
          - Higher cost in terms of computational burden
          - Small benefit (since data does not change frequently)
  • 22. Conclusions – Future Work
      - Conclusions: a balance between
          - Experimenting with state-of-the-art technologies
              - Initial investment pays off in numerous ways
          - Carrying the responsibility of maintaining national archives
              - Ensuring the dataset's high value and, most importantly, its viability
      - Future work
          - Put more effort into R2RML Parser development
              - Cover more R2RML functionality, offer more related services
          - Improve the dataset
              - Quantity: map and export more database fields, and more datasets as RDF graphs in http://data.ekt.gr
              - Quality: denser links to other datasets
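
Slide 16 notes that the SPARQL endpoint allows programmatic access, and slide 17 gives a query for article abstracts. A minimal Python sketch of such a client follows, using only the standard library. It is an illustration, not the authors' tooling: the helper names (`build_request`, `parse_bindings`, `run_query`), the explicit `PREFIX` lines, and the `LIMIT 10` clause are assumptions added here. The request uses the standard SPARQL Protocol `query` parameter and asks for the W3C SPARQL JSON results format through the `Accept` header. The endpoint URL is taken from the slides and may no longer be reachable.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given on slide 16; it may no longer be online.
ENDPOINT = "http://data.ekt.gr/sparql"

# Query from slide 17. The PREFIX lines and LIMIT are additions for this sketch.
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?id ?abstract
FROM <http://data.ekt.gr/helios>
FROM <http://data.ekt.gr/phdtheses>
WHERE {
  ?a rdf:type dcterms:BibliographicResource .
  ?a dcterms:identifier ?id .
  ?a dcterms:abstract ?abstract
}
LIMIT 10
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(json_text):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(json_text)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the parsed rows (needs network)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(QUERY)` would return one dictionary per result row, with the keys `id` and `abstract`, provided the endpoint is reachable. Keeping request construction and result parsing in separate functions lets both be checked without network access.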