Extraction & Semantic Annotation 
of Workshop Proceedings 
in HTML using RML 
Anastasia Dimou, Miel Vander Sande, Pieter C...

Extraction and Semantic Annotation of Workshop Proceedings in HTML using RML

Despite the significant number of existing tools, incorporating data into the Linked Open Data cloud remains complicated, which discourages data owners from publishing their data as Linked Data. Unlocking the semantics of published data, even when the data owners do not provide them, can help overcome the barriers posed by the low availability of Linked Data and bring us closer to the envisaged Semantic Web. RML, a generic mapping language defined as an extension of R2RML, the W3C standard for mapping relational databases to RDF, offers a uniform way of defining mapping rules for data in heterogeneous formats. In this paper, we present how we adjusted our prototype RML Processor, taking advantage of RML's scalability, to extract and map data of workshop proceedings published in HTML to the RDF data model for the needs of the Semantic Publishing Challenge.

Published in: Technology

  1. Extraction & Semantic Annotation of Workshop Proceedings in HTML using RML

     Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Laurens De Vocht, Ruben Verborgh, Erik Mannens, Rik Van de Walle
     {firstname.surname}@ugent.be
     Ghent University – iMinds – Multimedia Lab
     http://semweb.mmlab.be • iminds.be • http://rml.io

     RML extends the W3C R2RML language, offering a generic way of defining rules for mapping data in heterogeneous formats to RDF. Re-formatting HTML documents is not always possible, especially if mappings occur on-the-fly. RML was extended to map HTML documents using the W3C-standardized CSS3 selectors as the reference formulation in the RML mapping rules. However, CSS3 selectors might not be fine-grained enough, and prior cleansing might not be possible. Hence, Regular Expressions are used for embedded post-extraction processing.

     CEUR-WS HTML document:

         <span class="CEURVOLNR">Vol-1056</span>
         <h1>
           <a href="http://salad2013.linkedservices.org/">
             <span class="CEURVOLACRONYM">SALAD 2013</span>
           </a><br>
           <span class="CEURVOLTITLE">Services and Applications over Linked APIs and Data</span>
         </h1>
         <h3>
           <span class="CEURLOCTIME">Montpellier, France, May 26, 2013</span>.
         </h3>

     Mapping rules (rr:subjectMap, rr:predicateMap, rr:objectMap):

         subject (rml:reference, rr:template):
             rr:template "http://ceur-ws.org/{span.CEURVOLNR}/"
         predicate (rr:constant):
             rr:predicate dcterms:title
             rr:predicate bibo:presentedAt
         object (rml:reference, Regular Expression):
             span.CEURVOLTITLE
             span.CEURLOCTIME, post-processed with "^([\s\w]*),"

     Extraction and post-processing:

         span.CEURVOLNR     ->  Vol-1056  ->  http://ceur-ws.org/Vol-1056/
         span.CEURVOLTITLE  ->  "Services and Applications over Linked APIs and Data"
         span.CEURLOCTIME   ->  "Montpellier, France, May 26, 2013"  ->  "Montpellier"

     RDF representation:

         <http://ceur-ws.org/Vol-1056/>
             dcterms:title "Services and Applications over Linked APIs and Data" ;
             bibo:presentedAt "Montpellier" .
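The extraction and post-processing steps on the slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the RML Processor: it uses the standard library's `html.parser` to imitate `span.CLASS` selectors rather than a real CSS3 selector engine, it does not handle nested spans, and the names `SpanByClass` and `build_triples` are hypothetical. The regular expression is assumed to be `^([\s\w]*),`, a reading of the slide's pattern with the character-class backslashes restored.

```python
import re
from html.parser import HTMLParser

# Fragment of the CEUR-WS page shown on the slide.
HTML_DOC = """
<span class="CEURVOLNR">Vol-1056</span>
<h1><a href="http://salad2013.linkedservices.org/">
<span class="CEURVOLACRONYM">SALAD 2013</span></a><br>
<span class="CEURVOLTITLE">Services and Applications over Linked APIs and Data</span></h1>
<h3><span class="CEURLOCTIME">Montpellier, France, May 26, 2013</span>.</h3>
"""


class SpanByClass(HTMLParser):
    """Collects the text of each <span class="X">, keyed by X.

    A stand-in for a `span.X` CSS selector (assumption: spans are not nested).
    """

    def __init__(self):
        super().__init__()
        self.values = {}
        self._current = None  # class of the span currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("class")
            if self._current:
                self.values.setdefault(self._current, "")

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.values[self._current] += data


def build_triples(html):
    """Extract values, post-process them, and return (s, p, o) tuples."""
    parser = SpanByClass()
    parser.feed(html)
    values = {name: text.strip() for name, text in parser.values.items()}

    # Subject: fill the template http://ceur-ws.org/{span.CEURVOLNR}/
    subject = "http://ceur-ws.org/{}/".format(values["CEURVOLNR"])

    # Object post-processing: keep only the text before the first comma.
    city = re.match(r"^([\s\w]*),", values["CEURLOCTIME"]).group(1)

    return [
        (subject, "dcterms:title", values["CEURVOLTITLE"]),
        (subject, "bibo:presentedAt", city),
    ]


if __name__ == "__main__":
    for s, p, o in build_triples(HTML_DOC):
        print('<{}> {} "{}" .'.format(s, p, o))
```

Keeping extraction (the selector) separate from post-processing (the regular expression) mirrors the slide: the selector returns the whole `CEURLOCTIME` string, and the expression then narrows it to the city.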
