
Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment - Aitor Soroa, Eneko Agirre, Arantxa Otegi and Antoine Isaac

This document is a case study on using the Europeana Data Model (EDM) [Doerr et al., 2010] for representing annotations of Cultural Heritage Objects (CHO). One of the main goals of
the PATHS project is to augment CHOs (items) with information that will enrich the user’s experience. The additional information includes links between items in cultural collections and from items to external sources like Wikipedia. With this goal, the PATHS project has applied Natural Language Processing (NLP) techniques on a subset of the items in Europeana.

Published in: Technology, Education
PATHS: Personalised access to cultural heritage spaces (www.paths-project.eu)

Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment

Authors: Aitor Soroa, Eneko Agirre, Arantxa Otegi and Antoine Isaac
February 10, 2014

Contents
1 Introduction
2 ESEPaths
3 Roadmap for basic conversion of ESEPaths to EDM
4 Using Open Annotation to represent attributes in relations
  4.1 Offsets and selectors
5 Conclusion

1 Introduction

This document is a case study on using the Europeana Data Model (EDM) [Doerr et al., 2010] (http://pro.europeana.eu/edm-documentation) for representing annotations of Cultural Heritage Objects (CHO). One of the main goals of the PATHS project is to augment CHOs (items) with information that will enrich the user’s experience. The additional information includes links between items in cultural collections and from items to external sources like Wikipedia. With this goal, the PATHS project has applied Natural Language Processing (NLP) techniques on a subset of the items in Europeana. Using these techniques, PATHS enriches CH items with the following information [Agirre and de Lacalle, 2011, Otegi et al., 2012]:

• Informativeness score: each item is associated with a value indicating the overall “informativeness” of the item, which is derived from the amount of text in its metadata and is inversely proportional to the number of items where the same text is mentioned.
• Vocabulary terms: vocabulary terms associated with the item. These terms are used for creating the tag clouds shown to the user.
• Event information associated with the item: CHOs often provide event- or activity-related information, such as people walking. We enrich the items by means of a predefined list of words that can be used to refer to events. This data allows answering requests like “give me items with people running” or “items with people playing”.
• Related items: CH items which are semantically related.
• Background links that relate CH items with external resources such as Wikipedia.

When linking a CH item with some external resource, we keep track of the original text snippet from which the association is derived. For instance, an item could be related to a Wikipedia article because of some text snippet of the dc:description field. In such a case we store the reference to the field and the offset as attributes. Keeping track of this information is useful, for instance, for an interface showing those annotations, as it can emphasize the specific snippet and link it to the Wikipedia/dbpedia article when the user points to it. Note, however, that in some cases there is little point in keeping the text, because the enrichment is based on a combination of metadata fields.

The PATHS project started in 2011, and it adopted the representation schema of choice at the time, ESE (http://www.europeana.eu/schemas/ese/). We extended it to a format called ESEPaths to represent the enrichment information just mentioned [Agirre and de Lacalle, 2011, Otegi et al., 2012]. In this document we describe a proposal for representing PATHS enrichments following EDM (Europeana Data Model), the new data model used by Europeana.

The document is structured as follows. We first introduce ESEPaths (Section 2), then the roadmap for a simple conversion to EDM (Section 3). Section 4 explains some possible (advanced) solutions to the problems identified in Section 3. Finally, conclusions are drawn in Section 5.

2 ESEPaths

PATHS has defined a format derived from ESE, called ESEPaths, which adds the enrichment information described above. Specifically, ESEPaths adds the following fields:

• <paths:informativeness> with the informativeness score of the ESE record.
• <paths:vocabulary>, which links the ESE record with vocabulary terms. The element has the following attributes:
  – name: name of the external vocabulary.
  – URI: the address (URI) of the specific category in the vocabulary.
  – confidence: the confidence of the association.
• <paths:event>, which links the ESE record with external events. The element has the following attributes:
  – source: the name of the external resource of the event (for instance, WordNet).
  – canonical_form: the canonical word form of the annotated event.
  – confidence: confidence of the association.
    <record>
      <!-- Existing ESE record -->
      <dc:identifier>http://www.thebowesmuseum.org.uk/10432/</dc:identifier>
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
      <dc:description>This is a random-coursed blue lias ...</dc:description>
      <dcterms:isPartOf>Bowes Museum</dcterms:isPartOf>
      <dc:subject>1670</dc:subject>
      <dc:type>Image</dc:type>
      <europeana:provider>CultureGrid</europeana:provider>
      <europeana:isShownAt>http://www.thebowesmuseum.org.uk/10432/</europeana:isShownAt>
      <europeana:hasObject>false</europeana:hasObject>
      <europeana:country>uk</europeana:country>
      <europeana:type>IMAGE</europeana:type>
      <europeana:language>en</europeana:language>

      <!-- ESEPaths augmentation -->
      <!-- item informativeness -->
      <paths:informativeness>0.7</paths:informativeness>
      <!-- vocabulary mapping -->
      <paths:vocabulary confidence="0.8" source="wikicat"
          URI="http://en.wikipedia.org/wiki/Category:Tower_mills">
        Tower Mills</paths:vocabulary>
      <!-- events -->
      <paths:event confidence="0.8" source="wordnet" canonical_form="play"
          start_offset="120" end_offset="127" field="dc:description">
        playing</paths:event>
      <!-- related items -->
      <paths:related_item confidence="0.8" field="dc:subject" field_no="1" method="LDA">
        http://www.europeana.eu/portal/record/09405t/A6F9A
      </paths:related_item>
      <!-- background links items -->
      <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
          field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
          title="Archaeology">
        http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>

Figure 1: Example of an ESEPaths record
• <paths:related_item>, which links the ESE record with related CH items. The element has the following attributes:
  – confidence: confidence of the association.
  – method: which method produced the association.
  – field: the name of the ESE field whose content suggests the similarity relation.
  – field_no: the position of the ESE field described above (useful in case the ESE record contains more than one field with the same name).
• <paths:background_link>, which links the ESE record with an item from an external resource. The element has the following attributes:
  – source: the name of the external resource.
  – start_offset: the offset (in characters) within the field element where the text anchor begins.
  – end_offset: the offset (in characters) within the field element where the text anchor ends.
  – field: the field of the ESE record where the anchor for this relation is located.
  – confidence: confidence of the association.
  – method: which method produced the association.
  – title: title of the URL which the background link points to.
  – sentiment: polarity of the textual information included in the corresponding link. It has fixed values, namely “pos” for positive results, “neg” for negative and “neu” for neutral.

Figure 1 shows an example of a CH record in ESEPaths. The first lines are just a copy of the original ESE record, whereas the new elements (in the paths namespace) are at the end. Note that identifiers (including URIs) are not real, and have been shortened so that the listing fits on the page.

3 Roadmap for basic conversion of ESEPaths to EDM

As said before, all the data produced by the PATHS project is encoded following the ESE format extended with new elements. However, Europeana is switching from ESE to a new data model, EDM. The main difference between ESE and EDM is that the latter is more expressive and is based on Semantic Web and Linked Data technologies (RDF, ontologies). In this section, we outline the main design we devised for switching from ESEPaths to EDM. The main design criteria we have followed are the following:

1. All PATHS annotations should be properly represented using EDM.
2. It must be possible to retrieve particular PATHS annotations.
3. We should depart as little as possible from standard EDM.
The first criterion states that all PATHS annotations should be described using EDM. As will be shown below, some annotation attributes are difficult to represent following EDM and, as a consequence, a compromise has to be made between describing PATHS annotations in their full richness and using proper EDM concepts and properties for representing them. The second criterion states that the EDM representation has to respect the types of the PATHS annotations. For instance, it has to be possible to retrieve all background links of a particular CH item (as opposed to, say, its related items). Finally, the last criterion states that we should use widely used EDM objects and properties as far as possible. In particular, the EDM representation should use the set of elements described by Europeana’s instructions for providers (http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders), when possible. We now describe the main steps to convert the PATHS annotations to EDM.

From ESEPaths to EDM. We start by describing the resources which are already in Europeana. This includes a Europeana ore:Aggregation resource with information about the digital aggregation process itself (provider, etc.). The resource identifier of the aggregation used in the example is not real; the real one should be provided by Europeana.

    <http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:dataProvider "English Heritage - Viewfinder";
        edm:provider "CultureGrid";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
        edm:rights <http://www.europeana.eu/rights/rr-f/>.

Europeana also provides a proxy for the CHO, attached to this aggregation (again, the resource identifier of the proxy used in the example is not real):

    <http://data.europeana.eu/proxy/provider/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
        # Original ESE data
        dc:creator "Davies, J O";
        dc:date "[2001]";
        dc:title "Stembridge Windmill, High Ham, Somerset";
        dc:description "This is a random-coursed blue lias ...".

We now describe the way to represent the enrichment annotations provided by the PATHS project. We encapsulate these annotations into a new ore:Aggregation. This aggregation resource records a first set of enrichments created by the PATHS project over the original CH object. It includes all relevant information such as provider name and access rights, as well as the annotations referring to the whole CH object, as opposed to enrichment information extracted from some subset of the CH object’s metadata:
    <http://www.paths-project.eu/aggregation/paths/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:provider "PATHS";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:rights <http://www.paths-project.eu/rights/rr-f/>;
        # item informativeness
        paths:informativeness "0.7".

There are some points to be aware of:

• The edm:isShownAt property points to the original record, as the PATHS project does not store any information besides the enrichment of CH items.
• The edm:rights property refers to the annotated information (instead of the rights of the original CH item).
• As said before, the paths:informativeness element pertains to the PATHS aggregation resource because it refers to the CH object as a whole.

Finally, we create a proxy resource for the PATHS aggregation and describe the remaining PATHS annotations within the scope (as properties) of this resource:

    <http://www.paths-project.eu/proxy/paths/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://www.paths-project.eu/aggregation/paths/09405/8F49>;
        # vocabulary mapping
        edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
        # events
        edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
        # related items
        edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
        # background links items
        edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
        # Or <http://dbpedia.org/resource/Archaeology>

Representing various types of enrichment. As shown in the example, the proxy resource relates the CH item with external resources such as vocabulary concepts, events, related items or objects from some external sources (such as Wikipedia or dbpedia). As all the associations are described by means of the high-level edm:isRelatedTo property, it is necessary to properly declare the types of the external objects related to the CH object. Otherwise, there would be no way to discriminate among the different types of PATHS annotations (for instance, there would be no way to specifically retrieve the vocabulary concepts related to a CH object). As a first solution, we can include a separate description for the resources linked to the CH object using SKOS (http://www.w3.org/2004/02/skos). Within PATHS we define the following types of external resources:

• Related CH items are of type paths:RelatedItemConcept, which is in turn a subclass of skos:Concept.
• Vocabulary concepts are of type skos:Concept.
• Events are of type paths:EventConcept, a subclass of skos:Concept. It represents any concept which refers to a (type of) event (such as “run”, “play”, etc.).
• Background links are of type paths:BackgroundLinkConcept, a subclass of skos:Concept.

Note that these classes are meant to offer a way to discriminate among the different types of annotations inside the PATHS project. The classes are therefore loosely defined, in the sense that they do not describe the proper semantic type of the resources. For instance, PATHS can relate a CH object with a dbpedia resource representing a place (New_York), a person (Pablo_Picasso), etc. However, within the scope of the PATHS annotations, the only explicit common type for all those resources is the one inherited from their “background link” status.

Also note that, for the time being, Europeana would not be able to fully ingest data that uses such sub-classes, as they depart from the set of elements described by Europeana’s instructions for providers (http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders). This would require Europeana to handle specialisations of EDM, which is not precisely scheduled at the time of writing.

Based on the above, we also include the following statements in the example:

    <http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
        skos:prefLabel "Tower Mills"@en.

    <http://www.paths-project.eu/event/playing> a paths:EventConcept;
        skos:prefLabel "playing"@en.

    <http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.

    <http://en.wikipedia.org/wiki/Archaeology> a paths:BackgroundLinkConcept;
        skos:prefLabel "Archaeology"@en.

along with the definitions of these new types:

    paths:EventConcept a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Event Concept"@en ;
        skos:definition "A concept describing an Event"@en .

    paths:RelatedItemConcept a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Related Item Concept"@en ;
        skos:definition "A concept describing a CH record"@en .

    paths:BackgroundLinkConcept a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Background Link Concept"@en ;
        skos:definition "A concept describing an object from an external source such as dbpedia"@en .

The above definitions can be put next to the annotation data, in a separate file directly provided to Europeana or others, or even served over the Web in a Linked Data scenario. The whole EDM representation for the item is shown in Figure 2.
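The role of the type declarations with respect to the second design criterion (retrieving particular PATHS annotations) can be illustrated with a minimal Python sketch. It is not part of the PATHS tooling: the triples are held in a plain list of tuples rather than in an RDF store, the function name related_of_type is hypothetical, and no subclass inference is performed, so a query for skos:Concept only returns resources typed directly as skos:Concept.

```python
# Toy in-memory "triple store": (subject, predicate, object) tuples copied
# from the example statements. A real deployment would use an RDF store.
ITEM_PROXY = "http://www.paths-project.eu/proxy/paths/09405/8F49"

TRIPLES = [
    (ITEM_PROXY, "edm:isRelatedTo", "http://www.paths-project.eu/vocabulary/Tower_mills"),
    (ITEM_PROXY, "edm:isRelatedTo", "http://www.paths-project.eu/event/playing"),
    (ITEM_PROXY, "edm:isRelatedTo", "http://www.europeana.eu/portal/record/09405t/A6F9A"),
    (ITEM_PROXY, "edm:isRelatedTo", "http://en.wikipedia.org/wiki/Archaeology"),
    ("http://www.paths-project.eu/vocabulary/Tower_mills", "rdf:type", "skos:Concept"),
    ("http://www.paths-project.eu/event/playing", "rdf:type", "paths:EventConcept"),
    ("http://www.europeana.eu/portal/record/09405t/A6F9A", "rdf:type", "paths:RelatedItemConcept"),
    ("http://en.wikipedia.org/wiki/Archaeology", "rdf:type", "paths:BackgroundLinkConcept"),
]


def related_of_type(triples, proxy, rdf_type):
    """Return the resources related to `proxy` that are declared with `rdf_type`.

    Hypothetical helper: it mimics a query that joins edm:isRelatedTo with
    rdf:type, without any subclass reasoning.
    """
    typed = {s for (s, p, o) in triples if p == "rdf:type" and o == rdf_type}
    return [
        o for (s, p, o) in triples
        if s == proxy and p == "edm:isRelatedTo" and o in typed
    ]


background_links = related_of_type(TRIPLES, ITEM_PROXY, "paths:BackgroundLinkConcept")
events = related_of_type(TRIPLES, ITEM_PROXY, "paths:EventConcept")
```

Without the rdf:type statements, all four edm:isRelatedTo targets would be indistinguishable, which is the situation the type declarations are meant to avoid.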
    <http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:dataProvider "English Heritage - Viewfinder";
        edm:provider "CultureGrid";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
        edm:rights <http://www.europeana.eu/rights/rr-f/>.

    <http://data.europeana.eu/proxy/europeana/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
        # Existing ESE record
        dc:creator "Davies, J O";
        dc:date "[2001]";
        dc:title "Stembridge Windmill, High Ham, Somerset";
        dc:description "This is a random-coursed blue lias ...".

    <http://www.paths-project.eu/aggregation/europeana/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:provider "PATHS";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:rights <http://www.paths-project.eu/rights/rr-f/>;
        # item informativeness
        paths:informativeness "0.7".

    <http://www.paths-project.eu/proxy/europeana/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49>;
        # vocabulary mapping
        edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
        # events
        edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
        # related items
        edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
        # background links items
        edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
        # Or <http://dbpedia.org/resource/Archaeology>

    <http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
        skos:prefLabel "Tower Mills"@en.

    <http://www.paths-project.eu/event/playing> a paths:EventConcept;
        skos:prefLabel "playing"@en.

    <http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.

Figure 2: EDM representation of the ESEPaths example
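The basic conversion outlined in this section could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PATHS implementation: the namespace URI bound to the paths prefix is a placeholder, the function names are hypothetical, and the URI patterns for the PATHS proxy and aggregation simply copy the (not real) identifiers used in the running example. Only related items and background links are converted, because their targets are given directly as element text; vocabulary terms and events are skipped, since the rule for minting their PATHS URIs is not specified here.

```python
import xml.etree.ElementTree as ET

# Assumption: the "paths" namespace URI below is a placeholder for illustration.
NS = {
    "europeana": "http://www.europeana.eu/schemas/ese/",
    "paths": "http://example.org/paths#",
}

# Abridged ESEPaths record based on Figure 1, with namespaces declared.
RECORD = """<record xmlns:europeana="http://www.europeana.eu/schemas/ese/"
        xmlns:paths="http://example.org/paths#">
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <paths:informativeness>0.7</paths:informativeness>
  <paths:related_item confidence="0.8" field="dc:subject" field_no="1" method="LDA">
    http://www.europeana.eu/portal/record/09405t/A6F9A
  </paths:related_item>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>"""


def record_id(record):
    """Extract the local record identifier (e.g. '09405/8F49') from europeana:uri."""
    uri = record.find("europeana:uri", NS).text.strip()
    return uri.split("/record/", 1)[1]


def paths_proxy_turtle(xml_text):
    """Serialise the PATHS proxy of one ESEPaths record as a Turtle fragment."""
    record = ET.fromstring(xml_text)
    rid = record_id(record)
    statements = [
        "a ore:Proxy",
        "ore:proxyFor <http://data.europeana.eu/item/%s>" % rid,
        "ore:proxyIn <http://www.paths-project.eu/aggregation/paths/%s>" % rid,
    ]
    # Each enrichment becomes an edm:isRelatedTo statement; the attributes
    # (confidence, method, offsets) are dropped at this stage.
    for tag in ("related_item", "background_link"):
        for element in record.findall("paths:" + tag, NS):
            statements.append("edm:isRelatedTo <%s>" % element.text.strip())
    subject = "<http://www.paths-project.eu/proxy/paths/%s>" % rid
    return subject + "\n    " + ";\n    ".join(statements) + " ."


turtle = paths_proxy_turtle(RECORD)
print(turtle)
```

Note that the sketch discards the attributes of each relation (confidence, method, field and offsets), which is exactly the limitation of the basic conversion.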
Using specific metadata fields to represent enrichments. Alternatively, if a PATHS enrichment is known to be certain, a new metadata field can be created for the CH object. For instance, if the mapping of the CH record to a vocabulary concept is known to be correct, we can create a new dc:subject field linking the CH record with the appropriate vocabulary concept. Note however that PATHS enrichments are performed automatically, and it is not certain that a concept enrichment derived from a dc:subject would result in a dc:subject relation between the object and the concept. The link to the concept may have been identified based on only a small part of the original field, thus missing some of the original semantics. Thus some manual assessment has to be done in order to promote the annotation into a proper metadata field.

4 Using Open Annotation to represent attributes in relations

The roadmap described in the previous section covers the main aspects of ESEPaths. However, there is a first piece of ESEPaths data that cannot be easily represented in EDM: attributes on relations. EDM inherits RDF’s focus on binary relations, whereas almost all annotations created by the PATHS project have some information associated with them. In particular, many annotations record a confidence value, describing the level of certainty of the automatic method when creating the annotation. A way to overcome this limitation in an RDF-based model would be to reify the annotation into an instance of a dedicated class, and represent the annotation attributes using class properties. For this we can re-use elements from the Open Annotation (OA) model (http://www.openannotation.org/spec/core/). Consider this ESEPaths snippet:

    <record>
      ...
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
          field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
          title="Archaeology">
        http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>

We would create the following oa:Annotation for it:

    background_link1 a oa:Annotation ;
        a paths:BackgroundLinkAnnotation ;
        oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ; # Or <http://dbpedia.org/resource/Archaeology>
        paths:source <http://en.wikipedia.org> ; # Or <dbpedia.org>
        paths:confidence "0.015" .

In the example, the <paths:background_link> annotation has been converted (reified) to an oa:Annotation resource background_link1 of type paths:BackgroundLinkAnnotation, linked by the oa:hasTarget relation to the PATHS proxy resource. The attributes of the original relation are now represented as properties of this new resource.

An alternative to the above approach would be to use the OA “motivation” property for representing the annotation. The OA motivation is meant to represent “the reasons why the Annotation was created, not just the agents involved” (http://www.openannotation.org/spec/core/core.html#Motivations), which fits particularly well with the kind of information we want to represent. The “motivation” approach would lead to the following triples:

    background_link1 a oa:Annotation ;
        oa:motivatedBy paths:backgroundLinkMotivation ;
        oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ; # Or <http://dbpedia.org/resource/Archaeology>
        paths:source <http://en.wikipedia.org> ; # Or <dbpedia.org>
        paths:confidence "0.015" .

In this case, the resource created for the <paths:background_link> element is of type oa:Annotation, and it is also oa:motivatedBy a paths:backgroundLinkMotivation, an instance of skos:Concept.

Both approaches described so far solve the main problem of attaching attributes to relations. They also remove the need to define PATHS-specific relations such as paths:background_link, which would conflict with the metadata fields currently used by EDM. Note however that the properties of the newly defined reified annotations are still specific to PATHS (paths:source, paths:confidence, etc.).

On a side note, using reified concepts for annotation raises the issue of whether we should still keep the proxy-based representation next to it. Because all the PATHS enrichment data is now attached to the reified annotation, the Proxy object described in Section 3 will convey little or no information at all, compared to the original data.

4.1 Offsets and selectors

There is another piece of ESEPaths data which is not currently represented in EDMPaths, namely the field and offset attributes of the relations. Because all PATHS annotations are extracted from the textual content of some metadata field in the original CH record representation, ESEPaths annotations keep track of the original text snippet (called the anchor) which was used to derive the enrichment. In order to track this kind of provenance information, EDM could re-use the selectors from the Open Annotation model (http://www.openannotation.org/spec/core/specific.html#Selectors). For instance, consider the following ESEPaths snippet:

    <record>
      ...
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      ...
      <paths:background_link start_offset="0" end_offset="11" field="dc:subject" ... >
        http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>
  13. 13. It describes an “background link” annotation for the CH object “09405/8F49” which was extracted by analyzing the offsets 0-11of the dc:subject of the original record. These offsets could be translated to the following Open Annotation snippet: background_link1 a oa:Annotation ; oa:hasTarget anchor1 ; oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> . anchor1 a oa:SpecificResource ; oa:hasSource ??? ; # which type has this object ? oa:hasSelector selector1 . selector1 a oa:TextPositionSelector ; oa:start 0 ; oa:end 11 . As noted in the snippet, our problem is then to define the type of the anchor1 resource. This object should represent the dc:subject field of CH record “09405/8F49”, but there is actually no way to describe this with EDM. We thus decided to leave this piece of information out of our proposed solution. 5 Conclusion In this work we describe a method for representing automatically created PATHS annotations into the EDM model. We first describe a simple way for representing the annotations and discuss its benefits and drawbacks. One important weakness of the simple annotation schema lies in its inability to represent attributes of annotations, such as confidence scores. To overcome this limitation we propose a more complex solution that involves reifing the annotation properties as instances of dedicated classes, and representing the annotation attributes using class properties. For this we have re-used elements from the Open Annotation (OA) model. The method presented here, called EDMPaths, is able to properly represent the annotations following EDM, but some information which was previously present following ESE has been left out. In particular, information regarding the particular offset of the anchor that caused the annotation was produced has proven difficult to represent. One of our main design goals has been to avoid creating new non-standard classes and properties when defining EDMPaths. 
We think we have succeed on this particular aspect, mainly by reusing elements from initiatives such as the Open Annotation model. However, the proposal describes some properties which are still specific for the PATHS project. References [Agirre and de Lacalle, 2011] Agirre, E. and de Lacalle, O. L. (2011). D2.1: Processing and representation of content for first prototype. Technical report, PATHS project. [Doerr et al., 2010] Doerr, M., Gradmann, S., Hennicke, S., Isaac, A., Meghini, C., and van de Sompel, H. (2010). The europeana data model (EDM). In World Library and Information Congress: 76th IFLA general conference and assembly, pages 10–15. 11
  14. 14. [Otegi et al., 2012] Otegi, A., Agirre, E., and Soroa, A. (2012). D2.2: Processing and representation of content for second prototype. Technical report, PATHS project.
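
As an illustration of the offsets discussed in Section 4.1, the following Python sketch shows how the start_offset and end_offset attributes of a <paths:background_link> element could be turned into an oa:TextPositionSelector description, and how the anchor text could be recovered from a field value. It is only a sketch under stated assumptions: the function names and the sample field value are hypothetical, the output imitates the abbreviated notation of the Section 4.1 snippet (it is not complete, parseable Turtle), and the end offset is assumed to be exclusive, which matches the 11-character title “Archaeology” but is not stated in the ESEPaths description.

```python
def selector_snippet(name: str, start_offset: int, end_offset: int) -> str:
    """Render an oa:TextPositionSelector in the abbreviated style of Section 4.1.

    `name` is a placeholder resource name such as "selector1" (hypothetical,
    as in the paper's own snippet).
    """
    if not 0 <= start_offset <= end_offset:
        raise ValueError("offsets must satisfy 0 <= start_offset <= end_offset")
    return (
        f"{name} a oa:TextPositionSelector ;\n"
        f"    oa:start {start_offset} ;\n"
        f"    oa:end {end_offset} ."
    )


def anchor_text(field_value: str, start_offset: int, end_offset: int) -> str:
    """Return the anchor, assuming character offsets with an exclusive end."""
    if not 0 <= start_offset <= end_offset <= len(field_value):
        raise ValueError("offsets fall outside the field value")
    return field_value[start_offset:end_offset]


if __name__ == "__main__":
    # Hypothetical dc:subject value; the example record in the paper is not real.
    subject = "Archaeology of Somerset"
    print(selector_snippet("selector1", 0, 11))
    print(anchor_text(subject, 0, 11))
```

The sketch deliberately stops at the selector: as explained in Section 4.1, the object of oa:hasSource remains undefined, because there is no resource standing for an individual metadata field.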
