The document discusses building an ontology-based application to communicate museum content in multiple languages on the Semantic Web. It aims to make cultural heritage accessible to both humans and computers by generating natural language descriptions from semantic data. The application uses Grammatical Framework to linearly multiple museum datasets and ontologies into 15 languages. It addresses challenges in cross-linguistically representing classes, properties, word order, tense, and reference. The system was demonstrated to generate descriptions of paintings from the Louvre museum in English and French.
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
1. Multilingual Access to Cultural
Heritage Content on the Semantic
Web
Dana Dannells, Aarne Ranta, Ramona Enache,
Mariana Damova, Maria Mateva
mariana.damova@mozajka.co
LATECH - ACL’2013
Sofia, August 2013
2. The aim …
To build an ontology-based application for
communication of museum content on the
Semantic Web and make it accessible in 15
languages.
Why ? …
• Semantic Technologies
• Communicating with Semantic Web
easier for computers than for humans
• Cultural heritage on the Web requires easy human computer
interaction
Natural Language
LATECH - ACL’2013
3. • Easy integration of multiple data-sources
– once the schemata of these sources is semantically aligned, the inference
capabilities of the engine supports the interlinking and combination of
the facts from the different sources;
• Easy querying against rich or diverse data schemata
– inference is applied to match semantics of the query to the semantics of
the data, regardless of the vocabulary and the data modeling patterns
used for encoding of the data;
• Based on open standards (W3C)
– RDF, RDFs
– OWL
– SPARQL
Semantic Technologies: Main Characteristics
"The Semantic Web is an extension of the current web in which
information is given well-defined meaning, better enabling computers
and people to work in cooperation.“ [Berners-Lee et al. "The Semantic
Web", Scientific American, May 2001]
LATECH - ACL’2013
4. • graphs published on the web and explorable across servers in a
manner similar to the way the HTML web is navigated
• combining facts and knowledge from different datasets is the
ultimate goal of the Semantic Web
Need of convincing real life use cases demonstrating the benefits of
these technologies
Linked Open Data: Driver of the Semantic Web
MacManus, the Founder and Editor-in-Chief of ReadWriteWeb defined an exemplary test for
the Semantic Web
cities around the world which have Modigliani art works
Cultural Heritage
FactFoge (factfoge.net ) gives the result based on info from 6 datasets)
LATECH - ACL’2013
5. Outline
http://molto-project.eu
• Museum Reason-able View: Interoperable cultural
heritage knowledge bases
• Ontology-based multilingual grammar for retrieving
and generating museum content:
– RDF to NL – well-formed descriptions
– NL to RDF – SPARQL queries linearization
• Cross-language retrieval and representation system
using Semantic Web technology
LATECH - ACL’2013
6. Museum Reason-able View: Interoperable
cultural heritage knowledge base
Domain ontology: CIDOC Conceptual Reference Model (CRM) v. 5.0.1
Application ontologies: Painting ontology and Museum Artifacts Ontology (MAO)
Ontology Classes Properties
CIDOC-CRM 87 130
Painting ontology 197 107
MAO 10 20
Total 836 440
The museum Linked Open Data (LOD)
Gothenburg City Museum (GCM): Only two collections (GSM and GIM)
DBpedia: Larger amount of painting entities
Source Amount of entities
GCM 48
DBpedia 614
Total 662
Loaded into OWLIM semantic repository, inference with respect to OWL Horst
Amounting to 1,987,616 retrievable RDF triples, from 460,367 explicit statements including
label translations in multiple languages
- Paintings from the Gothenburg museum acquired though
donation
- Museum objects from Britain
- Museum artefacts preserved in the museum since 2005
- Paintings from the GSM collection
- Inventory numbers of the paintings from the GSM collection
- Location of the objects created by Anders Hafrin
- Paintings with length less than 1 m
PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ff: <http://factforge.net/>
SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_l
WHERE {
dbpedia:Amedeo_Modigliani fb:visual_art.visual_artist.artworks ?o .
?o fb:visual_art.artwork.owners [
fb:visual_art.artwork_owner_relationship.owner ?
ow ] .
?o fb:type.object.name ?painting_l.
?ow ff:preferredLabel ?owner_l .
OPTIONAL { ?ow fb:location.location.containedby
[ rdf:type dbp-ont:City ; ff:preferredLabel ?city_fb_con ] } .
OPTIONAL { ?ow dbp-ont:location [ rdf:type dbp-ont:City ;
ff:preferredLabel ?city_db_l ] }
FILTER ( bound(?city_fb_con) || bound(?city_db_l) )
}
NLP Interface required
LATECH - ACL’2013
7. GF – Grammatical Framework
Text
Query
Answer
Data
YAQLRGL
Lexicon
Application Grammar
A grammar formalism, based on Martin Loef’s type theory
Division between
- abstract syntax, i.e. the semantic representation of the domain
- concrete syntaxes, representing linearizations in various target languages
- resource library with the syntaxe of 30 languages, covering a couple of hundred of
syntactic structures
Multilingual coverage
Supported languages
Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, Hebrew, Italian, German,
Norwegian, Romanian, Russian, Spanish, Swedish
LATECH - ACL’2013
10. Text grammar
Covers a subset of the ontology classes and properties
Nine classes: Title, Painter, Type, Colour, Size, Year, Material, Museum, Place
One function, three sentences
Painting description, e.g. query: Everything known about a painting
Guernica was painted by Pablo Picasso in 1937. It measures 349 by 776 cm. This
painting is displayed at the Museo Nacional Centro de Arte Reina Sofía.
LATECH - ACL’2013
11. Lexicon grammar
Input:
createdBy (Guernica, Pablo Picasso)
isA (Pablo Picasso, Painter)
isA (Guernica, Painting)
Direct verbalization:
Guernica is a painting. Guernica was created by Pablo
Picasso. Pablo Picasso is a painter.
Actual realization:
Guenica was painted by Pablo Picasso.LATECH - ACL’2013
12. Multilingual data grammar
Contains ontology entities that were extracted from GCM and
DBpedia
No adequate translations from DBpedia; Entities of museum
names are automatically translated from Wikipedia
Class Entities
Title 662
Painter 116
Museum 104
Place 22
Total 904
LATECH - ACL’2013
14. Example: Show everything you know about
all paintings at the Louvre (English)
Unfinished portrait of General Bonaparte was painted on canvas by
Jacques-Louis David in 1798. It measures 65 by 81 cm. This work is
displayed at the Musée du Louvre.
Unfinished portrait of General Bonaparte a été peint sur canvas par
Jacques-Louis David en 1798. Il est de 65 sur 81 cm. Cette oeuvre est
exposée au Musée du Louvre.
http://museum.ontotext.com
LATECH - ACL’2013
15. Multilingual Challenges
• Lexicalizations
Classes: compounds, multiword expressions
Properties: verbs, adverbs, prepositions
• Order of semantic elements
Material, Year
• Tense and voice
Past, past participle, present, active/passive
• Aggregation
Conjunction, relative clause, punctuations
• Coreference
Pronoun, noun, empty reference
LATECH - ACL’2013
16. Related Work
Ion Androutsopoulos, Vassiliki Kokkinaki, Aggeliki Dimitromanolaki, Jo Calder,
Jon Oberl, and Elena Not. 2001. Generating multilingual personalized
descriptions of museum exhibits: the M-PIRO project. In Proceedings of
the International Conference on Computer Applications and Quantitative
Methods in Archaeology.
Nadjet Bouayad-Agha, Gerard Casamayor, Si- mon Mille, Marco Rospocher,
Horacio Saggion, Luciano Serafini, and Leo Wanner. 2012. From Ontology
to NL: Generation of multilingual user-oriented environmental
reports. Lecture Notes in Computer Science, 7337.
Basil Ell, Denny Vrandeci˘c, and Elena Simperl. 2012. SPARTIQULATION –
Verbalizing SPARQL queries. In Proceedings of ILD Workshop, ESWC 2012.
Axel-Cyrille Ngonga Ngomo, Lorenz Buhmann,¨ Christina Unger, Jens
Lehmann, and Daniel Gerber. 2013. Sorry, i don’t speak sparql: trans-
lating sparql queries into natural language. In Proceedings of the 22nd
international conference on World Wide Web, WWW ’13, pages 977–
988, Republic and Canton of Geneva, Switzerland. International World
Wide Web Conferences Steering Committee.
LATECH - ACL’2013
17. Conclusion and Future Work
• Method for NL to Ontology interoperability
• Large scale Semantic Web interlinked dataset
• Multilingual analysis and generation on the fly
• Experiment with the coverage of the NL grammars
• Incorporate in a real application, i.e. Europeana
• Scalability of GF
• Enlarge the knowledge base
LATECH - ACL’2013
18. References http://molto-project.eu
Mariana Damova, Dana Dannells, Ramona Enache, Maria Mateva, Aarne
Ranta Natural Language Interaction with Semantic Web Knowledge Bases
and LOD. In: Towards the Multilingual Semantic Web, Paul Buitelaar and
Philip Cimiano, eds., Springer, Heidelberg, Germany, 2013.
Dana Dannells, Mariana Damova, Ramona Enache, Milen Chechev.
Multilingual Online Generation from Semantic Web Ontologies. In:
Proceedings of WWW'2012, Lyon, France, April 2012.
Dana Dannells, Mariana Damova, Ramona Enache, and Milen Chechev. A
Framework for Improved Access to Museum Databases in the Semantic
Web. In Recent Advances in Natural Language Processing (RANLP).
Language Technologies for Digital Humanities and Cultural Heritage
(LaTeCH), Hissar, Bulgaria, 2011.
Mariana Damova, Dana Dannells. Reason-able View of Linked Data for
Cultural Heritage In: Proceedings of S3T'2011, Burgas, Bulgaria,
September 2011, Advances in Intelligent and Soft Computing, ISSN: 1867-
5662, Springer Verlag, Heidelberg, Germany, 2011.
LATECH - ACL’2013
19. Thank you for your attention
Questions
mariana.damova@mozajka.co
LATECH - ACL’2013