06 gioca-ontologies
1. Chapter 6: Metadata and ontologies for
digital cultural heritage documentation
Information Technology and Arts Organizations
A.A. 2010-2011 Information Technology and Arts Organizations 1
2. Syllabus (3/3)
5. Databases
1. Entities, attributes and relations
2. Primary key and foreign key, data domain, query language (SQL)
3. Examples using Access DBMS
4. Spatial access to digital content: GIS and GPS
5. GIS examples using ESRI Arcview
6. Metadata and ontologies for digital cultural heritage documentation
1. XML, RDF
2. Dublin Core
3. Semantic Web
4. OWL, ontologies
5. Cidoc-CRM
3. Motivations
• Digital data are stored in files and databases
• Data representation matters: if common conventions are adopted, different applications can cooperate, communicate and process data to provide advanced services (interoperability)
• The Internet is an ideal channel for spreading digital information about art and culture
• Many standards coexist, so a high level of interoperability is difficult to achieve
• Information can be written in many ways (different languages, synonyms, ...)
Metadata means "data about data"
4. Looking for a film budget...
[Figure: two searches for a film budget compared. Labels: WEB 1.0, WEB 2.0, SQL, SPARQL; one search returns 326.000 results, the other 1 result.]
5. A cultural search engine...
http://e-culture.multimedian.nl/demo/session/search
7. A step toward ontologies...
Text
"during these few lectures I've understood the importance of new technological devices in the arts: in particular way, they could really help the public in better understand the context and the history of a piece of art. The concepts I've learned make me think about new opportunities and new systems which can be implemented using basic devices and means of communication. I would like to learn more about ICTs in the fields of music and live performances, if there are some important steps forward in this direction, because they are the scenarios in which I'm more interested in. I'm also interested in archives and in query formulation. I personally have some difficulties with the computer language and with its working mechanisms. I also have some problems with abbreviations: I would like to know literary the meaning of these acronyms in order to better understand their applicability and functions. In these sessions I've found interesting the fact that even if technology seems very complicated, in reality it is a sum of different small and simple components which can be all be leaded by a fundamental rationale."
• No structure
• For a computer it is just a sequence of characters
XML (eXtensible Markup Language)
<feedback>
  <description>
    This survey takes less than ten minutes to be completed. The first section is composed of 10 multiple choice questions followed by 5 open questions. Thank you for your feedback.
  </description>
  <course>
    Please insert here the name of the module you are going to evaluate
  </course>
  <student>
    <name>Put here your name</name>
    <ID>What we can use for this?</ID>
  </student>
</feedback>
• Tags
• Structured data
• Tags can be nested
• Nesting does not represent any "relationship"
• It can be represented by a tree
• Tag names are free
• No "common meaning" is associated with each tag
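The claim that nested tags can be represented by a tree can be checked with a few lines of Python. The sketch below is only an illustration: it uses the standard library parser on a shortened copy of the feedback form, and the helper name `tag_paths` is an assumption introduced here, not part of the course material.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feedback form (texts abbreviated for the example).
FEEDBACK_XML = """
<feedback>
  <description>Thank you for your feedback.</description>
  <course>Name of the module</course>
  <student>
    <name>Put here your name</name>
    <ID>What we can use for this?</ID>
  </student>
</feedback>
"""

def tag_paths(element, prefix=""):
    """Return the path from the root to every element, in document order."""
    path = f"{prefix}/{element.tag}"
    paths = [path]
    for child in element:  # iterating an element yields its direct children
        paths.extend(tag_paths(child, path))
    return paths

root = ET.fromstring(FEEDBACK_XML)
paths = tag_paths(root)
for p in paths:
    print(p)
```

Each printed path is one node of the tree. The parser knows that `name` sits inside `student`, but nothing tells it what a "student" is: the tag names stay free text.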
8. ...second step...
RDF (Resource Description Framework)
<rdf:RDF>
  <rdf:Description rdf:about="subject">
    <predicate rdf:resource="object" />
    <predicate>literalvalue</predicate>
  </rdf:Description>
  ...
  <rdf:Description .... />
</rdf:RDF>
• Triples (subject-predicate-object)
• Statements
• We can relate different resources
• It can be represented by a graph
• Everything is uniquely identified (URI)
• Namespaces, vocabularies
DC (Dublin Core)
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Oxford">
    <dc:title>Oxford</dc:title>
    <dc:coverage>Oxfordshire</dc:coverage>
    <dc:publisher rdf:resource="http://en.wikipedia.org" />
  </rdf:Description>
</rdf:RDF>
(Callouts on the example: Namespace, URI, Literal, Statement)
• Just 15 elements
• A DC resource can be represented using RDF/XML
• It can be seen as a namespace for resource description
• It can be used to describe a single resource
• To describe a complex domain we need something different
ISO Standard 15836:2009
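To see how an RDF/XML description breaks down into subject-predicate-object statements, the following minimal sketch reads the Oxford example with Python's standard XML parser. It is not a full RDF parser: the function `extract_triples` is a name introduced here, and it assumes the flat pattern used above (one `rdf:Description` with direct property children).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

OXFORD_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Oxford">
    <dc:title>Oxford</dc:title>
    <dc:coverage>Oxfordshire</dc:coverage>
    <dc:publisher rdf:resource="http://en.wikipedia.org"/>
  </rdf:Description>
</rdf:RDF>
"""

def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for flat rdf:Description blocks."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores qualified names as "{namespace}local";
            # joining both parts gives the full URI of the predicate.
            predicate = prop.tag[1:].replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # The object is either another resource (URI) or a literal (text).
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples

triples = extract_triples(OXFORD_RDF)
for triple in triples:
    print(triple)
```

The namespace turns the short name `dc:title` into a globally unique predicate URI, which is what allows different applications to agree on its meaning.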
9. ...third step!
RDFS (RDF Schema)
• RDF + meaning given to "special resources"
• Concept of Class
• Predicate is also known as Property
OWL (Web Ontology Language)
• Can be written in RDF
• Adds "new properties"
  – intersectionOf, unionOf, complementOf
  – someValuesFrom, allValuesFrom
  – Cardinality of a property
• Different types of property
  – Symmetric
  – Functional
  – InverseFunctional
CIDOC-CRM (Conceptual Reference Model)
"The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation."
An ontology of 80 classes and 132 properties
ISO Standard 21127:2006
10. The Semantic Web project
"Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully"
Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). "The Semantic Web". Scientific American Magazine. http://www.sciam.com/article.cfm?id=the-semantic-web&print=true. Retrieved March 26, 2008.
http://en.wikipedia.org/wiki/Semantic_Web
http://en.wikipedia.org/wiki/DBpedia
11. XML: gives a structure to data (syntax)
<TagNameA attrName1="attrValueX" … attrNameN="attrValueY">Text</TagNameA>
OR
<TagNameB attrName1="attrValueX" … attrNameN="attrValueY"/>
Example:
<root>
  <author>
    <name>Luca</name>
    <email>luca.roffia@unibo.it</email>
  </author>
  <author2 name="Luca" email="luca.roffia@unibo.it"/>
</root>
• Tags can be nested: the first tag opened is the last to be closed, so the structure of an XML document can be represented by a tree
• A well-formed document has exactly one root element
• At the beginning of the document (before the root element) a line declares the language, its version, the encoding and other characteristics, e.g. <?xml version="1.0" encoding="UTF-8"?>
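The two well-formedness rules above (proper nesting, a single root) are exactly what an XML parser enforces. As a small illustration, assuming Python's standard library parser and a helper name `is_well_formed` chosen for this example:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tags closed in reverse order of opening: accepted.
nested_ok = "<root><author><name>Luca</name></author></root>"
# <author> is closed before the <name> it contains: rejected.
wrong_order = "<root><author><name>Luca</author></name></root>"
# Two top-level elements, so there is no single root: rejected.
two_roots = "<author/><author2/>"

for sample in (nested_ok, wrong_order, two_roots):
    print(is_well_formed(sample), sample)
```

Well-formedness is purely syntactic: the parser accepts any tag names as long as the nesting is consistent.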
12. Semantics in XML
Same meaning but different structure
<Works>
  <Work1>Monnalisa</Work1>
  <Work2>Last Supper</Work2>
</Works>

<Works Work1="Monnalisa" Work2="Last Supper"/>
Same structure but different meaning
<Works>
  <Work1>Monnalisa</Work1>
  <Work2>Last Supper</Work2>
</Works>

<Europe>
  <Nation1>Italy</Nation1>
  <Nation2>France</Nation2>
</Europe>
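A program sees only the structure, which is why both cases are a problem. In the sketch below (helper names `child_texts` and `attribute_values` are assumptions made for this example), the same works need two different readers depending on how they were encoded, while one reader happily returns nations when it was written for works.

```python
import xml.etree.ElementTree as ET

as_elements = "<Works><Work1>Monnalisa</Work1><Work2>Last Supper</Work2></Works>"
as_attributes = '<Works Work1="Monnalisa" Work2="Last Supper"/>'
nations = "<Europe><Nation1>Italy</Nation1><Nation2>France</Nation2></Europe>"

def child_texts(xml_text):
    """Read values stored as the text of child elements."""
    return [child.text for child in ET.fromstring(xml_text)]

def attribute_values(xml_text):
    """Read values stored as attributes of the root element."""
    return sorted(ET.fromstring(xml_text).attrib.values())

# Same meaning, different structure: each encoding needs its own reader.
print(child_texts(as_elements))
print(attribute_values(as_attributes))
print(child_texts(as_attributes))  # empty: the element reader finds nothing

# Same structure, different meaning: the reader cannot tell works from nations.
print(child_texts(nations))
```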
13. The next step... RDF
• Resource: anything we want to identify. Identification is done by using a URI (Uniform Resource Identifier): a URL (called namespace or prefix) + a suffix
• Statement: a triple of the form subject - predicate - object
  – Subject and predicate are resources, i.e. they are identified by a URI
  – The object can be a resource or a primitive type, i.e. a number or a string
Example
URI
http://dbpedia.org/resource/ (Namespace) + Rio_de_Janeiro (Suffix) URI: http://dbpedia.org/resource/Rio_de_Janeiro
http://dbpedia.org/property/ (Namespace) + populationTotal (Suffix) URI: http://dbpedia.org/property/populationTotal
http://dbpedia.org/ontology/ (Namespace) + birthPlace (Suffix) URI: http://dbpedia.org/ontology/birthPlace
http://dbpedia.org/resource/ (Namespace) + Paulo_Coelho (Suffix) URI: http://dbpedia.org/resource/Paulo_Coelho
STATEMENTS
http://dbpedia.org/resource/Rio_de_Janeiro - http://dbpedia.org/property/populationTotal - 6093472
http://dbpedia.org/resource/Paulo_Coelho - http://dbpedia.org/ontology/birthPlace - http://dbpedia.org/resource/Rio_de_Janeiro
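The two building blocks, URIs composed from namespace and suffix and statements made of three parts, can be written down directly. This is only a sketch: the `Statement` type and the helper names are assumptions introduced for illustration, and the birthplace statement takes Paulo Coelho as its subject (he was born in Rio de Janeiro).

```python
from typing import NamedTuple, Union

class Statement(NamedTuple):
    subject: str               # always a resource (URI)
    predicate: str             # always a resource (URI)
    obj: Union[str, int]       # a resource (URI) or a primitive value

def uri(namespace, suffix):
    """Compose a URI from a namespace and a suffix."""
    return namespace + suffix

RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"
ONTOLOGY = "http://dbpedia.org/ontology/"

population = Statement(
    uri(RESOURCE, "Rio_de_Janeiro"), uri(PROPERTY, "populationTotal"), 6093472
)
birth_place = Statement(
    uri(RESOURCE, "Paulo_Coelho"), uri(ONTOLOGY, "birthPlace"), uri(RESOURCE, "Rio_de_Janeiro")
)

def object_is_resource(statement):
    # Simplifying assumption for this sketch: a string object that starts
    # with "http://" is treated as a URI, anything else as a primitive value.
    return isinstance(statement.obj, str) and statement.obj.startswith("http://")

print(population, object_is_resource(population))
print(birth_place, object_is_resource(birth_place))
```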
14. RDF graph
• A set of RDF statements can be used to describe a domain. This set is called an RDF knowledge base
• An RDF knowledge base can be represented by a labelled graph: each node represents a resource, i.e. a subject or an object, and each edge represents a predicate
[Figure: the Drupal 7 data model drawn as a labelled graph of subject, predicate and object, with its namespaces]
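A minimal way to picture the labelled graph is an adjacency list built from the statements. The sketch below is an illustration under stated assumptions: the function names `build_graph` and `follow` are invented here, and the two statements use short prefixed names for readability.

```python
from collections import defaultdict

def build_graph(statements):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph

def follow(graph, start, *predicates):
    """Walk from a start node along a chain of predicate-labelled edges."""
    current = start
    for predicate in predicates:
        targets = [o for p, o in graph.get(current, []) if p == predicate]
        if not targets:
            return None
        current = targets[0]
    return current

statements = [
    ("dbpedia:Paulo_Coelho", "dbpedia-owl:birthPlace", "dbpedia:Rio_de_Janeiro"),
    ("dbpedia:Rio_de_Janeiro", "dbpprop:populationTotal", 6093472),
]
graph = build_graph(statements)

# Two hops: the population of the place where Paulo Coelho was born.
answer = follow(graph, "dbpedia:Paulo_Coelho",
                "dbpedia-owl:birthPlace", "dbpprop:populationTotal")
print(answer)
```

Because the object of one statement is the subject of another, separate statements join into a connected graph that can be traversed.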
15. RDF/XML statement
• An RDF statement can be expressed by using the XML syntax
• In order to make an RDF statement more concise, a namespace can be abbreviated by a prefix, declared with the convention @prefix prefix: <URL> .
Examples:
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
http://dbpedia.org/resource/Paulo_Coelho - http://dbpedia.org/ontology/birthPlace - http://dbpedia.org/resource/Rio_de_Janeiro
becomes:
dbpedia:Paulo_Coelho - dbpedia-owl:birthPlace - dbpedia:Rio_de_Janeiro
RDF/XML (prefixes are declared as XML namespaces; attribute values carry full URIs):
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbpedia-owl="http://dbpedia.org/ontology/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Paulo_Coelho">
    <dbpedia-owl:birthPlace rdf:resource="http://dbpedia.org/resource/Rio_de_Janeiro"/>
  </rdf:Description>
</rdf:RDF>
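Prefix abbreviation is plain string substitution, which the following sketch makes explicit. The three prefixes are the ones listed above; the function names `expand` and `shorten` are assumptions made for this example.

```python
PREFIXES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
    "dbpedia": "http://dbpedia.org/resource/",
}

def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'dbpedia:Rio_de_Janeiro' into a full URI."""
    prefix, _, suffix = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefixes[prefix] + suffix

def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI into a prefixed name when a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: leave the URI unchanged

print(expand("dbpedia-owl:birthPlace"))
print(shorten("http://dbpedia.org/resource/Paulo_Coelho"))
```

The prefixed form is only a shorthand: two documents may choose different prefixes for the same namespace and still denote the same resources.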
16. Dublin Core: a famous "namespace"
Dublin Core (DC) was born in Dublin (Ohio) in 1995. It was created by a research group organized by the Online Computer Library Center (OCLC) and by the National Center for Supercomputing Applications (NCSA)
Motivations:
• Museums organize and present their resources in different ways
• Even when the structures used to handle information are compatible, data are often hard to interpret because terminology and semantics differ
• Efforts to integrate cultural resources across institutions are still limited. Integrating resources owned by different cultural sites could be very helpful for users: they could use a unified interface to search different kinds of resources, available in different formats (from a real object to a digital representation)
• The main obstacle is the structural/semantic incompatibility between the information systems hosted by institutions
It is important to adopt a common, standard data interchange format: a standard to represent information
17. CIMI (Computer Interchange of Museum Information)
CIMI is "a consortium of cultural heritage institutions and organizations working together to remove barriers to sharing our most valuable cultural information."
The consortium develops relevant standards and encourages open, standards-based approaches to creating and sharing digital information
CIMI worked on the application of Dublin Core to museum resources and supplied guidelines for implementing this standard in the cultural heritage domain
Dublin Core is ISO Standard 15836:2009 (February 2009)
18. Dublin Core goals
• Simplicity of creation and maintenance
The Dublin Core element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the networked environment
• Commonly understood semantics
The Dublin Core can help a non-specialist searcher to find her way by supporting a common set of elements, the semantics (meaning) of which are universally understood and supported
• Extensibility
Dublin Core developers have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery needs
Dublin Core aims to allow the exchange of information across different environments in order to simplify the discovery of resources: communicate, collaborate, exchange
19. One to one principle
In general, Dublin Core metadata describes one manifestation or version of a resource, rather than assuming that manifestations stand in for one another
• Surrogates are described separately from the original object
A JPEG image of the Mona Lisa has much in common with the original painting, but it is not the same as the painting. As such, the digital image should be described as itself
The problem of originals and surrogates is important for museums, where the originals are exhibited and the surrogates have to be described accurately but at the same time efficiently
• This principle, in many cases, simplifies the resource description
The author of the original Mona Lisa is the painter, while the author of the photo is the photographer
20. The 15 DC ELEMENTS
RESOURCE
TITLE: A name given to the resource.
DESCRIPTION: Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.
TYPE: Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. To describe the file format, physical medium, or dimensions of the resource, use the Format element.
COVERAGE: Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. A jurisdiction may be a named administrative entity or a geographic place to which the resource applies. Recommended best practice is to use a controlled vocabulary such as the Thesaurus of Geographic Names [TGN]. Temporal topic may be a named period, date, or date range. Where appropriate, named places or time periods can be used in preference to numeric identifiers such as sets of coordinates or date ranges.
SUBJECT: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element.
RELATIONSHIPS
RELATION: Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. Relationships may be expressed reciprocally (if the resources on both ends of the relationship are being described) or in one direction only, even when there is a refinement available to allow reciprocity. If text strings are used instead of identifying numbers, the reference should be appropriately specific. For instance, a formal bibliographic citation might be used to point users to a particular resource.
SOURCE: The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. In general, include in this area information about a resource that is related intellectually to the described resource but does not fit easily into a Relation element, e.g. Image from page 54 of the 1922 edition of Romeo and Juliet
INTELLECTUAL PROPERTIES
CREATOR: Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.
CONTRIBUTOR: Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.
PUBLISHER: Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.
RIGHTS: Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights.
IDENTIFICATION
DATE: Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF].
FORMAT: Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME].
IDENTIFIER: Recommended best practice is to identify the resource by means of a string conforming to a formal identification system.
LANGUAGE: Recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646].
Sources:
http://dublincore.org/documents/usageguide/elements.shtml
http://dublincore.org/documents/dcmi-terms/
21. AN EXAMPLE OF DUBLIN CORE IN RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/metadata/dublin_core#">
<rdf:Description rdf:about="http://www.dlib.org">
<dc:Title>D-Lib Program</dc:Title>
<dc:Description>
The D-Lib program supports the community of people
with research interests in digital libraries and
electronic publishing.
</dc:Description>
<dc:Publisher>
Corporation For National Research Initiatives
</dc:Publisher>
<dc:Date>1995-01-07</dc:Date>
<dc:Subject>Research; statistical methods</dc:Subject>
<dc:Type>World Wide Web Home Page</dc:Type>
<dc:Format>text/html</dc:Format>
<dc:Language>en</dc:Language>
</rdf:Description>
</rdf:RDF>
22. Considerations
• Dublin Core is a useful standard for the mapping phase across different domains
• It cannot be applied to model complex domains
• Attributes called qualifiers can be added to improve the quality and the detail of the information
  – "Dumb-Down Principle": every application can ignore the qualifiers that it cannot interpret
• The data model describes each resource individually
• Relationships among resources are not well specified
• The DC namespace can be used in RDF/XML statements
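The Dumb-Down Principle can be sketched as a small function that folds a qualified value back into its base element whenever the application does not know the qualifier. The dotted "Element.Qualifier" key notation, the sample record and the function name `dumb_down` are assumptions chosen for this illustration, not a prescribed Dublin Core encoding.

```python
def dumb_down(record, known_qualifiers=frozenset()):
    """Fold qualified elements the application cannot interpret into their base element."""
    simple = {}
    for key, value in record.items():
        element, _, qualifier = key.partition(".")
        if qualifier and qualifier in known_qualifiers:
            target = key        # the application understands this qualifier
        else:
            target = element    # unknown qualifier: ignore it, keep the value
        simple.setdefault(target, []).append(value)
    return simple

record = {
    "Title": "Protocol of Proceedings of Crimea Conference",
    "Title.Subtitle": "II. Declaration of Liberated Europe",
    "Date": "February 11, 1945",
}

basic = dumb_down(record)
aware = dumb_down(record, known_qualifiers={"Subtitle"})
print(basic)
print(aware)
```

An application that ignores the qualifier still gets a usable, if less precise, Title value; one that understands it keeps the extra detail.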
23. RDF Vocabulary Description Language - RDF Schema (RDFS)
• It extends RDF with the basic features needed to define ontologies
  – Everything is called a resource (subject, predicate, object)
  – The predicate is also called a property
• It allows a meaning to be given to "special" resources
• It introduces the concept of Class
• The rdfs:Class resource is the class of all the RDF classes
[Figure: Tree of Porphyry]
The technique of inheritance is the process of merging the differentiae along the path above any category: Living is defined as animate material Substance, and Human is rational sensitive animate material Substance.
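24. RDFS graph example
[Figure: RDFS graph with a class level and an instance level; the edges below are listed with the labels shown in the figure. A callout on the class nodes asks "URI?"]
Classes
WorkOfArt - rdfs:subClassOf - rdfs:Class
Artist - rdfs:subClassOf - rdfs:Class
Instances
http://en.wikipedia.org/wiki/David_%28Donatello%29 - rdf:type - WorkOfArt
http://en.wikipedia.org/wiki/Donatello - rdf:type - Artist
http://en.wikipedia.org/wiki/David_%28Donatello%29 - dc:creator - http://en.wikipedia.org/wiki/Donatello
http://en.wikipedia.org/wiki/David_%28Michelangelo%29 - rdf:type - WorkOfArt
http://en.wikipedia.org/wiki/Michelangelo - rdf:type - Artist
http://en.wikipedia.org/wiki/David_%28Michelangelo%29 - dc:creator - http://en.wikipedia.org/wiki/Michelangelo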
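What a class hierarchy buys in practice is inheritance of types: an instance of a subclass is also an instance of every class above it. The sketch below illustrates this RDFS rule in plain Python. The `Sculpture` class and the function name `types_of` are hypothetical additions for the example; only WorkOfArt and the David instance come from the graph.

```python
def types_of(resource, triples):
    """Return every class of a resource, following rdfs:subClassOf upwards."""
    pending = [o for s, p, o in triples if s == resource and p == "rdf:type"]
    found = set()
    while pending:
        cls = pending.pop()
        if cls in found:
            continue
        found.add(cls)
        # An instance of a class is also an instance of its superclasses.
        pending.extend(o for s, p, o in triples
                       if s == cls and p == "rdfs:subClassOf")
    return found

triples = [
    # Hypothetical intermediate class, added only for this illustration.
    ("Sculpture", "rdfs:subClassOf", "WorkOfArt"),
    ("David_(Donatello)", "rdf:type", "Sculpture"),
    ("Donatello", "rdf:type", "Artist"),
    ("David_(Donatello)", "dc:creator", "Donatello"),
]

david_types = types_of("David_(Donatello)", triples)
print(sorted(david_types))
```

Nothing states directly that the David is a WorkOfArt; the statement is derived from the schema, which is the kind of "meaning" RDFS adds on top of RDF.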
25. OWL (Web Ontology Language)
• It is integrated with RDF, so OWL is directly accessible to web applications
• It allows a knowledge base about a domain of interest to be created in terms of:
  – individuals: the basic elements of the domain, e.g. Donatello
  – concepts (classes): describe sets of individuals having similar characteristics, e.g. Artist
  – roles (properties): describe relationships between pairs of individuals, e.g. dc:creator
• RDFS allows modelling of:
  – Hierarchy of classes and properties
  – Domain and range of properties
• OWL extends RDFS in terms of:
  – Logical operations on classes
  – Additional property characteristics: Transitive, Symmetric, Functional
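Property characteristics let a reasoner derive statements that were never written down. The following sketch applies two of them, Symmetric and Transitive, by brute force until nothing new appears. It is a toy illustration, not an OWL reasoner: the function name `infer`, the property names `hasSibling` and `hasAncestor`, and the individuals are all hypothetical.

```python
def infer(triples, symmetric=(), transitive=()):
    """Close a set of triples under symmetric and transitive properties."""
    facts = set(triples)
    while True:
        derived = set()
        for s, p, o in facts:
            if p in symmetric:
                derived.add((o, p, s))                 # a P b  implies  b P a
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        derived.add((s, p, o2))        # a P b, b P c  implies  a P c
        if derived <= facts:                           # nothing new: fixed point reached
            return facts
        facts |= derived

stated = {
    ("Anna", "hasSibling", "Marco"),
    ("Anna", "hasAncestor", "Bruno"),
    ("Bruno", "hasAncestor", "Carla"),
}
result = infer(stated, symmetric={"hasSibling"}, transitive={"hasAncestor"})
for fact in sorted(result - stated):
    print("inferred:", fact)
```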
26. OWL Properties examples
Figures from:
A Practical Guide To Building OWL Ontologies Using Protégé 4 and CO-ODE Tools, Edition 1.2, Matthew Horridge
The University Of Manchester, Copyright © The University Of Manchester, March 13, 2009
27. CIDOC-CRM
• Semantic interoperability in culture can be achieved by an "extensible ontology" and explicit event modeling, which provide a shared explanation rather than the prescription of a common data structure
• "The intended scope of the CIDOC CRM may be defined as all information required for the scientific documentation of cultural heritage collections, with a view to enabling wide area information exchange and integration of heterogeneous sources"
• "The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation."
• The CIDOC-CRM models actors, events, and objects in space and time
The ontology is a language that IT experts and cultural experts can share
28. From DC to CIDOC-CRM
Metadata (Dublin Core)
Type: Text
Title: Protocol of Proceedings of Crimea Conference
Title.Subtitle: II. Declaration of Liberated Europe
Date: February 11, 1945
Creator: The Premier of the Union of Soviet Socialist Republics
         The Prime Minister of the United Kingdom
         The President of the United States of America
Publisher: State Department
Subject: Postwar division of Europe and Japan
About…
Document
"The following declaration has been approved: The Premier of the Union of Soviet Socialist Republics, the Prime Minister of the United Kingdom and the President of the United States of America have consulted with each other in the common interests of the people of their countries and those of liberated Europe. They jointly declare their mutual agreement to concert… and to ensure that Germany will never again be able to disturb the peace of the world……"
29. CIDOC-CRM example from Martin Doerr, Steve Stead "The CIDOC CRM, a Standard for the Integration of Cultural Information"
[Figure: CIDOC-CRM graph of the Crimea Conference.
Nodes: E7 Activity "Crimea Conference"; E65 Creation Event; E31 Document "Yalta Agreement"; E39 Actor (three nodes); E53 Place; E38 Image; E52 Time-Span "February 1945"; E52 Time-Span "11-2-1945".
Edge labels: P11 participated in; P7 took place at; P67 is referred to; P86 falls within; P14 performed; P94 has created; P82 at some time within; P81 ongoing throughout.]