These slides introduce the Semantic Web and Linked Data, explain what a digital object is, and close with a use case.
How to model digital objects within the semantic web
1. How to model Digital Objects within
the Semantic Web
Angelica Lo Duca
IIT-CNR
angelica.loduca@iit.cnr.it
2. Overview
● Preliminary concepts:
Semantic Web
● What is a digital object
● How to model a digital
object
○ the concept of
ontology
○ Europeana Data
Model
○ use case: Clavius’s
manuscripts
4. The World Wide Web
Knowledge Base
texts, images, videos,...
Protocol
to access and generate
the Knowledge Base
5. Web 1.0: Static Web (early nineties)
Web 2.0: Social Web or Interactive Web (2004)
Web 3.0: Semantic Web (2006)
6. Each web site is a
collection of web pages
Web pages can refer to
one another through global
links called Uniform
Resource Locators (URLs)
Web 2.0 = Web of documents
Web 2.0
communities
blogs
social
networks
wikis
video/photo
sharing
privacy
browsing
7. Limitations of Web
2.0
● Too much information
with too little structure
● Finding information is not
easy
● Data aggregation and
reuse are hard
● Integrating data from
different sources is hard
● New information cannot
easily be inferred
8. Limitations of Web 2.0 (cont.)
● Web content is
heterogeneous
● in terms of content
● in terms of structure
● in terms of character
encoding
10. Give a structure to the
content of Web pages so
that also machines can
understand it!
11. “The Semantic Web is an extension of the current web in
which information is given well-defined meaning, better
enabling computers and people to work in cooperation.”
[Tim Berners-Lee et al. 2001.]
12. Web 2.0 VS Web 3.0
                  Web 2.0            Web 3.0
granularity       Web of documents   Web of data
target consumers  humans             machines
13. Web of Data
● Granularity: resource
○ resource: everything that has an identity.
○ a web resource is a structure accessible on the web
● Target consumers: intelligent agents (machines)
● Integration & reuse: easier
○ Resources have unique identifiers - Uniform
Resource Identifiers (URIs)
14. The Semantic Web
(Web 3.0)
The Semantic Web defines
formal knowledge, expressed in a
formal language that has:
● a machine-readable notation
● a formal syntax that is strongly
coupled with the web
architecture
● a formal semantics that
provides an access
mechanism.
Tim Berners-Lee, James Hendler, and Ora Lassila.
The semantic web. Scientific American Magazine,
2001.
15. Syntax VS Semantics
● Syntax is concerned with the arrangement of symbols
● Semantics is concerned with the relation between
symbol strings and the world: what things actually
mean
int x = "five";
syntax is okay - type identifier = value
semantics is wrong - "five" is not an int
16. What are the main
aims of the SW?
● Automated
query-answering
● Automated use of the data
(reasoning, planning,
acting, etc.)
● ...
17. The Semantic Web vision (W3C)
● Extend principles of the Web from documents to data
● Data should be accessed using the general Web
architecture (e.g., URIs, protocols, …)
● Data should be related to one another just as documents
are already
● Creation of a common framework that allows:
○ Data to be shared and reused across applications
○ Data to be processed automatically
○ New relationships between pieces of data to be inferred
18. Three strategies
● Introduce concepts of
artificial intelligence, to
make machines capable
of reasoning
● Motivate companies to
publish their data freely,
using standards defined
by the Semantic Web
● Promote data reuse.
19. Resource Description Framework (RDF)
● RDF gives a standard model to represent knowledge
○ RDF is a W3C Recommendation
● RDF is a data model
○ Originally used for metadata for web resources, then
generalized
○ Encodes structured information
○ Universal, machine readable exchange format
● Data structured in graphs
○ Nodes, Arcs
● RDF can be serialized in XML (RDF/XML)
20. RDF Triples
● Node and arc labels should be unambiguous
Subject        Predicate      Object
Monna Lisa --- hasAuthor ---> Leonardo da Vinci
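As a minimal sketch of the triple above, a graph can be held as a set of (subject, predicate, object) tuples in plain Python. The `http://example.org/` namespace and the helper `objects` are assumptions made for illustration; a real application would more likely use an RDF library and published vocabularies.

```python
# A tiny in-memory graph: each triple is (subject, predicate, object).
# The namespace below is hypothetical; URIs make node and arc labels unambiguous.
EX = "http://example.org/"

triples = {
    (EX + "MonnaLisa", EX + "hasAuthor", EX + "LeonardoDaVinci"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


authors = objects(triples, EX + "MonnaLisa", EX + "hasAuthor")
print(authors)
```

Using full URIs instead of bare labels is what lets two datasets agree that they are talking about the same node or the same kind of arc.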
24. Digital Object
● Defined data structure, machine independent
○ metadata
● Consisting of a set of elements
○ Each of the form <type,value>
○ One of which is the unique identifier
● Identifiers are known as “Handles”
○ Format is “prefix/suffix”
○ Prefix is unique to a naming authority
○ Suffix can be any string of bits assigned by that authority
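The structure described above can be sketched in Python as follows. The class names, the handle value `12345/letter-001`, and the choice to treat the first slash as the prefix/suffix separator are all assumptions made for this example, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Handle:
    prefix: str  # unique to a naming authority
    suffix: str  # any string assigned by that authority

    @classmethod
    def parse(cls, text):
        # Assumption: the first "/" separates prefix from suffix,
        # so the suffix itself may contain further slashes.
        prefix, sep, suffix = text.partition("/")
        if not sep or not prefix or not suffix:
            raise ValueError(f"not a prefix/suffix handle: {text!r}")
        return cls(prefix, suffix)

    def __str__(self):
        return f"{self.prefix}/{self.suffix}"


@dataclass
class DigitalObject:
    handle: Handle
    elements: list = field(default_factory=list)  # (type, value) pairs

    def as_elements(self):
        # The unique identifier is itself one of the <type, value> elements.
        return [("handle", str(self.handle))] + list(self.elements)


# Hypothetical handle and metadata for illustration only.
obj = DigitalObject(
    Handle.parse("12345/letter-001"),
    [("title", "Letter from Galileo Galilei to Christopher Clavius")],
)
print(obj.as_elements())
```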
25. Unique Identifier
● Each Digital Object has its own unique &
persistent ID
● Content Providers assign IDs
● No theoretical limits on number of DOs Per
Repository
● Objects may be replicated in Multiple
Repositories
26. A specific example
Letter from Galileo Galilei to
Christopher Clavius
● Four folios
○ two recto
○ two verso
● a textual transcription of the
letter
27. Archives
● Digital Objects are stored in
Repositories (Archives) and
organized in collections.
● Archives are defined not by their
form but by their purpose.
● Archives must be tangible,
whether physical or electronic,
visual, aural or written.
● Archives must exist in some
concrete form outside our mind.
An archive is a collection of documentary
materials, created, collected, used and
kept by a person, family, organization,
government or other public or private
entity in the conduct of their daily work and
life and preserved because they contain
enduring value as evidence of and
information about activities and events.
28. Digital Preservation
The formal action of ensuring
that digital objects remain
accessible and usable over
time.
A digital archive can be
damaged if original digital
objects are deleted or
overwritten.
30. What is an
ontology?
● A philosophical discipline
a. a branch of philosophy
that deals with the
nature and the
organisation of reality
b. Science of Being
(Aristotle,
Metaphysics, IV, 1)
31. Ontology in computer science
An ontology is an engineering artifact consisting of:
● A vocabulary used to describe (a particular view of) some domain
● An explicit specification of the intended meaning of the vocabulary
● Almost always includes how concepts should be classified
● Constraints capturing additional knowledge about the domain
● Ideally, an ontology should:
● Capture a shared understanding of a domain of interest
● Provide a formal and machine manipulable model of the
domain
32. The Continuum (from less to more complexity)
● Folksonomy: personalized labels
● List: ambiguity control
● Synonym Ring: synonym control (equivalency)
● Taxonomy: ambiguity control, synonym control, hierarchical relationships
● Thesaurus: ambiguity control, synonym control, hierarchical relationships,
associative relationships, scope note
● Ontology: ambiguity control, synonym control, hierarchical relationships,
associative relationships, classes, properties, localization,
annotation, reasoning
33. Example: the ontology
representing animals
● Vocabulary and meaning (“definitions”)
● Elephant is a concept whose members are a kind of animal
● Herbivore is a concept whose members are exactly those animals who
eat only plants or parts of plants
● Adult Elephant is a concept whose members are exactly those
elephants whose age is greater than 20 years
● Background knowledge/constraints on the domain (“general axioms”)
● Adult Elephants weigh at least 2,000 kg
● All Elephants are either African Elephants or Indian Elephants
● No individual can be both a Herbivore and a Carnivore
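The definitions and general axioms above can be read as checks over individuals. The sketch below is only an illustration in plain Python: the dictionary layout, the class labels as strings, and the two sample individuals are assumptions, and a real ontology language would let a reasoner infer new facts rather than just flag violations.

```python
def is_adult_elephant(ind):
    # Definition: an elephant whose age is greater than 20 years.
    return "Elephant" in ind["classes"] and ind["age_years"] > 20


def violated_axioms(ind):
    """Return the general axioms that this individual contradicts."""
    classes = ind["classes"]
    problems = []
    if is_adult_elephant(ind) and ind["weight_kg"] < 2000:
        problems.append("adult elephants weigh at least 2,000 kg")
    if "Elephant" in classes and not ({"AfricanElephant", "IndianElephant"} & classes):
        problems.append("every elephant is African or Indian")
    if {"Herbivore", "Carnivore"} <= classes:
        problems.append("no individual is both herbivore and carnivore")
    return problems


# Hypothetical individuals for illustration.
consistent = {
    "classes": {"Elephant", "AfricanElephant", "Herbivore"},
    "age_years": 25,
    "weight_kg": 4000,
}
inconsistent = {
    "classes": {"Elephant", "Herbivore", "Carnivore"},
    "age_years": 30,
    "weight_kg": 1500,
}

print(violated_axioms(consistent))
print(violated_axioms(inconsistent))
```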
34. Why ontologies?
● To share common understanding of the
structure of information among people
or software agents
● To enable reuse of domain knowledge
● To make domain assumptions explicit
● To analyse domain knowledge
35. Building an
ontology
● Define classes
● Arrange in Taxonomic
hierarchy
● sub-class/super-class
model
● Define slots and Describe
allowed values for these
slots
● Fill values for slots for
instances
36. Rules
● There is no one correct way to model a domain.
● There are always viable alternatives
● Best solution depends on Applications and Extensions
● Iterative Process
● Concepts in Ontology close to objects (physical/logical) and relationships
in the domain of interest
● Objects are generally nouns
● Relationships are generally verbs in a sentence
37. Step 1: Domain & Scope
Question                                 Example of Answer
What is the domain of interest?          Digital Humanities
What is the purpose of this ontology?    Classify Digital Objects
What types of questions are expected?    E.g. Who is the author of the Monna Lisa?
Who would maintain the ontology?         The ontology creator
38. Step 2: Re-use
existing ontologies
● If suitable ontologies
exist, reuse them
● Possible problems when
merging ontologies:
● Format Conflicts
● Same concept, different
representation
39. Step 3: Enumerate terms
Question                        Example of Answer
What are the terms?             Digital Object, Person, Place, Provider, Event, ..
What are their properties?      Digital Object: title, description, currentLocation, author
                                Person: name, surname, birth date, biography
                                Place: name, description, geographical coordinates
What are their relationships?   Digital object - author - Person
                                Digital object - currentLocation - Place
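One way to write down the answers from this step before formalizing them is as plain data. The sketch below is hedged: the identifiers in camelCase (for example `birthDate`) and the helper `range_of` are assumptions introduced for illustration.

```python
# Terms and their properties, as enumerated in this step.
PROPERTIES = {
    "DigitalObject": ["title", "description", "currentLocation", "author"],
    "Person": ["name", "surname", "birthDate", "biography"],
    "Place": ["name", "description", "geographicalCoordinates"],
}

# Relationships as (domain, property, range).
RELATIONSHIPS = [
    ("DigitalObject", "author", "Person"),
    ("DigitalObject", "currentLocation", "Place"),
]


def range_of(domain, prop):
    """Return the class that `prop` points to when used on `domain`, if any."""
    for d, p, r in RELATIONSHIPS:
        if d == domain and p == prop:
            return r
    return None


print(range_of("DigitalObject", "author"))
```

Properties with no entry in `RELATIONSHIPS`, such as `title`, hold plain values rather than links to other terms.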
40. Step 4: Define
classes and
hierarchies
● Top-Down Approach
● Bottom-Up Approach
● Mixed
[Diagram: Agent with subclasses Person and Organization]
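A sub-class/super-class hierarchy can be sketched as a mapping from each class to its parent. The example assumes, from the layout of the diagram, that Agent is the superclass of Person and Organization; the function name `is_a` is hypothetical.

```python
# Each class points to its direct superclass (None means "top of the hierarchy").
SUBCLASS_OF = {
    "Agent": None,
    "Person": "Agent",
    "Organization": "Agent",
}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or inherits from it, directly or not."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


print(is_a("Person", "Agent"))
print(is_a("Agent", "Person"))
```

Walking up the parent links makes the relation transitive, so deeper hierarchies work without changing the function.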
41. Step 5: Define properties
● Intrinsic properties, such as a
Person’s eye color
● Extrinsic properties, such as a
Person’s name
● Parts, if the object is structured;
these can be physical and
abstract parts
a. Ex: members of the
organization
● Relationships between members
of a class
[Diagram: Agent with subclasses Person and Organization; properties: name, description, owner, biography]
42. Step 6: Define slots
● Slot Cardinality: defines how
many values a slot can have
● Slot Value Type: defines which
types of values can fill the slot
● Common types:
a. String
b. Number
c. Boolean
d. Enumeration
[Diagram: Agent with subclasses Person and Organization; slots: name, description, owner, biography; cardinalities shown: 1..*, 1..*, 1, 1..*]
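Slot cardinality and value type can be checked mechanically. In this sketch each slot is described by a (type, minimum, maximum) tuple, where a maximum of `None` stands for `*`; the schema for Person and its specific cardinalities are assumptions chosen for illustration, not a reading of the diagram.

```python
def check_slots(instance, schema):
    """Return a list of cardinality or type errors for `instance`."""
    errors = []
    for slot, (value_type, min_card, max_card) in schema.items():
        values = instance.get(slot, [])
        if len(values) < min_card:
            errors.append(f"{slot}: needs at least {min_card} value(s)")
        if max_card is not None and len(values) > max_card:
            errors.append(f"{slot}: allows at most {max_card} value(s)")
        for value in values:
            if not isinstance(value, value_type):
                errors.append(f"{slot}: {value!r} is not a {value_type.__name__}")
    return errors


# Hypothetical schema: name is 1..*, biography is 0..1.
PERSON_SCHEMA = {
    "name": (str, 1, None),
    "biography": (str, 0, 1),
}

valid = {"name": ["Leonardo da Vinci"]}
invalid = {"name": [], "biography": [42]}

print(check_slots(valid, PERSON_SCHEMA))
print(check_slots(invalid, PERSON_SCHEMA))
```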
43. Step 7: Create Instances
● Leonardo da Vinci (Person)
a. name: Leonardo da Vinci
b. description: famous artist of the
16th Century
c. biography: Leonardo da Vinci was
born bla bla…
● Monna Lisa
a. title: Monna Lisa
b. author: Leonardo da Vinci
c. current location: Louvre, Paris
d. ...
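Once instances exist, the competency question from Step 1 ("Who is the author of the Monna Lisa?") can be answered from the data. The sketch below stores the two instances as dictionaries; the identifiers `LeonardoDaVinci` and `MonnaLisa` and the function `author_name` are assumptions, and the elided fields of the slide are left out rather than filled in.

```python
# Instances keyed by a local identifier; "type" records the class.
instances = {
    "LeonardoDaVinci": {
        "type": "Person",
        "name": "Leonardo da Vinci",
        "description": "famous artist of the 16th Century",
    },
    "MonnaLisa": {
        "type": "DigitalObject",
        "title": "Monna Lisa",
        "author": "LeonardoDaVinci",  # link to another instance
        "currentLocation": "Louvre, Paris",
    },
}


def author_name(object_id):
    """Follow the author link and return the author's name."""
    author_id = instances[object_id]["author"]
    return instances[author_id]["name"]


print(author_name("MonnaLisa"))
```

Storing the author as a link to a Person instance, rather than as a string, is what lets the same Person be reused by other objects.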
46. What is Europeana
Europeana is the European
digital platform for Cultural
Heritage.
More than 3,000 institutions
across Europe have
contributed to Europeana.
47. Europeana Data Model - Motivation
● Different libraries, museums and archives use different
metadata standards.
● This data needs to appear in a meaningful way in a
cross-cultural, multilingual context such as Europeana.
The Europeana Data Model (EDM) aims to bridge these
gaps in the Europeana context.
48. Challenges
● Accommodate different
data models
● Accommodate domain
specific requirements
● Avoid losing data and
keep the best granularity
● Co-exist with the original
data
49. Requirements
● Distinguish “provided objects” (painting, book, movie, etc.) from
their digital representations
● Distinguish object from its metadata record
● Allow multiple records for the same object, containing potentially
contradictory statements about it
● Support for objects that are composed of other objects
● Support for contextual resources, including concepts from
controlled vocabularies
50. EDM basis
● OAI ORE (Open Archives Initiative
Object Reuse & Exchange) for
organizing an object’s metadata
and digital representation(s)
● Dublin Core for descriptive
metadata
● SKOS (Simple Knowledge
Organization System) for
conceptual vocabulary
representation
● CIDOC-CRM for events and
relationships between objects
53. EDM basic pattern
A data provider submits to Europeana a “bundle” of an object and its
digital representation(s)
54. Provided CHO
The ProvidedCHO represents the
cultural heritage object.
55. Web Resource
One or more WebResources are
provided for the cultural heritage
object.
56. Aggregation
The Aggregation represents the set
of related resources about one real
object contributed by one provider:
● 1 ProvidedCHO
● 1 or more WebResources
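The basic pattern (one ProvidedCHO plus one or more WebResources, bundled in an Aggregation) can be sketched with Python dataclasses. This is an illustration only, not an EDM serialization: the field names, the `example.org` URIs, and the mapping of the Galileo letter to four folio images plus one transcription are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProvidedCHO:
    uri: str
    title: str


@dataclass
class WebResource:
    uri: str


@dataclass
class Aggregation:
    provider: str
    provided_cho: ProvidedCHO          # exactly one cultural heritage object
    web_resources: List[WebResource]   # one or more digital representations

    def __post_init__(self):
        if not self.web_resources:
            raise ValueError("an Aggregation needs at least one WebResource")


# Hypothetical URIs for the letter used as an example earlier in the slides.
letter = ProvidedCHO(
    "http://example.org/cho/galileo-letter",
    "Letter from Galileo Galilei to Christopher Clavius",
)
resources = [
    WebResource(f"http://example.org/img/galileo-letter/{side}.jpg")
    for side in ("1r", "1v", "2r", "2v")
]
resources.append(WebResource("http://example.org/text/galileo-letter.txt"))

aggregation = Aggregation("APUG", letter, resources)
print(len(aggregation.web_resources))
```

Keeping the object and its representations as separate records mirrors the requirement to distinguish provided objects from their digital representations.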
59. The Clavius on the Web Project
The Clavius on the Web project (CoW) aims at making accessible on the Web
the Clavius Correspondence, owned by the Historical Archives of the Pontifical
Gregorian University (APUG) in Rome.
Christopher Clavius (1537-1612) was a Jesuit mathematician and astronomer
and one of the most important figures in the scientific scene of the late
Sixteenth Century.
The manuscripts consist of two volumes of correspondence (266 letters)
and seven volumes of works, some of them printed and some still
unpublished.