1
Digital Library Content Model
Dagobert Soergel
College of Information Studies
University of Maryland
Department of Library and Information Studies
University at Buffalo
2
The Problem
Digital libraries must
1. Store a wide variety of often complex information objects
and display these objects on different platforms.
This requires modeling information objects, their internal
structure, and relationships among them.
2. Provide data that support discovery, interpretation, use,
and management of information objects.
This requires a good metadata model
3. Support annotation of information objects.
Annotations turn out to be surprisingly diverse.
An annotation may refer to only a part of an information object.
This requires an elegant model that can deal with many cases.
3
Purpose of the talk
To reexamine a number of basic notions regarding
the content of a digital library (or, more generally, any
information system) to achieve sound definitions
Developed in the framework of the
DELOS Digital Library Reference Model
a framework for describing digital libraries, their content,
users, and functions and, for each, their qualities and
associated policies
4
Premises
• Modeling the content domain is complex and much
thinking is muddled
• Need to be able to handle both “data” and “documents”
• Any reference model
• needs to be abstract and must not commit to any
particular standard or design decision
• rather, it must provide a framework for specifying
the commitments of any particular DL
(or information system)
5
Issues
0 Scope of this talk and modeling constructs
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
6
Scope of this talk
• A reference model for a broadly conceived digital
library will be able to model almost any information
system, thus will be useful very broadly.
• The focus on digital libraries is in the application,
especially the type of collection, to which the model is
applied.
7
Scope: level of abstraction
• The reference model should stay on an abstract level. It should
not require specific standards but rather
allow for plugging in any standard, such as RDA or DC.
• A DL should indicate to the users what standard it uses
for things like time, place, type of relationship, type of resource
• The reference model should
not require design choices but rather
provide a framework for specifying design choices,
such as selectivity of the collection. A DL will then indicate
whether its collection is selective or fully inclusive
8
Modeling constructs
• The reference model should be based on an
entity-relationship model (E-R model).
• Second-order logic: relationship instances are resources that
can in turn be related to anything.
Apply pragmatically for useful navigation and common-sense
inferences; stay away from types of reasoning that run into
problems with second order logic.
• Must add mechanisms for indicating the degree of precision or
the degree of certainty of statements.
9
Issues
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
10
Content in the overall context of a
DL reference model
• Resources
• Structured data
• Unstructured data, text
• Uses of data
11
Everything is a resource
W3C definition
A resource is anything that can be identified or named.
Any resource is represented by a resource identifier.
Resource includes
● external (non-digital) objects or events and
● digital objects or events,
wherever that digital object or event may reside or occur.
Same as topic in topic maps
In an E-R model, entity types, entity instances (entity values),
relationship types, and relationship instances are all resources
In RDA: Resource restricted to information object.
Advantages of broader definition will become clear.
12
Structured data = statements
Resource 1 <relationship> Resource 2
SoftwareModule <createdBy> LegalEntity
SoftwareModule <annotatedBy> Information object
Event <happenedIn> (Date1, Date2)
Multi-way relationships, frames
Statements are information objects, that is, they are
resources that can in turn be related to anything
A statement is also called a proposition, an assertion, or a fact
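The idea that a statement is itself a resource can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Resource` and `Statement`, the identifiers `S1` and `S2`, the relationship `assertedBy`, and the resource `CatalogerX` are hypothetical and are not part of the reference model.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Resource:
    """Anything that can be identified or named."""
    identifier: str


@dataclass(frozen=True)
class Statement(Resource):
    """A relationship instance. Because it subclasses Resource,
    it can itself appear in other statements."""
    subject: Resource
    relationship: str
    obj: Union[Resource, Tuple[str, ...]]


hamlet = Resource("Hamlet")
shakespeare = Resource("Shakespeare")

# Resource 1 <relationship> Resource 2
s1 = Statement("S1", hamlet, "authoredBy", shakespeare)

# A statement about a statement (hypothetical relationship "assertedBy").
cataloger = Resource("CatalogerX")
s2 = Statement("S2", s1, "assertedBy", cataloger)

# A relationship whose second argument is a pair of dates (placeholders).
event = Resource("Event")
s3 = Statement("S3", event, "happenedIn", ("Date1", "Date2"))
```

Making `Statement` a subclass of `Resource` is one possible design choice; it keeps "relationship instances are resources" true by construction.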
13
More on structured data
Data consist of statements about resources.
Such statements can be conceived as relationship instances
in which the resource in focus occupies one argument slot.
A simple statement using a binary relationship or a
multi-way relationship (a frame instance with slots filled)
(objects in an object-oriented database)
Drug treatment frame instance
Drug Taxoteer
treatsDisease Cancer, estrogen-negative
inPopulationGroup Elderly
hasSuccessRate 55%
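As a rough sketch, the drug treatment frame can be written as a record with named slots and then decomposed into binary statements about the frame instance. The class name, the snake_case slot names, the frame identifier `F1`, and the choice to store the success rate as a fraction are assumptions made for this example.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DrugTreatmentFrame:
    """A multi-way relationship: one frame instance with its slots filled."""
    drug: str
    treats_disease: str
    in_population_group: str
    has_success_rate: float  # stored as a fraction: 0.55 means 55%


frame = DrugTreatmentFrame(
    drug="Taxoteer",
    treats_disease="Cancer, estrogen-negative",
    in_population_group="Elderly",
    has_success_rate=0.55,
)


def as_binary_statements(frame_id, frame_instance):
    """Decompose a frame into (frame, slot, value) statements.

    Each slot filler is tied to the frame instance, which is what
    gives a bare data value such as 0.55 its meaning."""
    return [(frame_id, slot, value)
            for slot, value in asdict(frame_instance).items()]


statements = as_binary_statements("F1", frame)
```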
14
More on structured data
Slot fillers are also known as data values.
A data value makes sense only when it is seen in relation to
one or more resources, for example as a slot filler in a frame.
Examples
The value 55% makes sense only in the right context, such as in
the success slot of a drug treatment frame
The value 185 cm makes sense only if we know it is the height of
a person or the length of a pair of skis.
15
There are two ways to communicate such statements.
1. Structured data:
One learns what one wants to know about the resource in
focus immediately from a relationship instance.
Hamlet <authoredBy> Shakespeare
The drug treatment frame on Taxoteer
The actual data of interest are represented in a database
16
There are two ways to communicate such statements.
2. Unstructured data:
One needs to extract what one wants to know from a text or
image that is related to the resource in focus.
Shakespeare wrote Hamlet in the year 1625
Hamlet was written by Shakespeare
Taxoteer is effective in the treatment of cancers that have no
receptors for estrogen. In elderly persons the
success rate is 50%.
The data of interest are stored in what is commonly known as a
document.
17
Functions of data
Data about a resource may serve any of the following functions:
• learn about the resource and its various characteristics
• learn about the history and context of the resource
• learn how to use the resource
• manage the resource
• preserve the resource
The sections about metadata (roughly: data about an information object)
will specialize this list
18
Relationship as the
basic modeling construct
Important principle:
Many concepts in a DL reference model are best modeled
based on relationships rather than based on entities
For example, “annotation-hood”
resides not in an information object but in the relationship
InformationObjectA <annotates> InformationObjectB
InformationObjectB <annotatedBy> InformationObjectA
19
Resource type examples
• Information objects
Incl. documents, data streams, databases, queries and their
results (virtual information objects, such as database reports,
virtual collections)
• Actors that can search for, create, and manage resources
• Functions and services
• Software modules
• Policies
• Languages
• Ideas, concepts
20
Inheritance
Many reference model constructs are specified at the level of resource.
They inherit down to the different resource types, especially
information objects
For example, the following statement types are valid for Resource
Resource <identifiedBy> Identifier
Resource <characterizedBy> QualityParameter
Resource <regulatedBy> Policy
Therefore, they are also valid for InformationObject or Actor or Policy
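The inheritance of statement types can be illustrated with a small lookup that walks up a type hierarchy. This is a minimal sketch: the dictionaries `SUPERTYPE` and `DECLARED` and the function name are hypothetical, and only a handful of resource types are included.

```python
# Each resource type points to its supertype (None at the top).
SUPERTYPE = {
    "Resource": None,
    "InformationObject": "Resource",
    "Actor": "Resource",
    "Policy": "Resource",
}

# Statement types are declared once, at the most general type they apply to.
DECLARED = {
    "Resource": {
        "identifiedBy": "Identifier",
        "characterizedBy": "QualityParameter",
        "regulatedBy": "Policy",
    },
    "InformationObject": {
        "hasCreator": "Actor",
    },
}


def valid_relationships(resource_type):
    """Return all statement types valid for a type, including inherited ones."""
    chain = []
    current = resource_type
    while current is not None:
        chain.append(current)
        current = SUPERTYPE[current]
    result = {}
    # Apply the most general declarations first, then the more specific ones.
    for type_name in reversed(chain):
        result.update(DECLARED.get(type_name, {}))
    return result
```

With these tables, `valid_relationships("Actor")` contains `regulatedBy` even though nothing was declared for `Actor` directly.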
21
Issues
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
22
Information objects 1
1. A formal relationship instance
(such as a row in a table or a structured data record)
2. A document (written or spoken text, image, sound) from which a
human reader can learn about the resource in focus or about the
relationships among several resources.
Information extraction: document → formal relationship instances.
A collection of information objects is in turn an information object
• a table in a relational database = a collection of rows, each
representing a relationship instance or a collection of relationship
instances
• a collection of documents
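The step "document → formal relationship instances" can be hinted at with a toy pattern matcher. Real information extraction is far harder; the single regular expression below, the function name, and the example sentences are assumptions chosen only to show the direction of the transformation.

```python
import re

# Toy pattern: recognizes exactly one way of stating authorship.
AUTHORSHIP = re.compile(r"^(?P<work>\w+) was written by (?P<author>\w+)\.?$")


def extract_authorship(sentence):
    """Turn a sentence into a (work, authoredBy, author) statement,
    or return None if the sentence does not match the toy pattern."""
    match = AUTHORSHIP.match(sentence.strip())
    if match is None:
        return None
    return (match.group("work"), "authoredBy", match.group("author"))


statement = extract_authorship("Hamlet was written by Shakespeare")
missed = extract_authorship("Shakespeare wrote Hamlet")
```

The second sentence carries the same fact but is not matched, which illustrates why a human reader (or a much richer extractor) is needed for unstructured data.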
23
Information objects 2
An information object may be a close representation of an
external object or event, for example
• An image (photograph or painting) of a building. There may be
many such images taken from different angles etc.
• A video recording of a soccer game. There may be several
such video recordings, each capturing different scenes, or
capturing the same scene from different angles, or following
different players, etc. These are different information objects
representing the same external event.
24
Real world objects, concepts, ideas
To provide full access to the information objects it contains,
a digital library must manage data about any kind of object
(real world objects, concepts, ideas) in its subject domain.
Why?
1. The DL may represent data in the form of a database
2. Users look for information objects that deal with or are
digital representations of any kind of object.
This idea underlies Topic Maps which were originally designed to
improve access to documents by relating the topics discussed in
these documents.
25
Real world objects, concepts, ideas
Examples (these are all resources)
• People (focus of biographical reference tools)
• Organizations (focus of organization directories)
• Events (focus of developing "event gazetteers")
• Places (focus of gazetteers)
• Dates
• Mathematical theorems (focus of mathematical encyclopedias)
• Concepts, ideas
• Problems and proposed solutions
• Computer programs (focus of software directories or libraries)
The reference model should have a more complete list and indicate
sources dealing with these
26
Issues
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
27
Levels, versions,
and relationships
• Work, manifestation, item (individual copy)
• Linked through relationships
28
Work
Intellectual or artistic entity, as the abstract essence or as a text,
image, or piece of music.
Range:
• A basic story or theme
• the story of Faust
• the myth of the Great Flood
• A text telling the story, such as
• Goethe's Faust
• the account of the Great Flood in the Bible (original Hebrew)
• the account of the same myth in another culture
• A specific version of the account in the Hebrew Bible
• A Latin translation of the account in the Hebrew Bible
29
Manifestation
A specific rendering of a work by means of a graphical image or
sound, taken in the abstract; the idea of such a rendering.
Examples:
• The text of Goethe's Faust printed in a particular typeface and layout
A performance at which the text is recited also renders the text but is
more properly considered a separate, but related, work.
• A specific score of a given version of Schubert's Fifth.
A performance of that version of Schubert’s Fifth also renders the
piece of music but is considered a separate, but related, work.
Also the rendering of a work in the form of digital storage that can be
transformed to a graphical image or sound, again taken as the abstract
pattern of digital signals.
30
Item, individual copy
The embodiment of a manifestation in a physical object
We can perceive the content of a manifestation only through an
individual copy of it (unless we have memorized the visual expression
manifest in a manifestation and can conjure it up from memory).
There are works that have only one manifestation of which there is
only one copy.
31
Relationships among information objects
The story of Faust <dealsWith> Pact with the devil
The story of Faust <isToldIn> Marlowe’s Faust
The story of Faust <isToldIn> Goethe’s Faust
Goethe’s Faust <authoredBy> Goethe, Johann Wolfgang von
Goethe’s Faust <hasManifestation> R1231
R1231 <publishedBy> Cotta
R1231 <hasDate> 1871
R1232 <isCopyOf> R1231
R1232 <ownedBy> (HRieth, 1896, 1956)
R1232 <ownedBy> (DSoergel, 1956, *)
32
Hierarchical inheritance
• Data about a work inherit to all works below it along <isToldIn>,
<hasVersion> etc. Therefore
Goethe's Faust <dealsWith> Pact with the devil
• Data about a work inherit to all its manifestations. Therefore
R1231 <authoredBy> Goethe, Johann Wolfgang von
• Data about a manifestation inherit to all its items
• Hierarchical inheritance increases efficiency
• More efficient catalog input
• More efficient catalog storage
• More efficient representation and reading of search results
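Hierarchical inheritance can be sketched as a walk from an item up to its manifestation and works, collecting statements along the way. The statements are taken from the Faust example; the sets `DOWNWARD` and `UPWARD` and the function names are assumptions made for this illustration.

```python
STATEMENTS = [
    ("The story of Faust", "dealsWith", "Pact with the devil"),
    ("The story of Faust", "isToldIn", "Goethe's Faust"),
    ("Goethe's Faust", "authoredBy", "Goethe, Johann Wolfgang von"),
    ("Goethe's Faust", "hasManifestation", "R1231"),
    ("R1231", "publishedBy", "Cotta"),
    ("R1231", "hasDate", "1871"),
    ("R1232", "isCopyOf", "R1231"),
]

# Structural relationships pointing from the more general to the more specific.
DOWNWARD = {"isToldIn", "hasVersion", "hasManifestation"}
# Structural relationships pointing from the more specific to the more general.
UPWARD = {"isCopyOf"}
STRUCTURAL = DOWNWARD | UPWARD


def parents(resource):
    """Resources from which the given resource inherits data."""
    found = [s for s, r, o in STATEMENTS if r in DOWNWARD and o == resource]
    found += [o for s, r, o in STATEMENTS if r in UPWARD and s == resource]
    return found


def effective_data(resource):
    """Own non-structural statements plus those inherited from all ancestors."""
    data, seen, stack = set(), set(), [resource]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, r, o in STATEMENTS:
            if s == current and r not in STRUCTURAL:
                data.add((r, o))
        stack.extend(parents(current))
    return data
```

The author is entered once, at the work, yet `effective_data("R1232")` reports it for the individual copy, which is the cataloging and storage saving described above.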
33
More relationships
R271 The man I killed, by Michael Halliday
R519 The man I killed, play by Christopher Wern
R519 <isBasedOn> R271
R315 Handbook of commercial geography, by Robert Chisholm
R783 Chisholm's handbook of commercial geography, entirely rewritten by L. Dudley Stamp and S. Carter Gilmour.
R783 <entirelyRewrittenFrom> R315
34
Relationship to FRBR
Notes on Terminology
• The FRBR distinction between work and expression should
be rethought. It is unclear and consequently poorly
understood, and it may not be necessary. Just have work.
The intuition FRBR tries to capture in this distinction is better
handled through relationships among works as defined here.
• Following FRBR I use the term manifestation.
Other term: edition (in the sense of German Ausgabe),
but edition also means German Auflage,
so use of the term edition can be confusing.
• It would be nice to be able to use graphic expression as a
synonym for rendering, but to avoid any further confusion with
FRBR it is best not to use the term expression at all.
35
Version control
Important, but not elaborated here
36
Issues
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
37
Composite information objects / resources
Examples
• Book divided into chapters, sections, paragraphs, words (XML
Document Object Model, DOM or TEI)
Each part can be seen as a separate information object
• Movie with images, soundtrack, closed captions, script, all
coordinated (MPEG-7)
• A medical record with patient data, test data, images, live
monitoring data streams, diagnoses, drugs prescribed, etc.
38
Composite information objects / resources
Abstractly:
Each component is a separate information object,
composition expressed through relationships
In practice:
Many document models for composite (or compound)
documents supporting presentation
DL needs to allow specification, for each document,
of the particular document model used
39
Issues
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
40
Identifying information objects
1 Initial definition upon entry into the digital library.
2 Definition on the spot
Examples
Annotate a specific segment of a text document or a region
of an image or sound document or
Anchor an annotation to a specific location in a document.
The segment or anchor is a new information object that is
included in the original information object, and this new
information object is linked with any of several annotation
relationships to a new information object created by the user.
Related to composite objects. More on this under annotation
41
Issues
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
42
Data about information objects
Metadata =
data about information objects
if used for discovering, interpreting, and using information objects
Relate information objects to other types of resources. Examples:
InformationObject <hasCreator> Actor
InformationObject <dealsWith> Actor
InformationObject <containsText> Text (or, more specifically Word)
Relate a word in a text to the concept that is the meaning in which the
word is used in this particular position.
InformationObjectA <hasAbstract> InformationObjectB
InformationObjectA <hasCriticalCommentary> InformationObjectC
InformationObjectD <hasSupportiveCommentary> InformationObjectC
43
More on defining metadata
The “metadata-hood” of an information object does not reside in the
information object, but in its relationship to another information
object and, more specifically, in its use
A piece of data
is used as metadata
if it is used for the purpose of discovering, interpreting, and
using information objects, which then give the ultimate data
wanted.
The same piece of data may fill the ultimate need of the user in
one situation and be used as metadata in another situation.
44
Not metadata
• Data about resources that are not information objects are not
metadata even if they are similar in form.
• Data about information objects are not always used as metadata.
For example, using author data to count a faculty member's
publications or citation data to compute impact
• Extensive discussion of what exactly is the definition of metadata is
not a good use of resources. A system should provide the data that
are useful to a user for whatever purpose; what each piece of data is
called is less important.
45
Metadata typologies
Metadata (and data in general) can be divided into categories from
several perspectives, and within each perspective there exist
several approaches.
Some examples of how to categorize metadata
• by purposes or use. Since the same unit of metadata can be
used for several purposes, the resulting categories overlap.
• by source, for example, extracted, assigned by cataloger,
assigned by user (social tagging), from usage tracking
• by intrinsic characteristics, for example data about
provenance or about the format of the information object
46
Some metadata uses
A Learn about information objects and interpret them; this includes
A1 Learn about the identity and characteristics of information objects
(descriptive metadata)
A2 Learn about the history and other features of the context of the
information object (contextual metadata)
B Learn how to use an information object, including
B1 Learn how to gain legal access (access and rights metadata)
B2 Learn how to gain technical access to the information object
(what machinery and software is needed to access the
information object for a given purpose, such as assimilation by a
person or processing by a computer program)
C Manage information objects (administrative metadata), in particular
C1 Manage the preservation of information objects
(preservation metadata).
47
Usage data
Data on usage of resources
and on usage rights, usage history, and future use / preservation are important
for discovering, interpreting, and using resources as well as for managing
resources
Some of these data can be collected automatically
If the resource in question is an information object, this kind of data is
often used as metadata
48
Issues
1a Content in the overall context of a DL reference model
1b Modeling information objects
1c Levels, versions, and relationships
1d Composite information objects / resources
1e Resource identifiers
2 Metadata, including provenance, context, usage
3 Annotation
49
Annotation
InformationObjectA <annotatedBy> InformationObjectB
InformationObjectB may be created on the spot in order to annotate A
(InformationObjectB and the annotation relationship have the same
author) or B may preexist (the annotation relationship between A and B
is introduced by a third party)
Specific type of annotation expressed by specializing the annotatedBy
relationship, for example
InformationObjectA <criticizedBy> InformationObjectB
InformationObjectA <hasCriticalCommentary> InformationObjectC
InformationObjectD <hasSupportiveCommentary> InformationObjectC
InformationObjectE <isPartOfSpeech> PartOfSpeech
Annotation-hood is in the relationship, not in the information object
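A minimal sketch of annotation as a specializable relationship follows. The relationship names come from the examples above; the table `BROADER`, the extra `hasCreator` statement with `ActorX`, and the function names are assumptions made for illustration.

```python
# Each relationship type maps to the broader type it specializes.
BROADER = {
    "annotatedBy": None,
    "criticizedBy": "annotatedBy",
    "hasCriticalCommentary": "annotatedBy",
    "hasSupportiveCommentary": "annotatedBy",
}

STATEMENTS = [
    ("InformationObjectA", "criticizedBy", "InformationObjectB"),
    ("InformationObjectA", "hasCriticalCommentary", "InformationObjectC"),
    ("InformationObjectD", "hasSupportiveCommentary", "InformationObjectC"),
    ("InformationObjectA", "hasCreator", "ActorX"),  # not an annotation
]


def is_a(relationship, ancestor):
    """True if the relationship equals the ancestor or specializes it."""
    while relationship is not None:
        if relationship == ancestor:
            return True
        relationship = BROADER.get(relationship)
    return False


def annotations_of(target):
    """All (relationship, annotating object) pairs for a target."""
    return [(r, o) for s, r, o in STATEMENTS
            if s == target and is_a(r, "annotatedBy")]


def annotated_by(annotator):
    """All objects that a given object annotates."""
    return [s for s, r, o in STATEMENTS
            if o == annotator and is_a(r, "annotatedBy")]
```

`InformationObjectC` is critical commentary on one object and supportive commentary on another; nothing about C itself makes it an annotation, only the relationships it takes part in.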
50
Annotation
Annotation-hood is in the relationship, not in the information object
There is a wide range of relationship types that are called annotations.
Linguists think of annotations differently than scholars making
comments on a text.
Rather than trying to define exactly what “annotation” means, the
reference model should include a comprehensive list of relationship
types that might be considered annotation by somebody so that
anybody can define their meaning of annotation by giving the
appropriate subset of annotation relationship types.
The same thought applies to metadata, discussed on a later slide.
51
Special resource types for annotations
Some annotations require special types of resources.
Examples
Annotate a text with part-of-speech indications
annotated resource : a one-word fragment of the text
annotating resource: a value from a list of parts of speech
Annotate a text with meaning for word sense disambiguation
annotated resource : a word or phrase in the text
annotating resource: a value from a list of meanings defined in some way
Annotation through underlining or other marks
annotated resource : a fragment of text or other information object
annotating resource: a pair (sign, meaning), e.g. (underline, important) or
(?, check this out) or (X, nonsense)
The annotated resource and the annotating resource may be very short
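The special resource types above can be sketched as follows. The `Fragment` class with character offsets, the sample document, the short list of parts of speech, and the relationship name `markedWith` are assumptions made for this example; `isPartOfSpeech` follows the earlier annotation slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A part of a document, defined on the spot by character offsets."""
    document_id: str
    start: int
    end: int  # exclusive

    def text(self, documents):
        return documents[self.document_id][self.start:self.end]


DOCUMENTS = {"doc1": "Digital libraries store objects"}
PARTS_OF_SPEECH = {"noun", "verb", "adjective"}


def annotate_part_of_speech(fragment, part_of_speech):
    """Annotating resource: a value from a controlled list."""
    if part_of_speech not in PARTS_OF_SPEECH:
        raise ValueError(f"unknown part of speech: {part_of_speech}")
    return (fragment, "isPartOfSpeech", part_of_speech)


def annotate_mark(fragment, sign, meaning):
    """Annotating resource: a (sign, meaning) pair."""
    return (fragment, "markedWith", (sign, meaning))


word = Fragment("doc1", 18, 23)
pos_annotation = annotate_part_of_speech(word, "verb")
mark_annotation = annotate_mark(Fragment("doc1", 0, 17), "underline", "important")
```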
52
Annotation and metadata
Metadata and annotation data overlap, and different
communities and individuals have different definitions of what is
included in metadata and what is included in annotations.
The precise nature of a unit of data about an information
object is determined by the relationship type and the resource
that is linked to. The interpretation of each type of data is in the
eye of the beholder.
Need an inventory of relationship types (a type of ontology)
For example, the CIDOC Conceptual Reference Model (CIDOC/CRM)
is an inventory of broad relationship types.
In such an inventory, one could indicate who considers a given
relationship type as usable as metadata and/or as belonging to
annotation.
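Such an inventory might be sketched as a table that records, for each relationship type, which communities treat it as metadata and which treat it as annotation. The community names and every assignment below are invented for illustration and do not describe any actual community's practice.

```python
# relationship type -> who regards it as metadata / as annotation (invented data)
INVENTORY = {
    "hasCreator": {"metadata": {"catalogers"}, "annotation": set()},
    "hasAbstract": {"metadata": {"catalogers"}, "annotation": {"scholars"}},
    "hasCriticalCommentary": {"metadata": set(), "annotation": {"scholars"}},
    "isPartOfSpeech": {"metadata": set(), "annotation": {"linguists"}},
}


def relationship_types(community, role):
    """The subset of relationship types a community counts under a role
    ("metadata" or "annotation")."""
    return sorted(rel for rel, views in INVENTORY.items()
                  if community in views[role])


scholar_annotations = relationship_types("scholars", "annotation")
linguist_annotations = relationship_types("linguists", "annotation")
```

Each community's meaning of "annotation" is then simply the subset returned for it, and the same relationship type (here `hasAbstract`) may count as metadata for one group and as annotation for another.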
53
Take-home message 1
The entity-relationship model (E-R model) provides
the unifying principle for a digital library content model
The E-R model allows representation of structured data of
any complexity on a conceptual level.
Defining relationships between information objects handles
• Modeling information objects
• Levels, versions, and relationships
• Composite information objects / resources
• Metadata
• Annotation
Many notions are captured better through
relationships than fine distinctions of entity types
54
Take-home message 2
Any reference model
• needs to be abstract and must not commit to any
particular standard or design decision
• rather, it must provide a framework for specifying
the commitments of any particular DL
(or information system)
A reference model provides
a systematic framework for description and analysis,
not a prescription
Dagobert Soergel
dsoergel at umd.edu
www.dsoergel.com
55
Omitted slides
56
57
Construction process
• Need to be sure all applicable concepts from various
sources such as the 5S model and FRBR/CRM are
included, either in the skeleton model or in a list of
values / choices, as appropriate
• There is still work to be done to pull reference model
subject matter out of the reference architecture
document, and vice versa.
58
Construction process
• We should have an online version of the reference
model document with the following properties
• Links to discussion of issues and underlying
rationale, capturing some of the discussion in the
group.
• Links from the reference model to the appropriate
section of the reference architecture
• The Wiki page may not quite do it.
59
There are two ways to communicate such statements.
1. One learns what one wants to know about the resource in
focus immediately from a relationship instance.
Hamlet <authoredBy> Shakespeare
The drug treatment frame on Taxoteer
The actual data of interest are represented in a database that
captures these statements (relationship instances), such as
a collection of Prolog statements
a relational database
an object-oriented database
2. One needs to consult an information object that is related to
the resource in focus.
Shakespeare wrote Hamlet in the year 1625
Hamlet was written by Shakespeare
Taxoteer is effective in the treatment of cancers that have no
receptors for estrogen. In elderly persons the
success rate is 50%
60
• The DL designer must decide how to identify
the new resource that is a part of an existing
resource and
the new text object created by the annotator
and how to store the link between these two
information objects
61
Identifying information objects
Architecture issues
Definition on the spot, options
(1) use completely independent identifiers and store the relationship explicitly
(2) use dependent identifiers
The part of a document can be identified by
document identifier followed by information that uniquely identifies the part.
The part relation is implied by the structure of the identifier.
The annotation information object could be identified by
the identifier of the resource being annotated followed by a short string that identifies the nth
annotation of this resource (like a footnote).
The relationship between the resource and the resource annotating it would be implied by
the identifier (however, the specific type of the annotation relationship would not be captured
this way).
The resource that annotates still can be referenced from any other context.
Implicit representation
Embedded annotations: The annotation is embedded in the document, linked to a point in a
text that is identified only by the place of the annotation. This could be converted to an
explicit representation.
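Option (2), dependent identifiers, can be sketched as below. The identifier syntax (`#` before a part locator, `/ann` plus a number for the nth annotation), the sample identifiers, and the function names are assumptions made for this example, not a proposed standard.

```python
def part_identifier(document_id, part_locator):
    """Document identifier followed by information that identifies the part."""
    return f"{document_id}#{part_locator}"


def annotation_identifier(resource_id, n):
    """Identifier of the nth annotation of a resource (like a footnote)."""
    return f"{resource_id}/ann{n}"


def implied_relationship(identifier):
    """Recover the relationship implied by the structure of the identifier.

    Returns (related resource, generic relationship) or None for an
    independent identifier. The specific type of annotation relationship
    is not recoverable this way."""
    if "/ann" in identifier:
        return identifier.rsplit("/ann", 1)[0], "annotates"
    if "#" in identifier:
        return identifier.rsplit("#", 1)[0], "isPartOf"
    return None


part = part_identifier("doc42", "ch3.p2")
note = annotation_identifier(part, 1)
```

Here `implied_relationship(note)` leads back to the annotated part, and `implied_relationship(part)` leads back to the document, without any explicitly stored link. Under option (1), the same links would instead be stored as explicit statements between independent identifiers.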
62
Some metadata uses
This is a specialization of the functions of data given above
A learn about other data, that is, information objects, and understand them; this includes
A1 learn about the identity and characteristics of information objects
(descriptive metadata)
A2 learn about the history and other features of the context of the
information object (contextual metadata)
B learn how to use an information object (source of data), including
B1 learn how to gain legal access to the information object
(access and use rights metadata)
B2 learn how to gain technical access to the information object
(what machinery and software is needed to access the information object
for a given purpose, such as assimilation by a person or processing by a
computer program)
C manage information objects (administrative metadata), in particular
C1 manage the preservation of information objects (preservation metadata).
63
Metadata in the reference model
When describing a DL using the reference model,
need to be able to indicate any typology of metadata
used in the DL

More Related Content

Similar to DLF-JCDL2007ExpandedKoeln.ppt

Chapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdfChapter – 2 Data Models.pdf
Chapter – 2 Data Models.pdf
TamiratDejene1
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notes
Bernadette Hyland-Wood
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
Peter Haase
 
Presentation_euroCRIS_ES
Presentation_euroCRIS_ESPresentation_euroCRIS_ES
Presentation_euroCRIS_ES
Ed Simons
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
Lucy McKenna
 
Chapter-OBDD.pptx
Chapter-OBDD.pptxChapter-OBDD.pptx
Chapter-OBDD.pptx
XanGwaps
 
BAB 7 Pangkalan data new
BAB 7   Pangkalan data newBAB 7   Pangkalan data new
BAB 7 Pangkalan data new
Nur Salsabila Edu
 
Linked Open Data in the World of Patents
Linked Open Data in the World of Patents Linked Open Data in the World of Patents
Linked Open Data in the World of Patents
Dr. Haxel Consult
 
MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)
Nikos Palavitsinis, PhD
 
UAEU_MDL_Slides_rev1.ppt
UAEU_MDL_Slides_rev1.pptUAEU_MDL_Slides_rev1.ppt
UAEU_MDL_Slides_rev1.ppt
Rajesh Kumar Das
 
An introduction to repository reference models
An introduction to repository reference modelsAn introduction to repository reference models
An introduction to repository reference models
Julie Allinson
 
Lecture Basic HTML tags. Beginning Web Site Design Stanford University Contin...
Lecture Basic HTML tags. Beginning Web Site Design Stanford University Contin...Lecture Basic HTML tags. Beginning Web Site Design Stanford University Contin...
Lecture Basic HTML tags. Beginning Web Site Design Stanford University Contin...
Anwar Patel
 
Db lec 01
Db lec 01Db lec 01
Data science
Data scienceData science
Data science
Biniam Behailu
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
Paul Groth
 
Chapter 2 - EMTE.pptx
Chapter 2 - EMTE.pptxChapter 2 - EMTE.pptx
Chapter 2 - EMTE.pptx
Eyersu Selemon
 
Information Technology 104
Information Technology 104Information Technology 104
Information Technology 104
'Vladimir Medina
 
Dbms questions
Dbms questionsDbms questions
Dbms questions
Srikanth
 
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENTMETADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
METADATA: A PRACTICE AND ITS SERVICES TOWARDS DIGITAL ENVIRONMENT
Vikas Bhushan
 
Management of bibliographic metadata - Metadata management at the Leibniz Inf...
Management of bibliographic metadata - Metadata management at the Leibniz Inf...Management of bibliographic metadata - Metadata management at the Leibniz Inf...
Management of bibliographic metadata - Metadata management at the Leibniz Inf...
suvanni
 






DLF-JCDL2007ExpandedKoeln.ppt

  • 1. 1 Digital Library Content Model Dagobert Soergel College of Information Studies University of Maryland Department of Library and Information Studies University at Buffalo
  • 2. 2 The Problem Digital libraries must 1. Store a wide variety of often complex information objects and display these objects on different platforms. This requires modeling information objects, their internal structure, and relationships among them. 2. Provide data that support discovery, interpretation, use, and management of information objects. This requires a good metadata model. 3. Support annotation of information objects. Annotations turn out to be surprisingly diverse. An annotation may refer to only a part of an information object. This requires an elegant model that can deal with many cases.
  • 3. 3 Purpose of the talk To reexamine a number of basic notions regarding the content of a digital library (or, more generally, any information system) to achieve sound definitions Developed in the framework of the DELOS Digital Library Reference Model a framework for describing digital libraries, their content, users, and functions and, for each, their qualities and associated policies
  • 4. 4 Premisses • Modeling the content domain is complex and much thinking is muddled • Need to be able to handle both “data” and “documents” • Any reference model • needs to be abstract and must not commit to any particular standard or design decision • rather, it must provide a framework for specifying the commitments of any particular DL (or information system)
  • 5. 5 Issues 0 Scope of this talk and modeling constructs 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 6. 6 Scope of this talk • A reference model for a broadly conceived digital library will be able to model most any information system, thus will be useful very broadly. • The focus on digital libraries is in the application, especially the type of collection, to which the model is applied.
  • 7. 7 Scope: level of abstraction • The reference model should stay on an abstract level. It should not require specific standards but rather allow for plugging in any standard, such as RDA or DC. • A DL should indicate to the users what standard it uses for things like time, place, type of relationship, type of resource • The reference model should not require design choices but rather provide a framework for specifying design choices, such as selectivity of the collection. A DL will then indicate whether its collection is selective or fully inclusive
  • 8. 8 Modeling constructs • The reference model should be based on an entity-relationship model (E-R model). • Second-order logic: relationship instances are resources that can in turn be related to anything. Apply pragmatically for useful navigation and common-sense inferences; stay away from types of reasoning that run into problems with second order logic. • Must add mechanisms for indicating the degree of precision or the degree of certainty of statements.
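One possible way to attach a degree of certainty to a statement is sketched below. This is only an illustration, not part of the reference model: the class name `QualifiedStatement`, the 0 to 1 certainty scale, the helper `confident`, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedStatement:
    """A binary statement plus an assumed certainty score in [0, 1]."""
    subject: str
    relationship: str
    obj: str
    certainty: float = 1.0  # assumption: 1.0 means "asserted without doubt"

    def __post_init__(self):
        if not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must lie between 0 and 1")


def confident(statements, threshold):
    """Keep only statements whose certainty reaches the threshold."""
    return [s for s in statements if s.certainty >= threshold]


# Hypothetical sample data: one uncertain date, one undisputed authorship.
statements = [
    QualifiedStatement("Event1", "happenedIn", "1625", certainty=0.6),
    QualifiedStatement("Hamlet", "authoredBy", "Shakespeare"),
]
reliable = confident(statements, threshold=0.9)
```

A DL could use such a score to filter or rank statements during navigation; how certainty is actually assigned is left open here.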
  • 9. 9 Issues 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 10. 10 Content in the overall context of a DL reference model • Resources • Structured data • Unstructured data, text • Uses of data
  • 11. 11 Everything is a resource W3C definition A resource is anything that can be identified or named. Any resource is represented by a resource identifier. Resource includes ● external (non-digital) objects or events and ● digital objects or events, wherever that digital object or event may reside or occur. Same as topic in topic maps. In an E-R model, entity types, entity instances (entity values), relationship types, and relationship instances are all resources. In RDA: Resource restricted to information object. Advantages of broader definition will become clear.
  • 12. 12 Structured data = statements Resource 1 <relationship> Resource 2 SoftwareModule <createdBy> LegalEntity SoftwareModule <annotatedBy> Information object Event <happenedIn> (Date1, Date2) Multi-way relationships, frames Statements are information objects, that is, they are resources that can in turn be related to anything. A statement is also called a proposition, an assertion, or a fact.
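The idea that a statement is itself a resource can be sketched in a few lines. This is a minimal illustration under assumptions: the class names, the `assertedBy` relationship, and the identifier `Cataloger42` are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Anything that can be identified or named."""
    identifier: str


@dataclass(frozen=True)
class Statement:
    """Resource1 <relationship> Resource2; usable as a resource in turn."""
    subject: Union[Resource, "Statement"]
    relationship: str
    obj: Union[Resource, "Statement"]


hamlet = Resource("Hamlet")
shakespeare = Resource("Shakespeare")
cataloger = Resource("Cataloger42")  # hypothetical actor

# A first-level statement about two resources.
authorship = Statement(hamlet, "authoredBy", shakespeare)

# A statement about the statement: the relationship instance is the subject.
provenance = Statement(authorship, "assertedBy", cataloger)
```

Because `Statement` accepts another `Statement` in either slot, provenance, annotation, or certainty can be attached to any relationship instance without a separate mechanism.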
  • 13. 13 More on structured data Data consist of statements about resources. Such statements can be conceived as relationship instances in which the resource in focus occupies one argument slot. A simple statement using a binary relationship or a multi-way relationship (a frame instance with slots filled) (objects in an object-oriented database) Drug treatment frame instance Drug Taxoteer treatsDisease Cancer, estrogen-negative inPopulationGroup Elderly hasSuccessRate 55%
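The drug treatment frame can be written down as a record whose slots unfold into individual statements. The sketch below is one assumed encoding: the class name, the snake_case slot names, the frame identifier `treatment-1`, and the choice to store the success rate as a fraction are illustrative only.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DrugTreatmentFrame:
    """A multi-way relationship with one field per slot."""
    drug: str
    treats_disease: str
    in_population_group: str
    has_success_rate: float  # assumption: stored as a fraction, 0.55 = 55%


def frame_to_statements(frame_id, frame):
    """Unfold a frame instance into (frame, slot, filler) statements."""
    return [(frame_id, slot, value) for slot, value in asdict(frame).items()]


frame = DrugTreatmentFrame(
    drug="Taxoteer",
    treats_disease="Cancer, estrogen-negative",
    in_population_group="Elderly",
    has_success_rate=0.55,
)
statements = frame_to_statements("treatment-1", frame)
```

Each resulting triple keeps the slot name next to the filler, so a value such as 0.55 never appears without the context that gives it meaning.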
  • 14. 14 More on structured data Slot fillers are also known as data values. A data value makes sense only when it is seen in relation to one or more resources, for example as a slot filler in a frame. Examples The value 55% makes sense only in the right context, such as in the success slot of a drug treatment frame The value 185 cm makes sense only if we know it is the height of a person or the length of a pair of skis.
  • 15. 15 There are two ways to communicate such statements. 1. Structured data: One learns what one wants to know about the resource in focus immediately from a relationship instance. Hamlet <authoredBy> Shakespeare The drug treatment frame on Taxoteer The actual data of interest are represented in a database
  • 16. 16 There are two ways to communicate such statements. 2. Unstructured data: One needs to extract what one wants to know from a text or image that is related to the resource in focus. Shakespeare wrote Hamlet in the year 1625. Hamlet was written by Shakespeare. Taxoteer is effective in the treatment of cancers that have no receptors for estrogen. In elderly persons the success rate is 50%. The data of interest are stored in what is commonly known as a document.
  • 17. 17 Functions of data Data about a resource may serve any of the following functions: • learn about the resource and its various characteristics • learn about the history and context of the resource • learn how to use the resource • manage the resource • preserve the resource The sections about metadata (roughly: data about an information object) will specialize this list
  • 18. 18 Relationship as the basic modeling construct Important principle: Many concepts in a DL reference model are best modeled based on relationships rather than based on entities. For example, “annotation-hood” resides not in an information object but in the relationship InformationObjectA <annotates> InformationObjectB InformationObjectB <annotatedBy> InformationObjectA
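A small statement store shows how annotation-hood can live entirely in the relationship. This is a sketch under assumptions: the class `StatementStore`, its methods, and the `INVERSES` table are hypothetical names introduced for illustration.

```python
# Assumed table of inverse relationship names.
INVERSES = {"annotates": "annotatedBy", "annotatedBy": "annotates"}


class StatementStore:
    """Holds (subject, relationship, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, relationship, obj):
        """Record a statement and, when known, its inverse."""
        self._triples.add((subject, relationship, obj))
        inverse = INVERSES.get(relationship)
        if inverse is not None:
            self._triples.add((obj, inverse, subject))

    def objects(self, subject, relationship):
        """All objects related to the subject by the given relationship."""
        return sorted(
            o for s, r, o in self._triples
            if s == subject and r == relationship
        )


store = StatementStore()
store.add("InformationObjectA", "annotates", "InformationObjectB")
annotations_of_b = store.objects("InformationObjectB", "annotatedBy")
```

Neither object carries an "is an annotation" flag; the role appears only when the store is asked about a relationship.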
  • 19. 19 Resource type examples • Information objects Incl. documents, data streams, databases, queries and their results (virtual information objects, such as database reports, virtual collections) • Actors that can search for, create, and manage resources • Functions and services • Software modules • Policies • Languages • Ideas, concepts
  • 20. 20 Inheritance Many reference model constructs are specified at the level of resource. They inherit down to the different resource types, especially information objects. For example, the following statement types are valid for Resource Resource <identifiedBy> Identifier Resource <characterizedBy> QualityParameter Resource <regulatedBy> Policy Therefore, they are also valid for InformationObject or Actor or Policy
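Inheritance of statement types from Resource down to its subtypes maps naturally onto a class hierarchy. The sketch below is an assumed encoding: the attribute `statement_types` and the method `allowed_statement_types` are hypothetical, and `hasCreator` is used only as an example of a type specific to information objects.

```python
class Resource:
    # Statement types declared at the level of Resource.
    statement_types = {"identifiedBy", "characterizedBy", "regulatedBy"}

    @classmethod
    def allowed_statement_types(cls):
        """Union of the types declared on this class and all its ancestors."""
        allowed = set()
        for klass in cls.__mro__:
            allowed |= getattr(klass, "statement_types", set())
        return allowed


class InformationObject(Resource):
    # Declared only for information objects.
    statement_types = {"hasCreator"}


class Actor(Resource):
    statement_types = set()


class Policy(Resource):
    statement_types = set()


info_types = InformationObject.allowed_statement_types()
actor_types = Actor.allowed_statement_types()
```

Here `info_types` contains the three Resource-level types plus `hasCreator`, while `actor_types` contains only the inherited ones.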
  • 21. 21 Issues 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 22. 22 Information objects 1 1. A formal relationship instance (such as a row in a table or a structured data record) 2. A document (written or spoken text, image, sound) from which a human reader can learn about the resource in focus or about the relationships among several resources. Information extraction: document → formal relationship instances. A collection of information objects is in turn an information object • a table in a relational database = a collection of rows, each representing a relationship instance or a collection of relationship instances • a collection of documents
  • 23. 23 Information objects 2 An information object may be a close representation of an external object or event, for example • An image (photograph or painting) of a building. There may be many such images taken from different angles etc. • A video recording of a soccer game. There may be several such video recordings, each capturing different scenes, or capturing the same scene from different angles, or following different players, etc. These are different information objects representing the same external event.
  • 24. 24 Real world objects, concepts, ideas To provide full access to the information objects it contains, a digital library must manage data about any kind of object (real world objects, concepts, ideas) in its subject domain. Why? 1. The DL may represent data in the form of a database 2. Users look for information objects that deal with or are digital representations of any kind of object. This idea underlies Topic Maps which were originally designed to improve access to documents by relating the topics discussed in these documents.
  • 25. 25 Real world objects, concepts, ideas Examples (these are all resources) • People (focus of biographical reference tools) • Organizations (focus of organization directories) • Events (focus of developing "event gazetteers") • Places (focus of gazetteers) • Dates • Mathematical theorems (focus of mathematical encyclopedias) • Concepts, ideas • Problems and proposed solutions • Computer programs (focus of software directories or libraries) The reference model should have a more complete list and indicate sources dealing with these
  • 26. 26 Issues 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 27. 27 Levels, versions, and relationships • Work, manifestation, item (individual copy) • Linked through relationships
  • 28. 28 Work Intellectual or artistic entity, as the abstract essence or as a text, image, or piece of music. Range: • A basic story or theme • the story of Faust • the myth of the Great Flood • A text telling the story, such as • Goethe's Faust • the account of the Great Flood in the Bible (original Hebrew) • the account of the same myth in another culture • A specific version of the account in the Hebrew Bible a Latin translation of the account in the Hebrew Bible
  • 29. 29 Manifestation A specific rendering of a work by means of a graphical image or sound, taken in the abstract; the idea of such a rendering. Examples: • The text of Goethe's Faust printed in a particular typeface and layout A performance at which the text is recited also renders the text but is more properly considered a separate, but related, work. • A specific score of a given version of Schubert's Fifth. A performance of that version of Schubert’s Fifth also renders the piece of music but is considered a separate, but related, work. Also the rendering of a work in the form of digital storage that can be transformed to a graphical image or sound, again taken as the abstract pattern of digital signals.
  • 30. 30 Item, individual copy The embodiment of a manifestation in a physical object. We can perceive the content of a manifestation only through an individual copy of it (unless we have memorized the visual expression manifest in a manifestation and can conjure it up from memory). There are works that have only one manifestation of which there is only one copy.
  • 31. 31 Relationships among information objects The story of Faust <dealsWith> Pact with the devil The story of Faust <isToldIn> Marlowe’s Faust The story of Faust <isToldIn> Goethe’s Faust Goethe’s Faust <authoredBy> Goethe, Johann Wolfgang von Goethe’s Faust <hasManifestation> R1231 R1231 <publishedBy> Cotta R1231 <hasDate> 1871 R1232 <isCopyOf> R1231 R1232 <ownedBy> (HRieth, 1896, 1956) R1232 <ownedBy> (DSoergel, 1956, *)
  • 32. 32 Hierarchical inheritance • Data about a work inherit to all works below it along <isToldIn>, <hasVersion> etc. Therefore Goethe's Faust <dealsWith> Pact with the devil • Data about a work inherit to all its manifestations. Therefore R1231 <authoredBy> Goethe, Johann Wolfgang von • Data about a manifestation inherit to all its items • Hierarchical inheritance increases efficiency • More efficient catalog input • More efficient catalog storage • More efficient representation and reading of search results
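Hierarchical inheritance can be sketched as a walk up the chain from item to manifestation to work. The code below reuses the Faust identifiers from the talk but makes several simplifying assumptions that are not in the source: each resource has a single parent, each relationship holds one value, a value stored closer to the resource overrides an inherited one, and ownership is reduced to the current owner.

```python
# Assumed parent links, derived from <isToldIn>, <hasManifestation>, <isCopyOf>.
PARENT = {
    "Goethe's Faust": "The story of Faust",
    "R1231": "Goethe's Faust",
    "R1232": "R1231",
}

# Data stored once, at the level where it is entered.
OWN_DATA = {
    "The story of Faust": {"dealsWith": "Pact with the devil"},
    "Goethe's Faust": {"authoredBy": "Goethe, Johann Wolfgang von"},
    "R1231": {"publishedBy": "Cotta", "hasDate": "1871"},
    "R1232": {"ownedBy": "DSoergel"},  # simplification: current owner only
}


def effective_data(resource):
    """Merge data inherited from all ancestors with the resource's own data."""
    chain = []
    node = resource
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    data = {}
    # Apply the most general level first so that closer levels win.
    for node in reversed(chain):
        data.update(OWN_DATA.get(node, {}))
    return data


item_view = effective_data("R1232")
work_view = effective_data("Goethe's Faust")
```

The author and subject are stored once at the work level, yet `item_view` reports them for the individual copy, which is the efficiency gain in catalog input and storage.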
  • 33. 33 More relationships R271 The man I killed, by Michael Halliday R519 The man I killed, play by Christopher Wern R519 <isBasedOn> R271 R315 Handbook of commercial geography, by Robert Chisholm R783 Chisholm's handbook of commercial geography, entirely rewritten by L. Dudley Stamp and S. Carter Gilmour. R783 <entirelyRewrittenFrom> R315
  • 34. 34 Relationship to FRBR Notes on Terminology • The FRBR distinction between work and expression should be rethought. It is unclear and consequently poorly understood, and it may not be necessary. Just have work. The intuition FRBR tries to capture in this distinction is better handled through relationships among works as defined here. • Following FRBR I use the term manifestation. Other term: edition (in the sense of German Ausgabe), but edition also means German Auflage, so use of the term edition can be confusing. • It would be nice to be able to use graphic expression as a synonym for rendering, but to avoid any further confusion with FRBR it is best not to use the term expression at all.
  • 35. 35 Version control Important, but not elaborated here
  • 36. 36 Issues 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 37. 37 Composite information objects / resources Examples • Book divided into chapters, sections, paragraphs, words (XML Document Object Model, DOM or TEI) Each part can be seen as a separate information object • Movie with images, soundtrack, closed captions, script, all coordinated (MPEG-7) • A medical record with patient data, test data, images, live monitoring data streams, diagnoses, drugs prescribed, etc.
  • 38. 38 Composite information objects / resources Abstractly: Each component is a separate information object, composition expressed through relationships In practice: Many document models for composite (or compound) documents supporting presentation DL needs to allow specification, for each document, of the particular document model used
  • 39. 39 Issues 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 40. 40 Identifying information objects 1 Initial definition upon entry into the digital library. 2 Definition on the spot Examples Annotate a specific segment of a text document or a region of an image or sound document or Anchor an annotation to a specific location in a document. The segment or anchor is a new information object that is included in the original information object, and this new information object is linked with any of several annotation relationships to a new information object created by the user. Related to composite objects. More on this under annotation
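Defining an information object on the spot can be illustrated with a text segment that becomes a new object, is linked to its parent, and is then annotated. Everything named here is an assumption for the sake of the example: the functions `define_segment` and `annotate`, the `includedIn` relationship name, the identifier scheme, and the sample text.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # assumed simple identifier generator


@dataclass(frozen=True)
class InformationObject:
    identifier: str
    text: str


def define_segment(parent, start, end, statements):
    """Create a segment of the parent text as a new information object."""
    if not 0 <= start < end <= len(parent.text):
        raise ValueError("segment lies outside the parent text")
    segment = InformationObject(
        f"{parent.identifier}#seg{next(_next_id)}", parent.text[start:end]
    )
    statements.append((segment.identifier, "includedIn", parent.identifier))
    return segment


def annotate(target, note_text, statements):
    """Create a note and link it to the target with <annotatedBy>."""
    note = InformationObject(f"note{next(_next_id)}", note_text)
    statements.append((target.identifier, "annotatedBy", note.identifier))
    return note


statements = []
doc = InformationObject("doc1", "To be, or not to be")
segment = define_segment(doc, 0, 5, statements)
note = annotate(segment, "Opening words of the soliloquy", statements)
```

The annotation attaches to the segment, not to the whole document, and the segment stays connected to its source through the `includedIn` statement.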
  • 41. 41 Issues 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 42. 42 Data about information objects Metadata = data about information objects if used for discovering, interpreting, and using information objects Relate information objects to other types of resources. Examples: InformationObject <hasCreator> Actor InformationObject <dealsWith> Actor InformationObject <containsText> Text (or, more specifically Word) Relate a word in a text to the concept that is the meaning in which the word is used in this particular position. InformationObjectA <hasAbstract> InformationObjectB InformationObjectA <hasCriticalCommentary> InformationObjectC InformationObjectD <hasSupportiveCommentary> InformationObjectC
  • 43. 43 More on defining metadata The “metadata-hood” of an information object does not reside in the information object, but in its relationship to another information object and, more specifically, in its use. A piece of data is used as metadata if it is used for the purpose of discovering, interpreting, and using information objects, which then give the ultimate data wanted. The same piece of data may fill the ultimate need of the user in one situation and be used as metadata in another situation.
  • 44. 44 Not metadata • Data about resources that are not information objects are not metadata even if they are similar in form. • Data about information objects are not always used as metadata. For example, using author data to count a faculty member's publications or citation data to compute impact. • Extensive discussion of what exactly is the definition of metadata is not a good use of resources. A system should provide the data that are useful to a user for whatever purpose; what each piece of data is called is less important.
  • 45. 45 Metadata typologies Metadata (and data in general) can be divided into categories from several perspectives, and within each perspective there exist several approaches. Some examples of how to categorize metadata • by purposes or use. Since the same unit of metadata can be used for several purposes, the resulting categories overlap. • by source, for example, extracted, assigned by cataloger, assigned by user (social tagging), from usage tracking • by intrinsic characteristics, for example data about provenance or about the format of the information object
  • 46. 46 Some metadata uses A Learn about information objects and interpret them; this includes A1 Learn about the identity and characteristics of information objects (descriptive metadata) A2 Learn about the history and other features of the context of the information object (contextual metadata) B Learn how to use an information object, including B1 Learn how to gain legal access (access and rights metadata) B2 Learn how to gain technical access to the information object (what machinery and software is needed to access the information object for a given purpose, such as assimilation by a person or processing by a computer program) C Manage information objects (administrative metadata), in particular C1 Manage the preservation of information objects (preservation metadata).
  • 47. 47 Usage data Data on usage of resources and on usage rights, usage history, and future use / preservation are important for discovering, interpreting, and using resources as well as managing resources Some of these data can be collected automatically If the resource in question is an information object, this kind of data is often used as metadata
  • 48. 48 Issues 1a Content in the overall context of a DL reference model 1b Modeling information objects 1c Levels, versions, and relationships 1d Composite information objects / resources 1e Resource identifiers 2 Metadata, including provenance, context, usage 3 Annotation
  • 49. 49 Annotation InformationObjectA <annotatedBy> InformationObjectB InformationObjectB may be created on the spot in order to annotate A (InformationObjectB and the annotation relationship have the same author) or B may preexist (the annotation relationship between A and B is introduced by a third party) Specific type of annotation expressed by specializing the annotatedBy relationship, for example InformationObjectA <criticizedBy> InformationObjectB InformationObjectA <hasCriticalCommentary> InformationObjectC InformationObjectD <hasSupportiveCommentary> InformationObjectC InformationObjectE <isPartOfSpeech> PartOfSpeech Annotation-hood is in the relationship, not in the information object
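Specializing the annotatedBy relationship can be sketched as a small type hierarchy over relationship types. This is only an illustration under stated assumptions: the `PARENT` table, the helpers `is_a()` and `annotations_of()`, and the identifier `Actor1` are hypothetical and not taken from the reference model.

```python
# Hypothetical hierarchy: each specialized relationship type maps to its parent type.
PARENT = {
    "criticizedBy": "annotatedBy",
    "hasCriticalCommentary": "annotatedBy",
    "hasSupportiveCommentary": "annotatedBy",
}

def is_a(rel_type, ancestor):
    """True if rel_type equals ancestor or specializes it (directly or indirectly)."""
    while rel_type is not None:
        if rel_type == ancestor:
            return True
        rel_type = PARENT.get(rel_type)
    return False

# Relationship instances as (source, relationship type, target).
links = [
    ("InformationObjectA", "criticizedBy", "InformationObjectB"),
    ("InformationObjectA", "hasCriticalCommentary", "InformationObjectC"),
    ("InformationObjectD", "hasSupportiveCommentary", "InformationObjectC"),
    ("InformationObjectA", "hasCreator", "Actor1"),  # not an annotation
]

def annotations_of(obj):
    """All annotation links of obj, whatever their specific type."""
    return [(rel, target) for source, rel, target in links
            if source == obj and is_a(rel, "annotatedBy")]

print(annotations_of("InformationObjectA"))
```

In this sketch InformationObjectC annotates both A and D without being marked as an annotation itself, which mirrors the point that annotation-hood sits in the relationship.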
  • 50. 50 Annotation Annotation-hood is in the relationship, not in the information object There is a wide range of relationship types that are called annotations. Linguists think of annotations differently than scholars making comments on a text. Rather than trying to define exactly what “annotation” means, the reference model should include a comprehensive list of relationship types that might be considered annotation by somebody so that anybody can define their meaning of annotation by giving the appropriate subset of annotation relationship types. The same thought applies to metadata, discussed on a later slide.
  • 51. 51 Special resource types for annotations Some annotations require special types of resources. Examples Annotate a text with part-of-speech indications annotated resource : a one-word fragment of the text annotating resource: a value from a list of parts of speech Annotate a text with meaning for word sense disambiguation annotated resource : a word or phrase in the text annotating resource: a value from a list of meanings defined in some way Annotation through underlining or other marks annotated resource : a fragment of text or other information object annotating resource: a pair (sign, meaning), e.g. (underline, important) or (?, check this out) or (X, nonsense) The annotated resource and the annotating resource may be very short
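The three cases on this slide can be sketched with a fragment resource that is annotated by a very short resource. The offset-based `Fragment` class, the sample text, and the relationship name `markedWith` are assumptions made for illustration; the slide does not prescribe how fragments are identified.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """A span of a text, identified by character offsets (an assumed scheme)."""
    document_id: str
    start: int
    end: int

# Invented sample document.
documents = {"doc1": "Time flies"}

# (annotated resource, relationship type, annotating resource)
annotations = [
    (Fragment("doc1", 0, 4), "isPartOfSpeech", "noun"),
    (Fragment("doc1", 5, 10), "isPartOfSpeech", "verb"),
    # "markedWith" is a hypothetical relationship for (sign, meaning) marks.
    (Fragment("doc1", 5, 10), "markedWith", ("underline", "important")),
]

def text_of(fragment):
    """Resolve a fragment to the text it covers."""
    return documents[fragment.document_id][fragment.start:fragment.end]

for fragment, relationship, value in annotations:
    print(text_of(fragment), relationship, value)
```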
  • 52. 52 Annotation and metadata Metadata and annotation data overlap, and different communities and individuals have different definitions of what is included in metadata and what is included in annotations. The precise nature of a unit of data about an information object is determined by the relationship type and the resource that is linked to. The interpretation of each type of data is in the eye of the beholder. Need an inventory of relationship types (a type of ontology) For example, the CIDOC Conceptual Reference Model (CIDOC/CRM) is an inventory of broad relationship types. In such an inventory, one could indicate who considers a given relationship type as usable as metadata and/or as belonging to annotation.
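One way to picture such an inventory is a table that records, per relationship type, which community regards it as metadata and/or annotation. The sketch below is hypothetical throughout: the community names and their classifications are invented for illustration and do not come from CIDOC/CRM or from the reference model.

```python
# Hypothetical inventory: relationship type -> community -> roles it assigns.
INVENTORY = {
    "hasCreator": {"librarians": {"metadata"}},
    "hasAbstract": {"librarians": {"metadata"}, "scholars": {"annotation"}},
    "hasCriticalCommentary": {"scholars": {"annotation"}},
    "isPartOfSpeech": {"linguists": {"annotation"}},
}

def relationship_types(community, role):
    """Relationship types that `community` considers to play `role`."""
    return sorted(rel for rel, views in INVENTORY.items()
                  if role in views.get(community, set()))

print(relationship_types("scholars", "annotation"))
print(relationship_types("librarians", "metadata"))
```

Each community thus defines its own meaning of "annotation" or "metadata" as a subset of the shared inventory, without the inventory having to settle the definition.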
  • 53. 53 Take-home message 1 The entity-relationship model (E-R model) provides the unifying principle for a digital library content model The E-R model allows representation of structured data of any complexity on a conceptual level. Defining relationships between information objects handles • Modeling information objects • Levels, versions, and relationships • Composite information objects / resources • Metadata • Annotation Many notions are captured better through relationships than fine distinctions of entity types
  • 54. 54 Take-home message 2 Any reference model • needs to be abstract and must not commit to any particular standard or design decision • rather, it must provide a framework for specifying the commitments of any particular DL (or information system) A reference model provides a systematic framework for description and analysis, not a prescription
  • 55. 55 Dagobert Soergel dsoergel at umd.edu www.dsoergel.com
  • 57. 57 Construction process • Need to be sure all applicable concepts from various sources such as the 5S model and FRBR/CRM are included, either in the skeleton model or in a list of values / choices, as appropriate • There is still work to be done to pull reference model subject matter out of the reference architecture document, and vice versa.
  • 58. 58 Construction process • We should have an online version of the reference model document with the following properties • Links to discussion of issues and underlying rationale, capturing some of the discussion in the group. • Links from the reference model to the appropriate section of the reference architecture • The Wiki page may not quite do it.
  • 59. 59 There are two ways to communicate such statements. 1. One learns what one wants to know about the resource in focus immediately from a relationship instance. Hamlet <authoredBy> Shakespeare The drug treatment frame on Taxoteer The actual data of interest are represented in a database that captures these statements (relationship instances), such as a collection of Prolog statements, a relational database, or an object-oriented database 2. One needs to consult an information object that is related to the resource in focus. “Shakespeare wrote Hamlet in the year 1625.” “Hamlet was written by Shakespeare.” “Taxoteer is effective in the treatment of cancers that have no receptors for estrogen. In older persons the success rate is 50%.”
  • 60. 60 The DL designer must decide how to identify the new resource that is a part of an existing resource and the new text object created by the annotator, and how to store the link between these two information objects
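The first way, reading the answer directly from a stored relationship instance, can be sketched with a relational database. This is a minimal sketch using Python's built-in `sqlite3` module; the table name `statement`, its columns, the relationship `describedBy`, and the identifier `TextObject1` are assumptions made for illustration.

```python
import sqlite3

# In-memory database holding relationship instances as rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statement (subject TEXT, relationship TEXT, object TEXT)")
conn.executemany("INSERT INTO statement VALUES (?, ?, ?)", [
    # Way 1: the statement itself carries the data of interest.
    ("Hamlet", "authoredBy", "Shakespeare"),
    # Way 2: the statement only points to a text that must be read
    # ("describedBy" and "TextObject1" are hypothetical).
    ("Hamlet", "describedBy", "TextObject1"),
])

def lookup(subject, relationship):
    """Return the objects of all matching relationship instances."""
    rows = conn.execute(
        "SELECT object FROM statement WHERE subject = ? AND relationship = ?",
        (subject, relationship))
    return [row[0] for row in rows]

print(lookup("Hamlet", "authoredBy"))
```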
  • 61. 61 Identifying information objects Architecture issues Definition on the spot, options (1) use completely independent identifiers and store the relationship explicitly (2) use dependent identifiers The part of a document can be identified by document identifier followed by information that uniquely identifies the part. The part relation is implied by the structure of the identifier. The annotation information object could be identified by the identifier of the resource being annotated followed by a short string that identifies the nth annotation of this resource (like a footnote). The relationship between the resource and the resource annotating it would be implied by the identifier (however, the specific type of the annotation relationship would not be captured this way). The resource that annotates still can be referenced from any other context. Implicit representation Embedded annotations: The annotation is embedded in the document, linked to a point in a text that is identified only by the place of the annotation. This could be converted to an explicit representation.
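Option (2), dependent identifiers, can be sketched as follows. The separator characters are an assumed syntax chosen only for this example ("#" introduces a part, "@" the nth annotation), and all function names are hypothetical.

```python
# Assumed identifier syntax: "<document>#<part>" and "<resource>@<n>".

def part_id(document_id, part):
    """Identifier of a part: document identifier plus a part locator."""
    return f"{document_id}#{part}"

def annotation_id(resource_id, n):
    """Identifier of the nth annotation of a resource (like a footnote number)."""
    return f"{resource_id}@{n}"

def annotated_resource(annotation_identifier):
    """Recover the annotated resource implied by an annotation identifier."""
    resource_id, _, _ = annotation_identifier.rpartition("@")
    return resource_id

def containing_document(identifier):
    """Recover the document implied by a part (or annotation) identifier."""
    return identifier.split("#", 1)[0]

paragraph = part_id("doc42", "ch3.para2")
note = annotation_id(paragraph, 1)
print(note, annotated_resource(note), containing_document(note))
```

As the slide points out, the identifier implies that an annotation relationship exists but not its specific type; recording that type would still require an explicit statement, as in option (1).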
  • 62. 62 Some metadata uses This is a specialization of the functions of data given above A learn about other data, that is, information objects, and understand them; this includes A1 learn about the identity and characteristics of information objects (descriptive metadata) A2 learn about the history and other features of the context of the information object (contextual metadata) B learn how to use an information object (source of data), including B1 learn how to gain legal access to the information object (access and use rights metadata) B2 learn how to gain technical access to the information object (what machinery and software is needed to access the information object for a given purpose, such as assimilation by a person or processing by a computer program) C manage information objects (administrative metadata), in particular C1 manage the preservation of information objects (preservation metadata).
  • 63. 63 Metadata in the reference model When describing a DL using the reference model, need to be able to indicate any typology of metadata used in the DL