SlideShare a Scribd company logo
1 of 11
Download to read offline
An Annotation Framework for the Semantic Web
Steffen Staab, Alexander Maedche and Siegfried Handschuh
Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany
fsst, ama, shag@aifb.uni-karlsruhe.de
http://www.aifb.uni-karlsruhe.de/WBS
Abstract
Creating metadata by annotating documents is
one of the major techniques for putting machine
understandable data on the Web. Though there
exist many tools for annotating web pages, few
of them fully support the creation of semanti-
cally interlinked metadata, such as necessary for
a truely Semantic Web. In this paper, we present
an ontology-based annotation environment, On-
toAnnotate, which offers comprehensive support
for the creation of semantically interlinked meta-
data by human annotators.
1 Introduction
With the upswing of metadata on the Semantic
Web for means like semantic web portals (Staab
et al., 2000a), there comes the urgent need for
adding semantic metadata to existing web pages
such that they are digestible for humans and ma-
chines. Though there exists a wide range of so-
phisticated, even professional, annotation tools
(cf. Section 4 on related work), none of the ones
that we know of has yet fully exploited the new
wealth of possibilities that come with RDF (Las-
sila & Swick, 1999) and RDF-Schema (Brickley
& Guha, 1999) as metadata formats. In partic-
ular, semantic annotation has so far mostly re-
stricted to describing documents or items in doc-
uments in isolation of each other. In light of the
Semantic Web, what intelligent agents crave for
are web pages and items on web pages that are
not only described in isolation from each other,
but that are also semantically interlinked.
We have used semantically interlinked in-
formation for gathering knowledge relevant in
a particular community of users (Staab et al.,
2000a). The underlying idea was that for that
domain a group of users would provide seman-
tic metadata about the content of relevant web
pages. Thus, our Community Web Portal could
present all this knowledge, taking great advan-
tage of semantic structures: personalization by
semantic bookmarks (“Fred is interested in RDF
research”), conceptual browsing, or the deriva-
tion of implicit knowledge (e.g., if John works
in a project, which is about XML then he knows
something about XML), have been some of the
features that thrived by having semantically in-
terlinked information. Similarly, we envision
that intelligent agents may profit from semanti-
cally interlinked information on the Web in the
future.1
Building the Community Web Portal we found
that there were a number of tricky issues with
providing semantic annotation in this manner:
First of all, the semantic annotation task does
not adhere to a strict template structure, such as
Dublin Core to name one of the more sophis-
ticated ones in use. Rather it needs to follow
the structure given by schema definitions that
may vary with, e.g., domain and purpose. In
fact, our intelligent agents rely on domain on-
tologies. Semantic annotations need to be con-
gruent with ontology definitions in order to al-
low for the advantages we have indicated above.
Secondly, semantically interlinked metadata is
labor-intensive to produce and, hence, expen-
sive. Therefore duplicate annotation must be
avoided. Because semantic annotation is a con-
tinuous process in a distributed setting there are
several sources for duplication. There is knowl-
edge generated by other annotators. In order to
allow for the reuse of their annotations it is im-
portant that one does not start from scratch when
1
Similar projects like WebKB (Martin & Eklund, 1999),
SHOE (Heflin & Hendler, 2000), and, more recently,
DAML (http://www.daml.org) point in the same di-
rection.
annotating sources, but that one builds on oth-
ers efforts (in particular their creation of IDs).
Then, there is a multitude of schema descrip-
tions (ontologies) that also change over time to
reflect changes in the world. Because manual
re-annotation of old web pages seems practically
infeasible, one needs an annotation framework
that allows to handle ontology creation, map-
pings and versioning. Thirdly, purely manual an-
notation is very expensive. Therefore, only very
valuable information will be annotated and it is
necessary to help the human annotator with his
task. What is needed is support for automatic —
or at the least, semi-automatic — semantic an-
notation of web pages. Finally, there is a lack of
experience in creating semantically interlinked
metadata for web pages. It is not clear how hu-
man annotators perform overall and, hence, it is
unclear what can be assumed as a baseline for
the machine agent. Though there are correspond-
ing investigations for only indexing documents,
e.g. in library science (Leonard, 1977), a corre-
sponding richer assignment of interlinked meta-
data that takes advantage of the object structures
of RDF is lacking.
In this paper, we deal with the first three of the
above mentioned issues. Regarding the fourth
problem on evaluating semantic annotation we
refer the interested reader to a companion paper
(Staab et al., 2000b).
We first (Section 2) present our basic tool for
ontology-based semantic annotation and, then,
consider the issue of semantic annotation as an
ongoing process. In particular, interlinkage be-
tween objects and evolving metadata schema
need to be managed to avoid redundant annota-
tions and re-annotating, respectively. Section 3
describes the different layers for semi-automatic
semantic annotation and introduces the tech-
niques that we use in our tool. Before con-
cluding, we discuss related work in the areas of
evolving schemata and crawling, semi-automatic
semantic annotation — prerequisite experiences
and techniques for useful, semantically inter-
linked metadata on the Semantic Web.
2 Ontology-based Semantic Annotation
An ontology is commonly defined as an explicit,
formal specification of a shared conceptualiza-
tion of an domain of interest. This means that
an ontology describes some application-relevant
part of the world in a machine-understandable
way. The concepts and concept definitions that
are part of the ontology have been agreed upon
by a community of people who have an interest
in the corresponding ontology. The core “ingre-
dients” of an ontology are its set of concepts, its
set of properties, and the relationships between
the elements of these two sets.
Ontological structures may give additional
value to semantic annotations. They allow for
additional possibilities on the resulting semantic
annotations, such as inferencing or conceptual
navigation that we have mentioned before. But
also the reference to a commonly agreed set of
concepts by itself constitutes an additional value
through its normative function. Furthermore, an
ontology directs the attention of the annotator to
a predefined choice of semantic structures and,
hence, gives some guidance about what and how
items residing in the documents may be anno-
tated.
Besides of these advantages that ontology-
based semantic annotation yields in compari-
son to “free text metadata generation”, the ex-
tended set of capabilities also entails some new
problems that need to be solved. In partic-
ular, semantic interlinkage between document
items incurs the difficulty to adequately manage
these interlinkages. Essentially, this means that
an ontology-based annotation tool must address
the issue of object identity and its management
across many documents. Also, ontologies may
have elaborate definitions of concepts. When
their meaning changes, when old concepts need
to be erased, or when new concepts come up,
the ontology changes. Because updating previ-
ous annotations is generally too expensive, one
must deal with change management of ontolo-
gies in relation to their corresponding annota-
tions. Finally, one must prevent redundant an-
notation which stem from duplicate pages on the
web or annotation work done by fellow annota-
tors. Hence, we provide two basic mechanisms
for recognizing document identity. In the re-
mainder of this section we embed these require-
ments into a coherent framework.
2.1 OntoAnnotate — The Core Tool
While most annotation tools implicitly subscribe
to a particular ontology (e.g., Dublin Core), our
tool, OntoAnnotate, makes the relationship be-
tween particular ontologies and their parts, i.e.
concepts and properties, explicit. OntoAnnotate,
presents to the user an interface that dynami-
cally adapts to the given ontology. It has been
developed based on our earlier experiences with
manual ontology-based semantic annotation that
have been described in (Erdmann et al., 2000).
As principal language for semantic annota-
tions and ontologies, OntoAnnotate relies on
RDF and RDF Schema. RDF Schema can be
seen as a language for lightweight ontology de-
scriptions, allowing to define the interlinkage
between different concepts (called “classes” in
RDFS), properties, and objects (i.e “class mem-
bers”, also called “instances”). To name but a
few other possible formats, WebKB uses Con-
ceptual Graphs (Martin & Eklund, 1999), SHOE
employs horn logic rules (Heflin & Hendler,
2000), and we have formerly exploited F-Logic
(Staab et al., 2000a). RDF and RDF Schema,
however, provide completely web compatible
common denominator that everyone agrees on
now. Therefore we have replaced proprietory
formats we have used originally.
OntoAnnotate allows for the easy annotation
of HTML documents. One may create objects
with URIs and relate them to text passages,
which are then highlighted. The semantic mean-
ing of the objects and the text passages is given
by four semantic categories:
1. Object identification: New objects are cre-
ated by asserting the existence of an object
with a unique identifier. The annotation tool
supports the creation of object identifiers
from text passages.
This is a mostly syntactic operation, the
only semantic consequence is that the set of
existing objects is augmented by one.
2. Object–class relationships: Each object is
assigned to a class of objects by the human
annotator. In general, objects may be as-
serted to belong to multiple classes. To keep
the user interface and the evaluation sim-
pler, OntoAnnotate only allows single clas-
sification.
3. Object–attribute relationships: Each ob-
ject may be related to attribute values by
an attribute. Each attribute value is either
a text passage chosen by highlighting or
a string typed in by the annotator. For a
given object the annotator can only create
object–attribute relationships if the object’s
class definition allows its creation, i.e. if the
class definition includes a corresponding at-
tribute.
An attribute is a property the domain of
which is a literal.
4. Object–object relationships: Each object
may be related to all existing objects (in-
cluding itself) via an (object) relation. For a
given object the annotator can only create
object–object relationships if the object’s
class definition allows its creation, i.e. if
the class definition includes a correspond-
ing (object) relation.
An relation is a property the domain of
which is a resource.
Figure 1 shows the screen for navigating the
ontology and creating annotations in OntoAnno-
tate. The left pane displays the document and
the right panes show the ontological structures
contained in the ontology, namely classes, at-
tributes and relations. In addition, the right pane
shows the current semantic annotation knowl-
edge base, i.e. existing objects, their classifi-
cation, object–attribute relationships and object–
object relationships created during the semantic
annotation.
To illustrate the annotation process with On-
toAnnotate, we sketch a small annotation sce-
nario using our tool: Annotation typically starts
with identifying a new object. The user pro-
vides a new object identifier and selects the ap-
propriate class of this object from a tree view.
In our example, the object identifier RStuder
is typed in and the class FULLPROFESSOR is
selected from the ontology. Upon categoriza-
tion of a new object into a class C, OntoAn-
notate shows the possible attributes of C (cf.
the attributes ADDRESS, NAME, PHONE, etc. of
FULLPROFESSOR in the right upper pane of
Figure 1: Screenshot of the OntoAnnotate GUI.
Figure 1) and the actual attributes of the cho-
sen object (cf. Karlsruhe, Rudi Studer,
etc. in right upper pane of Figure 1). In
addition, one may look at the object rela-
tions of C (cf. affiliation, cooperate-
With, etc. in the right lower pane of Fig-
ure 1) and the actual relations of the chosen ob-
ject. In order to dynamically display the prop-
erties of classes and their instances, OntoAn-
notate queries the annotation inference server.
The annotator continues with marking text pas-
sages and drags them into empty fields of the
attribute table, thereby creating new attributes
relationships between the currently chosen ob-
ject and the currently marked text passage (e.g.,
between RStuder and studer@aifb.uni-
karlsruhe.de in Figure 1). The annotator
may create metadata describing new object–
object relationships by choosing an object re-
lation and, then, either creating a new object
on the fly or by choosing one of the objects,
pre-selected by OntoAnnotate according to the
range restriction of the chosen relation. For in-
stance, the AFFILIATION of a PERSON must be an
ORGANIZATION. Therefore, only organizations
are offered as potential fillers for the affiliation
relation of RStuder.
2.2 Object Identity
The first version of OntoAnnotate already relied
on ontology structures to guide annotation, but
it did not consider annotation as being a pro-
cess carried through in a complex environment.
The general problem stems from the fact that
without corresponding tool support, annotators
would too often create new objects rather than re-
use existing ones. Therefore new properties were
not attached to existing objects, but to new enti-
ties. In case studies, like the Community Web
Portal (Staab et al., 2000a) annotators came up
with many different object identifiers for single
persons, which made it impossible to combine
all the data about these persons.
Considering semantic annotation as a contin-
uous process, we came up with two new require-
ments:
1. The annotation inferencing server needs to
maintain object identifiers during the anno-
tation process.
2. A crawler needs to gather relevant object
identifiers for the start of the annotation.
The first requirement is solved by the annota-
tion inference server, by adding objects to and
querying objects from the server during actual
annotation as described in the previous subsec-
tion.
The second requirement has been solved by al-
lowing the annotator to start a focused crawl of
RDF facts — covering the document and annota-
tion server, but also relevant parts of the Web —
which provides the annotation inference server
with an initial set of object identifiers, categories,
attributes and relations. Thus, the metadata pro-
vided by other annotaters may be used as the
starting point that one may contribute additional
data to.
Currently, RDF data is comparatively weakly
interlinked. Hence, it is sufficient to restrict the
focus of the crawl by web server restrictions and
depth of the crawl. With more metadata on the
Web, one needs to employ more sophisticated
techniques in the future.
2.3 Ontology Changes
There exists a tight interlinkage between evolv-
ing ontologies and the semantic document anno-
tation. In any realistic application scenario, in-
coming information that is to be annotated does
not only require some more annotating, but also
continuous adaptation to new semantic terminol-
ogy and relationships.
Heflin and Hendler (Heflin & Hendler, 2000)
have elaborated in great detail on how ontology
revisioning may influence semantic annotations.
Therefore, we here only sketch one example re-
vision and its effects:
When an existing class definition is refined,
the maintainer of the semantic annotations may
explore the objects that belong to this class. He
may decide individually or for all objects
 that the objects stay in the class and, hence,
the semantic meaning of the annotations
is extended by additional semantic con-
straints;
 that the objects are categorized to belong
only to the superclasses of the re-defined
class and, hence the semantic meaning of
the annotations is reduced by cutting away
semantic constraints;
 that the objects are moved to another class.
Along similar lines, other cases of ontology revi-
sions are treated.
The annotation maintainer may explore all the
possibilities in the ontology engineering tool,
OntoEdit (Staab  Maedche, 2000) and may
define mapping rules to bridge between differ-
ent ontology revisions. Later on, querying may
take advantage of these mappings to also retrieve
“old” annotations.
2.4 Document Identity
In order to avoid duplicate annotation, existing
semantic annotations of documents should be
recognized. Because interesting semantic an-
notations will eventually refer to external web
pages that change, the annotator needs some
hints when he encounters a document that has
been annotated before, but that may have slightly
changed since. Finally, the annotator also needs
to recognize that this may be a duplication of
another document seen before (e.g. on a mirror
site).
For these recognition tasks we provide the fol-
lowing mechanisms: In our local setting we have
a document management system where anno-
tated documents and their metadata are stored.
OntoAnnotate uses the URI to detect the re-
encounter of previously annotated documents
and highlights annotations in the old document
for the user. Then the user may decide to ignore
or even delete the old annotations and create new
metadata, he may augment existing data, or he
may just be satisfied with what has been anno-
tated before.
In order to recognize that a document has been
annotated before, but now appears under a differ-
ent URI, OntoAnnotate searches in the document
management system computing similarity with
existing documents by document vector models.
If there appear documents the similarity which
to the currently viewed document is near 1, then
these are indicated to the annotator such that he
may check for congruency.
These two techniques for recognizing docu-
ment identity are very basic, but effective for
maintaining document identity in OntoAnnotate,
given a dynamic environment such as the Web.
2.5 OntoAnnotate — The Semantic
Annotation Environment
The overall annotation environment as outlined
in this section is depicted in Figure 2: The core
OntoAnnotate is used for viewing web pages and
actually providing annotations. It also stores an-
notated documents in the document management
system and adds new metadata to the annota-
tion inference server. The latter is also queried
for providing conceptual restrictions given by the
ontology. Thus, the annotator’s view is restricted
to conceptual structures that are congruent with
the given ontology.
The annotation process is started either with an
annotation inference server without objects, or
the server process is fed with metadata crawled
from the Web and the document server. The an-
notation inference server supports multiple on-
tologies. Annotations refer to the classes and
properties that were used for their creation by
namespaces. F-Logic rules are finally used to
map between different namespaces, thus allow-
ing to keep track of semantic annnotations (at
least to some degree) even when the currently
used ontology is replaced by an update.
The user additionally has the possibility to
use semi-automatic means for recognizing class
instances and properties between them. In
the subsequent section we will further describe
the text analysis component that supports semi-
automatic semantic text annotation.
3 Semi-Automatic Annotation
Based on our experiences and the existing anno-
tation tool for supporting ontology-based seman-
tic annotation of texts, we now approach semi-
automatic annotation of documents. In gen-
eral, one may distinguish between different kinds
of semi-automatic annotation mechanisms, that
have already researched in existing work:
 Wrapper Generation: Especially in the
case of annotating web pages that mainly
consist of HTML tables, one may annotate
the first row of the table and automatically
enumerate over the residual rows of the ta-
ble.
 Pattern Matching: Regularity of word
expressions may be captured by regular
expression based patterns. For example
given the pattern fwordg*fGmbHg yields
for the german language to generic pattern
for company names, and, thus, successfully
recognize instances of the class COMPANY
of the ontology. Patterns are stored with the
concepts of the domain ontology.
 Information Extraction: The most com-
plex mechanism for semi-automatic anno-
tation is full fledged ontology-based infor-
mation extraction based on a shallow text
processing strategy.
Depending in the structure given in the docu-
ments one may apply one of the methods listed
above. We here only shortly describe the mecha-
nisms that we currently use in our tool OntoAn-
notate. In real-world documents typically all
three methods (and more) will have to be applied
in combination. In our future work we will ana-
lyze the structures contained in the documents to
derive a suitable processing strategy for the doc-
uments and document parts.
Wrapper Generation. Recently, several ap-
proaches have been proposed for wrapping semi-
structured documents, such as HTML docu-
ments. Wrapper factories (cf. Sahuguet et al.
(Sahuguet  Azavant, 2000)) and wrapper in-
duction (cf. Kushmerick (Kushmerick, 2000))
have considerably facilitated the task of wrap-
per construction. In order to wrap directly into
OntoAnnotate we have developed our own wrap-
per approach that directly alignes regularities
in semi-structured documents with their corre-
sponding ontological meaning.
Annotation
Inference
Server
Distributed RDF stored
document annotations
(according to the domain
ontology)
Document Server
with RDF statements
crawl
Access 
annotate
web pages
versioned
domain
ontology
validate
local copy
WWW
Onto
Annotate
GUI
Text
Analysis
Component
Figure 2: OntoAnnotate — The Semantic Annotation Environment.
Pattern Matching. We use a very simple
mechanism for recognizing patterns in HTML
documents. We use OroMatcher2 based on Perl
5.003 regular expressions. Patterns are devel-
oped and tested in our regular expression work-
bench.
Information Extraction. At the highest level
of processing we conceive an information
extraction-based approach for semi-automatic
annotation, which has been implemented on top
of SMES (Saarbrücken Message Extraction Sys-
tem), a shallow text processor for German (cf.
(Neumann et al., 1997)). This is a generic com-
ponent that adheres to several principles that are
crucial for our objectives. (i), it is fast and robust,
(ii), it realizes a mapping from terms to ontolog-
ical concepts, (iii) it yields dependency relations
between terms, and, (iv), it is easily adaptable to
new domains.3
2
OroMatcher 1.1 is freely available at
http://www.savarese.org/oro/software/
OROMatcher1.1.html
3
The interlinkage between the information extraction
system SMES and domain ontologies is described in fur-
ther detail in (Staab et al., 1999).
We here give a short survey on SMES in or-
der to provide the reader with a comprehensive
picture of what underlies our system. The ar-
chitecture of SMES comprises a tokenizer based
on regular expressions, a lexical analysis compo-
nent including a word and a domain lexicon, and
a chunk parser. The tokenizer scans the text in
order to identify boundaries of words and com-
plex expressions like “$20.00” or “Mecklenburg-
Vorpommern”4, and to expand abbreviations.
The lexicon contains more than 120,000 stem
entries and more than 12,000 subcategorization
frames describing information used for lexical
analysis and chunk parsing. Furthermore, the
domain-specific part of the lexicon associates
word stems with concepts that are available in
the concept taxonomy. Lexical Analysis uses the
lexicon to perform, (1), morphological analysis,
i.e., the identification of the canonical common
stem of a set of related word forms and the anal-
ysis of compounds, (2), recognition of name en-
tities, (3), retrieval of domain-specific informa-
tion, and, (4), part-of-speech tagging. While the
4
Mecklenburg-Vorpommern is a region in the north east
of Germany.
steps (1),(2) and (4) can be a viewed as standard
for information extraction approaches (cf. (Ap-
pelt et al., 1993; Neumann et al., 1997)), the step
(3) is of specific interest for our annotation task.
This step associates single words or complex ex-
pressions with a concept from the ontology if a
corresponding entry in the domain-specific part
of the lexicon exists. E.g., the expression “Hotel
Schwarzer Adler” is associated with the concept
HOTEL.
SMES includes a chunk parser based on
weighted finite state transducers to efficiently
process phrasal and sentential patterns. The
parser works on the phrasal level, before it
analyzes the overall sentence. Grammatical
functions (such as subject, direct-object) are
determined for each dependency-based senten-
tial structure on the basis of subcategorizations
frames in the lexicon. Our primary output de-
rived from SMES consists of dependency rela-
tions (Hudson, 1990) found through lexical anal-
ysis (compound processing) and through parsing
at the phrase and sentential level. Thereby, the
grammatical dependency relation need not even
hold directly between two conceptually mean-
ingful entities. For instance, in the sentence
”The Hotel Schwarzer Adler in Rostock
celebrates Christmas.“, “Hotel Schwarzer Adler”
and “Rostock”, the concepts of which appear in
the ontology as HOTEL and CITY, respectively,
are not directly connected by a dependency re-
lation. However, the preposition “in” acts as a
mediator that incurs the conceptual pairing of
HOTEL with CITY.
4 Related Work
This paper is motivated by the urgent need for
adding metadata to existing web pages in an ef-
ficient and flexible manner that takes advantage
of the rich possibilities offered by RDF (Lassila
 Swick, 1999) and RDF-Schema (Brickley 
Guha, 1999). Tools and practices so far have not
reflected the new possibilities.
There are only a few tools that support adding
metadata to existing web pages. We will present
related work in this area and show how our ap-
proach and our implemented tool described in
section 2 compares to the existing work. Ad-
ditionally, our paper introduces semantic anno-
tation as a continuous process. We therefore
shortly review existing work in this area.
Related Work on Annotation Tools.
Koivunen et. al. (Koivunen et al., 2000)
introduce a framework for categorizing annota-
tion tools distinguishing between a proxy–based
and a browser–based approach. The proxy-based
approach stores and merges the annotation and
therefore preprocesses the annotated documents
to be viewable for a standard web-browser.
Within the browser–based approach the browser
is modified to merge the document with the
annotation data just prior to presenting the
content to the user.
Many of the annotation tools rely on special-
ized browsers to offer a better user interface. One
of them is Amaya. Amaya (Guetari et al., 1998;
Vatton, 2000) is a web-browser that acts both
as an editor and as a browser. It has been de-
signed at W3C with the primary purpose of be-
ing a testbed for experimenting and demonstrat-
ing new languages, protocols and formats for
the Web. It includes a WYSIWYG editor for
HTML and XML. It can publish documents re-
motely, through the HTTP protocol. It han-
dles Cascading Style Sheets (CSS) and the new
MathML language, for representing mathemat-
ical expressions. An experiment for including
vector graphics into Web documents is also de-
scribed. Amaya is the primary browser /editor
for the annotation approach in (Koivunen et al.,
2000). The annotation data itself is exchanged in
RDF/XML form to provide other clients access
to the annotation database. Currently, however,
it does not provide comprehensive support with
annotation inference server and crawling.
ComMentor (Roescheisen et al., 1994) is an-
other browser-based tool as part of the Stanford
Integrated Digital Library Project. It manages
the meta-information independently of the docu-
ments on separate meta-information servers. The
research prototype implementation was com-
pleted in 1994, the code of the tool is no longer
maintained.
ThirdVoice5 is a commercial product that uses
plug-ins to enhance web browsers. This en-
hancement allows the access to the annotation
stored at the ThirdVoice database located on a
centralized server from the company. The an-
5
http://www.thirdvoice.com
notated text parts will appear in the browser as
underlined links. These links point to the infor-
mation on the database that will be presented on
the user request in a separate viewer. Most of
the annotation stored there seems to be links to
further information, so that ThirdVoice is mainly
used as a kind of an extended link-list. Along the
same lines, JotBot (Vasudevan  Palmer, 1999)
follows a browser–based approach that uses Java
applets to modify the browsers behavior.
Yawas (Denoue  Vignollet, 2000) is an anno-
tation tool that is based on the Document Object
Model (DOM) and Dynamic HTML. It codes the
annotations into an extended URL format and
uses local files similar to bookmark files to store
and retrieve the annotations. A modified browser
can then transform the URL format into DOM
events. Locally stored annotation files can be
sent to other users.
The CritLink (Yee, 1998) annotation tool fol-
lows the proxy approach. This approach has
the advantage that it works with any exist-
ing browser. The system is simply used by
prefixing the URL with http://crit.org e.g. to
see the annotated version of semanticweb.org
someone can access the system with the URL
http://crit.org/http://semanticweb.org.
The approach closest to OntoAnnotate is the
SHOE Knowledge Annotator. The Knowledge
Annotator is a Java program that allows users
to mark-up web pages with the SHOE ontology.
The SHOE system (Luke et al., 1997) defines ad-
ditional tags that can be embedded in the body of
HTML pages. In SHOE there is no direct rela-
tionship between the new tags and the original
text of the page, i.e. SHOE tags are not annota-
tions in a strict sense.
According to the above mentioned classifi-
cation OntoAnnotate follows the browser-based
approach with the exception that it is not devel-
oped as an web-browser extension. OntoAnno-
tate can be regarded as a workbench for semantic
annotation of documents using domain-specific
ontologies and this enriching HTML pages with
semantics that an software agent is capable to au-
tomatically process the content of the page and
reason about it.
Related Work on Semantic Annotation as a
Continuous Process. There is only little re-
search that considers the maintenance of ontolo-
gies or more general the maintenance of knowl-
edge bases. In (Menzies, 1998) an overview over
knowledge maintenance is given. Menzies re-
views systems that contribute to different types
of knowledge maintenance. The paper analyzes
the AI and software engineering literature ac-
cording to 35 different knowledge maintenance
tasks. It concludes that there is no overall strat-
egy that covers all 35 tasks.
The phenomenon of dynamic ontologies has
nicely been described in (Heflin  Hendler,
2000). In their work they discuss the prob-
lems associated with managing ontologies in dis-
tributed environments such as the web. The
underlying representation language is SHOE, a
web-based representation language that supports
multiple versions of ontologies. Foo (Foo, 1995)
has published some initial, theoretical thoughts
on ontology revision. Foo outlines the main
ideas on the topic of ontology revision and con-
stitutes ontology change as a frontier of knowl-
edge systems research.
Related Work on Semi-Automatic Annota-
tion. Pustejovsky et al. (Pustejovsky et al.,
1997) describe their approach for semantic in-
dexing and typed hyperlinking. As in our ap-
proach finite state technologies support lexical
acquisition as well as semantic tagging. The
goal of the overall process is the generation of
so called lexical webs that can be utilized to en-
able automatic and semi-automatic construction
of web-based texts.
In (Bod et al., 1997) approaches for learn-
ing syntactic structures from syntactically tagged
corpus has been transferred to the semantic
level. In order to tag a text corpus with type-
logical formulae, they created tool environment
called SEMTAGS for semi-automatically enrich-
ing trees with semantic annotations. SEMTAGS
incrementally creates a first order markov model
based on existing annotations and proposes a se-
mantic annotation of new syntactic trees. The
authors report promising results: After the first
100 sentences of the corpus had been annotated,
SEMTAGS already produced the correct annota-
tions for 80% of the nodes for the immediately
subsequent sentences.
5 Conclusion
This paper presents an approach for creating
meta data by (semi-automatic) annotating web
pages. Starting from our ontology-based anno-
tation environment OntoAnnotate, we have col-
lected experiences in an actual evaluation study.
Future work will have to start on current stud-
ies that have looked at the feasibility of auto-
matic building of knowledge bases from the web
(cf. (Craven et al., pear)). In our future work,
we want to integrate such methods into an even
more comprehensive annotation environment —
including e.g. the learning of ontologies from
web documents (Maedche  Staab, 2000) and
(semi-)automatic ontology-based semantic an-
notation. The general task of knowledge mainte-
nance, including evolving ontologies and seman-
tic annotation knowledge bases, remains a topic
for much further research in the near future.
Acknowledgements. We thank Stefan Decker
for initiating the first version of an annotation
tool. We thank our students Mika Maier-Collin
and Jochen Klotzbuecher for building the Anno-
tation Tool; Dirk Wenke for the Ontology En-
gineering Environment OntoEdit; and our col-
league Kalvis Apsitis for the RDF Crawler. We
gratefully acknowledge the dedication of our stu-
dents who annotated web pages for our exper-
iments. This work was partially supported by
DARPA in the project DAML/OntoAgents and
by Ontoprise GmbH.
References
Appelt, D., Hobbs, J., Bear, J., Israel, D.,  Tyson,
M. (1993). FASTUS: A finite state processor
for information extraction from real world text.
In Proceedings of IJCAI-93, Chambery, France.
Bod, R., Bonnema, R.,  Scha, R. (1997). Data-
oriented semantic interpretation. In In Pro-
ceedings of the Second International Workshop
on Computational Semantics (IWCS), Tilburg,
1997.
Brickley, D.  Guha, R. (1999). Resource description
framework (RDF) schema specification. Tech-
nical report, W3C. W3C Proposed Recommen-
dation. http://www.w3.org/TR/PR-rdf-schema/.
Craven, M., DiPasquo, D., Freitag, D., McCallum,
A., Mitchell, T., Nigam, K.,  Slattery, S. (to
appear). Learning to construct knowledge bases
from the world wide web. Artificial Intelligence.
Denoue, L.  Vignollet, L. (2000). An annota-
tion tool for web browsers and its applications
to information retrieval. In In Proceedings of
RIAO2000, Paris.
Erdmann, M., Maedche, A., Schnurr, H.-P.,  Staab,
S. (2000). From manual to semi-automatic se-
mantic annotation: About ontology-based text
annotation tools. In P. Buitelaar  K. Hasida
(eds). Proceedings of the COLING 2000 Work-
shop on Semantic Annotation and Intelligent
Content, Luxembourg.
Foo, N. (1995). Ontology Revision. In Proceedings of
the 3rd International Conference on Conceptual
Structures. Springer Lecture Notes in Artificial
Intelligence. Springer.
Guetari, R., Quint, V.,  Vatton, I. (1998). Amaya:
an Authoring Tool for the Web. In Maghrebian
Conference on Software Engineering and Artifi-
cial Intelligence. Tunis, Tunsia.
Heflin, J.  Hendler, J. (2000). Dynamic Ontolo-
gies on the Web. In Proceedings of Ameri-
can Association for Artificial Intelligence Con-
ference (AAAI-2000). Menlo Park, California,
AAAI Press.
Hudson, R. (1990). English Word Grammar. Basil
Blackwell, Oxford.
Koivunen, M.-R., Brickley, D., Kahan, J., Hom-
meaux, E. P.,  Swick, R. R. (2000). The W3C
Collaborative Web Annotation Project ... or how
to have fun while building an RDF infrastruc-
ture.
Kushmerick, N. (2000). Wrapper Induction: Effi-
ciency and Expressiveness. Artificial Intelli-
gence, 118(1).
Lassila, O.  Swick, R. (1999). Resource description
framework (RDF) model and syntax specifica-
tion. Technical report, W3C. W3C Recommen-
dation. http://www.w3.org/TR/REC-rdf-syntax.
Leonard, L. (1977). Inter-Indexer Consistence Stud-
ies, 1954-1975: A Review of the Literature
and Summary of the Study Results. Graduate
School of Library Science, University of Illi-
nois. Occasional Papers No.131.
Luke, S., Spector, L., Rager, D.,  Hendler, J. (1997).
Ontology-based Web agents. In Johnson, W. L.
(Ed.), Proceedings of the 1st International Con-
ference on Autonomous Agents, pages 59–66.
ACM.
Maedche, A.  Staab, S. (2000). Discovering con-
ceptual relations from text. In ECAI-2000 -
European Conference on Artificial Intelligence.
Proceedings of the 13th European Conference
on Artificial Intelligence. IOS Press, Amster-
dam.
Martin, P.  Eklund, P. (1999). Embedding Knowl-
edge in Web Documents. In Proceedings of
the 8th Int. World Wide Web Conf. (WWW‘8),
Toronto, May 1999, pages 1403–1419. Elsevier
Science B.V.
Menzies, T. (1998). Knowledge maintenance: The
state of the art. The Knowledge Engineering Re-
view, 10(2).
Neumann, G., Backofen, R., Baur, J., Becker, M.,
 Braun, C. (1997). An information extraction
core system for real world german text process-
ing. In In Proceedings of ANLP-97, pages 208–
215, Washington, USA.
Pustejovsky, J., Boguraev, B., Verhagen, M., Buite-
laar, P.,  Johnston, M. (1997). Semantic in-
dexing and typed hyperlinking. In Proceedings
of AAAI Spring Symposium, NLP for WWW.
Roescheisen, M., Mogensen, C.,  Winograd, T.
(1994). Shared Web Annotations as a Plat-
form for Third-Party Value-Added Information
Providers: Architecture, Protocols, and Usage
Examples. Technical report stan-cs-tr-97-1582,
Computer Science Dept., Stanford University.
Sahuguet, A.  Azavant, F. (2000). Building In-
telligent Web Applications Using Lightweight
Wrappers. to appear in: Data and Knowledge
Engineering.
Staab, S., Angele, J., Decker, S., Erdmann, M.,
Hotho, A., Maedche, A., Studer, R.,  Sure, Y.
(2000a). Semantic Community Web Portals. In
Proceedings of the 9th World Wide Web Confer-
ence (WWW-9), Amsterdam, Netherlands.
Staab, S., Braun, C., Düsterhöft, A., Heuer, A., Klet-
tke, M., Melzig, S., Neumann, G., Prager, B.,
Pretzel, J., Schnurr, H.-P., Studer, R., Uszkoreit,
H.,  Wrenger, B. (1999). GETESS — search-
ing the web exploiting german texts. In Pro-
ceedings of the 3rd Workshop on Cooperative
Information Agents, LNCS, Berlin. Springer.
Staab, S.  Maedche, A. (2000). Ontology engineer-
ing beyond the modeling of concepts and rela-
tions. In Benjamins, V., Gomez-Perez, A., 
Guarino, N. (Eds.), Proceedings of the ECAI-
2000 Workshop on Ontologies and Problem-
Solving Methods. Berlin, August 21-22, 2000.
Staab, S., Maedche, A.,  Handschuh, S. (2000b).
Creating Metadata for the Semantic Web - A
Annotation Environment and its Evaluation.
Technical Report 412, Institute AIFB, Karl-
sruhe University.
Vasudevan, V.  Palmer, M. (1999). On Web Annota-
tions: Promises and Pittfalls of Current Web In-
frastucture. In Proceedings of HICSS’99, pages
5–8, Maui, Hawaii.
Vatton, I. (2000). W3C’s Amaya 4.0 Edi-
tor/Browser. Technical report, W3C.
http://w3c1.inria.fr/Amaya/.
Yee, K.-P. (1998). CritLink: Better Hyperlinks for
the WWW. http://crit.org/ ping/ht98.html.

More Related Content

Similar to An Annotation Framework For The Semantic Web

Corrib.org - OpenSource and Research
Corrib.org - OpenSource and ResearchCorrib.org - OpenSource and Research
Corrib.org - OpenSource and Researchadameq
 
Web of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case StudiesWeb of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case StudiesSabin Buraga
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldAmit Sheth
 
SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalgowthamnaidu0986
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data toIJwest
 
Toward The Semantic Deep Web
Toward The Semantic Deep WebToward The Semantic Deep Web
Toward The Semantic Deep WebSamiul Hoque
 
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...dannyijwest
 
The Semantic Web: status and prospects
The Semantic Web: status and prospectsThe Semantic Web: status and prospects
The Semantic Web: status and prospectsGuus Schreiber
 
The Revolution Of Cloud Computing
The Revolution Of Cloud ComputingThe Revolution Of Cloud Computing
The Revolution Of Cloud ComputingCarmen Sanborn
 
Semantic web technology
Semantic web technologySemantic web technology
Semantic web technologyStanley Wang
 
Towards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational DatabaseTowards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational Databaseijbuiiir1
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the HaystackAdrian Stevenson
 
A Metamodel For Web Page Design
A Metamodel For Web Page DesignA Metamodel For Web Page Design
A Metamodel For Web Page DesignJoe Osborn
 
A Schema-Based Approach To Modeling And Querying WWW Data
A Schema-Based Approach To Modeling And Querying WWW DataA Schema-Based Approach To Modeling And Querying WWW Data
A Schema-Based Approach To Modeling And Querying WWW DataLisa Garcia
 
ITEC 610 Assingement 1 Essay
ITEC 610 Assingement 1 EssayITEC 610 Assingement 1 Essay
ITEC 610 Assingement 1 EssaySheena Crouch
 
semantic web tech.ppt
semantic web tech.pptsemantic web tech.ppt
semantic web tech.pptNaglaaFathy42
 
DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0John Breslin
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic WaveKaniska Mandal
 

Similar to An Annotation Framework For The Semantic Web (20)

Corrib.org - OpenSource and Research
Corrib.org - OpenSource and ResearchCorrib.org - OpenSource and Research
Corrib.org - OpenSource and Research
 
Web of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case StudiesWeb of Data as a Solution for Interoperability. Case Studies
Web of Data as a Solution for Interoperability. Case Studies
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-World
 
Spotlight
SpotlightSpotlight
Spotlight
 
SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professional
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
 
Toward The Semantic Deep Web
Toward The Semantic Deep WebToward The Semantic Deep Web
Toward The Semantic Deep Web
 
Artificial Intelligence of the Web through Domain Ontologies
Artificial Intelligence of the Web through Domain OntologiesArtificial Intelligence of the Web through Domain Ontologies
Artificial Intelligence of the Web through Domain Ontologies
 
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
Semantic - Based Querying Using Ontology in Relational Database of Library Ma...
 
The Semantic Web: status and prospects
The Semantic Web: status and prospectsThe Semantic Web: status and prospects
The Semantic Web: status and prospects
 
The Revolution Of Cloud Computing
The Revolution Of Cloud ComputingThe Revolution Of Cloud Computing
The Revolution Of Cloud Computing
 
Semantic web technology
Semantic web technologySemantic web technology
Semantic web technology
 
Towards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational DatabaseTowards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational Database
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the Haystack
 
A Metamodel For Web Page Design
A Metamodel For Web Page DesignA Metamodel For Web Page Design
A Metamodel For Web Page Design
 
A Schema-Based Approach To Modeling And Querying WWW Data
A Schema-Based Approach To Modeling And Querying WWW DataA Schema-Based Approach To Modeling And Querying WWW Data
A Schema-Based Approach To Modeling And Querying WWW Data
 
ITEC 610 Assingement 1 Essay
ITEC 610 Assingement 1 EssayITEC 610 Assingement 1 Essay
ITEC 610 Assingement 1 Essay
 
semantic web tech.ppt
semantic web tech.pptsemantic web tech.ppt
semantic web tech.ppt
 
DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0DM110 - Week 10 - Semantic Web / Web 3.0
DM110 - Week 10 - Semantic Web / Web 3.0
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic Wave
 

More from Andrea Porter

Taking Notes With Note Cards
Taking Notes With Note CardsTaking Notes With Note Cards
Taking Notes With Note CardsAndrea Porter
 
How To Write A TOK Essay 15 Steps (With Pictures) - WikiHow
How To Write A TOK Essay 15 Steps (With Pictures) - WikiHowHow To Write A TOK Essay 15 Steps (With Pictures) - WikiHow
How To Write A TOK Essay 15 Steps (With Pictures) - WikiHowAndrea Porter
 
How To Write A Good Topic Sentence In Academic Writing
How To Write A Good Topic Sentence In Academic WritingHow To Write A Good Topic Sentence In Academic Writing
How To Write A Good Topic Sentence In Academic WritingAndrea Porter
 
Narrative Essay Argumentative Essay Thesis
Narrative Essay Argumentative Essay ThesisNarrative Essay Argumentative Essay Thesis
Narrative Essay Argumentative Essay ThesisAndrea Porter
 
A Good Introduction For A Research Paper. Writing A G
A Good Introduction For A Research Paper. Writing A GA Good Introduction For A Research Paper. Writing A G
A Good Introduction For A Research Paper. Writing A GAndrea Porter
 
How To Find The Best Paper Writing Company
How To Find The Best Paper Writing CompanyHow To Find The Best Paper Writing Company
How To Find The Best Paper Writing CompanyAndrea Porter
 
Primary Handwriting Paper - Paging Supermom - Free Printable Writing
Primary Handwriting Paper - Paging Supermom - Free Printable WritingPrimary Handwriting Paper - Paging Supermom - Free Printable Writing
Primary Handwriting Paper - Paging Supermom - Free Printable WritingAndrea Porter
 
Write An Essay On College Experience In EnglishDesc
Write An Essay On College Experience In EnglishDescWrite An Essay On College Experience In EnglishDesc
Write An Essay On College Experience In EnglishDescAndrea Porter
 
Ghost Writing KennyLeeHolmes.Com
Ghost Writing KennyLeeHolmes.ComGhost Writing KennyLeeHolmes.Com
Ghost Writing KennyLeeHolmes.ComAndrea Porter
 
How To Write A Good Concluding Observation Bo
How To Write A Good Concluding Observation BoHow To Write A Good Concluding Observation Bo
How To Write A Good Concluding Observation BoAndrea Porter
 
How To Write An Argumentative Essay
How To Write An Argumentative EssayHow To Write An Argumentative Essay
How To Write An Argumentative EssayAndrea Porter
 
7 Introduction Essay About Yourself - Introduction Letter
7 Introduction Essay About Yourself - Introduction Letter7 Introduction Essay About Yourself - Introduction Letter
7 Introduction Essay About Yourself - Introduction LetterAndrea Porter
 
Persuasive Essay Essaypro Writers
Persuasive Essay Essaypro WritersPersuasive Essay Essaypro Writers
Persuasive Essay Essaypro WritersAndrea Porter
 
Scholarship Personal Essays Templates At Allbusinesst
Scholarship Personal Essays Templates At AllbusinesstScholarship Personal Essays Templates At Allbusinesst
Scholarship Personal Essays Templates At AllbusinesstAndrea Porter
 
How To Create An Outline For An Essay
How To Create An Outline For An EssayHow To Create An Outline For An Essay
How To Create An Outline For An EssayAndrea Porter
 
ESSAY - Qualities Of A Good Teach
ESSAY - Qualities Of A Good TeachESSAY - Qualities Of A Good Teach
ESSAY - Qualities Of A Good TeachAndrea Porter
 
Fountain Pen Writing On Paper - Castle Rock Financial Planning
Fountain Pen Writing On Paper - Castle Rock Financial PlanningFountain Pen Writing On Paper - Castle Rock Financial Planning
Fountain Pen Writing On Paper - Castle Rock Financial PlanningAndrea Porter
 
Formatting A Research Paper E
Formatting A Research Paper EFormatting A Research Paper E
Formatting A Research Paper EAndrea Porter
 
Business Paper Examples Of Graduate School Admissio
Business Paper Examples Of Graduate School AdmissioBusiness Paper Examples Of Graduate School Admissio
Business Paper Examples Of Graduate School AdmissioAndrea Porter
 
Impact Of Poverty On The Society - Free Essay E
Impact Of Poverty On The Society - Free Essay EImpact Of Poverty On The Society - Free Essay E
Impact Of Poverty On The Society - Free Essay EAndrea Porter
 

More from Andrea Porter (20)

Taking Notes With Note Cards
Taking Notes With Note CardsTaking Notes With Note Cards
Taking Notes With Note Cards
 
How To Write A TOK Essay 15 Steps (With Pictures) - WikiHow
How To Write A TOK Essay 15 Steps (With Pictures) - WikiHowHow To Write A TOK Essay 15 Steps (With Pictures) - WikiHow
How To Write A TOK Essay 15 Steps (With Pictures) - WikiHow
 
How To Write A Good Topic Sentence In Academic Writing
How To Write A Good Topic Sentence In Academic WritingHow To Write A Good Topic Sentence In Academic Writing
How To Write A Good Topic Sentence In Academic Writing
 
Narrative Essay Argumentative Essay Thesis
Narrative Essay Argumentative Essay ThesisNarrative Essay Argumentative Essay Thesis
Narrative Essay Argumentative Essay Thesis
 
A Good Introduction For A Research Paper. Writing A G
A Good Introduction For A Research Paper. Writing A GA Good Introduction For A Research Paper. Writing A G
A Good Introduction For A Research Paper. Writing A G
 
How To Find The Best Paper Writing Company
How To Find The Best Paper Writing CompanyHow To Find The Best Paper Writing Company
How To Find The Best Paper Writing Company
 
Primary Handwriting Paper - Paging Supermom - Free Printable Writing
Primary Handwriting Paper - Paging Supermom - Free Printable WritingPrimary Handwriting Paper - Paging Supermom - Free Printable Writing
Primary Handwriting Paper - Paging Supermom - Free Printable Writing
 
Write An Essay On College Experience In EnglishDesc
Write An Essay On College Experience In EnglishDescWrite An Essay On College Experience In EnglishDesc
Write An Essay On College Experience In EnglishDesc
 
Ghost Writing KennyLeeHolmes.Com
Ghost Writing KennyLeeHolmes.ComGhost Writing KennyLeeHolmes.Com
Ghost Writing KennyLeeHolmes.Com
 
How To Write A Good Concluding Observation Bo
How To Write A Good Concluding Observation BoHow To Write A Good Concluding Observation Bo
How To Write A Good Concluding Observation Bo
 
How To Write An Argumentative Essay
How To Write An Argumentative EssayHow To Write An Argumentative Essay
How To Write An Argumentative Essay
 
7 Introduction Essay About Yourself - Introduction Letter
7 Introduction Essay About Yourself - Introduction Letter7 Introduction Essay About Yourself - Introduction Letter
7 Introduction Essay About Yourself - Introduction Letter
 
Persuasive Essay Essaypro Writers
Persuasive Essay Essaypro WritersPersuasive Essay Essaypro Writers
Persuasive Essay Essaypro Writers
 
Scholarship Personal Essays Templates At Allbusinesst
Scholarship Personal Essays Templates At AllbusinesstScholarship Personal Essays Templates At Allbusinesst
Scholarship Personal Essays Templates At Allbusinesst
 
How To Create An Outline For An Essay
How To Create An Outline For An EssayHow To Create An Outline For An Essay
How To Create An Outline For An Essay
 
ESSAY - Qualities Of A Good Teach
ESSAY - Qualities Of A Good TeachESSAY - Qualities Of A Good Teach
ESSAY - Qualities Of A Good Teach
 
Fountain Pen Writing On Paper - Castle Rock Financial Planning
Fountain Pen Writing On Paper - Castle Rock Financial PlanningFountain Pen Writing On Paper - Castle Rock Financial Planning
Fountain Pen Writing On Paper - Castle Rock Financial Planning
 
Formatting A Research Paper E
Formatting A Research Paper EFormatting A Research Paper E
Formatting A Research Paper E
 
Business Paper Examples Of Graduate School Admissio
Business Paper Examples Of Graduate School AdmissioBusiness Paper Examples Of Graduate School Admissio
Business Paper Examples Of Graduate School Admissio
 
Impact Of Poverty On The Society - Free Essay E
Impact Of Poverty On The Society - Free Essay EImpact Of Poverty On The Society - Free Essay E
Impact Of Poverty On The Society - Free Essay E
 

Recently uploaded

ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsNbelano25
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Philosophy of china and it's charactistics
Philosophy of china and it's charactisticsPhilosophy of china and it's charactistics
Philosophy of china and it's charactisticshameyhk98
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...
TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...
TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...Nguyen Thanh Tu Collection
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 

Recently uploaded (20)

ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf arts
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Philosophy of china and it's charactistics
Philosophy of china and it's charactisticsPhilosophy of china and it's charactistics
Philosophy of china and it's charactistics
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...
TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...
TỔNG ÔN TáșŹP THI VÀO LỚP 10 MÔN TIáșŸNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGở Â...
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 

An Annotation Framework For The Semantic Web

  • 1. An Annotation Framework for the Semantic Web Steffen Staab, Alexander Maedche and Siegfried Handschuh Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany fsst, ama, shag@aifb.uni-karlsruhe.de http://www.aifb.uni-karlsruhe.de/WBS Abstract Creating metadata by annotating documents is one of the major techniques for putting machine understandable data on the Web. Though there exist many tools for annotating web pages, few of them fully support the creation of semanti- cally interlinked metadata, such as necessary for a truely Semantic Web. In this paper, we present an ontology-based annotation environment, On- toAnnotate, which offers comprehensive support for the creation of semantically interlinked meta- data by human annotators. 1 Introduction With the upswing of metadata on the Semantic Web for means like semantic web portals (Staab et al., 2000a), there comes the urgent need for adding semantic metadata to existing web pages such that they are digestible for humans and ma- chines. Though there exists a wide range of so- phisticated, even professional, annotation tools (cf. Section 4 on related work), none of the ones that we know of has yet fully exploited the new wealth of possibilities that come with RDF (Las- sila & Swick, 1999) and RDF-Schema (Brickley & Guha, 1999) as metadata formats. In partic- ular, semantic annotation has so far mostly re- stricted to describing documents or items in doc- uments in isolation of each other. In light of the Semantic Web, what intelligent agents crave for are web pages and items on web pages that are not only described in isolation from each other, but that are also semantically interlinked. We have used semantically interlinked in- formation for gathering knowledge relevant in a particular community of users (Staab et al., 2000a). The underlying idea was that for that domain a group of users would provide seman- tic metadata about the content of relevant web pages. Thus, our Community Web Portal could present all this knowledge, taking great advan- tage of semantic structures: personalization by semantic bookmarks (“Fred is interested in RDF research”), conceptual browsing, or the deriva- tion of implicit knowledge (e.g., if John works in a project, which is about XML then he knows something about XML), have been some of the features that thrived by having semantically in- terlinked information. Similarly, we envision that intelligent agents may profit from semanti- cally interlinked information on the Web in the future.1 Building the Community Web Portal we found that there were a number of tricky issues with providing semantic annotation in this manner: First of all, the semantic annotation task does not adhere to a strict template structure, such as Dublin Core to name one of the more sophis- ticated ones in use. Rather it needs to follow the structure given by schema definitions that may vary with, e.g., domain and purpose. In fact, our intelligent agents rely on domain on- tologies. Semantic annotations need to be con- gruent with ontology definitions in order to al- low for the advantages we have indicated above. Secondly, semantically interlinked metadata is labor-intensive to produce and, hence, expen- sive. Therefore duplicate annotation must be avoided. Because semantic annotation is a con- tinuous process in a distributed setting there are several sources for duplication. There is knowl- edge generated by other annotators. In order to allow for the reuse of their annotations it is im- portant that one does not start from scratch when 1 Similar projects like WebKB (Martin & Eklund, 1999), SHOE (Heflin & Hendler, 2000), and, more recently, DAML (http://www.daml.org) point in the same di- rection.
  • 2. annotating sources, but that one builds on oth- ers efforts (in particular their creation of IDs). Then, there is a multitude of schema descrip- tions (ontologies) that also change over time to reflect changes in the world. Because manual re-annotation of old web pages seems practically infeasible, one needs an annotation framework that allows to handle ontology creation, map- pings and versioning. Thirdly, purely manual an- notation is very expensive. Therefore, only very valuable information will be annotated and it is necessary to help the human annotator with his task. What is needed is support for automatic — or at the least, semi-automatic — semantic an- notation of web pages. Finally, there is a lack of experience in creating semantically interlinked metadata for web pages. It is not clear how hu- man annotators perform overall and, hence, it is unclear what can be assumed as a baseline for the machine agent. Though there are correspond- ing investigations for only indexing documents, e.g. in library science (Leonard, 1977), a corre- sponding richer assignment of interlinked meta- data that takes advantage of the object structures of RDF is lacking. In this paper, we deal with the first three of the above mentioned issues. Regarding the fourth problem on evaluating semantic annotation we refer the interested reader to a companion paper (Staab et al., 2000b). We first (Section 2) present our basic tool for ontology-based semantic annotation and, then, consider the issue of semantic annotation as an ongoing process. In particular, interlinkage be- tween objects and evolving metadata schema need to be managed to avoid redundant annota- tions and re-annotating, respectively. Section 3 describes the different layers for semi-automatic semantic annotation and introduces the tech- niques that we use in our tool. Before con- cluding, we discuss related work in the areas of evolving schemata and crawling, semi-automatic semantic annotation — prerequisite experiences and techniques for useful, semantically inter- linked metadata on the Semantic Web. 2 Ontology-based Semantic Annotation An ontology is commonly defined as an explicit, formal specification of a shared conceptualiza- tion of an domain of interest. This means that an ontology describes some application-relevant part of the world in a machine-understandable way. The concepts and concept definitions that are part of the ontology have been agreed upon by a community of people who have an interest in the corresponding ontology. The core “ingre- dients” of an ontology are its set of concepts, its set of properties, and the relationships between the elements of these two sets. Ontological structures may give additional value to semantic annotations. They allow for additional possibilities on the resulting semantic annotations, such as inferencing or conceptual navigation that we have mentioned before. But also the reference to a commonly agreed set of concepts by itself constitutes an additional value through its normative function. Furthermore, an ontology directs the attention of the annotator to a predefined choice of semantic structures and, hence, gives some guidance about what and how items residing in the documents may be anno- tated. Besides of these advantages that ontology- based semantic annotation yields in compari- son to “free text metadata generation”, the ex- tended set of capabilities also entails some new problems that need to be solved. In partic- ular, semantic interlinkage between document items incurs the difficulty to adequately manage these interlinkages. Essentially, this means that an ontology-based annotation tool must address the issue of object identity and its management across many documents. Also, ontologies may have elaborate definitions of concepts. When their meaning changes, when old concepts need to be erased, or when new concepts come up, the ontology changes. Because updating previ- ous annotations is generally too expensive, one must deal with change management of ontolo- gies in relation to their corresponding annota- tions. Finally, one must prevent redundant an- notation which stem from duplicate pages on the web or annotation work done by fellow annota- tors. Hence, we provide two basic mechanisms for recognizing document identity. In the re- mainder of this section we embed these require- ments into a coherent framework.
  • 3. 2.1 OntoAnnotate — The Core Tool While most annotation tools implicitly subscribe to a particular ontology (e.g., Dublin Core), our tool, OntoAnnotate, makes the relationship be- tween particular ontologies and their parts, i.e. concepts and properties, explicit. OntoAnnotate, presents to the user an interface that dynami- cally adapts to the given ontology. It has been developed based on our earlier experiences with manual ontology-based semantic annotation that have been described in (Erdmann et al., 2000). As principal language for semantic annota- tions and ontologies, OntoAnnotate relies on RDF and RDF Schema. RDF Schema can be seen as a language for lightweight ontology de- scriptions, allowing to define the interlinkage between different concepts (called “classes” in RDFS), properties, and objects (i.e “class mem- bers”, also called “instances”). To name but a few other possible formats, WebKB uses Con- ceptual Graphs (Martin & Eklund, 1999), SHOE employs horn logic rules (Heflin & Hendler, 2000), and we have formerly exploited F-Logic (Staab et al., 2000a). RDF and RDF Schema, however, provide completely web compatible common denominator that everyone agrees on now. Therefore we have replaced proprietory formats we have used originally. OntoAnnotate allows for the easy annotation of HTML documents. One may create objects with URIs and relate them to text passages, which are then highlighted. The semantic mean- ing of the objects and the text passages is given by four semantic categories: 1. Object identification: New objects are cre- ated by asserting the existence of an object with a unique identifier. The annotation tool supports the creation of object identifiers from text passages. This is a mostly syntactic operation, the only semantic consequence is that the set of existing objects is augmented by one. 2. Object–class relationships: Each object is assigned to a class of objects by the human annotator. In general, objects may be as- serted to belong to multiple classes. To keep the user interface and the evaluation sim- pler, OntoAnnotate only allows single clas- sification. 3. Object–attribute relationships: Each ob- ject may be related to attribute values by an attribute. Each attribute value is either a text passage chosen by highlighting or a string typed in by the annotator. For a given object the annotator can only create object–attribute relationships if the object’s class definition allows its creation, i.e. if the class definition includes a corresponding at- tribute. An attribute is a property the domain of which is a literal. 4. Object–object relationships: Each object may be related to all existing objects (in- cluding itself) via an (object) relation. For a given object the annotator can only create object–object relationships if the object’s class definition allows its creation, i.e. if the class definition includes a correspond- ing (object) relation. An relation is a property the domain of which is a resource. Figure 1 shows the screen for navigating the ontology and creating annotations in OntoAnno- tate. The left pane displays the document and the right panes show the ontological structures contained in the ontology, namely classes, at- tributes and relations. In addition, the right pane shows the current semantic annotation knowl- edge base, i.e. existing objects, their classifi- cation, object–attribute relationships and object– object relationships created during the semantic annotation. To illustrate the annotation process with On- toAnnotate, we sketch a small annotation sce- nario using our tool: Annotation typically starts with identifying a new object. The user pro- vides a new object identifier and selects the ap- propriate class of this object from a tree view. In our example, the object identifier RStuder is typed in and the class FULLPROFESSOR is selected from the ontology. Upon categoriza- tion of a new object into a class C, OntoAn- notate shows the possible attributes of C (cf. the attributes ADDRESS, NAME, PHONE, etc. of FULLPROFESSOR in the right upper pane of
  • 4. Figure 1: Screenshot of the OntoAnnotate GUI. Figure 1) and the actual attributes of the cho- sen object (cf. Karlsruhe, Rudi Studer, etc. in right upper pane of Figure 1). In addition, one may look at the object rela- tions of C (cf. affiliation, cooperate- With, etc. in the right lower pane of Fig- ure 1) and the actual relations of the chosen ob- ject. In order to dynamically display the prop- erties of classes and their instances, OntoAn- notate queries the annotation inference server. The annotator continues with marking text pas- sages and drags them into empty fields of the attribute table, thereby creating new attributes relationships between the currently chosen ob- ject and the currently marked text passage (e.g., between RStuder and studer@aifb.uni- karlsruhe.de in Figure 1). The annotator may create metadata describing new object– object relationships by choosing an object re- lation and, then, either creating a new object on the fly or by choosing one of the objects, pre-selected by OntoAnnotate according to the range restriction of the chosen relation. For in- stance, the AFFILIATION of a PERSON must be an ORGANIZATION. Therefore, only organizations are offered as potential fillers for the affiliation relation of RStuder. 2.2 Object Identity The first version of OntoAnnotate already relied on ontology structures to guide annotation, but it did not consider annotation as being a pro- cess carried through in a complex environment. The general problem stems from the fact that without corresponding tool support, annotators would too often create new objects rather than re- use existing ones. Therefore new properties were not attached to existing objects, but to new enti-
  • 5. ties. In case studies, like the Community Web Portal (Staab et al., 2000a) annotators came up with many different object identifiers for single persons, which made it impossible to combine all the data about these persons. Considering semantic annotation as a contin- uous process, we came up with two new require- ments: 1. The annotation inferencing server needs to maintain object identifiers during the anno- tation process. 2. A crawler needs to gather relevant object identifiers for the start of the annotation. The first requirement is solved by the annota- tion inference server, by adding objects to and querying objects from the server during actual annotation as described in the previous subsec- tion. The second requirement has been solved by al- lowing the annotator to start a focused crawl of RDF facts — covering the document and annota- tion server, but also relevant parts of the Web — which provides the annotation inference server with an initial set of object identifiers, categories, attributes and relations. Thus, the metadata pro- vided by other annotaters may be used as the starting point that one may contribute additional data to. Currently, RDF data is comparatively weakly interlinked. Hence, it is sufficient to restrict the focus of the crawl by web server restrictions and depth of the crawl. With more metadata on the Web, one needs to employ more sophisticated techniques in the future. 2.3 Ontology Changes There exists a tight interlinkage between evolv- ing ontologies and the semantic document anno- tation. In any realistic application scenario, in- coming information that is to be annotated does not only require some more annotating, but also continuous adaptation to new semantic terminol- ogy and relationships. Heflin and Hendler (Heflin & Hendler, 2000) have elaborated in great detail on how ontology revisioning may influence semantic annotations. Therefore, we here only sketch one example re- vision and its effects: When an existing class definition is refined, the maintainer of the semantic annotations may explore the objects that belong to this class. He may decide individually or for all objects that the objects stay in the class and, hence, the semantic meaning of the annotations is extended by additional semantic con- straints; that the objects are categorized to belong only to the superclasses of the re-defined class and, hence the semantic meaning of the annotations is reduced by cutting away semantic constraints; that the objects are moved to another class. Along similar lines, other cases of ontology revi- sions are treated. The annotation maintainer may explore all the possibilities in the ontology engineering tool, OntoEdit (Staab Maedche, 2000) and may define mapping rules to bridge between differ- ent ontology revisions. Later on, querying may take advantage of these mappings to also retrieve “old” annotations. 2.4 Document Identity In order to avoid duplicate annotation, existing semantic annotations of documents should be recognized. Because interesting semantic an- notations will eventually refer to external web pages that change, the annotator needs some hints when he encounters a document that has been annotated before, but that may have slightly changed since. Finally, the annotator also needs to recognize that this may be a duplication of another document seen before (e.g. on a mirror site). For these recognition tasks we provide the fol- lowing mechanisms: In our local setting we have a document management system where anno- tated documents and their metadata are stored. OntoAnnotate uses the URI to detect the re- encounter of previously annotated documents and highlights annotations in the old document for the user. Then the user may decide to ignore or even delete the old annotations and create new metadata, he may augment existing data, or he may just be satisfied with what has been anno- tated before.
  • 6. In order to recognize that a document has been annotated before, but now appears under a differ- ent URI, OntoAnnotate searches in the document management system computing similarity with existing documents by document vector models. If there appear documents the similarity which to the currently viewed document is near 1, then these are indicated to the annotator such that he may check for congruency. These two techniques for recognizing docu- ment identity are very basic, but effective for maintaining document identity in OntoAnnotate, given a dynamic environment such as the Web. 2.5 OntoAnnotate — The Semantic Annotation Environment The overall annotation environment as outlined in this section is depicted in Figure 2: The core OntoAnnotate is used for viewing web pages and actually providing annotations. It also stores an- notated documents in the document management system and adds new metadata to the annota- tion inference server. The latter is also queried for providing conceptual restrictions given by the ontology. Thus, the annotator’s view is restricted to conceptual structures that are congruent with the given ontology. The annotation process is started either with an annotation inference server without objects, or the server process is fed with metadata crawled from the Web and the document server. The an- notation inference server supports multiple on- tologies. Annotations refer to the classes and properties that were used for their creation by namespaces. F-Logic rules are finally used to map between different namespaces, thus allow- ing to keep track of semantic annnotations (at least to some degree) even when the currently used ontology is replaced by an update. The user additionally has the possibility to use semi-automatic means for recognizing class instances and properties between them. In the subsequent section we will further describe the text analysis component that supports semi- automatic semantic text annotation. 3 Semi-Automatic Annotation Based on our experiences and the existing anno- tation tool for supporting ontology-based seman- tic annotation of texts, we now approach semi- automatic annotation of documents. In gen- eral, one may distinguish between different kinds of semi-automatic annotation mechanisms, that have already researched in existing work: Wrapper Generation: Especially in the case of annotating web pages that mainly consist of HTML tables, one may annotate the first row of the table and automatically enumerate over the residual rows of the ta- ble. Pattern Matching: Regularity of word expressions may be captured by regular expression based patterns. For example given the pattern fwordg*fGmbHg yields for the german language to generic pattern for company names, and, thus, successfully recognize instances of the class COMPANY of the ontology. Patterns are stored with the concepts of the domain ontology. Information Extraction: The most com- plex mechanism for semi-automatic anno- tation is full fledged ontology-based infor- mation extraction based on a shallow text processing strategy. Depending in the structure given in the docu- ments one may apply one of the methods listed above. We here only shortly describe the mecha- nisms that we currently use in our tool OntoAn- notate. In real-world documents typically all three methods (and more) will have to be applied in combination. In our future work we will ana- lyze the structures contained in the documents to derive a suitable processing strategy for the doc- uments and document parts. Wrapper Generation. Recently, several ap- proaches have been proposed for wrapping semi- structured documents, such as HTML docu- ments. Wrapper factories (cf. Sahuguet et al. (Sahuguet Azavant, 2000)) and wrapper in- duction (cf. Kushmerick (Kushmerick, 2000)) have considerably facilitated the task of wrap- per construction. In order to wrap directly into OntoAnnotate we have developed our own wrap- per approach that directly alignes regularities in semi-structured documents with their corre- sponding ontological meaning.
  • 7. Annotation Inference Server Distributed RDF stored document annotations (according to the domain ontology) Document Server with RDF statements crawl Access annotate web pages versioned domain ontology validate local copy WWW Onto Annotate GUI Text Analysis Component Figure 2: OntoAnnotate — The Semantic Annotation Environment. Pattern Matching. We use a very simple mechanism for recognizing patterns in HTML documents. We use OroMatcher2 based on Perl 5.003 regular expressions. Patterns are devel- oped and tested in our regular expression work- bench. Information Extraction. At the highest level of processing we conceive an information extraction-based approach for semi-automatic annotation, which has been implemented on top of SMES (Saarbrücken Message Extraction Sys- tem), a shallow text processor for German (cf. (Neumann et al., 1997)). This is a generic com- ponent that adheres to several principles that are crucial for our objectives. (i), it is fast and robust, (ii), it realizes a mapping from terms to ontolog- ical concepts, (iii) it yields dependency relations between terms, and, (iv), it is easily adaptable to new domains.3 2 OroMatcher 1.1 is freely available at http://www.savarese.org/oro/software/ OROMatcher1.1.html 3 The interlinkage between the information extraction system SMES and domain ontologies is described in fur- ther detail in (Staab et al., 1999). We here give a short survey on SMES in or- der to provide the reader with a comprehensive picture of what underlies our system. The ar- chitecture of SMES comprises a tokenizer based on regular expressions, a lexical analysis compo- nent including a word and a domain lexicon, and a chunk parser. The tokenizer scans the text in order to identify boundaries of words and com- plex expressions like “$20.00” or “Mecklenburg- Vorpommern”4, and to expand abbreviations. The lexicon contains more than 120,000 stem entries and more than 12,000 subcategorization frames describing information used for lexical analysis and chunk parsing. Furthermore, the domain-specific part of the lexicon associates word stems with concepts that are available in the concept taxonomy. Lexical Analysis uses the lexicon to perform, (1), morphological analysis, i.e., the identification of the canonical common stem of a set of related word forms and the anal- ysis of compounds, (2), recognition of name en- tities, (3), retrieval of domain-specific informa- tion, and, (4), part-of-speech tagging. While the 4 Mecklenburg-Vorpommern is a region in the north east of Germany.
  • 8. steps (1),(2) and (4) can be a viewed as standard for information extraction approaches (cf. (Ap- pelt et al., 1993; Neumann et al., 1997)), the step (3) is of specific interest for our annotation task. This step associates single words or complex ex- pressions with a concept from the ontology if a corresponding entry in the domain-specific part of the lexicon exists. E.g., the expression “Hotel Schwarzer Adler” is associated with the concept HOTEL. SMES includes a chunk parser based on weighted finite state transducers to efficiently process phrasal and sentential patterns. The parser works on the phrasal level, before it analyzes the overall sentence. Grammatical functions (such as subject, direct-object) are determined for each dependency-based senten- tial structure on the basis of subcategorizations frames in the lexicon. Our primary output de- rived from SMES consists of dependency rela- tions (Hudson, 1990) found through lexical anal- ysis (compound processing) and through parsing at the phrase and sentential level. Thereby, the grammatical dependency relation need not even hold directly between two conceptually mean- ingful entities. For instance, in the sentence ”The Hotel Schwarzer Adler in Rostock celebrates Christmas.“, “Hotel Schwarzer Adler” and “Rostock”, the concepts of which appear in the ontology as HOTEL and CITY, respectively, are not directly connected by a dependency re- lation. However, the preposition “in” acts as a mediator that incurs the conceptual pairing of HOTEL with CITY. 4 Related Work This paper is motivated by the urgent need for adding metadata to existing web pages in an ef- ficient and flexible manner that takes advantage of the rich possibilities offered by RDF (Lassila Swick, 1999) and RDF-Schema (Brickley Guha, 1999). Tools and practices so far have not reflected the new possibilities. There are only a few tools that support adding metadata to existing web pages. We will present related work in this area and show how our ap- proach and our implemented tool described in section 2 compares to the existing work. Ad- ditionally, our paper introduces semantic anno- tation as a continuous process. We therefore shortly review existing work in this area. Related Work on Annotation Tools. Koivunen et. al. (Koivunen et al., 2000) introduce a framework for categorizing annota- tion tools distinguishing between a proxy–based and a browser–based approach. The proxy-based approach stores and merges the annotation and therefore preprocesses the annotated documents to be viewable for a standard web-browser. Within the browser–based approach the browser is modified to merge the document with the annotation data just prior to presenting the content to the user. Many of the annotation tools rely on special- ized browsers to offer a better user interface. One of them is Amaya. Amaya (Guetari et al., 1998; Vatton, 2000) is a web-browser that acts both as an editor and as a browser. It has been de- signed at W3C with the primary purpose of be- ing a testbed for experimenting and demonstrat- ing new languages, protocols and formats for the Web. It includes a WYSIWYG editor for HTML and XML. It can publish documents re- motely, through the HTTP protocol. It han- dles Cascading Style Sheets (CSS) and the new MathML language, for representing mathemat- ical expressions. An experiment for including vector graphics into Web documents is also de- scribed. Amaya is the primary browser /editor for the annotation approach in (Koivunen et al., 2000). The annotation data itself is exchanged in RDF/XML form to provide other clients access to the annotation database. Currently, however, it does not provide comprehensive support with annotation inference server and crawling. ComMentor (Roescheisen et al., 1994) is an- other browser-based tool as part of the Stanford Integrated Digital Library Project. It manages the meta-information independently of the docu- ments on separate meta-information servers. The research prototype implementation was com- pleted in 1994, the code of the tool is no longer maintained. ThirdVoice5 is a commercial product that uses plug-ins to enhance web browsers. This en- hancement allows the access to the annotation stored at the ThirdVoice database located on a centralized server from the company. The an- 5 http://www.thirdvoice.com
  • 9. notated text parts will appear in the browser as underlined links. These links point to the infor- mation on the database that will be presented on the user request in a separate viewer. Most of the annotation stored there seems to be links to further information, so that ThirdVoice is mainly used as a kind of an extended link-list. Along the same lines, JotBot (Vasudevan Palmer, 1999) follows a browser–based approach that uses Java applets to modify the browsers behavior. Yawas (Denoue Vignollet, 2000) is an anno- tation tool that is based on the Document Object Model (DOM) and Dynamic HTML. It codes the annotations into an extended URL format and uses local files similar to bookmark files to store and retrieve the annotations. A modified browser can then transform the URL format into DOM events. Locally stored annotation files can be sent to other users. The CritLink (Yee, 1998) annotation tool fol- lows the proxy approach. This approach has the advantage that it works with any exist- ing browser. The system is simply used by prefixing the URL with http://crit.org e.g. to see the annotated version of semanticweb.org someone can access the system with the URL http://crit.org/http://semanticweb.org. The approach closest to OntoAnnotate is the SHOE Knowledge Annotator. The Knowledge Annotator is a Java program that allows users to mark-up web pages with the SHOE ontology. The SHOE system (Luke et al., 1997) defines ad- ditional tags that can be embedded in the body of HTML pages. In SHOE there is no direct rela- tionship between the new tags and the original text of the page, i.e. SHOE tags are not annota- tions in a strict sense. According to the above mentioned classifi- cation OntoAnnotate follows the browser-based approach with the exception that it is not devel- oped as an web-browser extension. OntoAnno- tate can be regarded as a workbench for semantic annotation of documents using domain-specific ontologies and this enriching HTML pages with semantics that an software agent is capable to au- tomatically process the content of the page and reason about it. Related Work on Semantic Annotation as a Continuous Process. There is only little re- search that considers the maintenance of ontolo- gies or more general the maintenance of knowl- edge bases. In (Menzies, 1998) an overview over knowledge maintenance is given. Menzies re- views systems that contribute to different types of knowledge maintenance. The paper analyzes the AI and software engineering literature ac- cording to 35 different knowledge maintenance tasks. It concludes that there is no overall strat- egy that covers all 35 tasks. The phenomenon of dynamic ontologies has nicely been described in (Heflin Hendler, 2000). In their work they discuss the prob- lems associated with managing ontologies in dis- tributed environments such as the web. The underlying representation language is SHOE, a web-based representation language that supports multiple versions of ontologies. Foo (Foo, 1995) has published some initial, theoretical thoughts on ontology revision. Foo outlines the main ideas on the topic of ontology revision and con- stitutes ontology change as a frontier of knowl- edge systems research. Related Work on Semi-Automatic Annota- tion. Pustejovsky et al. (Pustejovsky et al., 1997) describe their approach for semantic in- dexing and typed hyperlinking. As in our ap- proach finite state technologies support lexical acquisition as well as semantic tagging. The goal of the overall process is the generation of so called lexical webs that can be utilized to en- able automatic and semi-automatic construction of web-based texts. In (Bod et al., 1997) approaches for learn- ing syntactic structures from syntactically tagged corpus has been transferred to the semantic level. In order to tag a text corpus with type- logical formulae, they created tool environment called SEMTAGS for semi-automatically enrich- ing trees with semantic annotations. SEMTAGS incrementally creates a first order markov model based on existing annotations and proposes a se- mantic annotation of new syntactic trees. The authors report promising results: After the first 100 sentences of the corpus had been annotated, SEMTAGS already produced the correct annota- tions for 80% of the nodes for the immediately subsequent sentences.
  • 10. 5 Conclusion This paper presents an approach for creating meta data by (semi-automatic) annotating web pages. Starting from our ontology-based anno- tation environment OntoAnnotate, we have col- lected experiences in an actual evaluation study. Future work will have to start on current stud- ies that have looked at the feasibility of auto- matic building of knowledge bases from the web (cf. (Craven et al., pear)). In our future work, we want to integrate such methods into an even more comprehensive annotation environment — including e.g. the learning of ontologies from web documents (Maedche Staab, 2000) and (semi-)automatic ontology-based semantic an- notation. The general task of knowledge mainte- nance, including evolving ontologies and seman- tic annotation knowledge bases, remains a topic for much further research in the near future. Acknowledgements. We thank Stefan Decker for initiating the first version of an annotation tool. We thank our students Mika Maier-Collin and Jochen Klotzbuecher for building the Anno- tation Tool; Dirk Wenke for the Ontology En- gineering Environment OntoEdit; and our col- league Kalvis Apsitis for the RDF Crawler. We gratefully acknowledge the dedication of our stu- dents who annotated web pages for our exper- iments. This work was partially supported by DARPA in the project DAML/OntoAgents and by Ontoprise GmbH. References Appelt, D., Hobbs, J., Bear, J., Israel, D., Tyson, M. (1993). FASTUS: A finite state processor for information extraction from real world text. In Proceedings of IJCAI-93, Chambery, France. Bod, R., Bonnema, R., Scha, R. (1997). Data- oriented semantic interpretation. In In Pro- ceedings of the Second International Workshop on Computational Semantics (IWCS), Tilburg, 1997. Brickley, D. Guha, R. (1999). Resource description framework (RDF) schema specification. Tech- nical report, W3C. W3C Proposed Recommen- dation. http://www.w3.org/TR/PR-rdf-schema/. Craven, M., DiPasquo, D., Freitag, D., McCallum, A., Mitchell, T., Nigam, K., Slattery, S. (to appear). Learning to construct knowledge bases from the world wide web. Artificial Intelligence. Denoue, L. Vignollet, L. (2000). An annota- tion tool for web browsers and its applications to information retrieval. In In Proceedings of RIAO2000, Paris. Erdmann, M., Maedche, A., Schnurr, H.-P., Staab, S. (2000). From manual to semi-automatic se- mantic annotation: About ontology-based text annotation tools. In P. Buitelaar K. Hasida (eds). Proceedings of the COLING 2000 Work- shop on Semantic Annotation and Intelligent Content, Luxembourg. Foo, N. (1995). Ontology Revision. In Proceedings of the 3rd International Conference on Conceptual Structures. Springer Lecture Notes in Artificial Intelligence. Springer. Guetari, R., Quint, V., Vatton, I. (1998). Amaya: an Authoring Tool for the Web. In Maghrebian Conference on Software Engineering and Artifi- cial Intelligence. Tunis, Tunsia. Heflin, J. Hendler, J. (2000). Dynamic Ontolo- gies on the Web. In Proceedings of Ameri- can Association for Artificial Intelligence Con- ference (AAAI-2000). Menlo Park, California, AAAI Press. Hudson, R. (1990). English Word Grammar. Basil Blackwell, Oxford. Koivunen, M.-R., Brickley, D., Kahan, J., Hom- meaux, E. P., Swick, R. R. (2000). The W3C Collaborative Web Annotation Project ... or how to have fun while building an RDF infrastruc- ture. Kushmerick, N. (2000). Wrapper Induction: Effi- ciency and Expressiveness. Artificial Intelli- gence, 118(1). Lassila, O. Swick, R. (1999). Resource description framework (RDF) model and syntax specifica- tion. Technical report, W3C. W3C Recommen- dation. http://www.w3.org/TR/REC-rdf-syntax. Leonard, L. (1977). Inter-Indexer Consistence Stud- ies, 1954-1975: A Review of the Literature and Summary of the Study Results. Graduate School of Library Science, University of Illi- nois. Occasional Papers No.131. Luke, S., Spector, L., Rager, D., Hendler, J. (1997). Ontology-based Web agents. In Johnson, W. L. (Ed.), Proceedings of the 1st International Con- ference on Autonomous Agents, pages 59–66. ACM. Maedche, A. Staab, S. (2000). Discovering con- ceptual relations from text. In ECAI-2000 - European Conference on Artificial Intelligence. Proceedings of the 13th European Conference on Artificial Intelligence. IOS Press, Amster- dam.
  • 11. Martin, P. Eklund, P. (1999). Embedding Knowl- edge in Web Documents. In Proceedings of the 8th Int. World Wide Web Conf. (WWW‘8), Toronto, May 1999, pages 1403–1419. Elsevier Science B.V. Menzies, T. (1998). Knowledge maintenance: The state of the art. The Knowledge Engineering Re- view, 10(2). Neumann, G., Backofen, R., Baur, J., Becker, M., Braun, C. (1997). An information extraction core system for real world german text process- ing. In In Proceedings of ANLP-97, pages 208– 215, Washington, USA. Pustejovsky, J., Boguraev, B., Verhagen, M., Buite- laar, P., Johnston, M. (1997). Semantic in- dexing and typed hyperlinking. In Proceedings of AAAI Spring Symposium, NLP for WWW. Roescheisen, M., Mogensen, C., Winograd, T. (1994). Shared Web Annotations as a Plat- form for Third-Party Value-Added Information Providers: Architecture, Protocols, and Usage Examples. Technical report stan-cs-tr-97-1582, Computer Science Dept., Stanford University. Sahuguet, A. Azavant, F. (2000). Building In- telligent Web Applications Using Lightweight Wrappers. to appear in: Data and Knowledge Engineering. Staab, S., Angele, J., Decker, S., Erdmann, M., Hotho, A., Maedche, A., Studer, R., Sure, Y. (2000a). Semantic Community Web Portals. In Proceedings of the 9th World Wide Web Confer- ence (WWW-9), Amsterdam, Netherlands. Staab, S., Braun, C., Düsterhöft, A., Heuer, A., Klet- tke, M., Melzig, S., Neumann, G., Prager, B., Pretzel, J., Schnurr, H.-P., Studer, R., Uszkoreit, H., Wrenger, B. (1999). GETESS — search- ing the web exploiting german texts. In Pro- ceedings of the 3rd Workshop on Cooperative Information Agents, LNCS, Berlin. Springer. Staab, S. Maedche, A. (2000). Ontology engineer- ing beyond the modeling of concepts and rela- tions. In Benjamins, V., Gomez-Perez, A., Guarino, N. (Eds.), Proceedings of the ECAI- 2000 Workshop on Ontologies and Problem- Solving Methods. Berlin, August 21-22, 2000. Staab, S., Maedche, A., Handschuh, S. (2000b). Creating Metadata for the Semantic Web - A Annotation Environment and its Evaluation. Technical Report 412, Institute AIFB, Karl- sruhe University. Vasudevan, V. Palmer, M. (1999). On Web Annota- tions: Promises and Pittfalls of Current Web In- frastucture. In Proceedings of HICSS’99, pages 5–8, Maui, Hawaii. Vatton, I. (2000). W3C’s Amaya 4.0 Edi- tor/Browser. Technical report, W3C. http://w3c1.inria.fr/Amaya/. Yee, K.-P. (1998). CritLink: Better Hyperlinks for the WWW. http://crit.org/ ping/ht98.html.