SlideShare a Scribd company logo
1 of 8
Download to read offline
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1936
Semantic Annotation of Documents: A Compar-
ative Study
Hasna Abioui, Ali Idarrou, Ali Bouzit, Driss Mammass
IRF-SIC Laboratory, Ibn Zohr University, Agadir, Marocco
Abstract—Semantic annotation, which is considered one
of the semantic web applicative aspects, has been adopted
by researchers from different communities as a para-
mount solution that improves searching and retrieval of
information by promoting the richness of the content.
However, researchers are facing challenges concerning
both the quality and the relevance of the semantic annota-
tions attached to the annotated document against its con-
tent as well as its semantics, without ignoring those re-
garding automation process which is supposed to ensure
an optimal system for information indexing and retrieval.
In this article, we will introduce the semantic annotation
concept by presenting a state of the art including defini-
tions, features and a classification of annotation systems.
Systems and proposed approaches in the field will be
cited, as well as a study of some existing annotation tools.
This study will also pinpoint various problems and limita-
tions related to the annotation in order to offer solutions
for our future work.
Keywords— Document, Semantic Annotation, Meta-
data, Information Retrieval.
I. INTRODUCTION
Nowadays, the amount of digital data reaches an unprece-
dented stage and keeps growing each day. However, the
volume is not the only factor that leads digital data to
explosion; evidently, the richness of contents and also the
remarkable heterogeneity of structures have a significant
impact. So, faced with that explosion, the challenge is
how we can provide an improved management, a better
exploitation and an efficient understanding of the infor-
mation contained in these digital sources. Well, semantic
annotation became one of the treated alternatives in this
regard, that may handle these requirements by ensuring a
good comprehension of document contents, thus allowing
an easy exploitation, exchange, and shareability.
Otherwise, semantic annotation is the practical solution
that may promote the information retrieval either on the
web or in large databases by calling upon ontologies,
thesauri and data extraction and segmentation algorithms,
in order to enhance the richness of contents through the
attachment of descriptive notes by considering different
contexts from which these notes may come, the various
structures that can represent them and the relationship that
links both of them while continuously safeguarding strict
separation between the resources and their annotations
[1].
Different application domains appeal the semantic anno-
tation as a recent research field to evaluate its efficiency
on various content types and structures. For instance, the
electronic Medical Records (EMR) used in the healthcare
industry [2] [3] is one of these application fields. As a
matter of fact, an EMR can be created by a set of experts
in different medical sub-domains and who may have va-
rying levels of expertise, which explains the abundance
and the diversity of contained information in a patient’s
record, such as epidemiological studies, medical history,
lab work, etc. Practically, the collaborative annotation of
the EMR by those practitioners facilitates the access to
the needed information without having to study or ana-
lyze the entire record.
This paper is organized as follows: in the second section,
we shall survey different definitions of annotation in the
literature and its different features. We shall also distin-
guish between annotation and metadata. We then present
and compare three types of annotation, and complete the
section by listing and comparing various annotation tools
available nowadays. The third section will present exam-
ples from recent studies of several annotation systems.
They shall be categorized according to the types of docu-
ments they aim to annotate. Finally, in the fourth section,
we present a synthesis of the survey.
II. STATE OF THE ART
2.1 Annotation: what does it mean?
In the literature, annotation is defined as a critical or ex-
planatory note accompanying the text. In computer sci-
ence, different definitions are often cited. Authors in [4]
define the annotation as graphical or textual information
attached to a document or simply placed on it. Evidently,
when we talk about an electronic document, it may be
mono-media, multimedia or a web page within the seman-
tic web. Hence, the forms of annotation differ depending
on the interest and the purpose of the annotator. It might,
for instance, be that a resource user or reader who wishes
to annotate a paragraph in order to simplify its next read-
ing, or facilitate the access to the information according to
the intended use.
In this case, the annotation will probably be non-textual
i.e. graphical and expressed by underlining, highlighting
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1937
or pinpointing elements, so as to create a personalized re-
segmentation and reorganization of the document or the
paragraph. However, if the annotator is the resource’s
author, an expert or even a simple user wishing to enrich
the document by associating other information, this latter
will certainly be textual information selected according to
a well-defined context.
Similarly, the author in [5], believes that the form of
the annotation depends on its function, distinguishing
between two forms of textual annotation: (i) personal
annotations – called also individual annotations– that are
used to translate a term, highlight it by providing more
definitions, rephrase a passage, etc; and (ii) collective
annotations that introduce the notion of sharing and ex-
changing which allow annotators – readers and/or experts
analyzing a document – to ask questions, provide an-
swers, and give feedback in the form of annotations.
2.2 From Annotation to Semantic Annotation
With the appearance of the semantic web and during its
evolution, an annotation treats and concerns more specifi-
cally semantics, since it remains one of the applicative
aspects of the semantic web, then called semantic annota-
tion. In the same context, the W3C (World Wide Web
Consortium) defines a semantic annotation as a comment,
an explanation or any other external note that can be at-
tached to a web document or a part of it. According to
[22], semantic annotation designates both the activity
which consists of affixing a note regarding a part of a
document and also that resulting note; and it’s reflected in
the definition of a semantic information layer that gives
meaning and sense to the text. In [8] [7], authors added
that these notes assigned to the entities to annotate have a
well-structured and a formally described semantic which
is already defined within the ontology and presented as a
set of concepts, instances of concepts and relations be-
tween them, by indicating that these annotations and
metadata can be stored within the document itself or in a
separate one by using the URI (Universal Resource Iden-
tifier) referencing the annotated entity.
2.3 Semantic annotation versus Metadata
However, when analyzing different definitions presented
in the literature, we note that semantic annotation is often
related to another term which is metadata. True to its
name, the latter is a data about other data i.e. a data about
the document. Metadata is considered as a label attached
to a document that describes sufficiently its content, even-
tually its main features without needing to open it for
consultation. According to [6], metadata help to identify,
describe and localize electronic resources. Not too far
from this, [23] links metadata to an easy access, share,
and reusability of information.
Nevertheless, a distinction is made in [1] between seman-
tic annotation and metadata. The metadata term is more
independent than annotations and can itself be a resource
attached to the annotated resource as additional informa-
tion. A good example would be a summary prepared
separately for an article, a publication date for a docu-
ment, the duration of an audio or video clip, an artist's
name, or even the names of the instruments played in a
musical piece. While metadata is external, though possi-
bly attached, to the document, annotations lie within the
annotated resource and are written during the read-
ing/annotation process. In this case, it may be the lyrics of
a song, a sound played at a certain point in a presentation,
a textual string of words, sentences or even a paragraph.
2.4 A Semantic Annotation Systems Classification
A semantic annotation process, regardless of the anno-
tated resources type, can be manual, semi-automatic or
automatic. In other words, it concerns the annotation sys-
tem's automation level. In the following, we shall intro-
duce those three categories, weighing their pros and cons,
then concluding the subsection by comparing the most
commonly used annotation tools.
2.4.1 Manual Annotation
As its name indicates, manual annotation requires the
intervention of a human annotator who must provide de-
scriptive keywords. This intervention may be made at
several levels:
(i) In the resource drafting phase: in this case, the
author represents the annotator.
(ii) At the time of resource loading/tagging: in this
instance, generally, an appeal to an expert or group
of experts performing semantic, contextual and
collaborative annotation is needed; as in the case
of the patient records example that requires a col-
laborative and manual annotation by a group of
experts.
(iii) During consultation or final resource-use: here, it
concerns a complementary annotation for personal
use by adding notes and comments. As in the con-
text of E-learning or video-conferences, annota-
tions can be used in order to
facilitate note-taking, navigation platform, se-
quencing courses, etc.
2.4.2 Semi-automatic Annotation
Semi-automatic annotation is the combination of both:
manual and automatic annotation. It’s based on a human
intervention during an automatic process, by taking ad-
vantage of this latter’s efficiency on one side and the
accuracy of the manual annotation on the other. The user
may intervene at the beginning of the process as a re-
source annotator, by providing keywords and basic anno-
tations that the system must exploit to produce final anno-
tations. (Figure1.a). The annotator can intervene by the
end as well, but this time, as an expert that must validate
or cancel the annotations automatically proposed by the
system before generating final annotations (Figure 1.b).
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1938
Fig.1: Semi-Automatic Annotation
2.4.3 Automatic Annotation
Automatic annotation is performed by automated annotation tools without human intervention and relies on information
extraction heuristics and techniques (e.g. by exploiting redundancy), indexing and simple segmentation (data stream process-
ing, character strings, etc.) [7][8].
Table.1: Compares the three types cited above by summarizing their cons and pros:
TABLE I. A Comparison of Annotation Types
Annotation type Advantages Disadvantages
Manual
Annotation
- Superior in terms of quality: well targeted, relevant
and semantically precise.
- Selected keywords based on a human determination
of the resource semantic content.
Very time-consuming.
Being done by humans, it is highly
error-prone and can easily result in
syntactic errors and incorrect refer-
ences.
Semi-automatic
Annotation
- Ensures accurate and relevant annotation by combin-
ing human understanding with systematic efficiency.
- More efficient and less time consuming compared to
the manual annotation.
- More precise and leads to relevant and accurate
results compared to automatic annotation.
- Recommended for dynamic databases.
Still dependent on human interven-
tion, what makes it arduous and ex-
pensive process in terms of time and
used resources.
Automatic An-
notation
- Suitable for resources that have a large amount of
data to make a decision according to that to deduce
the qualified information to be annotations.
Lowering of quality and accuracy of
annotation if compared to manual
and semi-automatic annotation.
By weighing cons and pros of the three system categories
cited above, we notice that each one can be evoked in a
well-defined operating environment regardless of the
resource type. However, choosing the appropriate system
automation level depends on others criteria that we briefly
mention:
- Volume and number of resources to annotate: we can
only think about manual or semi-automatic annotation
as long as these two factors are reasonable and don’t
increase over time.
- Expected goal from these annotations: sometimes, we
look for detailed and more subtle semantic annotations
but other times, just simple and general annotations
are sufficient. Moreover, data and annotations granu-
larity is still believed to be the major limit for auto-
matic annotation since it’s hard to annotate data using
a high and fine enough level of granularity.
- If the resource to annotate already has minimal seman-
tic information on which an automatic annotation sys-
tem would be based to produce semantic annotations:
Here, we're going beyond the resource itself by treat-
ing its structures, in order to extract all useful deemed
information for a profitable automatic annotation
process in terms of efficiency and quality. In the op-
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1939
posite case, human intervention providing this infor-
mation would be needed either individually or collec-
tively.
- Sometimes, even the resource can provide semantic
information by using information extraction tools and
treated through TAL (the natural Language's Auto-
matic Treatment) tools, the process may be limited or
even blocked during data's operational phase. That
blockage is often linked to the resource domain speci-
ficity, for example, the lack of domain ontologies or
semantic networks, etc. Several related works faced
this problem, such as [24] having as objective to anno-
tate images semantically by using keywords in gastro-
enterology, so the domain specificity had prompted
researchers to conceive and develop their own polyps
ontology in conjunction with standard reasonings in
description logic SHOIQ+. The same problem was
mentioned in [25], when researchers were forced to
create a lexico-semantic network linked to radiology
in the absence of a french version of radiology an-
thologies as RadLex, a bilingual one offering just
English and German versions. So, in the two and other
similar cases, the annotator has to intervene to validate
the suggested annotations since that concerns new rea-
sonings and semantic models.
2.4.4 Annotation tools comparison
In this subsection, we shall compare the most commonly
used annotation tools and often cited in the literature. We
compare them using different criteria, namely the automa-
tion level, the resource types that they can annotate, and
the languages and schemas used to generate annotations.
In fact, the majority of these tools are used in the context
of the semantic web, as they treat web pages as resources.
Some tools are limited to adding free text as comments or
notes. Others on the other hand aim to semantically clar-
ify the resource content. All of them, however, use RDF
(Resource Description Framework) as a basic language
for the formal description of resources. Still, the formats
used in domain ontologies differ from one tool to another.
Generally, they vary between predetermined ontologies
implemented in RDFS (RDF Schema) and DAML + OIL
(DARPA Agent Markup Language and Ontology Infer-
ence Layer) later superseded by OWL (Web Ontology
Language).
Other tools are reserved for the annotation of audio-visual
documents. For these, the annotation is manual and in-
volves adding comments to documentary content, usually
collaboratively. Note that manual annotation systems
despite their precision are far from ideal, especially in the
context of the web, where we need to treat a huge amount
of resources and information. On the other hand, semi-
automatic and automatic systems may yield relatively
inferior results, depending on the information extraction
tools and the learning techniques they use.
In fact, these information extraction tools and the differ-
ent learning algorithms cited in this regard – which are
the basis of semi-automatic and automatic annotation
tools and systems – will be the object of our future re-
search in which we try to explore the possibilities in view
of improving the annotation process. But at this point, we
have restricted our attention to the related works on anno-
tation to elucidate the problem of annotation and informa-
tion retrieval in various types of resources, adopting vari-
ous forms of architectures.
Table.2 shows a detailed comparison of annotation tools
according to already mentioned criteria.
Soft-
ware
tool
Type
Of
annota-
tion
Types of
re-
sources
anno-
tated
Informa-
tion
extraction
tool
Metadata
schema and
ontology used
Language used
for generating
annotations
Observations
Annotea
[9]
Manual
annotatio
n
HTML,
XML
documen
ts
―
- uses
elements/terms
from
DublinCore
(author, date,
title, corpus…).
- free-text
annotation
(comments,
notes).
- domain
ontology
formatted in
RDFS.
Annotations
output as RDF
triplets and
linked to
documents with
XPointers
- Designed for W3C.
Allows simple text
annotations of web page
content, structured and
non-structured.
- No explicit semantics
on content or the
concepts in it.
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1940
TABLE I. A COMPARISON OF ANNOTATION TOOLS
V-
Annotea
[10]
Audiovis
ual
documen
ts
―
- Uses simple
elements/terms
from
DublinCore.
- free-text
annotations:
comments,
remarks, etc.
Description of
document
elements in
MPEG-7
- Allows collaborative
indexation, navigation,
annotation and
discussion of video
contents amongst
several groups in
geographically
dispersed locations
The
CREAM
Model
[11]
HTML,
XML
documen
ts
―
- Uses domain
ontologies
DAML+OIL /
OWL.
Annotations
saved in
document file as
RDF
tags/attachments
.
- It's a general open and
complete model
allowing the
development of
annotation tools from
ontologies.
Ontomat
Annotize
r
Web
page
fragment
s
(HTML,
XML)
―
- Uses domain
ontologies
DAML+OIL /
OWL.
Annotations
written out in
RDF.
-A tool implementing
the CREAM model.
- M-Ontomat-Annotizer
is also based on this
model but it's meant for
multimedia documents.
MnM
same
principle
as
Mellita
Semi-
automati
c
annotatio
n
HTML/
XML
documen
ts
Amilcare
- RDF.
- uses
predefined
ontologies
DAML+OIL
and OCML
Generates
annotations in
RDF, XML
- Learning and
information extraction
is done by Amilcare,
while correction and
validation of the
annotations are manual
S-
CREAM
[13]
HTML
documen
ts
Amilcare
- uses ontologies
in DAML+OIL DAML+OIL
- manual annotation
using CREAM.
Armadill
o [12]
Automati
c
annotatio
n
HTML
documen
ts
Amilcare
- uses domain
ontologies in
RDFS.
Annotations in
RDF triplets
- Uses redundancy on
the web to establish
relationships between
instances.
AeroDA
ML [14]
HTML
documen
ts
AeroText
- uses ontologies
in DAML+OIL.
Generation of
automatic
annotations in
DAML.
- The web version uses
a generic ontology. The
client/server version
supports annotations
with custom-built
ontologies.
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1941
III. RELATED WORKS
As stated previously, annotation is becoming an essential
activity for an easy resources exploitation and manage-
ment, thanks to its simple data manipulation and its low
time-consuming and that what makes it more and more
use in various research fields, such as e-learning, health-
care, etc. In the following, we examine some annotation
related works from different categories in order to sum-
marize various adopted techniques and concepts.
3.1 Annotation in the semantic web
In the context of the semantic web, [15] proposes an an-
notation system in three phases for web pages.
A web page annotation system is proposed in [15] by
using three phases: First, it starts with the marking and
automatic identification of relevant elements brought
together in a corpus in order to associate them with prede-
termined ontology concepts. Then comes the learning
process, which exploits the tree structure of the web page
provided by DOM (Document Object Model) in order to
deduce for each ontology concept and role an assimilated
parallel path from the web page. Finally, it comes up with
an annotation based on the generation of ontology in-
stances by direct application of the obtained assimilated
paths. One of the major advantages of this approach is
that it generates not only instances of concepts but also
instances of the roles of concepts in ontology. However,
the use of the tree structure is usually limited by the de-
gree of regularity in the web page and the extent of its
conformity with the hierarchy represented in the ontol-
ogy. In the same context, [16] opts for a semi-automatic
annotation always according to the semantics of the web
page, but without the use of its DOM structure. Instead,
[16] opts for the use of the Semantic Radar tool, which
performs automatic metadata extraction, defining the
semantics of the page with descriptor types FOAF (Friend
of a friend), SIOC (Semantically-Interlinked Online
Communities), DOAP (Description of A Project) and
RDFa (Resource Description Framework in attributes)
with the purpose of generating an initial source of analy-
sis for each page. The next step is to reinforce the rele-
vance of the annotation by pairing the obtained results
with others offered by domain ontology, according to the
rules of equivalence and an annotation model. At this
point, two types of ontologies are considered: (i) an en-
riched domain ontology (OWL) and (ii) a FOAF and
SIOC ontology (.RDF). Finally, we get for each web
page, a file containing metadata in RDF format. The ex-
perimental results are quite satisfactory according to [16]:
45% of web pages are annotated especially in a vast envi-
ronment, containing heterogeneous resources such as the
web.
In [17], another annotation system is presented:
'DYNOSYS'. It has a distributed architecture that supports
collaborative work and it is platform-independent. Unlike
the system in [16], generated annotations in [17] are in
XML format.
3.2 Annotation of multimedia documents
As for the area of multimedia documents, we look at vari-
ous studies that are interested in very specific fields, and
which, through their proposals and approaches, have been
able to satisfactorily meet the varying needs of the practi-
tioners in these fields.
For instance, [19] presents an approach that aims to au-
tomating manual annotation tools dealing with the annota-
tion of sign language videos, by proposing architecture
capable of providing a distributed system hosting auto-
matic annotation assistants on remote servers. Data de-
scription is in XML format and is structured in the form
of annotation graphs that annotation tools must be able to
import and export.
In the medical field, [18] proposes a method of know-
ledge creation, that is based on the cooperative annotation
of surgical videos by practitioners and domain experts, by
building and sharing objects, results and observations
which will be useful for the extraction of the video's se-
mantics. To this end, two processes are put in place: one
for creating concepts and the other for learning them.
Each of the two processes goes through three phases: (i)
an individual observation phase which allows the partici-
pants to annotate videos, each according to his/her level
of expertise, his viewpoint etc. (ii) a group negotiation
phase, and finally (iii) a concept elaboration phase. Each
process iterates as many times as necessary to achieve a
set of common and coherent annotations between group
members, which makes the task difficult and complicated
despite the proven improvement in practices according to
experimental studies. Regarding the semantic data, on-
tologies and annotations are structured in RDFS / RDF
format.
Two other works deal with the annotation and the descrip-
tion of multimedia documents using XML graphs similar-
ly to the previous work. The first system, proposed in [6]
is a meta-modeling of semi-structured data by defining a
set of metadata for each different medium – text, image,
audio and video – in order to have a unified presentation
of annotations for multimedia documents based on the
content and its semantics. The metadata structuring is
done in XML documents called meta-documents contain-
ing textual metadata or links to information in the case of
non-textual metadata. The second is a video annotation
system presented in [20], which proposes to organize and
present the annotations in the form of graphs by introduc-
ing the description schemas and dimensions of analysis.
The aim is to ensure (i) easy management of an extended
vocabulary, and (ii) an easy, fast, customized query-and-
response construction system when searching for infor-
mation.
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1942
3.3 Annotation as a part of E-learning
In the e-learning field, some approaches are directed to-
wards resources annotation. They focus on platforms and
systems that allow instructors to prepare course materials
and make them available to students. [21] is one of works
coming in that context. It proposes an annotation model
dedicated to teachers based on three facets of learning:
cognitive, semantic and contextual. This model, translated
into an architecture, gives birth to two subsystems. One of
them allows the management of learning contexts while
the other takes care of annotations management that are
created and transmitted in XML, to ensure seamless
communication and exchange between different modules
of the model.
In the following, we present a discussion in summary
form drawn from the aforementioned works.
IV. SUMMARY AND DISCUSSION
By surveying the literature of semantic annotation, some
indispensable requirements come to mind in order, to
create a complete and highly performed annotation sys-
tem that fits the needs and expectations of practitioners.
From the perspective of automation level, we notice that
manual annotation systems offer a high accuracy. More-
over, they are outdated and consuming in terms of time
and manpower. For entirely automated systems, the qual-
ity of results remains inferior, which explain the use of
semi-automatic annotation systems that combines the
strengths of both worlds. In other words, the more a sys-
tem combines principles of manual annotation – collabo-
rative annotation, manually-made ontologies (generic
domain-specific, predefined, or custom-made for a spe-
cific need), the definition of concepts, models and rules
by human agents – with a well-optimized automation, the
better the efficiency and the quality of the annotation. The
better integrated the extensions and tools of descriptor
semantics and data extraction, the more satisfactory and
efficient the annotations are. Regarding the physical ar-
chitecture, a variety of works emphasize on the distribu-
tion concept. A distributed architecture improves the sys-
tem effectiveness by, (i) guaranteeing the high availability
of services against the interruption, (ii) minimizing the
critical failure points, (iii) parallelizing tasks and (iv)
load-balancing the work over cluster nodes.
In terms of data and resources description, collaborative
annotation stands out as the most preferred solution when
creating abundant, reliable and very specific annotations
for partially or entirely document annotation regardless of
the resource type. Certain works suggest to begin with a
single annotation before a negotiation step between anno-
tators, to discuss and choose the most semantically appro-
priate concepts. Others choose the creation of collabora-
tive annotation sessions as forums, to exchange all-out
quantity of information and knowledge related to the
same resource from different points of view and annota-
tors with different level of expertise, possibly even by
using various annotation forms. Concerning the serializa-
tion, storage and transformation of semantic data, ontolo-
gies are often encoded in RDFS and OWL, while annota-
tions are generated using RDF format. In fact, RDF is
defined as the semantic web’s greatest achievement. It is
by a margin the best resource description standard that fits
most significant needs. Its formal description of web re-
sources guarantees efficient machine-readability and
automatic processing of metadata and annotation.
As for the annotation of multimedia documents, we no-
ticed that most practitioners opt for an XML graphs based
description, due to the need to provide some structure to
the document to ensure data communication and ex-
change while still successfully maintaining document’s
description.
V. CONCLUSION
Document annotation is becoming more and more crucial
and indispensable for a high-performance document man-
aging and exploitation. Although in the current study, we
highlight some problems related to both document anno-
tation tools and methods. So, first, we define broadly the
concept of annotation, by providing an overview of the
state of the art while surveying the main annotation tools
and related works in the literature, then we end with a
synthesis. Based on that, our own semantic annotation
system will be developed according to high quality and
performance annotation, having in mind multiples types
of media, document structures and the system automation
level.
REFERENCES
[1] Y.Prié, S.Garlatti, “Annotations et métadonnées
dans le Web sémantique”, in Revue I3 Information-
Interaction - Intelligence, Numéro Hors-série Web
sémantique, 2004, 24 pp.
[2] S. Bringay, C. Barry, J. Charlet, “Un modèle pour
les annotations du dossier patient in formatisé”,
P. Salembier, M. Zacklad (eds.), Vol. Annotations
dans les documents pour l'action, pp. 47-
67, Hermespublishing, 2006.
[3] S. Bringay, C. Barry, J. Charlet, “Les annotations
pour gérer les connaissances du dossier patient”,
Actes IC 2005, Nice (France): 73-84.
[4] E. Desmontils, C. Jacquin, “Annotations sur le Web
: notes de lecture”, in Journées Scientifiques Web
Sémantique, (Action Spécifique STIC CNRS), Paris,
France, 2002, pp 10-11.
[5] G.Cabanac, “Annotation de ressources électroniques
sur le Web : formes et usages”, Master Report, Paul
Sabatier University, June 2005.
International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
www.ijaems.com Page | 1938
Fig.1: Semi-Automatic Annotation
2.4.3 Automatic Annotation
Automatic annotation is performed by automated annotation tools without human intervention and relies on information
extraction heuristics and techniques (e.g. by exploiting redundancy), indexing and simple segmentation (data stream process-
ing, character strings, etc.) [7][8].
Table.1: Compares the three types cited above by summarizing their cons and pros:
TABLE I. A Comparison of Annotation Types
Annotation type Advantages Disadvantages
Manual
Annotation
- Superior in terms of quality: well targeted, relevant
and semantically precise.
- Selected keywords based on a human determination
of the resource semantic content.
Very time-consuming.
Being done by humans, it is highly
error-prone and can easily result in
syntactic errors and incorrect refer-
ences.
Semi-automatic
Annotation
- Ensures accurate and relevant annotation by combin-
ing human understanding with systematic efficiency.
- More efficient and less time consuming compared to
the manual annotation.
- More precise and leads to relevant and accurate
results compared to automatic annotation.
- Recommended for dynamic databases.
Still dependent on human interven-
tion, what makes it arduous and ex-
pensive process in terms of time and
used resources.
Automatic An-
notation
- Suitable for resources that have a large amount of
data to make a decision according to that to deduce
the qualified information to be annotations.
Lowering of quality and accuracy of
annotation if compared to manual
and semi-automatic annotation.
By weighing cons and pros of the three system categories
cited above, we notice that each one can be evoked in a
well-defined operating environment regardless of the
resource type. However, choosing the appropriate system
automation level depends on others criteria that we briefly
mention:
- Volume and number of resources to annotate: we can
only think about manual or semi-automatic annotation
as long as these two factors are reasonable and don’t
increase over time.
- Expected goal from these annotations: sometimes, we
look for detailed and more subtle semantic annotations
but other times, just simple and general annotations
are sufficient. Moreover, data and annotations granu-
larity is still believed to be the major limit for auto-
matic annotation since it’s hard to annotate data using
a high and fine enough level of granularity.
- If the resource to annotate already has minimal seman-
tic information on which an automatic annotation sys-
tem would be based to produce semantic annotations:
Here, we're going beyond the resource itself by treat-
ing its structures, in order to extract all useful deemed
information for a profitable automatic annotation
process in terms of efficiency and quality. In the op-

More Related Content

What's hot

Subject Indexing & Techniques
Subject Indexing  & TechniquesSubject Indexing  & Techniques
Subject Indexing & TechniquesDr. Utpal Das
 
Assessment of the use of indexing and abstracting service by patrons of feder...
Assessment of the use of indexing and abstracting service by patrons of feder...Assessment of the use of indexing and abstracting service by patrons of feder...
Assessment of the use of indexing and abstracting service by patrons of feder...Alexander Decker
 
Semantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebSemantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebEditor IJCATR
 
Definition of index
Definition of indexDefinition of index
Definition of indexrajib2
 
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...IJwest
 
Lectures 1,2,3
Lectures 1,2,3Lectures 1,2,3
Lectures 1,2,3alaa223
 
Syntactic Indexes for Text Retrieval
Syntactic Indexes for Text RetrievalSyntactic Indexes for Text Retrieval
Syntactic Indexes for Text RetrievalITIIIndustries
 
Lexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter DataLexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter Dataijtsrd
 
Indexing Automated Vs Automatic Galvan1
Indexing Automated Vs Automatic   Galvan1Indexing Automated Vs Automatic   Galvan1
Indexing Automated Vs Automatic Galvan1CorinaF
 
Context Driven Technique for Document Classification
Context Driven Technique for Document ClassificationContext Driven Technique for Document Classification
Context Driven Technique for Document ClassificationIDES Editor
 
IRJET- Finding Related Forum Posts through Intention-Based Segmentation
IRJET-  	  Finding Related Forum Posts through Intention-Based SegmentationIRJET-  	  Finding Related Forum Posts through Intention-Based Segmentation
IRJET- Finding Related Forum Posts through Intention-Based SegmentationIRJET Journal
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourseijitcs
 
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTSOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTARAVINDRM2
 
TOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAIN
TOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAINTOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAIN
TOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAINhiij
 

What's hot (20)

Subject Indexing & Techniques
Subject Indexing  & TechniquesSubject Indexing  & Techniques
Subject Indexing & Techniques
 
Assessment of the use of indexing and abstracting service by patrons of feder...
Assessment of the use of indexing and abstracting service by patrons of feder...Assessment of the use of indexing and abstracting service by patrons of feder...
Assessment of the use of indexing and abstracting service by patrons of feder...
 
Semantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebSemantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic Web
 
Definition of index
Definition of indexDefinition of index
Definition of index
 
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...
 
[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET-V2I1P1] Authors:Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
 
Lectures 1,2,3
Lectures 1,2,3Lectures 1,2,3
Lectures 1,2,3
 
Syntactic Indexes for Text Retrieval
Syntactic Indexes for Text RetrievalSyntactic Indexes for Text Retrieval
Syntactic Indexes for Text Retrieval
 
Lexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter DataLexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter Data
 
Lec 2
Lec 2Lec 2
Lec 2
 
[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
[IJET V2I3P13] Authors: Anshika, Sujit Tak, Sandeep Ugale, Abhishek Pohekar
 
Indexing Automated Vs Automatic Galvan1
Indexing Automated Vs Automatic   Galvan1Indexing Automated Vs Automatic   Galvan1
Indexing Automated Vs Automatic Galvan1
 
Indexing
IndexingIndexing
Indexing
 
Context Driven Technique for Document Classification
Context Driven Technique for Document ClassificationContext Driven Technique for Document Classification
Context Driven Technique for Document Classification
 
Lec1,2
Lec1,2Lec1,2
Lec1,2
 
IRJET- Finding Related Forum Posts through Intention-Based Segmentation
IRJET-  	  Finding Related Forum Posts through Intention-Based SegmentationIRJET-  	  Finding Related Forum Posts through Intention-Based Segmentation
IRJET- Finding Related Forum Posts through Intention-Based Segmentation
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourse
 
Text classification
Text classificationText classification
Text classification
 
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTSOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
 
TOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAIN
TOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAINTOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAIN
TOWARDS A STANDARD OF MODELLING ANNOTATIONS IN THE E-HEALTH DOMAIN
 

Viewers also liked

Hybrid vs native mobile app development platform which one to choose
Hybrid vs native mobile app development platform which one to chooseHybrid vs native mobile app development platform which one to choose
Hybrid vs native mobile app development platform which one to chooseSolution Analysts
 
Collaboration travelmed ณ ทชร
Collaboration travelmed ณ ทชรCollaboration travelmed ณ ทชร
Collaboration travelmed ณ ทชรSarawuth Noliam
 
3 comportamiento en insectos y vertebrados
3 comportamiento en insectos y vertebrados3 comportamiento en insectos y vertebrados
3 comportamiento en insectos y vertebradoshectortorreslima
 
Tutela laboral empleados publicos
Tutela laboral empleados publicosTutela laboral empleados publicos
Tutela laboral empleados publicosvidasindical
 
Mapas progreso matematica_estadisticaprobabilidad
Mapas progreso matematica_estadisticaprobabilidadMapas progreso matematica_estadisticaprobabilidad
Mapas progreso matematica_estadisticaprobabilidadEsther Segovia
 
EastNets en.SafeWatch Profiling in Spanish - Brochure
EastNets en.SafeWatch Profiling in Spanish - BrochureEastNets en.SafeWatch Profiling in Spanish - Brochure
EastNets en.SafeWatch Profiling in Spanish - BrochureEastNets
 

Viewers also liked (13)

Hybrid vs native mobile app development platform which one to choose
Hybrid vs native mobile app development platform which one to chooseHybrid vs native mobile app development platform which one to choose
Hybrid vs native mobile app development platform which one to choose
 
Collaboration travelmed ณ ทชร
Collaboration travelmed ณ ทชรCollaboration travelmed ณ ทชร
Collaboration travelmed ณ ทชร
 
Comando y su uso
Comando y su usoComando y su uso
Comando y su uso
 
Print House Company Profile
Print House Company ProfilePrint House Company Profile
Print House Company Profile
 
주보11 20
주보11 20주보11 20
주보11 20
 
Cellebelar
CellebelarCellebelar
Cellebelar
 
F 15 Strick Eagle
F 15 Strick EagleF 15 Strick Eagle
F 15 Strick Eagle
 
3 comportamiento en insectos y vertebrados
3 comportamiento en insectos y vertebrados3 comportamiento en insectos y vertebrados
3 comportamiento en insectos y vertebrados
 
Administrativoskn
AdministrativosknAdministrativoskn
Administrativoskn
 
Logroño aca
Logroño acaLogroño aca
Logroño aca
 
Tutela laboral empleados publicos
Tutela laboral empleados publicosTutela laboral empleados publicos
Tutela laboral empleados publicos
 
Mapas progreso matematica_estadisticaprobabilidad
Mapas progreso matematica_estadisticaprobabilidadMapas progreso matematica_estadisticaprobabilidad
Mapas progreso matematica_estadisticaprobabilidad
 
EastNets en.SafeWatch Profiling in Spanish - Brochure
EastNets en.SafeWatch Profiling in Spanish - BrochureEastNets en.SafeWatch Profiling in Spanish - Brochure
EastNets en.SafeWatch Profiling in Spanish - Brochure
 

Similar to semantic annotation of documents a compar ative study

A Lightweight Approach To Semantic Annotation Of Research Papers
A Lightweight Approach To Semantic Annotation Of Research PapersA Lightweight Approach To Semantic Annotation Of Research Papers
A Lightweight Approach To Semantic Annotation Of Research PapersScott Bou
 
Automatic Annotation Approach Of Events In News Articles
Automatic Annotation Approach Of Events In News ArticlesAutomatic Annotation Approach Of Events In News Articles
Automatic Annotation Approach Of Events In News ArticlesJoaquin Hamad
 
Aigaion A Web-Based Open Source Software For Managing The Bibliographic Refe...
Aigaion  A Web-Based Open Source Software For Managing The Bibliographic Refe...Aigaion  A Web-Based Open Source Software For Managing The Bibliographic Refe...
Aigaion A Web-Based Open Source Software For Managing The Bibliographic Refe...Carrie Romero
 
Aggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperAggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperDBOnto
 
Annotated Corpus For Citation Context Analysis
Annotated Corpus For Citation Context AnalysisAnnotated Corpus For Citation Context Analysis
Annotated Corpus For Citation Context AnalysisDeja Lewis
 
published article
published articlepublished article
published articlehiij
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...ijcsity
 
Towards a Standard of Modelling Annotations in the E-Health Domain
Towards a Standard of Modelling Annotations in the E-Health DomainTowards a Standard of Modelling Annotations in the E-Health Domain
Towards a Standard of Modelling Annotations in the E-Health Domainhiij
 
Text Segmentation for Online Subjective Examination using Machine Learning
Text Segmentation for Online Subjective Examination using Machine   LearningText Segmentation for Online Subjective Examination using Machine   Learning
Text Segmentation for Online Subjective Examination using Machine LearningIRJET Journal
 
Exploiting rhetorical relations to
Exploiting rhetorical relations toExploiting rhetorical relations to
Exploiting rhetorical relations toIJNSA Journal
 
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATIONEXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATIONIJNSA Journal
 
Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsIRJET Journal
 
09 9241 it co-citation network investigation (edit ari)
09 9241 it  co-citation network investigation (edit ari)09 9241 it  co-citation network investigation (edit ari)
09 9241 it co-citation network investigation (edit ari)IAESIJEECS
 
"Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment ""Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment "ijtsrd
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET Journal
 

Similar to semantic annotation of documents a compar ative study (20)

A Lightweight Approach To Semantic Annotation Of Research Papers
A Lightweight Approach To Semantic Annotation Of Research PapersA Lightweight Approach To Semantic Annotation Of Research Papers
A Lightweight Approach To Semantic Annotation Of Research Papers
 
Automatic Annotation Approach Of Events In News Articles
Automatic Annotation Approach Of Events In News ArticlesAutomatic Annotation Approach Of Events In News Articles
Automatic Annotation Approach Of Events In News Articles
 
Aigaion A Web-Based Open Source Software For Managing The Bibliographic Refe...
Aigaion  A Web-Based Open Source Software For Managing The Bibliographic Refe...Aigaion  A Web-Based Open Source Software For Managing The Bibliographic Refe...
Aigaion A Web-Based Open Source Software For Managing The Bibliographic Refe...
 
Spotlight
SpotlightSpotlight
Spotlight
 
Aggregating Semantic Annotators Paper
Aggregating Semantic Annotators PaperAggregating Semantic Annotators Paper
Aggregating Semantic Annotators Paper
 
Annotated Corpus For Citation Context Analysis
Annotated Corpus For Citation Context AnalysisAnnotated Corpus For Citation Context Analysis
Annotated Corpus For Citation Context Analysis
 
published article
published articlepublished article
published article
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...
 
Indexing
IndexingIndexing
Indexing
 
Towards a Standard of Modelling Annotations in the E-Health Domain
Towards a Standard of Modelling Annotations in the E-Health DomainTowards a Standard of Modelling Annotations in the E-Health Domain
Towards a Standard of Modelling Annotations in the E-Health Domain
 
Text Segmentation for Online Subjective Examination using Machine Learning
Text Segmentation for Online Subjective Examination using Machine   LearningText Segmentation for Online Subjective Examination using Machine   Learning
Text Segmentation for Online Subjective Examination using Machine Learning
 
Exploiting rhetorical relations to
Exploiting rhetorical relations toExploiting rhetorical relations to
Exploiting rhetorical relations to
 
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATIONEXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION
 
Annotation Of
Annotation OfAnnotation Of
Annotation Of
 
Digital Library UNIT-3
Digital Library UNIT-3Digital Library UNIT-3
Digital Library UNIT-3
 
Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining Applications
 
09 9241 it co-citation network investigation (edit ari)
09 9241 it  co-citation network investigation (edit ari)09 9241 it  co-citation network investigation (edit ari)
09 9241 it co-citation network investigation (edit ari)
 
Hc3612711275
Hc3612711275Hc3612711275
Hc3612711275
 
"Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment ""Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment "
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
 

Recently uploaded

SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 

Recently uploaded (20)

🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 

semantic annotation of documents a compar ative study

  • 1. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1936 Semantic Annotation of Documents: A Compar- ative Study Hasna Abioui, Ali Idarrou, Ali Bouzit, Driss Mammass IRF-SIC Laboratory, Ibn Zohr University, Agadir, Marocco Abstract—Semantic annotation, which is considered one of the semantic web applicative aspects, has been adopted by researchers from different communities as a para- mount solution that improves searching and retrieval of information by promoting the richness of the content. However, researchers are facing challenges concerning both the quality and the relevance of the semantic annota- tions attached to the annotated document against its con- tent as well as its semantics, without ignoring those re- garding automation process which is supposed to ensure an optimal system for information indexing and retrieval. In this article, we will introduce the semantic annotation concept by presenting a state of the art including defini- tions, features and a classification of annotation systems. Systems and proposed approaches in the field will be cited, as well as a study of some existing annotation tools. This study will also pinpoint various problems and limita- tions related to the annotation in order to offer solutions for our future work. Keywords— Document, Semantic Annotation, Meta- data, Information Retrieval. I. INTRODUCTION Nowadays, the amount of digital data reaches an unprece- dented stage and keeps growing each day. However, the volume is not the only factor that leads digital data to explosion; evidently, the richness of contents and also the remarkable heterogeneity of structures have a significant impact. So, faced with that explosion, the challenge is how we can provide an improved management, a better exploitation and an efficient understanding of the infor- mation contained in these digital sources. Well, semantic annotation became one of the treated alternatives in this regard, that may handle these requirements by ensuring a good comprehension of document contents, thus allowing an easy exploitation, exchange, and shareability. Otherwise, semantic annotation is the practical solution that may promote the information retrieval either on the web or in large databases by calling upon ontologies, thesauri and data extraction and segmentation algorithms, in order to enhance the richness of contents through the attachment of descriptive notes by considering different contexts from which these notes may come, the various structures that can represent them and the relationship that links both of them while continuously safeguarding strict separation between the resources and their annotations [1]. Different application domains appeal the semantic anno- tation as a recent research field to evaluate its efficiency on various content types and structures. For instance, the electronic Medical Records (EMR) used in the healthcare industry [2] [3] is one of these application fields. As a matter of fact, an EMR can be created by a set of experts in different medical sub-domains and who may have va- rying levels of expertise, which explains the abundance and the diversity of contained information in a patient’s record, such as epidemiological studies, medical history, lab work, etc. Practically, the collaborative annotation of the EMR by those practitioners facilitates the access to the needed information without having to study or ana- lyze the entire record. This paper is organized as follows: in the second section, we shall survey different definitions of annotation in the literature and its different features. We shall also distin- guish between annotation and metadata. We then present and compare three types of annotation, and complete the section by listing and comparing various annotation tools available nowadays. The third section will present exam- ples from recent studies of several annotation systems. They shall be categorized according to the types of docu- ments they aim to annotate. Finally, in the fourth section, we present a synthesis of the survey. II. STATE OF THE ART 2.1 Annotation: what does it mean? In the literature, annotation is defined as a critical or ex- planatory note accompanying the text. In computer sci- ence, different definitions are often cited. Authors in [4] define the annotation as graphical or textual information attached to a document or simply placed on it. Evidently, when we talk about an electronic document, it may be mono-media, multimedia or a web page within the seman- tic web. Hence, the forms of annotation differ depending on the interest and the purpose of the annotator. It might, for instance, be that a resource user or reader who wishes to annotate a paragraph in order to simplify its next read- ing, or facilitate the access to the information according to the intended use. In this case, the annotation will probably be non-textual i.e. graphical and expressed by underlining, highlighting
  • 2. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1937 or pinpointing elements, so as to create a personalized re- segmentation and reorganization of the document or the paragraph. However, if the annotator is the resource’s author, an expert or even a simple user wishing to enrich the document by associating other information, this latter will certainly be textual information selected according to a well-defined context. Similarly, the author in [5], believes that the form of the annotation depends on its function, distinguishing between two forms of textual annotation: (i) personal annotations – called also individual annotations– that are used to translate a term, highlight it by providing more definitions, rephrase a passage, etc; and (ii) collective annotations that introduce the notion of sharing and ex- changing which allow annotators – readers and/or experts analyzing a document – to ask questions, provide an- swers, and give feedback in the form of annotations. 2.2 From Annotation to Semantic Annotation With the appearance of the semantic web and during its evolution, an annotation treats and concerns more specifi- cally semantics, since it remains one of the applicative aspects of the semantic web, then called semantic annota- tion. In the same context, the W3C (World Wide Web Consortium) defines a semantic annotation as a comment, an explanation or any other external note that can be at- tached to a web document or a part of it. According to [22], semantic annotation designates both the activity which consists of affixing a note regarding a part of a document and also that resulting note; and it’s reflected in the definition of a semantic information layer that gives meaning and sense to the text. In [8] [7], authors added that these notes assigned to the entities to annotate have a well-structured and a formally described semantic which is already defined within the ontology and presented as a set of concepts, instances of concepts and relations be- tween them, by indicating that these annotations and metadata can be stored within the document itself or in a separate one by using the URI (Universal Resource Iden- tifier) referencing the annotated entity. 2.3 Semantic annotation versus Metadata However, when analyzing different definitions presented in the literature, we note that semantic annotation is often related to another term which is metadata. True to its name, the latter is a data about other data i.e. a data about the document. Metadata is considered as a label attached to a document that describes sufficiently its content, even- tually its main features without needing to open it for consultation. According to [6], metadata help to identify, describe and localize electronic resources. Not too far from this, [23] links metadata to an easy access, share, and reusability of information. Nevertheless, a distinction is made in [1] between seman- tic annotation and metadata. The metadata term is more independent than annotations and can itself be a resource attached to the annotated resource as additional informa- tion. A good example would be a summary prepared separately for an article, a publication date for a docu- ment, the duration of an audio or video clip, an artist's name, or even the names of the instruments played in a musical piece. While metadata is external, though possi- bly attached, to the document, annotations lie within the annotated resource and are written during the read- ing/annotation process. In this case, it may be the lyrics of a song, a sound played at a certain point in a presentation, a textual string of words, sentences or even a paragraph. 2.4 A Semantic Annotation Systems Classification A semantic annotation process, regardless of the anno- tated resources type, can be manual, semi-automatic or automatic. In other words, it concerns the annotation sys- tem's automation level. In the following, we shall intro- duce those three categories, weighing their pros and cons, then concluding the subsection by comparing the most commonly used annotation tools. 2.4.1 Manual Annotation As its name indicates, manual annotation requires the intervention of a human annotator who must provide de- scriptive keywords. This intervention may be made at several levels: (i) In the resource drafting phase: in this case, the author represents the annotator. (ii) At the time of resource loading/tagging: in this instance, generally, an appeal to an expert or group of experts performing semantic, contextual and collaborative annotation is needed; as in the case of the patient records example that requires a col- laborative and manual annotation by a group of experts. (iii) During consultation or final resource-use: here, it concerns a complementary annotation for personal use by adding notes and comments. As in the con- text of E-learning or video-conferences, annota- tions can be used in order to facilitate note-taking, navigation platform, se- quencing courses, etc. 2.4.2 Semi-automatic Annotation Semi-automatic annotation is the combination of both: manual and automatic annotation. It’s based on a human intervention during an automatic process, by taking ad- vantage of this latter’s efficiency on one side and the accuracy of the manual annotation on the other. The user may intervene at the beginning of the process as a re- source annotator, by providing keywords and basic anno- tations that the system must exploit to produce final anno- tations. (Figure1.a). The annotator can intervene by the end as well, but this time, as an expert that must validate or cancel the annotations automatically proposed by the system before generating final annotations (Figure 1.b).
  • 3. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1938 Fig.1: Semi-Automatic Annotation 2.4.3 Automatic Annotation Automatic annotation is performed by automated annotation tools without human intervention and relies on information extraction heuristics and techniques (e.g. by exploiting redundancy), indexing and simple segmentation (data stream process- ing, character strings, etc.) [7][8]. Table.1: Compares the three types cited above by summarizing their cons and pros: TABLE I. A Comparison of Annotation Types Annotation type Advantages Disadvantages Manual Annotation - Superior in terms of quality: well targeted, relevant and semantically precise. - Selected keywords based on a human determination of the resource semantic content. Very time-consuming. Being done by humans, it is highly error-prone and can easily result in syntactic errors and incorrect refer- ences. Semi-automatic Annotation - Ensures accurate and relevant annotation by combin- ing human understanding with systematic efficiency. - More efficient and less time consuming compared to the manual annotation. - More precise and leads to relevant and accurate results compared to automatic annotation. - Recommended for dynamic databases. Still dependent on human interven- tion, what makes it arduous and ex- pensive process in terms of time and used resources. Automatic An- notation - Suitable for resources that have a large amount of data to make a decision according to that to deduce the qualified information to be annotations. Lowering of quality and accuracy of annotation if compared to manual and semi-automatic annotation. By weighing cons and pros of the three system categories cited above, we notice that each one can be evoked in a well-defined operating environment regardless of the resource type. However, choosing the appropriate system automation level depends on others criteria that we briefly mention: - Volume and number of resources to annotate: we can only think about manual or semi-automatic annotation as long as these two factors are reasonable and don’t increase over time. - Expected goal from these annotations: sometimes, we look for detailed and more subtle semantic annotations but other times, just simple and general annotations are sufficient. Moreover, data and annotations granu- larity is still believed to be the major limit for auto- matic annotation since it’s hard to annotate data using a high and fine enough level of granularity. - If the resource to annotate already has minimal seman- tic information on which an automatic annotation sys- tem would be based to produce semantic annotations: Here, we're going beyond the resource itself by treat- ing its structures, in order to extract all useful deemed information for a profitable automatic annotation process in terms of efficiency and quality. In the op-
  • 4. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1939 posite case, human intervention providing this infor- mation would be needed either individually or collec- tively. - Sometimes, even the resource can provide semantic information by using information extraction tools and treated through TAL (the natural Language's Auto- matic Treatment) tools, the process may be limited or even blocked during data's operational phase. That blockage is often linked to the resource domain speci- ficity, for example, the lack of domain ontologies or semantic networks, etc. Several related works faced this problem, such as [24] having as objective to anno- tate images semantically by using keywords in gastro- enterology, so the domain specificity had prompted researchers to conceive and develop their own polyps ontology in conjunction with standard reasonings in description logic SHOIQ+. The same problem was mentioned in [25], when researchers were forced to create a lexico-semantic network linked to radiology in the absence of a french version of radiology an- thologies as RadLex, a bilingual one offering just English and German versions. So, in the two and other similar cases, the annotator has to intervene to validate the suggested annotations since that concerns new rea- sonings and semantic models. 2.4.4 Annotation tools comparison In this subsection, we shall compare the most commonly used annotation tools and often cited in the literature. We compare them using different criteria, namely the automa- tion level, the resource types that they can annotate, and the languages and schemas used to generate annotations. In fact, the majority of these tools are used in the context of the semantic web, as they treat web pages as resources. Some tools are limited to adding free text as comments or notes. Others on the other hand aim to semantically clar- ify the resource content. All of them, however, use RDF (Resource Description Framework) as a basic language for the formal description of resources. Still, the formats used in domain ontologies differ from one tool to another. Generally, they vary between predetermined ontologies implemented in RDFS (RDF Schema) and DAML + OIL (DARPA Agent Markup Language and Ontology Infer- ence Layer) later superseded by OWL (Web Ontology Language). Other tools are reserved for the annotation of audio-visual documents. For these, the annotation is manual and in- volves adding comments to documentary content, usually collaboratively. Note that manual annotation systems despite their precision are far from ideal, especially in the context of the web, where we need to treat a huge amount of resources and information. On the other hand, semi- automatic and automatic systems may yield relatively inferior results, depending on the information extraction tools and the learning techniques they use. In fact, these information extraction tools and the differ- ent learning algorithms cited in this regard – which are the basis of semi-automatic and automatic annotation tools and systems – will be the object of our future re- search in which we try to explore the possibilities in view of improving the annotation process. But at this point, we have restricted our attention to the related works on anno- tation to elucidate the problem of annotation and informa- tion retrieval in various types of resources, adopting vari- ous forms of architectures. Table.2 shows a detailed comparison of annotation tools according to already mentioned criteria. Soft- ware tool Type Of annota- tion Types of re- sources anno- tated Informa- tion extraction tool Metadata schema and ontology used Language used for generating annotations Observations Annotea [9] Manual annotatio n HTML, XML documen ts ― - uses elements/terms from DublinCore (author, date, title, corpus…). - free-text annotation (comments, notes). - domain ontology formatted in RDFS. Annotations output as RDF triplets and linked to documents with XPointers - Designed for W3C. Allows simple text annotations of web page content, structured and non-structured. - No explicit semantics on content or the concepts in it.
  • 5. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1940 TABLE I. A COMPARISON OF ANNOTATION TOOLS V- Annotea [10] Audiovis ual documen ts ― - Uses simple elements/terms from DublinCore. - free-text annotations: comments, remarks, etc. Description of document elements in MPEG-7 - Allows collaborative indexation, navigation, annotation and discussion of video contents amongst several groups in geographically dispersed locations The CREAM Model [11] HTML, XML documen ts ― - Uses domain ontologies DAML+OIL / OWL. Annotations saved in document file as RDF tags/attachments . - It's a general open and complete model allowing the development of annotation tools from ontologies. Ontomat Annotize r Web page fragment s (HTML, XML) ― - Uses domain ontologies DAML+OIL / OWL. Annotations written out in RDF. -A tool implementing the CREAM model. - M-Ontomat-Annotizer is also based on this model but it's meant for multimedia documents. MnM same principle as Mellita Semi- automati c annotatio n HTML/ XML documen ts Amilcare - RDF. - uses predefined ontologies DAML+OIL and OCML Generates annotations in RDF, XML - Learning and information extraction is done by Amilcare, while correction and validation of the annotations are manual S- CREAM [13] HTML documen ts Amilcare - uses ontologies in DAML+OIL DAML+OIL - manual annotation using CREAM. Armadill o [12] Automati c annotatio n HTML documen ts Amilcare - uses domain ontologies in RDFS. Annotations in RDF triplets - Uses redundancy on the web to establish relationships between instances. AeroDA ML [14] HTML documen ts AeroText - uses ontologies in DAML+OIL. Generation of automatic annotations in DAML. - The web version uses a generic ontology. The client/server version supports annotations with custom-built ontologies.
  • 6. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1941 III. RELATED WORKS As stated previously, annotation is becoming an essential activity for an easy resources exploitation and manage- ment, thanks to its simple data manipulation and its low time-consuming and that what makes it more and more use in various research fields, such as e-learning, health- care, etc. In the following, we examine some annotation related works from different categories in order to sum- marize various adopted techniques and concepts. 3.1 Annotation in the semantic web In the context of the semantic web, [15] proposes an an- notation system in three phases for web pages. A web page annotation system is proposed in [15] by using three phases: First, it starts with the marking and automatic identification of relevant elements brought together in a corpus in order to associate them with prede- termined ontology concepts. Then comes the learning process, which exploits the tree structure of the web page provided by DOM (Document Object Model) in order to deduce for each ontology concept and role an assimilated parallel path from the web page. Finally, it comes up with an annotation based on the generation of ontology in- stances by direct application of the obtained assimilated paths. One of the major advantages of this approach is that it generates not only instances of concepts but also instances of the roles of concepts in ontology. However, the use of the tree structure is usually limited by the de- gree of regularity in the web page and the extent of its conformity with the hierarchy represented in the ontol- ogy. In the same context, [16] opts for a semi-automatic annotation always according to the semantics of the web page, but without the use of its DOM structure. Instead, [16] opts for the use of the Semantic Radar tool, which performs automatic metadata extraction, defining the semantics of the page with descriptor types FOAF (Friend of a friend), SIOC (Semantically-Interlinked Online Communities), DOAP (Description of A Project) and RDFa (Resource Description Framework in attributes) with the purpose of generating an initial source of analy- sis for each page. The next step is to reinforce the rele- vance of the annotation by pairing the obtained results with others offered by domain ontology, according to the rules of equivalence and an annotation model. At this point, two types of ontologies are considered: (i) an en- riched domain ontology (OWL) and (ii) a FOAF and SIOC ontology (.RDF). Finally, we get for each web page, a file containing metadata in RDF format. The ex- perimental results are quite satisfactory according to [16]: 45% of web pages are annotated especially in a vast envi- ronment, containing heterogeneous resources such as the web. In [17], another annotation system is presented: 'DYNOSYS'. It has a distributed architecture that supports collaborative work and it is platform-independent. Unlike the system in [16], generated annotations in [17] are in XML format. 3.2 Annotation of multimedia documents As for the area of multimedia documents, we look at vari- ous studies that are interested in very specific fields, and which, through their proposals and approaches, have been able to satisfactorily meet the varying needs of the practi- tioners in these fields. For instance, [19] presents an approach that aims to au- tomating manual annotation tools dealing with the annota- tion of sign language videos, by proposing architecture capable of providing a distributed system hosting auto- matic annotation assistants on remote servers. Data de- scription is in XML format and is structured in the form of annotation graphs that annotation tools must be able to import and export. In the medical field, [18] proposes a method of know- ledge creation, that is based on the cooperative annotation of surgical videos by practitioners and domain experts, by building and sharing objects, results and observations which will be useful for the extraction of the video's se- mantics. To this end, two processes are put in place: one for creating concepts and the other for learning them. Each of the two processes goes through three phases: (i) an individual observation phase which allows the partici- pants to annotate videos, each according to his/her level of expertise, his viewpoint etc. (ii) a group negotiation phase, and finally (iii) a concept elaboration phase. Each process iterates as many times as necessary to achieve a set of common and coherent annotations between group members, which makes the task difficult and complicated despite the proven improvement in practices according to experimental studies. Regarding the semantic data, on- tologies and annotations are structured in RDFS / RDF format. Two other works deal with the annotation and the descrip- tion of multimedia documents using XML graphs similar- ly to the previous work. The first system, proposed in [6] is a meta-modeling of semi-structured data by defining a set of metadata for each different medium – text, image, audio and video – in order to have a unified presentation of annotations for multimedia documents based on the content and its semantics. The metadata structuring is done in XML documents called meta-documents contain- ing textual metadata or links to information in the case of non-textual metadata. The second is a video annotation system presented in [20], which proposes to organize and present the annotations in the form of graphs by introduc- ing the description schemas and dimensions of analysis. The aim is to ensure (i) easy management of an extended vocabulary, and (ii) an easy, fast, customized query-and- response construction system when searching for infor- mation.
  • 7. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1942 3.3 Annotation as a part of E-learning In the e-learning field, some approaches are directed to- wards resources annotation. They focus on platforms and systems that allow instructors to prepare course materials and make them available to students. [21] is one of works coming in that context. It proposes an annotation model dedicated to teachers based on three facets of learning: cognitive, semantic and contextual. This model, translated into an architecture, gives birth to two subsystems. One of them allows the management of learning contexts while the other takes care of annotations management that are created and transmitted in XML, to ensure seamless communication and exchange between different modules of the model. In the following, we present a discussion in summary form drawn from the aforementioned works. IV. SUMMARY AND DISCUSSION By surveying the literature of semantic annotation, some indispensable requirements come to mind in order, to create a complete and highly performed annotation sys- tem that fits the needs and expectations of practitioners. From the perspective of automation level, we notice that manual annotation systems offer a high accuracy. More- over, they are outdated and consuming in terms of time and manpower. For entirely automated systems, the qual- ity of results remains inferior, which explain the use of semi-automatic annotation systems that combines the strengths of both worlds. In other words, the more a sys- tem combines principles of manual annotation – collabo- rative annotation, manually-made ontologies (generic domain-specific, predefined, or custom-made for a spe- cific need), the definition of concepts, models and rules by human agents – with a well-optimized automation, the better the efficiency and the quality of the annotation. The better integrated the extensions and tools of descriptor semantics and data extraction, the more satisfactory and efficient the annotations are. Regarding the physical ar- chitecture, a variety of works emphasize on the distribu- tion concept. A distributed architecture improves the sys- tem effectiveness by, (i) guaranteeing the high availability of services against the interruption, (ii) minimizing the critical failure points, (iii) parallelizing tasks and (iv) load-balancing the work over cluster nodes. In terms of data and resources description, collaborative annotation stands out as the most preferred solution when creating abundant, reliable and very specific annotations for partially or entirely document annotation regardless of the resource type. Certain works suggest to begin with a single annotation before a negotiation step between anno- tators, to discuss and choose the most semantically appro- priate concepts. Others choose the creation of collabora- tive annotation sessions as forums, to exchange all-out quantity of information and knowledge related to the same resource from different points of view and annota- tors with different level of expertise, possibly even by using various annotation forms. Concerning the serializa- tion, storage and transformation of semantic data, ontolo- gies are often encoded in RDFS and OWL, while annota- tions are generated using RDF format. In fact, RDF is defined as the semantic web’s greatest achievement. It is by a margin the best resource description standard that fits most significant needs. Its formal description of web re- sources guarantees efficient machine-readability and automatic processing of metadata and annotation. As for the annotation of multimedia documents, we no- ticed that most practitioners opt for an XML graphs based description, due to the need to provide some structure to the document to ensure data communication and ex- change while still successfully maintaining document’s description. V. CONCLUSION Document annotation is becoming more and more crucial and indispensable for a high-performance document man- aging and exploitation. Although in the current study, we highlight some problems related to both document anno- tation tools and methods. So, first, we define broadly the concept of annotation, by providing an overview of the state of the art while surveying the main annotation tools and related works in the literature, then we end with a synthesis. Based on that, our own semantic annotation system will be developed according to high quality and performance annotation, having in mind multiples types of media, document structures and the system automation level. REFERENCES [1] Y.Prié, S.Garlatti, “Annotations et métadonnées dans le Web sémantique”, in Revue I3 Information- Interaction - Intelligence, Numéro Hors-série Web sémantique, 2004, 24 pp. [2] S. Bringay, C. Barry, J. Charlet, “Un modèle pour les annotations du dossier patient in formatisé”, P. Salembier, M. Zacklad (eds.), Vol. Annotations dans les documents pour l'action, pp. 47- 67, Hermespublishing, 2006. [3] S. Bringay, C. Barry, J. Charlet, “Les annotations pour gérer les connaissances du dossier patient”, Actes IC 2005, Nice (France): 73-84. [4] E. Desmontils, C. Jacquin, “Annotations sur le Web : notes de lecture”, in Journées Scientifiques Web Sémantique, (Action Spécifique STIC CNRS), Paris, France, 2002, pp 10-11. [5] G.Cabanac, “Annotation de ressources électroniques sur le Web : formes et usages”, Master Report, Paul Sabatier University, June 2005.
  • 8. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-11, Nov- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 1938 Fig.1: Semi-Automatic Annotation 2.4.3 Automatic Annotation Automatic annotation is performed by automated annotation tools without human intervention and relies on information extraction heuristics and techniques (e.g. by exploiting redundancy), indexing and simple segmentation (data stream process- ing, character strings, etc.) [7][8]. Table.1: Compares the three types cited above by summarizing their cons and pros: TABLE I. A Comparison of Annotation Types Annotation type Advantages Disadvantages Manual Annotation - Superior in terms of quality: well targeted, relevant and semantically precise. - Selected keywords based on a human determination of the resource semantic content. Very time-consuming. Being done by humans, it is highly error-prone and can easily result in syntactic errors and incorrect refer- ences. Semi-automatic Annotation - Ensures accurate and relevant annotation by combin- ing human understanding with systematic efficiency. - More efficient and less time consuming compared to the manual annotation. - More precise and leads to relevant and accurate results compared to automatic annotation. - Recommended for dynamic databases. Still dependent on human interven- tion, what makes it arduous and ex- pensive process in terms of time and used resources. Automatic An- notation - Suitable for resources that have a large amount of data to make a decision according to that to deduce the qualified information to be annotations. Lowering of quality and accuracy of annotation if compared to manual and semi-automatic annotation. By weighing cons and pros of the three system categories cited above, we notice that each one can be evoked in a well-defined operating environment regardless of the resource type. However, choosing the appropriate system automation level depends on others criteria that we briefly mention: - Volume and number of resources to annotate: we can only think about manual or semi-automatic annotation as long as these two factors are reasonable and don’t increase over time. - Expected goal from these annotations: sometimes, we look for detailed and more subtle semantic annotations but other times, just simple and general annotations are sufficient. Moreover, data and annotations granu- larity is still believed to be the major limit for auto- matic annotation since it’s hard to annotate data using a high and fine enough level of granularity. - If the resource to annotate already has minimal seman- tic information on which an automatic annotation sys- tem would be based to produce semantic annotations: Here, we're going beyond the resource itself by treat- ing its structures, in order to extract all useful deemed information for a profitable automatic annotation process in terms of efficiency and quality. In the op-