Ontology Based Annotation of Text Segments



Published in: Education, Technology
  • 1. Ontology Based Annotation of Text Segments Presented by Ahmed Rafea [email_address] Samhaa R. El-Beltagy Maryam Hazman
  • 2. Agenda
    • Objective
    • Problems related to AGROVOC
    • Requirements of the Proposed Annotation System
    • The Architecture of the Proposed Annotation System
    • Evaluation
    • Conclusion and Future Work
  • 3. Objective
    • To annotate segments of organizational electronic publications and web pages with domain metadata, using the segment headings and an ontology, in order to improve the quality and focus of the information retrieved when these publications are searched.
  • 4. Problems related to AGROVOC
    • Some Arabic terms are missing even though they exist in the English version, e.g. Mosaic viruses and Physiological disorders.
    • Arabic agricultural terminology differs from one country to another, e.g. wheat is حنطة in AGROVOC but قمح in Egypt.
    • Agricultural entries that are very specific to a country (for example, country-specific crop varieties) are not covered.
    • Some important concepts are missing from AGROVOC altogether, e.g. all instances of viral diseases (the entry exists in AGROVOC but has no narrower terms).
    • Inaccurate Arabic terms, e.g. the term "cultivation" is translated as فلاحة الأرض, which narrows the actual meaning of the term. A more accurate translation could have been ممارسات زراعية.
    • Some important relationships are missing, e.g. the terms ‘Viral diseases’ and ‘Bacterial diseases’ are not listed as narrower terms of ‘Plant Diseases’.
    • Inaccurate relationships, e.g. the concept نباتات ضارة (Noxious plants) is listed not as a narrower term (NT) of نباتات (Plants) but as a related term.
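The kinds of customization these problems call for can be sketched as a small in-memory thesaurus. This is only an illustration: the `Concept` and `Thesaurus` classes and their method names are hypothetical and do not reflect AGROVOC's actual schema or the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Concept:
    """One thesaurus entry (hypothetical structure, not AGROVOC's schema)."""
    label: str
    synonyms: Set[str] = field(default_factory=set)
    narrower: Set[str] = field(default_factory=set)
    related: Set[str] = field(default_factory=set)


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def concept(self, label: str) -> Concept:
        # Return the entry for `label`, creating it if it is missing.
        return self.concepts.setdefault(label, Concept(label))

    def add_synonym(self, label: str, synonym: str) -> None:
        # Record a country-specific term for an existing concept.
        self.concept(label).synonyms.add(synonym)

    def add_narrower(self, broader: str, narrower: str) -> None:
        # Add a missing concept or a missing broader/narrower link.
        self.concept(narrower)
        self.concept(broader).narrower.add(narrower)

    def related_to_narrower(self, broader: str, term: str) -> None:
        # Turn an inaccurate "related term" link into a "narrower term" link.
        self.concept(broader).related.discard(term)
        self.add_narrower(broader, term)

    def lookup(self, term: str) -> Optional[Concept]:
        # Find a concept by its label or by any of its synonyms.
        for concept in self.concepts.values():
            if term == concept.label or term in concept.synonyms:
                return concept
        return None


thesaurus = Thesaurus()
thesaurus.concept("نباتات").related.add("نباتات ضارة")
thesaurus.add_synonym("حنطة", "قمح")                      # Egyptian term for wheat
thesaurus.add_narrower("Plant diseases", "Viral diseases")
thesaurus.add_narrower("Plant diseases", "Bacterial diseases")
thesaurus.related_to_narrower("نباتات", "نباتات ضارة")
```

With these changes, a lookup of قمح resolves to the same concept as حنطة, and the two disease categories become reachable as narrower terms of "Plant diseases".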
  • 5. Requirements of the Annotation System
    • It is required to build a system that is capable of:
      • Extending or customizing an existing ontology (automatically / semi-automatically)
      • Identifying multiple possible descriptors associated with any single segment.
      • Annotating segments with the most specific concepts possible.
      • Normalizing input text and the ontology through stemming to facilitate matching.
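The matching requirements above can be illustrated with a minimal sketch. It makes several assumptions that are not in the slides: the ontology is reduced to a plain dictionary of broader-to-narrower links, the `normalize` function is a crude English stand-in for a real (Arabic) stemmer, and the names `annotate`, `contains`, and `descendants` are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> Tuple[str, ...]:
    """Crude stand-in for stemming: lowercase, tokenize, strip a trailing 's'."""
    tokens = re.findall(r"\w+", text.lower())
    return tuple(t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens)


def contains(haystack: Tuple[str, ...], needle: Tuple[str, ...]) -> bool:
    """True if `needle` occurs as a contiguous run of tokens in `haystack`."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def descendants(concept: str, narrower: Dict[str, List[str]]) -> Set[str]:
    """All concepts reachable from `concept` through narrower-term links."""
    seen: Set[str] = set()
    stack = list(narrower.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, ()))
    return seen


def annotate(heading: str, narrower: Dict[str, List[str]]) -> Set[str]:
    """Return every matching concept, keeping only the most specific ones."""
    concepts = set(narrower) | {c for kids in narrower.values() for c in kids}
    tokens = normalize(heading)
    matched = {c for c in concepts if contains(tokens, normalize(c))}
    # Drop a concept when one of its narrower concepts also matched.
    return {c for c in matched if not (descendants(c, narrower) & matched)}


# Toy ontology fragment, for illustration only.
ONTOLOGY = {
    "Plant diseases": ["Viral diseases", "Bacterial diseases"],
    "Plants": ["Wheat"],
}
labels = annotate("Viral diseases of wheat plants", ONTOLOGY)
```

Here the heading matches "Viral diseases", "Wheat", and "Plants", but "Plants" is dropped because its narrower term "Wheat" also matched, so the segment receives two descriptors.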
  • 6. The Architecture of the Proposed Annotation System
    • Diagram components: HTML Doc, Segmentor, Segment 1 to Segment n, Annotator, Ontology, Ontology Extender, Annotated Segment Repository, Annotated Segments, user
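As an illustration of the first stage of this architecture, the sketch below splits an HTML document into heading-plus-body segments using Python's standard `html.parser`. The slides do not say how the Segmentor works; treating `h1` to `h6` tags as segment boundaries, and the names `Segmentor` and `segment_html`, are assumptions made for this example.

```python
from html.parser import HTMLParser
from typing import Dict, List

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Segmentor(HTMLParser):
    """Collects one segment per heading tag (an assumed segmentation rule)."""

    def __init__(self) -> None:
        super().__init__()
        self.segments: List[Dict[str, str]] = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # A heading starts a new segment.
            self.segments.append({"heading": "", "body": ""})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.segments:
            return  # text before the first heading is ignored in this sketch
        key = "heading" if self._in_heading else "body"
        self.segments[-1][key] += data


def segment_html(html: str) -> List[Dict[str, str]]:
    parser = Segmentor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return [{k: " ".join(v.split()) for k, v in s.items()} for s in parser.segments]


segments = segment_html(
    "<h1>Wheat</h1><p>Sowing dates.</p>"
    "<h2>Viral diseases</h2><p>Symptoms and <b>control</b>.</p>"
)
```

Each resulting segment carries the heading that an annotator would match against the ontology, together with the body text that would be stored alongside the annotation.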
  • 7. Evaluation
    • An expert was asked to annotate 4088 segments
    • The implemented system was run on the headings of these segments, and the results were as follows:
      • The number of terms added to the ontology was 395 (which is equivalent to 95.6% of the 412 terms added by the expert).
      • Precision was 97%, Recall was 91%, and F-score was 94%.
    • When the system was run on another 359 segment headings, without allowing any ontology extension, the results were as follows:
      • Precision was 96%, Recall was 86%, and F-score was 91%.
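The relationship between the reported figures can be checked with a short calculation. The slides do not state which F-score variant was used; this sketch assumes the balanced F-score (the harmonic mean of precision and recall), which is consistent with the reported values. The function names are illustrative.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted annotations against expert annotations."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


with_extension = f_score(0.97, 0.91)      # first experiment
without_extension = f_score(0.96, 0.86)   # second experiment
```

Rounded to whole percentages, these give 94% and 91%, matching the F-scores reported for the two experiments.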
  • 8. Conclusion
    • The results of the experiments carried out to evaluate this work show that it can be used to annotate document segments with a high degree of accuracy.
    • The problems that reduced recall were analyzed, and we are currently working to improve the accuracy of the results. Some of these problems are due to the ontology and others to the processing of Arabic text.
    • We plan to investigate the use of the generated annotated segments to build classifiers that assign labels to segments that have no headings.
    • We are also exploring ontology extraction from information-rich documents, so that our approach can be applied when no initial ontology exists.