To annotate segments of organizational electronic publications and web pages with domain metadata, using the segments' headings and an ontology, in order to improve the quality and focus of the information retrieved when these publications are searched.
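As an illustration only, the idea of heading-based annotation can be sketched as a lookup of ontology labels inside a segment heading. The toy `ONTOLOGY` dictionary and the `normalize` and `annotate` helpers are hypothetical names introduced here. They are not taken from AGROVOC or from the system described, which may match headings quite differently.

```python
import re

# Hypothetical toy ontology: concept -> known labels (not real AGROVOC data).
ONTOLOGY = {
    "Plant diseases": {"plant diseases", "diseases"},
    "Irrigation": {"irrigation", "watering"},
    "Wheat": {"wheat", "قمح", "حنطة"},
}


def normalize(text):
    """Lowercase and collapse runs of non-word characters into single spaces."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def annotate(heading):
    """Return the sorted concepts whose labels occur as whole phrases in the heading."""
    padded = f" {normalize(heading)} "
    return sorted(
        concept
        for concept, labels in ONTOLOGY.items()
        if any(f" {normalize(label)} " in padded for label in labels)
    )


# Each segment is a (heading, body) pair; only the heading drives the annotation.
segments = [
    ("Irrigation of wheat", "..."),
    ("Common plant diseases", "..."),
]
for heading, _body in segments:
    print(heading, "->", annotate(heading))
```

Whole-phrase matching is a deliberate simplification. Arabic headings in particular would likely need further normalization (for example, handling the definite article) before a lookup like this is reliable.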
Some Arabic terms are missing although their English counterparts are available, e.g. Mosaic viruses, Physiological disorders.
Arabic agricultural terminology differs from one country to another, e.g. wheat is حنطة in AGROVOC but قمح in Egypt.
Agricultural entries that are very specific to a country (for example, country-specific crop varieties).
Some important concepts are missing from AGROVOC altogether, e.g. all instances of viral diseases (the entry exists in AGROVOC but has no narrower terms).
Inaccurate Arabic terms, e.g. the term ‘cultivation’ is translated as فلاحة الأرض, which narrows the actual meaning of the term. A more accurate translation would have been ممارسات زراعية.
Some important relationships were found to be missing, e.g. the terms ‘Viral diseases’ and ‘Bacterial diseases’ are not listed as narrower terms of ‘Plant Diseases’.
Inaccurate relationships, e.g. the concept نباتات ضارة (Noxious plants) is listed not as a narrower term (NT) of نباتات (Plants) but as a related term to it.
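To make the effect of a misplaced relationship concrete, the sketch below expands a concept through its narrower-term links. This is a common way to broaden a search, though it is not claimed to be the method used in this work. The `NARROWER` and `RELATED` tables are a toy fragment: only the Plants / Noxious plants relation mirrors the example above, and the "Crops" entry and the `expand` helper are assumptions made for illustration.

```python
# Toy thesaurus fragment (hypothetical data, not an AGROVOC export).
NARROWER = {"Plants": {"Crops"}}          # "Crops" is an assumed placeholder
RELATED = {"Plants": {"Noxious plants"}}  # listed as a related term, not as an NT


def expand(concept, narrower):
    """Return the concept plus every concept reachable through narrower-term links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, ()))
    return seen


# With the relation recorded only as "related", the expansion misses the concept.
print("Noxious plants" in expand("Plants", NARROWER))

# Recording it as a narrower term makes it reachable from "Plants".
corrected = {"Plants": NARROWER["Plants"] | {"Noxious plants"}}
print("Noxious plants" in expand("Plants", corrected))
```

Under these assumptions, a segment annotated with "Noxious plants" would not be found by a search for "Plants" that follows only narrower-term links.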
The results of the experiments carried out to evaluate this work show that it can be used to annotate document segments with a high degree of accuracy.
The problems that reduced recall were analyzed, and we are currently working to improve the accuracy of the results. Some of these problems are due to the ontology and others to the processing of Arabic text.
We plan to investigate the use of the generated annotated segments to build classifiers in order to assign labels to segments that have no headings.
We explore ontology extraction from information-rich documents so as to be able to apply our approach when an initial ontology does not exist.