
Multilingual Semantic Annotation Engine for Agricultural Documents

Presentation given by Benjamin Chu Min Xian, Arun Anand Sadanandan, Fadzly Zahari, and Dickson Lukose at the Agricultural Ontology Service (AOS) Workshop 2012 in Kuching, Sarawak, Malaysia, September 3-4, 2012.



  1. Multilingual Semantic Annotation Engine for Agricultural Documents. Benjamin Chu Min Xian, Arun Anand Sadanandan, Fadzly Zahari, Dickson Lukose. 04.09.2012, International Symposium on Agricultural Ontology Service (AOS2012)
  2. Outline: Introduction; Related Work; System Description: Text Annotation Engine; Challenges; Conclusion
  3. Introduction
  4. Related Work
     • Semantic annotation techniques are typically categorized as pattern-based or machine learning-based
     • Most annotation tools can only deal with a single language
     • Not easily customized to work for different domains
  5. Text Annotation Engine (T-ANNE) [1]
     • Semantic tagging system: semantic web of tags
     • Knowledge base approach
     • Scalable system: handles large sets of documents; web services
     • Distributed approach: document splitter
     • Multilingual tagging: language identifier
     [1] Chu, M.X., Bahls, D., Lukose, D.: A System and Method for Concept and Named Entity Recognition (2012). (Patent Pending)
  6. Text Annotation Engine (T-ANNE): Multilingual Semantic Annotation System Overview
  7. Text Annotation Engine (T-ANNE) [Diagram; labels: Documents, Semantic Annotation Engine (T-ANNE), AGROVOC Knowledge Base, Semantic Annotations, TAGS Knowledge Base]
  8. Text Annotation Engine (T-ANNE): Example (Japanese) [Diagram; labels: Semantic Annotation Engine (T-ANNE), AGROVOC Knowledge Base, TAGS Knowledge Base]
  9. Text Annotation Engine (T-ANNE): Knowledge-based approach
     • The number of languages and domains it can handle is limited only by the knowledge base it uses
     • Easily customized
     • Utilizes AGROVOC as the knowledge base for recognition and annotation of agriculture-related documents
  10. Text Annotation Engine (T-ANNE): Multilingual capability
     • Automatically determines the language of the text
     • AGROVOC: multilingual thesaurus with more than 40,000 concepts in up to 22 languages
  11. Challenges: 1. Ambiguity; 2. Morphological Variations; 3. Detail / Granularity Level
  12. Challenges: 1. Ambiguity
     “They performed Kashmir, written by Page and Plant. Page played unusual chords on his Gibson.”
     • Kashmir: a song or the Himalayan region?
     • Gibson: the guitar brand or the actor “Mel Gibson”?
     • Page: the guitarist “Jimmy Page” or the Google founder “Larry Page”?
  13. Challenges: 2. Morphological Variations
     Variation of entities representing the same concept using:
     • Plurals
     • Acronyms / Abbreviations
     • Different Spellings
     • Compound Words
     • Language
  14. Challenges: 3. Detail / Granularity Level
     Some annotation systems issue more generic tags, while others issue more specific ones: for example, a general tag such as ‘Cereals’ versus a specific tag such as ‘Waxy maize’. Whether the system should return coarse-grained or fine-grained annotation tags depends on how the results will be used, so it is important to choose the right granularity (detail) level.
  15. Conclusions
     • The annotation engine uses a knowledge-based approach that performs concept entity recognition
     • The application domains and the number of languages it can handle depend on the knowledge base used for recognition
     • Future work: address the challenges (Entity Resolution, Disambiguation)
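
The slides describe a knowledge-based tagger with a language identifier, and list morphological variation and granularity among the challenges. The following is a minimal Python sketch of that general idea, not T-ANNE's actual implementation (the slides do not show it). Everything in it is an assumption made for illustration: the concept IDs, the labels, and the hierarchy are invented and are not taken from AGROVOC; the script-based language check and the plural stripping are toy stand-ins for real components; and the function names `identify_language`, `normalize`, `annotate`, and `coarsen` are hypothetical.

```python
import re

# Hypothetical mini knowledge base; the IDs, labels, and hierarchy are
# illustrative only and are not taken from AGROVOC.
LABELS = {
    "en": {"cereal": "C1", "maize": "C2", "waxy maize": "C3"},
    "ja": {"穀類": "C1", "トウモロコシ": "C2"},
}
BROADER = {"C3": "C2", "C2": "C1"}  # narrower concept -> broader concept


def identify_language(text):
    """Toy language identifier: any kana or CJK ideograph means Japanese."""
    if re.search(r"[\u3040-\u30ff\u4e00-\u9fff]", text):
        return "ja"
    return "en"


def normalize(token):
    """Naive plural stripping, standing in for real morphological analysis."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def annotate(text):
    """Return (language, sorted list of concept IDs found in the text)."""
    lang = identify_language(text)
    labels = LABELS[lang]
    found = set()
    if lang == "ja":
        # No whitespace word boundaries, so fall back to substring search.
        for label, concept in labels.items():
            if label in text:
                found.add(concept)
    else:
        tokens = [normalize(t) for t in re.findall(r"[a-z]+", text.lower())]
        max_len = max(len(label.split()) for label in labels)
        i = 0
        while i < len(tokens):
            # Prefer the longest label that starts at position i.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                phrase = " ".join(tokens[i:i + n])
                if phrase in labels:
                    found.add(labels[phrase])
                    i += n
                    break
            else:
                i += 1
    return lang, sorted(found)


def coarsen(concept):
    """Walk up the hierarchy to the most general concept."""
    while concept in BROADER:
        concept = BROADER[concept]
    return concept
```

In this sketch, `annotate("Farmers grow waxy maize and other cereals.")` tags the specific concept for "waxy maize" rather than the more general "maize", because the longest label wins. Calling `coarsen` on a tag trades that detail for a coarse-grained tag, which mirrors the granularity choice raised on the slides. Supporting another language or domain would mean extending `LABELS`, in line with the claim that coverage is limited by the knowledge base.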
