The Impact of Standardized-Terminologies and Domain-Ontologies in Multilingual Information Processing
Maruf Hasan, D.Eng.
Senior Researcher
Thai Computational Linguistics Laboratory, Thailand
National Institute of Information and Communication Technology, Japan
So far, Information Retrieval (IR) applications, including search engines such as Google, have been largely successful, whereas Machine Translation (MT) systems have been far less so.
The shortfall stems from failures in modeling linguistic and extra-linguistic phenomena, context, concepts, etc.
Human tolerance in finding information and in translation quality varies
Human tolerance: (low) Written < Audio < Video (high)
Case Study: Telstra Voice-operated Directory Service – a failure from the user’s perspective but a successful investment from Telstra’s point of view
Many queries (70%) are repeated ones, and the system handles them quickly (a success from Telstra’s perspective). But when a user enquires about a rare entity, the system fails (a failure from the user’s perspective).
Why: Querying in one's native language is comfortable, but every now and then the most valuable information related to a search is probably available in another language.
How: Translate either the queries or the document collection (using a simplified MT model) to find information in other languages.
Economic Factor: Finding relevant information at a low cost (using noisy translation) is possible. After receiving a list of documents (and selecting the relevant ones, as we often do with Google), we can make the (costly) decision of whether or not to translate the information.
That is, even if our foreign-language proficiency is limited, we can still make sense of the information from other cues (tables, graphs, etc.) and make the right decision.
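The query-translation route described above can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the system discussed in the talk: the bilingual dictionary `NOISY_DICT`, the romanized target-language tokens, the document collection `DOCS`, and the simple term-overlap scoring are all hypothetical. The "noisy" part is that every dictionary translation of a term is kept, with no disambiguation.

```python
from collections import Counter

# Hypothetical toy bilingual dictionary: source term -> all candidate translations.
# "bank" is deliberately ambiguous (financial bank / river bank).
NOISY_DICT = {
    "bank": ["ginkou", "dote"],
    "loan": ["yuushi"],
    "rate": ["kinri", "wariai"],
}

# Hypothetical target-language document collection (already tokenized by spaces).
DOCS = {
    "d1": "ginkou yuushi kinri",
    "d2": "dote kawa sanpo",
    "d3": "tenki ame",
}


def translate_query(query, dictionary):
    """Replace each source term by all of its translations (no disambiguation)."""
    translated = []
    for term in query.lower().split():
        # Terms missing from the dictionary are passed through unchanged.
        translated.extend(dictionary.get(term, [term]))
    return translated


def rank(query_terms, docs):
    """Score documents by occurrences of translated query terms; drop zero scores."""
    wanted = set(query_terms)
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[t] for t in wanted)
        if score:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = rank(translate_query("bank loan rate", NOISY_DICT), DOCS)
print(ranked)
```

Even with the wrong sense of "bank" left in the translated query, the document that matches several terms still ranks first, which is the economic point: a cheap, noisy translation is often enough to produce a candidate list, and the costly full translation is deferred to the documents the user actually selects.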
An “academic ontology” about people, projects, organisations, project reports, etc. within an organisation (precise knowledge: ontologies are populated semi-automatically, sometimes from databases)
A set of sophisticated “NLP tools” for tokenizing, parsing, text classification, etc. (non-precise knowledge: extracted from text automatically)
A group of users/experts who are inspired to make things better (Tacit Knowledge) by giving feedback.
A Spreading Activation-based indexing scheme is used to capture and propagate changes in a bootstrapped fashion.
cf. Hasan, M.M. (2004). Spreading Activation Framework for Ontology-enhanced Effective Information Access within Organisations. In van Elst, L. et al. (eds.): "Agent-Mediated Knowledge Management". Springer’s Lecture Notes in Computer Science, Vol. 2926, pp. 288-296. Also published in the proceedings of the AAAI Spring Symposium, AMKM-2003, USA.
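For readers unfamiliar with spreading activation, the general idea can be sketched as follows. This is a generic textbook-style formulation, not the indexing scheme of the cited paper: the node names in `GRAPH`, the edge weights, and the `decay`, `threshold`, and `max_steps` parameters are all assumptions chosen for illustration. Activation starts at seed nodes (for example, an ontology entity matched by a query or touched by user feedback) and is passed along weighted links, attenuated at each hop.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Propagate activation from seed nodes along weighted edges.

    graph: node -> {neighbor: edge weight}
    seeds: node -> initial activation
    Energy passed over an edge is energy * weight * decay; contributions
    below `threshold` are dropped, and propagation stops after `max_steps`.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                if passed < threshold:
                    continue  # too weak to keep spreading
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        if not next_frontier:
            break
        # Accumulate what each node received in this step.
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Hypothetical fragment of an organisational ontology.
GRAPH = {
    "person:alice": {"project:clir": 1.0},
    "project:clir": {"report:r1": 1.0, "org:lab": 0.5},
    "report:r1": {},
    "org:lab": {},
}

result = spread_activation(GRAPH, {"person:alice": 1.0})
print(result)
```

The `max_steps` bound and the `threshold` cut-off keep the propagation finite even if the graph contains cycles; how a real system would tune these, or combine them with feedback from users, is left open here.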