K Search Al Khawarizmy Language Software

  1. Monday 12/05/2008
  2.
     • Arabic NLP research
     • Arabic applications based on NLP components
     • Stress on software quality (targeting ‘zero-defect’ software)
     • Cooperation with the community, e.g. research students at universities (forming partnerships)
     • Promoting widespread use of affordable applications that take the special features of the Arabic language into account
     • Serving the Arab region effectively by catering to its users’ needs
  3.
     • 1st Nov. 2007 – 31st Dec. 2007: 3 developers + 1 product manager => small (borrowed) room
     • 1st Jan. 2008 – 31st Jan. 2008: 1 linguist => home office
     • 1st Feb. 2008 – 31st Mar. 2008: 1 linguist => Smart Village Incubation
     • 1st Apr. 2008 – present: 3 developers + 1 linguist + 1 business development manager + 1 office manager => Smart Village Incubation
  4.
     • The number of Arab Internet users is growing:
        ◦ 22 million users in 2006
        ◦ 43 million expected in 2008
     • The volume of Arabic e-content is increasing (on the web and in companies’ intranets):
        ◦ Around 100 million Arabic web pages
        ◦ About 5 million Arabic web sites
  5.
     • Arabic is a highly inflected language
     • Arabic morphology has a set of unique features
     • Proper processing of Arabic e-content is lacking
     • Consequently, Arab users cannot take full advantage of Arabic e-content, compared with users of other languages
     • As an example, consider searching through Arabic content …
  6. Using [search engine name missing]: a search for “الحائزون على جوائز نوبل” (“the winners of Nobel Prizes”, plural “prizes”) returns about 238 results.
  7. Using [search engine name missing]: a search for “الحائزون على جائزة نوبل” (“the winners of the Nobel Prize”, singular “prize”) returns about 684 results.
  8. Using [search engine name missing]: a search for “حاز على جائزة نوبل” (“won the Nobel Prize”) returns about 16,700 results.
  9. When used for Arabic search, traditional search engines produce:
     • Incomplete results, i.e. not all inflected forms are found => a lot of useful information is missed
     • Redundant results, i.e. some results are inaccurate => they bear no relation, in form or in meaning, to the search word(s)
  10. An Arabic search model that:
     • Provides morphological search => comprehensive results
     • Differentiates between the meanings of Arabic words => improved accuracy
     • In other words…
     • Let us see the same example, using KSearch …
  11. [Image-only slide; content not preserved]
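The three searches on slides 6 to 8 miss each other's results because exact matching treats الحائزون (winners), حاز (won), جائزة (prize) and its broken plural جوائز (prizes) as unrelated strings, although they share the roots ح-و-ز and ج-و-ز. As a rough sketch of what morphological search does, the Python below strips a few common affixes and looks the remaining stem up in a tiny lexicon to get a root. The affix lists, `ROOT_LEXICON`, `strip_affixes`, `root_of` and `matches` are hypothetical illustrations, not KMorph's data or algorithm.

```python
# Toy morphological matching: reduce inflected Arabic forms to a shared root
# so one query matches all of them. Affix lists and ROOT_LEXICON are tiny,
# hypothetical examples; they are NOT KMorph's data or algorithm.

PREFIXES = ("وال", "بال", "ال", "و")  # a few common proclitics (illustrative)
SUFFIXES = ("ون", "ين", "ات", "ة")    # a few common suffixes (illustrative)

# Hypothetical stem -> root entries. The broken plural جوائز cannot be
# reduced by affix stripping, so it needs its own lexicon entry.
ROOT_LEXICON = {
    "حائز": "حوز",   # holder / winner (active participle)
    "حاز": "حوز",    # he won (past-tense verb)
    "جائز": "جوز",   # stem of جائزة, prize
    "جوائز": "جوز",  # prizes (broken plural)
}


def strip_affixes(word: str) -> str:
    """Remove at most one known prefix and one known suffix, keeping >= 2 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
            break
    return word


def root_of(word: str) -> str:
    """Look the stripped stem up in the lexicon; fall back to the stem itself."""
    stem = strip_affixes(word)
    return ROOT_LEXICON.get(stem, stem)


def matches(query: str, text: str) -> bool:
    """True if every query word shares a root with some word of the text."""
    text_roots = {root_of(w) for w in text.split()}
    return all(root_of(q) in text_roots for q in query.split())


# The query from slide 6 now matches the wording used on slide 8.
print(matches("الحائزون على جوائز نوبل", "حاز على جائزة نوبل"))  # True
```

The lexicon is unavoidable here: no affix rule turns جوائز into جائزة, which is one reason a morphological analyzer has to be paired with a comprehensive lexicon.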
  12.
     • Arabic morphological search (to produce comprehensive search results)
     • Differentiation between word meanings (to increase the accuracy of search results, i.e. reduce redundancy)
     • Search using logical operators (و / أو / ليس, i.e. AND / OR / NOT)
     • Adjacency (proximity) search
     • Search using wildcards (for proper nouns and Latin text)
     • Search words are highlighted in the results pages
     • Over 200 document formats supported, including Unicode-encoded documents
     • Comprehensive dictionary of contemporary Arabic (approximately 78,000 entries)
     • Fast indexing engine (25,000–30,000 words/sec on a PC with an AMD Athlon 3800+ CPU, IDE HDD and 1 GB RAM)
     • 64-bit technology => effectively unlimited index size
     • Comprehensive index management: indexes can be deleted, updated and merged
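Logical operators, adjacency search and wildcards are standard features of a positional inverted index. The sketch below is a generic textbook version in Python, assuming whitespace tokenization and lower-casing; `build_index`, `docs_with` and `near` are hypothetical names, not KSearch APIs. The operators و / أو / ليس map onto intersection, union and difference of document-ID sets. A morphological engine would index roots or lemmas rather than raw tokens; the example uses Latin text so the wildcard case is easy to follow.

```python
# Generic positional inverted index with Boolean operators, proximity search
# and wildcards. build_index(), docs_with() and near() are hypothetical names,
# not KSearch APIs; tokens are split on whitespace and lower-cased.
import fnmatch
from collections import defaultdict

Index = dict[str, dict[int, list[int]]]  # token -> {doc_id: [positions]}


def build_index(docs: dict[int, str]) -> Index:
    """Record every position at which each token occurs in each document."""
    index: Index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def docs_with(index: Index, pattern: str) -> set[int]:
    """IDs of documents containing a token matching pattern ('*' and '?' allowed)."""
    terms = fnmatch.filter(index.keys(), pattern.lower())
    return {doc_id for term in terms for doc_id in index[term]}


def near(index: Index, t1: str, t2: str, k: int) -> set[int]:
    """IDs of documents where t1 and t2 occur within k positions of each other."""
    p1, p2 = index.get(t1.lower(), {}), index.get(t2.lower(), {})
    return {
        doc_id
        for doc_id in p1.keys() & p2.keys()
        if any(abs(i - j) <= k for i in p1[doc_id] for j in p2[doc_id])
    }


if __name__ == "__main__":
    idx = build_index({
        1: "Nobel prize winners announced",
        2: "the prize committee met in Oslo",
        3: "winners of the Nobel peace prize",
    })
    # و (AND) -> intersection, أو (OR) -> union, ليس (NOT) -> difference
    print(sorted(docs_with(idx, "nobel") & docs_with(idx, "prize")))  # [1, 3]
    print(sorted(docs_with(idx, "prize") - docs_with(idx, "nobel")))  # [2]
    print(sorted(docs_with(idx, "oslo") | docs_with(idx, "peace")))   # [2, 3]
    print(sorted(docs_with(idx, "win*")))                             # [1, 3]
    print(sorted(near(idx, "nobel", "prize", 1)))                     # [1]
```

Storing positions, not just document IDs, is what makes adjacency search possible; the cost is a larger index, which is where 64-bit offsets matter.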
  13. [Architecture diagram, not preserved; components shown: Arabic Data Source (database, document, etc.), Arabic Morphological Analyzer, Comprehensive + Contemporary Arabic Lexicon, Arabic Lexical Semantic Analyzer, Fast Indexing Engine, Metadata Repository, Search Engine, Search Results]
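The deck names an Arabic Lexical Semantic Analyzer and promises differentiation between word meanings, but does not say how it works. One common approach is a simplified Lesk-style overlap: pick the sense whose signature words best match the surrounding words. The sketch below assumes a tiny hypothetical sense inventory for عين (“eye” vs. “water spring”) and a toy rule that drops the article ال; `SENSES`, `disambiguate` and `filter_by_sense` are illustrative names, not KSearch's lexicon or API.

```python
# Simplified Lesk-style word-sense selection: choose the sense whose
# signature words overlap most with the surrounding words.
# SENSES is a tiny hypothetical sense inventory, not KSearch's lexicon.
from typing import Optional

SENSES = {
    "عين": {
        "eye": {"طبيب", "نظر", "دموع"},         # doctor, sight, tears
        "water spring": {"ماء", "جبل", "نبع"},   # water, mountain, spring
    },
}


def normalize(token: str) -> str:
    """Toy rule: drop the definite article ال so that الماء matches ماء."""
    return token[2:] if token.startswith("ال") and len(token) > 3 else token


def disambiguate(word: str, sentence: str) -> Optional[str]:
    """Return the best-overlapping sense of word, or None if there is no evidence."""
    senses = SENSES.get(word)
    if not senses:
        return None
    context = {normalize(t) for t in sentence.split() if t != word}
    best = max(senses, key=lambda s: len(senses[s] & context))
    return best if senses[best] & context else None


def filter_by_sense(word: str, sense: str, docs: list[str]) -> list[str]:
    """Keep only documents where word occurs and is used in the given sense."""
    return [d for d in docs if word in d.split() and disambiguate(word, d) == sense]


docs = [
    "فحص الطبيب عين المريض",        # the doctor examined the patient's eye
    "شربنا الماء من عين في الجبل",  # we drank water from a spring in the mountain
]
print(filter_by_sense("عين", "eye", docs))  # keeps only the first document
```

Dropping documents where the query word carries another meaning is the kind of redundancy reduction the feature list describes; a production analyzer would need a far richer sense inventory and morphological normalization of the context words.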
  14.
     • Employs KMorph, a fast Arabic morphological analyzer
     • Uses a comprehensive lexicon of contemporary Arabic words
     • KSpell engine: provides APIs for spelling verification and correction; e.g. it may be integrated with content management systems to produce correctly spelled Arabic web content
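The slide says KSpell offers spelling verification and correction APIs but does not describe them. A minimal sketch of the two operations, assuming a plain word list and Levenshtein edit distance for suggestions, is shown below; `LEXICON`, `is_correct` and `suggest` are hypothetical, and the misspelling used (ي written for ئ) is just a plausible example.

```python
# Toy spelling verification and correction against a word list.
# LEXICON, is_correct() and suggest() are hypothetical; the slides do not
# describe KSpell's actual API.

LEXICON = {"جائزة", "نوبل", "كتاب", "مكتبة"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def is_correct(word: str) -> bool:
    """Spelling verification: is the word in the lexicon?"""
    return word in LEXICON


def suggest(word: str, max_distance: int = 1) -> list[str]:
    """Spelling correction: lexicon words within max_distance, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in LEXICON)
    return [w for d, w in scored if d <= max_distance]


word = "جايزة"  # plausible misspelling: ي written instead of ئ
print(is_correct(word), suggest(word))  # False ['جائزة']
```

Scanning the whole lexicon is fine for a sketch; with a 78,000-entry dictionary a real checker would more likely generate candidate edits or use a trie to avoid computing every distance.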