Cross Language Information Retrieval (CLIR)

Published in: Technology, Business

  1. 1. Cross Language Information Retrieval (CLIR). Information Searching and Retrieval (MLS 712). Prepared for: Assoc. Prof. Hajah Fuziah Mohd Nadzar. Prepared by: Asyura Binti Aminordin (2012482362), Mohd Iqbal Al-Farabi B Yahya (2012253658). Date: December 17, 2012
  2. 2. Introduction Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. For example, a user may pose their query in English but retrieve relevant documents written in French. http://en.wikipedia.org/wiki/Cross-language_information_retrieval
  3. 3. CLIR Purpose Researchers in Cross-Language Information Retrieval (CLIR) seek to support the process of finding documents written in one natural language with automated systems that can accept queries expressed in other languages.
  4. 4. English-Chinese Information Retrieval System (ECIRS): ECIRS is a web-based English-Chinese information retrieval system. It provides a cross-language platform that helps people retrieve Chinese information without entering a Chinese query. The web-based client-server architecture lets more users access ECIRS over the Internet.
  5. 5. Conts… ECIRS consists of a client side and a server side. The client side is a web-based user interface. The server side includes bilingual dictionaries, contentbased document index files, a Chinese search engine and Chinese document collections.
  6. 6. Cont'd: Client side: allows a user to enter a query in English and send it to the server side; the result is a list of relevant documents in Chinese. Server side: an English-Chinese dictionary and a Chinese-English dictionary are used to translate the user's query from English into Chinese keywords.
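The dictionary-based query translation described for the ECIRS server side can be sketched roughly as follows. This is a minimal illustration only: the `EN_ZH` dictionary and the `translate_query` function are hypothetical, and the slides do not show how ECIRS actually stores or looks up its dictionary entries.

```python
# Hypothetical toy dictionary; the real ECIRS dictionaries are not shown in the slides.
EN_ZH = {
    "computer": ["计算机", "电脑"],
    "network": ["网络"],
}


def translate_query(query, dictionary):
    """Replace each English query word with its Chinese candidates.

    Words missing from the dictionary are kept unchanged.
    """
    keywords = []
    for word in query.lower().split():
        keywords.extend(dictionary.get(word, [word]))
    return keywords


chinese_keywords = translate_query("Computer network", EN_ZH)
print(chinese_keywords)
```

Keeping every candidate translation, rather than picking one, is a simple way to avoid losing relevant documents when a word is ambiguous, at the cost of some extra noise in the results.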
  7. 7. English-Chinese Information Retrieval: Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  8. 8. English-Chinese Information Retrieval: Sidebar of the system, where the user can choose any of the buttons provided. Example: the On-line English Chinese Dictionary allows the user to translate an English word into a Chinese word.
  9. 9. English-Chinese Information Retrieval: Keyword: computer. As the screenshot shows, we enter any keyword we want to search for, for example "Computer". Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  10. 10. English-Chinese Information Retrieval: Translation from English into Chinese. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  11. 11. English-Chinese Information Retrieval: On-Line Chinese Information Retrieval System, the database holding all documents and information related to the information need, which is “Computer”.
  12. 12. English-Chinese Information Retrieval: The list of documents related to "computer". There were 294 results. Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  13. 13. English-Chinese Information Retrieval: Screenshot of the English-Chinese Information Retrieval System layout: http://www.cs.nmsu.edu/~sliu/main_frame.html
  14. 14. Big 5 - GB Big 5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters GB (Guojia Biaozhun 国家标准 ) is the registered internet name for a key official character set of the People's Republic of China, used for simplified Chinese characters
  15. 15. Cross Language Information Retrieval: Layout of a website that people use to book hotels and flights for travel.
  16. 16. Cont'd: Users can choose any language. Example: Japanese.
  17. 17. Cont'd: Switching to Japanese. As we can see, the wording in the layout changes to Japanese.
  18. 18. Cont'd: Google Translate allows users to identify the meaning of the Japanese words. Example: Malay to Japanese.
  19. 19. Cont'd: Enter the translated word from Google Translate into the search engine of www.easytobook.com.
  20. 20. Cont'd: Click any result. The result list shows that 131 hotels are available, and the wording shown is still in Japanese.
  21. 21. Cont'd: The description of the hotel in Kuala Lumpur is written in Japanese.
  22. 22. CLIR website examples: http://www.cs.nmsu.edu/~sliu/main_frame.html and http://www.easytobook.com/
  23. 23. CINDOR (Conceptual Interlingua Document Retrieval): A cross-language text retrieval system that accepts a user's query stated in their native language and then seamlessly searches, retrieves, relevance-ranks and displays documents written in a variety of foreign languages. CINDOR allows users to state queries in any of the supported languages (currently English, French, Spanish, and Japanese) and to search and retrieve documents in any of the supported languages. It adopts a ‘Conceptual Interlingua’: a unique approach to cross-language information management based on a language-independent conceptual representation.
  24. 24. CINDOR ‘Conceptual’ resource of our conceptual interlingua Concept of “elasticity: the tendency of a body to return to its original shape after it has been stretched or compressed”, which has the label 131186, is instantiated in English and French   131186 spring, give, springiness 131186 élasticité, flexibilité, moëlleux
  25. 25. The Eurovision St Andrews Photographic Collection: The site presents the collection in a variety of ways: full-text search, or browsing a list of 999 pre-defined index terms organised alphabetically and hierarchically via a categories page. The St Andrews collection (SAC) consists of 28,133 thumbnail images (around 120x76 pixels), larger versions of these images (around 368x234 pixels), and associated captions, giving a total of 84,399 files in the main body of the collection.
  26. 26. Eurovision Photograph metadata:         (1) a unique record number, (2) a short title, (3) a full title, (4) a textual description of the image content, (5) the date when the photograph was taken (most frequently with the day, month and year), (6) the originator, i.e. the name of an individual or company to which the photograph is attributed, (7) the location of the photograph (e.g. the county and the country), and (8) a line for notes to offer additional information about the photograph
  27. 27. Eurovision St Andrews collection has been used for bilingual ad- hoc retrieval where queries typical to this kind of historic collection have been generated in English and translated into languages including a range of Indo-European, Asian and Romance languages Challenges include:   Captions which are short in length increasing the likelihood of vocabulary mismatch, captions with text not directly associated with the visual content of an image (e.g. expressing something in the background), The use of colloquial and domain-specific language in the caption (i.e. British English).
  28. 28. The web interface to the St Andrews collection
  29. 29. The web interface to the St Andrews collection
  30. 30. CLIR University of Indonesia Query expansion techniques: pseudo relevance feedback    Assumption that the top few documents initially retrieved are indeed relevant to the query, and so they must contain other terms that are also relevant to the query To choose the relevant terms from the top ranked documents, we used the tf*idf term weighting formula. We added a certain number of noun terms that have the highest weight scores.
  31. 31. Interface and program demo
  32. 32. Interface and program demo
  33. 33. INFOMAP: Chinese question classification is the process that analyzes a question and labels it based on its question type and expected answer type. The approach adopts the INFOMAP inference engine to support a knowledge-based approach for Chinese questions, which can be formulated as templates, and uses SVM (Support Vector Machines) as the machine learning approach for large collections of labeled Chinese questions. INFOMAP is a knowledge representation framework that extracts important concepts from natural language text. A feature of INFOMAP is its capability to represent and match complicated template structures, such as hierarchical matching, regular expressions, semantic template matching, frame (non-linear relations) matching, and graph matching. Using INFOMAP, we can identify the question category of a Chinese question.
  34. 34. Example Question  (In which city were the Olympics held in 2004?) INFOMAP can be formulated as a rule or template (four elements (denoted as "HAS-PART") in this rule)   "[5 Time]:[3 Organization]:[7 Q_Location]: ([9 LocationRelatedEvent])“ 2004
  35. 35. Searching Demo
  36. 36. Searching demo
  37. 37. Searching demo
  38. 38. Thank You
