Information retrieval s


  1. Presented by Sadhana Patra, MLIS, 3rd Semester
  2. Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. An IR process begins when a user enters a query into the system. Queries are formal statements of information needs, and they are matched against the information stored in the database. Depending on the application, the data objects may be, for example, text documents, images, audio, mind maps or videos. Most IR systems compute a numeric score of how well each object in the database matches the query and rank the objects by this score. The top-ranking objects are then shown to the user. The process may be repeated if the user wishes to refine the query.
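The scoring-and-ranking step described above can be sketched with TF-IDF weighting, one common way to compute such a numeric score; the slide does not name a specific scoring function. This is a minimal sketch under stated assumptions: the three-document collection `DOCS`, the whitespace tokenizer and the helper names `tf_idf_scores` and `rank` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy collection standing in for the database of data objects.
DOCS = {
    "d1": "information retrieval ranks documents by relevance",
    "d2": "database systems store structured records",
    "d3": "search engines apply information retrieval to web documents",
}


def tokenize(text: str) -> list[str]:
    """Lower-case and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score each document as the sum over query terms of tf * log(N / df)."""
    n = len(docs)
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[term] * math.log(n / df[term])
            for term in tokenize(query)
            if df[term] > 0  # skip query terms that occur in no document
        )
    return scores


def rank(query: str, docs: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs, highest score first."""
    scores = tf_idf_scores(query, docs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


for doc_id, score in rank("information retrieval web", DOCS):
    print(doc_id, round(score, 3))
```

Here `d3` ranks first because it contains all three query terms, and the rare term `web` carries a higher IDF weight than `information` or `retrieval`, which each occur in two documents.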
  3. • Every online database, every search engine, everything that is searched online is based in some way or another on principles developed in IR.
         ◦ IR is at the heart of searching in systems such as DIALOG, LexisNexis & others.
     • Understanding the basics of IR is a prerequisite for understanding how searching of online systems works.
  4. “Information retrieval embraces the intellectual aspects of the description of information and its specification for search, and also whatever systems, techniques, or machines are employed to carry out the operation.” (Calvin Mooers, 1951)
     Objective: provide users with effective access to & interaction with information resources.
  5. The IR system has three parts:
     1. Document subsystem
         a) Acquisition
         b) Representation
         c) File organization
     2. User subsystem
         a) Problem
         b) Representation
         c) Query
     3. Searching/retrieval subsystem
         a) Matching
         b) Retrieved objects
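As a rough illustration of how the three subsystems fit together, the sketch below wires them into one pipeline. The class names, the whitespace tokenization and the "count the shared terms" matching rule are assumptions made for brevity, not a description of any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSubsystem:
    """Acquisition, representation and file organization, reduced to a tiny inverted file."""
    index: dict[str, set[str]] = field(default_factory=dict)

    def acquire(self, doc_id: str, text: str) -> None:
        # Representation: lower-cased words; file organization: term -> set of doc ids.
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)


class UserSubsystem:
    """Turns the user's stated problem into a query (here simply a set of terms)."""

    def to_query(self, problem: str) -> set[str]:
        return set(problem.lower().split())


@dataclass
class SearchingSubsystem:
    """Matches a query against the document representations and orders the results."""
    documents: DocumentSubsystem

    def match(self, query: set[str]) -> list[str]:
        hits: dict[str, int] = {}
        for term in query:
            for doc_id in self.documents.index.get(term, set()):
                hits[doc_id] = hits.get(doc_id, 0) + 1
        # Most shared query terms first; ties broken by document id.
        return sorted(hits, key=lambda d: (-hits[d], d))


# Hypothetical documents and problem statement.
docs = DocumentSubsystem()
docs.acquire("d1", "history of information retrieval")
docs.acquire("d2", "history of databases")
results = SearchingSubsystem(docs).match(UserSubsystem().to_query("information retrieval history"))
print(results)
```

Each class can be replaced independently, for example a better representation in `DocumentSubsystem` or a ranked matching rule in `SearchingSubsystem`, which mirrors the division into subsystems.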
  6. Acquisition (document subsystem)
     • Selection of documents & other objects from various web resources
     • Mostly text-based documents
         ◦ full texts, titles, abstracts ...
         ◦ but also other objects: data, statistics, images, maps, trade marks, sounds ...
     • The data are collected by a web crawler and stored in a database.
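The acquisition step (a crawler collecting pages and storing them in a database) can be sketched as a breadth-first traversal of links. To keep the example self-contained and offline, it assumes a hypothetical in-memory `WEB` dictionary in place of real HTTP fetching and HTML link extraction, and uses Python's built-in `sqlite3` module as the database; the table name `pages` is likewise illustrative.

```python
import sqlite3
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch each page over HTTP and parse its links from HTML.
WEB = {
    "http://example.org/": ("home page about retrieval", ["http://example.org/a", "http://example.org/b"]),
    "http://example.org/a": ("page a on indexing", ["http://example.org/b"]),
    "http://example.org/b": ("page b on ranking", ["http://example.org/"]),
}


def crawl(seed: str, db: sqlite3.Connection, max_pages: int = 100) -> int:
    """Breadth-first crawl from `seed`, storing each reachable page once in table `pages`."""
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    frontier = deque([seed])
    seen = {seed}
    stored = 0
    while frontier and stored < max_pages:
        url = frontier.popleft()
        if url not in WEB:  # unreachable page: skip it
            continue
        body, links = WEB[url]
        db.execute("INSERT INTO pages (url, body) VALUES (?, ?)", (url, body))
        stored += 1
        for link in links:
            if link not in seen:  # the link graph has cycles, so never queue a URL twice
                seen.add(link)
                frontier.append(link)
    db.commit()
    return stored


conn = sqlite3.connect(":memory:")
count = crawl("http://example.org/", conn)
print(count)
```

The `seen` set matters because links on the web form cycles (here page `b` links back to the home page); without it the crawler would revisit pages indefinitely.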
  7. Representation of documents, objects (document subsystem)
     • Indexing, in many ways:
         ◦ free-text terms (even in full texts)
         ◦ controlled vocabulary (thesaurus)
         ◦ manual & automatic techniques
     • Abstracting; summarizing
     • Bibliographic description:
         ◦ author, title, sources, date…
         ◦ metadata
     • Classifying, clustering
     • Organizing in fields & limits
         ◦ Basic Index, Additional Index, Limits
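The contrast between free-text indexing and controlled-vocabulary indexing can be sketched as two automatic indexers applied to the same title. The tiny `THESAURUS` mapping, the sample `TITLE` and the function names are hypothetical; a real thesaurus would also record broader, narrower and related terms.

```python
import re

# Hypothetical thesaurus: free-text phrases mapped to a preferred descriptor.
THESAURUS = {
    "ir": "information retrieval",
    "information retrieval": "information retrieval",
    "search engine": "search engines",
    "search engines": "search engines",
    "web search": "search engines",
}

TITLE = "Web search and IR basics"  # hypothetical document title


def words(text: str) -> list[str]:
    """Lower-cased alphabetic words of a text."""
    return re.findall(r"[a-z]+", text.lower())


def free_text_terms(text: str) -> set[str]:
    """Free-text indexing: every word becomes an index term."""
    return set(words(text))


def controlled_terms(text: str) -> set[str]:
    """Controlled-vocabulary indexing: keep only descriptors whose phrases occur in the text."""
    padded = " " + " ".join(words(text)) + " "  # pad so phrases match on word boundaries
    return {descriptor for phrase, descriptor in THESAURUS.items() if f" {phrase} " in padded}


print(sorted(free_text_terms(TITLE)))
print(sorted(controlled_terms(TITLE)))
```

Free-text indexing keeps every word, including `and`, while the controlled indexer maps both `IR` and `web search` onto preferred descriptors, so documents using different wording end up under the same index term.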
  8. File organization (document subsystem)
     • Sequential
         ◦ record (document) by record
     • Inverted
         ◦ term by term; list of records under each term
     • Combination
         ◦ indexes inverted, documents sequential
     • When only citations are retrieved, separate document files are needed
     • Large-file approaches
         ◦ for efficient retrieval by computers
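The difference between a sequential and an inverted file organization can be sketched as follows. The three records and the helper names are hypothetical; the point is that a sequential search scans every record, whereas the inverted file answers a term lookup directly from that term's list of records. The example is also a small instance of the combination approach: the index is inverted, while the records themselves stay in a sequential list.

```python
from collections import defaultdict

# Hypothetical sequential file: one record (document) after another.
SEQUENTIAL = [
    (1, "boolean search in online databases"),
    (2, "ranking documents by relevance"),
    (3, "boolean operators and relevance feedback"),
]


def sequential_search(term: str) -> list[int]:
    """Scan every record in order: the cost grows with the size of the whole file."""
    return [rec_id for rec_id, text in SEQUENTIAL if term in text.split()]


def build_inverted(records: list[tuple[int, str]]) -> dict[str, list[int]]:
    """Inverted file: for each term, the sorted list of records containing it."""
    postings = defaultdict(set)
    for rec_id, text in records:
        for term in text.split():
            postings[term].add(rec_id)
    return {term: sorted(ids) for term, ids in postings.items()}


INVERTED = build_inverted(SEQUENTIAL)


def inverted_search(term: str) -> list[int]:
    """A single dictionary lookup: only the list for `term` is read."""
    return INVERTED.get(term, [])


print(sequential_search("relevance"), inverted_search("relevance"))
```

Both searches return the same record numbers; the inverted file pays its cost once, at indexing time, so that each later lookup avoids a full scan.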
  9. Problem (user subsystem)
     • Related to the user's task and situation
         ◦ varies in specificity and clarity
     • Produces an information need
         ◦ the ultimate criterion for the effectiveness of retrieval: how well was the need met?
     • The information need for the same problem may change, evolve or shift during the IR process, requiring adjustments in searching
         ◦ often more than one search for the same problem over time
         ◦ you will experience this in your term project
  10. Representation (user subsystem)
      • Converting a concept into a query
      • What we search for
      • Query terms are stemmed and corrected using a dictionary
      • Focused toward a good result
      • Subject to changes from feedback
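The stemming and dictionary correction mentioned above might look like the sketch below. It assumes a hypothetical `DICTIONARY` of known terms and uses a deliberately crude suffix stripper rather than a published stemmer such as Porter's; correction relies on `difflib.get_close_matches` from Python's standard library, with an arbitrarily chosen similarity cutoff of 0.8.

```python
import difflib

# Hypothetical dictionary of known index terms.
DICTIONARY = ["information", "retrieval", "history", "index", "search"]

# Crude suffix list for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "es", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def correct(word: str) -> str:
    """Replace an unknown word with its closest dictionary entry, if one is close enough."""
    if word in DICTIONARY:
        return word
    matches = difflib.get_close_matches(word, DICTIONARY, n=1, cutoff=0.8)
    return matches[0] if matches else word


def to_query_terms(concept: str) -> list[str]:
    """Turn the user's wording into normalized query terms: stem first, then correct."""
    return [correct(stem(word)) for word in concept.lower().split()]


print(to_query_terms("Informaton retreival searching"))
```

Stemming runs before correction so that a correctly spelled inflected word such as `searching` is reduced to a known term instead of being "corrected"; words with no close dictionary entry are passed through unchanged.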
  11. Query: search statement (user & system)
      • Translation into the system's requirements & limits
          ◦ start of human-computer interaction
              ▪ the query is the thing that goes into the computer
      • Selection of files, resources
      • Search strategy: selection of
          ◦ search terms & logic
          ◦ possible fields, delimiters
          ◦ controlled & uncontrolled vocabulary
          ◦ variations in effectiveness tactics
      • Reiterations from feedback
          ◦ several feedback types: relevance feedback, magnitude feedback ...
          ◦ query expansion & modification
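Relevance feedback and query expansion can be illustrated with Rocchio's method, one well-known form of relevance feedback in the vector-space model; the slide itself does not prescribe a method. In this sketch the weights alpha = 1.0, beta = 0.75 and gamma = 0.15 are commonly quoted defaults rather than values from the source, term vectors are raw counts, and the query and relevance judgements are hypothetical.

```python
from collections import Counter


def vector(text: str) -> Counter:
    """Raw term-frequency vector of a text (no stop-word removal or weighting)."""
    return Counter(text.lower().split())


def rocchio(query: Counter, relevant: list[Counter], non_relevant: list[Counter],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> dict[str, float]:
    """Rocchio feedback: q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant).

    Terms whose final weight is not positive are dropped, a common convention.
    """
    new_q = {term: alpha * weight for term, weight in query.items()}
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Dividing by len(docs) turns the sum into the centroid (mean vector).
                new_q[term] = new_q.get(term, 0.0) + coeff * weight / len(docs)
    return {term: w for term, w in new_q.items() if w > 0}


# Hypothetical query and user relevance judgements.
query = vector("information retrieval")
relevant = [vector("information retrieval history"), vector("retrieval systems history")]
non_relevant = [vector("information theory")]

expanded = rocchio(query, relevant, non_relevant)
print(sorted(expanded.items(), key=lambda item: -item[1]))
```

The new terms `history` and `systems` enter the query from the relevant documents (query expansion), while `theory`, which appears only in the non-relevant document, ends up with a negative weight and is dropped.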
  12. • The question is what the user asks (and what you may then have elaborated)
      • The query is what is asked of the computer to match: what is put in
      • The question is transformed into a query
      • Question:
          ◦ I am interested in major historical developments in the area of information retrieval.
      • Query:
          ◦ history information retrieval (in Google)
  13. Matching: searching (searching subsystem)
      • Process of matching, comparing
          ◦ search: which documents in the file match the query as stated?
      • Various search algorithms:
          ◦ exact match: Boolean
              ▪ still available in most, if not all, systems
          ◦ best match: ranking by relevance
              ▪ increasingly used, e.g. on the web
          ◦ hybrids incorporating both
              ▪ e.g. Target, Rank in DIALOG
      • Each has strengths and weaknesses
          ◦ no ‘perfect’ method exists
              ▪ and probably never will
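The two families of matching algorithms can be contrasted on one small collection. Exact (Boolean) matching returns the documents that satisfy the logic, with no ordering by relevance, while best matching ranks every document that shares any query term. For the best-match side this sketch uses coordination-level matching (count how many query terms a document contains), one of the simplest ranking rules; the documents and function names are hypothetical.

```python
# Hypothetical collection and its inverted index (term -> set of document ids).
DOCS = {
    1: "boolean logic in online search",
    2: "relevance ranking on the web",
    3: "boolean search with relevance ranking",
}

INDEX: dict[str, set[int]] = {}
for doc_id, text in DOCS.items():
    for term in text.split():
        INDEX.setdefault(term, set()).add(doc_id)


def postings(term: str) -> set[int]:
    return INDEX.get(term, set())


def boolean_and(*terms: str) -> list[int]:
    """Exact match: documents containing every term."""
    result = set(DOCS)
    for term in terms:
        result &= postings(term)
    return sorted(result)


def boolean_or(*terms: str) -> list[int]:
    """Exact match: documents containing at least one term."""
    result: set[int] = set()
    for term in terms:
        result |= postings(term)
    return sorted(result)


def boolean_not(include: str, exclude: str) -> list[int]:
    """Exact match: documents with `include` but without `exclude`."""
    return sorted(postings(include) - postings(exclude))


def best_match(*terms: str) -> list[int]:
    """Coordination-level matching: rank by the number of query terms each document contains."""
    scores = {d: sum(d in postings(t) for t in terms) for d in DOCS}
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: (-scores[d], d))


print(boolean_and("boolean", "ranking"), best_match("boolean", "relevance", "ranking"))
```

Boolean AND returns only document 3, whereas best matching also returns documents 2 and 1, ordered by how many of the query terms they contain.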
  14. Retrieved documents: from system to user (searching/retrieval subsystem)
      • Various orders of output:
          ◦ Last In First Out (LIFO); sorted
          ◦ ranked by relevance
          ◦ ranked by other characteristics
      • Various forms of output
      • When only citations are retrieved: possible links to document delivery
      • Basis for relevance and utility evaluation by users
      • Relevance feedback
      • What a user (or you) sees, gets and judges can be specified
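The different output orders listed above amount to different sort keys over the same result set. The `Hit` fields (`added`, `year`, `score`) and their values are hypothetical; `added` stands in for the order in which records entered the database, which is what a Last In First Out listing reverses.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    added: int    # hypothetical: order in which the record entered the database
    year: int     # hypothetical: publication year
    score: float  # relevance score from the matching step


hits = [
    Hit("a", added=1, year=2005, score=0.4),
    Hit("b", added=2, year=1999, score=0.9),
    Hit("c", added=3, year=2012, score=0.7),
]

lifo = sorted(hits, key=lambda h: h.added, reverse=True)          # last in, first out
by_year = sorted(hits, key=lambda h: h.year)                       # sorted by a field
by_relevance = sorted(hits, key=lambda h: h.score, reverse=True)   # ranked by relevance

for name, order in (("LIFO", lifo), ("year", by_year), ("relevance", by_relevance)):
    print(name, [h.doc_id for h in order])
```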
  15. • Described three parts: document subsystem, user subsystem, searching/retrieval subsystem
      • There are many search engines, such as Google, Bing and Yahoo, but they do not publicly disclose the details of their information retrieval methods.
      • There is a lot more to know about information retrieval.