IR tutorial
  • Digital libraries: video recordings, PPT slides, presentations, audio recordings, … The electronic content may be stored locally or accessed remotely via computer networks.
  • Enterprise search is how an organization helps people find the information they need, in any format and from anywhere inside the company: in databases, document management systems, on paper, wherever. Having powerful search tools available does not mean that you should not organize your content.
  • Desktop search: the whole PC, plus internet browsing and mail.
  • Result: the per-file outputs (Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37); (Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38); (Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31); (Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30) combine into (Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38).
  • Transcript

    • 1. Information Retrieval Systems. By: Hussein Hazimeh, Lebanese University.
    • 2. Main points
      • Introduction
      • Text operations and indexing
      • Performance evaluation
      • Search engines as IR tools
      • Metasearch engines
      • IR applications
      • Some current research in IRS
      • Current conferences in information retrieval
    • 3. Introduction
      • Information Retrieval (IR) is the discipline that deals with retrieval of unstructured data, especially textual documents, in response to a query.
      • [Figure: IR system architecture. Components: user interface, user need, text operations, indexing, index (inverted file), documents, similarity computation (searching), retrieved docs, ranking, ranked docs.]
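    • 4. Text operations and indexing
      • Text operations reduce the complexity of the document representation. For example, the query Q = "List of the European countries" is reduced to the terms List, Europe, country.
      • Indexing builds a structure over the text (an inverted file) so that a query need not scan every document. The simple alternative is to search the whole text sequentially.
      • Inverted file example (vocabulary term: occurrences):
        • beautiful: 70
        • flowers: 45, 58
        • garden: 18, 29
        • house: 6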
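The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the stopword list, the crude plural-stripping rule, and the three sample documents are all assumptions made up for the example, and the index maps terms to document ids rather than to text positions. The crude rule does not reduce "European" to "Europe" as the slide does; a real stemmer or lemmatizer would be needed for that.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "in", "of", "the"}  # assumed, illustrative list


def text_operations(text):
    """Lowercase, tokenize, drop stopwords, and crudely strip plural endings."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            continue
        if token.endswith("ies"):
            token = token[:-3] + "y"  # countries -> country
        elif token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]  # flowers -> flower
        terms.append(token)
    return terms


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text_operations(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (Boolean AND)."""
    postings = [set(index.get(term, ())) for term in text_operations(query)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical mini-collection, invented for this sketch.
docs = {
    1: "A beautiful garden with flowers",
    2: "The house in the garden",
    3: "Flowers in the house",
}
index = build_inverted_index(docs)
```

With this index, `search(index, "garden flowers")` looks up two posting lists and intersects them instead of rereading the three documents, which is the point of indexing over sequential search.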
    • 5. Retrieval performance evaluation
      • Recall = |Ra| / |R|
      • Precision = |Ra| / |A|
      • Here |R| is the number of relevant docs in the collection, |A| is the size of the answer set, and |Ra| is the number of relevant docs in the answer set.
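As a worked example of the two ratios, the sketch below computes them from sets of document ids. The function name and the sample ids are assumptions chosen for illustration, and returning 0.0 for an empty set is one possible convention, not something the slide specifies.

```python
def precision_recall(relevant, answer):
    """Return (precision, recall) for a set of relevant ids and an answer set."""
    relevant, answer = set(relevant), set(answer)
    relevant_in_answer = relevant & answer  # Ra
    precision = len(relevant_in_answer) / len(answer) if answer else 0.0
    recall = len(relevant_in_answer) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ids: 4 relevant docs, 3 retrieved docs, 2 of them relevant.
relevant_docs = {1, 2, 3, 4}
answer_set = {2, 3, 5}
precision, recall = precision_recall(relevant_docs, answer_set)
```

Here |Ra| = 2, so precision is 2/3 and recall is 2/4.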
    • 6. Popular search engines
      • Google, Yahoo, Bing, …
      • Google search engine:
        • Google search orders results by priority.
        • The priority rank is computed with the “PageRank” algorithm.
        • Google queries can use Boolean operators such as exclusion ( -aa ) and alternatives ( aa OR bb ).
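    • 7. PageRank algorithm
      • PageRank is an algorithm used by the Google search engine to rank websites in its search results.
      • Example, for a page B that is linked from pages C, D, E, and F: PR(B) = PR(E) + PR(F) + PR(D) + PR(C). This simplified form holds when each linking page has a single outgoing link. In general, each term PR(X) is divided by the number of outgoing links on page X, and a damping factor is applied.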
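The general form can be sketched as a short power iteration. This is an illustrative version of the published PageRank idea, not Google's production code: the damping value 0.85, the iteration count, the handling of pages without outgoing links, and the small link graph are all assumptions. The graph has C, D, E, and F each linking only to B, matching the slide's example, plus an invented link from B back to C.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank; links maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed convention: a page with no links spreads rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph for illustration.
links = {"C": ["B"], "D": ["B"], "E": ["B"], "F": ["B"], "B": ["C"]}
ranks = pagerank(links)
```

The ranks sum to 1, and B ends up with the highest rank because four pages pass their full rank to it.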
    • 8. Googlebot: Google’s web crawler
      • Googlebot is Google’s web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer.
      • Googlebot finds pages in two ways:
        • Through an add-URL form, www.google.com/addurl.html
        • By finding links while crawling the web
    • 9. How Google processes a query
    • 10. Facebook as an intelligent IR tool (Graph Search)
      • Google vs. Facebook
    • 11. Facebook as an intelligent IR tool (continued)
      • Google vs. Facebook
    • 12. Metasearch engines
      • A metasearch engine is a search tool that sends user requests to several other search engines and/or databases and either aggregates the results into a single list or displays them according to their source.
      • Metasearch engines enable users to enter search criteria once and access several search engines simultaneously.
    • 13. Metasearch engine
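The metasearch flow described on slide 12 can be sketched as follows. Everything here is hypothetical: `engine_a` and `engine_b` are stand-ins that return fixed lists instead of calling real search engines, and the reciprocal-rank score used for merging is one simple choice among many, not a method named in the slides.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real engines: each returns a ranked list of URLs.
def engine_a(query):
    return ["a.example/1", "shared.example", "a.example/2"]


def engine_b(query):
    return ["shared.example", "b.example/1"]


ENGINES = {"A": engine_a, "B": engine_b}


def metasearch(query):
    """Send one query to all engines at once, then merge the ranked lists."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in ENGINES.items()}
        by_source = {name: future.result() for name, future in futures.items()}
    # Assumed merge rule: sum of 1 / rank over every engine listing the URL.
    scores = {}
    for results in by_source.values():
        for position, url in enumerate(results):
            scores[url] = scores.get(url, 0.0) + 1.0 / (position + 1)
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return merged, by_source


merged, by_source = metasearch("information retrieval")
```

`merged` is the single aggregated list, while `by_source` keeps the results grouped per engine, covering both display options the definition mentions.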
    • 14. IR applications
      • Mobile IR
      • Digital libraries
      • Enterprise search
      • Desktop search (Puggle)
    • 15. Some current research topics in IRS
      • Visual indexing
        • Indexing of video, images, and audio
        • Visual content extraction
      • Machine learning in information retrieval
      • Web information retrieval (including blogs)
      • Mobile computing related information retrieval issues
      • Performance measures
      • Query languages and optimization
    • 16. What is MapReduce?
      • MapReduce is a programming model for processing large data sets. It consists of two jobs.
      • The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
      • The second is the reduce job, which takes the output from a map as input and combines those data tuples into a smaller set of tuples.
    • 17. Motivations of MapReduce
      • Data processing > 1 TB
      • Massively parallel
      • Easy to use
    • 18. Programming model
      • Map(k1, v1) → list(k2, v2)
      • Reduce(k2, list(v2)) → list(v3)
      • Example: 5 files. File 1 contains:
        • Toronto, 20
        • Whitby, 25
        • New York, 22
        • Rome, 32
        • Toronto, 4
        • Rome, 33
        • New York, 18
    • 19. Programming model (continued)
      • We want to find the maximum temperature for each city across all of the data files.
      • Break this into 5 map tasks.
      • Each mapper works on one file and returns the maximum temperature for each city in that file.
      • All five of these output streams are fed into the reduce tasks, which combine the input results and output a single value for each city, producing the final result.
    • 20. Programming model (continued)
      • Map output:
        • (Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)
        • (Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)
        • (Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)
        • (Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)
      • Reduce output: (Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)
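The temperature example from slides 18 to 20 can be traced in plain Python. This is a single-process sketch of the programming model, not a distributed implementation: the function names are made up for illustration, and it assumes that the four map outputs listed on slide 20 belong to the four files other than File 1, whose own map output is computed here from the records on slide 18.

```python
from collections import defaultdict

# File 1 records as listed on slide 18.
file_1 = [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 32),
          ("Toronto", 4), ("Rome", 33), ("New York", 18)]

# Map outputs copied from slide 20 (assumed to come from the other four files).
other_map_outputs = [
    [("Toronto", 18), ("Whitby", 27), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("Whitby", 20), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("Whitby", 19), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("Whitby", 22), ("New York", 19), ("Rome", 30)],
]


def map_task(records):
    """Emit one (city, max temperature) pair per city found in a single file."""
    best = {}
    for city, temp in records:
        best[city] = max(temp, best.get(city, temp))
    return list(best.items())


def shuffle(map_outputs):
    """Group all emitted temperatures by city, as the framework would."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for city, temp in output:
            grouped[city].append(temp)
    return grouped


def reduce_task(city, temps):
    """Combine all values for one key into a single maximum."""
    return city, max(temps)


map_outputs = [map_task(file_1)] + other_map_outputs
result = dict(reduce_task(city, temps)
              for city, temps in shuffle(map_outputs).items())
```

Under these assumptions `result` matches the reduce output on slide 20. In a real deployment each `map_task` call would run on a different machine and the grouping step would be handled by the framework.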
    • 21. MapReduce uses
      • MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, and machine learning.
      • Moreover, the MapReduce model has been adapted to several computing environments like multi-core systems, desktop grids, dynamic cloud environments, and mobile environments.
      • At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web. It replaced the old ad hoc programs that updated the index and ran the various analyses.
    • 22. Current conferences in information retrieval
      • 3rd Spanish Conference on Information Retrieval: June 20, 2014, Spain
      • The European Conference on Information Retrieval: April 17, 2014, Netherlands
      • 7th International Workshop on Information Filtering and Retrieval: Dec 6, 2013, Italy
    • 23. Search… graph theories
