MetaSearch vs Harvesting and Indexing

MetaSearch vs Harvesting and Indexing - Presentation Transcript

  1. MetaSearch vs Harvesting and Indexing
    • Lukas Koster, Library of the University of Amsterdam
    • http://commonplace.net
    • 2009
    • http://www.flickr.com/photos/donpezzano/3044975399/
  2. So many databases to search
  3. MetaSearch – Federated Search
    • Search: translate search syntax
    • Database connectors: Z39.50, SRU, proprietary
    • Conversion: MARC21, MARCXML, DC
    • Merging, deduplication, ranking (first 30 per DB)
    • Results
    • Components: search engine, MetaSearch tool, databases
    • Searching and data fetching: one integrated, interdependent, on-the-fly procedure
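
The on-the-fly chain on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular MetaSearch product works: the `Connector` class, the two in-memory catalogues, the native query syntaxes (`ti=...` and `title:...`) and the simplified field names are all assumptions made up for the example. A real connector would talk to a remote server over Z39.50, SRU or a proprietary interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    """Common presentation format that every native record is converted to."""
    title: str
    author: str
    source: str


@dataclass
class Connector:
    """Hypothetical connector: one per remote database."""
    name: str
    translate: Callable[[str], str]      # common search term -> native query syntax
    search: Callable[[str], list]        # native query -> native records
    convert: Callable[[dict], dict]      # native record -> common fields


# Two stand-in "remote" databases with different record layouts (illustrative data).
CATALOGUE_A = [
    {"245a": "The Portrait of a Lady", "100a": "James, Henry"},
    {"245a": "Ulysses", "100a": "Joyce, James"},
]
CATALOGUE_B = [
    {"title": "Portrait of a Lady", "creator": "Henry James"},
]


def search_a(native_query: str) -> list:
    _, _, value = native_query.partition("=")
    return [r for r in CATALOGUE_A if value.lower() in r["245a"].lower()]


def search_b(native_query: str) -> list:
    _, _, value = native_query.partition(":")
    return [r for r in CATALOGUE_B if value.lower() in r["title"].lower()]


CONNECTORS = [
    Connector("db_a", lambda t: f"ti={t}", search_a,
              lambda r: {"title": r["245a"], "author": r["100a"]}),
    Connector("db_b", lambda t: f"title:{t}", search_b,
              lambda r: {"title": r["title"], "author": r["creator"]}),
]


def metasearch(term: str, connectors: list, batch_size: int = 30) -> list:
    """Translate, search, fetch, convert and merge in one request-time pass."""
    merged = []
    for connector in connectors:
        native_query = connector.translate(term)
        hits = connector.search(native_query)[:batch_size]  # first batch only
        for hit in hits:
            fields = connector.convert(hit)
            merged.append(Record(fields["title"], fields["author"], connector.name))
    return merged


results = metasearch("portrait", CONNECTORS)
for record in results:
    print(record)
```

Every step runs while the user waits, so a failure or delay in any one connector affects the whole request. Deduplication and ranking are left out here to keep the sketch short.
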
  4. Technical bottlenecks
    • Search: translate search syntax
    • Database connectors: Z39.50, SRU, proprietary
    • Conversion: MARC21, MARCXML, DC
    • Merging, deduplication, ranking (first 30 per DB)
    • Results
    • Components: search engine, MetaSearch tool, databases
    • Bottlenecks: connection, access, authorisation
  5. Technical bottlenecks
    • Changes in
      • Remote database server IP address
      • Remote database server hostname
      • Remote database server configuration
      • Remote database authentication
      • Firewall
      • Database system
      • Network
  6. MetaSearch limitations
    • Differences in searches, indexes
      • Author
      • Subject
      • Multiple languages
    • Speed (slowness)
    • Limited number of searchable databases
    • Not all results in first set
    • Relevance
  7. Author searches
    • Variations in author name storage formats
      • Henry James
      • James, Henry
      • James, H.
      • H.James
      • Which Henry James?
      • Or is it: Henry, J. / James Henry?
    • Variations in supported search formats
      • Only one?
      • All of the above?
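
One way to see the problem is to try reducing the name forms on this slide to a single match key. The sketch below is an assumption-laden illustration: `author_key` is a hypothetical helper, and the rule "text before a comma is the surname, otherwise the last word is the surname" is a guess that real data will often violate.

```python
import re


def author_key(name: str) -> tuple:
    """Reduce a personal name to (surname, first initial), both lowercased.

    Heuristic only: with a comma, the part before it is taken as the surname;
    without one, the last token is taken as the surname.
    """
    name = name.strip()
    if "," in name:
        surname, _, given = name.partition(",")
    else:
        parts = [p for p in re.split(r"[.\s]+", name) if p]
        if not parts:
            return ("", "")
        surname, given = parts[-1], " ".join(parts[:-1])
    given = given.strip()
    initial = given[0].lower() if given else ""
    return (surname.strip().lower(), initial)


variants = ["Henry James", "James, Henry", "James, H.", "H.James"]
keys = {author_key(v) for v in variants}
print(keys)

# The inverted reading collapses to a different person's key:
print(author_key("Henry, J."), author_key("James Henry"))
```

All four variants collapse to the same key, which helps matching across databases. The key cannot tell which Henry James is meant, and "James Henry" without a comma is silently read as surname Henry, which is exactly the ambiguity the slide points at.
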
  8. Variations in author names
  9. Subject searches
    • Different classification and keyword schemes per database
      • LoC Subject Headings
      • Dutch Basic Classification
      • Local subject schemes
    • Different use of subjects per database
      • Cooking
      • Cookery
      • Food
    • Different use of subjects within one database
    • Errors
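
A common workaround for differing subject vocabularies is to expand the user's term before sending it to each database. The sketch below assumes a hand-made equivalence table; `SUBJECT_EQUIVALENTS` and `expand_subject` are hypothetical names, and the grouping of the three terms from this slide is for illustration only, not a real mapping between schemes.

```python
# Hypothetical equivalence table. Real mappings between subject schemes
# have to be curated and are rarely exact.
SUBJECT_EQUIVALENTS = [
    {"cooking", "cookery", "food"},
]


def expand_subject(term: str) -> list:
    """Return all known variants of a subject term, or the term itself."""
    term = term.strip().lower()
    for variants in SUBJECT_EQUIVALENTS:
        if term in variants:
            return sorted(variants)
    return [term]


print(expand_subject("Cookery"))
print(expand_subject("history"))
```

Expansion widens the search at the cost of precision, and it does nothing for inconsistent subject use within a single database or for plain cataloguing errors.
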
  10. Multilingual searches
    • All words searches
    • Subject searches
      • English “cooking”
      • Japanese “???”
    • Title searches
      • Translations (We need FRBR!)
    • Author searches (historical names)
      • See: Erasmus
  11. All processing on the fly
    • Issues, dependent on each other:
      • Speed (slowness)
      • Limited number of searchable databases
      • Not all results in first set
      • Relevance
  12. Speed (slowness)
    • Dependent on
      • Search term transformation
      • Response time of external databases
      • Speed of internet connection
      • Conversion of results to presentation format
      • Merging of results
      • Deduplication of results
      • Relevance ranking
  13. Limited number of databases
    • Searching too many databases takes too long
    • Local processing time influenced by
      • Merging (takes time)
      • Deduplication (takes time)
      • Ranking (takes time)
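
To make the local processing cost concrete, here is a minimal deduplication sketch. The match key (normalised title plus year) and the names `dedup_key` and `deduplicate` are assumptions for the example; real tools use more elaborate matching. Every fetched record has to pass through this step, so the work grows with the number of databases searched.

```python
import re


def dedup_key(record: dict) -> tuple:
    """Build a crude match key: normalised title plus publication year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())              # collapse whitespace
    title = re.sub(r"^(the|a|an) ", "", title)   # drop a leading English article
    return (title, record.get("year"))


def deduplicate(records: list) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen = {}
    for record in records:
        seen.setdefault(dedup_key(record), record)
    return list(seen.values())


merged = [
    {"title": "The Portrait of a Lady", "year": 1881, "source": "db_a"},
    {"title": "Portrait of a lady.", "year": 1881, "source": "db_b"},
    {"title": "Ulysses", "year": 1922, "source": "db_a"},
]
unique = deduplicate(merged)
print([r["source"] for r in unique])
```
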
  14. Not all results in first set
    • Merging, deduplication, ranking of all results takes too long
    • Only first 30 or so of each database are processed initially
    • Get more: next 30 per database are fetched and processed
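
The batch behaviour can be sketched as follows. `RemoteResultSet` and `fetch_round` are hypothetical names, and the result sets are plain lists of integers standing in for records held on remote servers.

```python
class RemoteResultSet:
    """Stand-in for one database's full result set, read in batches."""

    def __init__(self, name: str, hits: list):
        self.name = name
        self._hits = hits
        self._offset = 0

    def fetch_next(self, batch_size: int = 30) -> list:
        """Return the next batch and advance the cursor."""
        batch = self._hits[self._offset:self._offset + batch_size]
        self._offset += len(batch)
        return batch


def fetch_round(result_sets: list, batch_size: int = 30) -> list:
    """Fetch one batch from every database: the unit of work for 'get more'."""
    processed = []
    for result_set in result_sets:
        processed.extend(result_set.fetch_next(batch_size))
    return processed


sets = [
    RemoteResultSet("db_a", list(range(45))),    # 45 hits in total
    RemoteResultSet("db_b", list(range(100))),   # 100 hits in total
]
first_round = fetch_round(sets)    # 30 + 30 records
second_round = fetch_round(sets)   # 15 + 30 records
print(len(first_round), len(second_round))
```

After the first round the user sees 60 of the 145 available records; the rest only arrive on request.
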
  15. Relevance
    • Dependent on default sort order (relevance?, date?) of each external database
    • Dependent on default ranking mechanism of each database
    • Local ranking initially performed on first batches of 30 records per database
    • After fetching additional records, ranking is done again:
      • Initial top results may go down
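
A small worked example shows why the top result can change. The `score` values are a made-up local relevance score, and the batch size is 2 instead of 30 to keep the data short. The first database is assumed to return hits in its own default order (for instance by date), so its best match sits in the second batch.

```python
def rank(records: list) -> list:
    """Local ranking: highest hypothetical relevance score first."""
    return sorted(records, key=lambda r: r["score"], reverse=True)


# Each list is in the remote database's own default order, not by relevance.
db_a = [{"id": "a1", "score": 0.4}, {"id": "a2", "score": 0.3}, {"id": "a3", "score": 0.9}]
db_b = [{"id": "b1", "score": 0.6}, {"id": "b2", "score": 0.5}, {"id": "b3", "score": 0.2}]

BATCH = 2  # stands in for the "first 30 per database"

first_ranking = rank(db_a[:BATCH] + db_b[:BATCH])
second_ranking = rank(db_a[:2 * BATCH] + db_b[:2 * BATCH])

print([r["id"] for r in first_ranking])
print([r["id"] for r in second_ranking])
```

In the first ranking `b1` is on top. Once the second batch is fetched, `a3` takes first place and `b1` moves down, even though nothing about `b1` changed.
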
  16. Solution?
    • Don’t rank
    • Don’t deduplicate
    • Don’t merge (in advance)
    • If you don’t merge, there is no point in deduplicating or ranking!!
    • “Does not make much sense anyway”
    • “Does not always work anyway”
    • “So, you have separate lists that you can merge later on”
  17. Search with MetaSearch
  18. Translate search syntax on the fly
  19. Fetching results
  20. Conversion of results on the fly
  21. Conversion of results on the fly
  22. Conversion of results on the fly
  23. Results with MetaSearch
  24. Results with MetaSearch
  25. Harvesting and indexing
    • Harvesting: from the databases into the central index
    • Normalising, indexing
    • Search, ranking, results
    • Components: search engine, H&I tool, central index, databases
    • Searching and data fetching: two completely separate procedures
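
The separation of the two procedures can be sketched with a tiny in-memory index. Everything here is an assumption for illustration: `CentralIndex`, `harvest`, `normalise` and the sample records are hypothetical, and a real H&I tool would use a proper search engine rather than a dictionary of sets.

```python
import re
from collections import defaultdict


def normalise(record: dict, source: str) -> dict:
    """Bring a harvested record into one common shape before indexing."""
    return {"id": f"{source}:{record['id']}",
            "title": record["title"].strip(),
            "source": source}


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class CentralIndex:
    """Minimal inverted index over record titles."""

    def __init__(self):
        self.records = {}
        self.postings = defaultdict(set)

    def add(self, record: dict) -> None:
        self.records[record["id"]] = record
        for token in tokenize(record["title"]):
            self.postings[token].add(record["id"])

    def search(self, query: str) -> list:
        """Return records containing every query word (no remote calls)."""
        tokens = tokenize(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.records[i] for i in sorted(ids)]


def harvest(index: CentralIndex, databases: dict) -> None:
    """Procedure 1: runs ahead of time, independent of any user search."""
    for source, records in databases.items():
        for record in records:
            index.add(normalise(record, source))


databases = {
    "db_a": [{"id": 1, "title": "The Portrait of a Lady"}],
    "db_b": [{"id": 7, "title": "Portrait of a Lady "}, {"id": 8, "title": "Ulysses"}],
}
index = CentralIndex()
harvest(index, databases)

databases.clear()  # simulate the remote databases becoming unreachable

# Procedure 2: searching touches only the central index.
hits = index.search("portrait lady")
print([h["id"] for h in hits])
```

Because the search step never contacts the source databases, it still answers after they disappear, at the price of only knowing what the last harvest brought in.
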
  26. Advantages of H&I
    • Speed
    • No maximum number of searchable databases
    • All results in first set
    • No differences in searches, indexes
    • Relevance
    • Fewer technical bottlenecks
      • Central index always available in case of connection problem
  27. H&I: Aquabrowser
  28. H&I: Primo
  29. MetaSearch = “Just in time”
    • Bookshop – Central Book Deposit
    • Always order on request
    • Risk of logistics problems
    http://www.flickr.com/photos/stijnnieuwendijk/125159282/
  30. H&I = “Just in case”
    • Bookshop with large stock
    • Customers always find something
    • Maybe not the most recent stuff
    http://www.flickr.com/photos/brewbooks/2131521680/
  31. Images
    • http://www.flickr.com/photos/donpezzano/3044975399/
    • http://www.flickr.com/photos/halighalie/663414371/
    • http://www.flickr.com/photos/notionscapital/2280408255/
    • http://www.flickr.com/photos/giveawayboy/2691195763/
    • http://www.flickr.com/photos/stijnnieuwendijk/125159282/
    • http://www.flickr.com/photos/brewbooks/2131521680/
    • http://www.flickr.com/photos/joshb/444529511/
    • http://www.flickr.com/photos/eaglelover2006/3168378578/
    • http://www.flickr.com/photos/robbie73/3387189144/
    • http://www.flickr.com/photos/saralparker/2602254206/
    • http://www.flickr.com/photos/manchesterlibrary/2034771121/
    • http://www.flickr.com/photos/bk/158637798/
    • http://www.flickr.com/photos/saamiam/3802869384/
    • http://www.flickr.com/photos/roboppy/37024023/
    • http://www.flickr.com/photos/stijnnieuwendijk/125159282/
    • http://www.flickr.com/photos/brewbooks/2131521680/

CC Attribution-NonCommercial-ShareAlike License
