Zoveel informatie, zo weinig tijd (So much information, so little time)
Paul Nieuwenhuysen, VUB
Informatie aan Zee
10 September 2009
Kursaal Oostende
Zaal Delvaux

Presentation Transcript

  • 1 Zo veel informatie, zo weinig tijd (So much information, so little time) Paul.Nieuwenhuysen@vub.ac.be Created to support a presentation at the bi-annual 2-day conference series “Informatie” organised by VVBAD, in Oostende, Belgium, September 10-11, 2009: “Informatie aan zee”
  • 2 Contents = summary = structure = overview of this presentation: 0. Introduction with problem statements 1. Methods to make information retrieval efficient in a world of scattered sources 2. Applications of those methods 3. Comparison of the methods 4. Conclusions
  • 3 These slides should be available from the WWW site http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/ (note: BIBLIO and not biblio) and also from the WWW site of the organisers of the conference = VVBAD
  • 4 Information Retrieval in a World of Scattered Information Sources 0. Introduction and problem statements
  • 5 Introduction: scattering of sources • Users want to exploit information sources fast and effectively. • This is hindered because digital information sources that may contain relevant information are created on, and scattered over, numerous computers, both on the intranet of the user’s organization AND on the Internet and the WWW.
  • 6 Introduction: scattering of sources • In other words: integration / aggregation is still far from perfect.
  • 7 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 1. They must be used one after the other which requires many decisions and actions
  • 8 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 2. They offer different user interfaces in the retrieval phase, which is confusing
  • 9 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 3. They offer found information items in various data formats
  • 10 Introduction: scattering of sources difficulties • Using many information retrieval systems costs time: 4. They display found items in different ways on a computer screen
  • 11 Introduction: scattering of sources difficulties Small = BEAUTIFUL
  • 12 Introduction: scattering of sources difficulties
  • 13 Introduction: problem statements 1. Which methods have been developed and applied to cope with this reality?
  • 14 Introduction: problem statements 2. Which concrete applications are available and how can an end-user exploit systems created in this domain?
  • 15 Introduction: problem statements 3. How can information intermediaries evaluate and apply these methods to bring information more efficiently to end-users?
  • 16 Information Retrieval in a World of Scattered Information Sources 1. Methods to make information retrieval efficient in a world of scattered sources
  • 17 Method 1: Merging = aggregating into a searchable database [Diagram: several users query one search engine over an aggregated database that is compiled from several source databases, web sites or other sources.]
  • 18 Method 2: Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 19 Both methods offer benefits to the users + Saves the users time executing queries to various servers or browsing through various systems. ☺
  • 20 Both methods offer benefits to the users + Offers a uniform / consistent display of results in the output phase. ☺
  • 21 Both methods offer benefits to the users + Some systems offer tools to refine display of the results; for instance + to deduplicate very similar items in the result set, + to sort the results, + to rank the results, + to visualize the results in a more graphical way, + to search within the result set, +… ☺
  • 22 Both methods bring difficulties / challenges / problems - In many cases there are differences among the merged sources in the formatting/structuring of their database records in fields. This hinders - searching limited to a field - displaying selected fields only (such as title) - sorting of the displayed records on the contents of a particular selected field (such as author or date)
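The field-structure problem on this slide can be illustrated with a minimal sketch. It assumes two hypothetical sources (`source_a`, `source_b`) whose field names are invented for illustration; real sources would need their own mappings. Each record is renamed into one common schema, after which field-limited display and sorting become possible.

```python
# Hypothetical field mappings: every source names its fields differently.
# The source names and field names below are assumptions, not real schemas.
FIELD_MAPS = {
    "source_a": {"ti": "title", "au": "author", "py": "year"},
    "source_b": {"Title": "title", "Creator": "author", "Date": "year"},
}


def normalize(record, source):
    """Rename the fields of a source-specific record to the common schema."""
    mapping = FIELD_MAPS[source]
    common = {"title": None, "author": None, "year": None, "source": source}
    for field, value in record.items():
        target = mapping.get(field)
        if target is not None:  # fields without a mapping are dropped
            common[target] = value
    return common


records = [
    normalize({"ti": "Metasearching", "au": "Smith", "py": "2005"}, "source_a"),
    normalize({"Title": "Harvesting", "Creator": "Jones"}, "source_b"),
]

# Sorting on a common field now works; records lacking the field sort last.
by_year = sorted(records, key=lambda r: (r["year"] is None, r["year"] or ""))
```

The second record has no year, which shows the remaining limitation: a mapping cannot supply a field that the source never provided.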
  • 23 Both methods bring difficulties / challenges / problems - In many cases there are differences among sources in the metadata schemes that are applied in the databases to improve retrieval, such as »classifications »taxonomies »thesaurus systems »ontologies This hinders the exploitation of the added value of such metadata.
  • 24 Both methods bring difficulties / challenges / problems - How to deduplicate/dedupe/cluster very similar entries/results/items = near-duplicates, from various target sources? When is similar similar enough? Which entry/result/item to choose/select as the representative of a cluster of similar entries?
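The questions on this slide (when is similar similar enough, and which item represents a cluster) can be made concrete with a small sketch. It is only one possible approach, not a method named in the presentation: titles are compared with Jaccard similarity on word sets, the threshold of 0.8 is an arbitrary assumption, and the representative is simply the record with the most filled-in fields.

```python
import re


def tokens(title):
    """Lower-case a title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: size of overlap / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(records, threshold=0.8):
    """Greedy clustering: a record joins the first cluster whose first
    member has a similar enough title, otherwise it starts a new cluster."""
    clusters = []
    for rec in records:
        for group in clusters:
            if jaccard(tokens(rec["title"]), tokens(group[0]["title"])) >= threshold:
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def representative(group):
    """Choose the record with the most non-empty fields."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v))


# Made-up sample records from three imaginary target sources.
records = [
    {"title": "Federated searching in libraries", "abstract": ""},
    {"title": "Federated Searching in Libraries.", "abstract": "An overview."},
    {"title": "Metadata harvesting with OAI-PMH", "abstract": ""},
]
clusters = cluster(records)
best = representative(clusters[0])
```

Raising the threshold keeps more near-duplicates apart; lowering it merges more aggressively, at the risk of hiding distinct items.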
  • 25 Both methods bring difficulties / challenges / problems - When some special, non-standard, dedicated retrieval software is made available by a specific target source database, to offer special features to the user to exploit the database better than with a more classical standard retrieval interface, then this may be lost in the new retrieval system. Searches are reduced to the lowest common denominator. Examples: - clustering of results - deduplication of results…
  • 26 Method 1: Merging = aggregating into a searchable database [Diagram: several users query one search engine over an aggregated database that is compiled from several source databases, web sites or other sources.]
  • 27 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [Diagram: data providers hold digital objects and expose metadata about them; a service provider (search provider) uses PMH software to request and harvest that metadata over the http protocol into a metadata database with a retrieval server; users search it from their client computers, also over the http protocol.]
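As a rough sketch of the harvesting step, the code below builds an OAI-PMH `ListRecords` request URL and parses a response. The verb, the `oai_dc` metadata prefix, the resumption token and the two XML namespaces come from the OAI-PMH 2.0 protocol; the base URL `http://example.org/oai` and the sample response are made up, and no network request is sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; follow-up pages send only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        identifier = rec.findtext(OAI + "header/" + OAI + "identifier")
        titles = [t.text for t in rec.iter(DC + "title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(OAI + "ListRecords/" + OAI + "resumptionToken")
    return records, token or None


# Made-up response, shortened to the elements this sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
records, token = parse_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", token)
```

A real harvester would fetch `first_url`, store the parsed metadata in its own database, and repeat with `next_url` until no resumption token is returned.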
  • 28 Merging into a searchable database offers benefits for the users + Applicable even in the absence of data communication to remote servers (whereas federated searching needs good, fast data communication.) Therefore this is the relatively ‘old’ method. ☺
  • 29 Merging into a searchable database brings difficulties / challenges - The contents of the aggregated database are less up to date than the original information sources. The importance of this aspect depends of course - on the particular application - on the time delay
  • 30 Method 2: Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 31 Federated searching: terminology / vocabulary / synonyms federated searching = meta-searching = metasearching = cross-database searching = multi-database searching = multi-threaded searching = one-stop searching = poly-searching = polysearching = broadcast searching = searching through a portal / gateway
  • 32 Federated searching through scattered databases: why? The perfect trip: ☺ 1. A cheap and nice flight 2. A cheap and nice hotel 3. A visit to a nice museum 4. Something nice to read (free via your library)
  • Example 33 Federated searching: application: finding a suitable flight Example: • http://CheapTickets.com/ for the USA
  • Example 34 Federated searching: application: finding a hotel room in some city
  • Example 35 Federated searching: searching in a museum
  • Example 36 Federated searching: searching in a library
  • 37 Federated searching: integrating access [Diagram: one meta-searching system integrates access to the intranet, WWW search engines, articles, journals, publishers, catalog database(s) of other libraries, databases (full-text or bibliographic) and the local library catalog database(s).]
  • 38 Federated searching: benefits for the users + The system can help the user to select appropriate sources. ☺
  • 39 Federated searching: benefits for the users + The system can help in the process of authentication and authorization when this involves not only a simple recognition of IP-address of the user’s client computer, but when it involves user-id’s and passwords. ☺
  • 40 Federated searching: benefits for the users + The need to know which particular database is suitable for a particular search is reduced, because several ones can be searched in one action. ☺
  • 41 Federated searching: benefits for the users + The users have to learn only 1 user interface for searching and only 1 search syntax, instead of a user interface and a search syntax for each database! ☺
  • 42 Federated searching: benefits for the users + Can make users search and exploit databases that they would never use otherwise, that is, without a federated search system! ☺
  • 43 Federated searching: benefits for the users + Useful, relevant, interesting items/references can be found/uncovered from unexpected, unknown, unfamiliar databases! This is mainly beneficial in the case of interdisciplinary subjects/topics. ☺
  • 44 Federated searching: benefits for the users + Some systems offer tools to refine display of the results; for instance »to dedupe very similar items in the result set, »to sort the results, »to rank the results, »to search within the result set, »… ☺
  • 45 Federated searching: benefits for the users + Some systems offer interesting links from a retrieval result to various related sources or services (such as the full text or a document delivery service), using a link generator based on the OpenURL standard. ☺
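A link generator of this kind can be sketched as follows. The keys `url_ver`, `rft_val_fmt` and the `rft.` prefix follow the key/value format of OpenURL 1.0 (ANSI/NISO Z39.88-2004); the resolver address `http://resolver.example.org/openurl` and the citation, including its ISSN, are made up for illustration.

```python
from urllib.parse import urlencode


def openurl(resolver_base, citation):
    """Build an OpenURL 1.0 link for a journal article citation.

    `citation` maps referent keys such as atitle, jtitle, issn, volume,
    spage and date to values; empty values are left out.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for key, value in citation.items():
        if value:
            params["rft." + key] = value
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver and an invented citation.
link = openurl(
    "http://resolver.example.org/openurl",
    {
        "atitle": "Federated searching",
        "jtitle": "Example Journal",
        "issn": "1234-5678",
        "volume": "12",
        "spage": "",
        "date": "2009",
    },
)
```

The link resolver of the user's own library then decides which services to offer for that citation, such as the full text or document delivery.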
  • 46 Federated searching: benefits for the users + Some systems check for each retrieved bibliographic description if the corresponding full text is immediately available online and indicate this immediately to the user, on the fly. ☺
  • 47 Federated searching: benefits for the users + Some systems further process the retrieved results and display them in an interesting way that is not offered by the searched original systems. For instance: » Clustering of results according to subject or age or availability of full text » Displaying the results in a graphical way ☺
  • 48 Federated searching: benefits for the users So far so good ! ☺
  • 49 Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 50 Federated searching: difficulties / challenges / problems - How can a system provide useful relevance ranking of search results/entries when the target databases can differ widely in type and quality, and when no index has been created in advance (just in case, well before the search action), unlike Google and other Internet search engines, which do build such an index beforehand?
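One simple way to combine ranked lists when the sources share no index and no comparable scores is round-robin interleaving. The sketch below is an illustration of that idea only, not a method named in the presentation: it takes the first result of every source, then the second of every source, and so on, skipping items already shown.

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave ranked result lists, keeping the first occurrence of each item."""
    merged = []
    seen = set()
    for tier in zip_longest(*result_lists):  # one rank position per pass
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Made-up result identifiers from three imaginary target databases.
merged = round_robin_merge([["a1", "a2", "a3"], ["b1"], ["c1", "a2"]])
```

Interleaving treats every source as equally trustworthy, which is exactly the weakness the slide points at when the targets differ in type and quality.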
  • 51 Federated searching: difficulties / challenges / problems - Powerful / sophisticated / refined forms of searching may not be applicable in a federated search. Example: limiting to a particular type of document, such as a therapy (in medicine). This may cause a LOSS of time, instead of saving time.
  • 52 Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 53 Federated searching: difficulties / challenges / problems - Differences among target sources in the Internet application protocols that are applied by default for connection/communication and retrieval, such as »(telnet) HTTP »proprietary, non-standard protocols »Z39.50 (ISO 23950), SRU, and related protocols that were developed for federated searching!
  • 54 Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 55 Federated searching: difficulties / challenges / problems - Various search engines may act in different ways! For instance: Is truncation of a word in a search query possible? Is limitation to a particular field possible? How can a federated search engine take these differences into account?
  • 56 Federated searching: difficulties / challenges / problems - A query with several words and without explicit Boolean operators can be interpreted in various ways by the various database retrieval systems. For instance, the retrieval software may apply the Boolean operator AND to combine all the query words, but it may also use OR. In the case that the federated search system does not take care of this well, then this may lead to lower recall and precision.
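A federated search system can avoid this ambiguity by always sending explicit operators. The sketch below assumes two hypothetical targets whose operator syntaxes are invented for illustration; it rewrites an operator-free user query into an explicit query for each target.

```python
# Hypothetical operator syntax per target; real targets need their own entries.
TARGET_SYNTAX = {
    "target_a": {"AND": " AND ", "OR": " OR "},
    "target_b": {"AND": " & ", "OR": " | "},
}


def translate(user_query, operator="AND"):
    """Join the query words with an explicit operator in each target's syntax."""
    words = user_query.split()
    return {
        name: syntax[operator].join(words)
        for name, syntax in TARGET_SYNTAX.items()
    }


queries = translate("federated search")
```

Because every target now receives the same Boolean meaning, the combined result set no longer mixes AND-style and OR-style answers to one query.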
  • 57 Federated searching: difficulties / challenges / problems - When some special, non-standard, dedicated retrieval software is made available by a specific target source database to offer special features to the user to exploit the database better than with a standard retrieval interface, then the source can probably not be exploited as well by the federated search system. Searches are reduced to the lowest common denominator.
  • 58 Federated searching: difficulties / challenges / problems - Differences in response time among the target sources. A slow response of a target source can hinder the final analysis and presentation of the results to the user.
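One common way to keep a slow target from blocking the whole result page is to query all targets in parallel and stop waiting after a fixed time. The sketch below illustrates this with two stand-in functions, `fast_target` and `slow_target`, which are hypothetical and only simulate targets; the timeout of 0.2 seconds is likewise an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fast_target(query):
    """Stand-in for a target that answers immediately."""
    return [f"fast result for {query}"]


def slow_target(query):
    """Stand-in for a target that needs a full second to answer."""
    time.sleep(1.0)
    return [f"slow result for {query}"]


def federated_search(query, targets, timeout=0.2):
    """Query all targets in parallel; report those that miss the deadline."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, not_done = wait(futures, timeout=timeout)
    results = {futures[f]: f.result() for f in done}
    skipped = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)  # do not block on the slow targets
    return results, skipped


results, skipped = federated_search(
    "oceans", {"fast": fast_target, "slow": slow_target}
)
```

The user sees the fast results at once, together with a note that some targets did not answer in time, instead of waiting for the slowest source.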
  • 59 Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 60 Federated searching: difficulties / challenges / problems - Some databases can NOT be included as a target database in a federated searching engine, because their owners/producers do not allow this. This is an important difficulty, because in this way interesting / valuable databases are perhaps not exploited by users who rely on federated searching.
  • 61 Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 62 Federated searching: difficulties / challenges / problems - Users may be less impressed by a federated searching system than by the simple, common, familiar, famous Internet / WWW search engines, as response time is in most cases less impressive, due to differences as follows: - The computer hardware used by the systems - Slower distributed searching through several computer systems, versus faster searching through a more centralised computer database of a priori compiled records
  • 63 Federated searching: difficulties / challenges / problems - The evaluation of the quality of each search result from a federated search action may be more difficult than when each database is searched separately, because the user may be less aware of the limitations, strengths, selection criteria and aims of the individual, separate databases that offer each result. For instance, peer-reviewed articles from reputable scientific journals may be mixed with more popular and more biased, unscientific texts from trade literature.
  • 64 Federated searching: conclusion Federated searching - is a continuous challenge for developers of the sophisticated software and for the implementers in libraries and information centers - offers benefits for those end-users who are not enthusiastic to work with separate target source databases - does not eliminate the need for access to individual databases
  • 65 Hybrid method: merging data + federated searching [Diagram: users query a federated search engine that combines the search engine of an aggregated database (compiled from databases, web sites or other sources) with the separate search engines of further databases.]
  • 66 Information Retrieval in a World of Scattered Information Sources 2. Applications of methods for efficient information retrieval
  • 67 Method 1: Merging = aggregating into a searchable database [Diagram: several users query one search engine over an aggregated database that is compiled from several source databases, web sites or other sources.]
  • 68 Internet global subject directories: introduction • They are virtual libraries with open shelves, for browsing. • They are manually generated, man-made by many people. • They can be browsed following a tree structure or a more complicated variation.
  • Example 69 Internet global subject directories: Yahoo!: screenshot of home page
  • Example 70 Internet global subject directories: BUBL LINK • A hypertext global subject directory to more than 10 000 WWW sites for the higher education community can be found at http://bubl.ac.uk/link/ [accessed 2008] • Accessible free of charge. • The categories are based on the well-known general Dewey classification system.
  • Example 71 Internet global subject directories: dmoz: screenshot of the starting page
  • Example 72 Internet global subject directories: Librarians' Internet Index: screenshot
  • Example 73 Internet global subject directories: IPL: screenshot
  • Example 74 Internet global subject directories: Intute: screenshot
  • 75 Internet indexes: scheme of the mechanism [Diagram: a user searching for Internet-based information uses Internet client hardware and software to reach the user interface of a search engine; behind it, an Internet crawler and indexing system visits the Internet information sources and builds the database of Internet files, including an index, that the Internet index search engine queries.]
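The core of this mechanism, an index built in advance and queried later, can be sketched with an inverted index. The page URLs and texts below are made up, the crawling step is replaced by a ready-made dictionary, and the matching rule (all query words must occur) is an assumption for illustration.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain all query words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Stand-in for the output of a crawler: URL -> page text (both invented).
pages = {
    "http://example.org/a": "Federated search across libraries",
    "http://example.org/b": "Search engines crawl the web",
}
index = build_index(pages)
hits = search(index, "federated search")
```

Because the index exists before the query arrives, answering is a matter of set lookups, which is why such engines respond faster than a federated search that must contact every source at query time.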
  • Example 76 Internet indexes: Google • http://www.google.com/ • Available since 2001 with most of its features. • The most popular search system since 2003.
  • Example 77 Internet indexes: Google Scholar • Google Scholar allows us to search for more scholarly information sources, including journal articles. • A beta (test) version has been available since November 2004. • The system is accessible starting from the home page of Google as one of the additional services, or more directly from http://scholar.google.com/
  • Example 78 Internet indexes: Google Scholar: screenshot
  • Example 79 Internet indexes: Bing • http://www.bing.com/ • Available in 2009 in beta = test version. • Replaces Microsoft Live as well as Yahoo Web Search ?
  • Example 80 Internet indexes: Scirus • The search interface: http://www.scirus.com/ • Since 2001. • Offers not only access to files in html format, but also to files in PDF. • Allows you to search for more or less “manually” selected »scientific WWW pages, plus »the contents of some scientific, bibliographic databases. • In the sense that Scirus is dedicated to scientific information, it is similar to Google Scholar.
  • Example 81 Internet indexes: Ask • Available from: http://www.ask.com/ • Offers a feature that is not offered by most other search systems: categorization = classification = refinement = clustering of search results, to help the user cope with an ambiguous search query
  • 82 Internet indexes cover only a part of the Internet: metaphor The “visible” part of Internet The “deep, hidden, invisible” part of Internet and the WWW, (that is not searchable using a global index like Google Web Search)
  • Example 83 Databases accessible over the Internet: example: OAISTER • http://oaister.umdl.umich.edu/ • “Our goal is to create a collection of freely available, previously difficult-to-access, academically-oriented digital resources that are easily searchable by anyone.”
  • Example 84 Databases accessible over the Internet: example: OAISTER • OAISTER makes searching possible in millions of digital documents that form part of institutional repositories all over the world. • OAISTER covers this kind of document better than Google Web Search (according to independent academic investigations in 2006 and 2008).
  • Example 85 Databases accessible over the Internet: example: scientificcommons • http://www.scientificcommons.org/ • Since 2007 • Similar to OAISTER: Allows you to search the full texts in scientific open access repositories all over the world. ☺
  • Example 86 Databases accessible over the Internet: example: Medline • Medline/PubMed offers bibliographic descriptions of publications on medicine, free of charge. ☺
  • 87 Current awareness services focusing on WWW pages: Google Alerts • Available at http://www.google.com/ and then see the page with additional services or more directly from http://www.google.com/alerts/ • Since 2004. • Can discover relevant changed or new WWW pages for you in the future. • Is based on the popular Internet index Google. • Works with search queries given by you that are stored on their server computer.
  • 88 Internet with WWW and printed books • For some years now, the Internet with the WWW has been the primary information source for many people. • However: »A lot of information is still distributed only in the form of printed books. »The content of old printed books can still be interesting. »The content of most printed books is (still) not available on the Internet.
  • 89 Public access book databases: introduction • Most general WWW search engines do NOT allow you to find out about the existence of books that may be interesting for you, at least not in a systematic and efficient way. • So, specific search tools to find books can be useful.
  • 90 Public access book databases provided by bookshops • To find currently available books, the bibliographic databases assembled by big bookshops are interesting. • Several offer a good coverage. • Many are accessible free of charge. • The added price information can be useful for the acquisition and accounting department of a library or if an individual user wants to buy a book. • Some provide a current awareness service, also free of charge. • Take into account delivery costs: postage + import tax
  • Examples 91 Book databases accessible free of charge: examples in U.S.A. • Amazon.com (US): http://www.amazon.com/ • This company offers also different, more local versions that offer books in other languages, such as http://www.amazon.co.uk/ http://www.amazon.fr/ • note: amazon, NOT amazone • Subject description is poor. • Take into account delivery costs: postage + import tax
  • Examples 92 Book databases accessible free of charge: examples in U.S.A. • Barnes and Noble (US): http://www.barnesandnoble.com/ or http://www.bn.com/
  • Examples 93 Book databases accessible free of charge: examples in U.S.A. • http://www.completebook.com/cbmsi/bookaction.do
  • Examples 94 Book databases accessible free of charge: examples in U.S.A. • http://www.overstock.com/
  • Examples 95 Book databases accessible free of charge: examples in U.S.A. • http://www.powells.com/ • Specialised in books only.
  • Examples 96 Book databases accessible free of charge: examples in Europe • Blackwell’s on the Internet (International, academic books): http://www.blackwell.co.uk/ • VLB for books in German http://www.buchhandel.de/ • For books in French http://www.chapitre.com • Boeknet - De Nederlandse Internet Boekhandel (Dutch) http://www.boeknet.nl/
  • 97 Search systems for books that are made available by dealers [Diagram: a user searches the catalog database of one book dealer, which holds descriptions of books next to the real books for sale.]
  • 98 Search systems for books that are made available by dealers [Diagram: a user searches the catalog databases of several book dealers, each with descriptions of books and real books for sale.]
  • 99 Search systems for books that are made available by dealers [Diagram: a user searches the catalog databases of several book dealers, each with descriptions of books and real books for sale.]
  • 100 Search systems for books that are made available by dealers [Diagram: a user searches a multi-dealer database = merged book dealer databases, built from the dealers' catalog databases with descriptions of books and real books for sale.]
  • 101 Search systems for books that are made available by dealers [Diagram: a user searches multi-dealer databases = merged book dealer databases, built from the dealers' catalog databases with descriptions of books and real books for sale.]
  • 102 Search systems for books that are made available by dealers [Diagram: a user searches multi-dealer databases = merged book dealer databases, built from the dealers' catalog databases with descriptions of books and real books for sale.]
  • 103 Free public access multi-dealer book databases: examples • http://www.abebooks.com/ [accessed 2008] • http://www.abebooks.fr/ offers a user interface in French • Covers > 10 000 bookshops. • The company was acquired by Amazon in 2008.
  • 104 Free public access multi-dealer book databases: examples • http://www.alibris.com/ [accessed 2008]
  • 105 Free public access multi-dealer book databases: examples • Amazon Marketplace: http://www.amazon.com/ [accessed 2009] • In synergy with the online bookshop Amazon on 1 WWW site: Used books are displayed alongside Amazon’s new books. • “the world’s biggest online book bazaar” • Subject description is poor. • Take into account delivery costs: postage + tax
  • 106 Free public access multi-dealer book databases: examples
  • 107 Free public access multi-dealer book databases: examples • http://www.biblio.com/ or http://biblio.com/ [accessed 2008]
  • 108 Free public access multi-dealer book databases: examples • http://www.boekenverkoper.nl [accessed in 2007]
  • 109 Free public access multi-dealer book databases: examples • http://www.choosebooks.com/ [accessed 2008]
  • 110 Free public access multi-dealer book databases: examples • http://www.tomfolio.com/ [accessed 2008]
  • 111 Full-text databases of books: introduction • Some organisations have scanned the contents of thousands of books, to make them full-text searchable through the Internet.
  • 112 Full-text databases of books: Amazon • http://www.amazon.com/ and choose BOOKS • Since 2004 • Also incorporated in the search engine A9
  • 113 Full-text databases of books: Google Book Search • http://www.books.google • Since 2005
  • Example 114 Online Public Access Catalogues: union catalogues of libraries • Some systems offer access to the merged catalogues of several libraries, so-called ‘union catalogues’. • Example: Copac http://www.copac.ac.uk/ is accessible free of charge.
  • Examples 115 Online Public Access Catalogues: union catalogues: examples • European National Libraries, catalogues harvested: http://www.theeuropeanlibrary.org/portal/index.html
  • Examples 116 Online Public Access Catalogues: union catalogues: examples • Europeana: documents on European culture. http://www.europeana.eu/portal/ Metadata are harvested from co-operating organisations.
  • 117 Online access databases about journal articles: overview • Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers. • Many publishers offer searchable bibliographies, but only of their own publications. (for instance Elsevier, Emerald, Sage) • Only few large databases offer access to bibliographies of articles published in journals from many publishers, free of charge.
  • Example 118 Online access databases about journal articles: Ingenta • Available from: http://www.ingentaconnect.com/ • Ingenta allows you to search a bibliographic database of millions of journal articles, including titles, authors, in many cases abstracts. • The organisation claims to be “The most comprehensive collection of academic and professional publications”
  • Example 119 Online access databases about journal articles: Infotrieve ArticleFinder • Available from: http://www.infotrieve.com/ • Infotrieve allows you to search free of charge in a bibliographic database of the articles of more than 20 000 journal titles and conference proceedings, NOT full-text. • Payment is required to receive the full text of a document.
  • Example 120 Online access databases about journal articles: Scirus • The search interface: http://www.scirus.com • This is a specialised Internet index that allows you to search for selected scientific information (only) on the WWW. • This includes the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier. • Offered free of charge by Elsevier. • An article can be downloaded in full-text format only when a fee has been paid to the publisher.
  • Example 121 Online access databases about journal articles: Google Scholar • Google Scholar allows us to search for more scholarly information sources, including journal articles. • A beta (= test) version has been available since November 2004. • The system is accessible starting from the home page of Google as one of the additional services besides the normal, classical WWW search.
  • Example 122 Online access databases about journal articles: DOAJ screenshot
  • Example 123 Online access databases about journal articles: Eric • http://ericir.syr.edu/Eric/ • Eric allows searching a bibliographic database of articles and other documents in the fields of information science and education. + Available in open access, free of charge - Payment is required to receive the full text of a document.
  • Example 124 Online access databases about journal articles: LISTA • http://www.libraryresearch.com/ • Bibliographic database; covers libraries and information management, with subjects such as librarianship, classification, cataloging, bibliometrics, online information retrieval, information management and more, from more than 600 periodicals plus books, research reports, and proceedings • Offered since 2005 • Delivered via the EBSCOhost platform + Free of charge
  • Example 125 Online access databases about journal articles: Teacher Reference Center • http://www.TeacherReference.com/ • Teacher Reference Center (TRC) Journal Information for Teachers allows you to search popular teacher and administrator trade journals, periodicals, and books • via the EBSCOhost platform • since 2006 + offered free of charge
  • Example 126 Online access databases: Web of Science • One of the bibliographic databases in Web of Knowledge is the Web of Science. • This is a bibliographic database that covers the articles published in the most important scientific journals. [Diagram: Web of Science shown as a part of Web of Knowledge.]
  • 127 Finding images on the Internet: introduction + Several public access search systems are available free of charge to search for images / pictures (artwork, photos, or both) on the Internet. + When searching for images, the search results from such a system offer not only links to the image files on the Internet, but also directly small versions of the images (so-called “thumbnails”).
  • Examples 128 Finding images on the Internet: screen shot of a Google image search
  • Example 129 Finding images on the Internet: examples of search engines • http://images.google.com/ ! or through http://www.google.com/ [accessed in 2009] • The largest database in this category (at least in 2002…2008). For each result, not only a thumbnail is offered, but also directly the origin with the readable URL; this makes it easier to guess the relevance of the document.
  • Example 130 Finding images on the Internet: examples of search engines • http://www.bing.com/ • Available in 2009 in beta = test version. • Replacing Microsoft Live and Yahoo Search ?
  • 131 Method 2: Federated searching through scattered databases [Diagram: several users query one federated search engine, which forwards each query to the separate search engines of several databases.]
  • 132 Federated searching through scattered databases: why? • Applications: »Finding information in bibliographic databases »Finding the availability of rooms in various hotels »Finding flights to a particular destination offered by various airline companies »Finding scientific data that are made available by various computers all over the world
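The federated method named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three "databases" are hypothetical in-memory lists standing in for remote catalogues, and the names `DATABASES`, `search_one` and `federated_search` are invented for this sketch. It shows the two steps that characterise federated searching: sending one query to all targets in parallel, and merging the raw intermediate results into one list for the user.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for three remote databases.
DATABASES = {
    "library_a": ["Ocean currents", "Marine biology basics"],
    "library_b": ["Marine biology basics", "History of Oostende"],
    "library_c": ["Ocean biogeography", "Cataloging rules"],
}

def search_one(name, query):
    """Search a single database; return (source, title) pairs that match."""
    q = query.lower()
    return [(name, title) for title in DATABASES[name] if q in title.lower()]

def federated_search(query):
    """Send the query to every database in parallel and merge the hits."""
    names = sorted(DATABASES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map keeps the results in the same order as `names`.
        per_source = list(pool.map(lambda n: search_one(n, query), names))
    merged = {}
    for hits in per_source:
        for source, title in hits:
            # The same title found in several sources becomes one entry.
            merged.setdefault(title, []).append(source)
    return merged

results = federated_search("marine")
for title, sources in results.items():
    print(title, "found in", ", ".join(sources))
```

In a real system each call to `search_one` would be a network request to a remote search engine, which is why the requests are issued in parallel rather than one after another.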
  • Example 133 Federated searching: application: finding a hotel room in some city
  • Example 134 Federated searching: application: finding scientific data • OBIS = Ocean Biogeographic Information System • http://www.iobis.org/ • Gateway to scientific data on living systems in the oceans. • The data reside on many computers all over the world.
  • 135 Hybrid method: merging data + federated searching • [Diagram: several users query one federated search engine, which passes each query on to the search engine of an aggregated database (merged from several databases or web sites) and to the search engines of other, separate databases]
  • Example 136 Databases accessible over the Internet: example • http://WorldWideScience.org/ • “A global science gateway connecting you to national and international scientific databases and portals. Accelerates scientific discovery and progress by providing one-stop searching of global science sources.”
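  • 137 Meta WWW search systems on a server computer in the WWW • [Diagram: the user works on a client computer with a WWW client program; over the Internet this connects to a WWW server computer running the meta-search system, which sends queries out to, and receives results in from, WWW server computers with Internet search systems]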
  • 138 Meta-search systems: terminology / vocabulary / synonyms “multi-threaded search systems” = “multiple search systems” = “multi-search systems” = “meta-search systems” = “intelligent search agents” = “federated search systems” = “portals”
  • Examples 139 Meta-search systems on a server computer • http://aftervote.com/ • http://draze.com/ • http://www.all4one.com • http://www.bytesearch.com • http://clusty.com/ • http://www.cyber411.com • http://www.dogpile.com = http://dogpile.com/ • http://www.go2net.com = http://www.metacrawler.com • http://jux2.com • http://www.kartoo.com • http://www.mamma.com • http://www.museseek.com • http://www.profusion.com • http://www.search.com • http://www.vivisimo.com = http://vivisimo.com/
  • 140 Meta-search systems: server-based: example: Vivisimo
  • 141 Meta-search systems: server-based: example: Vivisimo • Vivisimo adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster. • Vivisimo can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.
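As a rough illustration of clustering retrieved hits "on the fly", the following Python sketch groups result titles under headings derived only from the hits themselves, at query time, with no index built in advance. It is not Vivisimo's actual algorithm, which is not described in these slides; the stopword list, the function name `cluster_hits` and the sample titles are assumptions made for this sketch.

```python
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "of", "a", "in", "and", "for", "to"}

def cluster_hits(query, titles, max_clusters=3):
    """Group hit titles under frequent shared words, computed at query time."""
    ignore = STOPWORDS | set(query.lower().split())
    tokens = [set(t.lower().split()) - ignore for t in titles]
    counts = Counter(word for words in tokens for word in words)
    # Only words shared by at least two hits can serve as a heading.
    headings = [w for w, n in counts.most_common() if n >= 2][:max_clusters]
    clusters = defaultdict(list)
    for title, words in zip(titles, tokens):
        label = next((h for h in headings if h in words), "other")
        clusters[label].append(title)
    return dict(clusters)

hits = [
    "Jaguar car prices",
    "Jaguar car dealers",
    "Jaguar habitat in the Amazon",
    "Amazon river wildlife",
]
for heading, members in cluster_hits("jaguar", hits).items():
    print(heading, "->", members)
```

Because the headings are computed from the retrieved hits alone, nothing has to be pre-processed before the search, which is the property the slide emphasises.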
  • Example 142 Meta-search systems: server-based: example: Clusty • Adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster. • Can accomplish this on the fly, that is WITHOUT pre-processing the documents before the search.
  • Example 143 Meta-search systems: server-based: example: Clusty screenshot in 2006
  • 144 Meta-search systems: disadvantages - It is not always clear through which Internet indexes the meta-search system will search. - Not all meta-search systems can search all the major primary search systems; for instance the famous Google Internet index is NOT included in most systems. - Only a limited number of the results that can be obtained from the various Internet indexes are shown.
  • 145 Free public access book meta-search systems: types We can make the following distinction between various types of meta-systems for searching: 1. Database resulting from merging several existing smaller databases = aggregator database In this case of books: multi-dealer database = “listing service” 2. Federated search system = cross-database search system
  • 146 Free public access search systems: federated search systems • Each of the searched target databases can be »a catalogue database managed by the owner/dealer/shop/seller, as well as »a multi-dealer database
  • 147 Search systems for books that are made available by dealers User Book dealer catalog database descriptions of books & real books for sale
  • 148 Search systems for books that are made available by dealers User Book dealer catalog databases descriptions of books & real books for sale
  • 149 Search systems for books that are made available by dealers User Book dealer catalog databases descriptions of books & real books for sale
  • 150 Search systems for books that are made available by dealers User Multi-dealer database = merged book dealer databases Book dealer catalog databases descriptions of books & real books for sale
  • 151 Search systems for books that are made available by dealers User Multi-dealer databases = merged book dealer databases Book dealer catalog databases descriptions of books & real books for sale
  • 152 Search systems for books that are made available by dealers User Federated book search systems Multi-dealer databases = merged book dealer databases Book dealer catalog databases descriptions of books & real books for sale
  • 153 Search systems for books that are made available by dealers User Federated book search systems Multi-dealer databases = merged book dealer databases Book dealer catalog databases descriptions of books & real books for sale
  • 154 Search systems for books that are made available by dealers User Federated book search systems Multi-dealer databases = merged book dealer databases Book dealer catalog databases descriptions of books & real books for sale
  • 155 Free public access federated search systems for books: examples
  • 156 Free public access federated search systems for books: examples • http://www.allbookstores.com/ [accessed 2006]
  • 157 Free public access federated search systems for books: examples
  • 158 Free public access federated search systems for books: examples • http://www.BookFinder.com/ [accessed 2009]
  • 159 Free public access federated search systems for books: examples • http://www.bookfinder4u.com/ [accessed 2007]
  • 160 Free public access federated search systems for books: examples • http://www.bookpursuit.com/ [accessed 2006]
  • 161 Free public access federated search systems for books: examples
  • 162 Free public access federated search systems for books: examples • http://www.dealtime.com/ [accessed 2006]
  • 163 Free public access federated search systems for books: examples • http://www.epinions.com/Books [accessed 2006]
  • 164 Free public access federated search systems for books: examples • http://www.fetchbook.info/ [accessed 2006]
  • 165 Free public access federated search systems for books: examples • http://www.gallileus.info/search/ [accessed 2006]
  • 166 Free public access federated search systems for books: examples • http://www.priceminister.com/livres-bd [accessed 2007] • Can search not only books but also other products in various shops.
  • 167 Free public access federated search systems for books: examples • http://www.usedbooksearch.co.uk/books.htm [accessed 2008] • Specialised in used books, not in new books.
  • 168 Free public access federated search systems for books: examples • http://www.vialibri.net/ [accessed 2008]
  • 169 Free public access federated search systems for books are interesting • Knowledge about their quality is interesting » for end users as well as for librarians who buy books, » for librarians who serve their users by performing searches for books, » for librarians who propose databases to their users, for instance on their library WWW site or who want to include one or several book search engines in their own local system for federated searching through several targets in one action.
  • 170 Online Public Access Catalogues: simultaneous searching • Some meta-search services allow simultaneous, parallel searching in one search action over several databases of libraries.
  • Example 171 Online Public Access Catalogues: simultaneous searching: examples • Simultaneous access to catalogues of libraries related to water, organised by IAMSLIC, using Z39.50
  • 172 Information Retrieval in a World of Scattered Information Sources 3. Comparison of methods for efficient information retrieval
  • 173 Method 1: Merging = aggregating into a searchable database • [Diagram: several users query one search engine on an aggregated database, which has been merged from several databases or web sites or…]
  • 174 Comparison of methods for efficient information retrieval • Merged=aggregated databases react faster than federated search systems (in most cases). »Explanation: They do not need several simultaneous Internet connections & they do not have to merge raw intermediate results into the result that is finally shown to the user. ☺
  • 175 Method 2: Federated searching through scattered databases • [Diagram: several users query one federated search engine, which passes each query on to several search engines, each searching its own database]
  • 176 Hybrid method: merging data + federated searching • [Diagram: several users query one federated search engine, which passes each query on to the search engine of an aggregated database (merged from several databases or web sites) and to the search engines of other, separate databases]
  • 177 Comparison of methods for efficient information retrieval • Federated search systems offer a higher coverage than direct searching of databases or merged databases (in most cases). »Explanation: They can exploit many databases and even merged=aggregated databases in one search action. For example, in 1 search, they can cover more than 100 million descriptions of physical books = pairs of book and dealer (not book titles). ☺
  • 178 Comparison of methods for efficient information retrieval • Federated search systems offer results that are more up to date than a search in an aggregated database, whose contents are (only) a snapshot made in the past. This is important when data should be very fresh = up-to-date. Examples: booking=reservation systems for flights, hotel rooms ☺
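The freshness difference between the two methods can be made concrete with a small Python sketch. All names and numbers here are hypothetical: `live_sources` stands in for hotel booking systems, `build_snapshot` for the harvesting step of an aggregated database. The sketch only illustrates the trade-off described above and does not model any real booking service.

```python
# Hypothetical live sources: hotel name -> rooms currently free.
live_sources = {"hotel_a": 3, "hotel_b": 1}

def build_snapshot(sources):
    """Aggregation: copy all source data into one local database."""
    return dict(sources)

def search_aggregated(snapshot, hotel):
    """Fast local lookup, but only as fresh as the last harvest."""
    return snapshot[hotel]

def search_federated(sources, hotel):
    """Ask the source at search time, so the answer is current."""
    return sources[hotel]

snapshot = build_snapshot(live_sources)
live_sources["hotel_b"] = 0  # the last room is booked after the harvest

print("aggregated:", search_aggregated(snapshot, "hotel_b"))
print("federated: ", search_federated(live_sources, "hotel_b"))
```

The aggregated lookup needs no connection to the sources at search time, which matches the speed advantage of merged databases, but it still reports a room that is no longer available.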
  • 179 Information Retrieval in a World of Scattered Information Sources Conclusions
  • 180 Conclusions: 2 methods • A single, simple, standard method = approach = solution does not (yet) exist. • Two basic methods are common. • They have their own »advantages and »disadvantages.
  • 181 Conclusions: 1 dimension • Up to now we have distinguished primarily between » Merging records in 1 database on 1 computer & searching this database » Federated searching in one action of databases on various computers
  • 182 Conclusions: more dimensions • However, the location of the databases is only 1 aspect / dimension of possible methodological approaches. • Other dimensions / aspects are for instance: 2. Unification / standardization of database record structures in fields according to a standard, for better interoperability. 3. Unification / standardization of subject descriptions, for better interoperability. • This brings us to 3 aspects / dimensions, so we can visualize this as a cube.
  • 183 Conclusions: the cube of interoperability • BEST CASE: 1. One computer 2. One database field structure 3. One subject description system • WORST CASE: 1. Various computers 2. Various database field structures 3. Various subject description systems • [Cube diagram: interoperability increases from the worst-case corner to the best-case corner]
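The second dimension of the cube, one database field structure, can be illustrated with a minimal Python sketch that maps records from two differently structured sources onto one common set of fields. The source names, field names and the record are hypothetical, and real systems would follow an agreed standard rather than the ad hoc mapping table assumed here.

```python
# Hypothetical field mappings: source field name -> common field name.
FIELD_MAPS = {
    "dealer_x": {"ti": "title", "au": "author", "yr": "year"},
    "dealer_y": {"Title": "title", "Creator": "author", "Date": "year"},
}

def to_common(source, record):
    """Rename the fields of one record to the common structure.

    Fields without a mapping are dropped in this sketch.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[f]: value for f, value in record.items() if f in mapping}

def search(records, field, text):
    """Search one common field across records from all sources."""
    return [r for r in records if text.lower() in str(r.get(field, "")).lower()]

merged = [
    to_common("dealer_x", {"ti": "Moby Dick", "au": "Melville", "yr": 1851}),
    to_common("dealer_y", {"Title": "Moby Dick", "Creator": "Melville",
                           "Date": 1851, "Shelf": "B2"}),
]
print(search(merged, "author", "melville"))
```

Once every record uses the same fields, a single query such as an author search works across all sources, whether the records were merged beforehand or retrieved by a federated search.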
  • 184 Methods for efficient information retrieval: conclusions • For end users, the underlying methods of most information systems are either “not clear” (= negative formulation) or “transparent” (= positive formulation)
  • 185 Methods for efficient information retrieval: conclusions • The examples given show at least that progress in this field is impressive. ☺
  • 186 Questions? Suggestions? Remarks?
  • 187 • You are free to copy, distribute, display this work under the following conditions: »Attribution: You must mention the author. »Noncommercial: You may not use this work for commercial purposes. »No Derivative Works: You may not change, modify, alter, transform, or build upon this work. • For any reuse or distribution, you must make clear to others the license terms of this work.