Search Systems
    Presentation Transcript

    • Information Architecture Search Systems
    • Does your site need search? ▫ Does your site have enough content? ▫ Will search divert resources from navigation systems? ▫ Do you have the time and knowledge to optimize the search system? ▫ Are there alternatives? ▫ Will your users bother with search?
    • Before you add a search system • Do not assume that a search engine alone will satisfy all users' information needs • Search should be used in addition to well-structured navigation, not as a replacement for it
    • You need a search system if… • You have too much content to browse, or the content warrants it ▫ E.g., course catalog, research site, large site like Microsoft, real estate site • The site has fragmented subsites ▫ E.g., UB • The site is a learning tool ▫ E.g., online web coding tutorials • The site is dynamic, like a newspaper where articles are archived and searching is the only way to access them
    • Search System Anatomy • Indexing by the search engine (SE) • Web sites need search engine optimization (SEO) • Spiders • What is indexed: URL, title, headings, keywords, content • Search interface • Boolean operators (AND, OR, NOT)
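The indexing and Boolean-operator items above can be illustrated with a small sketch. It assumes a toy in-memory inverted index; the document ids, text, and the `build_index` helper are hypothetical and not part of the original slides.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical mini-collection, made up for illustration.
docs = {
    1: "course catalog for spring",
    2: "real estate listings",
    3: "spring real estate course",
}
index = build_index(docs)

# Boolean operators map directly onto set operations.
both = index["course"] & index["spring"]      # course AND spring
either = index["course"] | index["estate"]    # course OR estate
without = index["spring"] - index["estate"]   # spring NOT estate
```

Real search engines store far more per term (positions, field, frequency), but the set-based view is enough to see why AND narrows a result set and OR widens it.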
    • The Retrieval Process [Diagram; labels: User, Search Interface, Search Engine, Query, Query Operations, Content, Results, Ranked Docs, Retrieved Docs, DB Manager Module, Text Database]
    • Search Systems • Types of searches: ▫ Basic search (also known as “keyword search”) ▫ Advanced search: uses search refinement and metadata search • Search engines are the software applications that form the foundation of search systems
    • Choosing what to search • You don’t have to index everything • If you conduct an inventory and analysis of your content, you should have a good idea of what content is “good” • Silos: staff directories, subsites, tech articles, books, etc. • Content components: title, author, etc.
    • Search Zones • Subsets of the site that have been indexed separately ▫ Example: Amazon does a great job of this • Can be based on: content type, audience, role, topic, geography, chronology, department
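A minimal sketch of search zones, assuming each zone gets its own small index and a query can be restricted to one zone. The page tuples, zone names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages: (url, zone, text). Zone names are illustrative.
PAGES = [
    ("/books/ia", "books", "information architecture guide"),
    ("/staff/lee", "staff", "lee teaches information architecture"),
    ("/books/seo", "books", "search engine optimization guide"),
]

def build_zone_indexes(pages):
    """Build one term -> urls index per zone, so zones are indexed separately."""
    zones = defaultdict(lambda: defaultdict(set))
    for url, zone, text in pages:
        for term in text.lower().split():
            zones[zone][term].add(url)
    return zones

def search(zones, term, zone=None):
    """Search a single zone if one is given, otherwise every zone."""
    selected = [zone] if zone is not None else list(zones)
    hits = set()
    for name in selected:
        if name in zones:
            hits |= zones[name].get(term.lower(), set())
    return sorted(hits)

zones = build_zone_indexes(PAGES)
site_wide = search(zones, "architecture")
books_only = search(zones, "architecture", zone="books")
```

Restricting the query to the "books" zone drops the staff page from the results, which is the narrowing effect a zone selector gives users.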
    • Types of Pages • Navigation pages – pages that help you browse a site • Destination pages – contain actual information • Want to make sure search results contain mostly destination pages
    • Search Systems • Selecting content components to index ▫ Take advantage of the site structure ▫ Components to index: • Body • Image Link • Title • Image alt text • URL • Description • Site name • Keywords • Link • Remote anchor text
    • Search Algorithms • There are many types of algorithms available. • The bottom line is to select the one that is appropriate for the type of search capabilities required by the user.
    • [Diagram: retrieval models by user task] • User tasks ▫ Retrieval: ad hoc, filtering ▫ Browsing • Classic models ▫ Boolean ▫ Vector space ▫ Probabilistic • Set theoretic ▫ Fuzzy ▫ Extended Boolean • Algebraic ▫ Generalized vector ▫ Latent semantic indexing ▫ Neural networks • Probabilistic ▫ Inference network ▫ Belief network ▫ Language models • Structured models ▫ Non-overlapping lists ▫ Proximal nodes • Browsing ▫ Flat ▫ Structure guided ▫ Hypertext
    • Pattern Matching Algorithms • Most common; match the string that the user entered • Depending on your users’ needs, you have to emphasize recall or precision • Recall = # relevant docs retrieved / # relevant docs in collection • Precision = # relevant docs retrieved / # total docs retrieved
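As a worked example of the two measures, the sketch below uses the standard definitions: recall divides the relevant documents retrieved by all relevant documents in the collection, and precision divides them by all documents retrieved. The document ids are made up.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids the search engine returned
    relevant:  ids in the collection that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 docs returned, 5 relevant docs exist, 2 overlap.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5", "d6", "d7"],
)
```

Here precision is 2/4 and recall is 2/5. Returning more documents tends to raise recall and lower precision, which is the trade-off the slide refers to.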
    • Pattern Matching Algorithms • Automatic stemming: expands a term to include other terms that share the same root ▫ E.g., “word” gets you “words” and “wording” • No stemming: results contain just that word • Which to use depends on the content you are indexing, e.g., a course catalog
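To make the stemming versus no-stemming contrast concrete, here is a deliberately naive suffix-stripping sketch. The suffix list and function names are assumptions for illustration; production stemmers (for example, the Porter algorithm) apply far more careful rules.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative only, far from complete

def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of root."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def search(terms, query, stemming=True):
    """Match indexed terms against the query, with or without stemming."""
    if stemming:
        target = naive_stem(query)
        return [t for t in terms if naive_stem(t) == target]
    return [t for t in terms if t.lower() == query.lower()]

terms = ["word", "words", "wording", "password", "worded"]
with_stemming = search(terms, "word")
exact_only = search(terms, "word", stemming=False)
```

With stemming on, "word" also matches "words", "wording", and "worded". It does not match "password", because stemming compares roots rather than substrings.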
    • Other Approaches • Document similarity: allowing user feedback (“more like this” option) ▫ Can be done by re-querying without stopwords, or automatically based on metadata • Collaborative filtering ▪ Cited by ▪ Active bibliography (related docs) ▪ Users who viewed this document also viewed ▪ Similar documents based on text ▪ Related documents based on co-citation
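One simple way to sketch a "more like this" option is to compare documents by their shared terms after dropping stopwords. The example below uses Jaccard similarity, which is a swapped-in technique chosen for brevity and not something the slides prescribe. The stopword list and documents are hypothetical.

```python
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for"}  # tiny illustrative list

def terms(text):
    """Lowercase term set with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def jaccard(a, b):
    """Shared terms divided by all distinct terms in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def more_like_this(doc_id, docs, top_n=2):
    """Return ids of the documents most similar to doc_id, best first."""
    source = terms(docs[doc_id])
    scored = [
        (jaccard(source, terms(text)), other)
        for other, text in docs.items()
        if other != doc_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:top_n] if score > 0]

docs = {
    "a": "guide to the design of search systems",
    "b": "search systems design for the web",
    "c": "recipes for apple pie",
}
similar_to_a = more_like_this("a", docs)
```

Documents "a" and "b" share three of five distinct terms, so "b" is suggested, while "c" shares none and is left out.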
    • Query Builders • Tools that improve search engine performance and are invisible to users ▫ Spell-checkers: Google’s “did you mean” ▫ Phonetic tools: “sounds like” ▫ Stemming tools: results with the same stem ▫ Natural language processing tools: “how to” queries ▫ Controlled vocabulary: include synonyms
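A spell-checking query builder can be sketched with Python's standard `difflib` module. This is only an approximation of a "did you mean" feature, not how any particular search engine implements it. The vocabulary and the 0.7 cutoff are assumptions.

```python
import difflib

# Hypothetical vocabulary, e.g. the terms present in the site's index.
VOCABULARY = ["catalog", "course", "estate", "search", "tutorial"]

def did_you_mean(query, vocabulary=VOCABULARY):
    """Suggest the closest known term, or None if the query is fine or hopeless."""
    query = query.lower()
    if query in vocabulary:
        return None  # already a known term, nothing to suggest
    matches = difflib.get_close_matches(query, vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else None

suggestion = did_you_mean("serch")
```

Drawing suggestions from the site's own indexed terms means a suggested correction will return at least one result.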
    • Presenting Results • What to display? ▫ Title ▫ Summary ▫ Relevance score ▫ Other parts of the structure of docs ▫ Depends on your audience: more or less info; give users the option to see ‘detailed’ results if they choose; descriptive vs. representational • How many documents? ▫ Number of retrieved docs ▫ Number of results per page
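The "how many documents" question usually comes down to showing the total count and slicing results into pages. A minimal sketch, assuming results are already in a list and using a hypothetical `paginate` helper:

```python
import math

def paginate(results, page, per_page=10):
    """Return one page of results plus the counts a results page would display."""
    total_pages = max(1, math.ceil(len(results) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    return {
        "items": results[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
        "total_results": len(results),
    }

# 25 hypothetical result ids, 10 per page: the third page holds the last 5.
last_page = paginate(list(range(25)), page=3, per_page=10)
```

Clamping the page number keeps a stale or hand-edited page link from producing an empty results screen.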
    • Listing Results • Sorting ▪ Alphabetically ▪ Chronologically • Ranking ▪ By relevance ▪ By popularity ▪ By users’ or experts’ ratings ▪ By pay-for-placement
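The sorting and ranking options above differ only in which key orders the list. The sketch below assumes each result carries a title, an ISO-format date, a relevance score, and a view count. All field names and values are made up.

```python
from operator import itemgetter

# Hypothetical results with illustrative field names and values.
results = [
    {"title": "Zoning Rules", "date": "2004-03-01", "relevance": 0.91, "views": 120},
    {"title": "Apartment Guide", "date": "2005-07-15", "relevance": 0.42, "views": 900},
    {"title": "Mortgage Basics", "date": "2003-11-30", "relevance": 0.77, "views": 450},
]

# Sorting: order by an attribute of the document itself.
alphabetical = sorted(results, key=itemgetter("title"))
# ISO dates sort correctly as strings; newest first.
chronological = sorted(results, key=itemgetter("date"), reverse=True)

# Ranking: order by a score assigned to the document.
by_relevance = sorted(results, key=itemgetter("relevance"), reverse=True)
by_popularity = sorted(results, key=itemgetter("views"), reverse=True)
```

The same three documents come out in a different order under each scheme, which is why the choice should follow what users are trying to do with the results.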
    • Listing Results • Grouping results: clustering • Exporting results ▪ Print or email results ▪ Select a subset of results ▪ Save search • No single approach is perfect; combine approaches
    • Search Interfaces • Factors that affect the interface design ▪ User’s searching expertise ▪ Type of results wanted ▪ Type of information being searched ▪ Amount of information being searched
    • Search Interface • The box: Simple and clear ▫ Good for users that don’t want to learn more about the search mechanism ▫ Placement of search matters on a site ▫ Put close to main navigation or near top of page ▫ Don’t be creative with button label
    • Advanced Search • Unveils search system functionality ▫ Field searching ▫ Date ranges ▫ Search zones • How often do you take advantage of these features?
    • Supporting Revision • What to do when users don’t get what they want? ▪ Repeat search in results ▪ Explain where results came from (what data was searched) ▪ Explain what the user did (restate query, filters, sort order) ▪ Integrate searching and browsing (product inventory)
    • Search Systems • When users get stuck ▫ Way too many results ▪ Options to narrow search ▫ Zero results ▪ Offer means of revising the search ▪ Search tips ▪ A means of browsing (e.g., site map) ▪ Human contact if searching and browsing don’t work
    • Search Systems • Commercial web site search available: ▪ Verity Ultraseek ▪ AltaVista ▪ Google ▪ … and many others
    • Search Systems • Free search options: ▫ Adding Google search to your site ▫ Open source software: ▪ Lucene (Jakarta Project) ▪ MG (Managing Gigabytes)
    • Discussion Questions • How has the search engine changed the way we use the web? • Where do you see it going in the future? • Search Engines – Pros / Cons • Articles