Federated Search in a Disparate Environment


Why use Federated Search, defining it, challenges, benefits, case studies, best practices & future vision.


    1. Federated Search in a Disparate Environment
        PREPARED FOR: Gilbane San Francisco
        8403 Colesville Road
        Silver Spring Metro Plaza 2
        Suite 400
        Silver Spring, MD 20910
        301.588.5900
        301.588.0390
        info@macf.com
        www.macf.com
        Helen L. Mitchell Curtis
        Senior Program Director, Enterprise Solutions
        June 4, 2009
    2. Biography
        Helen L. Mitchell Curtis – Senior Program Director of Enterprise Solutions, Macfadden
        • 32+ years at FDA; led one of the largest enterprise search implementations among civilian federal agencies
        • Develops enterprise-wide search strategies & solutions
        • Integrates search technologies across IT applications and disparate document repositories
        • Builds governance, management and end-user buy-in
        • Promotes collaboration, standards, findability and improved organization of data and document assets
        • Passion: helping clients reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience
    3. About Macfadden
        • Founded in 1986 as a small disadvantaged entrepreneurial company; graduated from the SBA 8(a) program in 1998
        • Became a 100% employee-owned S-Corporation in 2007
        • Acquired Systems Integration Group, Inc. and Total Security Services International, Inc. (TSSI) in 2008
        • 225 employees; projected 2009 annual gross revenues of $40 million; $135M in contract backlog; 90% prime contracts (TSSI is the sole wholly-owned subsidiary)
        • FAST X10 Partner
        • Microsoft Certified Partner - Information Worker Solutions with Search Specialization Competency
        CAPABILITIES:
        • Enterprise Search Solutions
        • Integrated IT Solutions & Security
        • Counter Terrorism Planning
        • Disaster Response Management
        • Threat & Vulnerability Assessment
        • Program/Project Management
        • Intelligence Gathering & Analysis
    9. Clarify Terms
        • Definition by AIIM Market IQ
        • Definition by CMS Watch
        • A Federated Search Primer – Part II
        • Deep Web Technologies
    10. Findability Issues
        AIIM Market IQ Research on Findability (of 528 end users):
        • 50% believe Findability in their organization is “Worse to Much Worse” than on their consumer-facing web sites
        • 49% have no formal goal for Enterprise Findability within their organizations
        • 49% “Agreed or Strongly Agreed” that finding the information to do their job is difficult and time consuming
        • 69% believe less than 50% of their organization's information is searchable online
        • 36% reference five or more systems in any given week
        Source: AIIM Market Intelligence, 2008
    11. Why Use Federated Search
        • To increase findability so users can accomplish their business objectives
        • To access multiple content sources through a common search interface
        • To increase user awareness of all content sources
        • To eliminate using multiple database search protocols and passwords
        • To access public or subscription search sites
        • To search the deep web for scientific, technical and business content
        • To reduce search time and display results in a common format
    12. Federated ‘Master Index’ Search
        • Index content from multiple data sources into a single master search index
        • Queries & results come from that one master index
        • Many Enterprise Search products integrate FS via ‘connectors’ to accomplish this (e.g., FAST, Autonomy, Endeca)
        Source: New Idea Engineering, Inc.
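To make the master-index idea concrete, here is a minimal sketch, assuming two hypothetical in-memory sources (`intranet`, `library`) and plain whitespace tokenization. It is not how any of the named products work internally; their connectors and index formats are not shown in the slides.

```python
from collections import defaultdict

def build_master_index(sources):
    """Build one inverted index (term -> set of (source, doc_id)) over all sources."""
    index = defaultdict(set)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index

def search(index, query):
    """Return documents containing every query term, using only the master index."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits

# Hypothetical sample data, for illustration only.
SOURCES = {
    "intranet": {"a1": "drug safety report", "a2": "travel policy"},
    "library": {"b1": "drug interaction study", "b2": "safety report archive"},
}
MASTER = build_master_index(SOURCES)
```

Once the index is built, `search(MASTER, "safety report")` is answered without touching the original sources, which is the property the slide describes.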
    13. Federated ‘Data Silos’ Search
        • A ‘search federator’ process queries each data source silo
        • Transforms the user's search terms to match each content source's requirements
        • Submits the query to each of the sources simultaneously
        • Merges each source’s results into a single look and feel
        • Maintains no indices of its own; relies upon the capabilities of all the linked systems
        Source: New Idea Engineering, Inc.
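The federator steps listed on this slide (translate, submit in parallel, merge) can be sketched roughly as follows. The two sources, their query syntaxes and their canned results are hypothetical stand-ins; a real federator would call each silo's own search interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources: each expects its own query syntax.
def search_catalog(query):
    # Assumed to expect 'term1 AND term2'.
    data = {"drug AND safety": ["Catalog: Drug Safety Handbook"]}
    return data.get(query, [])

def search_wiki(query):
    # Assumed to expect 'term1+term2'.
    data = {"drug+safety": ["Wiki: Drug safety overview"]}
    return data.get(query, [])

# source name -> (query translator, search function)
SOURCES = {
    "catalog": (lambda terms: " AND ".join(terms), search_catalog),
    "wiki": (lambda terms: "+".join(terms), search_wiki),
}

def federated_search(user_query):
    """Translate the query per source, query all sources concurrently, merge results."""
    terms = user_query.lower().split()

    def query_one(item):
        name, (translate, search) = item
        # Wrap every hit in a common record format (single look and feel).
        return [{"source": name, "title": title} for title in search(translate(terms))]

    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        per_source = list(pool.map(query_one, SOURCES.items()))
    # The federator keeps no index of its own; it only concatenates what came back.
    return [hit for hits in per_source for hit in hits]
```

Calling `federated_search("Drug Safety")` sends a differently formatted query to each source and returns one merged list of records tagged with their source.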
    14. Surface vs. Deep Web Search
        Deep Web FS Examples:
        • www.completeplanet.com – 70,000+ searchable DBs & specialty search engines
        • www.science.gov – federates U.S. federal agency science information
        • http://imlsdcc.grainger.uiuc.edu/ – Institute of Museum & Library Services (IMLS) Digital Collections & Content, with descriptions of digital resources developed by IMLS grantees
        Source: Juanico-Environmental Consultants, Ltd.
    15. Vertical Search Engine
        • Closely related to Deep Web – searches for a particular niche, i.e., a specific industry, topic, or type of content (e.g., scientific research, travel, movies, images, blogs)
        • Example: www.vetseek.info, a search engine focusing on veterinary science and related topics
    16. Challenges
        • Authentication
            • Showing each record’s branding and copyright information
            • Licensed or subscription databases
        • True De-duplication
            • Virtually impossible because DBs return 10-20 results at a time
            • Vendors usually de-duplicate only the first result set returned
        • Security
            • Mapping user credentials and access rights to each repository's security model
        • Speed
            • Limited by the slowest search engine’s performance
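The de-duplication limit mentioned on this slide can be illustrated with a minimal sketch. It assumes duplicates can be recognized by a crude normalized-title key (a hypothetical choice; real systems may compare identifiers or several fields) and, like the vendor behavior described above, it only ever sees the first page from each source.

```python
import re

def normalize(title):
    """Crude matching key: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())

def dedupe_first_page(first_pages):
    """De-duplicate only the first page returned by each source.

    Later pages are never fetched here, so duplicates among them go undetected.
    """
    seen = set()
    merged = []
    for source, titles in first_pages.items():
        for title in titles:
            key = normalize(title)
            if key not in seen:
                seen.add(key)
                merged.append((source, title))
    return merged
```

The first source to return a given record wins; anything that would only show up on a later page of another source is outside the comparison, which is why "true" de-duplication is described as virtually impossible.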
    17. Challenges (continued)
        • Lack of data standardization
            • Each source has a unique access method & needs translation
            • Metadata mapping between FSS and underlying systems
        • Access methods to sources may change
            • Requires an interface rewrite or modification
        • Rules for error handling
            • E.g., query term not available: exclude the query, exclude the repository, or proceed without the term?
            • E.g., timeouts or connection problems
        • Complex searches usually not available
            • Fielded searches
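The first error-handling question on this slide (exclude the query, exclude the repository, or proceed without the term) can be written down as an explicit policy. This is a minimal sketch under assumed names: `Policy`, `plan_query` and the field names are hypothetical, and timeouts are not covered.

```python
from enum import Enum

class Policy(Enum):
    SKIP_QUERY = "skip_query"    # abandon the whole search
    SKIP_SOURCE = "skip_source"  # leave the repository that lacks the field out
    DROP_TERM = "drop_term"      # query that repository without the unsupported field

def plan_query(fields, supported_by_source, policy):
    """Decide, per source, which fielded terms to send.

    Returns {source: fields_to_send}; an empty dict means the query is excluded.
    """
    plan = {}
    for source, supported in supported_by_source.items():
        unsupported = [name for name in fields if name not in supported]
        if not unsupported:
            plan[source] = dict(fields)
        elif policy is Policy.SKIP_QUERY:
            return {}
        elif policy is Policy.DROP_TERM:
            plan[source] = {k: v for k, v in fields.items() if k in supported}
        # Policy.SKIP_SOURCE: this source is simply left out of the plan.
    return plan
```

Making the rule a parameter keeps the decision visible to whoever configures the federator, rather than burying it in each source connector.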
    18. Challenges (continued)
        • Relevancy scores
            • Can’t identify a single relevancy ranking model
            • A repository’s relevancy ranking applies only to its own results
            • May not be useful when comparing the results with those from another system
        • Access to content stored in a variety of places
            • Results page may not let the user obtain identified documents
            • This may involve a built-in viewer or invoking the owning product’s interface
        • Combining navigators from each result set
            • i.e., faceted search, taxonomies and auto-generated clusters
        • Selecting the right FS engine
            • Depends on business goals and type of content sources: structured vs. unstructured, licensed/subscription
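One common workaround for incomparable relevancy scores is to rescale each source's scores before merging. The sketch below uses min-max normalization, which is an assumption of this example and not something the slides prescribe; it makes the numbers sortable together but does not make the underlying ranking models equivalent.

```python
def normalize_scores(hits):
    """Rescale one source's (doc, score) pairs to the range [0, 1] (min-max)."""
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every hit as equally relevant.
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in hits]

def merge_ranked(results_by_source):
    """Merge per-source hits into one list sorted by normalized score, best first."""
    merged = []
    for source, hits in results_by_source.items():
        if hits:
            merged.extend(
                (source, doc, score) for doc, score in normalize_scores(hits)
            )
    return sorted(merged, key=lambda row: row[2], reverse=True)
```

With one hypothetical source scoring on a 0-1000 scale and another on 0-10, the top hit of each ends up at 1.0 and the bottom hit at 0.0, so the merged order reflects rank within each source more than any shared notion of relevance.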
    19. Benefits
        • Single master index
            • Quicker response times
            • No need to access original data sources
            • Relevancy algorithms applied uniformly
            • Dynamic navigators are available for all documents
        • Time savings
            • Searches many sources at one time
            • Combines results into a single results page
        • Quality of results
            • Client selects the sources to search
        • Minimum impact on the data silos
            • Only accessed when a user performs a query
            • Eliminates the increased load of crawling/indexing the data source
    20. Benefits (continued)
        • Improve productivity
            • Reduces number of searches executed to find relevant results
            • Save, reuse, schedule, and even share effective search queries
        • Leverage security controls at queried source
            • Access repositories that are secured against crawls but can be reached by search queries
        • Reduce costs
            • No additional capacity requirements for a content index, since the content is not crawled by the search server
        • Most current content
            • As soon as the source is updated, the info is available to the searcher on the very next query
        • Increase awareness
            • Identify most relevant sources to search based on the number of results each source produced
    21. FDA Case Study Success (Federated ‘Master Index’ Search System)
    22. FSS Example (uses FAST ESP – Vertical Search)
    23. FSS Example (uses MS & Vivisimo)
    24. FSS Example (uses Webfeat)
    25. Best Practices
    26. Future Vision
    27. Future Vision (continued)
    28. Resources
        • Great source of info on many Federated Search topics: www.federatedsearchblog.com – Author: Sol Lederman
        • List of Open Source & commercial search components & tools: http://www.searchcomponentsonline.com/federated-search-vendors.html
        • List of many Deep Web Databases: http://www.noodletools.com/debbie/literacies/information/5locate/advicedepth.html
        • Info on the Deep Web: http://www.internettutorials.net/deepweb.asp
        • Some Digital Image Resources on the Deep Web: http://www.readwriteweb.com/archives/digital_image_resources_on_the_deep_web.php
        • Info on Vertical Search Engines: http://www.altsearchengines.com/category/verticals/
        • 50 Niche Search Engines: http://www.accrediteddldegrees.com/2008/50-niche-search-engines-that-will-make-your-everyday-life-easier/
        • Library of Congress list of FS Portal Products & Vendors: http://www.loc.gov/catdir/lcpaig/portalproducts.html
        • 99 Resources to Research & Mine the Invisible Web: http://www.collegedegree.com/library/college-life/99-resources-to/
    29. References
        • “What’s in a Name: Federated Search” – Miles Kehoe, New Idea Engineering, Inc., Volume 4 Number 4, August 2007
        • “Federated Search Engine Article” – Online (Weston, Conn.) 28 no2 16-19 Mr/Ap 2004 (reprint of article by Donna Fryer, www.SearchitRight.com)
        • “Growing Up With Federated Search” – Walt Warnick, OSTI
        • “Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3” – Walt Warnick, OSTI
        • “Vertical Search Engines & the Deep Web” – Laura B. Cohen, http://www.internettutorials.net/
        • www.federatedsearchblog.com – Sol Lederman
        • “Exploring a ‘Deep Web’ that Google can’t Grasp” – NYT, 2-23-09, http://www.nytimes.com/2009/02/23/technology/internet/23search.html?_r=1&ref=business
        • “Federated Search Primer, Part I-III” – Sol Lederman
        • www.searchdoneright.com – Vivisimo – Raoul, CEO & Cofounder
        • “Enterprise Search Grows Up” – Podcast from BizTalk
        • “Federation: Big Need, Still a Challenge” – Stephen Arnold, 4/25/08
        • “The Future of Federated Search or What Will the World Look Like in 10 Years” – Rich Turner
    30. THANK YOU!
        Helen L. Mitchell Curtis
        Senior Program Director, Enterprise Solutions
        hmitchell@macf.com
        240-247-1946 (w)
        240-743-7975 (m)
    31. MACFADDEN
        Delivering Results. Exceeding Expectations.