Federated Search in a Disparate Environment
PREPARED FOR: Gilbane San Francisco
8403 Colesville Road, Silver Spring Metro Plaza 2, Suite 400, Silver Spring, MD 20910
301.588.5900 | 301.588.0390 | email@example.com | www.macf.com
Helen L. Mitchell Curtis, Senior Program Director, Enterprise Solutions
June 4, 2009

Biography
Helen L. Mitchell Curtis, Senior Program Director of Enterprise Solutions, Macfadden
- 32+ years at FDA; led one of the largest enterprise search implementations among civilian federal agencies
- Develop enterprise-wide search strategies and solutions
- Integrate search technologies across IT applications and disparate document repositories
- Build governance, management and end-user buy-in
- Promote collaboration, standards, findability and improved organization of data and document assets
- Passion: to help clients reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience

About Macfadden
- Founded in 1986 as a small disadvantaged entrepreneurial company; graduated from the SBA 8(a) program in 1998
- Became 100% employee-owned in 2007 (S-Corporation)
- Acquired Systems Integration Group, Inc. and Total Security Services International, Inc. (TSSI) in 2008
- 225 employees; projected 2009 annual gross revenues of $40 million; $135M in contract backlog; 90% prime contracts (TSSI is the sole wholly owned subsidiary)
- FAST X10 Partner
- Microsoft Certified Partner: Information Worker Solutions with Search Specialization Competency
CAPABILITIES:

Clarify Terms
- Definition by AIIM Market IQ
- Definition by CMS Watch
- A Federated Search Primer, Part II - Deep Web Technologies

Findability Issues
AIIM Market IQ research on findability (528 end users):
- 50% believe findability in their organization is "Worse to Much Worse" than on their consumer-facing web sites
- 49% have no formal goal for enterprise findability within their organizations
- 49% "Agreed or Strongly Agreed" that finding the information to do their job is difficult and time consuming
- 69% believe less than 50% of their organization's information is searchable online
- 36% reference five or more systems in any given week
Source: AIIM Market Intelligence, 2008

Why Use Federated Search
- To increase findability so users can accomplish their business objectives
- To access multiple content sources through a common search interface
- To increase user awareness of all content sources
- To eliminate using multiple database search protocols and passwords
- To access public or subscription search sites
- To search the deep web for scientific, technical and business content
- To reduce search time and display results in a common format

Federated 'Master Index' Search
- Index content from multiple data sources into a single master search index
- Queries and results come from that one master index
- Many enterprise search products integrate FS via 'connectors' to accomplish this (e.g., FAST, Autonomy, Endeca)
Source: New Idea Engineering, Inc.

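The master-index approach can be illustrated with a toy inverted index. This is only a minimal sketch: the source names, document IDs and whitespace tokenizer below are assumptions made for illustration, not how FAST, Autonomy or Endeca connectors actually work.

```python
from collections import defaultdict

def build_master_index(sources):
    """Index documents from every source into one inverted index.

    `sources` maps a hypothetical source name to {doc_id: text}.
    """
    index = defaultdict(set)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index

def search(index, query):
    """Return sorted (source, doc_id) pairs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)

# Hypothetical content pulled in by two "connectors".
sources = {
    "file_share": {"f1": "drug safety report", "f2": "meeting minutes"},
    "database": {"d1": "drug approval record"},
}
master = build_master_index(sources)
print(search(master, "drug"))  # [('database', 'd1'), ('file_share', 'f1')]
```

Note that the query never touches the original sources; everything is answered from the one index.
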
Federated 'Data Silos' Search
- A 'search federator' process queries each data source silo
- Transforms the user's search terms to match each content source's requirements
- Submits the query to each of the sources simultaneously
- Merges each source's results together into a single look and feel
- Maintains no indices of its own; relies upon the capabilities of all the linked systems
Source: New Idea Engineering, Inc.

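The federator steps (translate, submit in parallel, merge) can be sketched as follows. The two silos, their query syntaxes and the record layout are hypothetical stand-ins for real remote search services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory silos; real ones would be remote search services.
def search_library(query):
    # This silo expects upper-case terms joined by AND.
    catalog = {"DEEP AND WEB": ["Library: Deep Web Primer"]}
    return catalog.get(query, [])

def search_patents(query):
    # This silo expects lower-case terms joined by '+'.
    records = {"deep+web": ["Patents: Deep web crawler"]}
    return records.get(query, [])

# Each silo is paired with a function that rewrites the user's terms
# into that silo's own query syntax.
SILOS = {
    "library": (lambda terms: " AND ".join(t.upper() for t in terms), search_library),
    "patents": (lambda terms: "+".join(t.lower() for t in terms), search_patents),
}

def federated_search(user_query):
    """Translate the query per silo, run all silos in parallel, merge results."""
    terms = user_query.split()
    with ThreadPoolExecutor(max_workers=len(SILOS)) as pool:
        futures = {
            name: pool.submit(search_fn, translate(terms))
            for name, (translate, search_fn) in SILOS.items()
        }
        merged = []
        for name, future in futures.items():
            for title in future.result():
                # Common record format: the "single look and feel".
                merged.append({"source": name, "title": title})
    return merged

print(federated_search("Deep Web"))
```

The federator keeps no index of its own, so each query is only as fast and as capable as the silos it calls.
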
Surface vs. Deep Web Search
Deep Web FS examples:
- www.completeplanet.com - 70,000+ searchable DBs and specialty search engines
- www.science.gov - federates U.S. federal agency science information
- http://imlsdcc.grainger.uiuc.edu/ - Institute of Museum & Library Services (IMLS) Digital Collections & Content, with descriptions of digital resources developed by IMLS grantees
Source: Juanico-Environmental Consultants, Ltd.

Vertical Search Engine
- Closely related to Deep Web: searches within a particular niche, i.e., a specific industry, topic or type of content (e.g., scientific research, travel, movies, images, blogs)
- Example: www.vetseek.info is a search engine focusing on veterinary science and related topics

Challenges
- Authentication
  - Showing each record's branding and copyright information
  - Licensed or subscription databases
- True de-duplication
  - Virtually impossible because DBs return 10-20 results at a time
  - Vendors usually de-dupe only the first result set returned
- Security
  - Mapping user credentials and access rights to each repository's security model
- Speed
  - Limited by the slowest search engine's performance

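The de-duplication limit can be shown with a short sketch that merges only the first page from each source. Matching on a normalized title is an assumption for illustration; the source names and records are hypothetical.

```python
def dedupe_first_page(first_pages):
    """Merge the first page of results from each source, dropping repeats.

    Duplicates are detected by a normalized title (an assumption; real
    systems may compare URLs or other identifiers).
    """
    seen = set()
    merged = []
    for source, page in first_pages.items():
        for record in page:
            # Lower-case and collapse whitespace before comparing.
            key = " ".join(record["title"].lower().split())
            if key in seen:
                continue
            seen.add(key)
            merged.append({"source": source, "title": record["title"]})
    return merged

# Hypothetical first pages returned by two sources.
first_pages = {
    "source_a": [{"title": "Deep Web  Search"}, {"title": "Vertical Engines"}],
    "source_b": [{"title": "deep web search"}, {"title": "Soil Data"}],
}
for row in dedupe_first_page(first_pages):
    print(row)
```

Only records already retrieved can be compared, so a duplicate sitting on a later page of either source goes undetected.
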
Challenges (continued)
- Lack of data standardization
  - Each source has a unique access method and needs translation
  - Metadata mapping between FSS and underlying systems
- Access methods to sources may change
  - Requires an interface rewrite or modification
- Rules for error handling
  - Ex.: query term not available: exclude the query, the repository, or proceed without the term?
  - Ex.: timeouts or connection problems
- Complex searches usually not available
  - Fielded searches

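The metadata-mapping and error-handling questions can be made concrete with a small sketch. The field maps, source names and the two policy names (`drop_term`, `skip_source`) are hypothetical; they only illustrate that a federator has to pick a rule when a source lacks a queried field.

```python
# Hypothetical field maps: federated field name -> each source's own name.
FIELD_MAPS = {
    "catalog": {"author": "creator", "title": "dc_title"},
    "archive": {"title": "name"},  # this source has no author field
}

def translate_query(fielded_query, source, on_missing="drop_term"):
    """Translate a {field: value} query into one source's field names.

    on_missing="drop_term" proceeds without the unsupported field;
    on_missing="skip_source" returns None so the caller can exclude
    the repository from this search.
    """
    if on_missing not in ("drop_term", "skip_source"):
        raise ValueError(f"unknown policy: {on_missing}")
    mapping = FIELD_MAPS[source]
    translated = {}
    for field, value in fielded_query.items():
        if field in mapping:
            translated[mapping[field]] = value
        elif on_missing == "skip_source":
            return None
    return translated

query = {"author": "Kehoe", "title": "federated"}
print(translate_query(query, "catalog"))
print(translate_query(query, "archive"))
print(translate_query(query, "archive", on_missing="skip_source"))
```

Dropping the term widens the result set for that source, while skipping the source narrows coverage; which is preferable depends on the business goal.
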
Challenges (continued)
- Relevancy scores
  - Can't identify a single relevancy ranking model
  - Each repository's relevancy ranking refers only to its own results
  - May not be useful when comparing the results with those from another system
- Access to content stored in a variety of places
  - Results page may not let the user obtain identified documents
  - This may involve a built-in viewer or invoking the owning product's interface
- Combining navigators from each result set
  - i.e., faceted search, taxonomies and auto-generated clusters
- Selecting the right FS engine
  - Depends on business goals and the type of content sources: structured vs. unstructured, licensed/subscription

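One common workaround for incompatible relevancy scores is to rescale each source's scores to a shared range before merging. The sketch below uses min-max scaling with made-up engines and scores; it makes the numbers comparable in form only and does not make the underlying ranking models equivalent.

```python
def normalize(results):
    """Rescale one source's (doc, score) pairs to the 0..1 range (min-max)."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    span = high - low
    return [(doc, (score - low) / span if span else 1.0) for doc, score in results]

def merge_by_normalized_score(result_sets):
    """Merge all sources into one list ordered by normalized score."""
    merged = []
    for source, results in result_sets.items():
        if not results:
            continue
        for doc, score in normalize(results):
            merged.append((score, source, doc))
    # Stable sort: ties keep the order in which sources were listed.
    merged.sort(key=lambda item: -item[0])
    return [(source, doc, round(score, 2)) for score, source, doc in merged]

# Hypothetical engines that score on very different scales.
result_sets = {
    "engine_a": [("a1", 950), ("a2", 400), ("a3", 200)],
    "engine_b": [("b1", 9), ("b2", 6), ("b3", 3)],
}
for row in merge_by_normalized_score(result_sets):
    print(row)
```

Every source's top hit lands at 1.0 regardless of how good it actually is, which is exactly why the slide calls a single ranking model out of reach.
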
Benefits
- Single master index
  - Quicker response times
  - No need to access original data sources
  - Relevancy algorithms applied uniformly
  - Dynamic navigators are available for all documents
- Time savings
  - Searches many sources at one time
  - Combines results into a single results page
- Quality of results
  - Client selects the sources to search
- Minimum impact on the data silos
  - Only accessed when a user performs a query
  - Eliminates the increased load of crawling/indexing the data source

Benefits (continued)
- Improve productivity
  - Reduces the number of searches executed to find relevant results
  - Save, reuse, schedule, and even share effective search queries
- Leverage security controls at the queried source
  - Access repositories that are secured against crawls but can be reached by search queries
- Reduce costs
  - No additional capacity requirements for a content index, since the content is not crawled by the search server
- Most current content
  - As soon as the source is updated, the info is available to the searcher on the very next query
- Increase awareness
  - Identify the most relevant sources to search based on the number of results each source produced

FDA Case Study Success (Federated 'Master Index' Search System)

Resources
- Great source of info on many Federated Search topics: www.federatedsearchblog.com (author: Sol Lederman)
- List of open source and commercial search components and tools: http://www.searchcomponentsonline.com/federated-search-vendors.html
- List of many Deep Web databases: http://www.noodletools.com/debbie/literacies/information/5locate/advicedepth.html
- Info on the Deep Web: http://www.internettutorials.net/deepweb.asp
- Some digital image resources on the Deep Web: http://www.readwriteweb.com/archives/digital_image_resources_on_the_deep_web.php
- Info on vertical search engines: http://www.altsearchengines.com/category/verticals/
- 50 niche search engines: http://www.accrediteddldegrees.com/2008/50-niche-search-engines-that-will-make-your-everyday-life-easier/
- Library of Congress list of FS portal products and vendors: http://www.loc.gov/catdir/lcpaig/portalproducts.html
- 99 resources to research and mine the Invisible Web: http://www.collegedegree.com/library/college-life/99-resources-to/

References
- "What's in a Name: Federated Search" - Miles Kehoe, New Idea Engineering, Inc., Volume 4, Number 4, August 2007
- "Federated Search Engine Article" - Online (Weston, Conn.) 28, no. 2, pp. 16-19, Mar/Apr 2004 (reprint of article by Donna Fryer, www.SearchitRight.com)
- "Growing Up With Federated Search" - Walt Warnick, OSTI
- "Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3" - Walt Warnick, OSTI
- "Vertical Search Engines & the Deep Web" - Laura B. Cohen, http://www.internettutorials.net/
- www.federatedsearchblog.com - Sol Lederman
- "Exploring a 'Deep Web' that Google Can't Grasp" - NYT, 2-23-09, http://www.nytimes.com/2009/02/23/technology/internet/23search.html?_r=1&ref=business
- "Federated Search Primer, Part I-III" - Sol Lederman
- www.searchdoneright.com - by Vivisimo - Raoul - CEO & Cofounder
- "Enterprise Search Grows Up" - podcast from BizTalk
- "Federation: Big Need, Still a Challenge" - Stephen Arnold, 4/25/08
- "The Future of Federated Search or What Will the World Look Like in 10 Years" - Rich Turner

THANK YOU!
Helen L. Mitchell Curtis, Senior Program Director, Enterprise Solutions
firstname.lastname@example.org
240-247-1946 (w) | 240-743-7975 (m)