Federated Search in a Disparate Environment
PREPARED FOR: Gilbane San Francisco
Helen L. Mitchell Curtis, Senior Program Director, Enterprise Solutions
June 4, 2009
8403 Colesville Road, Silver Spring Metro Plaza 2, Suite 400, Silver Spring, MD 20910
301.588.5900 | 301.588.0390 | info@macf.com | www.macf.com
Biography
• Helen L. Mitchell Curtis, Senior Program Director of Enterprise Solutions, Macfadden
• 32+ years at FDA; led one of the largest enterprise search implementations among civilian federal agencies
• Develop enterprise-wide search strategies & solutions
• Integrate search technologies across IT applications and disparate document repositories
• Build governance, management and end-user buy-in
• Promote collaboration, standards, findability and improved organization of data and document assets
• Passion: to help clients reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience
About Macfadden
• Founded in 1986 as a small disadvantaged entrepreneurial company; graduated SBA 8(a) in 1998
• Became 100% employee-owned in 2007, S-Corporation
• Acquired Systems Integration Group, Inc. and Total Security Services International, Inc. (TSSI) in 2008
• 225 employees; projected 2009 annual gross revenues $40 million; $135M in contract backlog; 90% prime contracts (TSSI sole wholly-owned subsidiary)
• FAST X10 Partner
• Microsoft Certified Partner - Information Worker Solutions with Search Specialization Competency
CAPABILITIES:
• Enterprise Search Solutions
• Integrated IT Solutions & Security
• Counter Terrorism Planning
• Disaster Response Management
• Threat & Vulnerability Assessment
• Program/Project Management
• Intelligence Gathering & Analysis
Clarify Terms
• Definition by AIIM Market IQ
• Definition by CMS Watch
• A Federated Search Primer - Part II, Deep Web Technologies
Findability Issues
AIIM Market IQ research on findability (of 528 end users):
• 50% believe findability in their organization is “Worse to Much Worse” than on their consumer-facing web sites
• 49% have no formal goal for enterprise findability within their organizations
• 49% “Agreed or Strongly Agreed” that finding the information to do their job is difficult and time consuming
• 69% believe less than 50% of their organization's information is searchable online
• 36% reference five or more systems in any given week
Source: AIIM Market Intelligence, 2008
Why Use Federated Search
• To increase findability so users can accomplish their business objectives
• To access multiple content sources through a common search interface
• To increase user awareness of all content sources
• To eliminate the use of multiple database search protocols and passwords
• To access public or subscription search sites
• To search the deep web for scientific, technical and business content
• To reduce search time and display results in a common format
Federated ‘Master Index’ Search
• Index content from multiple data sources into a single master search index
• Queries & results come from that one master index
• Many enterprise search products integrate FS via ‘connectors’ to accomplish this (e.g., FAST, Autonomy, Endeca)
Source: New Idea Engineering, Inc.
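The master-index approach can be illustrated with a minimal sketch. The source names, documents, and the functions `build_master_index` and `search` are hypothetical, and a plain in-memory inverted index stands in for what a commercial product's connectors and index would do.

```python
from collections import defaultdict

# Hypothetical sample content; real connectors would pull from each repository.
SOURCES = {
    "intranet": {"doc1": "federated search primer", "doc2": "travel policy"},
    "file_share": {"memo7": "search governance plan"},
}

def build_master_index(sources):
    """Index every document from every source into one inverted index."""
    index = defaultdict(set)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index

def search(index, query):
    """Return documents containing all query terms, using only the master index."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

master_index = build_master_index(SOURCES)
results = search(master_index, "search")
```

Note that the query never touches the original sources; only the indexing step does.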
Federated ‘Data Silos’ Search
• A ‘search federator’ process queries each data source silo
• Transforms the user's search terms to match each content source's requirements
• Submits the query to each of the sources simultaneously
• Merges each source’s results together into a single look and feel
• Maintains no indices of its own; relies upon the capabilities of all the linked systems
Source: New Idea Engineering, Inc.
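The federator steps above (transform, submit simultaneously, merge) might be sketched as follows. The two silos, their query syntaxes, and every function name here are assumptions made for illustration; real silos would be remote search services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical silos, each with its own query syntax.
def catalog_search(q):
    """Expects upper-case keywords."""
    data = ["FEDERATED SEARCH PRIMER", "TRAVEL GUIDE"]
    return [d for d in data if q in d]

def archive_search(q):
    """Expects a fielded query of the form 'title:<term>'."""
    term = q.split(":", 1)[1]
    data = ["search governance memo", "budget report"]
    return [d for d in data if term in d]

# Each silo is paired with a translator from the user's terms to its syntax.
SILOS = {
    "catalog": (catalog_search, lambda terms: terms.upper()),
    "archive": (archive_search, lambda terms: "title:" + terms.lower()),
}

def federated_search(terms):
    """Translate the query per silo, submit to all at once, merge the results."""
    with ThreadPoolExecutor(max_workers=len(SILOS)) as pool:
        futures = {
            name: pool.submit(fn, translate(terms))
            for name, (fn, translate) in SILOS.items()
        }
        merged = []
        for name, future in futures.items():
            # Wrap every hit in one common record format.
            merged.extend({"source": name, "title": t} for t in future.result())
    return merged

hits = federated_search("search")
```

The federator keeps no index of its own: every call goes out to the silos and depends on what each one can do.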
Surface vs. Deep Web Search
Deep Web FS examples:
• www.completeplanet.com - 70,000+ searchable DBs & specialty search engines
• www.science.gov - federates U.S. federal agency science information
• http://imlsdcc.grainger.uiuc.edu/ - Institute of Museum & Library Services (IMLS) Digital Collections & Content, with descriptions of digital resources developed by IMLS grantees
Source: Juanico-Environmental Consultants, Ltd.
Vertical Search Engine
• Closely related to Deep Web; searches a particular niche
• i.e., a specific industry, topic, or type of content (e.g., scientific research, travel, movies, images, blogs)
• Example: www.vetseek.info is a search engine focusing on veterinary science and related topics
Challenges
• Authentication
    - Showing each record’s branding and copyright information
    - Licensed or subscription databases
• True de-duplication
    - Virtually impossible because DBs return 10-20 results at a time
    - Vendors usually just de-dupe the first results set returned
• Security
    - Mapping user credentials and access rights to each repository's security model
• Speed
    - Limited by the slowest search engine’s performance
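The de-duplication limitation can be made concrete with a small sketch. The record fields and the `normalize` and `dedupe` functions are hypothetical; the point is that only the records already returned can be compared, so duplicates sitting on later result pages stay undetected.

```python
def normalize(record):
    """Build a comparison key; matching on title plus URL is an assumed heuristic."""
    return (record["title"].strip().lower(), record["url"].rstrip("/").lower())

def dedupe(records):
    """Keep the first occurrence of each record among those fetched so far."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# First result page from two hypothetical sources.
first_page = [
    {"source": "A", "title": "Search Primer", "url": "http://example.org/primer"},
    {"source": "B", "title": "search primer ", "url": "http://example.org/primer/"},
    {"source": "B", "title": "Travel Guide", "url": "http://example.org/travel"},
]
unique_first_page = dedupe(first_page)
```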
Challenges (continued)
• Lack of data standardization
    - Each source has a unique access method & needs translation
    - Metadata mapping between FSS and underlying systems
• Access methods to sources may change
    - Requires an interface rewrite or modification
• Rules for error handling
    - Ex. Query term not available: exclude the query, the repository, or proceed without the term?
    - Ex. Timeouts or connection problems
• Complex searches usually not available
    - Fielded searches
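One way to make the error-handling rules explicit is to encode them as a policy. The sketch below is illustrative only: the policy names `skip_source` and `drop_term`, the source definitions, and all function names are assumptions, and failures are simulated with Python's built-in `TimeoutError` and `ConnectionError`.

```python
def apply_term_policy(query_fields, supported, policy):
    """Decide what to do when a source does not support a query field."""
    unsupported = set(query_fields) - set(supported)
    if not unsupported:
        return dict(query_fields)
    if policy == "skip_source":   # exclude the repository
        return None
    if policy == "drop_term":     # proceed without the term
        return {k: v for k, v in query_fields.items() if k in supported}
    raise ValueError(f"unknown policy: {policy}")

def query_sources(query_fields, sources, policy="drop_term"):
    """Query each source, recording failures instead of aborting the whole search."""
    results, errors = {}, {}
    for name, (supported, search_fn) in sources.items():
        query = apply_term_policy(query_fields, supported, policy)
        if query is None:
            errors[name] = "unsupported term"
            continue
        try:
            results[name] = search_fn(query)
        except (TimeoutError, ConnectionError) as exc:
            errors[name] = type(exc).__name__
    return results, errors

# Hypothetical sources: (supported fields, search function).
def slow_source(query):
    raise TimeoutError("simulated timeout")

SOURCES = {
    "library": ({"title", "author"}, lambda q: ["hit for " + q["title"]]),
    "archive": ({"title"}, lambda q: ["archived " + q["title"]]),
    "legacy": ({"title", "author"}, slow_source),
}

results, errors = query_sources({"title": "search", "author": "curtis"}, SOURCES)
skipped_results, skipped_errors = query_sources(
    {"title": "search", "author": "curtis"}, SOURCES, policy="skip_source"
)
```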
Challenges (continued)
• Relevancy scores
    - Can’t identify a single relevancy ranking model
    - Each repository's relevancy rankings refer only to its own results
    - May not be useful when comparing the results with those from another system
• Access to content stored in a variety of places
    - Results page may not let the user obtain identified documents
    - This may involve a built-in viewer or invoking the owning product’s interface
• Combining navigators from each result set
    - i.e., faceted search, taxonomies and auto-generated clusters
• Selecting the right FS engine
    - Depends on business goals and type of content sources: structured vs. unstructured, licensed/subscription
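The relevancy problem can be seen in a small example. A common workaround, not one proposed in the slides, is to rescale each source's scores to a shared 0-1 range before merging. The sketch below uses min-max normalization with hypothetical scores and function names; it makes the scales comparable but does not make the underlying ranking models equivalent.

```python
def min_max(scores):
    """Rescale one source's scores to the range 0..1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def merge_by_normalized_score(result_sets):
    """Merge per-source (doc, score) lists after normalizing each source separately."""
    merged = []
    for source, hits in result_sets.items():
        if not hits:
            continue
        normalized = min_max([score for _, score in hits])
        merged.extend((doc, source, n) for (doc, _), n in zip(hits, normalized))
    return sorted(merged, key=lambda hit: hit[2], reverse=True)

# Source "a" scores on a 0-100 scale, source "b" on a 0-10 scale.
result_sets = {
    "a": [("a1", 90), ("a2", 30)],
    "b": [("b1", 8), ("b2", 5), ("b3", 2)],
}
ranked = merge_by_normalized_score(result_sets)
```

Sorting on the raw scores would place every result from source "a" above every result from source "b", purely because of the scale each engine uses.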
Benefits
• Single master index
    - Quicker response times
    - No need to access original data sources
    - Relevancy algorithms applied uniformly
    - Dynamic navigators are available for all documents
• Time savings
    - Searches many sources at one time
    - Combines results into a single results page
• Quality of results
    - Client selects the sources to search
• Minimum impact on the data silos
    - Only accessed when a user performs a query
    - Eliminates the increased load of crawling/indexing the data source
Benefits (continued)
• Improve productivity
    - Reduces number of searches executed to find relevant results
    - Save, reuse, schedule, and even share effective search queries
• Leverage security controls at queried source
    - Access repositories that are secured against crawls but can be accessed by search queries
• Reduce costs
    - No additional capacity requirements for a content index since the content is not crawled by the search server
• Most current content
    - As soon as the source is updated, the info is available to the searcher on the very next query
• Increase awareness
    - Identify most relevant sources to search based on # of results each source produced
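The last point, identifying the most relevant sources by result count, could be sketched as below. The record format and the `rank_sources` function are hypothetical.

```python
from collections import Counter

def rank_sources(merged_results):
    """Rank sources by how many results each contributed to a query."""
    counts = Counter(hit["source"] for hit in merged_results)
    return counts.most_common()

# Hypothetical merged result list from one federated query.
merged_results = [
    {"source": "science.gov", "title": "Study A"},
    {"source": "intranet", "title": "Memo"},
    {"source": "science.gov", "title": "Study B"},
    {"source": "science.gov", "title": "Study C"},
]
source_ranking = rank_sources(merged_results)
```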
FDA Case Study Success (Federated ‘Master Index’ Search System)
FSS Example (uses FAST ESP – Vertical Search)
FSS Example (uses MS & Vivisimo)
FSS Example (uses Webfeat)
Best Practices
Future Vision
