Federated Search in a Disparate Environment
Why use Federated Search, defining it, challenges, benefits, case studies, best practices & future vision.

  • Transcript

    • 1. Federated Search in a Disparate Environment
      Gilbane San Francisco
      8403 Colesville Road
      Silver Spring Metro Plaza 2
      Suite 400
      Silver Spring, MD 20910
      301.588.0390
      Helen L. Mitchell Curtis
      Senior Program Director, Enterprise Solutions
      June 4, 2009
    • 2. Biography
      Helen L. Mitchell Curtis – Senior Program Director of Enterprise Solutions, Macfadden
      32+ years at FDA, and led one of the largest enterprise search implementations among Civilian Federal Agencies
      Develop enterprise-wide search strategies & solutions
      Integrate search technologies across IT applications and disparate document repositories
      Build governance, management and end user buy-in
      Promote collaboration, standards, findability and improved organization of data and document assets
      Passion – to help clients to reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience
    • 3. About Macfadden
      Founded in 1986 as a small disadvantaged entrepreneurial company; graduated from the SBA 8(a) program in 1998
      Became 100% employee-owned in 2007, S-Corporation
      Acquired Systems Integration Group, Inc. and Total Security Services International, Inc. (TSSI) in 2008
      225 employees; projected 2009 annual gross revenues $40 million; $135M in contract backlog; 90% prime contracts; (TSSI sole wholly-owned subsidiary)
      FAST X10 Partner
      Microsoft Certified Partner - Information Worker Solutions with Search Specialization Competency
      • Enterprise Search Solutions
      • Integrated IT Solutions & Security
      • Counter Terrorism Planning
      • Disaster Response Management
      • Threat & Vulnerability Assessment
      • Program/Project Management
      • Intelligence Gathering & Analysis
    • Clarify Terms
      Definition by AIIM Market IQ
      Definition by CMS Watch
      A Federated Search Primer – Part II
      Deep Web Technologies
    • 10. Findability Issues
      AIIM Market IQ Research on Findability (of 528 end users):
      50% believe Findability in their organization is “Worse to Much Worse” than their consumer-facing web sites
      49% have no formal goal for Enterprise Findability within their organizations
      49% “Agreed or Strongly Agreed” that finding the information to do their job is difficult and time consuming
      69% believe less than 50% of their organization's information is searchable online
      36% reference five or more systems in any given week
      Source: AIIM Market Intelligence, 2008
    • 11. Why Use Federated Search
      To increase findability so users can accomplish their business objectives
      To access multiple content sources through a common search interface
      To increase user awareness of all content sources
      To eliminate using multiple database search protocols and passwords
      To access public or subscription search sites
      To search the deep web for scientific, technical and business content
      To reduce search time and display results in a common format
    • 12. Federated ‘Master Index’ Search
      Index content from multiple data sources into a single master search index
      Queries & results come from that one master index
      Many Enterprise Search products integrate FS via ‘connectors’ to accomplish this (e.g., FAST, Autonomy, Endeca)
      Source: New Idea Engineering, Inc.
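      The master-index approach above can be sketched in a few lines. This is only an illustration, assuming two hypothetical in-memory sources (`intranet`, `records_db`) and a simple term-to-document inverted index; real products such as those named on the slide use connectors and far richer indexing.

```python
from collections import defaultdict

# Hypothetical sample content standing in for two separate repositories.
SOURCES = {
    "intranet": {"doc1": "drug approval guidance", "doc2": "travel policy"},
    "records_db": {"rec7": "drug recall notice"},
}

def build_master_index(sources):
    """Map each term to the set of (source, doc_id) pairs that contain it."""
    index = defaultdict(set)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index

def search(index, query):
    """Answer the query from the one master index, never from the sources."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

master_index = build_master_index(SOURCES)
print(search(master_index, "drug"))
# prints [('intranet', 'doc1'), ('records_db', 'rec7')]
```

      Because every query is answered from the single index, the original repositories are touched only when the index is built or refreshed.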
    • 13. Federated ‘Data Silos’ Search
      ‘Search federator’ process queries each data source silo
      Transforms the user's search terms to match each content source's requirements
      Submits the query to each of the sources simultaneously
      Merges each source’s results together - a single look and feel
      Maintains no indices of its own, relies upon the capabilities of all the linked systems
      Source: New Idea Engineering, Inc.
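      A minimal sketch of the ‘search federator’ steps listed above: translate the query per source, submit to all sources at once, and merge into one result list. The two silos, their query syntaxes, and all function names are hypothetical stand-ins, not any vendor's API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical silos; each one pretends to expect a different query syntax.
def search_catalog(query):
    return [{"title": f"Catalog hit for [{query}]"}]

def search_archive(query):
    return [{"title": f"Archive hit for [{query}]"}]

# For each silo: (function that translates terms, function that runs the search).
SILOS = {
    "catalog": (lambda terms: " AND ".join(terms), search_catalog),
    "archive": (lambda terms: "+".join(terms), search_archive),
}

def federated_search(user_query):
    """Query every silo concurrently and merge results into one list."""
    terms = user_query.split()

    def query_one(name):
        translate, run = SILOS[name]
        results = run(translate(terms))
        # Tag each record with its source so the merged list keeps attribution.
        return [dict(record, source=name) for record in results]

    with ThreadPoolExecutor(max_workers=len(SILOS)) as pool:
        per_source = list(pool.map(query_one, SILOS))
    return [record for results in per_source for record in results]

for hit in federated_search("drug recall"):
    print(hit["source"], "->", hit["title"])
```

      Note that the federator here holds no index of its own; it only translates, dispatches, and merges.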
    • 14. Surface vs. Deep Web Search
      Deep Web FS Examples:
      - 70,000+ searchable DBs & specialty search engines
      - U.S. federal agency science information
      - Institute of Museum & Library Services (IMLS) - Digital Collections & Content w/descriptions of digital resources developed by IMLS grantees
      Source: Juanico-Environmental Consultants, Ltd.
    • 15. Vertical Search Engine
      Closely related to Deep Web – searches for a particular niche, i.e., a specific industry, topic, or type of content (e.g., scientific research, travel, movies, images, blogs)
      Example: a search engine focusing on veterinary science and related topics
    • 16. Challenges
      Showing each record’s branding and copyright information
      Licensed or subscription databases
      True De-duplication
      Virtually impossible because DBs return 10-20 results at a time
      Vendors usually de-dupe only the first result set returned
      Mapping user credentials and access rights to each repository security model
      Limited by slowest search engine’s performance
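      The de-duplication limitation above can be illustrated with a short sketch. It assumes each source has returned only its first page of records and that a whitespace- and case-normalized title is an acceptable duplicate key; both are simplifying assumptions, and the record layout is hypothetical.

```python
def dedupe_first_pages(first_pages):
    """De-duplicate across the first page of results from each source.

    first_pages: dict mapping source name -> list of {"title": ...} records.
    Duplicates further down any source's result list are never seen, which is
    why this cannot be called true de-duplication.
    """
    seen = {}
    for source, records in first_pages.items():
        for record in records:
            # Normalize case and whitespace to build the duplicate key.
            key = " ".join(record["title"].lower().split())
            if key in seen:
                # Keep one copy but remember every source that supplied it.
                seen[key]["sources"].append(source)
            else:
                seen[key] = {"title": record["title"], "sources": [source]}
    return list(seen.values())

pages = {
    "source_a": [{"title": "Drug  Recall Notice"}],
    "source_b": [{"title": "drug recall notice"}, {"title": "Other Report"}],
}
for item in dedupe_first_pages(pages):
    print(item["title"], item["sources"])
```

      Keeping the list of contributing sources on each merged record also leaves room to show each record's branding.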
    • 17. Challenges (continued)
      Lack of data standardization
      Each source has a unique access method & needs translation
      Metadata mapping between FSS and underlying systems
      Access methods to sources may change
      Requires an interface rewrite or modification
      Rules for error handling
      Ex. Query term not available—exclude the query, the repository, or proceed without the term?
      Ex. Timeouts or connection problem
      Complex searches usually not available
      Fielded searches
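      One way to make the error-handling rules above explicit is to encode them as a policy applied per source. The sketch below is an assumption-laden illustration: the policy names (`skip_source`, `drop_term`) and the dictionary form of a fielded query are invented for this example.

```python
def plan_query(fielded_query, supported_fields, policy):
    """Decide what to send to a source that may lack some query fields.

    fielded_query:    dict of field name -> search value
    supported_fields: set of field names the source can search
    policy:           "skip_source" (exclude the repository) or
                      "drop_term"   (proceed without the unsupported fields)
    Returns the query to send, or None if the source is excluded.
    """
    unsupported = [f for f in fielded_query if f not in supported_fields]
    if not unsupported:
        return dict(fielded_query)
    if policy == "skip_source":
        return None
    if policy == "drop_term":
        return {f: v for f, v in fielded_query.items() if f in supported_fields}
    raise ValueError(f"unknown policy: {policy}")

query = {"title": "recall", "author": "smith"}
print(plan_query(query, {"title"}, "drop_term"))    # prints {'title': 'recall'}
print(plan_query(query, {"title"}, "skip_source"))  # prints None
```

      The third option on the slide, excluding the whole query, would be handled by the caller when any source returns None.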
    • 18. Challenges (continued)
      Relevancy scores
      Can’t identify a single relevancy ranking model
      Each repository's relevancy ranking refers only to its own results
      May not be useful when comparing the results with those from another system
      Access to content stored in a variety of places
      Results page may not let user obtain identified documents
      This may involve a built-in viewer or invoking the owning product’s interface.
      Combining navigators from each result set
      i.e., faceted search, taxonomies and auto-generated clusters
      Selecting the right FS engine
      Depends on business goals, type of content sources – structured vs. unstructured, licensed/subscriptions
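      Because each source's relevancy scores are not comparable with another's, one simple workaround is to ignore the scores and interleave results by rank position. This is a sketch of that round-robin technique, offered as one possible approach rather than anything the slides prescribe.

```python
from itertools import zip_longest

def interleave_by_rank(result_lists):
    """Merge ranked lists round-robin: all rank-1 hits, then rank-2, and so on.

    Only each source's own ordering is trusted; raw scores are never compared.
    """
    merged = []
    for tier in zip_longest(*result_lists):
        # zip_longest pads shorter lists with None; skip the padding.
        merged.extend(hit for hit in tier if hit is not None)
    return merged

print(interleave_by_rank([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]))
# prints ['a1', 'b1', 'c1', 'a2', 'c2', 'a3']
```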
    • 19. Benefits
      Single master index
      Quicker response times
      No need to access original data sources
      Relevancy algorithms applied uniformly
      Dynamic navigators are available for all documents
      Time savings
      Searches many sources at one time
      Combines results into a single results page
      Quality of results
      Client selects the sources to search
      Minimum impact on the data silos
      Only accessed when a user performs a query
      Eliminates increased load crawling/indexing the data source
    • 20. Benefits (continued)
      Improve productivity
      Reduces number of searches executed to find relevant results
      Save, reuse, schedule, and even share effective search queries
      Leverage security controls at queried source
      Access repositories secured against crawls but can be accessed by search queries
      Reduce costs
      No additional capacity requirements for content index since it's not crawled by search server
      Most current content
      As soon as the source is updated, the info is available to the searcher on the very next query
      Increase awareness
      Identify most relevant sources to search based on # of results each source produced
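      The last point, identifying the most relevant sources from the number of results each produced, can be sketched as a simple sort. The source names and counts below are made up for illustration.

```python
def rank_sources_by_hits(hit_counts):
    """Order sources from most to fewest results for a given query."""
    return sorted(hit_counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-source result counts for one query.
counts = {"journals": 12, "patents": 40, "news": 3}
print(rank_sources_by_hits(counts))
# prints [('patents', 40), ('journals', 12), ('news', 3)]
```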
    • 21. FDA Case Study Success (Federated ‘Master Index’ Search System)
    • 22. FSS Example (uses FAST ESP – Vertical Search)
    • 23. FSS Example (uses MS & Vivisimo)
    • 24. FSS Example (uses Webfeat)
    • 25. Best Practices
    • 26. Future Vision
    • 27. Future Vision (continued)
    • 28. Resources
      Great source of info on many Federated Search topics: – Author: Sol Lederman
      List of Open Source & commercial search components & tools:
      List of many Deep Web Databases:
      Info on the Deep Web:
      Some Digital Image Resources on the Deep Web:
      Info on Vertical Search Engines:
      50 Niche Search Engines:
      Library of Congress list of FS Portal Products & Vendors:
      99 Resources to Research & Mine the Invisible Web:
    • 29. References
      “What’s in a Name: Federated Search” – By Miles Kehoe, New Idea Engineering, Inc. - Volume 4 Number 4 - August 2007
      “Federated Search Engine Article” - Online (Weston, Conn.) 28 no. 2, 16-19, Mar/Apr 2004 (Reprint of article by Donna Fryer)
      “Growing Up With Federated Search” - by Walt Warnick, OSTI
      “Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3” – Walt Warnick, OSTI
      “Vertical Search Engines & the Deep Web” - Laura B. Cohen – by Sol Lederman
      “Exploring a ‘Deep Web’ that Google can’t Grasp” - NYT 2-23-09
      “Federated Search Primer, Part I-III”– by Sol Lederman – by Vivisimo –Raoul – CEO & Cofounder
      “Enterprise Search Grows Up” - Podcast from BizTalk
      “Federation: Big Need, Still a Challenge” – Stephen Arnold, 4/25/08
      “The Future of Federated Search or What Will the World Look Like in 10 Years” – Rich Turner
    • 30. THANK YOU!
      Helen L. Mitchell Curtis
      Senior Program Director, Enterprise Solutions
      240-247-1946 (w)
      240-743-7975 (m)
    • 31. MACFADDEN
      Delivering Results. Exceeding Expectations.