Federated Search Webinar for SLA (Special Libraries Assoc.)

Federated Search in a Disparate
Environment

PREPARED FOR:
SLA Webinar Series
Evidence-Based Practice in Libraries

Helen L. Mitchell Curtis
Principal, Enterprising Solutions
2040 Corbett Rd
Monkton, Md 21111
410.472.4631
hmitchell5@gmail.com

September 9, 2009

Enterprising Solutions

Biography
Helen L. Mitchell Curtis – Principal, Enterprising Solutions

• 32+ years at FDA leading one of the largest enterprise search implementations among Civilian Federal Agencies
• Develop enterprise-wide search strategies & solutions
• Integrate search technologies across IT applications and disparate document repositories
• Build governance, management and end user buy-in
• Promote collaboration, standards, findability and improved organization of data and document assets
• Passion – to help clients reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience


Polling Question

• What is Your Role? (select all that apply, if group participants)

• CIO, Executive Director
• Library Director (Corporate, Gov’t, Academia, Solo)
• Librarian/Information Management Professional
• IT Professional or Consultant
• Project/Product Manager
• Sales/Marketing/Communications
• End User (i.e., Scientist, Researcher, Engineering Professional)
• Federated Search Vendor
• Other



Agenda

1. Terms Clarified
2. Types of Federated Search (FS)
3. FS Challenges & Benefits
4. FDA Case Study
5. FS Evaluation Criteria
6. Examples of FS Solutions
7. Live Federated Search Demo
8. Best Practices
9. Future Vision
10. Questions & Answers


Clarify Terms

• Findability: Reliable and complete retrieval of content based on user need, i.e. everything relevant is recalled (recall) while simultaneously returning only that content relevant to the user's focus (precision), thus eliminating the review of irrelevant content by the user. [1]

• Enterprise Search (ES): Systems…within an organization…seeking information held internally…in a variety of formats and locations, including databases, document management systems, and other repositories. Content is pre-indexed, simultaneously searched, and displayed to authorized users. [2]

• Federated Search (FS): The process of performing a simultaneous real-time search of multiple diverse and distributed sources from a single search page, with the federated search engine acting as intermediary. [3]

• Deep Web: The set of web-sites and their documents that cannot be accessed via crawler-type search engines such as Google. Deep web content typically lives inside of databases, and is accessed through search forms. It is also referred to as the Hidden or Invisible Web. [4]

• Connector: SW written to access a content source that must know the URL of the source, how to send search commands, its search syntax, & how to process the search results returned from a source. [5]

Sources:
1. Definition by AIIM Market IQ
2. Definition by CMS Watch
3. A Federated Search Primer – Part II
4. Deep Web Technologies
5. Federated Search Rpt & Toolkit-Jill Hurst-Wahl
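The Connector definition lists four things a connector has to know: the source URL, how to send search commands, the source's search syntax, and how to process what comes back. A minimal sketch of that idea follows. The class name, its fields, and the example URL are all hypothetical and are not taken from any vendor product.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Connector:
    """Hypothetical connector: what the federator must know about one source."""
    name: str
    base_url: str      # URL of the source's search endpoint (assumed)
    query_param: str   # parameter name that carries the query (assumed)
    and_operator: str  # how this source spells boolean AND (assumed)

    def build_request(self, terms: list[str]) -> str:
        """Translate generic search terms into this source's syntax."""
        query = f" {self.and_operator} ".join(terms)
        return f"{self.base_url}?{urlencode({self.query_param: query})}"

    def parse_results(self, raw: list[dict]) -> list[dict]:
        """Map source-specific records to one common record shape."""
        return [{"source": self.name, "title": r.get("title", "")} for r in raw]
```

No network call is made here; a real connector would also send the request and handle authentication, which this sketch leaves out.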


Polling Question
Information Accessibility (select all that apply)

1. I can easily find information to do my job
2. Less than 50% of our organization’s info is searchable online
3. More than 50% of our organization's info is searchable online
4. I reference less than 5 systems (info sources) in any given
week
5. I reference 5 or more systems (info sources) in any given
week



Findability Issues
• AIIM Market IQ Research on Findability (of 528 end users):
  - 50% believe Findability in their organization is "Worse to Much Worse" than their consumer-facing web sites
  - 49% have no formal goal for Enterprise Findability within their organizations
  - 49% "Agreed or Strongly Agreed" that finding the information to do their job is difficult and time consuming
  - 69% believe less than 50% of their organization's information is searchable online
  - 36% reference five or more systems in any given week

Source: AIIM Market Intelligence, 2008


Why Use Federated Search

To increase findability to better accomplish business objectives.

To issue a single query across multiple content sources through a common
search interface.

When it is not feasible to re-index all of the content available from large public
sites like PubMed.

To increase user awareness of all content sources such as deep web for
scientific, technical and business content.

To eliminate using multiple database search protocols & passwords.

When you don't have the rights to index the content (e.g. subscription sites).

Real-time search: for content that is constantly being updated, where it is impractical to
keep the data as timely as it needs to be.

Federated Search Sources

(examples)

Reason   Corporate   Academic   Gov’t   Public Library
Subscription Databases X X X X
Internal or External Repositories X X
Library Catalog(s) X X X X
News X X
Digitized Material X X X
Blogs & Wikis X X X
Intranet/Internet Sites X X
Industry Specific Sources X
DBs available to customers X X
Historical Collections X



Typical Non-Federated Search

Courtesy of MuseGlobal, Inc.


Typical Federated Search

Courtesy of MuseGlobal, Inc.


Federated 'Master Index' Search
• Index multiple data sources' content into a single master index
• Queries & results come from that one master index
• Many Enterprise Search products integrate FS via 'connectors' to accomplish this (ex., FAST, Autonomy, Endeca)

Source: New Idea Engineering, Inc.
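The master-index approach can be illustrated with a toy inverted index. This is only a sketch under simple assumptions (whitespace tokenization, AND matching, in-memory data); the function names are hypothetical and it does not represent how FAST, Autonomy, or Endeca work internally.

```python
from collections import defaultdict


def build_master_index(sources):
    """Index every document from every source into ONE shared index.

    sources: {source_name: {doc_id: text}}
    returns: {term: set of (source_name, doc_id)}
    """
    index = defaultdict(set)
    for source, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source, doc_id))
    return index


def search(index, query):
    """Answer a query from the master index alone (all terms must match)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```

At query time the original repositories are never contacted, which is the trade-off this slide describes: the work moves to indexing time.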


Federated 'Data Silos' Search
• 'Search Federator' processes queries for each data source silo
• Transforms search terms to match each content source's requirements
• Submits query to each of the sources simultaneously
• Merges each source's results together - single look & feel
• Maintains no indices of its own, relies on linked systems' capabilities

Source: New Idea Engineering, Inc.
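The federator steps listed on this slide (translate, submit simultaneously, merge, keep no index) can be sketched as follows. The `translate` and `search` callables are hypothetical stand-ins for real connectors, and a thread pool is just one assumed way to query sources at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, sources):
    """Send one query to every source at once and merge the results.

    sources maps a source name to a (translate, search) pair of callables.
    Both callables are hypothetical stand-ins for a real connector.
    """
    def run(item):
        name, (translate, search) = item
        # Translate the query into the source's own syntax, then search it.
        return [{"source": name, "title": t} for t in search(translate(query))]

    if not sources:
        return []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        per_source = list(pool.map(run, sources.items()))
    # Flatten in source order; nothing is stored between queries.
    return [hit for hits in per_source for hit in hits]
```

Because the merged list is only as complete as the slowest call, the total response time in a design like this is bounded by the slowest source.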


Surface vs. Deep Web Search

Popular search engines (Google, Yahoo…) "crawl" the surface web.

FS can drill down to the deep web, where specialized content (e.g.,
scientific and technical databases) resides.

Deep Web FS Examples:
• www.completeplanet.com - 70,000+ searchable DBs & specialty search engines
• www.science.gov - federates U.S. federal agency science info
• http://imlsdcc.grainger.uiuc.edu/ - Institute of Museum & Library Services (IMLS) Digital Collections & Content, w/descriptions of digital resources developed by IMLS grantees
Source: Juanico-Environmental Consultants, Ltd.


Vertical Search Engine

• Closely related to Deep Web – searches for a particular niche, i.e., a specific industry, topic, or type of content (e.g., scientific research, travel, movies, images, blogs)
• Example: www.vetseek.info is a search engine focusing on veterinary science and related topics


Polling Question

Federated Search Solutions (select one)

1. We are currently conducting an evaluation to procure a
Federated Search Product
2. We currently have a Federated Search Solution installed that
satisfies our requirements
3. We have a Federated Search Solution but are considering
replacing it or enhancing its capabilities & features



Challenges

• Authentication
  - Showing each record's branding and copyright information
  - Licensed or subscription databases
• True De-duplication
  - Virtually impossible because DBs return 10-20 results at a time
  - Vendors usually just de-dupe the first results set returned
• Security
  - Mapping user credentials and access rights to each repository security model
• Speed
  - Limited by slowest search engine's performance
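The de-duplication point can be made concrete with a sketch that de-dupes only the first page of results from each source, which is what the slide says vendors usually do. Matching on a normalized title is an assumption made for illustration; real products may compare other fields.

```python
import re


def normalize(title):
    """Lowercase and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe_first_page(result_pages):
    """Merge the first result page from each source, dropping repeat titles.

    result_pages: one list of records per source; each record has a "title".
    Duplicates that sit on later, unfetched pages are never seen, which is
    why this is not true de-duplication.
    """
    seen, merged = set(), []
    for page in result_pages:
        for record in page:
            key = normalize(record["title"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged
```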


Challenges (continued)

• Lack of data standardization
  - Each source has a unique access method & needs translation
  - Metadata mapping between FSS and underlying systems
• Access methods to sources may change
  - Requires an interface rewrite or modification
• Rules for error handling
  - Ex. Query term not available: exclude the query, the repository, or proceed without the term?
  - Ex. Timeouts or connection problems
• Complex searches usually not available
  - Fielded searches
  - Known items, i.e. Article Name
  - Best to search the database directly
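The error-handling question on this slide (exclude the query, exclude the repository, or proceed without the term) amounts to choosing a policy per deployment. A minimal sketch of those three options follows; the enum and function names are hypothetical, and timeouts or connection problems would need separate handling that is not shown.

```python
from enum import Enum


class UnsupportedTermPolicy(Enum):
    FAIL_QUERY = "exclude the query"
    SKIP_SOURCE = "exclude the repository"
    DROP_TERM = "proceed without the term"


def plan_query(fields, supported, policy):
    """Decide what to send to one source when it lacks a requested field.

    fields: {field_name: value} the user asked for
    supported: field names this source can search
    Returns the fields to send, or None when the source should be skipped.
    """
    unsupported = set(fields) - set(supported)
    if not unsupported:
        return dict(fields)
    if policy is UnsupportedTermPolicy.DROP_TERM:
        return {k: v for k, v in fields.items() if k in supported}
    if policy is UnsupportedTermPolicy.SKIP_SOURCE:
        return None
    raise ValueError(f"unsupported fields: {sorted(unsupported)}")
```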


Challenges (continued)

• Relevancy scores
  - Can't identify a single relevancy ranking model
  - Each repository's relevancy ranking refers only to its own results
  - May not be useful when comparing the results with those from another system
• Access to content stored in a variety of places
  - Results page may not let user obtain identified documents
  - This may involve a built-in viewer or invoking the owning product's interface
• Combining navigators from each result set
  - i.e., faceted search, taxonomies and auto-generated clusters
• Selecting the right FS engine
  - Depends on business goals, type of content sources (structured vs. unstructured, licensed/subscriptions)
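Because each repository scores results on its own scale, raw scores cannot be compared directly. One common workaround, shown here only as an illustration and not as something the slides prescribe, is to rescale each source's scores to 0..1 before merging. The function names are hypothetical, and min-max scaling is itself a rough heuristic: it does not make the underlying ranking models equivalent.

```python
def normalize_scores(hits):
    """Rescale one source's raw scores to the 0..1 range (min-max)."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(doc, 1.0 if span == 0 else (score - lo) / span) for doc, score in hits]


def merge(per_source_hits):
    """Merge hits from several sources after per-source normalization.

    per_source_hits: one list of (doc, raw_score) pairs per source.
    """
    merged = [hit for hits in per_source_hits for hit in normalize_scores(hits)]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)
```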


Benefits

• Single master index
  - Quicker response times
  - No need to access original data sources
  - Relevancy algorithms applied uniformly
  - Dynamic navigators are available for all documents
• Time savings
  - Searches many sources at one time
  - Combines results into a single results page
• Quality of results
  - Client selects the sources to search
• Minimum impact on the data silos
  - Only accessed when a user performs a query
  - Eliminates increased load from crawling/indexing the data source


Benefits (continued)

• Improve productivity
  - Reduces number of searches executed to find relevant results
  - Save, reuse, schedule, and share effective search queries
• Leverage security controls at queried source
  - Access repositories that are secured against crawls but can be accessed by search queries
• Reduce costs
  - No additional capacity requirements for a content index since it's not crawled by the search server
• Most current content
  - Real-time searches: as soon as the source is updated, the info is available to the searcher on the very next query
• Increase awareness
  - Identify most relevant sources to search based on # of results each source produced

FDA Case Study Success
(Federated 'Master Index' Search System)

ACTION: Started small with high 'pain points'.
RESULT: Increased productivity & popularity.

ACTION: Modified business processes.
RESULT: Standardized nomenclature improved efficiencies.

ACTION: Users across organization could find content in silos.
RESULT: Produced more timely & QUALITY work products.

ACTION: Indexed structured & unstructured content with document level security.
RESULT: Grew from 1 repository of 500 docs to 50 with 30 million docs. Accessed on 'need to know' basis.

ACTION: Introduced standardized search web services into applications.
RESULT: Reduced development time & costs. Increased mgmt & user acceptance. Integrated in more applications.

ACTION: Increased user awareness with training, newsletters & meetings.
RESULT: Used more & content added. Search requirements now captured at BEGINNING of project development.


Evaluation Criteria Overview

• Identify Goals
  - Create an Effective Search Strategy
• Collect Business Requirements
  - Conduct needs assessment
• Work Closely with User Community

Evaluation Criteria Overview (continued)

• Define Features and Functions
  - Eliminate emotional decisions re: product, company or others using the product
• High Precision
  - Return content relevant to user's focus
• High Recall
  - Recall everything relevant to user's need
• Thoroughly Research Products, Users & Product Reviewers
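Precision and recall, as described on this slide, can be measured during a product evaluation if you have a test query with a known set of relevant documents. The sketch below uses the standard definitions; the function name and document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a product returns four documents and two of them are among the three known relevant ones, precision is 2/4 and recall is 2/3.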


Sample Evaluation Criteria
Rating Criteria             Importance   Product #1   Product #1       Product #2   Product #2
                            (Rank 1-5)   Score        Weighted Score   Score        Weighted Score
                                         (0-100)      (Rank x Score)   (0-100)      (Rank x Score)

Ease of Use                 5            85           425              70           350
Ability to Customize UI     1            80           80               65           65
Speed                       5            90           450              85           425
De-duplication              4            75           300              75           300
Clustering                  4            85           340              80           320
Help Functionality          3            70           210              0            0
Alerts                      4            90           360              50           200
# of Searchable Sources     3            90           270              80           240
Save Selections/Citations   2            85           170              0            0
Security                    4            90           360              85           340
Product Cost                5            75           375              85           425
Vendor Credibility          4            95           380              85           340

Total                                    1010         3720             760          3005

Courtesy of Federated Search Report & Tool Kit
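The weighted totals in the sample table are each criterion's importance rank multiplied by the product's score, summed over all criteria. The sketch below recomputes them from the table's own numbers; the variable and function names are hypothetical.

```python
# criterion: (importance rank, Product #1 score, Product #2 score),
# copied from the sample evaluation table
CRITERIA = {
    "Ease of Use": (5, 85, 70),
    "Ability to Customize UI": (1, 80, 65),
    "Speed": (5, 90, 85),
    "De-duplication": (4, 75, 75),
    "Clustering": (4, 85, 80),
    "Help Functionality": (3, 70, 0),
    "Alerts": (4, 90, 50),
    "# of Searchable Sources": (3, 90, 80),
    "Save Selections/Citations": (2, 85, 0),
    "Security": (4, 90, 85),
    "Product Cost": (5, 75, 85),
    "Vendor Credibility": (4, 95, 85),
}


def weighted_total(criteria, product):
    """Sum rank x score for one product (product is 1 or 2)."""
    return sum(row[0] * row[product] for row in criteria.values())


print(weighted_total(CRITERIA, 1))  # 3720, matching the table
print(weighted_total(CRITERIA, 2))  # 3005, matching the table
```

Changing a rank in `CRITERIA` shows how sensitive the outcome is to the importance weights, which is worth checking before committing to a product.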

FSS Example
(uses FAST ESP – Vertical Search)
Features of Interest

FSS Example
(uses MS & Vivisimo)

FSS Example
(uses Deep Web Technologies)

FSS Example
(uses Webfeat)

Digital Library FSS Example
http://www.calisphere.universityofcalifornia.edu/

FSS Example
(LibraryFind® developed by Oregon State Univ Libraries)

Semantic Federated Search
(prototype by Collexis & Deep Web Technologies)

DeepWeb Technologies (a federated search provider) and Collexis (a developer of semantic search & knowledge discovery solutions) teamed up to deliver the world's first semantic federated search.

SOURCES:
• PubMed
• NCI = Nat'l Cancer Inst
• DTIC = Defense Tech. Info Ctr
• PMC = PubMed Central
• ScrDOEIB = DOE Info Bridge
• Eurekalert = Science News

THESAURI Used:
• MeSH
• DTIC = Defense Tech. Info Ctr

• How does semantic federated search work?
  - All results from your initial query are processed through one or more thesauri (e.g., MeSH & DTIC).
  - The system then returns terms that are found both in the top results and in the thesauri.
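The two steps described on this slide (run the top results through a thesaurus, then return terms found in both) can be approximated with a simple sketch. This is an illustration only, not the Collexis or Deep Web Technologies implementation: the function name is hypothetical, the thesaurus is a tiny stand-in list rather than MeSH, and plain substring matching replaces real semantic analysis.

```python
from collections import Counter


def suggest_terms(top_results, thesaurus, limit=5):
    """Return thesaurus terms that also occur in the top results.

    Terms are ordered by how many results mention them, which loosely
    mirrors "more evidence found for that term".
    """
    counts = Counter()
    for text in top_results:
        lowered = text.lower()
        for term in thesaurus:
            if term.lower() in lowered:
                counts[term] += 1
    return [term for term, _ in counts.most_common(limit)]
```

Because suggestions come from a fixed thesaurus, the same vocabulary is offered from query to query, unlike clustering, which derives its groupings from each result set.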


Collexis & Deep Web Technologies
(Search Results – screenshot 1)

Unlike clustering, which simply lumps together words that are frequently found near each other, these terms are being suggested from an expert-developed thesaurus (taxonomy) in which terms are meaningfully & consistently organized.

Screenshot callouts:
• 2429 hits
• Semantic terms. The longer the blue bar, the more semantic evidence found for that term.


Collexis & Deep Web Technologies
(Search Results – screenshot 2)

• Clicking on the term "Mental Recall" from the prior screen added the term to the search and reduced relevant hits to 3; the suggested terms are still organized.
• Thesaurus-based search will consistently suggest terms in the same organized way.
• Clustering changes the way it organizes suggestions with every query.
• Clustering tends to be useful for very broad, general or unpredictable content.

* Thesaurus-based semantic search tends to be better when you are working consistently in knowledge domains, such as medicine, physics or electronics.


Best Practices

Strategically plan how to deliver your
mission and just DO IT!

Do proof of concept – demos can be
deceiving

Establish common set of standards &
governance model

Measure results by establishing key
performance indicators

Leverage lessons learned to reduce
project cycles, increase trust and
empower communities


Future Vision

Personalized Search
• A simple, persistent box on a user's browser, cell, or entertainment screen
that initiates a search based on what the user was doing, their previous
keystrokes, & perhaps historical data.

Better Quality of Search Results
• Number of results retrieved, Relevance Ranking, De-Duplication

Enterprise Mashups

• Combine real-time searching with social networking tools, maps, etc.

Users build the index by their searching

• Know Web pages people display, what's on them & what apps are
showing up on users' computers


Future Vision (continued)

Query analysis & predictive modeling on the fly

• Business users expect to access info behind company firewalls &
from the larger web world using the same tools and consistency

Improved Navigators, Facets, Clustering

• Filter result sets dynamically for more relevant results

Web of Interconnected Data

• Automate analysis of database structures and cross-reference
results. Ex.- Health site cross-references data from pharmaceutical
companies with the latest findings from medical researchers

Visualization Technologies

• Enable extreme-scale knowledge discovery


Resources
1. Great resource for many Federated Search topics:
www.federatedsearchblog.com – Author: Sol Lederman

2. Open Source & commercial search components & tools list:
http://tinyurl.com/l3w8of

3. Federated Search Vendors: http://tinyurl.com/92s8qv

4. Deep Web Databases: http://tinyurl.com/yam3sw

5. Deep Web resources: http://www.internettutorials.net/deepweb.asp

6. Digital Image Resources on the Deep Web: http://tinyurl.com/46vcqp

7. Info on Vertical Search Engines: http://tinyurl.com/lpcufw

8. 50 Niche Search Engines: http://tinyurl.com/lukxwx

9. Library of Congress FS Portal Products/Vendors list:
http://tinyurl.com/l6mdy8

10. Resources to Research & Mine the Deep Web: http://tinyurl.com/6g5768


References
1) "What's in a Name: Federated Search" – Miles Kehoe, New Idea Engineering, Inc., Vol. 4 No. 4, 8/07
2) "Federated Search Engine Article" – Online (Weston, Conn.) 28 no2 16-19 Mr/Ap 2004 (Reprint of article by Donna Fryer, www.SearchitRight.com)
3) "Growing Up With Federated Search" – Walt Warnick, OSTI
4) "Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3" – Walt Warnick, OSTI
5) "Vertical Search Engines & the Deep Web" – Laura B. Cohen, http://www.internettutorials.net/
6) Blog: www.federatedsearchblog.com – Sol Lederman
7) "Exploring a 'Deep Web' that Google Can't Grasp" – NYT 2-23-09, http://tinyurl.com/mvt42f
8) "Federated Search Primer, Part I-III" – Sol Lederman
9) www.searchdoneright.com – by Vivisimo – Raoul – CEO & Cofounder
10) "Enterprise Search Grows Up" – Podcast from BizTalk
11) "Federation: Big Need, Still a Challenge" – Stephen Arnold, 4/25/08
12) "The Future of Federated Search or What Will the World Look Like in 10 Years" – Rich Turner
13) "Federated Search Report & Tool Kit" – Jill Hurst-Wahl, 10/08, © Free Pint Limited 2008


QUESTIONS



THANK YOU!

Helen L. Mitchell Curtis
Principal

hmitchell5@gmail.com

410-472-4631(w)
410-259-7766(m)



“Results Driven…Exceeding Expectations”

