Solving Real World Challenges with Enterprise Search
2. Introduction – Agnes Molnar
International SharePoint Consultant
• 10+ Years SharePoint Experience
• Information Architecture & ECM
• Search
SharePoint Server MVP
• 6 Years SharePoint Server MVP
• 5+ Years Speaking at Conferences Around the World
• Numerous Books, White Papers, Articles
Contact
• E-mail: aghy@aghy.hu
• Blog: http://aghy.hu
• Twitter: @molnaragnes
© DEVintersection. All rights reserved.
http://www.DEVintersection.com
4. Information Overload OR Filter Failure?
Source - http://financiallyeliteblog.com/wp-content/uploads/2011/04/information-overload.jpg
6. Search is Easy…
Find is the real challenge!
7. Search as an Application
Source: http://www.domorewithsearch.com
8. Search as an Application
Search is no longer the white box
Content lives in disparate locations
Structured and unstructured content lives in different locations
Need to aggregate content according to:
• Process
• Context
• Customer
• Goal
• Program
• Parameter of any of the above
9. User – Context – Content
Context: Business models & goals, corporate culture, resources
[Where information is used]
Content: Document types, objects, structure, attributes, meta-information
[How to describe the information]
Users: Information needs, audience types, expertise, tasks
[How to use the information]
10. Requirements Gathering
• Types of Content
• Types of Users
• Users’ Behavior
• Content Sources
• Metadata
• Actions to Take
• Amount of Content
• Current “Pain Points”
11. Search is more than Technology
Source: http://searchpatterns.org
12. The Complexity of Enterprise Information
What we give to the search engine…
What the search engine sees…
Title: Overview of SharePoint 2013 Preview Installation and Configuration
Author: Alex Yarrow
Created Date: 06/21/2012
Modified Date: 10/16/2012
File Type: docx
…
13. Explicit metadata versus implicit metadata
Explicit metadata
Content Type = License
Organization = ABC Company, DEF Company
Topic = Support
Forward Index – Words per document
Inverted Index – Documents per word
ABC shall provide first level technical support to all Licensed Product end users and/or Sublicensed Product customers/users. DEF will provide second level support. DEF shall provide to ABC a primary and a secondary support person to act as the primary interface with ABC’s technical and customer support team. DEF shall provide direct technical support to ABC for all uses of the DEF Software. Support level definitions and responsibilities are set forth in Exhibit C. An “SLA Failure” as defined in Exhibit C shall qualify as a Release Condition sufficient to authorize the Escrow Agent to release the Source Code to ABC pursuant to Section 7 and the Escrow Agreement.
Implicit metadata
ABC
customers
customer support
customer support team
DEF
DEF software
end users
escrow agreement
escrow agent
exhibit c
licensed product
release condition
section 7
secondary support
SLA
SLA failure
software
source code
support level
sublicensed product
technical support
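The forward index (words per document) and inverted index (documents per word) named on this slide can be illustrated with a minimal sketch. The two sample documents and the `tokenize` helper are assumptions made for illustration only; a real search engine adds steps such as stemming, stop-word handling and term positions.

```python
import re
from collections import defaultdict

# Two tiny hypothetical documents, loosely based on the contract excerpt.
documents = {
    "doc1": "ABC shall provide first level technical support",
    "doc2": "DEF will provide second level support",
}


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Forward index: document -> words it contains.
forward_index = {doc_id: tokenize(text) for doc_id, text in documents.items()}

# Inverted index: word -> documents that contain it.
inverted_index = defaultdict(set)
for doc_id, words in forward_index.items():
    for word in words:
        inverted_index[word].add(doc_id)

print(sorted(inverted_index["support"]))  # ['doc1', 'doc2']
print(sorted(inverted_index["abc"]))      # ['doc1']
```

A query for a word is answered from the inverted index with a single lookup, which is why search engines build it at crawl time rather than scanning documents at query time.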
14. The Complexity of Search
[Diagram of search components: Content Source, Data Source, metadata, Indexing, Local Search Index, Remote Search Index, Federation, Result Source, Query Rule, Result Block, Result Set, Display Templates, Refinement Panel, Hover Panel]
15. Requirements Gathering
Information-Seeking Patterns
“I know what I’m searching for and know how to do that”
“I know what I’m searching for but I don’t know how to do that”
“I don’t know what I’m searching for”
“Am I Searching?...”
17. Content Inventory
“I have a lot of content, but I don’t know what to do with it…”
18. Content Inventory
SharePoint content (2013, 2010, …)
File shares
Internal communication
Business Data
Company public web site
Professional Know-How Web Sites (finance, IT, development, etc.)
Common interest (stock, management, etc.)
Exchange Public Folders
Sales repository (RFPs, proposals, etc.)
Marketing documents (DMs, brochures, etc.)
Web sites
Intranet
Department sites
Project sites
Internal KB
Data from databases
Custom connector
SAP data
CRM data
20. Crawl or Federate? – Where to get the content from?
Crawl + Use Local Index:
Examples:
• Intranet
• Company file shares
Pros:
• Full control over the index (crawl schedule, metadata included, etc.) and ranking model
• Results can be aggregated into one result set
• Common refiners (facets)
Cons:
• Needs resources for the crawling process
• Needs storage to store the index
Federate:
Examples:
• Professional know-how web sites (TechNet, MSDN, etc.)
• Internet results for a specific topic (financial news, stock information, etc.)
• 3rd party Content Management System
Pros:
• Doesn’t need resources to crawl / store the index
Cons:
• Live Internet connection is required
• No control over the index
• No control over the ranking model
• No real aggregation with other result sources
21. Content Source Inventory
Name | Type | Location | Owner | Volume of Content | Frequency of Updates
Intranet | SharePoint | http://intranet | Intranet Team | 200K items | 100-300/hr
Project Sites | SharePoint | http://projects | Delivery | 200K items | 100-200/hr
Sales share | File share | X:Sales | Sales | 500K docs | 300-500/hr
Marketing share | File share | X:Marketing | Marketing | 200K docs | 300-500/hr
Company web site | Web site | http://mycompany.com | Marketing/Publishing Team | <100K pages | 1-10/day
Competitor’s web site | Web site | http://competitor.com | [external] | <100K pages | 1-10/day
Professional Know-How | Web site | http://www.mykb.com | [external] | <100K pages | 5-10/week
Company Announcements | Exchange Public Folder | Exchange/Public Folders/Announcements | Marketing/Internal Comm. Team | <100K items | 5-10/day
HR data | Business Data (SQL) | SQL database | HR | <100K items | 10-100/day
CRM data | Custom Connector | CRM system | Sales | 500K entries | 500-1000/hr
22. Metadata in Search
The “glue” of Search Applications
Crawled property: metadata extracted from the documents/items during the crawl.
Managed property: mapped to crawled properties, controlled by Search Admins, helping users perform more efficient and successful queries:
• Refiners
• Displayed in Search Results
• Sorting Properties
23. Metadata in Search
Crawled Property: Author, CreatedBy, From
Managed Property: Author
Usage: Refiner, Display on Result Set, Display on Hover Panel, Sorting by
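The idea of several crawled properties feeding one managed property can be sketched in a few lines. This is a conceptual illustration only, not the SharePoint search schema API: the `MANAGED_PROPERTY_MAP` dictionary, the `to_managed` helper and the sample items are all hypothetical.

```python
# Hypothetical mapping: several crawled properties feed one managed property.
MANAGED_PROPERTY_MAP = {
    "Author": ["Author", "CreatedBy", "From"],
}


def to_managed(crawled_item, mapping=MANAGED_PROPERTY_MAP):
    """Return managed properties, taking the first non-empty crawled value."""
    managed = {}
    for managed_name, crawled_names in mapping.items():
        for crawled_name in crawled_names:
            value = crawled_item.get(crawled_name)
            if value:
                managed[managed_name] = value
                break
    return managed


# A document and an e-mail expose the same person under different names.
document = {"CreatedBy": "Alex Yarrow", "Title": "Overview"}
email = {"From": "Alex Yarrow", "Subject": "Hello"}

print(to_managed(document))  # {'Author': 'Alex Yarrow'}
print(to_managed(email))     # {'Author': 'Alex Yarrow'}
```

Because both items end up with the same managed property, a single refiner or sort order on Author can cover content from different sources.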
24. Using Managed Properties
• In Query Rules
• Refinement
• Result Type & Display Template
• On Hover Panel
25. Security
Users can see what they have access to.
vs.
Users cannot see what they don’t have access to.
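Both statements describe security trimming: results are filtered against each item's permissions before the user sees them. The following is a minimal sketch under simplifying assumptions. The `allowed_groups` field, the group names and the `security_trim` function are hypothetical; a real search engine evaluates the access control lists captured from the source system.

```python
def security_trim(results, user_groups):
    """Keep only results whose allowed groups overlap the user's groups."""
    return [r for r in results if r["allowed_groups"] & user_groups]


# Hypothetical result set with per-item permissions.
results = [
    {"title": "Company Announcements", "allowed_groups": {"everyone"}},
    {"title": "HR salary data", "allowed_groups": {"hr"}},
    {"title": "Sales proposal", "allowed_groups": {"sales", "management"}},
]

visible = security_trim(results, user_groups={"everyone", "sales"})
print([r["title"] for r in visible])
# ['Company Announcements', 'Sales proposal']
```

The trimmed list satisfies both formulations at once: everything the user may access is kept, and nothing else is returned.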
26. The Search Security Paradox
As Search is deployed further and further into the Enterprise, the likelihood of having a security problem increases.
27. Sizing and Capacity Planning
“Sounds good, but I’m not sure if we have resources for this…”
29. Components – Scaling cheat sheet
Columns: Component, CPU, Network, Disk, Memory
Components:
• Search administration
• Crawling
• Content processing (CPC)
• Analytics processing (APC)
• Index
• Query processing (QPC)
30. Sorting the Results – Relevance Ranking
Requirements:
“I’d like to see ALL the relevant results.”
vs.
“I don’t want to see anything that is not relevant (to me, in this context).”
31. User Experience
Recall: the fraction of relevant instances that are retrieved
Precision: the fraction of retrieved instances that are relevant
Relevance: how well a retrieved document or set of documents meets the information need of the current user, in the current context
Ranking: the order in which the search results for a query appear
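The two fractions defined above can be computed directly from a set of retrieved items and a set of relevant items. The function name and the document IDs in this sketch are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Four results returned, three documents truly relevant, two in common.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision)  # 0.5 (2 of 4 retrieved items are relevant)
print(recall)     # 2 of 3 relevant items were retrieved
```

Returning more results tends to raise recall and lower precision, which is the tension between the two requirements quoted on the relevance ranking slide.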
32. Sorting the Results – Relevance Ranking
Various elements can be monitored, interpreted or used in the calculation of ranking.
These can be tuned and weighted in different ways to impact results.
Element | Description
Freshness | Age of a document compared to the time when the query is issued
Authority | Importance of a document determined by the links to it from other documents
Quality | Assigned importance of a document, independent of the query
Geo | Importance of geographical distance between a document’s associated latitude/longitude and a target location specified in a query
Context | Importance of matching a query in a given document field
Proximity | For multi-term queries: the shorter the distance between query terms in a document, the higher the document’s rank value
Position | The earlier a query term occurs in a field, the higher the document’s rank value
Frequency | The more frequently a query term occurs in a document, the higher the document’s rank value
Completeness | The greater the number of query terms present in the same field of a matching document, the higher the document’s rank value
Number | For multi-term queries: the more query terms matched in a document, the higher the document’s rank value
Reference: Okapi BM25
http://en.wikipedia.org/wiki/Probabilistic_relevance_model_(BM25)
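The Okapi BM25 reference covers the Frequency and Number elements from the table: a document scores higher when it contains more of the query terms, more often, relative to its length. The sketch below is a plain textbook BM25 and not SharePoint's ranking model. The defaults `k1=1.2` and `b=0.75` are common choices rather than values from the slides, the IDF uses the variant with 1 added inside the logarithm so it never goes negative, and the sample documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents containing each query term.
    doc_freq = {t: sum(1 for doc in docs if t in doc) for t in set(query_terms)}
    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue  # term absent: contributes nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization: long documents need more hits to score equally.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["technical", "support", "support"],
    ["customer", "support"],
    ["source", "code"],
]
scores = bm25_scores(["technical", "support"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The first document ranks highest because it matches both query terms, and the third scores zero because it matches neither. The other elements in the table (freshness, authority, proximity and so on) would be additional signals combined with a score like this one.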
33. Search Analytics
“How to Improve the Search Experience?”
34. Search Analytics in SharePoint 2013
• Usage Events – As users interact with content in SharePoint, actions are captured and stored as events (click a link, press a button, view or open a document).
• Access and create experiences using data captured in the analytics database.
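As a rough illustration of turning captured usage events into something a search experience can use, the sketch below counts events per item and picks the most viewed one. The event tuples, user names and file names are hypothetical, and this is not how SharePoint stores or exposes its analytics data.

```python
from collections import Counter

# Hypothetical usage events: (user, item, event type).
events = [
    ("anna", "proposal.docx", "view"),
    ("ben", "proposal.docx", "view"),
    ("anna", "proposal.docx", "click"),
    ("ben", "brochure.pdf", "view"),
]

# Count events per (item, event type).
counts = Counter((item, event_type) for _, item, event_type in events)

# Keep only view counts and find the most viewed item.
views = {item: n for (item, kind), n in counts.items() if kind == "view"}
most_viewed = max(views, key=views.get)

print(views)        # {'proposal.docx': 2, 'brochure.pdf': 1}
print(most_viewed)  # proposal.docx
```

Counts like these could feed a "most popular" list or serve as one more input to ranking.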
35. Search Analytics – Examples
36. Search Analytics – Examples
38. Want to Learn More?
SP41 How to Manage and Troubleshoot Search – A Practical Guide
POSTCON03 Architecting the Optimal Enterprise Search Strategy
Blog: http://aghy.hu
The Essential Guide to Enterprise Search in SharePoint 2013 (free e-book)
http://www.bainsight.com/pages/sharepoint-search-2013.aspx
Search Circle (subscription service for Search Managers)
http://www.intranetfocus.com/enterprise-search/thesearchcircle
SharePoint Videos – online trainings: http://www.SharePoint-Videos.com
Code for 30-days free access: SPC12Free
Online webinars and trainings for IA and Search Managers
http://earley.com/Training-Webinars
Editor's Notes
Source: http://financiallyeliteblog.com/wp-content/uploads/2011/04/information-overload.jpg
• No longer within the firewall
• Relevance is critical
• Search within the organization
• “Transparent” Search
• Search Driven Applications
Management by Walking Around
“Join” by…
• Filter
• Refinement
• Display
• Sort/Order
Resource: Configure properties of the Search Box Web Part in SharePoint Server 2013 (http://technet.microsoft.com/en-us/library/gg576963.aspx).
Entity Extraction for other content sources
Search “opens up windows” but not a “security leak”!!
Plan!!
Research on SOURCE SYSTEM, involve the admins there!!
Test
• On Source system
• On Search
Involve:
• Source system key users
• Source system admins
• Test users (<7)
• More test users
The relevant items are to the left of the straight line, while the retrieved items are within the oval. The red regions represent errors. On the left these are the relevant items not retrieved (false negatives), while on the right they are the retrieved items that are not relevant (false positives).
New analytics processing component analyzes content in the search index and user actions that were performed on a site to identify items that users perceive as more relevant than others.
• Number of Views
• Number of Clicks
• Overall item usage
• Recommendation
• Social distance
• …
Jeff