Solving Real World Challenges with Enterprise Search




  1. SharePoint Intersection, Session SP40: Solving Real World Challenges with Enterprise Search. Agnes Molnar, International Consultant, ECM & Search Expert
  2. Introduction – Agnes Molnar
     International SharePoint Consultant
       • 10+ Years SharePoint Experience
       • Information Architecture & ECM
       • Search
     SharePoint Server MVP
       • 6 Years SharePoint Server MVP
       • 5+ Years Speaking at Conferences Around the World
       • Numerous Books, White Papers, Articles
     Contact
       • E-mail:
       • Blog:
       • Twitter: @molnaragnes
     © DEVintersection. All rights reserved.
  3. Agenda
  4. Information Overload OR Filter Failure? Source:
  5. Enterprise Search: search technology that your organization owns and controls
  6. Search is Easy… Find is the real challenge!
  7. Search as an Application. Source:
  8. Search as an Application
       • Search is no longer the white box
       • Content lives in disparate locations
       • Structured and unstructured content live in different locations
       • Need to aggregate content according to:
           • Process
           • Context
           • Customer
           • Goal
           • Program
           • Parameter of any of the above
  9. User – Context – Content
       • Context [where information is used]: business models & goals, corporate culture, resources
       • Content [how to describe the information]: document types, objects, structure, attributes, meta-information
       • Users [how to use the information]: information needs, audience types, expertise, tasks
  10. Requirements Gathering
       • Types of Content
       • Types of Users
       • Users’ Behavior
       • Content Sources
       • Metadata
       • Actions to Take
       • Amount of Content
       • Current “Pain Points”
  11. Search is more than Technology. Source:
  12. The Complexity of Enterprise Information
     What we give to the search engine…
     What the search engine sees…
       | Title | Author | Created Date | Modified Date | File Type | … |
       | Overview of SharePoint 2013 Preview Installation and Configuration | Alex Yarrow | 06/21/2012 | 10/16/2012 | docx | … |
  13. Explicit metadata versus implicit metadata
     Explicit metadata:
       • Content Type = License
       • Organization = ABC Company, DEF Company
       • Topic =
     Index types:
       • Forward Index – words per document
       • Inverted Index – documents per word
     Sample text (“Support”): ABC shall provide first level technical support to all Licensed Product end users and/or Sublicensed Product customers/users. DEF will provide second level support. DEF shall provide to ABC a primary and a secondary support person to act as the primary interface with ABC’s technical and customer support team. DEF shall provide direct technical support to ABC for all uses of the DEF Software. Support level definitions and responsibilities are set forth in Exhibit C. An “SLA Failure” as defined in Exhibit C shall qualify as a Release Condition sufficient to authorize the Escrow Agent to release the Source Code to ABC pursuant to Section 7 and the Escrow Agreement.
     Implicit metadata (terms extracted from the text): ABC, customers, customer support, customer support team, DEF, DEF software, end users, escrow agreement, escrow agent, exhibit c, licensed product, release condition, section 7, secondary support, SLA, SLA failure, software, source code, support level, sublicensed product, technical support
  14. The Complexity of Search [diagram; component labels: Content Source, Data Source, metadata, Indexing, Local Search Index, Remote Search Index, Federation, Result Source, Query Rule, Result Block, Result Set, Display Templates, Refinement Panel, Hover Panel]
  15. Requirements Gathering: Information-Seeking Patterns
       • “I know what I’m searching for and know how to do that”
       • “I know what I’m searching for but I don’t know how to do that”
       • “I don’t know what I’m searching for”
       • “Am I Searching?...”
  16. Real World Expectations
  17. Content Inventory
       • “I have a lot of content, but I don’t know what to do with it…”
  18. Content Inventory
       • SharePoint content (2013, 2010, …)
           • Intranet
           • Department sites
           • Project sites
           • Internal KB
       • File shares
           • Sales repository (RFPs, proposals, etc.)
           • Marketing documents (DMs, brochures, etc.)
       • Web sites
           • Company public web site
           • Professional Know-How Web Sites (finance, IT, development, etc.)
           • Common interest (stock, management, etc.)
       • Exchange Public Folders
           • Internal communication
       • Business Data
           • Data from databases
       • Custom connector
           • SAP data
           • CRM data
  19. Search Federation
  20. Crawl or Federate? – Where to get the content from?
     Crawl + Use Local Index:
       • Examples:
           • Intranet
           • Company file shares
       • Pros:
           • Full control over the index (crawl schedule, metadata included, etc.) and ranking model
           • Results can be aggregated into one result set
           • Common refiners (facets)
       • Cons:
           • Needs resources for the crawling process
           • Needs storage to store the index
     Federate:
       • Examples:
           • Professional know-how web sites (TechNet, MSDN, etc.)
           • Internet results for a specific topic (financial news, stock information, etc.)
           • 3rd party Content Management System
       • Pros:
           • Doesn’t need resources to crawl / store the index
       • Cons:
           • Live Internet connection is required
           • No control over the index
           • No control over the ranking model
           • No real aggregation with other result sources
  21. Content Source Inventory
       | Name | Type | Location | Owner | Volume of Content | Frequency of Updates |
       | Intranet | SharePoint | http://intranet | Intranet Team | 200K items | 100-300/hr |
       | Project Sites | SharePoint | http://projects | Delivery | 200K items | 100-200/hr |
       | Sales share | File share | X:Sales | Sales | 500K docs | 300-500/hr |
       | Marketing share | File share | X:Marketing | Marketing | 200K docs | 300-500/hr |
       | Company web site | Web site | | Marketing/Publishing Team | <100K pages | 1-10/day |
       | Competitor’s web site | Web site | | [external] | <100K pages | 1-10/day |
       | Professional Know-How | Web site | | [external] | <100K pages | 5-10/week |
       | Company Announcements | Exchange Public Folder | Exchange/Public Folders/Announcements | Marketing/Internal Comm. Team | <100K items | 5-10/day |
       | HR data | Business Data (SQL) | SQL database | HR | <100K items | 10-100/day |
       | CRM data | Custom Connector | CRM system | Sales | 500K entries | 500-1000/hr |
  22. Metadata in Search
       • The “glue” of Search Applications
       • Crawled property: metadata extracted from the documents/items during the crawl.
       • Managed property: mapped to crawled properties, controlled by Search Admins, helping users perform more efficient and successful queries:
           • Refiners
           • Displayed in Search Results
           • Sorting Properties
  23. Metadata in Search [diagram]
       • Crawled Property: Author, CreatedBy, From
       • Managed Property: Author
       • Usage: Refiner, Display on Result Set, Display on Hover Panel, Sorting by
  24. Using Managed Properties
       • In Query Rules
       • Refinement
       • Result Type & Display Template
       • On Hover Panel
  25. Security: “Users can see what they have access to.” vs. “Users cannot see what they don’t have access to.”
  26. The Search Security Paradox: As Search is deployed further and further into the Enterprise, the likelihood of having a security problem increases.
  27. Sizing and Capacity Planning
       • “Sounds good, but I’m not sure if we have resources for this…”
  28. Scaling Factors
       • Content characteristics
       • Search features
       • Document freshness
       • Query performance
       • High availability
  29. Components – Scaling cheat sheet
     Resources rated per component: CPU, Network, Disk, Memory
     Components: Search administration, Crawling, Content processing (CPC), Analytics processing (APC), Index, Query processing (QPC)
     [The per-component rating icons are not preserved in this transcript.]
  30. Sorting the Results – Relevance Ranking
       • Requirements: “I’d like to see ALL the relevant results.” vs. “I don’t want to see anything that is not relevant (to me, in this context).”
  31. User Experience
       • Recall: the fraction of relevant instances that are retrieved
       • Precision: the fraction of retrieved instances that are relevant
       • Relevance: how well a retrieved document or set of documents meets the information need of the current user, in the current context
       • Ranking: the order in which the search results for a query appear
  32. Sorting the Results – Relevance Ranking
       • Various elements can be monitored, interpreted or used in the calculation of ranking
       • These can be tuned and weighted in different ways to impact results
       | Element | Description |
       | Freshness | Age of a document compared to the time when the query is issued |
       | Authority | Importance of a document determined by the links to it from other documents |
       | Quality | Assigned importance of a document, independent of the query |
       | Geo | Importance of geographical distance between a document’s associated latitude/longitude and a target location specified in a query |
       | Context | Importance of matching a query in a given document field |
       | Proximity | For multi-term queries: the shorter the distance between query terms in a document, the higher the document’s rank value |
       | Position | The earlier a query term occurs in a field, the higher the document’s rank value |
       | Frequency | The more frequently a query term occurs in a document, the higher the document’s rank value |
       | Completeness | The greater the number of query terms present in the same field of a matching document, the higher the document’s rank value |
       | Number | For multi-term queries: the more query terms matched in a document, the higher the document’s rank value |
     Reference: Okapi BM25
  33. Search Analytics: “How to Improve the Search Experience?”
  34. Search Analytics in SharePoint 2013
       • Usage Events – As users interact with content in SharePoint, actions are captured and stored as events (click a link, press a button, view or open a document).
       • Access and create experiences using data captured in the analytics database.
  35. Search Analytics – Examples
  36. Search Analytics – Examples
  37. Conclusions
  38. Want to Learn More?
       • SP41 How to Manage and Troubleshoot Search – A Practical Guide
       • POSTCON03 Architecting the Optimal Enterprise Search Strategy
       • Blog:
       • The Essential Guide to Enterprise Search in SharePoint 2013 (free e-book)
       • Search Circle (subscription service for Search Managers)
       • SharePoint Videos – online trainings. Code for 30-days free access: SPC12Free
       • Online webinars and trainings for IA and Search Managers
  39. Questions? Don’t forget to enter your evaluation of this session using EventBoard! Thank you!
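
Slides 13 and 32 name the forward index, the inverted index and Okapi BM25 without showing how they relate. The following is a minimal Python sketch under stated assumptions, not SharePoint's ranking model: the toy corpus, the function names (`tokenize`, `build_indexes`, `bm25_scores`) and the parameter defaults `k1=1.2`, `b=0.75` are all illustrative choices, and the scoring is the common textbook form of BM25.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; ids and texts are illustrative only.
DOCS = {
    "license": "ABC shall provide first level technical support to end users",
    "sla": "support level definitions and SLA failure are set forth in exhibit c",
    "escrow": "escrow agent releases source code to ABC under the escrow agreement",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_indexes(docs):
    """Return (forward, inverted).

    forward:  document id -> term counts   (words per document)
    inverted: term -> set of document ids  (documents per word)
    """
    forward = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    inverted = defaultdict(set)
    for doc_id, counts in forward.items():
        for term in counts:
            inverted[term].add(doc_id)
    return forward, dict(inverted)


def bm25_scores(query, forward, inverted, k1=1.2, b=0.75):
    """Rank documents for a query with a textbook BM25 formula."""
    n_docs = len(forward)
    lengths = {doc_id: sum(counts.values()) for doc_id, counts in forward.items()}
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = inverted.get(term, set())
        if not postings:
            continue  # term occurs in no document
        # Rarer terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id in postings:
            tf = forward[doc_id][term]
            # Term frequency saturates (k1) and is length-normalized (b).
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


forward, inverted = build_indexes(DOCS)
print(sorted(inverted["support"]))
print(bm25_scores("escrow agreement", forward, inverted))
```

The inverted index answers "which documents contain this term", while the forward index supplies the per-document term counts and lengths that the Frequency element on slide 32 relies on.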

Editor's Notes

  • Source:
  • No longer within the firewall; Relevance is critical; Search within the organization; “Transparent” Search; Search Driven Applications
  • Management by Walking Around
  • “Join” by… Filter, Refinement, Display, Sort/Order
  • Resource: Configure properties of the Search Box Web Part in SharePoint Server 2013. Extraction for other content sources
  • Search “opens up windows” but is not a “security leak”! Plan! Research the SOURCE SYSTEM and involve the admins there! Test: on the source system; on Search. Involve: source system key users; source system admins; test users (<7); more test users
  • The relevant items are to the left of the straight line while the retrieved items are within the oval. The red regions represent errors. On the left these are the relevant items not retrieved (false negatives), while on the right they are the retrieved items that are not relevant (false positives).
  • The new analytics processing component analyzes content in the search index and user actions that were performed on a site to identify items that users perceive as more relevant than others. Number of Views; Number of Clicks; Overall item usage; Recommendation; Social distance…
  • Jeff
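
Slide 31 defines recall and precision, and the editor's note on that slide describes the two error regions (false negatives and false positives). A minimal Python sketch of those definitions follows; the function name `precision_recall` and the sample document ids are hypothetical and not taken from the slides.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: ids returned by the search engine
    relevant:  ids a user would judge relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    false_positives = retrieved - relevant   # retrieved but not relevant
    false_negatives = relevant - retrieved   # relevant but not retrieved
    # Guard against empty sets to avoid division by zero.
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall, false_positives, false_negatives


# Hypothetical result set: 4 items retrieved, 3 items actually relevant.
precision, recall, fp, fn = precision_recall(
    retrieved=["doc1", "doc2", "doc3", "doc4"],
    relevant=["doc2", "doc4", "doc5"],
)
print(precision, recall, sorted(fp), sorted(fn))
```

In this example two of the four retrieved items are relevant (precision 0.5) and two of the three relevant items were found (recall 2/3), which mirrors the tension on slide 30 between seeing all relevant results and seeing only relevant ones.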