Search in SharePoint 2013

Published in: Software, Technology

  • Three important topics in the architecture: Parsers, Custom Entity Extraction, and Web Service Call-out.
  • This slide shows what we had in SharePoint up to this point; it is a look back at Search in SharePoint 2010. SharePoint 2010 had two different search engines: SharePoint Search and FAST Search Server. SharePoint Search was the easier deployment and focused on enterprise and portal searching. FAST was more of a search platform, meant for large scale and extensibility, and that extensibility and configuration also brought some complexity. Each platform had its pros and cons, and the differences between the two were often misunderstood.
  • In the latest version there is just one thing to worry about. SharePoint 2013 has a single search engine, and the idea was to take the best of each of the 2010 platforms and build on top of that. For example, we have the extensibility and relevancy of the FAST search engine, combined with the ease and familiarity of configuring and managing the SharePoint Search engine.
  • This architecture slide covers some of the things you can do with search and shows how it is an extensible platform. Enterprise Search, People Search, Site Search, and Media Search are all out-of-the-box (OOB) features that people are familiar with. The blue bars show the capabilities that come out of the box with this version. With this release we also started building common patterns into the search platform. One example is Topic Pages and Content by Search. These features greatly enhance the WCM capabilities and allow for search-driven sites in this version of SharePoint. If we are building an Internet site based on a product catalog, users can select products by category or see related products by using Content by Search and topic pages. This is an experience we are all familiar with from public-facing sites. Another area where this pattern might be useful is knowledge management sites. Another pattern now handled by Search is My Tasks in Project. In previous versions it was common to build a search-based application to overcome the problem of tasks being assigned in multiple sites and users not being able to track them easily. The search platform now handles this: My Tasks pulls tasks together and shows them in the My Site. An important part of this platform is that it was built to make Search more easily extensible by customers and partners; we will talk more about extensibility in the later parts of the search module.
  • Before going into the details of extensibility, it is important to understand the base components of Search and how it is architected. Remember that this platform is a combination of the FAST search engine and SharePoint. In 2010 FAST did not really crawl; it allowed content to be pushed into the index (the Content API), while SharePoint crawled. In SharePoint 2013 the Content API is gone, so if you are a FAST person used to leveraging the Content API, it is no longer available. We moved completely to a crawling paradigm as opposed to the content push paradigm that FAST previously had. To make sure the content is up to date, the crawler now runs continuously.
Continuous crawl focuses on a small number of things in order to be efficient. It is for SharePoint content only, so although we may have other sources outside of SharePoint, it does not cover them. It uses the change log to pick up changes. Continuous crawls run in parallel and do not wait for previous threads to complete, so the index updates quickly. Continuous crawls do not retry errors from previous crawls, so if a thread errors out on a piece of content, that content is not retried until an incremental crawl picks it back up. This means we still need incremental crawls. Security changes are included in continuous crawls. You don't really need a dedicated server for continuous crawls, but you could use one; it depends on resource usage.
The Full and Incremental crawls still exist, are still needed, and are used much as they were in 2010. Full is still required under the same conditions as in SharePoint 2010. Incremental is required for security changes, and Continuous is for changes noticeable by end users. So picture continuous crawls running all the time, incremental crawls running periodically throughout the day, and a full crawl running maybe once a week. Those three types of crawls keep the index updated and healthy.
We still have the concept of Managed Properties. Much like previous versions, crawled properties get created by the crawler, and we then create managed properties so that we can run Keyword Query Language (KQL) searches against the index. Managed properties are administered at the site collection level. In O365 you can do it at the tenant admin level.
Let's walk through the different components of the Search architecture. On the left are the different content sources. In any search architecture two things need to happen. First, we need to connect to the content source and find the documents inside. Then search needs to get those documents and crawl the contents of each document to build up the index. Connecting is done through .NET assembly connectors, and working through the document content is done through parsers. In previous versions of SharePoint we used the Connector Framework for Search. The parsers are new to the 2013 search architecture, but the concept is not: the parser fills the role of the iFilter from previous versions of SharePoint. The iFilters have not gone away completely, and later parts of this section talk about the parsers in more detail.
The next item in the architecture diagram is the Content Pipeline. This is really just the processing of what the crawler finds, to pass it along for building up the index. The CTS Runtime is called out here; it does the content processing, but it is not extensible, with one exception: the Web Service call-out. The Web Service call-out is a synchronous call to a web service for additional content processing. We will talk more about it later, but basically it allows the pipeline to send managed properties from an item to a web service, which can then modify the content of those properties and send them back into the pipeline. This gives you a way to perform specific tasks on items and modify their properties before they get added to the index.
The next stage is the Analyzer and the Indexing Engine. The Analyzer processes user behavior (click analysis) and supports things like recommendations based on behavior; there is an extensibility story for that. The Indexing Engine indexes content.
We now have an index, and on the right side of the diagram are the components used for querying it. The Query Engine executes queries. The Query Pipeline is the functionality for processing queries; the IMS Runtime is the part that processes queries, but it has no real extensibility story. Then there is the REST Service, which executes queries through REST. In this diagram, a custom non-SharePoint search application or a Windows Phone app might be calling the REST Service. You can also see the Client Framework, where we can execute search queries through CSOM, much like we access the items of a list through CSOM.
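As a rough illustration of the REST path on the query side, the sketch below builds a GET URL for the Search REST endpoint (`/_api/search/query`) with the `querytext`, `selectproperties`, and `rowlimit` parameters. The site URL, the helper name `build_search_url`, and the quote-doubling step are assumptions made for this example; authentication and the actual HTTP call are left out.

```python
from urllib.parse import quote


def build_search_url(site_url, query_text, select_properties=(), row_limit=10):
    """Build a GET URL for the SharePoint Search REST endpoint."""
    # String values are wrapped in single quotes in the query string.
    # Doubling embedded quotes follows the OData convention (an assumption here).
    escaped = quote(query_text.replace("'", "''"), safe="")
    params = ["querytext='" + escaped + "'"]
    if select_properties:
        # Comma-separated managed property names to return for each result.
        props = quote(",".join(select_properties), safe=",")
        params.append("selectproperties='" + props + "'")
    params.append("rowlimit=" + str(int(row_limit)))
    return site_url.rstrip("/") + "/_api/search/query?" + "&".join(params)


if __name__ == "__main__":
    # intranet.example.com is a placeholder host, not a real farm.
    print(build_search_url("https://intranet.example.com", "sharepoint search",
                           ["Title", "Path"], row_limit=5))
```

A custom non-SharePoint application would send this URL with its own credentials and ask for a JSON response; the same query could be issued through CSOM instead.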
  • There are lots of other new enhancements designed to improve and manage search. One we want to point out is the ability to export and import search settings: this version of Search provides a CSOM API for it. This capability is used for things like moving search settings from DEV -> UAT -> PROD. In previous versions we had to write a lot of PowerShell scripts to recreate managed properties and scopes across the different environments. In this version the API should help with these migration scenarios. While it handles rules, sources, and managed properties, it does not handle master pages, templates, or web parts.
  • Result sources replicate what scopes did; in SharePoint 2013 we don't use scopes anymore, we use result sources instead. Result sources let us focus the results our users see, and in doing so we create slices, subsections of the index, based on rules. Unlike SharePoint 2010, where this was entirely property based, we can create rules based on analytics. For instance, one of the result sources is popular content. We can execute a query against the index to return the results we want to see in the result source; this is very capable in SharePoint 2013. Finally, we can perform analysis using these result sources. As mentioned, you can do popular content, or content created in the last five days, based either on the search index or on analysis from the analysis system.
  • Demo: create a new result source named Your Documents that lists the documents authored by you. Use the query below:
{searchTerms} Author={User.Name} IsDocument=1
Create a page called YourDocuments. Then go to the site's Search settings, add the new search vertical, and point it to the newly created page.
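To see what the result source query turns into at query time, here is a small sketch that expands the `{searchTerms}` and `{User.Name}` tokens by plain string substitution. This only imitates the idea; the function name, the sample user, and the choice to quote values containing spaces are assumptions of the sketch, not a description of how SharePoint resolves query variables internally.

```python
def expand_query_template(template, search_terms, variables):
    """Replace {searchTerms} and other {Name} tokens in a query template."""
    expanded = template.replace("{searchTerms}", search_terms)
    for name, value in variables.items():
        # Quote multi-word values so they stay together as one phrase.
        if any(ch.isspace() for ch in value):
            value = '"' + value + '"'
        expanded = expanded.replace("{" + name + "}", value)
    # Collapse the extra whitespace left behind by an empty {searchTerms}.
    return " ".join(expanded.split())


if __name__ == "__main__":
    template = "{searchTerms} Author={User.Name} IsDocument=1"
    # "Jane Doe" is a made-up user for the example.
    print(expand_query_template(template, "budget", {"User.Name": "Jane Doe"}))
```

With an empty search box the same template still returns every document authored by the current user, which is what the YourDocuments vertical relies on.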
  • To be familiar with the Search system, you need to know how managed properties and crawled properties work together in this version. There are some interesting new features in 2013. Fundamentally, SharePoint discovers properties during the crawl: column values in SharePoint lists and libraries, document properties with values, and BCS columns that result from a BCS data crawl. These properties are categorized based on where they come from, and then crawled properties are created. A SharePoint list column becomes ows_ColumnName. SharePoint site columns are different: those with data are named ows_q_<four-letter code>_ColumnName. The other sources follow similar naming patterns.
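The crawled property naming patterns listed in this deck can be written down as a small lookup. The sketch below is only a mnemonic: the category keys, the function name, and the sample values (such as the `TEXT` type code and the `Product` entity) are placeholders chosen for the example.

```python
# Naming patterns for crawled properties, keyed by where the property comes from.
PATTERNS = {
    "list_column": "ows_{name}",
    "site_column": "ows_q_{code}_{name}",
    "managed_metadata": "ows_taxId_{name}",
    "rich_text": "ows_r_{code}_{name}",
    "profile_property": "People:{name}",
    "bcs_property": "{entity}.{name}",
}


def crawled_property_name(kind, name, code="", entity=""):
    """Return the crawled property name for a column or property."""
    pattern = PATTERNS[kind]
    # Some patterns need a four-letter type code or a BCS entity name.
    if "{code}" in pattern and not code:
        raise ValueError("a four-letter type code is required for " + kind)
    if "{entity}" in pattern and not entity:
        raise ValueError("an entity name is required for " + kind)
    return pattern.format(name=name, code=code, entity=entity)


if __name__ == "__main__":
    print(crawled_property_name("list_column", "ProjectCode"))
    print(crawled_property_name("site_column", "ProjectCode", code="TEXT"))
    print(crawled_property_name("bcs_property", "Price", entity="Product"))
```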
  • Managed properties are special: they are created from crawled properties. If you want to use a property in search, for things like refinement and display, create a managed property for it. Field names are not the same as managed property names, and profile property names are not the same as managed property names.
  • Managed properties are no longer only for admins to create, which is yet another improvement in SharePoint 2013. We can create them at the farm level, the site collection level, and even the site level. Site administrators and site collection administrators can also take some of the predefined managed properties and use them for their site. This is possible in Office 365 too. When working with a managed property, we have several property controls available. For instance, you decide the data type of the managed property. There is a multivalue control, which allows the managed property to store more than one value, and you decide whether the managed property is safe to use and whether it can be searched, queried, or sorted. Finally, SharePoint 2013 automatically creates a managed property for every site column, named in the format <SiteColumnName>OWS<DATATYPE>.
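The automatic naming format `<SiteColumnName>OWS<DATATYPE>` is easy to compose and to take apart again. A minimal sketch, assuming the site column name is already an internal name without spaces; both helper names are made up for this example.

```python
def auto_managed_property_name(site_column_name, data_type):
    """Compose the automatic managed property name for a site column."""
    return site_column_name + "OWS" + data_type.upper()


def split_auto_managed_property_name(managed_property_name):
    """Split an automatic name back into (site column name, data type)."""
    # Split on the last "OWS" so column names containing "OWS" still work.
    column, separator, data_type = managed_property_name.rpartition("OWS")
    if not separator or not column or not data_type:
        raise ValueError("not an automatic managed property name")
    return column, data_type


if __name__ == "__main__":
    name = auto_managed_property_name("ProjectCode", "text")
    print(name)
    print(split_auto_managed_property_name(name))
```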
  • Demo: create a new managed property and map it to a crawled property. Show how to create one from the site collection, and show how to map a date property to one of the existing date managed properties. Also show the RefinableDate properties, which already exist, and show how to map one of them to the crawled property.
  • Query rules are a very powerful part of the whole search experience. If you are thinking in 2010 terms, query rules essentially replace the Best Bets technology, but they do more than that; just saying they replace Best Bets really undersells them. When we create what are called promoted results, we are creating a better Best Bet than we had in SharePoint 2010. These results are always ranked above the rest of the results, so they always show up at the top of the search results page. We can also use query rules to present a result block. A result block is the result of a new query that we show inline with the ranked results; we can pin it to the top or let it float with the relevance of the rest of the content on the page. We can also act on the user's intent. Someone looking for a PowerPoint slide deck about SharePoint might search for "sharepoint deck" without knowing that this can return only PowerPoint files. SharePoint has built-in rules so that a search for "deck" shows the PowerPoint files that contain the word "sharepoint". Likewise, "doc" is understood as document. We can also use query rules to change the ranking of the results. Finally, query rules have a start date and a stop date for publishing, as well as a review date and a contact, so that a rule can be reviewed before it expires.
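The "act on intent" idea can be pictured as a rewrite of the user's query: an action term such as "deck" is removed and replaced by a file-type restriction. The sketch below is a simplified stand-in for that behavior, not the built-in rule itself; the term table, the function name, and the use of a `FileExtension` restriction in the rewritten query are assumptions of the example.

```python
# Action terms and the file extensions they stand for (illustrative values).
ACTION_TERMS = {
    "deck": ["ppt", "pptx"],
    "doc": ["doc", "docx"],
}


def apply_intent_rule(query):
    """Rewrite a query when it contains an action term such as 'deck'."""
    words = query.split()
    for index, word in enumerate(words):
        extensions = ACTION_TERMS.get(word.lower())
        if extensions:
            # Everything except the action term is the subject of the search.
            subject = words[:index] + words[index + 1:]
            restriction = " OR ".join("FileExtension:" + ext for ext in extensions)
            return " ".join(subject + ["(" + restriction + ")"])
    # No action term found: the rule does not fire.
    return query


if __name__ == "__main__":
    print(apply_intent_rule("sharepoint deck"))
    print(apply_intent_rule("annual report"))
```

In the product the rewritten query would typically feed a result block rather than replace the main results, so the user still sees the regular ranked list.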
  • Demo: SharePoint experts. Create a query rule and give it the following query: Skills:SharePoint. Set it to be displayed in the block view. The next demo is for a promoted result: cook up a page and make it show at the top of the results when people search for SharePoint.
  • SharePoint 2013 uses display templates to render search results. Display templates are essentially HTML and JavaScript; there is no more XSL in the context of search or in the search web parts. The display templates are rendered by the search visualization system. With display templates and their hover cards we now work with one result at a time, and data is passed between the two, so we can work with discrete results rather than with entire result sets as we had to in SharePoint 2010. The other good thing is that refiners use them too, so the whole system is common to all search results rendering. Result types are how SharePoint knows which display template to render. You can apply a result type to all sources or to a single result source, and then apply it based on a property condition. All of this is going to make a lot more sense in a demo.
  • Demo: adding Job Title so that it is shown when search results are displayed.
  • Refiners in SharePoint search allow your end users to further filter the results of their search based on the metadata contained within the results. In SharePoint 2013 the good news is that refinement also uses display templates, so if you know how to create a display template, you can pretty much create refiners on your own too. You may also see them referred to as filters, since they actually filter and are kept in the Filters folder in the master page catalog. Out of the box you get Refinement Item, Multi-value Refinement Item, Slider, and Slider with bar graph. The first two are for textual values and the last two for numeric values.
  • The indexer needs a way to access a system and parse the contents it finds there. We use connectors for accessing systems. Parsing the contents is handled by “Parsers” and “Format Handlers”. Parsers detect the document format (they do not rely on the file extension) and then call the appropriate format handler to parse the document. In previous versions we used iFilters, but the combination of parsers and format handlers is more sophisticated at determining the documents in a repository and parsing their content into the index appropriately.
-----
Whenever we access a content source and crawl it, we need two things: the ability to access the content source, and the ability to crawl the items inside it. As mentioned, we use .NET assembly connectors to crawl inside an external content source and parsers to crawl the individual items found there. The parser is more sophisticated than the iFilter and involves two concepts, the parser and the format handler. The parser detects the document format without relying on the file extension, then calls the appropriate format handler to do the parsing. The OOB parser can detect more document formats than the format handlers support, so iFilters are not gone completely: wherever the parser does not find an appropriate format handler, it falls back to calling an iFilter to do the parsing. Deep link extraction identifies relevant headers in documents and displays each header and its corresponding content as a preview when we hover over a search result link. Visual metadata extraction pulls titles, authors, and dates from the document instead of relying on metadata properties.
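The detect-then-dispatch flow described above can be sketched in a few lines. This is a toy model only: the byte signatures, the handler tables, and the choice to pick the fallback iFilter by file extension are assumptions made for illustration, and the real parser engine is far richer.

```python
# Leading byte signatures used to detect a format from content, not extension.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
]

# Formats that have a format handler in this toy model.
FORMAT_HANDLERS = {
    "pdf": lambda data: "format handler: pdf",
}

# Legacy iFilters, selected by file extension (an assumption of this sketch).
IFILTERS = {
    ".zip": lambda data: "ifilter: .zip",
}


def detect_format(data):
    """Return the detected format name, or None if nothing matches."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def parse_document(data, extension):
    """Use a format handler when one exists, otherwise fall back to an iFilter."""
    handler = FORMAT_HANDLERS.get(detect_format(data))
    if handler is not None:
        return handler(data)
    ifilter = IFILTERS.get(extension.lower())
    if ifilter is not None:
        return ifilter(data)
    raise ValueError("no format handler or iFilter for this document")


if __name__ == "__main__":
    # A PDF saved with the wrong extension is still routed to the PDF handler.
    print(parse_document(b"%PDF-1.4 sample", ".txt"))
    print(parse_document(b"PK\x03\x04 sample", ".zip"))
```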
  • Custom Entity Extraction allows you to plug your own dictionary into the system. The dictionary is a simple text file: the first line is the title, followed by a list of terms. You can add a comma after a term to normalize the term. Import the dictionary with the PowerShell cmdlet. You use this capability when you want to refine by something that doesn't have a managed property defined. For example, say you have a corpus of medical documents and you want to refine by the commercial names of various medicines. No metadata property was set for the items, so you want to extract that information from the documents themselves. You could then define a dictionary like this:
MedicalDictionary
Advil
Lipitor
Ambien
A refiner would get created for this extraction. There are 12 custom extractor “slots” you can use: Word Matching and Substring Matching slots, in case-insensitive and case-sensitive variants, that you can put the dictionary into.
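To make the dictionary idea concrete, the sketch below loads a dictionary in the layout described above and applies case-insensitive whole-word matching to a piece of text. It assumes the text after the comma is the normalized form of the term, handles single-word terms only, and uses made-up function names; it imitates the behavior of one extractor slot rather than the product's implementation.

```python
import re


def load_dictionary(text):
    """Parse a dictionary: first line is the title, then one term per line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0]
    terms = {}
    for line in lines[1:]:
        # Assumed layout: "term" or "term,normalized form".
        term, _, normalized = line.partition(",")
        terms[term.strip().lower()] = normalized.strip() or term.strip()
    return title, terms


def extract_entities(document_text, terms):
    """Return the normalized entities found by case-insensitive word matching."""
    words = re.findall(r"\w+", document_text.lower())
    return sorted({terms[word] for word in words if word in terms})


if __name__ == "__main__":
    title, terms = load_dictionary("MedicalDictionary\nAdvil\nLipitor\nAmbien\n")
    print(title)
    print(extract_entities("Patient took advil and Lipitor daily.", terms))
```

The extracted values are what a refiner built on the corresponding slot would offer to the user.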
  • The web service call-out allows you to modify managed property values or add new properties. Example: you can add information, such as a rating from a web service, that is not normally part of the metadata. The Content Pipeline is the one place where you have access to all of the searched items before the index gets created. Typical uses:
      • Data cleansing: normalize data, for example turning «MSFT» or «Microsoft» into «Microsoft Corporation», by changing the value of a managed property.
      • Entity extraction: add managed properties to a document that did not exist before, based on values in the body of the document.
      • Classification and tagging: add managed properties to a document that did not exist before, based on classification rules (for example, this looks like a quarterly report, so we may add additional managed properties for quarterly reports).
The web service client is configured with:
      • SOAP RPC endpoint: implements a well-defined interface, with optional SSL transport security.
      • Trigger condition: checks the existence or values of managed properties before doing the call-out (allows for rules that determine when the call-out happens). We don't want the web service called for all items going through the pipeline.
      • Input managed properties: the set of managed properties to send to the web service; it can include read-only managed properties.
      • Output managed properties: the set of managed properties returned from the web service; it cannot include read-only properties.
      • Failure mode: if the web service generates an error, either log a warning and index the document, or fail the document and return an error code to the crawler.
There are size limits for each property returned from the web service, plus a total size limit for the message.
In short, the call-out allows you to modify a managed property, and also add or delete one, before the crawling is completed. We can use it for data cleansing. For example, people tag documents with company information, and some write MS for Microsoft while others write Microsoft Corporation. Using the web service call-out mechanism, we can normalize this data to Microsoft. We can also add new managed properties before the crawling is completed.
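The call-out configuration (trigger condition, input and output managed properties, failure mode) can be pictured with an in-process stand-in for the web service. Everything named here is hypothetical: the `Company` property, the alias table, and the function names exist only for this sketch, and the real call-out is a SOAP call made by the content processing component.

```python
import logging

# Aliases a data-cleansing service might normalize (illustrative values).
ALIASES = {
    "ms": "Microsoft Corporation",
    "msft": "Microsoft Corporation",
    "microsoft": "Microsoft Corporation",
}


def trigger(item):
    """Trigger condition: only call out when the Company property has a value."""
    return bool(item.get("Company"))


def cleansing_service(properties):
    """Stand-in for the web service: normalize the Company value."""
    company = properties["Company"]
    return {"Company": ALIASES.get(company.strip().lower(), company)}


def process_item(item, service, input_names=("Company",),
                 output_names=("Company",), fail_on_error=False):
    """Run one item through the call-out step of the pipeline."""
    if not trigger(item):
        return item
    # Only the configured input properties are sent to the service.
    request = {name: item[name] for name in input_names if name in item}
    try:
        response = service(request)
    except Exception:
        if fail_on_error:
            raise  # failure mode "fail": the document is not indexed
        logging.warning("call-out failed; indexing item unchanged")
        return item  # failure mode "warn": index the document as it is
    updated = dict(item)
    # Only the configured output properties are accepted back.
    updated.update({k: v for k, v in response.items() if k in output_names})
    return updated


if __name__ == "__main__":
    print(process_item({"Title": "Q1 report", "Company": "MSFT"}, cleansing_service))
```

Keeping the trigger cheap matters because the call-out is synchronous: every item that passes the trigger waits for the service before it can be indexed.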
  • Search

    1. Search SharePoint 2013. Gayathri Narayanan, Senior SharePoint Consultant, NCS (P) Ltd, http://gai3kannan.wordpress.com (SharePoint Day @NCS)
    2. Agenda: Search Overview; Search Architecture and APIs; Search Verticals and Results Presentation; Parsers; Custom Entity Extraction; Web Service Call-out
    3. Search Overview: Search in SharePoint 2010
    4. Search Overview: Search in SharePoint 2013. Single Extensible Platform • FAST Engine • SharePoint Crawler • Best of both! Same Search Platform in both SharePoint and Exchange
    5. Search Architecture (diagram): EXTENSIBLE Search Platform. Capabilities: Enterprise Search, People Search, Site Search, Video Search, Topic Pages, Content by Search, My Tasks. Extensible by customers and partners
    6. Search Architecture (diagram): Crawl and Connectors; Content Pipeline (CTS Runtime); Analyzer; Indexing Engine; Query Engine; Query Pipeline (IMS Runtime); REST Service; Client-Side OM; Client Framework. Consumers: Enterprise Search Portal, SharePoint Sites and Portals, SharePoint topic and content pages, Custom non-SP Search Driven Apps
    7. Search Architecture and API: CSOM API • Allows targeting export and import of search settings • Handles rules, sources, managed properties, etc. • Does not handle master pages, templates, and web parts • Supports migrations, DEV->UAT->PROD scenarios
    8. Search Verticals and Results Presentation: Result Sources; Managed Properties; Query Rules; Display Templates; Result Types; Refiners
    9. Search Results Presentation – Result Sources: Replicate Scopes; Focus Results; Execute a Query; Perform Analysis
    10. Result Sources Demo
    11. Search Results Presentation – Managed Properties (Crawl Properties): During the crawl SharePoint discovers • Columns with values in SharePoint Lists and Libraries • Document Properties with Values • BCS Columns with values. Columns and properties are categorized. Crawled Properties are created • SharePoint List Columns: ows_ColumnName • SharePoint Site Columns: ows_q_<4 letter code>_ColumnName • Managed Metadata: ows_taxId_ColumnName • HTML or Multiline Text: ows_r_<four letter code>_ColumnName • Profile Properties: People:InternalName • BCS Properties: Entity.FieldName
    12. Search Results Presentation – Managed Properties: Managed Properties are created from crawled properties; Create a Managed Property to use it in Search; Field Names <> Managed Property Names; Profile Property Names <> Managed Property Names; There MAY be a 1:Many relationship of Managed to Crawled Props
    13. Search Results Presentation – Managed Properties: Not only for Administrators anymore • Farm • Site Collection • Site. Managed Property Controls • Type • Multivalue • Query, Search, Retrieve, Refine, Sort, Safe. Automatically created for Site Columns • Standard Name Format <SiteColumnName>OWS<DATATYPE>
    14. Search Results Presentation – Managed Properties: Automatic Managed Properties
    15. Managed Properties Demo
    16. Search Results Presentation – Query Rules: Promoted Results • Better Best Bets • Always above ranked results. Result Blocks • Execute a new query • Pin to top or float with relevance. Act on user “intent” • Deck = PowerPoint • Doc = Document. Change Ranked Results. Publishing • Start and Stop Date • Review Date and Contact
    17. Query Rules Demo
    18. Search Results Presentation – Display Templates & Result Types: Display Templates: HTML and JavaScript; No more XSL; One result at a time + hovercards; Refiners use them too. Result Types: Applied to all sources or Single Result Source; Applied on Property Condition; Link from Result to Display Template
    19. Display Template Demo
    20. Search Results Presentation – Refiners: Refinement in 2013 uses Display Templates; You may see them referred to as Filters; Out of the box • Refinement Item • Multi-value Refinement Item • Slider • Slider with bar graph
    21. Refiners Demo
    22. Parsers: Parser Engine detects document format • Many formats supported OOB • Calls appropriate Format Handler. Format Handlers perform actual parsing • Custom Format Handlers implement IFormatHandlerItem
    23. Custom Entity Extraction: Activate refiners based on custom dictionaries; Enabled using PowerShell Cmdlets; 12 custom extractor “slots” you can use:
    24. Web Service Callout: Transform managed properties using a custom web service • Data cleansing • Entity extraction • Classification and tagging. SOAP web service • Reads a set of managed properties and returns new or modified properties. Content Processing web service client • Trigger condition controls when to do (synchronous) call-out • New or modified managed property values get indexed • Configurable error handling: Warn or Fail document
    25. …to wrap up! Search Overview; Search Architecture and APIs; Search Verticals and Results Presentation; Parsers; Custom Entity Extraction; Web Service Call-out