Planning SharePoint 2013 Search for IT PROs


Similar content over at
Twitter - @benjaminathawes

Published in: Technology

  • In SharePoint Server 2010, host distribution rules are used to associate a host with a specific crawl database. Because of changes in the search system architecture, SharePoint Server 2013 does not use host distribution rules. Instead, Search service application administrators can determine whether the crawl database should be rebalanced by monitoring the Databases view in the crawl log. In a result source, you can also restrict queries to a subset of content by using a query transform. For example, the pre-defined "Local Video Results" result source uses a query transform to return only video results from the local SharePoint index. In SharePoint Server 2010, you configured this kind of query restriction by using search scopes.
  • The event store provides information on front-end events to the analytics processing component. For example, it records the number of times an item is viewed, which is used to improve relevancy.
  • We do support 3rd party iFilters, but a format handler is not the same as an iFilter. Nugget: you cannot replace the new PDF format handler in SP2013 with a 3rd party iFilter (Adobe, Foxit, etc.), so if SP2013 PDF search isn’t your cup of tea you are stuck with it (for now; a design change request has been submitted by me to get this modified so we can override it).
  • Technically, in SP2013 the index partition (replica) and query component are still together on the same machine. We have a new component, the query processing component, that submits queries to the index component, but the index component still does the same job as the old 2010 query component by sending the query response back to the query processing component. You still need one index component per partition replica.
  • Add Microsoft topologies from clients
  • Why so much space for index component? What changes need 4 * index space?
  • Continuous crawl is lightweight: it has no retry logic, so it skips inaccessible items, hence the need for a full crawl to clean up and remove items that are no longer available. For example, an incremental crawl will skip a web app if it isn’t online but won’t retry. A full crawl will retry and count the failures; if an item fails x times in x days, it will remove the failed entry. In short, SharePoint Search cannot survive on continuous crawls alone.
  • Planning SharePoint 2013 Search for IT PROs
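
The continuous crawl note above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not SharePoint's implementation: the URLs, the function names and the `FAIL_LIMIT` threshold are all hypothetical, and the threshold stands in for the unspecified "x times" in the note (the "x days" window is ignored for simplicity).

```python
# Toy model of the crawl behaviour described in the note above.
# All names and the threshold are hypothetical, not SharePoint internals.

FAIL_LIMIT = 3  # assumed stand-in for "fails x times"


def continuous_crawl(index, reachable):
    """Refresh reachable items; skip the rest with no retry and no cleanup."""
    for url in index:
        if url in reachable:
            index[url] = 0  # reset the failure count for items we could read
    return index


def full_crawl(index, reachable):
    """Count failures for unreachable items and drop those at the limit."""
    for url in list(index):  # copy the keys so we can delete while looping
        if url in reachable:
            index[url] = 0
        else:
            index[url] += 1
            if index[url] >= FAIL_LIMIT:
                del index[url]
    return index


def simulate():
    """Run ten continuous crawls, then FAIL_LIMIT full crawls."""
    index = {"http://intranet/a": 0, "http://intranet/deleted": 0}
    reachable = {"http://intranet/a"}
    for _ in range(10):
        continuous_crawl(index, reachable)
    after_continuous = sorted(index)
    for _ in range(FAIL_LIMIT):
        full_crawl(index, reachable)
    return after_continuous, sorted(index)
```

In this model the unreachable item survives any number of continuous crawls and only disappears once full crawls have counted enough failures, which is the reason the note gives for still scheduling full crawls.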

    1. Planning SP2013 Search for IT PROs, Nottingham 2012
    2. The boring stuff (disclaimer): Most of the content of this presentation was put together using material and tests based on SharePoint 2013 Release Preview. Although a lot of it is still relevant to the RTM version, it is provided “as-is” and there is no guarantee as to its accuracy. Additionally, any opinions stated are those of the author and do not represent the views of Content and Code or Microsoft.
    3. About me…
    4. Are all Search engines created equal? = ?
    5. Are all Search engines created equal? Enterprise Search is a different animal, right?
    6. The Enterprise Search market (Gartner) 2013? 2006 2009
    7. Introducing SharePoint Server 2013 Search • FAST architecture integrated along with improvements from Bing; continuous crawl • UI improvements: AJAX, preview panes, results blocks • Content Search Web part reduces need for custom code
    8. What’s new under the hood for IT PROs? (Search capability: 2007 / 2010 / 2013 Preview)
        • Architecture: SSP / Service App / Service App
        • Configurable Components: Query, Index / Query, Crawl, Admin, Index Partition / Query Processor, Crawl, Admin, Index Partition, Content Processor, Analytics Processor (Replaces SP2010 Web Analytics SA)
        • Databases: Search / Crawl, Admin, Property / Crawl, Admin, Property, Analytics, Link
        • Resiliency: No index HA (although we had query HA) / No admin component HA / Admin component redundancy
        • Management: Central Admin, STSADM / Central Admin, STSADM, PowerShell / Central Admin (except topology changes), STSADM, PowerShell
        • Scheduling: Full/Incremental / Full/Incremental / Full/Incremental, Continuous
    9. Quick Demo: Let’s find stuff in SharePoint 2013!
    10. Architecture
    11. SharePoint Server 2013 Search Architecture
    12. 1. Crawl Component • Invokes connectors to retrieve items and metadata from Content Sources • Crawl DB stores crawled item history • Discovers content and metadata (e.g. Author, Title, and Creation Date) collectively known as crawled properties • Delivers crawled properties to the Content Processing Component
    13. 2. Content Processing Component • Parses crawled items using format handlers and 3rd party iFilters • Reports crawled properties to the Search Admin Database • Writes URL information to Link DB for usage by Analytics Component • Nugget from Neil Hodgkinson (Microsoft): for now we are stuck with the default PDF format handler.
    14. 3. Analytics Processing Component • Replaces SP2010 Web Analytics • Analyses crawled items and user interactions with Search results (e.g. clicks, recommendations) • Results fed back to the Content Processing Component to improve relevance • Scales well – additional APCs or databases can be added for additional throughput/capacity
    15. 4. Index Component • Central part of Search capability – used in both feeding and Query processes: • Feeding – writes items received from Content Processor to index file • Query – provides results set to the Query Processor (similar to the “query” component in 2010) • Physically moves index files in response to Search topology changes. • Stores ACLs in disk index
    16. Scaled out SP2013 Search Index • Central Admin: 1 partition, 2 replicas • Marketecture • Use Get-SPEnterpriseSearchStatus to find the Primary Replica
    17. 5. Query Processing Component • New component in SP2013. Complements the index component. • Presents results to users! • Performs linguistics processing at query time, e.g. spellchecking, thesaurus • Analyses and processes query to determine which index partition to send query to and which rule(s) to apply
    18. 6. Admin Component • Responsible for search provisioning and topology changes • Search Admin DB is basically a “Config DB” for search – it contains the topology, crawl/query rules, crawled/managed properties. • Does NOT store ACLs in 2013 – these are stored within the disk index alongside content (used for security trimming results)
    19. Demo: Create a new Search Service Application using PowerShell
    20. What did we create? • One of each Search component • Up to 10m items (on paper) • No component redundancy
    21. Topology
    22. Minimum “Enterprise” Search hardware requirements These requirements are cumulative (56GB in total!)
    23. Example “medium” topology • “Medium” topology taken from Microsoft’s “Topologies for SP2013” document. “Finger in the air” capacity: • Up to 10 million items • 10-20,000 users • 1-2 TB content • 8 VMs on 4 physical hosts + SQL! • OWA for Search Preview Pane • No Search components on WFE servers • Query processing and index components hosted together • Traditional “app” servers for everything else. • No mention of a distributed cache (AppFabric) cluster – this could be a mistake.
    24. Nuts and bolts
    25. Default Search topology footprint - 2013 • 2 Service Applications and 1 Proxy in SPCA • 2 Service App Endpoints in IIS • 5 noderunner Processes in Task Manager • 3 Services on Server • 2 Windows Services • 1 mssearch executable • 4 Databases • 5 Noderunner processes in Process Explorer
    26. So is it really a continuous crawl? • Short answer: “it depends on how much content you have”. • Overlapped/parallel crawls every 15 minutes by default. Items shown in index “within seconds”. • Fresher content, but NOT a “silver bullet”: continuous crawls are generally run alongside a periodic full crawl. • E.g. a full crawl is needed for new managed properties and for clean-up of inaccessible/deleted items.
    27. PowerShell and Search: what’s new? • New-SPEnterpriseSearchAnalyticsProcessingComponent • BUT the lack of a “Get” cmdlet is a pain when trying to work with the component. • New Get and Set cmdlets for SPEnterpriseSearchQueryProcessingComponent • You must use PowerShell if you want to scale a search topology and to avoid GUIDs • No interface within SPCA to modify the topology.
    28. Demo: Modifying the Search topology using PowerShell
    29. What did we change?
    30. Upgrade Considerations
    31. Migrating SP2010 Search to 2013 • Remember that in-place upgrades are not supported • Only the SP2010 Admin DB can be migrated to 2013. • SP2010 Search Admin DB contains: • content sources • crawl rules • start addresses • server name mapping • federated locations. • Properties are gathered during the first crawl • SP2010 Web Analytics does not migrate to SP2013. • Logical topology settings such as servers and components in the farm need to be manually recreated using PowerShell. • SP2013 can crawl SharePoint 2003/2007/2010 farms to facilitate a “Search first” upgrade
    32. SP2013 Search Boundary key changes (Limit: 2010 / 2013)
        • Crawl Databases: 10 per Search SA / 5 per Search SA
        • Crawl Components: 16 per Search SA / 2 per Search SA
        • Index Partitions: 20 per Search SA, 128 total / 20 per Search SA
        • Link DB: N/A / 2 per Search SA
        • Query Processing Component: N/A / 1 per server
        • Content Processing Component: N/A / 1 per server
        • Analytics Processing Component: N/A / 6 per Search SA
    33. Gotchas / considerations • It is suggested that the Search and distributed cache services are split for large implementations • Impacts the “starting” topology for larger customers • High resource requirements as discussed • Some Search features deprecated / removed (see • No migration path for SP2010 Foundation Search settings • No means of modifying Search topology via UI • No Search SOAP Web service: http://server/site/_vti_bin/search.asmx is no more. Use CSOM/REST! • No Search RSS due to lack of claims support • No Search SQL Syntax • No support for docpush.exe to “push” items into the index (possible in FAST)
    34. What about Foundation? • SharePoint Foundation 2013 Search capabilities are now based on the same search implementation as SharePoint Server 2013. • If using the Farm Configuration Wizard (AKA “white wizard”) in SP2013 RP, a Search Service app is created. • However, the PowerShell cmdlets required to scale out require a Server license. • RTM may be different. Any input welcome. • My thoughts: Appropriate only for small implementations due to single server limitation in release preview.
    35. The FCW solving all of our problems!??* *This is a joke. The Farm Config Wizard rarely solves problems.
    36. Summary • SP2013 brings a bunch of cool new native Search functionality that is an evolution of 2010 functionality. • Most FAST features are now integrated • 2013 Search is resource hungry – we must plan for this! • Continuous crawl can replace incremental but still requires full crawls • PowerShell required for topology changes – brush up those skills!
    37. Questions?
    38. Thanks for listening! Nottingham 2012
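
As a rough planning aid, the per-Search-SA limits from slide 32 and the "up to 10m items (on paper)" figure from slide 20 can be turned into a quick sanity check. The Python sketch below is illustrative only: the dictionary keys and function names are hypothetical, the per-server limits (query and content processing components) are left out, and treating 10 million items as a per-partition ceiling that scales linearly is an assumption rather than something the slides state.

```python
import math

# 2013 per-Search-SA limits as listed on slide 32 (key names are hypothetical).
LIMITS_2013 = {
    "crawl_databases": 5,
    "crawl_components": 2,
    "index_partitions": 20,
    "link_databases": 2,
    "analytics_processing_components": 6,
}

# Assumption: "up to 10m items (on paper)" from slide 20, applied per partition.
ITEMS_PER_PARTITION = 10_000_000


def index_components_needed(items, replicas_per_partition):
    """Return (partitions, index components): one component per partition replica."""
    partitions = max(1, math.ceil(items / ITEMS_PER_PARTITION))
    return partitions, partitions * replicas_per_partition


def check_topology(planned):
    """Return a list of readable violations of the slide 32 limits."""
    problems = []
    for name, limit in LIMITS_2013.items():
        count = planned.get(name, 0)
        if count > limit:
            problems.append(f"{name}: {count} exceeds limit of {limit}")
    return problems
```

For example, `index_components_needed(25_000_000, 2)` gives 3 partitions and 6 index components, and `check_topology({"crawl_databases": 6})` reports the crawl database count as over the limit. Real sizing should follow Microsoft's published boundaries and topology guidance for the RTM release.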