4. ENHANCING PRODUCTIVITY.
Planning – Best Practices
Understanding & Configuring An Effective Search Topology
Benchmarking
Operating System
o Disable Antivirus scanning for all Search Data Volumes (and for all Search processes)
o Enable High Performance power plan
Disk Layout
o Set “DataDirectory” to non-system drive
o Separate I/O-intensive paths onto dedicated drives
Service Instances
o Only start Search Query & Site Settings (SQ&SS) on servers that host a Query Processing Component
Data Volume
o Disable disk compression and stop the OS from “indexing” content
o Separate I/O-intensive paths onto dedicated drives
Network
o 1 Gbit/s is the minimum; 10 Gbit/s is better
o Multiple network adapters are supported
12. ENHANCING PRODUCTIVITY.
Core Architecture
Crawl Component
OOB connectors
Extensible through BCS
Local disk cache
Crawled items tracked in Crawl database
Configurations stored in Admin database
Crawl modes
o Full Crawl
o Incremental Crawl
o Continuous Crawl
mssearch.exe
13. Stateless node
Analyzes content for indexing
Processing flow
Dictionaries
Schema mapping
Stores links and anchors in Link database (analytics)
Extensible through web service call-outs
Configurations stored in admin database
Crawl
Admin
Link
Core Architecture
Content Processing Component
18. ENHANCING PRODUCTIVITY.
Web Front End
SharePoint
SP Apps
Devices
Non-SP UX
Public API
Unit of scale/role boundary
Query APIs
o Client-Side object model (CSOM)
o Server-Side object model (SSOM)
o REST/OData API
Search Center
Display Templates
Content by Search Web Part
Refinement
Search Box
Search
Core Architecture
19. ENHANCING PRODUCTIVITY.
Query Processing Component
SharePoint
SP Apps
Devices
Non-SP UX
Public API
Unit of scale/role boundary
Stateless node
Processing flows
o Query Analyzer
o Linguistics
o Dictionaries
o Result sources
o Schema mapping
o Query rules
o Query federation
Configurations stored in admin database
Search
Core Architecture
22. ENHANCING PRODUCTIVITY.
Search Administration Component
SharePoint
SP Apps
Devices
Non-SP UX
Public API
Unit of scale/role boundary
Provisioning
Stores Configuration Data
o Topology
o Crawl Rules
o Query Rules
o Property Mappings
Fault Tolerant
Search
Core Architecture
44. ENHANCING PRODUCTIVITY.
Backup & Disaster Recovery
Understanding & Configuring An Effective Search Topology
What do you need to know?
o Index in SP 2013 is designed for robust backup and restore
o Everything but the index is in the database
o Point in Time backup
o Backup does not need to be restored to the same topology
o No query down time
o Backup/Restore can make disaster recovery easier
Estimated Figures
o Minimum: 8 minutes to back up a 10M index (3 nodes, 2GB of data); 6 minutes to restore
o Max: 8 hours to back up an 80M index (12 nodes, 2TB of data); 6 hours to restore
Thank you all for coming. This session is about “Understanding & Configuring An Effective SharePoint 2013 Search Topology”.
Who Am I?
Principal Consultant focusing on SharePoint, Office 365 and Azure
I’m on Twitter & LinkedIn
One of the MVP requirements is that one needs to blog a lot more than is allowed and healthy
Insights (recently set up; I will be updating it on a regular basis)
Just to give you a clear idea of what I’m going to cover today:
Components that make up the core architecture of Search. What each of them does and features they serve up
What resources these components consume and how they affect your ability to scale them.
Hybrid configuration and cloud SSA (federated query hybrid approach)
A little bit on backup and recovery
And cap it off with monitoring
Customization (if time permits)
SharePoint: OS, ULS/ Logging, Search Data/ Index
SQL Server: OS, DB, Transaction Logs, TempDB
Use fixed-size VHDs for virtual disks
The Search Query and Site Settings service is an Internet Information Services (IIS) service. By default, this service runs on each server that includes a search query component. The service manages the query processing tasks, which include sending queries to one or more of the appropriate query components and building the results set. At least one instance of the service must be running to serve queries.
SQ & SS is started automatically when a QPC is provisioned.
- When making topology changes, be sure to start or stop the service accordingly, stopping it where it’s no longer being used.
This architecture is not going to change in SharePoint 2016
- The Analytics service is new in SharePoint 2013
Everything running as a noderunner.exe process is collectively called a Constellation
Logical architecture
Feeding Chain
Search Engine
Extract Content out of the search engine
Crawler is responsible for gathering content for the Index
OOTB connectors
- Custom BCS: a way to get a custom repository that is not covered by these connectors into the Index (Andrew Thornton-Smith, “Searching external content with SharePoint and BCS”, in Room 4)
- Crawler writes content temporarily to a local disk cache (ensure anti-virus does not scan these locations so that it does not hamper performance; it might even throw some files out)
Full and incremental crawl shipped as part of SP 2007-2010
Continuous Crawl was introduced as part of SP 2013, courtesy of the FAST Search index. It gives the ability to crawl content continuously and works ONLY on SharePoint OOTB data sources (SharePoint + User Profiles)
Crawl database is just for tracking state (not for persisting crawl data)
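To make the difference between the crawl modes concrete, here is a minimal sketch in Python. It is a toy model, not how mssearch.exe is implemented: the `source` dictionary (URL to version stamp) and the `crawl_state` dictionary standing in for the Crawl database are assumptions made for illustration. Note that only state is tracked, never the content itself.

```python
from typing import Dict, List


def full_crawl(source: Dict[str, str], crawl_state: Dict[str, str]) -> List[str]:
    """Visit every item, reset the tracked state, and return all URLs for processing."""
    crawl_state.clear()
    crawl_state.update(source)
    return sorted(source)


def incremental_crawl(source: Dict[str, str], crawl_state: Dict[str, str]) -> List[str]:
    """Return only new or changed URLs, and forget items that no longer exist."""
    changed = sorted(
        url for url, version in source.items() if crawl_state.get(url) != version
    )
    # Drop tracked items that have disappeared from the source.
    for url in list(crawl_state):
        if url not in source:
            del crawl_state[url]
    crawl_state.update(source)
    return changed
```

A full crawl over two items returns both; after one item changes and one is added, an incremental crawl returns only those two, and a second incremental crawl straight afterwards returns nothing.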
Responsible for analyzing content that comes in from the crawler
Processing Flow
Dictionary mapping
Entity Extraction
Schema Mapping (takes crawled properties and maps them to managed properties). Important
Stores links in the Link database for Analytics
Configuration stored in the admin database
Main improvement from SP2010 to SP2013 (Author & Title extraction)
Looks in the document for a relevant title instead of taking the metadata at face value
Deduplication for Author (imported documents)
Enrichment Web Services
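Since schema mapping is flagged as important in the notes above, a small Python sketch may help. It only illustrates the idea of renaming crawled properties to managed properties; the mapping table and the property names in it are hypothetical examples, not a dump of a real search schema.

```python
from typing import Any, Dict

# Hypothetical mapping table: crawled property name -> managed property name.
SCHEMA_MAPPING: Dict[str, str] = {
    "ows_Title": "Title",
    "ows_Author": "Author",
    "ows_Modified": "LastModifiedTime",
}


def map_to_managed(
    crawled: Dict[str, Any], mapping: Dict[str, str] = SCHEMA_MAPPING
) -> Dict[str, Any]:
    """Keep only crawled properties that have a mapping and rename them."""
    return {
        mapping[name]: value for name, value in crawled.items() if name in mapping
    }
```

In this toy model a crawled property with no mapping never reaches the index under a managed name, so it cannot be queried by one.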
The index core is responsible for writing the actual binary structures to disk (sort, refinement and inverted index)
- Index core in SharePoint 2013 takes 1/7th of the IOPS
What’s really in the Index
Partition & Replicas (used to be called Rows & Columns)
A partition is a logical unit of the data in the index (10 million items)
A replica is just a copy of that partition
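The partition and replica arithmetic can be sketched in a few lines of Python. This is a rough planning aid only: it uses the 10 million items per partition figure from these notes, and the default of two replicas per partition is an assumption for illustration, not a recommendation from the session.

```python
from typing import Tuple

ITEMS_PER_PARTITION = 10_000_000  # figure quoted in the notes above


def index_layout(total_items: int, replicas_per_partition: int = 2) -> Tuple[int, int]:
    """Return (partitions, index components) for a given item count."""
    if total_items < 0 or replicas_per_partition < 1:
        raise ValueError("item count must be >= 0 and replicas >= 1")
    # Ceiling division without floats; always keep at least one partition.
    partitions = max(1, -(-total_items // ITEMS_PER_PARTITION))
    return partitions, partitions * replicas_per_partition
```

For example, 25 million items with two replicas each works out to 3 partitions and 6 index components.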
Scale out
Orange items are the primary replicas (the primary is whichever replica started first)
You don’t designate a particular machine as the primary in the topology
If the primary replica goes down the next
HOW TO GET THE DATA OUT?
WFE hosts the Query APIs
Search Center Hosted
OOTB Display Templates
Search Web Parts
Life of Query
Processing Flow (just like CPC)
Dictionary Mapping (Synonyms)
Federation (this is the component that is responsible)
Map-reduces (scaling out means portions of the work are shipped to different locations)
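The map-reduce idea from the last note can be sketched as a scatter-gather search in Python. This is a toy model under stated assumptions: each partition is a plain dictionary of document id to text, scoring is a simple term count, and the function names are made up for illustration. It is not how the Query Processing Component is actually implemented.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def search_partition(partition: Dict[str, str], term: str) -> List[Hit]:
    """Map step: score each document in one partition by term occurrences."""
    term = term.lower()
    hits: List[Hit] = []
    for doc_id, text in partition.items():
        count = text.lower().split().count(term)
        if count:
            hits.append((float(count), doc_id))
    return hits


def scatter_gather_search(
    partitions: List[Dict[str, str]], term: str, top_k: int = 3
) -> List[Hit]:
    """Send the query to every partition, then merge the partial results."""
    partial_results = [search_partition(p, term) for p in partitions]  # map
    merged = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(top_k, merged)  # reduce: keep the best hits overall
```

Each partition does its share of the work independently, which is why adding partitions spreads the query load.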
Anchor tags processing
Search Analytics
Search clicks
Social tag
Social distance
Search reports
Most important pages
Usage Analytics
Usage Counts
Recommendations
Activity Ranking
Pushes aggregated statistics to the Content Processing Component using
Looks at the Links database for relevance
Analytics Service Examples
The Search Admin component is now fault tolerant, which makes the whole system fully redundant
Object that defines the logical mapping of components to where they are physically deployed.
In SharePoint 2013 the topology can only be modified using PowerShell (Yay)
This should never be used as the only source of monitoring for Search, as a lot of vital information is missing, especially when things go wrong.
Application Server Administration Services: an administration service (timer job) that is missing from the Search topology UI.
- Runs every minute and performs a ton of administrative tasks on all the Search servers (owstimer.exe)
Synchronises all configuration from search admin database with all the search instances. Gets all processes into their expected state before they start operating holistically as an application.
Persists all objects to the SharePoint Object Configuration Cache
Propagates the search schema throughout the entire search system
As the Content Processing Component (CPC) discovers new crawled properties, they get mapped to managed properties, pushed to the admin database and propagated to the Query Processing Component. Queries that use managed properties may not return results if those managed properties have not been propagated. Schema propagation is important for a healthy search system.
Initialises the gatherer (Crawl Component): this is where the Crawl Component generates the required registry keys, temp path locations and file shares for feeding the Content Processing Component
Typical symptoms of problems here: crawls hanging, or exceptions during the synchronisation
On the primary admin component
Synchronises SearchAdmin.svc (legacy admin) – Search Administration Centre web parts are populated by this service
Every 15 minutes it contacts all the Index replicas to see if they have reached their thresholds (master merge)
Synchronises the constellation into the admin database
REMEMBER – THIS TIMER JOB IS VERY IMPORTANT (the weirder the issue, the more likely it is that it’s involved)
SIZING
Two options for scaling
- Scaling up with more/faster hardware resources
- Scaling out with more components across multiple machines
Avoid sharing critical resources
- Index is disk intensive and crucial in all load scenarios.
- Consider shared load on network, disk and CPU
- Within a VM
- Between VMs on the same physical host
- Each index can have a separate directory
Reverse proxy brokers trust between the two environments
Result-mixing is not done OOTB
Cloud Search Appliance is introduced to accomplish that
Takes on-premises content and pushes it into O365
Unified Index
Unified Resultsets
Cloud Search Appliance is essentially a crawler that crawls any content that your on-premises farm used to crawl but pushes it to the cloud.