Enterprise Search: Sizing, HA, and Migration Path
  • 1. Enterprise Search Sizing, HA, and Migration Path
    – Hosted by: Ashvini Shahane (Head, Strategic Service Unit, Synergetics)
    – Presented by: Vikram Rajkondawar, Architect Advisor, Microsoft Corporation
  • 2. Discussion Points
    – SharePoint 2010 Search / FAST Search
      • Capabilities
      • Architecture
      • Search First migration
      • High availability and sizing considerations
    – Options for migrating MOSS 2007 to SPS 2010
  • 3. SharePoint 2010 Search
  • 4. Enterprise Search Product Portfolio
    – Integrated with SharePoint
      • Solutions for Internet business: FAST Search for SharePoint Internet Sites; SharePoint Server for Internet Sites
      • Solutions for business productivity: FAST Search for SharePoint; SharePoint Server
    – Stand-alone
      • Internet business: FAST Search for Internet Business
      • Business productivity: FAST Search for Internal Applications
    – Entry-level solutions: Search Server; Search Server Express
  • 6. End-User UI
    – Out-of-box refinement
      • Refine over key result properties
      • Results refinement based on metadata, taxonomy, and social tags
      • Easy to extend over custom properties
    – One-stop Search Center
      • Scopes, web parts, best bets, top answers, advanced search
      • Query federation brings together results from many sources, with native support for OpenSearch
    – Core search experience
      • Improved "did you mean" suggestions
      • New pre-query suggestions and post-query related-query suggestions
      • "View in browser" link (for most Office documents)
      • Improved query syntax
  • 7. End-User UI
  • 8. New Query Syntax
    – Support for Boolean operators in free-text queries and property queries
      • ("SharePoint Search" OR "Live Search") AND (title:"Keyword Syntax" OR title:"Query Syntax")
    – Prefix matching for keywords and properties
      • Micro* author:bill*
    – Improved operator support for property restrictions
      • =, >, <, <=, >=
      • Enables range refinements
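Because these operators compose mechanically, query text is often assembled in code before it is handed to the search API. The following is a minimal Python sketch under that assumption; the helper names (`phrase`, `prop`, `prefix`, `any_of`, `all_of`) are hypothetical, and the helpers only build strings in the syntax shown above without validating anything against a live index.

```python
from typing import Optional

# Hypothetical helpers that assemble keyword-query text in the syntax above.
# They only build strings; nothing here talks to SharePoint.

ALLOWED_OPS = {":", "=", ">", "<", ">=", "<="}

def phrase(text: str) -> str:
    """Quote a multi-word phrase so it is matched as a unit."""
    return '"' + text.replace('"', "") + '"'

def prop(name: str, value: str, op: str = ":") -> str:
    """Property restriction such as title:"Query Syntax" or size>=1000."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{name}{op}{value}"

def prefix(stem: str, name: Optional[str] = None) -> str:
    """Prefix match on free text (Micro*) or on a property (author:bill*)."""
    return f"{name}:{stem}*" if name else f"{stem}*"

def any_of(*terms: str) -> str:
    """OR the terms together inside parentheses."""
    return "(" + " OR ".join(terms) + ")"

def all_of(*terms: str) -> str:
    """AND the terms together."""
    return " AND ".join(terms)

query = all_of(
    any_of(phrase("SharePoint Search"), phrase("Live Search")),
    any_of(prop("title", phrase("Keyword Syntax")), prop("title", phrase("Query Syntax"))),
)
print(query)
# ("SharePoint Search" OR "Live Search") AND (title:"Keyword Syntax" OR title:"Query Syntax")
print(prefix("Micro"), prefix("bill", "author"))  # Micro* author:bill*
```

Rejecting unknown operators in `prop` is a choice made for the sketch so that typos fail early rather than producing a query the engine may interpret differently.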
  • 9. Great Search Experience Out of the Box
    – Get more relevant results through a Search Center with hit highlighting, result summaries, related queries, and enhanced query syntax
    – Find information faster with metadata-driven refinement, query suggestions, search scopes, and federated results that help pinpoint information
    – Search from anywhere, including mobile and desktop integration; Office Web Apps speed access to results; enhancements for multilingual search
    – Screenshot callouts: Win7 connector, related searches, launch in Office Web Apps, refinement panel, federated results
  • 10. Search is Social
    – People-finding experience
      • Front door to the office social network
      • Better expertise and interest search, with email mining to bootstrap profiles with interests and colleagues
      • "Address book style" search: phonetic name matching, nickname matching
      • Relevance models tuned specifically for people search
      • Metadata refinement, better hit highlighting, recently authored content
  • 11. Search is Social
    – Social behavior drives search quality
      • Search click-through behavior drives relevance ranking
      • Query suggestions are mined from search logs
      • Social tagging influences relevance ranking
      • Self-search encourages people to participate by contributing content
      • Social definitions are extracted from indexed content
  • 12. Amplify the Impact of Knowledge & Expertise
    – Connect with expertise using improved matching from mined Outlook mailbox data and SharePoint My Site profiles
    – Improve relevance with use, based on how people tag content in SharePoint and on click-through of search results
    – Find people through nickname and phonetic matching, people-specific refinement, and tuned relevance models
    – Screenshot callouts: refine by focus, expertise, and other attributes; phonetic and nickname matching; expertise identification; recently authored content
  • 13. Search Use in Social Data Delivery
    – Search is used for data retrieval and trimming in other SharePoint social features:
      • My Site Host home page: the What's New web part retrieves up to 40 recent activities from colleagues
      • Profile page (person.aspx): the Recent Activities web part retrieves up to 10 recent activities for the user
      • Tags and Notes page: the Activities for Month web part retrieves up to 40 tags or notes, based on the user's activities for the specified month
      • Outlook Social Connector (OSC): syncs every hour for every user and retrieves all colleague activities since the last sync; the response sends the updates from colleagues since OSC last synced
  • 14. Search Depends on Social
    – Some search functionality depends on data from social features
    – The only social difference between SharePoint Search (SS) and FAST Search (FS): FS doesn't index social tags
    – Features compared for SS and FS:
      • Core Results page showing social tags (up to 5) for search results
      • Core Results page refinement by social tags
      • Core Results page refinement by taxonomy data / authoritative tags
      • All features on the People search tab: searching for people, searching for expertise, refining by people properties, etc.
  • 16. Go Beyond the Search Box
    – Sorting on any property; refinement with counts on any property; visual best bets; scrolling PowerPoint previews; thumbnails
  • 17. Go Beyond the Search Box
    – Site admin / search admin control
      • Visual best bets
      • Promote/demote documents and sites
      • UI extensibility (web parts, ...)
      • Relevancy profiles and parameters
      • User context parameter and admin
    – End-user control
      • Sorting, ranking, and navigation
      • Admin-enabled controls
    – Linguistics and term control
      • Keywords, phrases, synonyms, spellcheck
      • Multilingual search control
      • Lists for metadata extraction
      • Search similar (based on document vectors)
      • Index-based "did you mean" suggestions
  • 18. User Context Matters
    – Renee Lo, Engineer: "What should I know about implementing ERP?"
    – Alan Brewer, Sales: "What should I know about selling ERP consulting?"
  • 19. Go Beyond the Search Box
    – Can search in any language
    – 84 languages detected to allow language-specific handling
    – Lemmatization improves recall ('better' includes 'good')
      • Only nouns and adjectives are expanded, for higher precision ('book' -> 'books', not 'booked')
    – Phrase search includes stopwords ("a room with a view")
    – Languages detected: Afrikaans, Albanian, Arabic, Armenian, Azerbaijani, Basque, Bengali/Bangla, Bosnian, Breton, Bulgarian, Catalan, Chinese-S, Chinese-T, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Greenlandic, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Kannada, Kazakh, Kirghiz, Korean, Kurdish, Latin, Latvian/Lettish, Letzeburgesch, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Norwegian, Norwegian-B, Norwegian-N, Pashto/Pushto, Persian, Polish, Portuguese, Punjabi, Rhaeto-Romance, Romanian, Russian, Sami (Northern), Serbian, Slovak, Slovenian, Sorbian, Spanish, Swahili, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Zulu
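To make the recall/precision trade-off concrete, here is a toy Python sketch of lemma-based query expansion restricted to nouns and adjectives. The tiny `LEXICON` is invented for illustration; real engines ship large per-language dictionaries, and this is not the product's actual linguistics component.

```python
# Toy lemma-based query expansion. LEXICON is invented for illustration.
EXPANDABLE_POS = {"noun", "adj"}  # only nouns and adjectives are expanded

LEXICON = {
    "book": {"noun": ["books"], "verb": ["booked", "booking"]},
    "good": {"adj": ["better", "best"]},
}

def expand(word: str) -> set:
    """Return the word plus forms sharing its lemma, limited to nouns/adjectives."""
    word = word.lower()
    result = {word}
    for lemma, forms_by_pos in LEXICON.items():
        for pos, forms in forms_by_pos.items():
            if pos not in EXPANDABLE_POS:
                continue  # e.g. verb forms such as 'booked' are never added
            if word == lemma or word in forms:
                result.add(lemma)
                result.update(forms)
    return result

print(sorted(expand("book")))    # ['book', 'books']
print(sorted(expand("better")))  # ['best', 'better', 'good']
print(sorted(expand("booked")))  # ['booked']
```

A query for 'better' also matches 'good' (more recall), while 'book' never pulls in the verb form 'booked' (kept precision).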
  • 20. Advanced Content Processing
    – Entity extraction examples: COMPANY (out of the box); PRODUCT and CONCEPT (custom)
  • 22. Architecture and Design
    – Deployment and management
    – Scale-out architecture
      • Introduction to concepts
      • Scale-out features and options
    – Other engine enhancements
  • 23. Search Architecture Layers
    – Search Center: UI for users to issue queries and interact with results (through the Query Object Model)
    – Query servers: accept query requests from users and return results
    – Query federation: returns results from non-SharePoint indexes (e.g., an OpenSearch source)
    – Indexing: extracts information from items to enable efficient matching
    – Index partition: a subset of the overall index
    – Crawling: traverses the URL space to record items in the search catalog
    – Connectors: know how to process different content sources
    – Content sources: host the content we want to return in the main results
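  • 24. MOSS 2007 Search Scale-out
    – A single indexer builds "the whole index", and every query server holds a full copy of it
    – The indexer is a bottleneck and a single point of failure
    – The query tier is also labeled a bottleneck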
  • 25. SharePoint Search 2010 Scale-out
    – Multiple index partitions
    – Stateless crawlers, with crawl distribution
    – Admin database + admin component
    – Query component mirroring
    – Multiple property databases
    – Replaces the MOSS 2007 design, in which the single indexer and the whole-index query servers were bottlenecks and the indexer was a single point of failure
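The partitioned design can be modeled in a few lines: each item belongs to exactly one index partition, each partition is served by mirrored query components, and a query fans out to one live component per partition before the partial results are merged. This is a conceptual Python sketch only; the CRC32-based partition assignment, the term-count scoring, and all class and function names are assumptions for illustration, not SharePoint internals.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class QueryComponent:
    """A query component serving one index partition (illustrative model)."""
    name: str
    docs: dict = field(default_factory=dict)  # doc_id -> text
    online: bool = True

    def search(self, term: str) -> list:
        # Stand-in ranking: score = occurrences of the term in the item text.
        hits = []
        for doc_id, text in self.docs.items():
            score = text.lower().split().count(term.lower())
            if score:
                hits.append((doc_id, score))
        return hits

def partition_of(doc_id: str, partitions: int) -> int:
    """Stable hash of the item ID -> partition number (assumed scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % partitions

class PartitionedIndex:
    def __init__(self, partitions: int, mirrors: int = 2):
        # rows[p] holds the mirrored query components for partition p.
        self.rows = [[QueryComponent(f"p{p}-m{m}") for m in range(mirrors)]
                     for p in range(partitions)]

    def add(self, doc_id: str, text: str) -> None:
        # Every mirror of the owning partition receives a copy of the item.
        for comp in self.rows[partition_of(doc_id, len(self.rows))]:
            comp.docs[doc_id] = text

    def query(self, term: str, top: int = 10) -> list:
        merged = []
        for mirrors in self.rows:
            live = next((c for c in mirrors if c.online), None)
            if live is None:
                raise RuntimeError("partition has no online query component")
            merged.extend(live.search(term))
        # Merge partial results: highest score first, then by ID for stable output.
        return sorted(merged, key=lambda hit: (-hit[1], hit[0]))[:top]

idx = PartitionedIndex(partitions=4)
idx.add("a", "sharepoint search sharepoint")
idx.add("b", "fast search")
idx.add("c", "sharepoint farm")
for mirrors in idx.rows:
    mirrors[0].online = False  # take one mirror of every partition offline
print(idx.query("sharepoint"))  # [('a', 2), ('c', 1)]
```

With one mirror of every partition offline the query still returns complete results, which is what query-component mirroring buys. Losing both mirrors of a partition raises an error instead of silently returning partial results; that is a design choice for this sketch.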
  • 26. Search First Migration
    – Begin migrating MOSS 2007 with SharePoint 2010 Search
    – Good approach for most cases
      • Users' content stays in MOSS 2007, but user search queries are handled by SharePoint 2010
      • Can be SharePoint Search or FAST Search Server 2010 for SharePoint
    – Flexible approach
      • Other services can be added later, as needed
      • Content can be migrated later or in parallel
    – Can be implemented easily
  • 27. Search First
  • 28. Indexing the MOSS 2007 User Store
    – Create a content source
      • Content source type: SharePoint Sites
      • Start address: sps3://<MOSS 2007 Site>
    – In search results from that source, not all options are available:
      • No "Add as a colleague"
      • No "Browse in Organization Chart"
  • 29. User Profile Replication Engine (UPRE)
    – UPRE ships in the SPS 2010 Administration Toolkit
    – Sync between MOSS 2007 and SPS 2010
      • Co-existence
    – Sync between SPS 2010 and SPS 2010
      • The User Profile service application can't be used across the WAN
      • Includes social data
  • 30. From MOSS 2007 (Local) to SP 2010
  • 31. High Availability / Fault Tolerance A design that enables a system to continue operation, possibly at a reduced level (also known as graceful degradation), rather than failing completely, when some part of the system fails. “Fault tolerant design”, Wikipedia
  • 32. High Availability for Search
    – Content-side high availability
      • Full redundancy in the feeding chain
      • Normally not critical for intranet applications
      • Preferred by many clients
    – Query-side high availability
      • Full redundancy of all query components
      • Critical for Internet-facing applications
      • Preferred for intranet applications
    – Backup/recovery alternatives are not covered
  • 33. SharePoint Search – Content Data Flow
    – (Diagram) Labels: request crawl; poll requests; distribute request; log request; document properties; index fragments; security descriptors (ACLs and ACEs); Crawl DB
  • 34. SharePoint Search – Content Side HA
    – (Diagram of Property DB and Crawl DB instances) Callouts:
      • Crawlers are stateless; redundant instances automatically fail over
      • Automatic re-election of master, with automatic failover
      • No redundancy support, but can be quickly relocated via PowerShell
  • 35. SharePoint Search – Query Data Flow
  • 36. SharePoint Search – Query HA
  • 37. "The cost of overinvestment in hardware is almost always far less than the cumulative expenses related to troubleshooting problems caused by undersizing." (TechNet, "Capacity management and sizing for SharePoint 2010")
  • 38. Search Sizing
    – Scale up (add more hardware: processors/memory)
    – Scale out (add more servers to a farm)
    – Search is by far the service application in SP 2010 with the largest hardware utilization
  • 39. Sizing Approach
    – Dimensions to size: crawl DB instances, index partitions, property DB instances, crawler components / indexers
  • 40. Sizing exercise 18
  • 41. SP Search – Pilot/Dev Deployment
    – SP2010 farm: one server with all roles
  • 42. SP Search – Extra Small Deployment
    – SP2010 farm: one server running all roles (web front end, query, SharePoint crawl, people crawl)
    – SQL Server (SQL 2008 cluster) hosting all databases
  • 43. SP Search – Small Deployment
    – SP2010 farm
      • Two web front ends, each with a query component for index partition 1 (one marked *)
      • Central Admin; SharePoint crawl and people crawl components on two servers (one marked *)
    – SQL 2008 cluster: Search Admin DB, Crawl DB, Property DB, SharePoint DB
    – Note: servers marked with * are only needed for high availability
  • 44. SP Search – Medium Deployment
    – SP2010 farm
      • 2 web front ends
      • 4 query servers hosting index partitions 1-4, with each partition on two query servers
      • Central Admin; 2 SharePoint crawl and 2 people crawl components
    – SQL 2008 cluster: Search Admin DB, Crawl DB, Property DB, SharePoint DB
  • 45. SP Search – Large Deployment
    – SP2010 farm
      • 2 web front ends
      • 10 query servers hosting index partitions 1-10, with each partition on two query servers
      • Central Admin; 3 SharePoint crawl and 3 people crawl components
    – SQL 2008 cluster: 2 Crawl DBs, 2 Property DBs, Search Admin DB, SharePoint DB
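  • 46. Server Calculation Matrix
    – Per deployment: item count (millions); web front ends (WFEs); query component servers; crawl component servers; property DBs; crawl DBs; total servers; query-side HA; content-side HA. "(shared)" means the role shares a server with other roles.
      • Single VM (lab + minimal production): 1M items; WFE, query, property DB, and crawl DB shared; 1 crawl server; 1 server total; query-side HA (x), content-side HA (x)
      • Extra Small: 5M items; WFE, query, and crawl DB shared; 1 crawl server; 1 property DB; 2 servers total; no HA
      • Small: 10M items; 2 WFEs; query and crawl DB shared; 1 crawl server; 1 property DB; 4 servers total; query-side HA
      • Medium: 40M items; 2 WFEs; 4 query servers; 2 crawl servers; 1 property DB; 1 crawl DB; 10 servers total; query-side and content-side HA
      • Large: 100M items; 2 WFEs; 10 query servers; 3 crawl servers; 2 property DBs; 2 crawl DBs; 19 servers total; query-side and content-side HA
    – Disclaimer: the numbers might not be representative of the customer's environment and data. Use caution when using these numbers for sizing.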
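The matrix can be encoded as data, which makes it easy to pick a starting tier and to check that the per-role counts add up to each total. The Python sketch below assumes the item counts are upper bounds for each tier and records "(shared)" as 0 dedicated servers; both are interpretive assumptions, and the slide's disclaimer still applies.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Tier:
    name: str
    max_items_m: int    # item count from the matrix (millions), treated as an upper bound
    wfes: int           # 0 = role shares a server ("(shared)" in the matrix)
    query_servers: int
    crawl_servers: int
    property_dbs: int
    crawl_dbs: int
    total_servers: int

MATRIX = [
    Tier("Single VM", 1, 0, 0, 1, 0, 0, 1),
    Tier("Extra Small", 5, 0, 0, 1, 1, 0, 2),
    Tier("Small", 10, 2, 0, 1, 1, 0, 4),
    Tier("Medium", 40, 2, 4, 2, 1, 1, 10),
    Tier("Large", 100, 2, 10, 3, 2, 2, 19),
]

# Sanity check: dedicated servers per role add up to the matrix total.
for t in MATRIX:
    assert t.wfes + t.query_servers + t.crawl_servers + t.property_dbs + t.crawl_dbs == t.total_servers

def pick_tier(items_millions: float) -> Optional[Tier]:
    """Smallest tier whose item count covers the corpus; None if beyond Large."""
    for tier in MATRIX:
        if items_millions <= tier.max_items_m:
            return tier
    return None

print(pick_tier(25).name)  # Medium
print(pick_tier(150))      # None
```

Returning `None` above 100M items, rather than extrapolating, keeps the sketch within the range the matrix actually covers.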
  • 47. FAST Search for SharePoint 2010
    – Sorting on any property; query completion; related searches and people; scrolling document previews; thumbnails; read in Office Web Apps; federated results
  • 48. FAST Search – Content Data Flow (1/2)
    – (Diagram) Components: query component, property DB, crawl components, master crawl component, crawl DB, admin component, admin DB. Flows: request crawl, distribute request, poll requests, log request, crawl data, crawl history, crawl queue additions, document properties, index fragments, security descriptors (ACLs and ACEs)
  • 49. FAST Search – Content Side HA (1/2)
    – (Diagram of redundant query components, property DBs, and crawl components, plus the master crawl component, crawl DBs, admin component, and admin DBs) Callouts:
      • Crawlers are stateless; redundant instances automatically fail over
      • Automatic re-election of master, with automatic failover
      • Admin component: no redundancy support, but it can be quickly relocated via PowerShell
  • 50. FAST Search – Content Data Flow (2/2)
    – Crawled batch → Content Distributor → (pass on batch) → Item Processing → (ready to index) → Indexing Dispatcher → (pass on batch) → Indexing → (distribute index) → Search
    – Item Processing sends detected links to Link Analysis (Web Analyzer)
  • 51. FAST Search – Content Side HA (2/2)
    – Search: search rows have automatic failover
    – Indexing: backup indexer, manual failover
    – Indexing Dispatcher: does not hold state, automatic failover
    – Item Processing: does not hold state, automatic failover
    – Content Distributor: does not hold state, automatic failover
    – Link Analysis (Web Analyzer): must be set up for redundancy; disk errors may require manual recovery
    – Crawl DB and crawl component requirements are as for SharePoint Search
  • 52. FAST Search – Query Side HA
  • 53. FAST Search for SharePoint: Summary of Architectural Elements
    – Front end: SharePoint web front end; Search Service Applications
    – Administration: site-collection-level admin UI, Central Administration UI, and PowerShell, covering keyword management, site promotion/demotion, user context management, schema configuration, property mapping, admin configuration, entity extraction, deployment configuration, and spell-checking; Administration and Schema Object Model
    – Content: SharePoint front-end connectors (SharePoint, BDC, Exchange); FAST Search content connectors (Web Crawler, JDBC, Lotus Notes); content processing and custom linguistics; indexing; security access module
    – Query: Query Object Model; Query Web Service; query and result processing; search; federation object model for OpenSearch or other sources; people search
    – Monitoring services, integrated with Microsoft System Center Operations Manager
  • 54. Content Processing Flow
    – (Diagram) Content → crawler → content processor → indexer → index partition; end users → Search Center → query processor, with federation to an OpenSearch source; supporting controls: profiles, user context, relevance control, metadata, indexing, connectivity
    – Data moves from the content source to end-user queries:
      • It gets crawled, processed, and refined, and an index is created
      • Users execute queries and retrieve data, metadata, and federated search results
  • 55. Content Pipeline Stages
    – Default
      • Format conversion
      • Language detection and encoding
      • Lemmatizer: linguistic normalization
      • Tokenizer: word breaking
      • Entity extraction: persons, companies, locations, email, …, date/time, URL, prices, file names
      • DateTimeNormalizer: date normalization
      • Vectorizer: creates a document vector for similarity searching
      • WebAnalyzer: anchor text and link cardinality analysis
      • PropertiesMapper: maps to crawled properties
      • PropertiesReporter: reports detected properties
    – Optional
      • XML properties mapper
      • Offensive content filter
      • Verbatim extractor: loads a dictionary for custom extraction, e.g., product names
      • Field collapsing
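The essential idea of the pipeline is an ordered list of stages, each enriching the item before handing it on, with optional stages spliced in at a fixed point. The Python sketch below models only that idea. Stage names mirror the slide, but every implementation is a simplified placeholder (regex markup stripping, a stopword heuristic for language, e-mail-only entity extraction), and `run` and the dictionary contents are hypothetical.

```python
import re
from typing import Callable, Dict, List

Doc = Dict[str, object]           # a crawled item as a bag of properties
Stage = Callable[[Doc], Doc]

def format_conversion(doc: Doc) -> Doc:
    # Placeholder: strip markup to plain text (real converters handle Office, PDF, ...).
    doc["text"] = re.sub(r"<[^>]+>", " ", str(doc["raw"]))
    return doc

def language_detection(doc: Doc) -> Doc:
    # Placeholder heuristic standing in for real language detection.
    doc["language"] = "en" if re.search(r"\b(the|and|of)\b", str(doc["text"]), re.I) else "unknown"
    return doc

def tokenizer(doc: Doc) -> Doc:
    # Word breaking on word characters, lower-cased.
    doc["tokens"] = re.findall(r"\w+", str(doc["text"]).lower())
    return doc

def entity_extraction(doc: Doc) -> Doc:
    # Only one entity type here: e-mail addresses.
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", str(doc["text"]))
    return doc

def verbatim_extractor(dictionary: set) -> Stage:
    # Optional stage: match tokens against a custom dictionary (e.g. product names).
    def stage(doc: Doc) -> Doc:
        doc["products"] = sorted({t for t in doc["tokens"] if t in dictionary})
        return doc
    return stage

def properties_mapper(doc: Doc) -> Doc:
    # Map detected values to the properties that get indexed.
    doc["properties"] = {
        "language": doc["language"],
        "emails": doc["emails"],
        "products": doc.get("products", []),
    }
    return doc

DEFAULT_STAGES: List[Stage] = [format_conversion, language_detection, tokenizer,
                               entity_extraction, properties_mapper]

def run(doc: Doc, stages: List[Stage]) -> Doc:
    for stage in stages:
        doc = stage(doc)
    return doc

# Insert the optional verbatim extractor after tokenization.
stages = DEFAULT_STAGES[:3] + [verbatim_extractor({"sharepoint"})] + DEFAULT_STAGES[3:]
item = run({"raw": "<p>Search the SharePoint farm, mail search@contoso.com.</p>"}, stages)
print(item["properties"])
# {'language': 'en', 'emails': ['search@contoso.com'], 'products': ['sharepoint']}
```

Keeping stages as plain functions with a shared signature is what lets an optional stage be inserted by list slicing without touching the others.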
  • 56. FAST Search for SharePoint Scale-out
    – Scale out in different "dimensions": query volume, content volume, processing power, indexing freshness, redundancy options
    – Search and indexing performance targets*: 30 million docs/node; 50 QPS/node; 35 docs/sec
    – * Dependent on document and hardware characteristics
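The targets translate into a first-cut node count with simple ceiling arithmetic. The Python sketch below assumes the common FAST layout in which columns partition the content and rows replicate it for query throughput, so a whole row serves 50 QPS. It also treats 35 docs/sec as a single feed rate because the slide does not say per what. Both are assumptions, and the footnoted caveat applies to every number.

```python
import math

# Slide targets (footnoted as dependent on document and hardware characteristics).
DOCS_PER_NODE = 30_000_000   # 30 M docs per node
QPS_PER_NODE = 50            # 50 queries per second per node
FEED_DOCS_PER_SEC = 35       # 35 docs/sec (the slide does not say per what)

def search_matrix(total_docs: int, peak_qps: float) -> tuple:
    """Columns split the content; rows replicate the full index for query throughput.

    Assumes every query hits one node in each column, so a whole row serves
    QPS_PER_NODE queries per second.
    """
    columns = max(1, math.ceil(total_docs / DOCS_PER_NODE))
    rows = max(1, math.ceil(peak_qps / QPS_PER_NODE))
    return columns, rows

def feed_hours(total_docs: int, docs_per_sec: float = FEED_DOCS_PER_SEC) -> float:
    """Hours to feed the corpus once at a constant rate."""
    return total_docs / docs_per_sec / 3600

cols, rows = search_matrix(100_000_000, 120)
print(cols, rows, cols * rows)              # 4 3 12
print(round(feed_hours(10_000_000), 1))     # 79.4
```

If query-side HA is required, at least two rows are needed even when throughput alone calls for one; `max(rows, 2)` expresses that.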
  • 57. FAST Search – Disk Calculation
    – Per maximum item count: admin server; Web Analyzer; crawl DB; indexer; indexer (HD)
      • 1M items: 1 x 72 GB; 1 x 5 GB; 1 x 10 GB; 1 x 120 GB; 1 x 120 GB
      • 10M items: 1 x 72 GB; 1 x 50 GB; 1 x 40 GB; 1 x 1.2 TB; 1 x 1.2 TB
      • 40M items: 1 x 72 GB; 1 x 60 GB; 1 x 150 GB; 3 x 2.0 TB; 1 x 4.8 TB
      • 100M items: 1 x 72 GB; 2 x 75 GB; 1 x 350 GB; 6 x 2.0 TB; 3 x 4.8 TB
      • 150M items: 1 x 72 GB; 4 x 75 GB; 1 x 500 GB; 10 x 2.0 TB; 4 x 4.8 TB
      • 200M items: 1 x 72 GB; 5 x 75 GB; 2 x 350 GB; 14 x 2.0 TB; 5 x 4.8 TB
      • 500M items: 1 x 72 GB; 9 x 75 GB; 2 x 500 GB; 34 x 2.0 TB; 13 x 4.8 TB
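Turning the table into a lookup helps when totaling disk per role for a planned corpus. The Python sketch below picks the smallest row that covers the requested item count and multiplies count by size. It assumes 1 TB = 1000 GB, which the slide does not specify. The role keys and the `disk_plan` name are hypothetical.

```python
# Disk table from the slide: max item count (millions) -> role -> (count, size in GB).
# Assumption: 1 TB is treated as 1000 GB; the slide does not state the convention.
DISK_TABLE = {
    1:   {"admin": (1, 72), "web_analyzer": (1, 5),  "crawl_db": (1, 10),  "indexer": (1, 120),   "indexer_hd": (1, 120)},
    10:  {"admin": (1, 72), "web_analyzer": (1, 50), "crawl_db": (1, 40),  "indexer": (1, 1200),  "indexer_hd": (1, 1200)},
    40:  {"admin": (1, 72), "web_analyzer": (1, 60), "crawl_db": (1, 150), "indexer": (3, 2000),  "indexer_hd": (1, 4800)},
    100: {"admin": (1, 72), "web_analyzer": (2, 75), "crawl_db": (1, 350), "indexer": (6, 2000),  "indexer_hd": (3, 4800)},
    150: {"admin": (1, 72), "web_analyzer": (4, 75), "crawl_db": (1, 500), "indexer": (10, 2000), "indexer_hd": (4, 4800)},
    200: {"admin": (1, 72), "web_analyzer": (5, 75), "crawl_db": (2, 350), "indexer": (14, 2000), "indexer_hd": (5, 4800)},
    500: {"admin": (1, 72), "web_analyzer": (9, 75), "crawl_db": (2, 500), "indexer": (34, 2000), "indexer_hd": (13, 4800)},
}

def disk_plan(items_millions: float) -> dict:
    """Pick the smallest row covering the corpus and total the disk per role (GB)."""
    for max_items in sorted(DISK_TABLE):
        if items_millions <= max_items:
            row = DISK_TABLE[max_items]
            return {role: count * size for role, (count, size) in row.items()}
    raise ValueError("corpus is larger than the largest row in the table")

print(disk_plan(75))
# {'admin': 72, 'web_analyzer': 150, 'crawl_db': 350, 'indexer': 12000, 'indexer_hd': 14400}
```

Comparing the `indexer` and `indexer_hd` totals for a row shows the trade-off the table encodes: fewer, larger high-density volumes versus more 2.0 TB ones.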
  • 58. SharePoint Search/FAST Search Recap
    – Search is the most demanding service in SP 2010; plan accordingly
    – All components involved in querying and steady-state crawling support HA
    – High-density mode may be an attractive alternative
    – Sizing models are based on thorough testing; find one that fits your scenario
  • 59. Migration and upgrade paths from MOSS 2007
  • 60. 2010 Upgrade Improvements
    – Detect issues early
      • Provide O12 tools to admins
      • Report critical issues at the start of upgrade
    – Keep the administrator informed
    – No data loss
      • Keep content and settings
    – Continue when possible
    – Be re-entrant
      • Upgrade should not be a catch-22
  • 61. 2010 Upgrade Overview
    – New: upgrade preparation tools; Windows PowerShell upgrade cmdlets; feature upgrade; Visual Upgrade
    – Changed: upgrade methods
    – Improved: upgrade status reporting; upgrade logging
    – Removed: gradual upgrade; side-by-side installation
  • 62. 2010 Upgrade Scenarios and Methods
    – Supported scenarios
      • In-place upgrade
      • Database attach upgrade: content database, profile service database
    – Unsupported scenarios
      • Upgrade from earlier than WSS v3 SP2 / MOSS 2007 SP2
      • Direct upgrade from WSS v2 / SPS 2003 or earlier
      • Side-by-side installation
      • Gradual upgrade
  • 63. In-Place
    – "Next, next, finished"
    – Advancements
      • Restartable!
      • Common blocking time-outs removed
  • 64. In-Place Pros/Cons
    – Pros
      • Farm-wide settings are preserved and upgraded
      • Customizations are available in the environment after the upgrade, if they are v4-compatible
    – Cons
      • Servers and farms are offline while the upgrade is in progress
      • The upgrade proceeds continuously
      • The existing v3 farm must support the 2010 requirements (64-bit and performance)
  • 65. Supported Paths: In-Place
    – (Diagram) Labels: MSS x86, WSS v3.0 SP2, x86, 2010
  • 66. Database Attach
    – Databases that can be attached
      • Content database
      • Profile service database
      • Project service database
    – V3 databases that cannot be attached
      • Configuration
      • Search
  • 67. Database Attach Steps
    – Back up the 2007 content database
    – Restore it to the SharePoint 2010 SQL Server, using SQL tools
    – Test it for upgrade issues: Test-SPContentDatabase -Name wss_content_2007 -WebApplication http://2010webapp
    – Attach (and upgrade) it: Mount-SPContentDatabase -Name wss_content_2007 -WebApplication http://2010webapp
  • 68. DB Attach Pros/Cons
    – Pros
      • Upgrade multiple content databases at the same time
      • Combine multiple farms into one farm
    – Cons
      • The server and farm settings are not upgraded
      • Customizations must be transferred manually
      • Missing customizations
  • 69. Hybrid Approach
    – Detach the content databases
    – Upgrade to 2010 in place
    – DB-attach the content databases
  • 70. Hybrid Pros/Cons
    – Pros
      • Farm-wide settings are preserved
      • Customizations are already in place
      • Multiple content databases can be upgraded at the same time
      • Non-upgraded sites stay available (in read-only mode) while you upgrade the content
    – Cons
      • Labor intensive
      • Requires direct access to the database servers
      • x86 is a lot of work
      • Existing hardware may need replacing
  • 71. Upgrading FBA (Forms-Based Authentication) Web Apps
    – Convert web applications to claims-based authentication
    – Update web.config with the connection information your provider needs
    – Use PowerShell to migrate users and permissions
  • 72. SSPs Split into Service Applications (In-Place Upgrade)
  • 73. SSP
    – O12 SSPs and service settings become a flexible shared services model
    – Service applications are part of SharePoint Foundation
    – Notification of new services after in-place upgrade
    – Backup/restore of individual services, plus off-box provisioning
  • 74. What Is "Visual Upgrade"?
    – A feature that separates data upgrade from UI upgrade
      • Data and code upgrade happens all at once
      • Site UI has two modes: this version and the previous version
      • Pages and components make the decision at runtime, and it's safe by default
  • 75. Summary
    – SharePoint 2010 Search / FAST Search
      • Capabilities
      • Architecture
      • Search First migration
      • High availability and sizing considerations
    – Options for migrating MOSS 2007 to SPS 2010
  • 76. THANK YOU